amiga-bootcamp/05_reversing/static/compilers/sasc.md

45 KiB
Raw Permalink Blame History

← Home · Reverse Engineering · Static Analysis · Compilers

SAS/C 5.x/6.x — Reverse Engineering Field Manual

Overview

SAS/C (originally Lattice C, rebranded at version 5) was the dominant commercial C compiler for AmigaOS from 1988 through 1996. Version 6.58 is the final release. An estimated 7080% of Amiga C applications and libraries from this era were compiled with SAS/C, making it the most common binary format a reverse engineer encounters. It produces code with a distinctive LINK A5, #-N + 9-register MOVEM.L prologue, absolute (relocated) string addressing, and library calls via JSR -$XXX(A6) with explicit global-to-A6 loads before each library call block.

Key constraints to internalize immediately:

  • A5 is the frame pointer — always; SAS/C never omits the frame pointer. Arguments at positive offsets from A5 (+8, +12, ...), locals at negative offsets.
  • A4 is the small-data base — when -b0 (small data model) is enabled, global variables are addressed via $offset(A4). When -b1 (large data model), absolute addressing + relocation is used.
  • __saveds saves 13 registersD2-D7/A2-A6 — the largest register save of any Amiga compiler. This is a unique fingerprint.
  • String constants are absolute-addressedMOVE.L #string, Dn — and rely on HUNK_RELOC32 for relocation at load time. This is the opposite of GCC's PC-relative approach.
graph TB
    subgraph "Source (.c)"
        SRC["C source code"]
    end
    subgraph "SAS/C Compiler (lc)"
        LC["lc (compiler)"]
        BLINK["blink (linker)"]
        CO["c.o / cres.o (startup)"]
        SC["sc.lib (runtime)"]
    end
    subgraph "Binary Output"
        HUNK["Amiga HUNK executable"]
        CODE["HUNK_CODE<br/>hunk names: CODE, DATA, BSS"]
        SYNTH["HUNK_SYMBOL<br/>(optional, SAS stabs format)"]
        DEBUG["HUNK_DEBUG<br/>(=APS tag)"]
    end

    SRC --> LC
    CO --> BLINK
    SC --> BLINK
    LC --> BLINK
    BLINK --> HUNK
    HUNK --> CODE
    HUNK --> SYNTH
    HUNK --> DEBUG

Binary Identification — The SAS/C Signature

Hunk Names

SAS/C uses standard Amiga hunk names:

Hunk 0: CODE   (executable code + read-only data like jump tables)
Hunk 1: DATA   (initialized global variables)
Hunk 2: BSS    (zero-initialized global variables)

Unlike GCC (which uses .text, .data, .bss per Unix convention), SAS/C uses the Amiga-native CODE/DATA/BSS names. This is the first filter — if you see .text as a hunk name, it's NOT SAS/C.

When -b0 (small data model) is active, an additional __MERGED hunk may appear for the small-data segment.

Function Prologue — The Canonical Pattern

The SAS/C prologue is the most recognizable pattern in Amiga reverse engineering:

; Standard SAS/C function prologue:
_function:
    LINK    A5, #-N                ; A5 = frame pointer, allocate N bytes for locals
    MOVEM.L D2-D7/A2-A4, -(SP)    ; save 9 callee-saved registers

; ... function body ...

; Standard SAS/C function epilogue:
    MOVEM.L (SP)+, D2-D7/A2-A4    ; restore registers (reverse order)
    UNLK    A5                     ; deallocate frame, restore old A5
    RTS                            ; return

Frame pointer offset convention:

(Saved A5)         ← A5 points here after LINK
Return address     ← +$04(A5)
Arg 1              ← +$08(A5)
Arg 2              ← +$0C(A5)
...
Local var 1        ← -$04(A5)
Local var 2        ← -$08(A5)
...
Saved D2           ← first saved register on stack (below locals)

Note

Fast double-check: Count the MOVEM.L registers in the prologue. If it's D2-D7/A2-A4 (9 registers), it's SAS/C. If it's D3-D7 (5 registers), it's Aztec C. If it's D2-D7/A2-A6 (11 registers), it's SAS/C __saveds. If there's no LINK at all, it's GCC or VBCC.

String Addressing — The Globally Relocated Pattern

SAS/C stores string constants in the DATA hunk and references them via absolute addresses with relocation:

; SAS/C string reference:
    MOVE.L  #.string_const, D1     ; D1 = pointer to string
    ; The value #.string_const is patched by HUNK_RELOC32 at load time
    JSR     -$384(A6)              ; PutStr(D1)

; In the DATA hunk:
.string_const:  DC.B "Hello, World!", 0

This is the key disambiguator between SAS/C and GCC:

  • SAS/C: MOVE.L #string, Dn — absolute address, requiring HUNK_RELOC32
  • GCC: LEA string(PC), A0 — PC-relative, no relocation needed

In IDA/Ghidra: SAS/C string xrefs are DATA XREF from the code to the DATA hunk. GCC strings appear as CODE XREF because the LEA references the string within the same hunk (GCC places strings in .text when PC-relative).


Calling Conventions

SAS/C supports four calling conventions within a single binary. Recognizing each from the prologue alone is essential for correct function boundary analysis — but understanding the parameter mapping for each convention is equally critical for reconstructing function prototypes in IDA/Ghidra.

SAS/C Register Roles — Quick Reference

Register __stdargs Role __reg/__regargs Role __saveds/__interrupt Role
D0 Return value; arg1 Return value; arg1 Saved (interrupt) / scratch (saveds)
D1 arg2; 64-bit return high word arg2 Saved (interrupt) / scratch (saveds)
D2 Callee-saved local/arg3+ arg3 (callee-saved) Saved
D3 Callee-saved local/arg4+ arg4 (callee-saved) Saved
D4 Callee-saved local arg5 (callee-saved) Saved
D5 Callee-saved local arg6 (callee-saved) Saved
D6 Callee-saved local arg7 (callee-saved) Saved
D7 Callee-saved local arg8 (callee-saved) Saved
A0 Scratch / arg pointer arg9 (scratch) Saved (interrupt) / scratch (saveds)
A1 Scratch / arg pointer arg10 (scratch) Saved (interrupt) / scratch (saveds)
A2 Callee-saved local Callee-saved (args 11+ on stack) Saved
A3 Callee-saved local Callee-saved Saved
A4 Small-data base (-b0) or callee-saved Small-data base or callee-saved Saved
A5 Frame pointer (LINK A5) Frame pointer Saved (callee's A5)
A6 Library base (destroyed across calls) Library base (destroyed) Saved (__saveds only)

1. __stdargs — Standard C Calling (Default)

; __stdargs function prologue:
    LINK    A5, #-N
    MOVEM.L D2-D7/A2-A4, -(SP)    ; 9 registers

; Parameter passing:
;   D0, D1 = first two integer/pointer args (through registers)
;   (A5+8), (A5+12), ... = remaining args on stack (right-to-left push)
; Register preservation:
;   D2-D7, A2-A4 preserved across call
;   D0, D1, A0, A1 are scratch (caller-saved)
;   A4 preserved (small-data base or callee-saved)

Parameter layout in the callee:

After LINK A5, #-N and MOVEM.L D2-D7/A2-A4, -(SP):

  ┌──────────────────────────────┐  Higher addresses
  │ arg5                         │  $18(A5)  ← fifth stack arg
  │ arg4                         │  $14(A5)  ← fourth stack arg
  │ arg3                         │  $10(A5)  ← third stack arg
  │ arg2 (if >2 params total)    │  $0C(A5)  ← second stack arg
  │ arg1 (if >0 params total)    │  $08(A5)  ← first stack arg
  ├──────────────────────────────┤
  │ return address               │  $04(A5)
  ├──────────────────────────────┤
  │ saved A5 (caller's frame)    │  $00(A5)  ← A5 points here
  │ local var 1                  │ -$04(A5)
  │ local var 2                  │ -$08(A5)
  │ ...                          │
  │ local var N                  │ -$N(A5)
  │ saved D2                     │ -(N+4)(A5)  ← first saved register
  │ saved D3                     │
  │ ...                          │
  │ saved A4                     │
  └──────────────────────────────┘  Lower addresses (SP) = -(N+36)(A5)

Note

The "+8 offset rule" for __stdargs: The first stack-based argument is always at $08(A5), even in functions with zero parameters. This is because $00(A5) = saved A5, $04(A5) = return address, $08(A5) = caller's space for arg1. SAS/C always allocates space for the first two register args on the stack as well (they're at $08(A5) and $0C(A5)), even though the callee accesses them via D0/D1. This means $10(A5) is the third parameter (the first truly-stack-based one).

Locating parameters in SAS/C __stdargs disassembly:

Parameter Location Disassembly Pattern
arg1 D0 on entry Look for MOVE.L D0, Dn or MOVE.L D0, -$XX(A5) early in the function
arg2 D1 on entry Look for MOVE.L D1, Dn immediately after arg1 is handled
arg3 $10(A5) First truly-stack argument — MOVE.L $10(A5), Dn
arg4 $14(A5) MOVE.L $14(A5), Dn
arg5 $18(A5) MOVE.L $18(A5), Dn — sequential 4-byte increments
; Example: function with 5 arguments in __stdargs convention
; C prototype: LONG Process(BPTR fh, STRPTR buf, LONG size, LONG flags, LONG mode)

_Process:
    LINK    A5, #-$10                ; 16 bytes of locals
    MOVEM.L D2-D4/A2, -(SP)        ; save 4 regs (16 bytes)

    MOVE.L  D0, D2                  ; D2 = fh (arg1 from D0)
    MOVEA.L D1, A2                  ; A2 = buf (arg2 from D1)
    MOVE.L  $10(A5), D3             ; D3 = size (arg3 from stack)
    MOVE.L  $14(A5), D4             ; D4 = flags (arg4 from stack)
    MOVE.L  $18(A5), -$04(A5)       ; mode → local (arg5 from stack)
    ; ... function body ...

2. __reg / __regargs — Register Argument Convention

; __reg function prologue:
    LINK    A5, #-N
    MOVEM.L D2-D7/A2-A4, -(SP)    ; same save set as __stdargs

; Parameter passing difference:
;   D0  = arg1     D4 = arg5     A0 = arg9
;   D1  = arg2     D5 = arg6     A1 = arg10
;   D2  = arg3     D6 = arg7     arg11+ on stack at $08(A5), $0C(A5)...
;   D3  = arg4     D7 = arg8
;   Up to 10 register arguments before stack overflow
;   More efficient for small functions with register-friendly param types

__reg parameter-to-register mapping:

Param # Register Notes
1 D0 Also holds return value on exit
2 D1 Also holds 64-bit return high word
3 D2 Callee-saved — caller must not expect D2 preserved across this call
4 D3 Callee-saved in normal convention, but passed as arg here
5 D4 Callee-saved in normal convention
6 D5 Callee-saved in normal convention
7 D6 Callee-saved in normal convention
8 D7 Callee-saved in normal convention
9 A0 Normally scratch — passed as arg here
10 A1 Normally scratch — passed as arg here
11+ Stack at $08(A5), $0C(A5)... Same stack layout as __stdargs

Warning

__reg breaks the normal callee-saved contract. Under __stdargs, D2-D7 are callee-saved — the caller can expect them to survive a function call. Under __reg, D2-D7 carry arguments and ARE destroyed by the callee. The SAS/C linker tracks which convention each function uses and generates correct caller-side code, but in hand-analysis this means you CANNOT assume D2-D7 survive a call unless you know the callee's convention.

Identification: A __reg function looks identical to __stdargs in the prologue (LINK A5 + MOVEM.L D2-D7/A2-A4). The difference is at the call site__reg functions are called with args in many registers, while __stdargs uses only D0/D1 + stack. The SAS/C linker resolves the correct convention through its own internal calling-convention table.

3. __saveds — All-Registers-Saved Convention

; __saveds function — unique 13-register save:
_saveds_func:
    MOVEM.L D2-D7/A2-A6, -(SP)    ; 13 registers — the SAS/C fingerprint
    ; Note: NO LINK A5 before MOVEM.L — __saveds uses a different frame setup
    LINK    A5, #-N                ; frame pointer after register save (sometimes)

; Typical use cases:
;   - Interrupt handlers
;   - Hook callback functions (struct Hook.h_Entry)
;   - Library entry points (LibOpen, LibClose, LibExpunge)
;   - Functions called from a different task context

Warning

__saveds preserves A4 (the small-data base) in addition to the standard set. If you see MOVEM.L D2-D7/A2-A6, -(SP) (11 registers for data, 2 for address + A6), it's unequivocally SAS/C __saveds. No other Amiga compiler saves A6 in the callee-save set.

__saveds parameter passing: Uses the same __stdargs parameter convention (D0/D1 + stack). The only difference is the prologue saves 13 registers instead of 9, making the function safe to call from any context — even if the caller expects A4, A5, or A6 to be preserved.

4. __interrupt — Interrupt Handler Convention

; __interrupt handler — saves ALL registers:
_int_handler:
    MOVEM.L D0-D7/A0-A6, -(SP)    ; save every register (15 regs)
    ; ... interrupt body ...
    MOVEM.L (SP)+, D0-D7/A0-A6    ; restore all
    RTE                            ; Return From Exception (not RTS!)

Critical identification: The RTE instruction (not RTS) marks this as an interrupt handler. Search for RTE instructions in the binary — every one points to an interrupt handler, most of which are SAS/C __interrupt functions.


Parameter Passing — Locating Args in the Disassembly

This section provides a systematic method for identifying function parameters in SAS/C binary output, organized by convention and argument position.

For __stdargs functions (most common):

; Function: LONG DoSomething(LONG a, LONG b, LONG c, LONG d)
;           a=D0, b=D1, c=$10(A5), d=$14(A5)

_DoSomething:
    LINK    A5, #-$08
    MOVEM.L D2-D3, -(SP)

    MOVE.L  D0, D2                  ; D2 = a (register → callee-saved)
    MOVE.L  D1, D3                  ; D3 = b (register → callee-saved)
    MOVE.L  $10(A5), -(SP)          ; push c (stack arg → push for sub-call)
    MOVE.L  $14(A5), D0             ; D0 = d (stack arg → scratch)
    ; ...

For __reg/__regargs functions:

; Function: LONG __reg DoFast(LONG a, LONG b, LONG c, LONG d, LONG e, LONG f)
;           a=D0, b=D1, c=D2, d=D3, e=D4, f=D5

_DoFast:
    LINK    A5, #$00                 ; no locals needed
    MOVEM.L D2-D5, -(SP)            ; save regs that hold args (they're callee-saved in this convention!)

    ; Note: D0-D1 are NOT saved (they're scratch + return value)
    ; D2-D5 ARE saved because __reg treats them as callee-saved AFTER receiving args
    ; This means: after the MOVEM, D2=arg3, D3=arg4, etc. are on the stack
    ; The function body must reload them if needed:
    MOVE.L  (SP), D2                ; reload arg3 from stack (was D2)
    ADD.L   4(SP), D2               ; add arg4 (was D3)
    ; ...

Note

The __reg save paradox: In __reg functions, D2-D7 carry arguments on entry, BUT the callee saves them in the prologue. This means the register values are on the stack after MOVEM.L. If you see a __reg function that saves D2-D5 and then immediately reloads them from the stack, it's not redundant — it's the convention keeping the callee-save contract while using those registers for parameter passing.

Register vs Stack Variables in SAS/C

SAS/C's register allocator differs from GCC's in important ways. Understanding how SAS/C decides between register and stack allocation is essential for tracking variable lifetimes in disassembly.

SAS/C Register Allocation Heuristics

Factor SAS/C Behavior
Optimization level -O0: everything on stack. -O1: scalars to registers. -O2: loop counters and frequently-used to D6/D7. -O3: aggressive register coloring, may keep 6+ locals in registers.
Variable type 32-bit integers and pointers preferred for D2-D7. 8/16-bit values go to D2-D7 but with masking. Structs and arrays ALWAYS on stack.
Address-taken (&x) Forces stack allocation — SAS/C cannot take the address of a register.
register keyword Strong hint to prefer D2-D7. SAS/C respects register more aggressively than GCC.
Loop counters At -O2+, loop induction variables are placed in D6 or D7 and kept there for the loop body. Look for DBRA D7, loop.
Spill strategy SAS/C spills D2 first, then D3, etc. (D2 is "least valuable" under SAS/C's cost model). A5-relative stack slots for spills are at negative offsets.

Identifying Register Variables

; SAS/C -O2: count and in_word in registers
_CountWords:
    LINK    A5, #-$08              ; 8 bytes of locals (but count & in_word won't use them!)
    MOVEM.L D2-D3, -(SP)          ; D2-D3 saved → they WILL be used as named locals

    MOVEQ   #0, D2                 ; D2 = count  ← initialized in register
    MOVEQ   #0, D3                 ; D3 = in_word ← initialized in register
    MOVEA.L $08(A5), A0            ; A0 = str (arg1, loaded once)

.loop:
    ; ... D2 incremented with ADDQ.L #1, D2 — never loaded from stack
    ; ... D3 tested with TST.B D3 — always in register
    DBRA    D7, .loop

    MOVE.L  D2, D0                 ; return count from D2 (not from stack)
    MOVEM.L (SP)+, D2-D3
    UNLK    A5
    RTS

Key signs of register variables in SAS/C:

  • Saved in the MOVEM.L prologue → the register hosts a named local for the function's lifetime
  • Modified with ADDQ, SUBQ, MOVEQ, or MOVE.L #imm, Dn operating directly on the register
  • Tested with TST.B Dn, CMP.L Dn, Dm without a preceding stack load
  • Returned via MOVE.L Dn, D0 at function exit
  • Absence of frame-offset references — the -$XX(A5) offsets that would correspond to the variable never appear in load/store instructions

Identifying Stack Variables

; SAS/C -O0: everything on stack
_CountWords_O0:
    LINK    A5, #-$08              ; 8 bytes: -$04(A5) = count, -$08(A5) = in_word
    MOVEM.L D2-D7/A2-A4, -(SP)    ; full save (O0 always saves all)

    CLR.L   -$04(A5)               ; count = 0  ← direct stack write
    CLR.L   -$08(A5)               ; in_word = 0
    MOVEA.L $08(A5), A0

    ; ...
    ADDQ.L  #1, -$04(A5)           ; count++ — read-modify-write to stack
    ; ...
    MOVE.L  -$08(A5), D0           ; load in_word from stack for test
    TST.B   D0

.done:
    MOVE.L  -$04(A5), D0           ; return count from stack
    MOVEM.L (SP)+, D2-D7/A2-A4
    UNLK    A5
    RTS

Key signs of stack variables in SAS/C:

  • -$04(A5), -$08(A5), etc. appear repeatedly in MOVE.L and ADDQ.L instructions
  • Every read is a MOVE.L $offset(A5), Dn
  • Every modification is MOVE Dn, $offset(A5) or read-modify-write (ADDQ #1, $offset(A5))
  • The same offset is used in multiple non-consecutive instructions
  • At -O0: ALL locals are stack-based regardless of type

SAS/C Spill Recognition

SAS/C spills registers when a function has more live variables than available registers (D2-D7, A2-A4 = 9 registers max for scalars):

; 11 local variables + 3 parameters → register pressure at -O2:
_BigFunction:
    LINK    A5, #-$30              ; 48 bytes of locals (many stack-resident)
    MOVEM.L D2-D7/A2-A4, -(SP)    ; all usable regs saved

    MOVE.L  D0, D7                 ; D7 = arg1
    MOVE.L  D1, D6                 ; D6 = arg2
    ; ... D2-D5 used for 4 frequently-accessed locals ...
    ; Remaining 7 locals live on stack at -$04(A5) through -$1C(A5)

    ; Spill: D2 needed for a computation, but D2 holds 'count':
    MOVE.L  D2, -$20(A5)           ; spill count to reserved slot
    ; ... use D2 for temp computation ...
    MOVE.L  -$20(A5), D2           ; reload count

Spill identification in SAS/C:

  • A register is saved in the prologue MOVEM
  • Mid-function, the register's value is stored to a frame offset that appears ONLY in one store/load pair
  • The register is then used for a different purpose
  • Later, the value is reloaded from that same offset
  • The spill slot is typically at a larger negative offset (past the named locals)

Optimization Level → Variable Location Quick-Reference

Level Register Variables Stack Variables Spills
-O0 None — D0/D1 only for expression temps ALL locals, including loop counters Only for __reg param overflow
-O1 Scalar locals with register keyword; simple loop counters Arrays, structs, address-taken vars Rare — simple functions
-O2 Most scalar locals (≤ 9); loop counters in D6/D7 Arrays, structs, address-taken vars Functions with >9 scalar locals
-O3 Aggressive: keeps variables in regs across basic blocks Same as -O2 More common due to aggressive inlining increasing register pressure

Function Call Site Patterns

Recognizing how callers set up arguments reveals both the callee's convention AND the caller's variable layout.

; ─── Calling a __stdargs function with 3 args ───
; C: result = Process(fh, buf, size);

    MOVE.L  -$04(A5), -(SP)         ; push size (arg3, right-to-left)
    MOVEA.L -$08(A5), A0            ; buf into scratch reg
    MOVE.L  A0, D1                  ; D1 = buf (arg2)
    MOVE.L  -$0C(A5), D0            ; D0 = fh (arg1)
    BSR     _Process
    ADDQ.L  #4, SP                  ; caller cleans stack arg
    ; D0 = return value

; ─── Calling a __reg function with 6 args ───
; C: result = DoFast(a, b, c, d, e, f);

    MOVE.L  f, -(SP)                ; arg6 on stack (args 11+, but only 6 here)
    MOVEA.L e_ptr, A1               ; A1 = arg5 (address)
    MOVE.L  d_val, D7               ; D7 = arg4 (but D7 is callee-saved!)
    MOVE.L  c_val, D6               ; D6 = arg3
    MOVE.L  b_val, D5               ; D5 = arg2  ← NOTE: D5, not D1!
    MOVE.L  a_val, D4               ; D4 = arg1  ← NOTE: D4, not D0!
    BSR     _DoFast                  ; __reg uses D0-D7 in CALLEE's parameter order
    ; No stack cleanup needed if only 1 stack arg (popped by callee or ignored)

; ─── Calling a __stdargs function with >2 args (SAS/C -O2 pattern) ───
; The classic pattern: args beyond D0/D1 pushed right-to-left

    MOVE.L  D2, -(SP)               ; arg5 — might be a register variable
    MOVE.L  $14(A5), -(SP)          ; arg4 — might be a stack variable
    MOVEA.L $10(A5), A0             ; arg3 → A0, then to stack
    MOVE.L  A0, -(SP)               ; push arg3
    MOVEA.L D3, A0                  ; arg2 → A0, then to D1
    MOVE.L  A0, D1                  ; D1 = arg2
    MOVE.L  D4, D0                  ; D0 = arg1
    BSR     _TargetFunc
    LEA     $0C(SP), SP             ; clean 12 bytes (3 stack args)

Key call-site patterns by convention:

Convention Register Args Stack Args Caller Cleanup
__stdargs D0, D1 only Push remaining right-to-left ADDQ.L #N*4, SP or LEA N*4(SP), SP
__reg/__regargs D0-D7, A0-A1 (sequential) Push remaining right-to-left Same as __stdargs
__saveds D0, D1 only (uses __stdargs param convention) Same as __stdargs Same as __stdargs
__interrupt N/A (called by CPU, not by code) N/A N/A (RTE handles stack)

Note

The __reg D0/D1 anomaly: In __reg functions, D0 and D1 are arg1 and arg2 — just like __stdargs. The difference starts at arg3: under __stdargs it's on the stack; under __reg it's in D2. This means a __reg call with up to 2 parameters looks IDENTICAL to __stdargs at the call site. Only with 3+ parameters can you distinguish them (D2 loaded with a value before the BSR means __reg; stack push means __stdargs).

Library Call Patterns

The Classic Library Call Sequence

SAS/C library calls follow a rigid, predictable pattern:

; Step 1: Load library base from global variable
    MOVEA.L _DOSBase, A6          ; global → A6 (absolute address + relocation)

; Step 2: Set up arguments in registers
    MOVE.L  D7, D1                ; arg1: file handle
    MOVE.L  buffer_ptr, D2        ; arg2: buffer pointer
    MOVE.L  #$100, D3             ; arg3: length

; Step 3: JSR through LVO
    JSR     -$2A(A6)              ; Read() — LVO = -$2A = -42 decimal
    ; D0 = return value (bytes read, or -1 on error)

; Step 4: Check return value
    TST.L   D0
    BMI.S   .error_handler        ; negative = error

Why the explicit global load? SAS/C does not cache A6 across function calls. After every JSR/BSR that might modify A6, SAS/C reloads the library base from a named global variable (_DOSBase, _IntuitionBase, _GfxBase, etc.) before the next library call. This creates consistent MOVE.L _LibBase, A6JSR -$XXX(A6) pairs that IDA can use to:

  1. Identify which library each call targets
  2. Rename library calls from LVO offsets
  3. Trace library open/close sequences

Global Library Base Variables

Global Name Library Typical Open Pattern
_SysBase exec.library MOVEA.L 4.W, A6 at startup
_DOSBase dos.library OpenLibrary("dos.library", 0)
_IntuitionBase intuition.library OpenLibrary("intuition.library", version)
_GfxBase graphics.library OpenLibrary("graphics.library", version)
_UtilityBase utility.library OpenLibrary("utility.library", version)

These names typically appear in HUNK_SYMBOL if debug info is present. Even without symbols, the pattern MOVE.L $xxxxxxxx, A6 followed by JSR -$XXX(A6) where $xxxxxxxx is in the DATA hunk identifies a library base global.

LVO Dispatch

; All library calls: JSR -$offset(A6)
; where offset = LVO * 6 (each LVO entry is 6 bytes: JMP instruction)
; LVO $01 = offset -$06
; LVO $1E = offset -$B4 (for Open)
; LVO $2A = offset -$FC (for Read)

; Common SAS/C library call frames:
    MOVEA.L _DOSBase, A6
    JSR     -$1E(A6)              ; Open()      — LVO $05
    JSR     -$24(A6)              ; Close()     — LVO $06
    JSR     -$2A(A6)              ; Read()      — LVO $07
    JSR     -$30(A6)              ; Write()     — LVO $08
    JSR     -$36(A6)              ; Seek()      — LVO $09

Pragmas and Code Generation Effects

SAS/C #pragma directives alter code generation in ways visible in disassembly:

#pragma amicall — Library Call Convention

#pragma amicall(DOSBase, 0x1E, Open(d1, d2))
// Generates: args in D1, D2; return in D0; A6 = DOSBase

In disassembly, amicall functions are indistinguishable from __reg functions — they use register arguments. The difference is that amicall functions rely on the pragma for their calling convention rather than the __reg keyword.

#pragma amiga-align — Struct Alignment

This pragma changes struct field alignment from the compiler default to AmigaOS natural alignment. In disassembly, it affects struct field offsets — without this pragma, 16-bit fields might be at odd offsets (breaking hardware register access). With it, all fields align to their natural boundaries.

Detection: If you see struct access at offsets consistent with sizeof(UWORD) = 2 and sizeof(ULONG) = 4 alignment, the code was compiled with amiga-align. If you see misaligned access (e.g., MOVE.W $000F(A0), odd offset), alignment is off.

#pragma donotcombine — Inhibit Optimization

Prevents the optimizer from combining adjacent operations. In disassembly, this produces "unoptimized-looking" code even at -O2 — sequential loads/stores that a normal optimizer would merge into a single MOVE.L pair.

#pragma stackextent — Stack Size Specification

#pragma stackextent 8192    // 8 KB stack
// Embedded in hunk header: HUNK_HEADER stack_size field

Visible in the HUNK header, not the code. The hunkinfo tool shows the stack size. In IDA, check the hunk header fields.


Optimization Levels — Reading the Tea Leaves

SAS/C optimization levels produce progressively more aggressive transformations visible in disassembly:

Level Flag What Changes in the Binary
-O0 (none) No flag Every C statement → separate instruction sequence. Redundant loads/stores. Full register save even when unused.
-O1 (basic) -O Dead code elimination, constant folding. MOVE.L #0, D0MOVEQ #0, D0. Simple peephole.
-O2 (global) -O -O Common subexpression elimination. Loop-invariant code motion. for loop counter in register, not stack.
-O3 (aggressive) -O -O -O Function inlining (small static functions). Branch optimization. MOVE.L (A0)+, (A1)+ for struct copies.

How to identify optimization level from binary:

  • -O0: Every local variable lives at -$XX(A5) (on stack). Every expression is loaded, computed, stored separately.
  • -O1+: Variables kept in registers across multiple statements. Stack traffic reduced.
  • -O2+: Loop counters in D6/D7. DBRA Dn, loop patterns.
  • -O3: Small helper functions inlined — no BSR/RTS for functions called once.

Startup Code — c.o vs cres.o

Standard Startup (c.o — CLI/Workbench)

; SAS/C c.o entry point — the first code in HUNK_CODE:
_start:
    MOVE.L  4.W, A6               ; SysBase = *(ULONG *)4
    MOVE.L  A6, _SysBase          ; store in global
    MOVE.L  D0, _RawCommandLen    ; save CLI arg length
    MOVE.L  A0, _RawCommandStr    ; save CLI arg pointer
    TST.L   A1                     ; A1 = NULL → CLI, non-NULL → Workbench
    BEQ.S   .cli_entry

.wb_entry:
    MOVE.L  A1, _WBenchMsg        ; save WBStartup message
    JSR     _OpenLibraries
    BSR     _main                  ; call main() — BSR, not JSR
    BRA.S   .exit

.cli_entry:
    JSR     _OpenLibraries
    BSR     _main

.exit:
    MOVE.L  D0, _ReturnCode
    JSR     _CloseLibraries
    RTS

Key RE insight: To find main(), locate the _start entry point and look for the first BSR after the library open sequence. That BSR target is main().

Resident Startup (cres.o — Libraries/ROM)

; cres.o generates a RomTag for auto-init libraries:
_romtag:
    DC.W    $4AFC                    ; RTC_MATCHWORD
    DC.L    _romtag                  ; RT_MATCHTAG (self-pointer)
    DC.L    _endskip                 ; RT_ENDSKIP
    DC.B    RTF_AUTOINIT             ; RT_FLAGS
    DC.B    39                       ; RT_VERSION (V39 = OS 3.1)
    DC.B    NT_LIBRARY               ; RT_TYPE
    DC.B    0                        ; RT_PRI
    DC.L    _libname                 ; RT_NAME
    DC.L    _idstring                ; RT_IDSTRING
    DC.L    _inittable               ; RT_INIT (InitTable)

Debug Info — SAS Stabs Format

SAS/C uses its own stabs variant with the =APS tag in HUNK_DEBUG:

HUNK_DEBUG format:
  =APS tag at start of debug hunk
  Source file:  =APS filename.c
  Function:     =APS _funcname
  Line number:  =APS 123
  Local var:    =APS varname:D(0,13)  ← D0-D7 or A0-A7 (13=offset in stack)
                =APS varname:S(4)     ← S(offset) = stack-based

IDA/Ghidra integration: The Amiga HUNK loader plugin for IDA can parse SAS stabs and create local variable names. Without the plugin, the =APS strings are visible in the debug hunk as ASCII strings that can be manually cross-referenced.


Same C Function — SAS/C Output

; CountWords() — SAS/C 6.58, -O2, -b1 (large data):
; C prototype: ULONG CountWords(CONST_STRPTR str)

_CountWords:
    LINK    A5, #-$08              ; 8 bytes for locals: count, in_word
    MOVEM.L D2-D3, -(SP)          ; save D2-D3 (only registers actually used)
    
    ; count = 0
    MOVEQ   #0, D2                 ; D2 = count (register variable)
    
    ; in_word = FALSE
    MOVEQ   #0, D3                 ; D3 = in_word (register variable)
    
    ; str → A0
    MOVEA.L $08(A5), A0            ; A0 = str (arg1)
    
    BRA.S   .loop_test

.loop_body:
    MOVEQ   #' ', D0               ; D0 = ' '
    CMP.B   (A0), D0               ; *str == ' '?
    BEQ.S   .set_not_word
    
    MOVEQ   #'\t', D0              ; D0 = '\t'
    CMP.B   (A0), D0               ; *str == '\t'?
    BEQ.S   .set_not_word
    
    MOVEQ   #'\n', D0              ; D0 = '\n'
    CMP.B   (A0), D0               ; *str == '\n'?
    BEQ.S   .set_not_word
    
    TST.B   D3                      ; in_word == TRUE?
    BNE.S   .next_char
    
    ADDQ.L  #1, D2                  ; count++
    MOVEQ   #1, D3                  ; in_word = TRUE
    BRA.S   .next_char

.set_not_word:
    MOVEQ   #0, D3                  ; in_word = FALSE

.next_char:
    ADDQ.L  #1, A0                  ; str++

.loop_test:
    TST.B   (A0)                    ; *str != '\0'?
    BNE.S   .loop_body

.return:
    MOVE.L  D2, D0                  ; return count
    MOVEM.L (SP)+, D2-D3           ; restore
    UNLK    A5
    RTS

SAS/C-specific observations in this output:

  1. LINK A5, #-$08 — frame pointer allocated even though locals are in registers. SAS/C always creates a frame.
  2. $08(A5) — argument access at fixed positive offset from A5 frame pointer.
  3. MOVEQ for small constants — SAS/C peephole optimizer converts MOVE.L #0, Dn to MOVEQ #0, Dn.
  4. Individual CMP.B chains — even with -O2, SAS/C 6.x doesn't merge adjacent compare constants into a jump table for 3 cases. At -O3, it might unroll further.
  5. BRA.S .loop_test — explicit branch to loop test at top. At -O0, the loop test would be duplicated (once at entry, once at bottom).
  6. Register variable assignment: D2 = count, D3 = in_word — optimizer keeps loop variables in registers, not on stack.

Compare with other compilers:

  • SAS/C uses MOVEQ #' ', D0CMP.B (A0), D0 (load constant, then compare)
  • GCC uses CMPI.B #' ', (A0) (compare immediate to memory) — fewer instructions
  • VBCC uses tail-call optimization (BRA.S to shared epilogue) more aggressively

Named Antipatterns

"The A6 Blind Spot" — Assuming Constant Library Base

; BAD analysis: assuming A6 = exec throughout this function
    MOVEA.L _DOSBase, A6          ; A6 = DOS
    JSR     -$1E(A6)              ; Open() — correct
    ; ... many lines later ...
    MOVEA.L _execbase, D0         ; oh wait, loaded something else
    MOVEA.L D0, A6
    JSR     -$C6(A6)              ; THIS IS AllocMem, NOT Write!
    ; If you misidentify A6, every JSR LVO after this point is WRONG

Fix: Track every MOVEA.L xxx, A6 in the function. Each one potentially switches the library context. Search for MOVE.*A6 patterns in IDA.

"The Missing A4" — Small Data Model Confusion

; BAD analysis: treating A4-relative access as unknown offset
    MOVE.L  -$7FFC(A4), D0        ; "what is this? some struct at negative offset?"
; WRONG — this is a small-data global variable accessed via A4 base

; CORRECT identification:
; SAS/C -b0 (small data model): A4 = small-data base pointer
; -$7FFC(A4) = first global in the small-data segment
; This offset is patched at link time by blink

"The Phantom RTS" — Multiple Return Points

; SAS/C functions often have multiple return points from inlined cleanup:
_func:
    LINK    A5, #-N
    MOVEM.L D2-D7/A2-A4, -(SP)
    ; ... code ...
    BEQ     .error_exit            ; early return path
    
    ; ... more code ...
    MOVEM.L (SP)+, D2-D7/A2-A4
    UNLK    A5
    RTS                            ; return point 1

.error_exit:
    MOVEQ   #-1, D0                ; error code
    MOVEM.L (SP)+, D2-D7/A2-A4
    UNLK    A5
    RTS                            ; return point 2
; Both RTS belong to the same C function!

Pitfalls & Common Mistakes

1. Confusing SAS/C __saveds with Interrupt Handlers

; __saveds (NOT an interrupt — returns with RTS):
_saveds_func:
    MOVEM.L D2-D7/A2-A6, -(SP)    ; 13 regs — user-mode function, likely a hook
    ; ...
    MOVEM.L (SP)+, D2-D7/A2-A6
    RTS                            ; RTS, not RTE!

; __interrupt (IS an interrupt — returns with RTE):
_int_func:
    MOVEM.L D0-D7/A0-A6, -(SP)    ; 15 regs (D0-D1, A0-A1 too)
    ; ...
    MOVEM.L (SP)+, D0-D7/A0-A6
    RTE                            ; Return From Exception

Key distinction: __saveds saves D2-D7/A2-A6 (13 registers) and uses RTS. __interrupt saves D0-D7/A0-A6 (15 registers) and uses RTE. The RTE vs RTS tells you whether this runs in user or supervisor context.

2. Misidentifying -b0 (Small Data) Globals as Stack Variables

; Small data model (-b0): A4-relative addressing
    MOVEQ   #0, D0
    MOVE.W  D0, -$1234(A4)        ; stores to global at offset -$1234 from A4 base
; This is NOT a stack access — A4 is NOT the stack pointer (A7/SP is)
; A4 is the small-data base, loaded once at startup and never modified

3. Overlooking __no_stack_check Functions

SAS/C normally inserts stack overflow checks at function entry:

; Normal function (with stack check):
    LINK    A5, #-$200             ; large frame
    JSR     ___check_stack         ; SAS/C stack probe — if missing, __no_stack_check

When __no_stack_check is in effect, the JSR ___check_stack call is absent. This is common in leaf functions and performance-critical code. The absence of this call is a signal that the function was compiled with __no_stack_check.


Use Cases

Software Known to Be SAS/C-Compiled

Application Version RE Clues
Directory Opus 4/5 SAS/C 6.x Complex module system with ARexx integration; _DOSBase/_IntuitionBase globals visible
FinalWriter SAS/C 6.x Large DATA hunk with relocated string tables; custom memory allocator wraps AllocMem
AmigaOS 3.1 ROM SAS/C 6.x + assembly CODE/DATA hunks; RomTag structures at hunk 0 start; __saveds library entries
Deluxe Paint IV SAS/C 5.x Mixed C + assembly; C modules use LINK A5 prologues between hand-tuned asm sections
VistaPro SAS/C 6.x Heavy math; FPU calls via 68881 coprocessor interface; __saveds interrupt handlers
Most NDK 3.9 example code SAS/C 6.x Demonstrates all conventions: __stdargs, __reg, __saveds, pragma libcall

Library Example — Typical .library Compiled with SAS/C

; A SAS/C-compiled shared library entry:
_romtag:
    DC.W    $4AFC
    DC.L    _romtag
    DC.L    _endskip
    DC.B    RTF_AUTOINIT
    DC.B    39                      ; V39
    DC.B    NT_LIBRARY
    DC.B    0
    DC.L    _libname                ; "mylib.library"
    DC.L    _idstring               ; "mylib 39.1 (2026-01-01)"
    DC.L    _inittable

_inittable:
    DC.L    _libsize                ; sizeof(struct MyLibBase)
    DC.L    _funcTable              ; function pointer array
    DC.L    _dataTable              ; NULL for most libraries
    DC.L    _initFunc               ; LibInit() — __saveds

_funcTable:
    DC.L    _LibOpen                ; LVO -$1E
    DC.L    _LibClose               ; LVO -$24
    DC.L    _LibExpunge             ; LVO -$2A
    DC.L    _LibReserved            ; LVO -$30
    DC.L    _MyFunc1                ; LVO -$36
    DC.L    -1                      ; terminator

IDA Python — SAS/C Auto-Detection

def detect_sasc():
    """Detect SAS/C binaries by checking for signature patterns."""
    import idautils, idc

    link_a5_count = 0
    saveds_count = 0
    interrupt_count = 0
    total_functions = 0

    for func_ea in idautils.Functions():
        total_functions += 1
        # Check first 4 instructions of each function
        ea = func_ea
        for i in range(4):
            mnem = idc.print_insn_mnem(ea)
            if mnem == 'LINK' and 'A5' in idc.print_operand(ea, 0):
                link_a5_count += 1
                break
            elif mnem == 'MOVEM.L':
                operands = idc.print_operand(ea, 0) + idc.print_operand(ea, 1)
                if 'A6' in operands:
                    saveds_count += 1
                if all(r in operands for r in ['D0', 'A0']):
                    interrupt_count += 1
                break
            ea = idc.next_head(ea)

    ratio = link_a5_count / total_functions if total_functions > 0 else 0
    if ratio > 0.7:
        print(f"SAS/C DETECTED: {ratio*100:.0f}% functions use LINK A5")
        print(f"  __saveds functions: {saveds_count}")
        print(f"  __interrupt handlers: {interrupt_count}")
        return True
    return False

Cross-Platform Comparison

Platform Equivalent Compiler Similarities to SAS/C Key Differences
Classic Mac OS MPW C / THINK C A5-world for globals (similar to SAS/C -b0 small data); LINK A6 prologues Mac used A5 as "current world" pointer; SAS/C uses A4 for small data
Atari ST (TOS) Lattice C 5.x / Pure C Same Lattice heritage; similar LINK A5/UNLK A5 patterns Atari ST has no library LVO dispatch; TOS calls are TRAP-based
DOS (real mode) Borland Turbo C / Microsoft C 6.0 Same era, similar optimization levels DOS uses BP as frame pointer (like A5), but segmented memory changes everything
Unix (m68k) SunOS cc / System V m68k Same CPU ISA, same register conventions (A6=FP) Unix doesn't use LVO dispatch; shared libraries are dynamic-linked at load time
Modern (x86-64) GCC / Clang with -O2 Same C language, similar optimizer passes (CSE, dead code, peephole) x86-64 uses RBP as frame pointer, but modern compilers omit it by default (unlike SAS/C)

Historical Context

SAS/C evolved from Lattice C (versions 3.x and 4.x), which was the first commercial C compiler for the Amiga. When SAS Institute acquired the product in 1988, they rebranded it as SAS/C starting with version 5.0.

Key timeline:

  • 1985: Lattice C 3.0 — first Amiga C compiler
  • 1988: SAS/C 5.0 — rebranded, major optimizer improvements
  • 1990: SAS/C 5.10 — small data model (-b0), profiler, better debug
  • 1993: SAS/C 6.0 — global optimizer, 68040/060 support
  • 1996: SAS/C 6.58 — final release

SAS/C's dominance meant its conventions became de facto Amiga standards: the LINK A5 frame pointer, the _LibBase global naming, the HUNK_SYMBOL/HUNK_DEBUG format with =APS stabs, and the #pragma libcall calling convention. Later compilers (StormC, even GCC bebbo to some extent) maintained SAS/C compatibility where possible.

The reason SAS/C preserves A5 as a frame pointer in every function (even when -fomit-frame-pointer would be safe) is for stack traceability: SAS/C's profiler (sprof) and debugger (CodeProbe) relied on the linked list of A5 frames to walk the call stack. This is a deliberate tradeoff — slightly larger/slower code in exchange for debuggability.


Modern Analogies

SAS/C Concept Modern Equivalent Notes
LINK A5 frame pointer chain RBP frame chain on x86-64 (when -fno-omit-frame-pointer) Same purpose: debugger call stack unwinding; SAS/C never omits it
_SysBase / _DOSBase globals GOT (Global Offset Table) entries in ELF shared libraries Both provide indirect access to library bases; SAS/C uses named globals, ELF uses GOT slots
#pragma libcall with register encoding __attribute__((fastcall)) or register calling conventions Both let C code match non-standard ABIs; SAS/C's pragma is more explicit about which registers
SAS stabs (=APS) debug info DWARF .debug_info sections Both encode source-level debug data; stabs is simpler, DWARF is far richer
c.o / cres.o startup modules crt0.o / crti.o / crtn.o in GCC Both provide the glue between OS loader and C main()

FAQ

Q: How do I distinguish SAS/C from Lattice C in disassembly? A: Lattice C 3.x uses a simpler prologue (fewer saved registers, less aggressive optimization). SAS/C 5.x+ uses MOVEM.L D2-D7/A2-A4 (9 registers). Lattice C typically saves only D3-D7 (5 registers). Also check the startup code — SAS/C _start includes _WBenchMsg handling; Lattice C may not.

Q: Why do SAS/C binaries have both _main and _init_main symbols? A: _init_main is called by the startup code to run C++ static constructors and initialize the C runtime (if using cres.o). It calls the real _main after initialization. Not all SAS/C binaries have both — it depends on the startup module (c.o vs cres.o).

Q: How do I find all global variables in a SAS/C binary? A: Follow HUNK_RELOC32 entries in the DATA hunk — each relocation points to a global variable. For the small data model (-b0), globals are accessed via $offset(A4) — search for MOVE.x xxx(A4) patterns with negative offsets.

Q: What does _ReturnCode mean at the end of startup? A: It's the global where main()'s return value is stored. SAS/C startup saves D0 to _ReturnCode after calling _main, then returns that value to AmigaDOS as the process return code.


References