amiga-bootcamp/05_reversing/static/compilers/sasc.md

1006 lines
45 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md) · [Compilers](README.md)
# SAS/C 5.x/6.x — Reverse Engineering Field Manual
## Overview
**SAS/C** (originally Lattice C, rebranded at version 5) was the dominant commercial C compiler for AmigaOS from 1988 through 1996. Version 6.58 is the final release. An estimated **7080% of Amiga C applications and libraries** from this era were compiled with SAS/C, making it the most common binary format a reverse engineer encounters. It produces code with a distinctive `LINK A5, #-N` + 9-register `MOVEM.L` prologue, absolute (relocated) string addressing, and library calls via `JSR -$XXX(A6)` with explicit global-to-A6 loads before each library call block.
Key constraints to internalize immediately:
- **A5 is the frame pointer** — always; SAS/C never omits the frame pointer. Arguments at positive offsets from A5 (+8, +12, ...), locals at negative offsets.
- **A4 is the small-data base** — when `-b0` (small data model) is enabled, global variables are addressed via `$offset(A4)`. When `-b1` (large data model), absolute addressing + relocation is used.
- **`__saveds` saves 13 registers** — `D2-D7/A2-A6` — the largest register save of any Amiga compiler. This is a unique fingerprint.
- **String constants are absolute-addressed** — `MOVE.L #string, Dn` — and rely on `HUNK_RELOC32` for relocation at load time. This is the opposite of GCC's PC-relative approach.
```mermaid
graph TB
subgraph "Source (.c)"
SRC["C source code"]
end
subgraph "SAS/C Compiler (lc)"
LC["lc (compiler)"]
BLINK["blink (linker)"]
CO["c.o / cres.o (startup)"]
SC["sc.lib (runtime)"]
end
subgraph "Binary Output"
HUNK["Amiga HUNK executable"]
CODE["HUNK_CODE<br/>hunk names: CODE, DATA, BSS"]
SYNTH["HUNK_SYMBOL<br/>(optional, SAS stabs format)"]
DEBUG["HUNK_DEBUG<br/>(=APS tag)"]
end
SRC --> LC
CO --> BLINK
SC --> BLINK
LC --> BLINK
BLINK --> HUNK
HUNK --> CODE
HUNK --> SYNTH
HUNK --> DEBUG
```
---
## Binary Identification — The SAS/C Signature
### Hunk Names
SAS/C uses standard Amiga hunk names:
```
Hunk 0: CODE (executable code + read-only data like jump tables)
Hunk 1: DATA (initialized global variables)
Hunk 2: BSS (zero-initialized global variables)
```
Unlike GCC (which uses `.text`, `.data`, `.bss` per Unix convention), SAS/C uses the Amiga-native `CODE`/`DATA`/`BSS` names. This is the **first filter** — if you see `.text` as a hunk name, it's NOT SAS/C.
When `-b0` (small data model) is active, an additional `__MERGED` hunk may appear for the small-data segment.
### Function Prologue — The Canonical Pattern
The SAS/C prologue is the most recognizable pattern in Amiga reverse engineering:
```asm
; Standard SAS/C function prologue:
_function:
LINK A5, #-N ; A5 = frame pointer, allocate N bytes for locals
MOVEM.L D2-D7/A2-A4, -(SP) ; save 9 callee-saved registers
; ... function body ...
; Standard SAS/C function epilogue:
MOVEM.L (SP)+, D2-D7/A2-A4 ; restore registers (reverse order)
UNLK A5 ; deallocate frame, restore old A5
RTS ; return
```
**Frame pointer offset convention**:
```
(Saved A5) ← A5 points here after LINK
Return address ← +$04(A5)
Arg 1 ← +$08(A5)
Arg 2 ← +$0C(A5)
...
Local var 1 ← -$04(A5)
Local var 2 ← -$08(A5)
...
Saved D2 ← first saved register on stack (below locals)
```
> [!NOTE]
> **Fast double-check**: Count the MOVEM.L registers in the prologue. If it's `D2-D7/A2-A4` (9 registers), it's SAS/C. If it's `D3-D7` (5 registers), it's Aztec C. If it's `D2-D7/A2-A6` (11 registers), it's SAS/C `__saveds`. If there's no LINK at all, it's GCC or VBCC.
### String Addressing — The Globally Relocated Pattern
SAS/C stores string constants in the DATA hunk and references them via absolute addresses with relocation:
```asm
; SAS/C string reference:
MOVE.L #.string_const, D1 ; D1 = pointer to string
; The value #.string_const is patched by HUNK_RELOC32 at load time
JSR -$384(A6) ; PutStr(D1)
; In the DATA hunk:
.string_const: DC.B "Hello, World!", 0
```
This is the **key disambiguator** between SAS/C and GCC:
- SAS/C: `MOVE.L #string, Dn` — absolute address, requiring `HUNK_RELOC32`
- GCC: `LEA string(PC), A0` — PC-relative, no relocation needed
**In IDA/Ghidra**: SAS/C string xrefs are `DATA XREF` from the code to the DATA hunk. GCC strings appear as `CODE XREF` because the LEA references the string *within the same hunk* (GCC places strings in `.text` when PC-relative).
---
## Calling Conventions
SAS/C supports **four calling conventions** within a single binary. Recognizing each from the prologue alone is essential for correct function boundary analysis — but understanding the parameter mapping for each convention is equally critical for reconstructing function prototypes in IDA/Ghidra.
### SAS/C Register Roles — Quick Reference
| Register | `__stdargs` Role | `__reg`/`__regargs` Role | `__saveds`/`__interrupt` Role |
|---|---|---|---|
| **D0** | Return value; arg1 | Return value; arg1 | Saved (interrupt) / scratch (saveds) |
| **D1** | arg2; 64-bit return high word | arg2 | Saved (interrupt) / scratch (saveds) |
| **D2** | Callee-saved local/arg3+ | arg3 (callee-saved) | Saved |
| **D3** | Callee-saved local/arg4+ | arg4 (callee-saved) | Saved |
| **D4** | Callee-saved local | arg5 (callee-saved) | Saved |
| **D5** | Callee-saved local | arg6 (callee-saved) | Saved |
| **D6** | Callee-saved local | arg7 (callee-saved) | Saved |
| **D7** | Callee-saved local | arg8 (callee-saved) | Saved |
| **A0** | Scratch / arg pointer | arg9 (scratch) | Saved (interrupt) / scratch (saveds) |
| **A1** | Scratch / arg pointer | arg10 (scratch) | Saved (interrupt) / scratch (saveds) |
| **A2** | Callee-saved local | Callee-saved (args 11+ on stack) | Saved |
| **A3** | Callee-saved local | Callee-saved | Saved |
| **A4** | Small-data base (`-b0`) or callee-saved | Small-data base or callee-saved | Saved |
| **A5** | **Frame pointer** (LINK A5) | **Frame pointer** | Saved (callee's A5) |
| **A6** | Library base (destroyed across calls) | Library base (destroyed) | Saved (`__saveds` only) |
### 1. `__stdargs` — Standard C Calling (Default)
```asm
; __stdargs function prologue:
LINK A5, #-N
MOVEM.L D2-D7/A2-A4, -(SP) ; 9 registers
; Parameter passing:
; D0, D1 = first two integer/pointer args (through registers)
; (A5+8), (A5+12), ... = remaining args on stack (right-to-left push)
; Register preservation:
; D2-D7, A2-A4 preserved across call
; D0, D1, A0, A1 are scratch (caller-saved)
; A4 preserved (small-data base or callee-saved)
```
**Parameter layout in the callee**:
```
After LINK A5, #-N and MOVEM.L D2-D7/A2-A4, -(SP):
┌──────────────────────────────┐ Higher addresses
│ arg5 │ $18(A5) ← fifth stack arg
│ arg4 │ $14(A5) ← fourth stack arg
│ arg3 │ $10(A5) ← third stack arg
│ arg2 (if >2 params total) │ $0C(A5) ← second stack arg
│ arg1 (if >0 params total) │ $08(A5) ← first stack arg
├──────────────────────────────┤
│ return address │ $04(A5)
├──────────────────────────────┤
│ saved A5 (caller's frame) │ $00(A5) ← A5 points here
│ local var 1 │ -$04(A5)
│ local var 2 │ -$08(A5)
│ ... │
│ local var N │ -$N(A5)
│ saved D2 │ -(N+4)(A5) ← first saved register
│ saved D3 │
│ ... │
│ saved A4 │
└──────────────────────────────┘ Lower addresses (SP) = -(N+36)(A5)
```
> [!NOTE]
> **The "+8 offset rule" for `__stdargs`**: The first stack-based argument is always at `$08(A5)`, even in functions with zero parameters. This is because `$00(A5)` = saved A5, `$04(A5)` = return address, `$08(A5)` = caller's space for arg1. SAS/C always allocates space for the first two register args on the stack as well (they're at `$08(A5)` and `$0C(A5)`), even though the callee accesses them via D0/D1. This means `$10(A5)` is the third parameter (the first truly-stack-based one).
**Locating parameters in SAS/C `__stdargs` disassembly**:
| Parameter | Location | Disassembly Pattern |
|---|---|---|
| **arg1** | D0 on entry | Look for `MOVE.L D0, Dn` or `MOVE.L D0, -$XX(A5)` early in the function |
| **arg2** | D1 on entry | Look for `MOVE.L D1, Dn` immediately after arg1 is handled |
| **arg3** | `$10(A5)` | First truly-stack argument — `MOVE.L $10(A5), Dn` |
| **arg4** | `$14(A5)` | `MOVE.L $14(A5), Dn` |
| **arg5** | `$18(A5)` | `MOVE.L $18(A5), Dn` — sequential 4-byte increments |
```asm
; Example: function with 5 arguments in __stdargs convention
; C prototype: LONG Process(BPTR fh, STRPTR buf, LONG size, LONG flags, LONG mode)
_Process:
LINK A5, #-$10 ; 16 bytes of locals
MOVEM.L D2-D4/A2, -(SP) ; save 4 regs (16 bytes)
MOVE.L D0, D2 ; D2 = fh (arg1 from D0)
MOVEA.L D1, A2 ; A2 = buf (arg2 from D1)
MOVE.L $10(A5), D3 ; D3 = size (arg3 from stack)
MOVE.L $14(A5), D4 ; D4 = flags (arg4 from stack)
MOVE.L $18(A5), -$04(A5) ; mode → local (arg5 from stack)
; ... function body ...
```
### 2. `__reg` / `__regargs` — Register Argument Convention
```asm
; __reg function prologue:
LINK A5, #-N
MOVEM.L D2-D7/A2-A4, -(SP) ; same save set as __stdargs
; Parameter passing difference:
; D0 = arg1 D4 = arg5 A0 = arg9
; D1 = arg2 D5 = arg6 A1 = arg10
; D2 = arg3 D6 = arg7 arg11+ on stack at $08(A5), $0C(A5)...
; D3 = arg4 D7 = arg8
; Up to 10 register arguments before stack overflow
; More efficient for small functions with register-friendly param types
```
**`__reg` parameter-to-register mapping**:
| Param # | Register | Notes |
|---|---|---|
| 1 | D0 | Also holds return value on exit |
| 2 | D1 | Also holds 64-bit return high word |
| 3 | D2 | Callee-saved — caller must not expect D2 preserved across this call |
| 4 | D3 | Callee-saved in normal convention, but passed as arg here |
| 5 | D4 | Callee-saved in normal convention |
| 6 | D5 | Callee-saved in normal convention |
| 7 | D6 | Callee-saved in normal convention |
| 8 | D7 | Callee-saved in normal convention |
| 9 | A0 | Normally scratch — passed as arg here |
| 10 | A1 | Normally scratch — passed as arg here |
| 11+ | Stack at `$08(A5)`, `$0C(A5)`... | Same stack layout as `__stdargs` |
> [!WARNING]
> **`__reg` breaks the normal callee-saved contract.** Under `__stdargs`, D2-D7 are callee-saved — the caller can expect them to survive a function call. Under `__reg`, D2-D7 carry arguments and ARE destroyed by the callee. The SAS/C linker tracks which convention each function uses and generates correct caller-side code, but in hand-analysis this means you CANNOT assume D2-D7 survive a call unless you know the callee's convention.
**Identification**: A `__reg` function looks identical to `__stdargs` in the prologue (LINK A5 + MOVEM.L D2-D7/A2-A4). The difference is at the **call site**`__reg` functions are called with args in many registers, while `__stdargs` uses only D0/D1 + stack. The SAS/C linker resolves the correct convention through its own internal calling-convention table.
### 3. `__saveds` — All-Registers-Saved Convention
```asm
; __saveds function — unique 13-register save:
_saveds_func:
MOVEM.L D2-D7/A2-A6, -(SP) ; 13 registers — the SAS/C fingerprint
; Note: NO LINK A5 before MOVEM.L — __saveds uses a different frame setup
LINK A5, #-N ; frame pointer after register save (sometimes)
; Typical use cases:
; - Interrupt handlers
; - Hook callback functions (struct Hook.h_Entry)
; - Library entry points (LibOpen, LibClose, LibExpunge)
; - Functions called from a different task context
```
> [!WARNING]
> `__saveds` preserves A4 (the small-data base) in addition to the standard set. If you see `MOVEM.L D2-D7/A2-A6, -(SP)` (11 registers for data, 2 for address + A6), it's unequivocally SAS/C `__saveds`. No other Amiga compiler saves A6 in the callee-save set.
**`__saveds` parameter passing**: Uses the same `__stdargs` parameter convention (D0/D1 + stack). The only difference is the prologue saves 13 registers instead of 9, making the function safe to call from any context — even if the caller expects A4, A5, or A6 to be preserved.
### 4. `__interrupt` — Interrupt Handler Convention
```asm
; __interrupt handler — saves ALL registers:
_int_handler:
MOVEM.L D0-D7/A0-A6, -(SP) ; save every register (15 regs)
; ... interrupt body ...
MOVEM.L (SP)+, D0-D7/A0-A6 ; restore all
RTE ; Return From Exception (not RTS!)
```
**Critical identification**: The `RTE` instruction (not `RTS`) marks this as an interrupt handler. Search for `RTE` instructions in the binary — every one points to an interrupt handler, most of which are SAS/C `__interrupt` functions.
---
### Parameter Passing — Locating Args in the Disassembly
This section provides a systematic method for identifying function parameters in SAS/C binary output, organized by convention and argument position.
**For `__stdargs` functions (most common):**
```asm
; Function: LONG DoSomething(LONG a, LONG b, LONG c, LONG d)
; a=D0, b=D1, c=$10(A5), d=$14(A5)
_DoSomething:
LINK A5, #-$08
MOVEM.L D2-D3, -(SP)
MOVE.L D0, D2 ; D2 = a (register → callee-saved)
MOVE.L D1, D3 ; D3 = b (register → callee-saved)
MOVE.L $10(A5), -(SP) ; push c (stack arg → push for sub-call)
MOVE.L $14(A5), D0 ; D0 = d (stack arg → scratch)
; ...
```
**For `__reg`/`__regargs` functions:**
```asm
; Function: LONG __reg DoFast(LONG a, LONG b, LONG c, LONG d, LONG e, LONG f)
; a=D0, b=D1, c=D2, d=D3, e=D4, f=D5
_DoFast:
LINK A5, #$00 ; no locals needed
MOVEM.L D2-D5, -(SP) ; save regs that hold args (they're callee-saved in this convention!)
; Note: D0-D1 are NOT saved (they're scratch + return value)
; D2-D5 ARE saved because __reg treats them as callee-saved AFTER receiving args
; This means: after the MOVEM, D2=arg3, D3=arg4, etc. are on the stack
; The function body must reload them if needed:
MOVE.L (SP), D2 ; reload arg3 from stack (was D2)
ADD.L 4(SP), D2 ; add arg4 (was D3)
; ...
```
> [!NOTE]
> **The `__reg` save paradox**: In `__reg` functions, D2-D7 carry arguments on entry, BUT the callee saves them in the prologue. This means the register values are on the stack after `MOVEM.L`. If you see a `__reg` function that saves D2-D5 and then immediately reloads them from the stack, it's not redundant — it's the convention keeping the callee-save contract while using those registers for parameter passing.
### Register vs Stack Variables in SAS/C
SAS/C's register allocator differs from GCC's in important ways. Understanding how SAS/C decides between register and stack allocation is essential for tracking variable lifetimes in disassembly.
#### SAS/C Register Allocation Heuristics
| Factor | SAS/C Behavior |
|---|---|
| **Optimization level** | -O0: everything on stack. -O1: scalars to registers. -O2: loop counters and frequently-used to D6/D7. -O3: aggressive register coloring, may keep 6+ locals in registers. |
| **Variable type** | 32-bit integers and pointers preferred for D2-D7. 8/16-bit values go to D2-D7 but with masking. Structs and arrays ALWAYS on stack. |
| **Address-taken (`&x`)** | Forces stack allocation — SAS/C cannot take the address of a register. |
| **`register` keyword** | Strong hint to prefer D2-D7. SAS/C respects `register` more aggressively than GCC. |
| **Loop counters** | At -O2+, loop induction variables are placed in D6 or D7 and kept there for the loop body. Look for `DBRA D7, loop`. |
| **Spill strategy** | SAS/C spills D2 first, then D3, etc. (D2 is "least valuable" under SAS/C's cost model). A5-relative stack slots for spills are at negative offsets. |
#### Identifying Register Variables
```asm
; SAS/C -O2: count and in_word in registers
_CountWords:
LINK A5, #-$08 ; 8 bytes of locals (but count & in_word won't use them!)
MOVEM.L D2-D3, -(SP) ; D2-D3 saved → they WILL be used as named locals
MOVEQ #0, D2 ; D2 = count ← initialized in register
MOVEQ #0, D3 ; D3 = in_word ← initialized in register
MOVEA.L $08(A5), A0 ; A0 = str (arg1, loaded once)
.loop:
; ... D2 incremented with ADDQ.L #1, D2 — never loaded from stack
; ... D3 tested with TST.B D3 — always in register
DBRA D7, .loop
MOVE.L D2, D0 ; return count from D2 (not from stack)
MOVEM.L (SP)+, D2-D3
UNLK A5
RTS
```
**Key signs of register variables in SAS/C:**
- Saved in the `MOVEM.L` prologue → the register hosts a named local for the function's lifetime
- Modified with `ADDQ`, `SUBQ`, `MOVEQ`, or `MOVE.L #imm, Dn` operating directly on the register
- Tested with `TST.B Dn`, `CMP.L Dn, Dm` without a preceding stack load
- Returned via `MOVE.L Dn, D0` at function exit
- **Absence of frame-offset references** — the `-$XX(A5)` offsets that would correspond to the variable never appear in load/store instructions
#### Identifying Stack Variables
```asm
; SAS/C -O0: everything on stack
_CountWords_O0:
LINK A5, #-$08 ; 8 bytes: -$04(A5) = count, -$08(A5) = in_word
MOVEM.L D2-D7/A2-A4, -(SP) ; full save (O0 always saves all)
CLR.L -$04(A5) ; count = 0 ← direct stack write
CLR.L -$08(A5) ; in_word = 0
MOVEA.L $08(A5), A0
; ...
ADDQ.L #1, -$04(A5) ; count++ — read-modify-write to stack
; ...
MOVE.L -$08(A5), D0 ; load in_word from stack for test
TST.B D0
.done:
MOVE.L -$04(A5), D0 ; return count from stack
MOVEM.L (SP)+, D2-D7/A2-A4
UNLK A5
RTS
```
**Key signs of stack variables in SAS/C:**
- `-$04(A5)`, `-$08(A5)`, etc. appear repeatedly in `MOVE.L` and `ADDQ.L` instructions
- Every read is a `MOVE.L $offset(A5), Dn`
- Every modification is `MOVE Dn, $offset(A5)` or read-modify-write (`ADDQ #1, $offset(A5)`)
- The same offset is used in multiple non-consecutive instructions
- At -O0: ALL locals are stack-based regardless of type
#### SAS/C Spill Recognition
SAS/C spills registers when a function has more live variables than available registers (D2-D7, A2-A4 = 9 registers max for scalars):
```asm
; 11 local variables + 3 parameters → register pressure at -O2:
_BigFunction:
LINK A5, #-$30 ; 48 bytes of locals (many stack-resident)
MOVEM.L D2-D7/A2-A4, -(SP) ; all usable regs saved
MOVE.L D0, D7 ; D7 = arg1
MOVE.L D1, D6 ; D6 = arg2
; ... D2-D5 used for 4 frequently-accessed locals ...
; Remaining 7 locals live on stack at -$04(A5) through -$1C(A5)
; Spill: D2 needed for a computation, but D2 holds 'count':
MOVE.L D2, -$20(A5) ; spill count to reserved slot
; ... use D2 for temp computation ...
MOVE.L -$20(A5), D2 ; reload count
```
**Spill identification in SAS/C**:
- A register is saved in the prologue MOVEM
- Mid-function, the register's value is stored to a frame offset that appears ONLY in one store/load pair
- The register is then used for a different purpose
- Later, the value is reloaded from that same offset
- The spill slot is typically at a larger negative offset (past the named locals)
#### Optimization Level → Variable Location Quick-Reference
| Level | Register Variables | Stack Variables | Spills |
|---|---|---|---|
| **-O0** | None — D0/D1 only for expression temps | ALL locals, including loop counters | Only for `__reg` param overflow |
| **-O1** | Scalar locals with `register` keyword; simple loop counters | Arrays, structs, address-taken vars | Rare — simple functions |
| **-O2** | Most scalar locals (≤ 9); loop counters in D6/D7 | Arrays, structs, address-taken vars | Functions with >9 scalar locals |
| **-O3** | Aggressive: keeps variables in regs across basic blocks | Same as -O2 | More common due to aggressive inlining increasing register pressure |
### Function Call Site Patterns
Recognizing how callers set up arguments reveals both the callee's convention AND the caller's variable layout.
```asm
; ─── Calling a __stdargs function with 3 args ───
; C: result = Process(fh, buf, size);
MOVE.L -$04(A5), -(SP) ; push size (arg3, right-to-left)
MOVEA.L -$08(A5), A0 ; buf into scratch reg
MOVE.L A0, D1 ; D1 = buf (arg2)
MOVE.L -$0C(A5), D0 ; D0 = fh (arg1)
BSR _Process
ADDQ.L #4, SP ; caller cleans stack arg
; D0 = return value
; ─── Calling a __reg function with 6 args ───
; C: result = DoFast(a, b, c, d, e, f);
MOVE.L f, -(SP) ; arg6 on stack (args 11+, but only 6 here)
MOVEA.L e_ptr, A1 ; A1 = arg5 (address)
MOVE.L d_val, D7 ; D7 = arg4 (but D7 is callee-saved!)
MOVE.L c_val, D6 ; D6 = arg3
MOVE.L b_val, D5 ; D5 = arg2 ← NOTE: D5, not D1!
MOVE.L a_val, D4 ; D4 = arg1 ← NOTE: D4, not D0!
BSR _DoFast ; __reg uses D0-D7 in CALLEE's parameter order
; No stack cleanup needed if only 1 stack arg (popped by callee or ignored)
; ─── Calling a __stdargs function with >2 args (SAS/C -O2 pattern) ───
; The classic pattern: args beyond D0/D1 pushed right-to-left
MOVE.L D2, -(SP) ; arg5 — might be a register variable
MOVE.L $14(A5), -(SP) ; arg4 — might be a stack variable
MOVEA.L $10(A5), A0 ; arg3 → A0, then to stack
MOVE.L A0, -(SP) ; push arg3
MOVEA.L D3, A0 ; arg2 → A0, then to D1
MOVE.L A0, D1 ; D1 = arg2
MOVE.L D4, D0 ; D0 = arg1
BSR _TargetFunc
LEA $0C(SP), SP ; clean 12 bytes (3 stack args)
```
**Key call-site patterns by convention**:
| Convention | Register Args | Stack Args | Caller Cleanup |
|---|---|---|---|
| **`__stdargs`** | D0, D1 only | Push remaining right-to-left | `ADDQ.L #N*4, SP` or `LEA N*4(SP), SP` |
| **`__reg`/`__regargs`** | D0-D7, A0-A1 (sequential) | Push remaining right-to-left | Same as `__stdargs` |
| **`__saveds`** | D0, D1 only (uses `__stdargs` param convention) | Same as `__stdargs` | Same as `__stdargs` |
| **`__interrupt`** | N/A (called by CPU, not by code) | N/A | N/A (RTE handles stack) |
> [!NOTE]
> **The `__reg` D0/D1 anomaly**: In `__reg` functions, D0 and D1 are arg1 and arg2 — just like `__stdargs`. The difference starts at arg3: under `__stdargs` it's on the stack; under `__reg` it's in D2. This means a `__reg` call with up to 2 parameters looks IDENTICAL to `__stdargs` at the call site. Only with 3+ parameters can you distinguish them (D2 loaded with a value before the BSR means `__reg`; stack push means `__stdargs`).
## Library Call Patterns
### The Classic Library Call Sequence
SAS/C library calls follow a rigid, predictable pattern:
```asm
; Step 1: Load library base from global variable
MOVEA.L _DOSBase, A6 ; global → A6 (absolute address + relocation)
; Step 2: Set up arguments in registers
MOVE.L D7, D1 ; arg1: file handle
MOVE.L buffer_ptr, D2 ; arg2: buffer pointer
MOVE.L #$100, D3 ; arg3: length
; Step 3: JSR through LVO
JSR -$2A(A6) ; Read() — LVO = -$2A = -42 decimal
; D0 = return value (bytes read, or -1 on error)
; Step 4: Check return value
TST.L D0
BMI.S .error_handler ; negative = error
```
**Why the explicit global load?** SAS/C does **not** cache A6 across function calls. After every `JSR`/`BSR` that might modify A6, SAS/C reloads the library base from a named global variable (`_DOSBase`, `_IntuitionBase`, `_GfxBase`, etc.) before the next library call. This creates consistent `MOVE.L _LibBase, A6``JSR -$XXX(A6)` pairs that IDA can use to:
1. Identify which library each call targets
2. Rename library calls from LVO offsets
3. Trace library open/close sequences
### Global Library Base Variables
| Global Name | Library | Typical Open Pattern |
|---|---|---|
| `_SysBase` | exec.library | `MOVEA.L 4.W, A6` at startup |
| `_DOSBase` | dos.library | `OpenLibrary("dos.library", 0)` |
| `_IntuitionBase` | intuition.library | `OpenLibrary("intuition.library", version)` |
| `_GfxBase` | graphics.library | `OpenLibrary("graphics.library", version)` |
| `_UtilityBase` | utility.library | `OpenLibrary("utility.library", version)` |
These names typically appear in `HUNK_SYMBOL` if debug info is present. Even without symbols, the pattern `MOVE.L $xxxxxxxx, A6` followed by `JSR -$XXX(A6)` where `$xxxxxxxx` is in the DATA hunk identifies a library base global.
### LVO Dispatch
```asm
; All library calls: JSR -$offset(A6)
; where offset = LVO * 6 (each LVO entry is 6 bytes: JMP instruction)
; LVO $01 = offset -$06
; LVO $1E = offset -$B4 (for Open)
; LVO $2A = offset -$FC (for Read)
; Common SAS/C library call frames:
MOVEA.L _DOSBase, A6
JSR -$1E(A6) ; Open() — LVO $05
JSR -$24(A6) ; Close() — LVO $06
JSR -$2A(A6) ; Read() — LVO $07
JSR -$30(A6) ; Write() — LVO $08
JSR -$36(A6) ; Seek() — LVO $09
```
---
## Pragmas and Code Generation Effects
SAS/C `#pragma` directives alter code generation in ways visible in disassembly:
### `#pragma amicall` — Library Call Convention
```c
#pragma amicall(DOSBase, 0x1E, Open(d1, d2))
// Generates: args in D1, D2; return in D0; A6 = DOSBase
```
In disassembly, `amicall` functions are indistinguishable from `__reg` functions — they use register arguments. The difference is that `amicall` functions rely on the pragma for their calling convention rather than the `__reg` keyword.
### `#pragma amiga-align` — Struct Alignment
This pragma changes struct field alignment from the compiler default to AmigaOS natural alignment. In disassembly, it affects **struct field offsets** — without this pragma, 16-bit fields might be at odd offsets (breaking hardware register access). With it, all fields align to their natural boundaries.
**Detection**: If you see struct access at offsets consistent with `sizeof(UWORD) = 2` and `sizeof(ULONG) = 4` alignment, the code was compiled with `amiga-align`. If you see misaligned access (e.g., `MOVE.W $000F(A0)`, odd offset), alignment is off.
### `#pragma donotcombine` — Inhibit Optimization
Prevents the optimizer from combining adjacent operations. In disassembly, this produces "unoptimized-looking" code even at `-O2` — sequential loads/stores that a normal optimizer would merge into a single `MOVE.L` pair.
### `#pragma stackextent` — Stack Size Specification
```c
#pragma stackextent 8192 // 8 KB stack
// Embedded in hunk header: HUNK_HEADER stack_size field
```
Visible in the HUNK header, not the code. The `hunkinfo` tool shows the stack size. In IDA, check the hunk header fields.
---
## Optimization Levels — Reading the Tea Leaves
SAS/C optimization levels produce progressively more aggressive transformations visible in disassembly:
| Level | Flag | What Changes in the Binary |
|---|---|---|
| **-O0** (none) | No flag | Every C statement → separate instruction sequence. Redundant loads/stores. Full register save even when unused. |
| **-O1** (basic) | `-O` | Dead code elimination, constant folding. `MOVE.L #0, D0``MOVEQ #0, D0`. Simple peephole. |
| **-O2** (global) | `-O -O` | Common subexpression elimination. Loop-invariant code motion. `for` loop counter in register, not stack. |
| **-O3** (aggressive) | `-O -O -O` | Function inlining (small static functions). Branch optimization. `MOVE.L (A0)+, (A1)+` for struct copies. |
**How to identify optimization level from binary**:
- **-O0**: Every local variable lives at `-$XX(A5)` (on stack). Every expression is loaded, computed, stored separately.
- **-O1+**: Variables kept in registers across multiple statements. Stack traffic reduced.
- **-O2+**: Loop counters in D6/D7. `DBRA Dn, loop` patterns.
- **-O3**: Small helper functions inlined — no `BSR`/`RTS` for functions called once.
---
## Startup Code — `c.o` vs `cres.o`
### Standard Startup (`c.o` — CLI/Workbench)
```asm
; SAS/C c.o entry point — the first code in HUNK_CODE:
_start:
MOVE.L 4.W, A6 ; SysBase = *(ULONG *)4
MOVE.L A6, _SysBase ; store in global
MOVE.L D0, _RawCommandLen ; save CLI arg length
MOVE.L A0, _RawCommandStr ; save CLI arg pointer
TST.L A1 ; A1 = NULL → CLI, non-NULL → Workbench
BEQ.S .cli_entry
.wb_entry:
MOVE.L A1, _WBenchMsg ; save WBStartup message
JSR _OpenLibraries
BSR _main ; call main() — BSR, not JSR
BRA.S .exit
.cli_entry:
JSR _OpenLibraries
BSR _main
.exit:
MOVE.L D0, _ReturnCode
JSR _CloseLibraries
RTS
```
**Key RE insight**: To find `main()`, locate the `_start` entry point and look for the **first `BSR`** after the library open sequence. That `BSR` target is `main()`.
### Resident Startup (`cres.o` — Libraries/ROM)
```asm
; cres.o generates a RomTag for auto-init libraries:
_romtag:
DC.W $4AFC ; RTC_MATCHWORD
DC.L _romtag ; RT_MATCHTAG (self-pointer)
DC.L _endskip ; RT_ENDSKIP
DC.B RTF_AUTOINIT ; RT_FLAGS
DC.B 39 ; RT_VERSION (V39 = OS 3.1)
DC.B NT_LIBRARY ; RT_TYPE
DC.B 0 ; RT_PRI
DC.L _libname ; RT_NAME
DC.L _idstring ; RT_IDSTRING
DC.L _inittable ; RT_INIT (InitTable)
```
---
## Debug Info — SAS Stabs Format
SAS/C uses its own stabs variant with the `=APS` tag in `HUNK_DEBUG`:
```
HUNK_DEBUG format:
=APS tag at start of debug hunk
Source file: =APS filename.c
Function: =APS _funcname
Line number: =APS 123
Local var: =APS varname:D(0,13) ← D0-D7 or A0-A7 (13=offset in stack)
=APS varname:S(4) ← S(offset) = stack-based
```
**IDA/Ghidra integration**: The Amiga HUNK loader plugin for IDA can parse SAS stabs and create local variable names. Without the plugin, the `=APS` strings are visible in the debug hunk as ASCII strings that can be manually cross-referenced.
---
## Same C Function — SAS/C Output
```asm
; CountWords() — SAS/C 6.58, -O2, -b1 (large data):
; C prototype: ULONG CountWords(CONST_STRPTR str)
_CountWords:
LINK A5, #-$08 ; 8 bytes for locals: count, in_word
MOVEM.L D2-D3, -(SP) ; save D2-D3 (only registers actually used)
; count = 0
MOVEQ #0, D2 ; D2 = count (register variable)
; in_word = FALSE
MOVEQ #0, D3 ; D3 = in_word (register variable)
; str → A0
MOVEA.L $08(A5), A0 ; A0 = str (arg1)
BRA.S .loop_test
.loop_body:
MOVEQ #' ', D0 ; D0 = ' '
CMP.B (A0), D0 ; *str == ' '?
BEQ.S .set_not_word
MOVEQ #'\t', D0 ; D0 = '\t'
CMP.B (A0), D0 ; *str == '\t'?
BEQ.S .set_not_word
MOVEQ #'\n', D0 ; D0 = '\n'
CMP.B (A0), D0 ; *str == '\n'?
BEQ.S .set_not_word
TST.B D3 ; in_word == TRUE?
BNE.S .next_char
ADDQ.L #1, D2 ; count++
MOVEQ #1, D3 ; in_word = TRUE
BRA.S .next_char
.set_not_word:
MOVEQ #0, D3 ; in_word = FALSE
.next_char:
ADDQ.L #1, A0 ; str++
.loop_test:
TST.B (A0) ; *str != '\0'?
BNE.S .loop_body
.return:
MOVE.L D2, D0 ; return count
MOVEM.L (SP)+, D2-D3 ; restore
UNLK A5
RTS
```
**SAS/C-specific observations in this output**:
1. **`LINK A5, #-$08`** — frame pointer allocated even though locals are in registers. SAS/C always creates a frame.
2. **`$08(A5)`** — argument access at fixed positive offset from A5 frame pointer.
3. **`MOVEQ`** for small constants — SAS/C peephole optimizer converts `MOVE.L #0, Dn` to `MOVEQ #0, Dn`.
4. **Individual `CMP.B` chains** — even with `-O2`, SAS/C 6.x doesn't merge adjacent compare constants into a jump table for 3 cases. At `-O3`, it might unroll further.
5. **`BRA.S .loop_test`** — explicit branch to loop test at top. At `-O0`, the loop test would be duplicated (once at entry, once at bottom).
6. **Register variable assignment**: `D2 = count`, `D3 = in_word` — optimizer keeps loop variables in registers, not on stack.
**Compare with other compilers**:
- SAS/C uses `MOVEQ #' ', D0``CMP.B (A0), D0` (load constant, then compare)
- GCC uses `CMPI.B #' ', (A0)` (compare immediate to memory) — fewer instructions
- VBCC uses tail-call optimization (`BRA.S` to shared epilogue) more aggressively
---
## Named Antipatterns
### "The A6 Blind Spot" — Assuming Constant Library Base
```asm
; BAD analysis: assuming A6 = exec throughout this function
MOVEA.L _DOSBase, A6 ; A6 = DOS
JSR -$1E(A6) ; Open() — correct
; ... many lines later ...
MOVEA.L _execbase, D0 ; oh wait, loaded something else
MOVEA.L D0, A6
JSR -$C6(A6) ; THIS IS AllocMem, NOT Write!
; If you misidentify A6, every JSR LVO after this point is WRONG
```
**Fix**: Track **every** `MOVEA.L xxx, A6` in the function. Each one potentially switches the library context. Search for `MOVE.*A6` patterns in IDA.
### "The Missing A4" — Small Data Model Confusion
```asm
; BAD analysis: treating A4-relative access as unknown offset
MOVE.L -$7FFC(A4), D0 ; "what is this? some struct at negative offset?"
; WRONG — this is a small-data global variable accessed via A4 base
; CORRECT identification:
; SAS/C -b0 (small data model): A4 = small-data base pointer
; -$7FFC(A4) = first global in the small-data segment
; This offset is patched at link time by blink
```
### "The Phantom RTS" — Multiple Return Points
```asm
; SAS/C functions often have multiple return points from inlined cleanup:
_func:
LINK A5, #-N
MOVEM.L D2-D7/A2-A4, -(SP)
; ... code ...
BEQ .error_exit ; early return path
; ... more code ...
MOVEM.L (SP)+, D2-D7/A2-A4
UNLK A5
RTS ; return point 1
.error_exit:
MOVEQ #-1, D0 ; error code
MOVEM.L (SP)+, D2-D7/A2-A4
UNLK A5
RTS ; return point 2
; Both RTS belong to the same C function!
```
---
## Pitfalls & Common Mistakes
### 1. Confusing SAS/C `__saveds` with Interrupt Handlers
```asm
; __saveds (NOT an interrupt — returns with RTS):
_saveds_func:
MOVEM.L D2-D7/A2-A6, -(SP) ; 13 regs — user-mode function, likely a hook
; ...
MOVEM.L (SP)+, D2-D7/A2-A6
RTS ; RTS, not RTE!
; __interrupt (IS an interrupt — returns with RTE):
_int_func:
MOVEM.L D0-D7/A0-A6, -(SP) ; 15 regs (D0-D1, A0-A1 too)
; ...
MOVEM.L (SP)+, D0-D7/A0-A6
RTE ; Return From Exception
```
**Key distinction**: `__saveds` saves D2-D7/A2-A6 (13 registers) and uses `RTS`. `__interrupt` saves D0-D7/A0-A6 (15 registers) and uses `RTE`. The `RTE` vs `RTS` tells you whether this runs in user or supervisor context.
### 2. Misidentifying `-b0` (Small Data) Globals as Stack Variables
```asm
; Small data model (-b0): A4-relative addressing
MOVEQ #0, D0
MOVE.W D0, -$1234(A4) ; stores to global at offset -$1234 from A4 base
; This is NOT a stack access — A4 is NOT the stack pointer (A7/SP is)
; A4 is the small-data base, loaded once at startup and never modified
```
### 3. Overlooking `__no_stack_check` Functions
SAS/C normally inserts stack overflow checks at function entry:
```asm
; Normal function (with stack check):
LINK A5, #-$200 ; large frame
JSR ___check_stack ; SAS/C stack probe — if missing, __no_stack_check
```
When `__no_stack_check` is in effect, the `JSR ___check_stack` call is absent. This is common in leaf functions and performance-critical code. The absence of this call is a signal that the function was compiled with `__no_stack_check`.
---
## Use Cases
### Software Known to Be SAS/C-Compiled
| Application | Version | RE Clues |
|---|---|---|
| **Directory Opus 4/5** | SAS/C 6.x | Complex module system with ARexx integration; `_DOSBase`/`_IntuitionBase` globals visible |
| **FinalWriter** | SAS/C 6.x | Large DATA hunk with relocated string tables; custom memory allocator wraps AllocMem |
| **AmigaOS 3.1 ROM** | SAS/C 6.x + assembly | `CODE`/`DATA` hunks; RomTag structures at hunk 0 start; `__saveds` library entries |
| **Deluxe Paint IV** | SAS/C 5.x | Mixed C + assembly; C modules use LINK A5 prologues between hand-tuned asm sections |
| **VistaPro** | SAS/C 6.x | Heavy math; FPU calls via 68881 coprocessor interface; `__saveds` interrupt handlers |
| **Most NDK 3.9 example code** | SAS/C 6.x | Demonstrates all conventions: `__stdargs`, `__reg`, `__saveds`, `pragma libcall` |
### Library Example — Typical .library Compiled with SAS/C
```asm
; A SAS/C-compiled shared library entry:
_romtag:
DC.W $4AFC
DC.L _romtag
DC.L _endskip
DC.B RTF_AUTOINIT
DC.B 39 ; V39
DC.B NT_LIBRARY
DC.B 0
DC.L _libname ; "mylib.library"
DC.L _idstring ; "mylib 39.1 (2026-01-01)"
DC.L _inittable
_inittable:
DC.L _libsize ; sizeof(struct MyLibBase)
DC.L _funcTable ; function pointer array
DC.L _dataTable ; NULL for most libraries
DC.L _initFunc ; LibInit() — __saveds
_funcTable:
DC.L _LibOpen ; LVO -$1E
DC.L _LibClose ; LVO -$24
DC.L _LibExpunge ; LVO -$2A
DC.L _LibReserved ; LVO -$30
DC.L _MyFunc1 ; LVO -$36
DC.L -1 ; terminator
```
---
## IDA Python — SAS/C Auto-Detection
```python
def detect_sasc():
"""Detect SAS/C binaries by checking for signature patterns."""
import idautils, idc
link_a5_count = 0
saveds_count = 0
interrupt_count = 0
total_functions = 0
for func_ea in idautils.Functions():
total_functions += 1
# Check first 4 instructions of each function
ea = func_ea
for i in range(4):
mnem = idc.print_insn_mnem(ea)
if mnem == 'LINK' and 'A5' in idc.print_operand(ea, 0):
link_a5_count += 1
break
elif mnem == 'MOVEM.L':
operands = idc.print_operand(ea, 0) + idc.print_operand(ea, 1)
if 'A6' in operands:
saveds_count += 1
if all(r in operands for r in ['D0', 'A0']):
interrupt_count += 1
break
ea = idc.next_head(ea)
ratio = link_a5_count / total_functions if total_functions > 0 else 0
if ratio > 0.7:
print(f"SAS/C DETECTED: {ratio*100:.0f}% functions use LINK A5")
print(f" __saveds functions: {saveds_count}")
print(f" __interrupt handlers: {interrupt_count}")
return True
return False
```
---
## Cross-Platform Comparison
| Platform | Equivalent Compiler | Similarities to SAS/C | Key Differences |
|---|---|---|---|
| **Classic Mac OS** | MPW C / THINK C | A5-world for globals (similar to SAS/C -b0 small data); LINK A6 prologues | Mac used A5 as "current world" pointer; SAS/C uses A4 for small data |
| **Atari ST (TOS)** | Lattice C 5.x / Pure C | Same Lattice heritage; similar LINK A5/UNLK A5 patterns | Atari ST has no library LVO dispatch; TOS calls are TRAP-based |
| **DOS (real mode)** | Borland Turbo C / Microsoft C 6.0 | Same era, similar optimization levels | DOS uses BP as frame pointer (like A5), but segmented memory changes everything |
| **Unix (m68k)** | SunOS cc / System V m68k | Same CPU ISA, same register conventions (A6=FP) | Unix doesn't use LVO dispatch; shared libraries are dynamic-linked at load time |
| **Modern (x86-64)** | GCC / Clang with `-O2` | Same C language, similar optimizer passes (CSE, dead code, peephole) | x86-64 uses RBP as frame pointer, but modern compilers omit it by default (unlike SAS/C) |
---
## Historical Context
SAS/C evolved from **Lattice C** (versions 3.x and 4.x), which was the first commercial C compiler for the Amiga. When SAS Institute acquired the product in 1988, they rebranded it as SAS/C starting with version 5.0.
Key timeline:
- **1985**: Lattice C 3.0 — first Amiga C compiler
- **1988**: SAS/C 5.0 — rebranded, major optimizer improvements
- **1990**: SAS/C 5.10 — small data model (`-b0`), profiler, better debug
- **1993**: SAS/C 6.0 — global optimizer, 68040/060 support
- **1996**: SAS/C 6.58 — final release
SAS/C's dominance meant its conventions became **de facto Amiga standards**: the `LINK A5` frame pointer, the `_LibBase` global naming, the `HUNK_SYMBOL`/`HUNK_DEBUG` format with `=APS` stabs, and the `#pragma libcall` calling convention. Later compilers (StormC, even GCC bebbo to some extent) maintained SAS/C compatibility where possible.
The reason SAS/C preserves A5 as a frame pointer in **every** function (even when `-fomit-frame-pointer` would be safe) is for stack traceability: SAS/C's profiler (`sprof`) and debugger (CodeProbe) relied on the linked list of A5 frames to walk the call stack. This is a deliberate tradeoff — slightly larger/slower code in exchange for debuggability.
---
## Modern Analogies
| SAS/C Concept | Modern Equivalent | Notes |
|---|---|---|
| `LINK A5` frame pointer chain | `RBP` frame chain on x86-64 (when `-fno-omit-frame-pointer`) | Same purpose: debugger call stack unwinding; SAS/C never omits it |
| `_SysBase` / `_DOSBase` globals | GOT (Global Offset Table) entries in ELF shared libraries | Both provide indirect access to library bases; SAS/C uses named globals, ELF uses GOT slots |
| `#pragma libcall` with register encoding | `__attribute__((fastcall))` or register calling conventions | Both let C code match non-standard ABIs; SAS/C's pragma is more explicit about which registers |
| SAS stabs (`=APS`) debug info | DWARF `.debug_info` sections | Both encode source-level debug data; stabs is simpler, DWARF is far richer |
| `c.o` / `cres.o` startup modules | `crt0.o` / `crti.o` / `crtn.o` in GCC | Both provide the glue between OS loader and C `main()` |
---
## FAQ
**Q: How do I distinguish SAS/C from Lattice C in disassembly?**
A: Lattice C 3.x uses a simpler prologue (fewer saved registers, less aggressive optimization). SAS/C 5.x+ uses `MOVEM.L D2-D7/A2-A4` (9 registers). Lattice C typically saves only `D3-D7` (5 registers). Also check the startup code — SAS/C `_start` includes `_WBenchMsg` handling; Lattice C may not.
**Q: Why do SAS/C binaries have both `_main` and `_init_main` symbols?**
A: `_init_main` is called by the startup code to run C++ static constructors and initialize the C runtime (if using `cres.o`). It calls the real `_main` after initialization. Not all SAS/C binaries have both — it depends on the startup module (`c.o` vs `cres.o`).
**Q: How do I find all global variables in a SAS/C binary?**
A: Follow `HUNK_RELOC32` entries in the DATA hunk — each relocation points to a global variable. For the small data model (`-b0`), globals are accessed via `$offset(A4)` — search for `MOVE.x xxx(A4)` patterns with negative offsets.
**Q: What does `_ReturnCode` mean at the end of startup?**
A: It's the global where `main()`'s return value is stored. SAS/C startup saves D0 to `_ReturnCode` after calling `_main`, then returns that value to AmigaDOS as the process return code.
---
## References
- [13_toolchain/sasc.md](../../../13_toolchain/sasc.md) — SAS/C usage and compiler flags
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick compiler identification
- [startup_code.md](../../../04_linking_and_libraries/startup_code.md) — Startup code internals (c.o, cres.o)
- [register_conventions.md](../../../04_linking_and_libraries/register_conventions.md) — AmigaOS register ABI
- [pragma_format.md](../../../13_toolchain/pragmas.md) — SAS/C pragma encoding details
- [hunk_debug_info.md](../../../03_loader_and_exec_format/hunk_debug_info.md) — SAS stabs format
- *SAS/C 6.x Programmer's Guide* — Code generation appendix
- *SAS/C 6.x Linker Manual* — blink flags and hunk layout
- See also: [gcc.md](gcc.md), [vbcc.md](vbcc.md), [stormc.md](stormc.md) — compare with other compilers