[← Home](../../README.md) · [Reverse Engineering](../README.md)
# Compiler-Specific Code Generation Patterns
## Overview
You've loaded a HUNK binary into IDA Pro. Before you can even begin tracing logic, you need to answer a basic question: **which compiler produced this code?** The answer determines everything else — whether strings are PC-relative or absolute, whether `main()` starts with `LINK A5` or `SUBQ.L #N,SP`, whether `DIVS.L` is a compiler intrinsic or a library call.
Amiga compilers — SAS/C, GCC, VBCC, StormC, Aztec C — each leave a **fingerprint** in the generated assembly. These fingerprints are consistent enough that a single function prologue can identify the compiler with >90% accuracy. This article catalogs the distinguishing patterns for each major Amiga compiler and provides a systematic methodology for compiler identification from disassembly alone.
```mermaid
graph LR
subgraph "Source"
C["C source
myapp.c"]
end
subgraph "Compiler Backend"
SAS["SAS/C
LINK A5 + MOVEM
absolute strings"]
G["GCC
SUBQ.L + PC-relative
libgcc helpers"]
VB["VBCC
MOVEM only
tight loops"]
end
subgraph "Disassembly Fingerprint"
IDA["IDA Pro / Ghidra"]
end
C --> SAS
C --> G
C --> VB
SAS -->|"LINK A5 #-N
MOVE.L #_str,D1"| IDA
G -->|"SUBQ.L #N,SP
LEA _str(PC),A0"| IDA
VB -->|"MOVEM.L D2-D4,-(SP)
minimal overhead"| IDA
```
---
## SAS/C 6.x Patterns
### Function Prologue / Epilogue
```asm
; Non-leaf function with local vars:
LINK A5, #-N ; allocate N bytes of locals on stack
MOVEM.L D2-D7/A2-A3, -(SP) ; save preserved registers
...
MOVEM.L (SP)+, D2-D7/A2-A3
UNLK A5
RTS
; Leaf function (no locals, no preserved regs):
; — no LINK, pure computation, ends in RTS
```
### D0 Save Pattern
SAS/C saves D0 at the start of functions that need it later:
```asm
MOVE.L D0, -(SP) ; save return value from previous call
JSR another_func
MOVE.L (SP)+, D0 ; restore
```
### Register Argument Passing
SAS/C passes OS call args via `#pragma amicall` register placement. Inside application functions, SAS/C uses a **stack-based C ABI** (unlike OS calls):
```asm
; C function call in SAS/C: push args right-to-left
MOVE.L arg3, -(SP)
MOVE.L arg2, -(SP)
MOVE.L arg1, -(SP)
JSR _myfunction
ADDQ.L #12, SP ; clean args (caller cleanup)
```
### String Constants
SAS/C places string literals in the **data hunk**, referenced via absolute addresses requiring `HUNK_RELOC32`:
```asm
MOVE.L #_str_hello, D1 ; absolute address → RELOC32 entry
MOVEA.L _DOSBase, A6
JSR (-48,A6) ; Write(stdout, "hello", ...)
```
---
## GCC (m68k-amigaos / bebbo) Patterns
### PC-Relative String Access
GCC uses PC-relative addressing by default, eliminating most HUNK_RELOC32 entries:
```asm
LEA _str_hello(PC), A0 ; PC-relative — no reloc needed
```
### No Frame Pointer (Default)
```asm
; GCC -O2 leaf function:
MOVEM.L D2/A2, -(SP) ; only save what's used
...
MOVEM.L (SP)+, D2/A2
RTS
; No LINK/UNLK — pure register allocation
```
### GCC Function Prologues
```asm
; Non-leaf with GCC -fno-omit-frame-pointer:
LINK A6, #-N ; note: GCC uses A6 as frame pointer here
; (conflicts with OS library base usage — rare)
; More common with -O2:
SUBQ.L #N, SP ; allocate locals without frame pointer
```
### Integer Division / Modulo
GCC emits calls to `__divsi3`, `__modsi3` from `libgcc`:
```asm
JSR ___divsi3 ; 32-bit signed divide (libgcc helper)
; operands in D0:D1, result in D0
```
SAS/C uses the 68k `DIVS.L` instruction directly (available on 020+) or `DIVS.W`.
---
## VBCC Patterns
VBCC generates very tight code with minimal function overhead:
```asm
; VBCC typical function (no frame pointer, minimal saves):
MOVEM.L D2-D4, -(SP)
...
MOVEM.L (SP)+, D2-D4
RTS
```
VBCC's OS call inline expansion looks identical to GCC's inline-asm stubs.
---
## StormC 3.x / 4.x Patterns
StormC was the first native Amiga C++ IDE. It used a custom frontend (based on EDG) but generated Amiga hunk output directly.
### Function Prologue
```asm
; StormC typical function:
LINK A6, #-N ; StormC uses A6 as frame pointer by default
MOVEM.L D2-D7/A2-A5, -(SP) ; aggressive register save
; ...
MOVEM.L (SP)+, D2-D7/A2-A5
UNLK A6
RTS
```
> [!WARNING]
> StormC's use of `A6` as a frame pointer conflicts with the OS convention of `A6` = library base. In StormC-compiled code, A6 near `LINK`/`UNLK` is a frame pointer, NOT a library base. This is the #1 misidentification cause when reversing StormC output.
### Distinguishing from SAS/C
| Pattern | SAS/C | StormC |
|---|---|---|
| Frame pointer register | A5 | A6 |
| Preserved registers | D2-D7/A2-A3 | D2-D7/A2-A5 |
| Startup module | `__main` | `_main` with C++ static constructor calls |
---
## Aztec C 5.x Patterns
Aztec C (Manx) was a popular budget compiler in the late 1980s. Its code generation is primitive compared to SAS/C or GCC.
### Distinctive Features
```asm
; Aztec C function — no LINK, uses stack offset from SP directly:
MOVE.L D2, -(SP) ; save only what's needed
...
MOVE.L (SP)+, D2
RTS
```
Aztec C is identifiable by **absence of LINK/UNLK** combined with **16-bit MOVE** instructions where other compilers use 32-bit (e.g., `MOVE.W D0, 4(SP)` instead of `MOVE.L`). It also generates `JSR ___ltoa` and `JSR ___printf` calls with AZTEC-prefixed helper names.
---
## Hand-Coded Assembly (Assembler)
Not all Amiga code came from a compiler. Demos, games, and high-performance libraries were often hand-written in assembler.
### Telltale Signs
| Sign | What It Means |
|---|---|
| `MOVEM.L D0-D7/A0-A6, -(SP)` | No compiler saves ALL registers — this is hand-coded |
| `BTST #6, ($BFE001)` | Direct CIA register read — compilers go through `cia.resource` or `graphics.library` |
| `MOVE.W #$4000, ($DFF09A)` | Direct custom chip register write — compilers use OS functions |
| `LEA _copperlist(PC), A0` + `MOVE.L A0, ($DFF080)` | Hardware banging with PC-relative addressing |
| `MOVE SR, D0` / `ANDI #$F8FF, SR` | Supervisor mode toggling — no compiler generates this |
| Missing startup stub | No `MOVE.L 4.W, A6` — the code runs bare-metal |
If you see direct register pokes to `$DFFxxx` or `$BFExxx` without any OS library calls, you're looking at hand-coded assembly, and standard compiler identification doesn't apply.
---
---
## Distinguishing Compiler Artefacts from Logic
| Pattern | Compiler | Meaning |
|---|---|---|
| `LINK A5, #-N` | SAS/C | Function with locals |
| `LINK A6, #-N` | GCC (rare) | Frame pointer mode |
| `JSR ___divsi3` | GCC | Software 32-bit division |
| `DIVS.L D1, D0` | SAS/C (020+) | Hardware divide |
| `MULS.L D1, D0` | SAS/C (020+) | Hardware multiply |
| `LEA str(PC), A0` | GCC | PC-relative string ref |
| `MOVE.L #_str, D1` | SAS/C | Absolute string ref (reloc'd) |
| `JSR _main` | Startup | C main() entry point |
| `MOVE.L 4.W, A6` | Startup | SysBase load |
| `JSR -552(A6)` | Any | exec.library OpenLibrary |
---
## Locating `main()` via Startup Skip
After identifying the startup stub (`MOVE.L 4.W, A6` → `JSR _OpenLibraries`):
1. Find the first `JSR` or `BSR` after library opens
2. That target is `__main` or directly `_main`
3. If `__main`: follow its internal `JSR _main` call
4. Label the target `main` in IDA
---
## Decision Guide — Compiler Identification Flowchart
```mermaid
graph TD
Q["Unknown function
prologue seen"]
Q -->|"LINK A5, #-N"| SAS["SAS/C 6.x
confirmed"]
Q -->|"LINK A6, #-N"| BRANCH["Check register save"]
BRANCH -->|"Saves A2-A5"| STORM["StormC
confirmed"]
BRANCH -->|"Saves A2-A3"| GCC_RARE["GCC with
-fno-omit-frame-pointer"]
Q -->|"SUBQ.L #N,SP
or MOVEM only"| BRANCH2["Check string refs"]
BRANCH2 -->|"LEA str(PC), A0"| GCC["GCC (bebbo)
confirmed"]
BRANCH2 -->|"MOVE.L #_str, D1"| SAS2["SAS/C (leaf
function)"]
Q -->|"No LINK/UNLK
16-bit MOVEs"| AZTEC["Aztec C 5.x
confirmed"]
Q -->|"Direct $DFFxxx
register writes"| ASM["Hand-coded
assembly"]
```
| Clue | Compiler | Confidence |
|---|---|---|
| `LINK A5, #-N` + `MOVEM.L D2-D7/A2-A3` | SAS/C 6.x | >95% |
| `LINK A6, #-N` + `MOVEM.L D2-D7/A2-A5` | StormC | >90% |
| `SUBQ.L #N, SP` + `LEA (PC), An` | GCC (bebbo) | >95% |
| `JSR ___divsi3` / `JSR ___modsi3` | GCC (bebbo) | 100% |
| `DIVS.L` with no JSR | SAS/C 6.x | >90% |
| `MOVE.W D0, 4(SP)` (16-bit stack ops) | Aztec C | >80% |
| Direct `$DFFxxx` write, no OS calls | Assembler | 100% |
| `MOVEM.L D0-D7/A0-A6` | Assembler | 100% |
---
## Named Antipatterns
### 1. "The Frame Pointer Confusion"
**What it looks like** — assuming A6 always holds a library base in StormC-compiled code:
```asm
LINK A6, #-24 ; A6 is now a FRAME POINTER, not a library base
MOVEM.L D2-D5/A2-A5, -(SP)
MOVEA.L (_DOSBase).L, A6 ; NOW A6 is a library base — but LINK changed it
JSR (-30,A6) ; this works only because A6 was reloaded
```
**Why it fails:** StormC uses A6 as the C frame pointer. Between `LINK A6` and the library base reload, A6 points to the stack frame, not a library. Any `JSR (-N,A6)` in that window hits the stack as a fake "JMP table" and crashes.
**Correct:** In StormC output, always verify that A6 was reloaded from a known library global before treating `JSR (-N,A6)` as a library call.
### 2. "The String Reloc Mirage"
**What it looks like** — seeing `MOVE.L #$XXXXXXXX, D1` and assuming it's an immediate value when it's actually a relocation:
```asm
MOVE.L #$00001234, D1 ; in the raw binary, this is $00001234
; after HUNK_RELOC32, it becomes actual string addr
JSR (-48,A6) ; Write(stdout, ???)
```
**Why it fails:** Without parsing `HUNK_RELOC32` entries, `#$00001234` looks like a constant. But it's a placeholder that exec replaces with the actual address at load time. You can't know what string it points to from static analysis alone — you need to read the relocation target.
**Correct:** Always cross-reference `HUNK_RELOC32` entries (see [hunk_reconstruction.md](hunk_reconstruction.md)) before interpreting `MOVE.L #immediate` as a value in SAS/C output.
---
## Use-Case Cookbook
### Pattern 1: Identify the Compiler from a Single Function
```ascii
┌─────────────────────────────────┐
│ 1. Look at function prologue │
│ ├─ LINK A5? → SAS/C │
│ ├─ LINK A6? → StormC │
│ ├─ SUBQ.L #N,SP? → GCC │
│ └─ None? → Continue │
│ 2. Look at string references │
│ ├─ LEA str(PC),An? → GCC │
│ └─ MOVE.L #str,An? → SAS/C │
│ 3. Look at division │
│ ├─ JSR ___divsi3? → GCC │
│ └─ DIVS.L? → SAS/C │
│ 4. Look at startup stub │
│ ├─ JSR ___main? → GCC │
│ └─ JSR _main? → SAS/C │
└─────────────────────────────────┘
```
### Pattern 2: Find All Functions in a Compiler-Specific Binary
SAS/C functions start with `LINK A5, #-N` followed by `MOVEM.L`. Search IDA for:
```
Search → Text → "LINK A5"
```
Every hit is a function entry point. Press `P` on each to create an IDA function.
GCC functions start with `SUBQ.L #N,SP` or `MOVEM.L`. Search for:
```
Search → Text → "MOVEM.L"
```
Filter to those NOT preceded by `LINK` — those are GCC leaf or non-leaf functions.
### Pattern 3: Distinguish OS Glue Code from Application Logic
OS glue (the startup stub + compiler helper functions) precedes `main()` and follows a fixed pattern:
```asm
; Universal OS glue pattern:
MOVE.L 4.W, A6 ; SysBase
; ... library opens ...
JSR _main ; application logic starts HERE
; ... library closes ...
MOVEQ #0, D0 ; return 0
RTS ; back to DOS
```
Everything before the `JSR _main` is compiler/OS glue — skip it when tracing application logic.
---
## Cross-Platform Comparison
| Amiga Concept | Win32 Equivalent | Linux ELF Equivalent | Notes |
|---|---|---|---|
| SAS/C `LINK A5` prologue | MSVC `push ebp; mov ebp, esp` | GCC `push rbp; mov rbp, rsp` | Same frame-pointer setup, different register |
| GCC PC-relative strings | Position-independent code (`/DYNAMICBASE`) | `-fPIC` + GOT-relative access | Same goal: eliminate relocations for security/performance |
| SAS/C absolute strings | Non-PIE executables | Non-PIE, absolute addresses | Relocation-heavy; simpler but slower to load |
| Compiler fingerprinting | `.rdata` section compiler strings | `.comment` ELF section | Amiga has NO embedded compiler ID — must deduce from code patterns |
| `JSR ___divsi3` (libgcc) | `__alldiv` (MSVC runtime) | `__divdi3` (libgcc) | All compilers call helper functions for complex operations |
| HUNK_RELOC32 in disassembly | PE `.reloc` section | ELF `.rela.dyn` | Same concept; Amiga relocs are embedded in the hunk stream |
---
## FAQ
### Can a single binary use multiple compilers?
Yes — and it's common. An application compiled with SAS/C may link a third-party library compiled with GCC. The startup stub and `main()` follow one compiler's pattern, but library functions (especially if statically linked) may show another compiler's fingerprints. Always identify the compiler for each code segment independently.
### What about the AmigaOS ROM itself?
The Kickstart ROM was compiled with Green Hills C (later versions) or SAS/C (earlier versions). ROM code is identifiable by its use of **absolute addresses** rather than base-relative PSI (Program Segment Independence) linking. The startup stub is absent — ROM code begins at a RomTag structure.
### How do I tell SAS/C 5.x from SAS/C 6.x?
SAS/C 6.x generates `MOVEM.L D2-D7/A2-A3` in prologues. SAS/C 5.x saves fewer registers (`MOVEM.L D2-D5`). Also, 6.x uses `LINK A5, #-.w` for small frames and `LINK A5, #-.l` for large ones; 5.x only uses the `.w` variant.
### Does this work for C++ code?
StormC is the primary Amiga C++ compiler. C++ code is identifiable by:
- `JSR ___nw__FUl` (operator new) calls
- Virtual function tables — arrays of function pointers in the data hunk
- `this` pointer in A0 (StormC convention) for method calls
- Static constructor calls in the startup sequence
## References
- SAS/C 6.x manual — code generation chapter
- GCC for m68k: https://github.com/bebbo/amiga-gcc
- VBCC manual: http://www.compilers.de/vbcc.html
- *Amiga ROM Kernel Reference Manual: Libraries* — register conventions
- [code_vs_data_disambiguation.md](code_vs_data_disambiguation.md) — distinguishing code bytes from data/variables