Compiler-Specific Code Generation Patterns
Overview
You've loaded a HUNK binary into IDA Pro. Before you can even begin tracing logic, you need to answer a basic question: which compiler produced this code? The answer determines everything else — whether strings are PC-relative or absolute, whether main() starts with LINK A5 or SUBQ.L #N,SP, whether DIVS.L is a compiler intrinsic or a library call.
Amiga compilers — SAS/C, GCC, VBCC, StormC, Aztec C — each leave a fingerprint in the generated assembly. These fingerprints are consistent enough that a single function prologue can identify the compiler with >90% accuracy. This article catalogs the distinguishing patterns for each major Amiga compiler and provides a systematic methodology for compiler identification from disassembly alone.
graph LR
subgraph "Source"
C["C source<br/>myapp.c"]
end
subgraph "Compiler Backend"
SAS["SAS/C<br/>LINK A5 + MOVEM<br/>absolute strings"]
G["GCC<br/>SUBQ.L + PC-relative<br/>libgcc helpers"]
VB["VBCC<br/>MOVEM only<br/>tight loops"]
end
subgraph "Disassembly Fingerprint"
IDA["IDA Pro / Ghidra"]
end
C --> SAS
C --> G
C --> VB
SAS -->|"LINK A5 #-N<br/>MOVE.L #_str,D1"| IDA
G -->|"SUBQ.L #N,SP<br/>LEA _str(PC),A0"| IDA
VB -->|"MOVEM.L D2-D4,-(SP)<br/>minimal overhead"| IDA
SAS/C 6.x Patterns
Function Prologue / Epilogue
; Non-leaf function with local vars:
LINK A5, #-N ; allocate N bytes of locals on stack
MOVEM.L D2-D7/A2-A3, -(SP) ; save preserved registers
...
MOVEM.L (SP)+, D2-D7/A2-A3
UNLK A5
RTS
; Leaf function (no locals, no preserved regs):
; — no LINK, pure computation, ends in RTS
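The LINK prologue above can also be recognised directly from raw bytes, which helps when the disassembler has not yet defined a function. The following is a minimal sketch, assuming you already have the code hunk contents as a Python `bytes` object; the function name `decode_link` is illustrative and not part of any tool. It relies only on the standard 68k encoding of `LINK An,#d16` (opcode word `0x4E50 | n` followed by a signed 16-bit displacement).

```python
import struct

def decode_link(code: bytes, offset: int = 0):
    """Return (address_register, frame_bytes) if LINK An,#d16 sits at offset, else None."""
    if offset + 4 > len(code):
        return None
    # Big-endian opcode word followed by a signed 16-bit displacement
    opcode, disp = struct.unpack_from(">Hh", code, offset)
    if opcode & 0xFFF8 != 0x4E50:      # low 3 bits select the address register
        return None
    return opcode & 7, -disp           # displacement is negative for a local frame

# LINK A5,#-16 assembles to 4E55 FFF0
print(decode_link(bytes.fromhex("4e55fff0")))  # (5, 16)
```

A result of register 5 points towards the SAS/C pattern described here, while register 6 matches the A6 frame-pointer patterns covered later in this article.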
D0 Save Pattern
SAS/C pushes D0 before a call when the value it holds (typically the result of an earlier call) is still needed afterwards:
MOVE.L D0, -(SP) ; save return value from previous call
JSR another_func
MOVE.L (SP)+, D0 ; restore
Register Argument Passing
SAS/C passes OS call arguments in registers, with the placement declared by #pragma libcall. Calls between application functions use a stack-based C ABI instead:
; C function call in SAS/C: push args right-to-left
MOVE.L arg3, -(SP)
MOVE.L arg2, -(SP)
MOVE.L arg1, -(SP)
JSR _myfunction
LEA 12(SP), SP ; clean 12 bytes of args (caller cleanup); ADDQ only encodes #1 to #8
String Constants
SAS/C places string literals in the data hunk, referenced via absolute addresses requiring HUNK_RELOC32:
MOVE.L #_str_hello, D1 ; absolute address → RELOC32 entry
MOVEA.L _DOSBase, A6
JSR (-948,A6) ; PutStr("hello"): string pointer in D1
GCC (m68k-amigaos / bebbo) Patterns
PC-Relative String Access
GCC uses PC-relative addressing by default, eliminating most HUNK_RELOC32 entries:
LEA _str_hello(PC), A0 ; PC-relative — no reloc needed
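To resolve where a PC-relative LEA points without a disassembler, the displacement can be decoded by hand. This is a hedged sketch: `decode_lea_pc` and its `base` parameter (the address at which the hunk is assumed to be loaded) are illustrative names. It uses the standard 68k encoding, in which `LEA d16(PC),An` is `0x41FA | (n << 9)` and the displacement is relative to the address of the extension word.

```python
import struct

def decode_lea_pc(code: bytes, offset: int, base: int = 0):
    """Decode LEA d16(PC),An at offset; return (n, target_address) or None."""
    if offset + 4 > len(code):
        return None
    opcode, disp = struct.unpack_from(">Hh", code, offset)
    # LEA <ea>,An = 0100 nnn 111 mmm rrr; d16(PC) is mode 111, register 010
    if opcode & 0xF1FF != 0x41FA:
        return None
    reg = (opcode >> 9) & 7
    # PC-relative displacements are measured from the extension word
    return reg, base + offset + 2 + disp

# LEA 14(PC),A0 at offset 0 targets offset 2 + 14 = 16
print(decode_lea_pc(bytes.fromhex("41fa000e"), 0))  # (0, 16)
```

Because the target is computed from the instruction's own position, no HUNK_RELOC32 entry is involved, which is the property this section describes.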
No Frame Pointer (Default)
; GCC -O2 leaf function:
MOVEM.L D2/A2, -(SP) ; only save what's used
...
MOVEM.L (SP)+, D2/A2
RTS
; No LINK/UNLK — pure register allocation
GCC Function Prologues
; Non-leaf with GCC -fno-omit-frame-pointer:
LINK A6, #-N ; note: GCC uses A6 as frame pointer here
; (conflicts with OS library base usage — rare)
; More common with -O2:
SUBQ.L #N, SP ; allocate locals without frame pointer (N <= 8; larger frames need a longer form such as LEA -N(SP), SP)
Integer Division / Modulo
GCC emits calls to __divsi3, __modsi3 from libgcc:
JSR ___divsi3 ; 32-bit signed divide (libgcc helper)
; operands in D0:D1, result in D0
SAS/C, by contrast, emits the divide inline: DIVS.L when targeting the 68020 or later, or DIVS.W where a 16-bit divide is enough.
VBCC Patterns
VBCC generates very tight code with minimal function overhead:
; VBCC typical function (no frame pointer, minimal saves):
MOVEM.L D2-D4, -(SP)
...
MOVEM.L (SP)+, D2-D4
RTS
VBCC's OS call inline expansion looks identical to GCC's inline-asm stubs.
StormC 3.x / 4.x Patterns
StormC was the first native Amiga C++ IDE. It used a custom frontend (based on EDG) but generated Amiga hunk output directly.
Function Prologue
; StormC typical function:
LINK A6, #-N ; StormC uses A6 as frame pointer by default
MOVEM.L D2-D7/A2-A5, -(SP) ; aggressive register save
; ...
MOVEM.L (SP)+, D2-D7/A2-A5
UNLK A6
RTS
Warning
StormC's use of A6 as a frame pointer conflicts with the OS convention of A6 = library base. In StormC-compiled code, A6 near LINK/UNLK is a frame pointer, NOT a library base. This is the #1 misidentification cause when reversing StormC output.
Distinguishing from SAS/C
| Pattern | SAS/C | StormC |
|---|---|---|
| Frame pointer register | A5 | A6 |
| Preserved registers | D2-D7/A2-A3 | D2-D7/A2-A5 |
| Startup module | __main | _main with C++ static constructor calls |
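The preserved-register row of the table can be checked mechanically by decoding the MOVEM register mask. The sketch below assumes raw code bytes and uses illustrative names (`movem_push_registers`, `guess_from_saved`); the labels it returns simply restate the table and are not a definitive identification. The only hard fact it depends on is the 68k encoding: `MOVEM.L <list>,-(SP)` is `0x48E7` followed by a mask that is bit-reversed in predecrement mode (bit 15 is D0, bit 0 is A7).

```python
import struct

REGS = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]

def movem_push_registers(code: bytes, offset: int = 0):
    """Decode MOVEM.L <list>,-(SP) at offset into a list of register names."""
    if offset + 4 > len(code):
        return None
    opcode, mask = struct.unpack_from(">HH", code, offset)
    if opcode != 0x48E7:               # MOVEM.L regs,-(A7)
        return None
    # Predecrement mode reverses the mask: bit 15 is D0, bit 0 is A7
    return [REGS[i] for i in range(16) if mask & (0x8000 >> i)]

def guess_from_saved(regs):
    """Map a saved-register list onto the table rows (heuristic only)."""
    saved = set(regs)
    if {"A4", "A5"} <= saved:
        return "StormC-like (A2-A5 saved)"
    if {"A2", "A3"} <= saved:
        return "SAS/C-like (A2-A3 saved)"
    return "inconclusive"

regs = movem_push_registers(bytes.fromhex("48e73f30"))
print(regs, guess_from_saved(regs))
```

For the epilogue form `MOVEM.L (SP)+,<list>` (opcode `0x4CDF`) the mask is not reversed, so a separate decoder would be needed.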
Aztec C 5.x Patterns
Aztec C (Manx) was a popular budget compiler in the late 1980s. Its code generation is primitive compared to SAS/C or GCC.
Distinctive Features
; Aztec C function — no LINK, uses stack offset from SP directly:
MOVE.L D2, -(SP) ; save only what's needed
...
MOVE.L (SP)+, D2
RTS
Aztec C is identifiable by the absence of LINK/UNLK combined with 16-bit MOVE instructions where other compilers use 32-bit ones (e.g., MOVE.W D0, 4(SP) instead of MOVE.L), which is consistent with its default 16-bit int. It also generates JSR ___ltoa and JSR ___printf calls with AZTEC-prefixed helper names.
Hand-Coded Assembly (Assembler)
Not all Amiga code came from a compiler. Demos, games, and high-performance libraries were often hand-written in assembler.
Telltale Signs
| Sign | What It Means |
|---|---|
| MOVEM.L D0-D7/A0-A6, -(SP) | No compiler saves ALL registers; this is hand-coded |
| BTST #6, ($BFE001) | Direct CIA register read; compilers go through cia.resource or graphics.library |
| MOVE.W #$4000, ($DFF09A) | Direct custom chip register write; compilers use OS functions |
| LEA _copperlist(PC), A0 + MOVE.L A0, ($DFF080) | Hardware banging with PC-relative addressing |
| MOVE SR, D0 / ANDI #$F8FF, SR | Privileged status-register access (here clearing the interrupt mask); no compiler generates this |
| Missing startup stub | No MOVE.L 4.W, A6; the code runs bare-metal |
If you see direct register pokes to $DFFxxx or $BFExxx without any OS library calls, you're looking at hand-coded assembly, and standard compiler identification doesn't apply.
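A quick way to triage a hunk for this kind of hardware banging is to scan it for absolute addresses in the custom-chip and CIA ranges. The sketch below is a heuristic under stated assumptions: it treats every word-aligned 32-bit value as a potential address, so data that happens to look like `$DFFxxx` produces false positives, and register-indirect access (for example via `LEA $DFF000,A5`) is only caught at the LEA itself. The name `hardware_refs` is illustrative.

```python
import struct

# 4 KiB pages holding the custom chips ($DFFxxx) and the two CIAs ($BFExxx, $BFDxxx)
HW_PAGES = {0x00DFF000, 0x00BFE000, 0x00BFD000}

def hardware_refs(code: bytes):
    """Return (offset, address) for each word-aligned longword that looks like a hardware address."""
    hits = []
    for off in range(0, len(code) - 3, 2):
        (value,) = struct.unpack_from(">I", code, off)
        if (value & 0xFFFFF000) in HW_PAGES:
            hits.append((off, value))
    return hits

# MOVE.W #$4000,($DFF09A).L assembles to 33FC 4000 00DF F09A
for off, addr in hardware_refs(bytes.fromhex("33fc400000dff09a")):
    print(f"offset {off}: ${addr:06X}")
```

Many hits combined with no library-call pattern (`JSR -N(A6)`) would support the hand-coded conclusion; a handful of hits inside otherwise compiler-shaped code more likely indicates inline assembly or a driver.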
Distinguishing Compiler Artefacts from Logic
| Pattern | Compiler | Meaning |
|---|---|---|
| LINK A5, #-N | SAS/C | Function with locals |
| LINK A6, #-N | GCC (rare) | Frame pointer mode |
| JSR ___divsi3 | GCC | Software 32-bit division |
| DIVS.L D1, D0 | SAS/C (020+) | Hardware divide |
| MULS.L D1, D0 | SAS/C (020+) | Hardware multiply |
| LEA str(PC), A0 | GCC | PC-relative string ref |
| MOVE.L #_str, D1 | SAS/C | Absolute string ref (reloc'd) |
| JSR _main | Startup | C main() entry point |
| MOVE.L 4.W, A6 | Startup | SysBase load |
| JSR -552(A6) | Any | exec.library OpenLibrary |
Locating main() via Startup Skip
After identifying the startup stub (MOVE.L 4.W, A6 → JSR _OpenLibraries):
- Find the first JSR or BSR after library opens
- That target is __main or directly _main
- If __main: follow its internal JSR _main call
- Label the target main in IDA
Decision Guide — Compiler Identification Flowchart
graph TD
Q["Unknown function<br/>prologue seen"]
Q -->|"LINK A5, #-N"| SAS["SAS/C 6.x<br/>confirmed"]
Q -->|"LINK A6, #-N"| BRANCH["Check register save"]
BRANCH -->|"Saves A2-A5"| STORM["StormC<br/>confirmed"]
BRANCH -->|"Saves A2-A3"| GCC_RARE["GCC with<br/>-fno-omit-frame-pointer"]
Q -->|"SUBQ.L #N,SP<br/>or MOVEM only"| BRANCH2["Check string refs"]
BRANCH2 -->|"LEA str(PC), A0"| GCC["GCC (bebbo)<br/>confirmed"]
BRANCH2 -->|"MOVE.L #_str, D1"| SAS2["SAS/C (leaf<br/>function)"]
Q -->|"No LINK/UNLK<br/>16-bit MOVEs"| AZTEC["Aztec C 5.x<br/>confirmed"]
Q -->|"Direct $DFFxxx<br/>register writes"| ASM["Hand-coded<br/>assembly"]
| Clue | Compiler | Confidence |
|---|---|---|
| LINK A5, #-N + MOVEM.L D2-D7/A2-A3 | SAS/C 6.x | >95% |
| LINK A6, #-N + MOVEM.L D2-D7/A2-A5 | StormC | >90% |
| SUBQ.L #N, SP + LEA (PC), An | GCC (bebbo) | >95% |
| JSR ___divsi3 / JSR ___modsi3 | GCC (bebbo) | 100% |
| DIVS.L with no JSR | SAS/C 6.x | >90% |
| MOVE.W D0, 4(SP) (16-bit stack ops) | Aztec C | >80% |
| Direct $DFFxxx write, no OS calls | Assembler | 100% |
| MOVEM.L D0-D7/A0-A6 | Assembler | 100% |
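The clue table lends itself to a small rule-based tally. The sketch below is one possible way to encode it, not the article's tooling: the clue names (`link_a5`, `lea_pc`, and so on) are hypothetical labels you would set after inspecting a function by hand or with the byte-level checks described earlier, and each matching rule simply contributes one vote rather than the percentage shown in the table.

```python
from collections import Counter

# (clues that must all be present, compiler the rule votes for)
RULES = [
    ({"link_a5", "movem_d2_d7_a2_a3"}, "SAS/C 6.x"),
    ({"link_a6", "movem_d2_d7_a2_a5"}, "StormC"),
    ({"subq_sp", "lea_pc"}, "GCC (bebbo)"),
    ({"jsr_divsi3"}, "GCC (bebbo)"),
    ({"divs_l_inline"}, "SAS/C 6.x"),
    ({"word_stack_ops"}, "Aztec C"),
    ({"hw_write_no_os"}, "Assembler"),
    ({"movem_all_regs"}, "Assembler"),
]

def identify(clues):
    """Return [(compiler, votes), ...] sorted by vote count, highest first."""
    votes = Counter()
    for required, compiler in RULES:
        if required <= set(clues):     # every clue of the rule was observed
            votes[compiler] += 1
    return votes.most_common()

print(identify({"link_a5", "movem_d2_d7_a2_a3", "divs_l_inline"}))
# [('SAS/C 6.x', 2)]
```

When the result lists more than one compiler, that is a hint to look at the code segment by segment, since statically linked libraries may come from a different toolchain than the application.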
Named Antipatterns
1. "The Frame Pointer Confusion"
What it looks like — assuming A6 always holds a library base in StormC-compiled code:
LINK A6, #-24 ; A6 is now a FRAME POINTER, not a library base
MOVEM.L D2-D5/A2-A5, -(SP)
MOVEA.L (_DOSBase).L, A6 ; A6 is reloaded here: only from this point on is it a library base
JSR (-30,A6) ; this works only because A6 was reloaded
Why it fails: StormC uses A6 as the C frame pointer. Between LINK A6 and the library base reload, A6 points to the stack frame, not a library. Any JSR (-N,A6) in that window hits the stack as a fake "JMP table" and crashes.
Correct: In StormC output, always verify that A6 was reloaded from a known library global before treating JSR (-N,A6) as a library call.
2. "The String Reloc Mirage"
What it looks like — seeing MOVE.L #$XXXXXXXX, D1 and assuming it's an immediate value when it's actually a relocation:
MOVE.L #$00001234, D1 ; in the raw binary, this is $00001234
; after HUNK_RELOC32, it becomes actual string addr
JSR (-948,A6) ; PutStr(???)
Why it fails: Without parsing HUNK_RELOC32 entries, #$00001234 looks like a constant. But it is a placeholder (an offset into the target hunk) that the loader, dos.library LoadSeg, patches into the real address at load time. The instruction bytes alone do not tell you which string it points to; you need the relocation entry and its target hunk.
Correct: Always cross-reference HUNK_RELOC32 entries (see hunk_reconstruction.md) before interpreting MOVE.L #immediate as a value in SAS/C output.
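The cross-referencing step can be sketched as a small parser for a single HUNK_RELOC32 block. This assumes the standard hunk layout (block ID `0x3EC`, then repeated groups of count, target hunk number, and that many offsets, terminated by a zero count) and that `data` holds the raw file bytes starting at the block; the function names are illustrative.

```python
import struct

HUNK_RELOC32 = 0x3EC

def parse_reloc32(data: bytes, offset: int = 0):
    """Parse one HUNK_RELOC32 block; return ({target_hunk: [offsets]}, next_offset)."""
    (hunk_id,) = struct.unpack_from(">I", data, offset)
    if hunk_id != HUNK_RELOC32:
        raise ValueError("not a HUNK_RELOC32 block")
    offset += 4
    relocs = {}
    while True:
        (count,) = struct.unpack_from(">I", data, offset)
        offset += 4
        if count == 0:                 # a zero count terminates the block
            return relocs, offset
        (target,) = struct.unpack_from(">I", data, offset)
        offsets = struct.unpack_from(f">{count}I", data, offset + 4)
        relocs.setdefault(target, []).extend(offsets)
        offset += 4 + 4 * count

def reloc_target(relocs, operand_offset):
    """Return the hunk number an operand is relocated against, or None if it is a plain constant."""
    for target, offsets in relocs.items():
        if operand_offset in offsets:
            return target
    return None
```

For a `MOVE.L #imm,D1` located at offset `X` in the code hunk, the 32-bit immediate starts at `X + 2`. If `reloc_target(relocs, X + 2)` returns a hunk number, the immediate is an offset into that hunk and not a literal value.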
Use-Case Cookbook
Pattern 1: Identify the Compiler from a Single Function
┌─────────────────────────────────┐
│ 1. Look at function prologue │
│ ├─ LINK A5? → SAS/C │
│ ├─ LINK A6? → StormC │
│ ├─ SUBQ.L #N,SP? → GCC │
│ └─ None? → Continue │
│ 2. Look at string references │
│ ├─ LEA str(PC),An? → GCC │
│ └─ MOVE.L #str,An? → SAS/C │
│ 3. Look at division │
│ ├─ JSR ___divsi3? → GCC │
│ └─ DIVS.L? → SAS/C │
│ 4. Look at startup stub │
│ ├─ JSR ___main? → GCC │
│ └─ JSR _main? → SAS/C │
└─────────────────────────────────┘
Pattern 2: Find All Functions in a Compiler-Specific Binary
SAS/C functions start with LINK A5, #-N followed by MOVEM.L. Search IDA for:
Search → Text → "LINK A5"
Every hit is a function entry point. Press P on each to create an IDA function.
GCC functions start with SUBQ.L #N,SP or MOVEM.L. Search for:
Search → Text → "MOVEM.L"
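Filter to hits that use the pre-decrement form (MOVEM.L <regs>, -(SP)) and are NOT preceded by LINK; the (SP)+ form is the matching epilogue restore, not an entry point. The remaining hits are GCC leaf or non-leaf functions.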
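Outside IDA, the same SAS/C search can be approximated on raw hunk bytes. The sketch below is a heuristic with illustrative names: it looks for the `LINK A5` opcode word (`0x4E55`) on word boundaries, rejects hits whose displacement is positive or odd, and records whether a `MOVEM.L <list>,-(SP)` (`0x48E7`) follows immediately. It has no notion of code versus data, so each candidate still needs to be confirmed in the disassembler.

```python
import struct

def sasc_entry_candidates(code: bytes):
    """Return (offset, frame_bytes, has_movem) for each plausible LINK A5,#-N."""
    out = []
    for off in range(0, len(code) - 3, 2):
        opcode, disp = struct.unpack_from(">Hh", code, off)
        # Require LINK A5 with a non-positive, even frame displacement
        if opcode != 0x4E55 or disp > 0 or disp & 1:
            continue
        has_movem = code[off + 4:off + 6] == b"\x48\xe7"
        out.append((off, -disp, has_movem))
    return out

# RTS; LINK A5,#-16; MOVEM.L D2-D7/A2-A3,-(SP)
sample = bytes.fromhex("4e75" "4e55fff0" "48e73f30")
print(sasc_entry_candidates(sample))  # [(2, 16, True)]
```

Candidates with `has_movem` set match the full SAS/C prologue shape and are the safest ones to turn into functions first.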
Pattern 3: Distinguish OS Glue Code from Application Logic
OS glue (the startup stub + compiler helper functions) precedes main() and follows a fixed pattern:
; Universal OS glue pattern:
MOVE.L 4.W, A6 ; SysBase
; ... library opens ...
JSR _main ; application logic starts HERE
; ... library closes ...
MOVEQ #0, D0 ; return 0
RTS ; back to DOS
Everything before the JSR _main is compiler/OS glue — skip it when tracing application logic.
Cross-Platform Comparison
| Amiga Concept | Win32 Equivalent | Linux ELF Equivalent | Notes |
|---|---|---|---|
| SAS/C LINK A5 prologue | MSVC push ebp; mov ebp, esp | GCC push rbp; mov rbp, rsp | Same frame-pointer setup, different register |
| GCC PC-relative strings | Position-independent code (/DYNAMICBASE) | -fPIC + GOT-relative access | Same goal: eliminate relocations for security/performance |
| SAS/C absolute strings | Non-PIE executables | Non-PIE, absolute addresses | Relocation-heavy; simpler but slower to load |
| Compiler fingerprinting | .rdata section compiler strings | .comment ELF section | Amiga has NO embedded compiler ID; must deduce from code patterns |
| JSR ___divsi3 (libgcc) | __alldiv (MSVC runtime) | __divdi3 (libgcc) | All compilers call helper functions for complex operations |
| HUNK_RELOC32 in disassembly | PE .reloc section | ELF .rela.dyn | Same concept; Amiga relocs are embedded in the hunk stream |
FAQ
Can a single binary use multiple compilers?
Yes — and it's common. An application compiled with SAS/C may link a third-party library compiled with GCC. The startup stub and main() follow one compiler's pattern, but library functions (especially if statically linked) may show another compiler's fingerprints. Always identify the compiler for each code segment independently.
What about the AmigaOS ROM itself?
The Kickstart ROM was compiled with Green Hills C (earlier versions) or Lattice/SAS C (later versions). ROM code is identifiable by its use of absolute addresses rather than base-relative PSI (Program Segment Independence) linking. The startup stub is absent: ROM modules begin at a RomTag structure.
How do I tell SAS/C 5.x from SAS/C 6.x?
SAS/C 6.x generates MOVEM.L D2-D7/A2-A3 in prologues. SAS/C 5.x saves fewer registers (MOVEM.L D2-D5). Also, 6.x uses LINK A5, #-N.w for small frames and LINK A5, #-N.l for large ones; 5.x only uses the .w variant.
Does this work for C++ code?
StormC is the primary Amiga C++ compiler. C++ code is identifiable by:
- JSR ___nw__FUl (operator new) calls
- Virtual function tables: arrays of function pointers in the data hunk
- this pointer in A0 (StormC convention) for method calls
- Static constructor calls in the startup sequence
References
- SAS/C 6.x manual — code generation chapter
- GCC for m68k: https://github.com/bebbo/amiga-gcc
- VBCC manual: http://www.compilers.de/vbcc.html
- Amiga ROM Kernel Reference Manual: Libraries — register conventions
- code_vs_data_disambiguation.md — distinguishing code bytes from data/variables