32 KiB
ANSI C Reverse Engineering — Reconstructing C from m68k Assembly
Overview
The vast majority of Amiga applications and libraries were written in C — SAS/C, GCC, VBCC, StormC, and Aztec C dominated the ecosystem from 1988 onward. Reversing C binaries means recognizing the language semantics underneath the compiler's code generation: struct field access patterns, switch-statement jump tables, malloc/free lifetime tracking, and control-flow reconstruction. Unlike hand-written assembly, C binaries leave a rich trail of standard-library calls, predictable stack-frame layouts, and relocatable data references that serve as anchors for reconstructing the original source-level intent.
C reverse engineering on Amiga has one huge advantage over other platforms: the OS itself is written in C (with assembly for hot paths). Nearly every data structure — struct Task, struct MsgPort, struct IORequest — is publicly documented in the NDK headers. When you see MOVE.L $1C(A0), D0 and A0 is a library base, offset $1C is the lib_OpenCnt field. This tight coupling between disassembly patterns and known OS structures makes Amiga C RE uniquely tractable.
graph TB
subgraph "C Source (.c)"
STRUCT["struct Gadget { ... }"]
SWITCH["switch(expr) { case: }"]
MALLOC["p = AllocMem(size);"]
LOOP["for(i=0; i<n; i++)"]
CALLBACK["Hook callback"]
end
subgraph "m68k Disassembly"
OFF["MOVE.L (A0), D0<br/>MOVE.L $1C(A0), D1<br/>→ struct field access at offsets"]
JT["MOVE.W offset(PC,Dn.W), D0<br/>JMP (PC, D0.W)<br/>→ switch jump table"]
ALLOC["JSR -$C6(A6)<br/>→ AllocMem() LVO"]
DBCC["MOVEQ #n-1, D0<br/>loop: ... DBRA D0, loop"]
HOOK["h_Entry: MOVE.L A2, -(SP)<br/>→ hook dispatch"]
end
STRUCT --> OFF
SWITCH --> JT
MALLOC --> ALLOC
LOOP --> DBCC
CALLBACK --> HOOK
Architecture: C-to-Assembly Mapping
The Standard C ABI on Amiga
| Concern | Convention | Notes |
|---|---|---|
| Return value | D0 (32-bit), D0/D1 (64-bit), or hidden pointer in A0 | Struct returns: caller allocates space, passes pointer in A0 |
| Scratch registers | D0, D1, A0, A1 | Caller-saved; callee may destroy |
| Preserved registers | D2–D7, A2–A6 | Callee must save/restore if used |
| Frame pointer | A5 (SAS/C) or A6 (GCC with -fomit-frame-pointer skips this) |
Used for local variable and argument access |
| Stack growth | Toward lower addresses | LINK A5, #-N allocates N bytes |
| Library base | A6 | Set to library base before JSR LVO(A6) |
| Argument passing | D0–D7, then stack (right-to-left push for SAS/C) | See register_conventions.md |
SAS/C Calling Convention Variants
SAS/C supports multiple calling conventions within a single binary. Recognizing them is essential for correct function boundary and parameter analysis:
| Convention | Keyword | Register Save | Parameter Passing | Prologue Pattern |
|---|---|---|---|---|
| Standard | __stdargs (default) |
D2–D7, A2–A5 | D0, D1, then stack (right-to-left) | LINK A5, #-N / MOVEM.L D2-D7/A2-A4, -(SP) |
| Register args | __reg / __regargs |
D2–D7, A2–A5 | First 2 integer args in D0, D1; rest on stack | Same as standard but D0/D1 hold parameters |
| Save all | __saveds |
D2–D7, A2–A6 (every non-scratch register) | All on stack | MOVEM.L D2-D7/A2-A6, -(SP) at entry — distinctive 13-register save |
| Interrupt | __interrupt |
D0–D7, A0–A6 (all registers) | All on stack | Full register save + RTE instead of RTS |
| No stack check | __no_stack_check |
Varies | Varies | Omits the stack overflow check call at function entry |
How to identify in disassembly:
; __saveds function (typically used for interrupt handlers and callbacks):
_saveds_func:
MOVEM.L D2-D7/A2-A6, -(SP) ; 13 registers saved = __saveds signature
; ... function body ...
MOVEM.L (SP)+, D2-D7/A2-A6 ; restore
UNLK A5
RTS
; __reg function (fastcall — first args in registers):
_reg_func:
LINK A5, #-local_size
MOVEM.L D2-D4, -(SP) ; may save fewer registers
; D0 = first argument, D1 = second argument
; ... function body ...
; __stdargs function (standard C calling convention):
_std_func:
LINK A5, #-local_size
MOVEM.L D2-D7/A2-A4, -(SP) ; standard SAS/C save set
; Arguments on stack: (A5+8) = arg1, (A5+12) = arg2, ...
Compiler-Specific Frame Layouts
| Compiler | Frame Pointer | Prologue | Epilogue | String Addressing |
|---|---|---|---|---|
| SAS/C 6.x | A5 | LINK A5, #-N / MOVEM.L D2-D7/A2-A4, -(SP) |
MOVEM.L (SP)+, D2-D7/A2-A4 / UNLK A5 / RTS |
Absolute (MOVE.L #string, D1) |
| GCC 2.95.x | A6 (optional) | LINK A6, #-N or SUBQ.L #N, SP |
UNLK A6 / RTS or ADDQ.L #N, SP / RTS |
PC-relative (LEA string(PC), A0) |
| VBCC | None (typical) | MOVEM.L D2-D4, -(SP) (only used regs) |
MOVEM.L (SP)+, D2-D4 / RTS |
PC-relative |
| StormC | A5 | LINK A5, #-N |
UNLK A5 / RTS |
Absolute (similar to SAS/C) |
| Aztec C | A5 | LINK A5, #-N / MOVEM.L D3-D7, -(SP) |
MOVEM.L (SP)+, D3-D7 / UNLK A5 / RTS |
Absolute |
Common C Constructs → Assembly
| C Construct | Typical m68k Pattern |
|---|---|
x = y + z |
MOVE.L y(FP), D0 / ADD.L z(FP), D0 / MOVE.L D0, x(FP) |
if (cond) |
TST.L cond / BEQ skip |
if (!ptr) |
MOVE.L ptr, D0 / BEQ null_case |
for (i=0; i<n; i++) |
MOVEQ #0, D7 / loop: ... / ADDQ.L #1, D7 / CMP.L n, D7 / BLT loop |
while (*p++) |
MOVE.L (A0)+, D0 / BNE loop (combined load+increment+test) |
switch (x) |
CMP chain for sparse cases; MOVE.W jt(PC, Dn.W), D0 / JMP (PC, D0.W) for dense |
struct->field |
MOVE.L $offset(A0), D0 — offset matches sizeof of preceding fields |
array[i] |
MOVE.L #array, A0 / MOVE.L i, D0 / ASL.L #2, D0 / MOVE.L 0(A0, D0.W), D1 |
malloc(size) → AllocMem |
MOVE.L size, D0 / MOVE.L #MEMF_CLEAR, D1 / JSR -$C6(A6) |
free(ptr) → FreeMem |
MOVE.L ptr, A1 / MOVE.L size, D0 / JSR -$D2(A6) |
do { ... } while (cond) |
loop: ... / TST cond / BNE loop (test at bottom) |
goto label |
BRA label (unconditional) |
setjmp / longjmp |
JSR _setjmp / JSR _longjmp — saves/restores all registers + SP |
fn_ptr(args) (function pointer call) |
MOVE.L fn_ptr, A0 / JSR (A0) |
printf(fmt, ...) |
Push args right-to-left, JSR _printf — no LVO, direct lib call |
sprintf / strcpy chain |
Repeated MOVE.B (A0)+, (A1)+ with null termination check |
memcpy (large) |
MOVE.L (A0)+, (A1)+ / SUBQ.L #1, D0 / BNE loop |
memset (zero/pattern fill) |
MOVE.L D0, (A0)+ / loop |
strcmp / strncmp |
CMPM.B (A0)+, (A1)+ / DBNE D0, loop |
bsearch / custom binary search |
Midpoint calculation via ADD.L/ASR.L, compare, branch |
qsort callback |
Passes comparison function pointer; calls JSR (A2) per comparison |
static local variable |
Stored in DATA hunk (not stack); accessed via absolute or PC-relative addressing |
const global (read-only data) |
May be placed in CODE hunk alongside instructions |
volatile access |
Generates separate load/store for each access; never optimizes across register reuse |
The BPTR: AmigaOS's Unique Pointer Type
BPTR (Byte Pointer) is a legacy from BCPL/Tripos that persists throughout AmigaOS. Understanding it is essential for DOS-related reverse engineering:
/* BPTR definition from NDK headers:
* A BPTR stores a word-aligned address shifted right by 2 bits.
* BADDR(bptr) converts BPTR → real address: bptr << 2
* MKBADDR(addr) converts real address → BPTR: addr >> 2
*/
#define BADDR(bptr) ((APTR)((ULONG)(bptr) << 2))
#define MKBADDR(addr) ((BPTR)((ULONG)(addr) >> 2))
In disassembly:
; DOS call returning a BPTR (e.g., Lock() returns a BPTR file lock):
JSR -$54(A6) ; Lock(name, mode) — returns BPTR in D0
; D0 now contains a BPTR, NOT a usable address!
MOVE.L D0, lock_bptr(FP) ; store BPTR
; Later, to use this BPTR with another DOS call:
MOVE.L lock_bptr(FP), D1 ; pass BPTR directly to Examine(), UnLock(), etc.
JSR -$66(A6) ; UnLock(bptr) — accepts BPTR directly
; To dereference a BPTR to access the underlying struct:
MOVE.L lock_bptr(FP), D0 ; D0 = BPTR
LSL.L #2, D0 ; D0 = real address (BPTR << 2)
MOVE.L D0, A0 ; A0 = real address of FileLock struct
; Now you can access A0->fl_Key, A0->fl_Volume, etc.
Key RE identification:
LSR.L #2, Dnbefore a memory access = BPTR → address conversion (MKBADDR)LSL.L #2, Dnbefore a DOS call = address → BPTR conversion- BPTRs are passed directly to DOS library calls without conversion (the library does the conversion internally)
- Common BPTR uses: file locks, directory locks, seglists (loaded executables), DOS process handles
Warning
Mistaking a BPTR for a real pointer and dereferencing it without the
<< 2conversion will access the wrong address — 4× lower than intended. This is one of the most common errors in Amiga C RE.
AmigaOS-Specific C Patterns
| OS Pattern | Disassembly Signature |
|---|---|
Hook callback (struct Hook) |
h_Entry: MOVE.L A2, -(SP) / ... / RTS — A2=object, A1=message, A0=hook |
Tag list processing (TagItem *) |
loop: MOVE.L (A0)+, D0 / BEQ end — iterate ti_Tag+ti_Data pairs until TAG_DONE (0) |
| BPTR dereference | LSL.L #2, D0 (BPTR→address) or LSR.L #2, D0 (address→BPTR) — see BPTR section above |
| LVO dispatch | JSR -$XXX(A6) — library vector table call; offset encodes function |
| Forbid/Permit pairs | JSR -$84(A6) (Forbid) / JSR -$8A(A6) (Permit) — critical section markers |
| Signal wait loops | MOVE.L sigmask, D0 / JSR -$13E(A6) (Wait) — blocking on signal bits |
| Message port patterns | JSR -$180(A6) (PutMsg) / JSR -$174(A6) (GetMsg) / JSR -$17A(A6) (WaitPort) |
| Exec list traversal | MOVE.L (A0), A0 — follow ln_Succ (offset $00); struct Node / struct List iteration |
| Device I/O | MOVE.L io, A1 / JSR -$1C8(A6) (DoIO) or JSR -$1CE(A6) (SendIO) + wait |
| Resource tracking | JSR -$1E6(A6) (OpenResource) followed by resource-specific dispatch |
Decision Guide: C Binary Analysis Workflow
graph TD
BIN["C binary loaded"]
ID_COMPILER["Identify compiler<br/>(fingerprints)"]
MAP_LIBS["Map library calls<br/>via LVO xrefs"]
FIND_MAIN["Find main() or<br/>entry point"]
RECONSTRUCT["Reconstruct types,<br/>structs, locals"]
ANNOTATE["Annotate IDA database"]
VERIFY["Dynamic verification<br/>(FS-UAE debugger)"]
BIN --> ID_COMPILER
ID_COMPILER --> MAP_LIBS
MAP_LIBS --> FIND_MAIN
FIND_MAIN --> RECONSTRUCT
RECONSTRUCT --> ANNOTATE
ANNOTATE --> VERIFY
When to Use C-Focused RE vs Alternatives
| Scenario | Approach |
|---|---|
Binary has LINK A5 / JSR LVO(A6) patterns |
Standard C RE (this article) |
| Binary has no library calls, direct hardware access | See asm68k_binaries.md |
Binary has __vtbl references, new/delete patterns |
See cpp_vtables_reversing.md |
| Binary is from AMOS, Blitz, or other non-C language | See other_languages.md |
| Binary is packed/crunched | Unpack first; then re-evaluate |
| Binary is a shared library (.library) | Standard C RE + library structure analysis (RomTag, JMP table, MakeLibrary) |
Methodology
Phase 1: Compiler Identification
Before anything else, determine the compiler. The register conventions, string addressing mode, and library call patterns differ substantially between SAS/C, GCC, and VBCC. See compiler_fingerprints.md and m68k_codegen_patterns.md for the complete catalog.
Phase 2: Library Call Anchoring
Every JSR LVO(A6) is an anchor point. Cross-reference the LVO offset against the exec LVO table or dos LVO table. Once you know the function:
- Input: D0–D7 and stack arguments tell you the parameter types
- Output: D0 return value tells you what was computed
- Context: The surrounding code tells you why the call was made
Phase 3: Struct Reconstruction
C struct access patterns are systematic: MOVE.L $08(A0), D0 then MOVE.L $0C(A0), D1 — repeated offsets that don't overlap suggest struct fields. See struct_recovery.md for the complete methodology.
Phase 4: Call Graph Reconstruction
- Identify function boundaries: LINK/UNLK pairs, SUBQ/ADDQ pairs, or standalone RTS-terminated blocks
- Build caller-callee matrix: Every JSR target becomes a node; every JSR source is an edge
- Resolve indirect calls:
JSR (A0)where A0 was loaded from a vtable or function pointer table - Group by library context: Which library is A6 set to before each JSR LVO block?
Phase 5: Type Inference
- Pointer vs integer: A value used as a base register for offset addressing is a pointer. A value only used in arithmetic is an integer.
- Signed vs unsigned:
BLT/BGEafter compare = signed;BCS/BCCafter operation = unsigned - Struct pointer typing: Consistent offset patterns (+$00, +$04, +$08...) with known library struct sizes reveal the type
- BPTR detection:
LSR.L #2before use as address = BPTR (BCPL byte pointer)
Phase 6: Dynamic Verification
Tool-Specific Workflows
IDA Pro
Ghidra
FS-UAE Debugger
Best Practices
- Identify the compiler before anything else — it determines your prologue/epilogue patterns, string addressing mode, and register conventions
- Map library calls first — every
JSR LVO(A6)is a documented function with known parameters; use this to type inputs and outputs - Reconstruct structs from offset patterns — consistent offset sequences reveal field layout
- Cross-reference NDK headers — AmigaOS structs are publicly documented; match your discovered offsets to known structures
- Use the relocation table —
HUNK_RELOC32entries tell you exactly which absolute addresses are inter-hunk references - Track A6 assignments — each library call block sets A6 to a specific library base; identify which library is in use
- Decompile library calls to C prototypes — rename
JSR -$C6(A6)toAllocMem()in IDA, notsub_1234 - Verify with dynamic analysis — breakpoint on suspicious code paths in FS-UAE to confirm your static analysis
- Document register conventions per function — build a register map to catch type errors early
- Leverage HUNK_SYMBOL debug info — if present, it gives you function names and sometimes local variable names
Antipatterns
1. The Global Confusion
Wrong: Treating every absolute address as a global variable.
Why: SAS/C uses absolute addressing for globals (relocated at load), GCC uses PC-relative, and some addresses are actually hardware registers. Confusing $DFF000 with a C global variable leads to nonsense decompilation.
2. The Void Pointer Over-Generalization
Wrong: Marking all unknown pointers as void *.
Why: Without type information, you lose the ability to see struct field access patterns. A pointer that's always offset by +$08, +$0C, +$1C is almost certainly a typed struct pointer.
3. The Missing Return
Wrong: Assuming every RTS marks the end of a meaningful function.
Why: Compilers sometimes tail-duplicate, merge epilogues, or generate multiple return points. A single C function may produce 3–5 RTS instructions in the assembly.
4. The Single-Library Assumption
Wrong: Assuming A6 always holds the same library base throughout the program.
Why: Real C programs switch A6 between exec, dos, intuition, graphics, and custom libraries. A JSR -$C6(A6) at one point in the code may call AllocMem (exec), while the same JSR -$C6(A6) after an A6 switch calls something entirely different. You must track A6 reloads.
5. The BPTR Blindness
Wrong: Treating LSR.L #2, D0 / MOVE.L (A0) based on D0 as a confusing bit-shift.
Why: BCPL legacy: AmigaOS uses byte pointers (BPTRs) for file handles, locks, and DOS structures. The LSR.L #2 converts a BPTR (shifted by 2 for historical reasons) to a real word-aligned address. Missing this means you misidentify DOS API call results.
6. The Tag List Blind Spot
Wrong: Seeing a loop that processes (A0)+ pairs and dismissing it as a custom iterator.
Why: Tag lists (TagItem arrays of ti_Tag/ti_Data pairs terminated by TAG_DONE=0) are used pervasively in AmigaOS. This is one of the most common patterns in Amiga C and a strong indicator you're looking at an OS API call setup.
7. The Signal Confusion
Wrong: Assuming a Wait() call with a magic constant is waiting on a single event.
Why: Signal bits are allocated dynamically via AllocSignal(). A MOVE.L #$00001000, D0 / JSR -$13E(A6) (Wait) doesn't tell you what it's waiting for unless you trace where that signal bit was allocated and who sends it.
8. The Inline Copy Assumption
Wrong: Identifying every MOVE.L (A0)+, (A1)+ loop as a custom memcpy.
Why: Compilers inline memcpy for small fixed sizes, but the same pattern also appears in struct copy operations, array initialization, and DMA buffer filling. The context (source/destination, loop count, surrounding code) tells you which.
Pitfalls
1. Register Variable Aliasing
SAS/C with __register or GCC with register keyword may keep variables in registers across function calls, breaking the standard "arguments go on stack" mental model.
2. Inlined memcpy / strcpy
Compilers often inline small copies as MOVE.L (A0)+, (A1)+ loops. These look like custom struct copy routines but are really compiler-generated memcpy.
3. Structure Padding
The m68k ABI aligns struct fields naturally: UWORD at even addresses, ULONG at multiples of 4. Compiler-inserted padding bytes create gaps in the offset sequence that can confuse field counting.
4. Compiler Optimizations That Break Pattern Recognition
5. Library Base Switching
6. Mixed C and Assembly in the Same Binary
7. SAS/C __saveds vs __stdargs vs __reg Calling Conventions
8. GCC __asm__ Inline Assembly Blocks
9. BSS vs DATA Confusion
10. CLI vs WB Startup Path
11. SAS/C #pragma Anomalies
Use-Case Cookbook
Pattern 1: Identifying main() Across Compilers
Pattern 2: Reconstructing a struct List Traversal
Pattern 3: Mapping AllocMem/FreeMem Pairs to Find Memory Leaks
Pattern 4: Recovering switch Statement Cases
Pattern 5: Reconstructing a Hook Callback Dispatch Chain
Pattern 6: Identifying Open/Close Resource Pairs
Pattern 7: Recovering the Startup Code Chain
Pattern 8: Tracing Tag List Construction
Pattern 9: Decompiling Device I/O Sequences
Pattern 10: Reconstructing a Message Port Protocol
Pattern 11: Identifying SAS/C __saveds Functions
Pattern 12: Differentiating printf Variants from Disassembly
Real-World Examples
Applications
Libraries
Games
Cross-Platform Comparison
| Platform | C RE Challenge | Amiga Difference |
|---|---|---|
| DOS (Watcom/Borland) | Segment juggling, near/far pointers | Amiga flat 32-bit address space simplifies pointer tracking |
| Mac OS (MPW C) | A5-world jump table, segmented loader | Amiga A6 per-library base is more modular |
| Unix (GCC) | Position-independent code (PIC), PLT/GOT | Amiga executables are non-PIC; relocations are explicit |
| Windows (MSVC) | __stdcall vs __cdecl, SEH frames |
Amiga has single ABI; no calling convention variants |
| Embedded ARM | Thumb interworking, constant pools | m68k has no Thumb equivalent; constants are inline |
| Linux (GCC) | vDSO, IFUNC resolvers, symbol versioning | Amiga OS calls are flat JMP table; no symbol versioning |
| Classic Mac OS (CodeWarrior) | Transition vectors, mixed 68k/PPC | Amiga 68k is simpler; no mixed-ISA binaries until PowerUP/WarpOS |
Historical Context — Why C Dominated Amiga Development
Modern Analogies
| Amiga C Concept | Modern Analogy | Where It Holds / Breaks |
|---|---|---|
| LVO dispatch table | Dynamic linker PLT/GOT | Holds: indirect function call table; breaks: LVO is static ABI, PLT is runtime-resolved |
| A6 library base | this pointer / vtable dispatch |
Holds: base register for method/function lookup; breaks: A6 is shared, this is per-object |
| TagItem arrays | Named parameters / option structs in C | Holds: extensible key-value config passing; breaks: TagItems are untyped until consumed |
| BPTR | Handle / opaque pointer | Holds: abstracted pointer type; breaks: BPTR carries encoding (>>2), modern handles are transparent |
OpenLibrary with version |
dlopen with version check |
Holds: runtime library loading; breaks: Amiga libraries are shared singletons |
Forbid/Permit |
spin_lock / mutex_lock |
Holds: critical section entry/exit; breaks: Forbid disables ALL multitasking, not just one resource |
| Resident modules (RomTag) | Shared library constructors / .init_array |
Holds: auto-initialized code at load time; breaks: RomTags are persistent kernel objects |
FAQ
Q1: How do I tell SAS/C from GCC output without looking at strings?
Q2: Why does this function have no LINK but accesses locals?
Q3: How do I recover the original struct field names?
Q4: How do I identify which library A6 currently points to?
Q5: Why are there two different calling conventions in the same binary?
Q6: How do I identify printf format strings in DATA?
Q7: How do I decompile a BPTR-based DOS call sequence?
Q8: What does JSR -$1CE(A6) mean without the LVO table?
Q9: How do I handle programs compiled with Lattice C (pre-SAS/C)?
Q10: How do I identify custom AllocMem wrappers?
Q11: How do I tell if a struct is from the OS or custom?
Q12: What are the SAS/C #pragma directives that affect codegen?
FPGA / Emulation Impact
References
- compiler_fingerprints.md — Compiler identification
- Per-Compiler RE Field Manuals — In-depth per-compiler analysis:
- m68k_codegen_patterns.md — Code generation idiom catalog
- struct_recovery.md — Struct layout reconstruction
- api_call_identification.md — Library call recognition
- asm68k_binaries.md — Hand-written assembly RE
- cpp_vtables_reversing.md — C++ OOP RE
- register_conventions.md — m68k ABI on AmigaOS
- library_structure.md — Library internals and JMP table layout
- startup_code.md — CLI vs WB startup
- exec_lvo_table.md — exec.library LVO offsets
- dos_lvo_table.md — dos.library LVO offsets
- Amiga ROM Kernel Reference Manual: Libraries — Function signatures
- Amiga ROM Kernel Reference Manual: Devices — Device I/O protocol