amiga-bootcamp/05_reversing/static/compilers/README.md

103 lines
5.7 KiB
Markdown
Raw Permalink Normal View History

[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md)
# Per-Compiler Reverse Engineering — Binary Field Manuals
## Overview
This section provides **compiler-specific reverse engineering field manuals**. Each article answers one question: *"I have a binary produced by this compiler — what does it look like in IDA/Ghidra, and how do I read it?"* Rather than discussing compiler usage (see [13_toolchain](../../../13_toolchain/README.md) for that), these articles focus exclusively on **binary output**: hunk naming conventions, prologue/epilogue patterns, stack frame layouts, string addressing modes, startup code, optimization patterns, and debug info formats.
Every article includes the **same C function compiled by each compiler** — a side-by-side comparison that reveals exactly how `for` loops, `switch` statements, struct access, and AmigaOS library calls differ at the assembly level.
## Compiler Identification Decision Flowchart
```mermaid
graph TD
BIN["m68k binary loaded in disassembler"]
HUNK{"Hunk names?"}
L_A5{"LINK A5 present?"}
L_A6{"LINK A6 present?"}
STR_ABS{"String addressing?"}
REG_SAVE{"MOVEM.L save set size?"}
FP_DEFAULT{"Default frame pointer?"}
BIN --> HUNK
HUNK -->|"CODE/DATA/BSS"| L_A5
HUNK -->|".text/.data/.bss"| L_A6
HUNK -->|"CODE/DATA + __MERGED"| VBCC["→ VBCC"]
HUNK -->|"Custom prefix"| STORMC["→ StormC"]
L_A5 -->|"Yes"| STR_ABS
L_A5 -->|"No, LINK absent"| FP_DEFAULT
STR_ABS -->|"Absolute (MOVE.L #str,Dn)"| SASC["→ SAS/C"]
STR_ABS -->|"PC-relative (LEA str(PC))"| DICE["→ DICE C"]
L_A6 -->|"Yes"| GCC["→ GCC 2.95.x"]
L_A6 -->|"No, LINK absent"| VBCC2["→ VBCC"]
FP_DEFAULT -->|"None (only used regs saved)"| VBCC3["→ VBCC"]
FP_DEFAULT -->|"A5 frame pointer"| AZTEC["→ Aztec C / Lattice C"]
REG_SAVE -->|"D2-D7/A2-A4 (9 regs)"| SASC2["→ SAS/C"]
REG_SAVE -->|"D3-D7 (5 regs)"| AZTEC2["→ Aztec C"]
REG_SAVE -->|"D2-D7/A2-A6 (11 regs)"| GENSASC["→ SAS/C __saveds"]
REG_SAVE -->|"Minimal, per-function"| VBCC4["→ VBCC"]
```
## Quick Identification Matrix
| Criterion | SAS/C 6.x | GCC 2.95.x | VBCC | StormC | Aztec C | Lattice C | DICE C |
|---|---|---|---|---|---|---|---|
| **Hunk names** | `CODE`, `DATA`, `BSS` | `.text`, `.data`, `.bss` | `CODE`, `DATA`, `BSS` + `__MERGED` | `CODE`, `DATA` (Amiga standard) | `CODE`, `DATA`, `BSS` | `CODE`, `DATA`, `BSS` | `CODE`, `DATA`, `BSS` |
| **Frame pointer** | A5 (`LINK A5, #-N`) | A6 (or none with `-fomit-frame-pointer`) | None (rarely A5) | A5 (`LINK A5, #-N`) | A5 (`LINK A5, #-N`) | A5 (`LINK A5, #-N`) | None typically |
| **String addressing** | Absolute + relocated | PC-relative | PC-relative | Absolute | Absolute | Absolute | PC-relative |
| **Register save set** | D2-D7/A2-A4 (9 regs) | D2-D3/A2 (per-function) | Only used regs | D2-D7/A2-A4 (9 regs) | D3-D7 (5 regs) | D2-D5/A2-A3 | Per-function |
| **Startup entry** | `_start` / `c.o` | `_start` / `libnix` | `_start` / `startup.o` | `_STORM_` prefix | `_start` / `aztec.o` | `_start` / `lc.o` | `_mainCRTStartup` |
| **Library call style** | `JSR -$XXX(A6)` after loading global | `JSR -$XXX(A6)` with tighter code | `JSR -$XXX(A6)` via `__reg()` | `JSR -$XXX(A6)` SAS/C-like | `JSR -$XXX(A6)` | `JSR -$XXX(A6)` | `JSR -$XXX(A6)` |
| **Era** | 19881996 | 1995present | 1995present | 19962000 | 19851992 | 19851989 | 19921995 |
| **RE article** | [sasc.md](sasc.md) | [gcc.md](gcc.md) | [vbcc.md](vbcc.md) | [stormc.md](stormc.md) | [aztec_c.md](aztec_c.md) | [lattice_c.md](lattice_c.md) | [dice_c.md](dice_c.md) |
## Articles
| File | Compiler | Key RE Distinguishing Feature |
|---|---|---|
| [sasc.md](sasc.md) | SAS/C 5.x/6.x | `LINK A5` + 9-register MOVEM.L save — the most common Amiga C prologue |
| [gcc.md](gcc.md) | GCC 2.95.x | `LINK A6` (or no frame pointer) + PC-relative strings + `__CTOR_LIST__`/`__DTOR_LIST__` arrays |
| [vbcc.md](vbcc.md) | VBCC | No frame pointer + per-function register save + `__reg()` calling convention + `__MERGED` hunks |
| [stormc.md](stormc.md) | StormC / StormC++ | A5 frame pointer + C++ vtable differences from GCC + integrated debug info |
| [aztec_c.md](aztec_c.md) | Manx Aztec C | `LINK A5` + D3-D7 only (5 regs) — distinct from SAS/C 9-reg save |
| [lattice_c.md](lattice_c.md) | Lattice C 3.x/4.x | Predecessor to SAS/C; less aggressive optimization, different startup stub |
| [dice_c.md](dice_c.md) | DICE C | No frame pointer + PC-relative strings + extremely fast compilation marker patterns |
## Cross-Compiler Comparison — Same C Function
Every per-compiler article includes this reference function compiled by that compiler:
```c
/* Reference function used in all compiler comparison tables */
ULONG CountWords(CONST_STRPTR str) {
ULONG count = 0;
BOOL in_word = FALSE;
while (*str) {
if (*str == ' ' || *str == '\t' || *str == '\n') {
in_word = FALSE;
} else if (!in_word) {
count++;
in_word = TRUE;
}
str++;
}
return count;
}
```
Each article shows the full assembly output, annotated with which patterns are compiler-specific and which are universal m68k idioms.
## See Also
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick compiler identification guide
- [ansi_c_reversing.md](../ansi_c_reversing.md) — General C reverse engineering methodology
- [m68k_codegen_patterns.md](../m68k_codegen_patterns.md) — m68k code generation idiom catalog
- [startup_code.md](../../../04_linking_and_libraries/startup_code.md) — CLI vs WB startup internals
- [13_toolchain/](../../../13_toolchain/README.md) — Compiler usage and configuration (not RE)