mirror of
https://github.com/alfishe/amiga-bootcamp.git
synced 2026-06-13 00:26:28 +00:00
More information. Especially about RE and AREXX automation
This commit is contained in:
parent
05452c6c12
commit
0aafceb31e
30 changed files with 8826 additions and 66 deletions
102
05_reversing/static/compilers/README.md
Normal file
@ -0,0 +1,102 @@
|
|||
[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md)
|
||||
|
||||
# Per-Compiler Reverse Engineering — Binary Field Manuals
|
||||
|
||||
## Overview
|
||||
|
||||
This section provides **compiler-specific reverse engineering field manuals**. Each article answers one question: *"I have a binary produced by this compiler — what does it look like in IDA/Ghidra, and how do I read it?"* Rather than discussing compiler usage (see [13_toolchain](../../../13_toolchain/README.md) for that), these articles focus exclusively on **binary output**: hunk naming conventions, prologue/epilogue patterns, stack frame layouts, string addressing modes, startup code, optimization patterns, and debug info formats.
|
||||
|
||||
Every article includes the **same C function**, compiled by that article's compiler. Comparing the listings side by side reveals how each compiler handles the loop, the conditionals, register allocation, and the prologue/epilogue at the assembly level.
|
||||
|
||||
## Compiler Identification Decision Flowchart
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
BIN["m68k binary loaded in disassembler"]
|
||||
HUNK{"Hunk names?"}
|
||||
L_A5{"LINK A5 present?"}
|
||||
L_A6{"LINK A6 present?"}
|
||||
STR_ABS{"String addressing?"}
|
||||
REG_SAVE{"MOVEM.L save set size?"}
|
||||
FP_DEFAULT{"Default frame pointer?"}
|
||||
|
||||
BIN --> HUNK
|
||||
HUNK -->|"CODE/DATA/BSS"| L_A5
|
||||
HUNK -->|".text/.data/.bss"| L_A6
|
||||
HUNK -->|"CODE/DATA + __MERGED"| VBCC["→ VBCC"]
|
||||
HUNK -->|"Custom prefix"| STORMC["→ StormC"]
|
||||
|
||||
L_A5 -->|"Yes"| STR_ABS
|
||||
L_A5 -->|"No, LINK absent"| FP_DEFAULT
|
||||
STR_ABS -->|"Absolute (MOVE.L #str,Dn)"| REG_SAVE
|
||||
STR_ABS -->|"PC-relative (LEA str(PC))"| DICE["→ DICE C"]
|
||||
|
||||
L_A6 -->|"Yes"| GCC["→ GCC 2.95.x"]
|
||||
L_A6 -->|"No, LINK absent"| VBCC2["→ VBCC"]
|
||||
|
||||
FP_DEFAULT -->|"None (only used regs saved)"| VBCC3["→ VBCC"]
|
||||
FP_DEFAULT -->|"A5 frame pointer"| AZTEC["→ Aztec C / Lattice C"]
|
||||
|
||||
REG_SAVE -->|"D2-D7/A2-A4 (9 regs)"| SASC2["→ SAS/C"]
|
||||
REG_SAVE -->|"D3-D7 (5 regs)"| AZTEC2["→ Aztec C"]
|
||||
REG_SAVE -->|"D2-D7/A2-A6 (11 regs)"| GENSASC["→ SAS/C __saveds"]
|
||||
REG_SAVE -->|"Minimal, per-function"| VBCC4["→ VBCC"]
|
||||
```
|
||||
|
||||
## Quick Identification Matrix
|
||||
|
||||
| Criterion | SAS/C 6.x | GCC 2.95.x | VBCC | StormC | Aztec C | Lattice C | DICE C |
|
||||
|---|---|---|---|---|---|---|---|
|
||||
| **Hunk names** | `CODE`, `DATA`, `BSS` | `.text`, `.data`, `.bss` | `CODE`, `DATA`, `BSS` + `__MERGED` | `CODE`, `DATA` (Amiga standard) | `CODE`, `DATA`, `BSS` | `CODE`, `DATA`, `BSS` | `CODE`, `DATA`, `BSS` |
|
||||
| **Frame pointer** | A5 (`LINK A5, #-N`) | A6 (or none with `-fomit-frame-pointer`) | None (rarely A5) | A5 (`LINK A5, #-N`) | A5 (`LINK A5, #-N`) | A5 (`LINK A5, #-N`) | None typically |
|
||||
| **String addressing** | Absolute + relocated | PC-relative | PC-relative | Absolute | Absolute | Absolute | PC-relative |
|
||||
| **Register save set** | D2-D7/A2-A4 (9 regs) | D2-D3/A2 (per-function) | Only used regs | D2-D7/A2-A4 (9 regs) | D3-D7 (5 regs) | D2-D5/A2-A3 | Per-function |
|
||||
| **Startup entry** | `_start` / `c.o` | `_start` / `libnix` | `_start` / `startup.o` | `_STORM_` prefix | `_start` / `aztec.o` | `_start` / `lc.o` | `_mainCRTStartup` |
|
||||
| **Library call style** | `JSR -$XXX(A6)` after loading global | `JSR -$XXX(A6)` with tighter code | `JSR -$XXX(A6)` via `__reg()` | `JSR -$XXX(A6)` SAS/C-like | `JSR -$XXX(A6)` | `JSR -$XXX(A6)` | `JSR -$XXX(A6)` |
|
||||
| **Era** | 1988–1996 | 1995–present | 1995–present | 1996–2000 | 1985–1992 | 1985–1989 | 1992–1995 |
|
||||
| **RE article** | [sasc.md](sasc.md) | [gcc.md](gcc.md) | [vbcc.md](vbcc.md) | [stormc.md](stormc.md) | [aztec_c.md](aztec_c.md) | [lattice_c.md](lattice_c.md) | [dice_c.md](dice_c.md) |
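
The matrix can be read as a small decision procedure. The sketch below is only an illustration of that reading: the `observed` record and `guess_compiler` are hypothetical names, the fields are filled in by hand from a disassembly, and the rules use nothing beyond the rows above (a real binary may mix startup code and libraries from different toolchains).

```c
#include <string.h>

/* Hypothetical record of features noted while reading a disassembly. */
enum frame_ptr { FP_NONE, FP_A5, FP_A6 };

struct observed {
    const char *code_hunk_name;   /* e.g. "CODE" or ".text" */
    enum frame_ptr fp;            /* register used by LINK, if any */
    int pc_relative_strings;      /* 1 if strings are reached via LEA str(PC) */
    int saved_regs;               /* size of the usual MOVEM.L save set, 0 if it varies */
};

/* Best-guess label using only the rows of the identification matrix. */
const char *guess_compiler(const struct observed *o)
{
    if (strcmp(o->code_hunk_name, ".text") == 0 || o->fp == FP_A6)
        return "GCC 2.95.x";
    if (o->fp == FP_A5) {
        if (o->saved_regs == 5) return "Aztec C";          /* D3-D7 */
        if (o->saved_regs == 6) return "Lattice C";        /* D2-D5/A2-A3 */
        if (o->saved_regs == 9) return "SAS/C or StormC";  /* D2-D7/A2-A4 */
        return "unknown A5-frame compiler";
    }
    /* No frame pointer: the matrix lists VBCC and DICE C, both PC-relative. */
    return o->pc_relative_strings ? "VBCC or DICE C" : "unknown";
}
```

A result such as `"VBCC or DICE C"` still needs the startup-entry and hunk-name rows to break the tie.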
|
||||
|
||||
## Articles
|
||||
|
||||
| File | Compiler | Key RE Distinguishing Feature |
|
||||
|---|---|---|
|
||||
| [sasc.md](sasc.md) | SAS/C 5.x/6.x | `LINK A5` + 9-register MOVEM.L save — the most common Amiga C prologue |
|
||||
| [gcc.md](gcc.md) | GCC 2.95.x | `LINK A6` (or no frame pointer) + PC-relative strings + `__CTOR_LIST__`/`__DTOR_LIST__` arrays |
|
||||
| [vbcc.md](vbcc.md) | VBCC | No frame pointer + per-function register save + `__reg()` calling convention + `__MERGED` hunks |
|
||||
| [stormc.md](stormc.md) | StormC / StormC++ | A5 frame pointer + C++ vtable differences from GCC + integrated debug info |
|
||||
| [aztec_c.md](aztec_c.md) | Manx Aztec C | `LINK A5` + D3-D7 only (5 regs) — distinct from SAS/C 9-reg save |
|
||||
| [lattice_c.md](lattice_c.md) | Lattice C 3.x/4.x | Predecessor to SAS/C; less aggressive optimization, different startup stub |
|
||||
| [dice_c.md](dice_c.md) | DICE C | No frame pointer + PC-relative strings + `_mainCRTStartup` startup entry |
|
||||
|
||||
## Cross-Compiler Comparison — Same C Function
|
||||
|
||||
Every per-compiler article includes this reference function compiled by that compiler:
|
||||
|
||||
```c
|
||||
/* Reference function used in all compiler comparison tables */
|
||||
ULONG CountWords(CONST_STRPTR str) {
|
||||
ULONG count = 0;
|
||||
BOOL in_word = FALSE;
|
||||
|
||||
while (*str) {
|
||||
if (*str == ' ' || *str == '\t' || *str == '\n') {
|
||||
in_word = FALSE;
|
||||
} else if (!in_word) {
|
||||
count++;
|
||||
in_word = TRUE;
|
||||
}
|
||||
str++;
|
||||
}
|
||||
return count;
|
||||
}
|
||||
```
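
When checking a disassembly by hand it helps to know what the function should return for a given input. The following is a host-side sketch of the same function; the typedefs are stand-ins for the AmigaOS `<exec/types.h>` names (an assumption made so it builds with any hosted C compiler), not the real definitions.

```c
#include <stdint.h>

/* Stand-ins for the AmigaOS types so the reference function builds off-target. */
typedef uint32_t ULONG;
typedef int16_t BOOL;
typedef const char *CONST_STRPTR;
#define TRUE 1
#define FALSE 0

ULONG CountWords(CONST_STRPTR str)
{
    ULONG count = 0;
    BOOL in_word = FALSE;

    while (*str) {
        if (*str == ' ' || *str == '\t' || *str == '\n') {
            in_word = FALSE;          /* separator ends the current word */
        } else if (!in_word) {
            count++;                  /* first character of a new word */
            in_word = TRUE;
        }
        str++;
    }
    return count;
}
```

For example, `CountWords("two words")` yields 2 and an empty string yields 0, which gives known D0 values to look for when stepping through any of the compiled versions.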
|
||||
|
||||
Each article shows the full assembly output, annotated with which patterns are compiler-specific and which are universal m68k idioms.
|
||||
|
||||
## See Also
|
||||
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick compiler identification guide
|
||||
- [ansi_c_reversing.md](../ansi_c_reversing.md) — General C reverse engineering methodology
|
||||
- [m68k_codegen_patterns.md](../m68k_codegen_patterns.md) — m68k code generation idiom catalog
|
||||
- [startup_code.md](../../../04_linking_and_libraries/startup_code.md) — CLI vs WB startup internals
|
||||
- [13_toolchain/](../../../13_toolchain/README.md) — Compiler usage and configuration (not RE)
|
||||
125
05_reversing/static/compilers/aztec_c.md
Normal file
@ -0,0 +1,125 @@
|
|||
[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md) · [Compilers](README.md)
|
||||
|
||||
# Manx Aztec C — Reverse Engineering Field Manual
|
||||
|
||||
## Overview
|
||||
|
||||
**Manx Aztec C** (versions 3.x–5.x, 1985–1992) was one of the first widely used C compilers for the Amiga, predating Lattice C's market dominance. It targets early AmigaOS (1.1–1.3) and produces code with a distinctive **`LINK A5, #-N` + `MOVEM.L D3-D7, -(SP)`** prologue, saving only 5 data registers (D3–D7) instead of SAS/C's 9 registers. This narrower save set is the single most reliable Aztec C fingerprint.
|
||||
|
||||
Key constraints:
|
||||
- **5-register data save (D3–D7 only)** — Aztec C preserves fewer registers than any other Amiga C compiler. D2 is considered scratch by Aztec, while SAS/C, GCC, and VBCC all preserve D2.
|
||||
- **A5 frame pointer** — standard `LINK A5, #-N` convention, like SAS/C.
|
||||
- **Absolute string addressing** — like SAS/C, strings are in DATA with `HUNK_RELOC32` relocation.
|
||||
- **Pre-MakeLibrary era** — Aztec C libraries use a different initialization pattern than later RTF_AUTOINIT libraries.
|
||||
- **Early AmigaOS focus** — code may assume OS 1.1/1.2 behavior that changed in 2.0+.
|
||||
|
||||
```asm
|
||||
; Aztec C function prologue (THE signature):
|
||||
_func:
|
||||
LINK A5, #-$14 ; allocate frame
|
||||
MOVEM.L D3-D7, -(SP) ; save D3-D7 ONLY (5 regs!)
|
||||
; Note: D2 is NOT saved (unlike SAS/C, GCC, VBCC)
|
||||
; Note: A2-A4 are not saved (unlike SAS/C)
|
||||
```
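
The save set can also be read straight from the instruction bytes. A minimal sketch, assuming the standard 68000 encoding: `MOVEM.L <list>,-(SP)` is opcode word `$48E7` followed by a mask word, and in this predecrement form the mask is bit-reversed (bit 15 = D0 through bit 8 = D7, bit 7 = A0 through bit 0 = A7). The helper names are hypothetical.

```c
#include <stdint.h>

#define MOVEM_L_PUSH 0x48E7u   /* MOVEM.L <list>,-(SP) */

/* Number of registers named in a MOVEM mask word. */
int movem_push_count(uint16_t mask)
{
    int n = 0;
    for (; mask; mask &= (uint16_t)(mask - 1))
        n++;                                   /* clear lowest set bit each pass */
    return n;
}

/* Predecrement masks are reversed: D0 is bit 15, A7 is bit 0. */
int movem_push_has_dreg(uint16_t mask, int n) { return (mask >> (15 - n)) & 1; }
int movem_push_has_areg(uint16_t mask, int n) { return (mask >> (7 - n)) & 1; }

/* D3-D7 and nothing else: the prologue shown above. */
int looks_like_aztec_save(uint16_t opcode, uint16_t mask)
{
    return opcode == MOVEM_L_PUSH && mask == 0x1F00u;
}
```

Under this encoding the SAS/C set `D2-D7/A2-A4` is mask `$3F38`, so the two prologues differ in a single word of the hex dump.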
|
||||
|
||||
---
|
||||
|
||||
## Binary Identification
|
||||
|
||||
| Criterion | Aztec C | SAS/C |
|
||||
|---|---|---|
|
||||
| **Register save** | `D3-D7` (5 regs) | `D2-D7/A2-A4` (9 regs) |
|
||||
| **D2 preservation** | NOT preserved — call-clobbered | Preserved — callee-saved |
|
||||
| **A2-A4 preservation** | NOT preserved by default | Always preserved |
|
||||
| **Frame pointer** | A5 (`LINK A5`) | A5 (`LINK A5`) |
|
||||
| **String addressing** | Absolute + relocation | Absolute + relocation |
|
||||
| **Startup module** | `aztec.o` | `c.o` |
|
||||
| **Hunk names** | `CODE`, `DATA`, `BSS` | `CODE`, `DATA`, `BSS` |
|
||||
| **Era** | 1985–1992 (OS 1.1–1.3) | 1988–1996 (OS 1.2–3.1) |
|
||||
|
||||
### Detecting D2 as Scratch Register
|
||||
|
||||
The most distinctive Aztec C behavior: **D2 is call-clobbered**. After a function call, Aztec C must reload D2 if it was using it. SAS/C, GCC, and VBCC all preserve D2 across calls.
|
||||
|
||||
```asm
|
||||
; Aztec C: D2 is NOT preserved across calls
|
||||
MOVE.L #value, D2 ; D2 = important value
|
||||
BSR _some_func ; D2 may be destroyed!
|
||||
MOVE.L #value, D2 ; reload D2 (Aztec C knows D2 is scratch)
|
||||
; SAS/C would NOT need this reload — D2 is callee-saved there
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Historical Context
|
||||
|
||||
Manx Software Systems produced Aztec C for multiple platforms (CP/M, DOS, Macintosh, Amiga, Atari ST). The Amiga version was one of the earliest C compilers available — released in 1985 alongside the Amiga 1000 launch. Its 5-register save convention (D3-D7 only) reflects the era's emphasis on minimizing prologue/epilogue overhead on the 7.14 MHz 68000.
|
||||
|
||||
Aztec C was superseded by Lattice C (which became SAS/C) in the late 1980s, though Manx continued to release versions into the early 1990s. Most Aztec C binaries date from 1985–1989, the Amiga's formative years.
|
||||
|
||||
Software known to use Aztec C:
|
||||
- Early Amiga utilities (1985–1987 era)
|
||||
- Some Commodore-developed tools
|
||||
- ABasiC (the Amiga BASIC compiler)
|
||||
- Early versions of certain games ported from other platforms
|
||||
|
||||
---
|
||||
|
||||
## Same C Function — Aztec C Output
|
||||
|
||||
```asm
|
||||
; CountWords() — Manx Aztec C 5.x:
|
||||
; (Note: smaller register save set, but structurally similar to SAS/C)
|
||||
|
||||
_CountWords:
|
||||
LINK A5, #-$08
|
||||
MOVEM.L D3-D4, -(SP) ; ONLY D3-D4 (not D2-D3!)
|
||||
|
||||
MOVEQ #0, D3 ; D3 = count
|
||||
MOVEQ #0, D4 ; D4 = in_word
|
||||
|
||||
MOVEA.L $08(A5), A0 ; str
|
||||
|
||||
BRA.S .loop_test
|
||||
|
||||
.loop_body:
|
||||
MOVEQ #' ', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
MOVEQ #'\t', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
MOVEQ #'\n', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
|
||||
TST.B D4
|
||||
BNE.S .next_char
|
||||
ADDQ.L #1, D3
|
||||
MOVEQ #1, D4
|
||||
BRA.S .next_char
|
||||
|
||||
.not_word:
|
||||
MOVEQ #0, D4
|
||||
|
||||
.next_char:
|
||||
ADDQ.L #1, A0
|
||||
|
||||
.loop_test:
|
||||
TST.B (A0)
|
||||
BNE.S .loop_body
|
||||
|
||||
MOVE.L D3, D0
|
||||
MOVEM.L (SP)+, D3-D4
|
||||
UNLK A5
|
||||
RTS
|
||||
```
|
||||
|
||||
**Aztec C observations**: The function body is nearly identical to SAS/C, but notice D2 is **not used** — Aztec C skips D2 and starts local register allocation at D3. If you see functions that never touch D2, it's likely Aztec C (or early Lattice C).
|
||||
|
||||
---
|
||||
## References
|
||||
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick identification
|
||||
- Aztec C 68k Manual (archive.org)
|
||||
- See also: [sasc.md](sasc.md), [lattice_c.md](lattice_c.md) — compare with other compilers
|
||||
135
05_reversing/static/compilers/dice_c.md
Normal file
@ -0,0 +1,135 @@
|
|||
[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md) · [Compilers](README.md)
|
||||
|
||||
# DICE C — Reverse Engineering Field Manual
|
||||
|
||||
## Overview
|
||||
|
||||
**DICE C** (by Matt Dillon, 1992–1995) was a fast, lean C compiler for AmigaOS known for its incredible compilation speed — often 10–50× faster than SAS/C. It was the compiler of choice for rapid development cycles and produced tight, no-frills code. Its key RE characteristics: **no frame pointer** (like GCC/VBCC), **PC-relative string addressing** (like GCC), and **minimal register saves** (per-function, like VBCC). DICE C binaries look most similar to VBCC output but with some distinctive patterns.
|
||||
|
||||
Key constraints:
|
||||
- **No frame pointer** — DICE C omits the frame pointer by default. Functions use SP-relative addressing.
|
||||
- **PC-relative strings** — Like GCC and VBCC, DICE uses `LEA string(PC), A0`.
|
||||
- **Extremely fast compilation** — DICE's speed came from a simpler optimizer; the binary output is clean but not as aggressively optimized as SAS/C -O2 or GCC -O2.
|
||||
- **Custom startup** — `_mainCRTStartup` (not `_start`) is the typical entry point name.
|
||||
- **Hunk names**: `CODE`, `DATA`, `BSS` (Amiga standard)
|
||||
|
||||
```asm
|
||||
; DICE C function — no frame pointer, PC-relative, per-function save:
|
||||
_func:
|
||||
MOVEM.L D2-D4/A2-A3, -(SP) ; save only what's used
|
||||
; ... function body, SP-relative access ...
|
||||
MOVEM.L (SP)+, D2-D4/A2-A3
|
||||
RTS
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Binary Identification
|
||||
|
||||
| Criterion | DICE C | SAS/C | GCC | VBCC |
|
||||
|---|---|---|---|---|
|
||||
| **Frame pointer** | None | A5 always | A6 or none | None |
|
||||
| **String addressing** | PC-relative | Absolute + reloc | PC-relative | PC-relative |
|
||||
| **Register save** | Per-function | Fixed 9 regs | Per-function | Per-function |
|
||||
| **Startup entry** | `_mainCRTStartup` | `_start` | `_start` | `_start` |
|
||||
| **Hunk names** | `CODE`, `DATA`, `BSS` | `CODE`, `DATA`, `BSS` | `.text`, `.data`, `.bss` | `CODE`, `DATA`, `BSS` |
|
||||
| **Optimizer** | Moderate | Aggressive | Aggressive | Aggressive (peephole) |
|
||||
| **Compile speed** | Very fast | Moderate | Slow | Fast |
|
||||
|
||||
### Key Distinguishing Patterns
|
||||
|
||||
1. **`_mainCRTStartup` entry point** — unique to DICE C. No other Amiga compiler uses this name for the startup entry.
|
||||
2. **`ADDQ.L #4, SP` argument cleanup** — DICE C often uses `ADDQ` to pop arguments after function calls, where SAS/C would use `LEA`.
|
||||
3. **Conservative optimization** — DICE C may not perform CSE or loop-invariant code motion as aggressively as SAS/C or GCC.
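
The argument-cleanup pattern in item 2 is easy to spot in raw opcode words. A sketch, assuming standard 68000 encodings (`ADDQ.L #n,A7` is `0101 nnn0 1000 1111`, with `nnn = 0` meaning 8; `LEA d16(A7),A7` is `$4FEF` plus a displacement word); `cleanup_bytes` is a hypothetical helper name.

```c
#include <stdint.h>

/* Bytes popped by a caller-cleanup instruction, or -1 if not recognized.
 * ext is the word following the opcode; only the LEA form uses it. */
int cleanup_bytes(uint16_t opcode, uint16_t ext)
{
    /* ADDQ.L #n,A7: mask out the 3-bit immediate, compare the rest */
    if ((opcode & 0xF1FFu) == 0x508Fu) {
        int n = (opcode >> 9) & 7;
        return n ? n : 8;                /* immediate 0 encodes 8 */
    }
    /* LEA d16(A7),A7 with a positive displacement releases stack space */
    if (opcode == 0x4FEFu) {
        int d = (int16_t)ext;
        return d > 0 ? d : -1;
    }
    return -1;
}
```

Tallying which form follows each `JSR`/`BSR` gives a rough per-binary statistic; it is a hint, not proof, since other compilers also emit `ADDQ` for small argument lists.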
|
||||
|
||||
---
|
||||
|
||||
## Library Call Patterns
|
||||
|
||||
```asm
|
||||
; DICE C library call:
|
||||
MOVEA.L (_SysBase).L, A6
|
||||
JSR -$C6(A6) ; AllocMem
|
||||
; DICE C may not cache A6 — reloads from global for each call block
|
||||
```
|
||||
|
||||
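DICE C loads library bases with **`MOVEA.L (_LibBase).L, A6`**: absolute long addressing, fixed up by a relocation entry. This is the same addressing mode that other assemblers write as `MOVEA.L _LibBase, A6`; the `( ).L` spelling is DICE's assembler syntax. Whether it shows up in a listing therefore depends on the tool that produced the listing, not on the bytes in the binary.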
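
Whatever the compiler, the call itself is `JSR d16(A6)` with a negative displacement, the library vector offset (LVO). A sketch of decoding it, assuming the standard encoding `$4EAE` for `JSR d16(A6)` and 6-byte jump-table entries; the function names are hypothetical and the name table covers only a few well-known `exec.library` entries.

```c
#include <stdint.h>

#define JSR_D16_A6 0x4EAEu   /* JSR d16(A6) */

/* Returns the LVO (a negative multiple of 6), or 0 if this is not a library call. */
int library_lvo(uint16_t opcode, uint16_t disp)
{
    int lvo = (int16_t)disp;             /* sign-extend the displacement word */
    if (opcode != JSR_D16_A6 || lvo >= 0 || lvo % 6 != 0)
        return 0;
    return lvo;
}

/* A handful of exec.library offsets; anything else is reported as "?". */
const char *exec_name(int lvo)
{
    switch (lvo) {
    case -198: return "AllocMem";
    case -210: return "FreeMem";
    case -414: return "CloseLibrary";
    case -552: return "OpenLibrary";
    default:   return "?";
    }
}
```

The words `4EAE FF3A` decode to LVO -198, matching the `JSR -$C6(A6)` in the listing above; the name lookup is only valid once A6 is known to hold `SysBase`.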
|
||||
|
||||
---
|
||||
|
||||
## Historical Context
|
||||
|
||||
**Matt Dillon** (later known for DragonFly BSD and the HAMMER filesystem) wrote DICE C as a side project while developing Amiga software. Its claim to fame was compiling the entire DICE C compiler itself in **under 10 seconds** on a stock Amiga 3000, a build that took SAS/C minutes.
|
||||
|
||||
DICE C was particularly popular in the Amiga demoscene and shareware community, where fast edit-compile-test cycles mattered more than squeezing every last cycle out of the generated code. It also shipped with a suite of development tools including a linker, librarian, and debugger.
|
||||
|
||||
DICE C's development effectively ended when Matt Dillon moved to FreeBSD development in the mid-1990s. The final version was released as freeware.
|
||||
|
||||
Software known or likely to use DICE C:
|
||||
- **DICE C itself** (self-hosting — compiled with DICE C)
|
||||
- Various Amiga shareware utilities (1992–1995 era)
|
||||
- Some demoscene tools and intros
|
||||
- Early Amiga networking utilities
|
||||
|
||||
---
|
||||
|
||||
## Same C Function — DICE C Output
|
||||
|
||||
```asm
|
||||
; CountWords() — DICE C:
|
||||
; (No frame pointer, PC-relative strings, per-function save)
|
||||
|
||||
_CountWords:
|
||||
MOVEM.L D2-D3, -(SP) ; save D2-D3 only
|
||||
|
||||
MOVEQ #0, D2 ; D2 = count
|
||||
MOVEQ #0, D3 ; D3 = in_word
|
||||
|
||||
MOVEA.L $0C(SP), A0 ; A0 = str (SP+12, after saved regs + ret addr)
|
||||
|
||||
BRA.S .loop_test
|
||||
|
||||
.loop_body:
|
||||
MOVEQ #' ', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
MOVEQ #'\t', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
MOVEQ #'\n', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
|
||||
TST.B D3
|
||||
BNE.S .next_char
|
||||
ADDQ.L #1, D2
|
||||
MOVEQ #1, D3
|
||||
BRA.S .next_char
|
||||
|
||||
.not_word:
|
||||
MOVEQ #0, D3
|
||||
|
||||
.next_char:
|
||||
ADDQ.L #1, A0
|
||||
|
||||
.loop_test:
|
||||
TST.B (A0)
|
||||
BNE.S .loop_body
|
||||
|
||||
MOVE.L D2, D0
|
||||
MOVEM.L (SP)+, D2-D3
|
||||
RTS
|
||||
```
|
||||
|
||||
**DICE C observations**: For this simple function, DICE C's output is nearly identical to GCC and VBCC. The distinction emerges in:
|
||||
- **Startup code naming** (`_mainCRTStartup` vs `_start`)
|
||||
- **Argument cleanup patterns** (`ADDQ.L #4, SP` after calls)
|
||||
- **Less aggressive CSE** in more complex functions
|
||||
|
||||
---
|
||||
## References
|
||||
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick identification
|
||||
- DICE C distribution (Aminet: `dev/c/dice`)
|
||||
- Matt Dillon's DICE C documentation (archive.org)
|
||||
- See also: [sasc.md](sasc.md), [gcc.md](gcc.md), [vbcc.md](vbcc.md) — compare with other compilers
|
||||
741
05_reversing/static/compilers/gcc.md
Normal file
@ -0,0 +1,741 @@
|
|||
[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md) · [Compilers](README.md)
|
||||
|
||||
# GCC 2.95.x — Reverse Engineering Field Manual
|
||||
|
||||
## Overview
|
||||
|
||||
**GCC 2.95.x** for m68k-amigaos (variants: GeekGadgets, bebbo's modern port, and the original GCC 2.95.3) is the second most common compiler encountered in Amiga reverse engineering, particularly for software from 1995 onward. Unlike SAS/C's rigid "always LINK A5" convention, GCC is far more flexible — it uses **A6** as frame pointer when enabled, defaults to **no frame pointer at all**, uses **PC-relative string addressing**, and generates per-function `MOVEM.L` save sets (saving only the registers actually used, not a fixed set).
|
||||
|
||||
Key constraints to internalize immediately:
|
||||
- **No default frame pointer** — GCC optimizes away the frame pointer whenever possible. Locals and arguments are accessed via `$offset(SP)`. This makes function boundary detection harder initially but produces tighter code.
|
||||
- **A6 is the frame pointer, not A5** — when `-fno-omit-frame-pointer` is used. This is the primary visual disambiguator from SAS/C.
|
||||
- **PC-relative everything** — strings are addressed via `LEA string(PC), A0`. Constants live in the CODE hunk alongside instructions. No `HUNK_RELOC32` for string references.
|
||||
- **`__CTOR_LIST__` / `__DTOR_LIST__`** — global constructor/destructor arrays unique to GCC C++ and GCC with `-finit-priority`.
|
||||
- **`.text` / `.data` / `.bss` hunk names** — Unix convention, unlike SAS/C's Amiga-native `CODE`/`DATA`/`BSS`.
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Source (.c / .cpp)"
|
||||
SRC["C/C++ source"]
|
||||
end
|
||||
subgraph "GCC Compiler Pipeline"
|
||||
CC1["cc1 (C frontend)"]
|
||||
CC1PLUS["cc1plus (C++ frontend)"]
|
||||
AS["vasm / GNU as"]
|
||||
LD["vlink / GNU ld"]
|
||||
LIBNIX["libnix / clib2 (startup)"]
|
||||
end
|
||||
subgraph "Binary Output"
|
||||
HUNK["Amiga HUNK executable"]
|
||||
TEXT[".text hunk — code + PC-relative data"]
|
||||
DATA[".data hunk — initialized globals"]
|
||||
BSS[".bss hunk — zero-filled globals"]
|
||||
CTOR["__CTOR_LIST__ / __DTOR_LIST__ arrays"]
|
||||
SYMBOL["HUNK_SYMBOL — GCC mangled names"]
|
||||
end
|
||||
|
||||
SRC --> CC1 & CC1PLUS
|
||||
CC1 & CC1PLUS --> AS --> LD
|
||||
LIBNIX --> LD
|
||||
LD --> HUNK
|
||||
HUNK --> TEXT & DATA & BSS
|
||||
HUNK --> CTOR
|
||||
HUNK --> SYMBOL
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Binary Identification — The GCC Signature
|
||||
|
||||
### Hunk Names (Unix Convention)
|
||||
|
||||
```
|
||||
Hunk 0: .text (code + read-only data including strings and jump tables)
|
||||
Hunk 1: .data (initialized global variables)
|
||||
Hunk 2: .bss (zero-initialized globals)
|
||||
```
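
Where a file carries hunk names, they are stored in `HUNK_NAME` blocks. The sketch below assumes the usual layout (big-endian longword `$000003E8`, a length in longwords, then the zero-padded name); `read_hunk_name` is a hypothetical helper, and many linked executables carry no name blocks at all, in which case it simply returns -1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HUNK_NAME 0x3E8u

/* Read a big-endian 32-bit value from a byte buffer. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Copies the name of a HUNK_NAME block starting at buf into out.
 * Returns the name length, or -1 if buf does not hold a usable block. */
int read_hunk_name(const uint8_t *buf, size_t len, char *out, size_t outsz)
{
    if (len < 8 || be32(buf) != HUNK_NAME)
        return -1;
    size_t bytes = (size_t)be32(buf + 4) * 4;   /* length field counts longwords */
    if (bytes > len - 8 || bytes >= outsz)
        return -1;
    memcpy(out, buf + 8, bytes);
    out[bytes] = '\0';                           /* padding is not guaranteed to terminate */
    return (int)strlen(out);
}
```

Comparing the result against `".text"` and `"CODE"` automates the first branch of the identification procedure.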
|
||||
|
||||
> [!NOTE]
|
||||
> **The `.text` hunk name is the single fastest way to identify GCC output.** SAS/C, Aztec, Lattice, and StormC all use `CODE`/`DATA`/`BSS`. Only GCC (and sometimes VBCC with certain linker scripts) produces `.text`/`.data`/`.bss`. However, some GCC ports have been configured to emit Amiga-standard names — check multiple indicators.
|
||||
|
||||
### Function Prologue — The Minimalist Approach
|
||||
|
||||
GCC's prologue varies dramatically based on how many registers the function actually uses:
|
||||
|
||||
```asm
|
||||
; GCC with -fomit-frame-pointer (default) — leaf function, no locals:
|
||||
_leaf_func:
|
||||
; NO prologue at all — just starts executing
|
||||
; ... function body ...
|
||||
RTS
|
||||
|
||||
; GCC — function with a few locals, no calls:
|
||||
_modest_func:
|
||||
MOVEM.L D2/A2, -(SP) ; save ONLY the 2 registers actually used
|
||||
; ... function body ...
|
||||
MOVEM.L (SP)+, D2/A2
|
||||
RTS
|
||||
|
||||
; GCC with -fno-omit-frame-pointer:
|
||||
_frame_func:
|
||||
LINK A6, #-N ; A6 frame pointer — NOT A5!
|
||||
MOVEM.L D2-D3/A2-A3, -(SP) ; only actually-used regs
|
||||
; ... function body ...
|
||||
MOVEM.L (SP)+, D2-D3/A2-A3
|
||||
UNLK A6 ; UNLK A6, not UNLK A5
|
||||
RTS
|
||||
|
||||
; GCC — large function with many locals:
|
||||
_large_func:
|
||||
MOVEM.L D2-D7/A2-A5, -(SP) ; 10 regs here: more than SAS/C's fixed 9, sized to this function
|
||||
LEA -$400(SP), SP ; allocate large frame (ADD/SUB alternative)
|
||||
; ... function body ...
|
||||
LEA $400(SP), SP
|
||||
MOVEM.L (SP)+, D2-D7/A2-A5
|
||||
RTS
|
||||
```
|
||||
|
||||
**Key identification**: the register save set is **per-function, tailored to actual usage**. If you see `MOVEM.L D2-D3/A2, -(SP)` in one function and `MOVEM.L D2-D7/A2-A4, -(SP)` in another, it's GCC (or VBCC). SAS/C always saves the same fixed set.
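
This check is mechanical once the `MOVEM.L` mask words have been collected from each prologue. A minimal sketch, with `save_set_is_fixed` as a hypothetical helper; how the masks are gathered (script, plugin, or by hand) is left open.

```c
#include <stddef.h>
#include <stdint.h>

/* masks[] holds the MOVEM.L mask word from each function prologue that has one.
 * Returns 1 if every function saves the identical set (SAS/C-like),
 * 0 if the set varies per function (GCC/VBCC-like) or the list is empty. */
int save_set_is_fixed(const uint16_t *masks, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (masks[i] != masks[0])
            return 0;
    return n > 0;
}
```

A sample of a few dozen functions is usually enough; library code linked from another toolchain can skew the result, so it is best to sample functions from the program's own code.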
|
||||
|
||||
### String Addressing — PC-Relative
|
||||
|
||||
```asm
|
||||
; GCC string reference — PC-relative:
|
||||
LEA .LC0(PC), A0 ; A0 = "Hello, World!\n"
|
||||
JSR _Printf ; call Printf(A0)
|
||||
|
||||
; ... later in the same .text hunk:
|
||||
.LC0:
|
||||
DC.B "Hello, World!", $0A, 00
|
||||
```
|
||||
|
||||
**Critical RE implication**: GCC strings live in `.text` next to the code that references them. In IDA, the string appears as inline data within the code segment, with a cross-reference from the `LEA` instruction. This means:
1. Strings are **not separately relocatable**: they move with the code hunk
2. String cross-references land inside the code segment rather than in a separate `DATA` hunk
3. The `LEA` pattern is unambiguous: `LEA $XXXX(PC), An` (a signed 16-bit displacement) where the target is ASCII data
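
Resolving such a reference by hand is simple arithmetic. A sketch, assuming the standard encoding (`LEA d16(PC),An` is `0100 rrr1 1111 1010`, so `$41FA` for A0) and the 68000 rule that the PC value used is the address of the extension word; the function names are hypothetical.

```c
#include <stdint.h>

/* LEA d16(PC),An for any destination address register. */
int is_lea_pcrel(uint16_t opcode)
{
    return (opcode & 0xF1FFu) == 0x41FAu;
}

/* Address the LEA points at: extension-word address plus signed displacement. */
uint32_t lea_pcrel_target(uint32_t insn_addr, uint16_t disp)
{
    return insn_addr + 2u + (uint32_t)(int32_t)(int16_t)disp;
}
```

An instruction at `$1000` with displacement `$0010` therefore references `$1012`; if that address holds printable bytes ending in a zero, it is very likely a string literal.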
|
||||
|
||||
---
|
||||
|
||||
## Calling Conventions
|
||||
|
||||
GCC uses a simpler calling convention model than SAS/C — one primary convention with variations controlled by function attributes. However, what GCC lacks in convention count it makes up for in **register allocation flexibility**: every function gets a customized stack frame and register save set based on exactly which variables the compiler decides to keep in registers.
|
||||
|
||||
### Primary Convention (cdecl, the GCC default)
|
||||
|
||||
| Aspect | GCC Convention |
|
||||
|---|---|
|
||||
| **Return value** | D0 (32-bit integer/pointer), D0:D1 (64-bit `long long`), FP0 (float/double on FPU systems). Structs > 8 bytes: caller allocates space, passes hidden pointer in **A0**. |
|
||||
| **First 2 integer args** | D0, D1 — passed in registers. These are **caller-saved** (the callee may destroy them). |
|
||||
| **All remaining args** | Pushed onto the stack **right-to-left** before the call. The **caller** cleans the stack after the call returns (cdecl convention). |
|
||||
| **Callee-saved registers** | D2-D7, A2-A5 — but GCC saves **only the subset actually used** by the function. This is the key identifiability feature. |
|
||||
| **Caller-saved registers** | D0, D1, A0, A1 — destroyed across calls. If the caller needs these values after a call, it must save them itself. |
|
||||
| **Frame pointer** | A6 when not omitted (`-fno-omit-frame-pointer`); otherwise SP-relative access for both locals and incoming stack args. |
|
||||
| **Library base** | A6 — loaded per-library at call sites. GCC neither preserves A6 across library calls nor uses A6 for any other purpose during library call sequences. |
|
||||
|
||||
> [!NOTE]
|
||||
> Unlike SAS/C's `#pragma libcall` which bakes the register assignment into the pragma, GCC uses inline assembly stubs (`<inline/exec.h>`, `<inline/dos.h>`) or the `__asm()` keyword to set up library calls. In the binary, the result looks identical — `MOVE.L args, Dn` / `JSR -$XXX(A6)` — but the surrounding code pattern differs (GCC is tighter, fewer redundant loads).
|
||||
|
||||
### Parameter Passing — Detailed Breakdown
|
||||
|
||||
Understanding exactly which parameter lands in which register vs which stack slot is essential for reconstructing function prototypes in IDA/Ghidra.
|
||||
|
||||
```
|
||||
Caller side (8 args, before BSR/JSR _func):      Stack as the callee sees it at entry:

  MOVE.L  arg8, -(SP)  ─┐                        ┌──────────────────────┐
  ...                   ├ stack, right-to-left   │ arg8 (first pushed)  │ SP+24
  MOVE.L  arg3, -(SP)  ─┘                        │ arg7                 │ SP+20
  MOVE.L  arg2, D1     ─┐                        │ arg6                 │ SP+16
  MOVE.L  arg1, D0     ─┴ registers              │ arg5                 │ SP+12
  BSR     _func                                  │ arg4                 │ SP+8
  LEA     24(SP), SP   ← caller cleans 6 × 4     │ arg3 (last pushed)   │ SP+4  ← first stack arg
                                                 │ return address       │ SP+0
                                                 └──────────────────────┘
  Each register the callee then saves adds 4 to these offsets.
|
||||
```
|
||||
|
||||
**Identifying parameters in disassembly:**
|
||||
|
||||
| Parameter | Location in Callee | How to Find It |
|
||||
|---|---|---|
|
||||
| **arg1** | D0 (may be moved to a callee-saved reg immediately) | Look for `MOVE.L D0, Dn` early in the function |
|
||||
| **arg2** | D1 (same — often moved to a callee-saved reg) | Look for `MOVE.L D1, Dn` after D0 is saved |
|
||||
| **arg3** | `$04(SP)` at entry (add 4 per register saved by the prologue) or `$08(A6)` (above the saved A6 and the return address) | First stack arg: offset depends on prologue |
| **arg4+** | `$08(SP)`, `$0C(SP)`... or `$0C(A6)`, `$10(A6)`... | Sequential 4-byte slots above arg3 |
|
||||
|
||||
**With frame pointer (A6):**
|
||||
```asm
|
||||
; Function with LINK A6, #-$10 and MOVEM.L D2-D4, -(SP):
|
||||
_func:
|
||||
LINK A6, #-$10 ; push old A6, A6 = SP, then SP -= 16 (locals)
MOVEM.L D2-D4, -(SP) ; save 3 regs (12 bytes)

; Now the stack looks like:
; $00(A6) = saved A6 (caller's frame pointer)
; $04(A6) = return address
; $08(A6) = arg3 (first stack arg at A6+8)
; $0C(A6) = arg4 ; A6+12
; $10(A6) = arg5 ; A6+16

MOVE.L $08(A6), D2 ; D2 = arg3 (typical: move to callee-saved)
|
||||
; ...
|
||||
MOVEM.L (SP)+, D2-D4
|
||||
UNLK A6
|
||||
RTS
|
||||
```
|
||||
|
||||
**Without frame pointer (default -O2):**
|
||||
```asm
|
||||
; Function with only MOVEM.L D2-D3, -(SP):
|
||||
_func:
|
||||
MOVEM.L D2-D3, -(SP) ; save 2 regs (8 bytes)
|
||||
|
||||
; Now args are at:
|
||||
; $0C(SP) = arg3 (12 = 4 ret addr + 8 saved regs)
|
||||
; $10(SP) = arg4 ; SP+16
|
||||
|
||||
MOVE.L $0C(SP), D2 ; D2 = arg3
|
||||
; ...
|
||||
MOVEM.L (SP)+, D2-D3
|
||||
RTS
|
||||
```
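
The offsets in both listings follow from what sits between the reference register and the arguments. A small sketch of that arithmetic, assuming 4-byte argument slots and the standard `LINK` behaviour (the old frame pointer is pushed before the frame register is set); both function names are hypothetical.

```c
/* Offset of the k-th stack argument (k = 0 is the first stack arg)
 * from the frame register after LINK: saved frame pointer (4) + return address (4). */
int stack_arg_offset_fp(int k)
{
    return 8 + 4 * k;
}

/* Offset from SP right after the prologue when no frame pointer is used:
 * return address (4) + saved registers + any locals allocated on the stack. */
int stack_arg_offset_sp(int k, int saved_regs, int local_bytes)
{
    return 4 + 4 * saved_regs + local_bytes + 4 * k;
}
```

With two saved registers and no locals, `stack_arg_offset_sp(0, 2, 0)` gives 12, the `$0C(SP)` seen in the listing above.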
|
||||
|
||||
> [!WARNING]
|
||||
> **SP-relative offsets are unstable.** If the function uses `ADDQ.L/SUBQ.L` on SP, `PEA`, or pushes temporary values, the SP-relative offset for the same argument shifts. With A6-relative addressing (frame pointer enabled), offsets are constant throughout the function body.
|
||||
|
||||
### Special Argument Types
|
||||
|
||||
| Type | Convention | Disassembly Pattern |
|
||||
|---|---|---|
|
||||
| **64-bit `long long`** | D0:D1 (high 32 in D0, low 32 in D1, matching big-endian memory order). If not first param, passed on the stack as two consecutive longwords. | `MOVE.L D0, D2` / `MOVE.L D1, D3` (pair of moves to callee-saved regs) |
| **Struct ≤ 8 bytes** | Passed in D0:D1 (if first param) or on stack. | Look for byte-field extraction: `ANDI.L #$FF, D0` / `LSR.L #8, D0` |
|
||||
| **Struct > 8 bytes** | Caller allocates space, passes hidden pointer in **A0**. Callee copies if needed. | `MOVEA.L A0, A2` — A0 moved to callee-saved address reg early in prologue |
|
||||
| **`float` (FPU)** | FP0 (if FPU codegen enabled). With `-msoft-float`, passed as 32-bit integer in D0 or stack. | `FMOVE.S X, FP0` vs `MOVE.L #$3F800000, D0` (1.0f as integer) |
|
||||
| **`double` (FPU)** | FP0 (FPU). With `-msoft-float`, passed as 64-bit pair in D0:D1 or on stack. | `FMOVE.D X, FP0` vs D0:D1 pair |
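
Soft-float code shows floating-point constants as plain integers, so it helps to be able to produce the expected bit patterns. A sketch that assumes the host uses IEEE-754 `float` and `double` (true of essentially every current platform); the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Bit pattern of a single-precision value, as it appears in a MOVE.L #imm. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);       /* well-defined type pun */
    return u;
}

/* The two 32-bit halves of a double, as they appear in a register pair. */
void double_words(double d, uint32_t *hi, uint32_t *lo)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    *hi = (uint32_t)(u >> 32);
    *lo = (uint32_t)u;
}
```

`float_bits(1.0f)` is `0x3F800000`, the immediate quoted in the table; searching a binary for the patterns of constants such as 0.5 or 2.0 is a quick way to find soft-float arithmetic.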
|
||||
|
||||
### GCC Register Allocation — Recognizing Register vs Stack Variables
|
||||
|
||||
GCC's register allocator is the single most important thing to understand when reading GCC output, because it determines whether a C variable appears as a persistent register value or a frame-relative stack slot.
|
||||
|
||||
#### How GCC Assigns Registers to Variables
|
||||
|
||||
GCC 2.95.x uses a **priority-based graph coloring allocator**. The heuristic, simplified:
|
||||
|
||||
1. **Most-referenced variables get registers first.** A loop counter used 50 times wins over a flag set once.
|
||||
2. **Address-taken variables go to stack.** If a variable's address is taken (`&x`), it MUST live in memory — GCC can't keep it in a register.
|
||||
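3. **D2-D7 used for integer values.** Data registers are the first choice for arithmetic, counters, and flags; a pointer that is only copied or compared can also sit in one.
4. **A2-A5 used for pointer chasing and base addresses.** Anything that is dereferenced must be in an address register, because m68k has no `(Dn)` addressing mode; expect `struct->field` access and array indexing through A2-A5.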
|
||||
5. **Register pressure causes spilling.** If a function uses more live variables than available registers, the least-frequently-used variable gets evicted to a stack slot.
|
||||
|
||||
#### Identifying Register Variables in Disassembly
|
||||
|
||||
```asm
|
||||
; GCC -O2 function with register-allocated locals:
|
||||
_count_words:
MOVEM.L D3/A2, -(SP) ; D3/A2 saved → they WILL be used as locals

MOVEA.L D0, A2 ; A2 = str (arg1 moved to a callee-saved address reg)
MOVEQ #0, D3 ; D3 = count (initialized to 0, stays in D3)
MOVEQ #0, D1 ; D1 = in_word (scratch: safe only because this function makes no calls)

.loop:
TST.B (A2) ; A2 used as pointer (not reloaded from stack)
BEQ.S .done
CMPI.B #' ', (A2)
BNE.S .not_space
MOVEQ #0, D1 ; D1 modified directly, no stack write
.not_space:
; ...
ADDQ.L #1, D3 ; D3 incremented in-register, no stack read/modify/write
BRA.S .loop

.done:
MOVE.L D3, D0 ; return count (from D3, not from a stack load)
MOVEM.L (SP)+, D3/A2
RTS
|
||||
```
|
||||
|
||||
**Key signs a variable lives in a register:**
|
||||
- The register is saved in the prologue → it's being used as a named local
|
||||
- The variable's value is modified with `ADDQ`, `SUBQ`, `MOVEQ` operating on that register — never with `MOVE $offset(A6), Dn` / modify / `MOVE Dn, $offset(A6)`
|
||||
- The variable is read **without a preceding stack load** and written **without a following stack store**
|
||||
- At function exit, the value returns from the register, not from a reload
|
||||
|
||||
#### Identifying Stack Variables in Disassembly
|
||||
|
||||
```asm
|
||||
; Same function compiled -O0 (everything on stack):
|
||||
_count_words_O0:
|
||||
LINK A6, #-$08 ; 8 bytes of locals
|
||||
MOVEM.L D2-D3, -(SP)
|
||||
|
||||
MOVE.L $08(A6), D0 ; load arg1 from stack
|
||||
MOVE.L D0, -$04(A6) ; spill to local: str
|
||||
CLR.L -$08(A6) ; count = 0 (on stack)
|
||||
|
||||
.loop:
|
||||
MOVEA.L -$04(A6), A0 ; load str from stack
|
||||
TST.B (A0)
|
||||
BEQ.S .done
|
||||
; ... modify count ...
|
||||
ADDQ.L #1, -$08(A6) ; count++ — READ-MODIFY-WRITE to stack slot
|
||||
BRA.S .loop
|
||||
|
||||
.done:
|
||||
MOVE.L -$08(A6), D0 ; return count (load from stack)
|
||||
MOVEM.L (SP)+, D2-D3
|
||||
UNLK A6
|
||||
RTS
|
||||
```
|
||||
|
||||
**Key signs a variable lives on the stack:**
|
||||
- Every read is preceded by `MOVE.L $offset(A6), Dn`
|
||||
- Every write follows `MOVE.L Dn, $offset(A6)`
|
||||
- Increments are either a three-instruction load→add→store sequence or a single memory-operand read-modify-write such as `ADDQ.L #1, -$08(A6)`
|
||||
- The same frame offset (`-$04(A6)`) appears in multiple load/store instructions
|
||||
- Variables are never held in callee-saved registers across statements
|
||||
|
||||
#### Recognizing Spilled Registers
|
||||
|
||||
When register pressure exceeds available registers, GCC **spills** a variable temporarily to the stack:
|
||||
|
||||
```asm
|
||||
; D2 holds 'count', but we need D2 for a DIVU operation:
|
||||
MOVE.L D2, -$04(A6) ; spill count to stack
|
||||
MOVE.L denominator, D2
|
||||
    DIVU    D2, D0           ; D0/D2 → D0 (D2 now holds the denominator, so count must be reloaded)
|
||||
MOVE.L -$04(A6), D2 ; reload count from stack
|
||||
```
|
||||
|
||||
**Spill identification**: look for a `MOVE.L Dn, $offset(A6)` followed later by `MOVE.L $offset(A6), Dn` where `Dn` is used for a different purpose in between. The frame offset is typically in the local-variable area (negative offset from A6, or positive offset from SP+0).
|
||||
|
||||
#### Register Allocation Quick-Reference
|
||||
|
||||
| Pattern | Register Variable | Stack Variable | Spilled Variable |
|
||||
|---|---|---|---|
|
||||
| **Prologue saves it** | ✅ Saved in MOVEM | ❌ Not saved specifically | ✅ Saved in MOVEM |
|
||||
| **Read pattern** | Value already in Dn — no load | `MOVE.L $offset, Dn` before every use | `MOVE.L Dn, $offset` (store) then later `MOVE.L $offset, Dn` (load) |
|
||||
| **Write pattern** | `MOVEQ/ADDQ/SUBQ Dn` — register direct | `MOVE Dn, $offset` + `ADDQ $offset` or separate modify+store | `MOVE.L Dn, $offset` (spill); `MOVE.L $offset, Dn` (reload) |
|
||||
| **Typical compiler** | GCC -O2, -Os, -O3 | GCC -O0; SAS/C with low optimization | GCC under register pressure; SAS/C with many locals |
|
||||
| **RE effort** | Harder — must track register lifetime | Easier — named stack slot = stable location | Hardest — intermittent storage |
|
||||
|
||||
### Function Call Setup Patterns
|
||||
|
||||
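GCC's call-site code reveals whether the caller passes parameters in registers or had to push to the stack. Register parameters appear only in builds that enable them (for example `-mregparm` or `__regargs` in the Amiga GCC ports); by default m68k GCC pushes every argument, which is why the `CountWords` listing later in this article reads its argument from `$0C(SP)`: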
|
||||
|
||||
```asm
|
||||
; Calling a function with 2 or fewer args (register-only):
|
||||
MOVE.L filename, D0 ; arg1 in D0
|
||||
MOVEQ #MODE_OLDFILE, D1 ; arg2 in D1
|
||||
BSR _OpenFile ; no stack setup, no cleanup
|
||||
|
||||
; Calling a function with 4 args (2 register + 2 stack):
|
||||
MOVE.L count, -(SP) ; arg4 pushed first (right-to-left!)
|
||||
MOVE.L buffer, -(SP) ; arg3 pushed second
|
||||
MOVE.L fh, D1 ; arg2 in D1
|
||||
MOVE.L #1024, D0 ; arg1 in D0
|
||||
BSR _ReadData
|
||||
ADDQ.L #8, SP ; caller cleans 8 bytes of stack args
|
||||
|
||||
; Calling a varargs function (all args on stack — no register args):
|
||||
MOVE.L arg3, -(SP)
|
||||
MOVE.L arg2, -(SP)
|
||||
MOVE.L arg1, -(SP)
|
||||
BSR _Printf
|
||||
LEA $0C(SP), SP ; caller cleans 12 bytes
|
||||
```
|
||||
|
||||
> [!NOTE]
|
||||
> **Varargs functions** (like `Printf`, `sprintf`, custom `Format()`) force ALL arguments onto the stack in GCC 2.95.x — even the first two. This is a reliable disambiguator: if you see a call with 3+ stack pushes and NO register args, the target is likely a varargs function.
|
||||
|
||||
### `__attribute__((interrupt))` — Interrupt Handler
|
||||
|
||||
```asm
|
||||
; GCC interrupt handler:
|
||||
_int_handler:
|
||||
MOVEM.L D0-D7/A0-A6, -(SP) ; save ALL regs
|
||||
; ... handler body ...
|
||||
MOVEM.L (SP)+, D0-D7/A0-A6
|
||||
RTE ; Return From Exception
|
||||
```
|
||||
|
||||
### `__attribute__((noreturn))` — No-Return Functions
|
||||
|
||||
```asm
|
||||
; GCC noreturn function — NO RTS at end:
|
||||
_exit_func:
|
||||
; ... cleanup ...
|
||||
    JSR     _exit                ; call exit(), which never returns
|
||||
; No RTS — compiler knows this never returns
|
||||
; May be followed by ILLEGAL or DC.B 0 padding
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Library Call Patterns
|
||||
|
||||
### GCC Library Call Style
|
||||
|
||||
```asm
|
||||
; GCC library call — characteristic patterns:
|
||||
; 1. Library base loaded once, may be reused across calls
|
||||
MOVEA.L (_SysBase).L, A6 ; load from absolute address (or PC-relative)
|
||||
|
||||
; 2. Arguments set up with minimal register traffic
|
||||
    MOVE.L  D3, D0               ; arg1 (byteSize) already in D3, just move to D0
    MOVE.L  #$10001, D1          ; arg2 (requirements): MEMF_PUBLIC|MEMF_CLEAR
|
||||
|
||||
; 3. LVO call
|
||||
JSR -$C6(A6) ; AllocMem
|
||||
|
||||
; 4. Return value used immediately
|
||||
MOVE.L D0, A0 ; ptr → A0 for immediate use
|
||||
```
|
||||
|
||||
Compared to SAS/C:
|
||||
- GCC is more likely to reuse A6 across multiple library calls without reloading
|
||||
- GCC uses `MOVE.L Dreg, D1` (register-to-register) where SAS/C would reload from stack
|
||||
- GCC may use `LEA (xxx).L, A0` or `MOVEA.L (xxx).L, A0` for address loads
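
When annotating `JSR -$XXX(A6)` calls by hand, it helps to turn the displacement into a jump-table slot number and, where known, a function name. The following is a minimal sketch, not a complete LVO database: it assumes the standard 6-byte jump-table entries, and the lookup table (`KNOWN_LVOS`) and the helper names are hypothetical, filled only with the three offsets quoted in this article.

```python
# Assumption: each library jump-table entry is 6 bytes (a JMP abs.l),
# counted downward from the library base.
LVO_ENTRY_SIZE = 6

# Illustrative table only: the three offsets that appear in this article.
KNOWN_LVOS = {
    ("exec.library", -0xC6): "AllocMem",
    ("exec.library", -0x228): "OpenLibrary",
    ("dos.library", -0x2A): "Read",
}


def lvo_slot(offset: int) -> int:
    """Return the 1-based jump-table slot for a negative LVO displacement."""
    if offset >= 0 or offset % LVO_ENTRY_SIZE != 0:
        raise ValueError(f"not a plausible LVO displacement: {offset}")
    return -offset // LVO_ENTRY_SIZE


def name_call(library: str, offset: int) -> str:
    """Return a function name if known, else a generic 'slot N' label."""
    known = KNOWN_LVOS.get((library, offset))
    if known is not None:
        return known
    return f"{library} slot {lvo_slot(offset)}"
```

A real workflow would load the full table from the `.fd` files or an existing LVO list instead of hard-coding entries; the slot arithmetic stays the same.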
|
||||
|
||||
### Position-Independent Code (`-fPIC`)
|
||||
|
||||
```asm
|
||||
; GCC -fPIC: PC-relative indirection through GOT-like table
|
||||
LEA _GLOBAL_OFFSET_TABLE_(PC), A4 ; A4 = GOT base
|
||||
MOVEA.L (_SysBase@GOT)(A4), A6 ; load SysBase via GOT slot
|
||||
JSR -$C6(A6) ; AllocMem
|
||||
```
|
||||
|
||||
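When `-fPIC` is enabled, globals are accessed through a GOT (Global Offset Table) similar to ELF shared libraries. This pattern uses `A4` as the GOT base register and `LEA xxx(PC), A4` at function entry. Amiga GCC builds more commonly use A4-relative small-data addressing (`-fbaserel`, with no GOT indirection), so an `A4` base combined with direct `symbol(A4)` accesses usually points to that mode rather than to `-fPIC`.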
|
||||
|
||||
---
|
||||
|
||||
## C++ Support — What It Means for RE
|
||||
|
||||
### Global Constructors and Destructors
|
||||
|
||||
GCC 2.95.x emits two arrays for C++ global object initialization:
|
||||
|
||||
```
|
||||
__CTOR_LIST__ format:
|
||||
┌──────────────────────┐
|
||||
│ count (N) │ __CTOR_LIST__[0]
|
||||
├──────────────────────┤
|
||||
│ constructor_1 │ function pointer
|
||||
├──────────────────────┤
|
||||
│ constructor_2 │
|
||||
├──────────────────────┤
|
||||
│ ... │
|
||||
├──────────────────────┤
|
||||
│ 0x00000000 │ Terminator (NULL)
|
||||
└──────────────────────┘
|
||||
|
||||
__DTOR_LIST__ — identical format for destructors.
|
||||
```
|
||||
|
||||
**In disassembly**:
|
||||
```asm
|
||||
; The startup code processes __CTOR_LIST__ before calling main():
|
||||
_do_global_ctors:
|
||||
    MOVEM.L D2/A2, -(SP)             ; callee-saved regs: a ctor may clobber D0-D1/A0-A1
    MOVEA.L #__CTOR_LIST__, A2       ; A2 = ctor array
    MOVE.L  (A2)+, D2                ; D2 = count
    SUBQ.L  #1, D2
    BMI.S   .done

.ctor_loop:
    MOVEA.L (A2)+, A0                ; A0 = ctor function pointer
    JSR     (A0)                     ; call ctor
    DBRA    D2, .ctor_loop           ; DBRA counts in the low 16 bits of D2
.done:
    MOVEM.L (SP)+, D2/A2
    RTS
|
||||
```
|
||||
|
||||
**RE importance**: If you see `__CTOR_LIST__` in the symbol table or a constructor-processing loop in the startup code, the binary was compiled with GCC and likely contains C++ code. SAS/C does not use this mechanism.
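
Once the list has been located in a data hunk, the layout described above can be decoded mechanically. The sketch below assumes the format shown in the diagram (big-endian 32-bit count, pointers, NULL terminator) and also accepts a count of `0xFFFFFFFF`, which some GCC builds use to mean "walk until NULL"; the function name `parse_ctor_list` is hypothetical.

```python
import struct


def parse_ctor_list(data: bytes, offset: int = 0):
    """Decode a count-prefixed, NULL-terminated list of big-endian pointers.

    Returns (pointers, consistent), where `consistent` is True when the
    stored count matches the number of entries found before the terminator
    (or when the count is the 0xFFFFFFFF "unknown" marker).
    """
    (count,) = struct.unpack_from(">I", data, offset)
    pointers = []
    pos = offset + 4
    while True:
        (ptr,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if ptr == 0:  # NULL terminator
            break
        pointers.append(ptr)
    consistent = count == len(pointers) or count == 0xFFFFFFFF
    return pointers, consistent
```

The pointers returned are unrelocated values as stored in the hunk; to map them to functions, apply the same load address the disassembler used.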
|
||||
|
||||
### Vtable Layout (GCC 2.95.x m68k C++)
|
||||
|
||||
See [cpp_vtables_reversing.md](../cpp_vtables_reversing.md) for the complete GCC C++ vtable/RTTI layout. Key points for compiler identification:
|
||||
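- Vtable symbol naming: `_ZTV6Window` under the Itanium-style mangling used by GCC 3.x and later (including bebbo's GCC 6); GCC 2.95.x uses its older scheme, typically of the form `__vt_6Window`
- RTTI pointer at `vtable[-1]` (Itanium-style layout)
- `offset_to_top` at `vtable[-2]` (Itanium-style layout)
- C++ name mangling in 2.95.x binaries follows the old GCC conventions (different from StormC++)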
|
||||
|
||||
---
|
||||
|
||||
## Optimization Level Fingerprints
|
||||
|
||||
| Level | Flag | Binary Characteristics |
|
||||
|---|---|---|
|
||||
| **-O0** | Default | Every variable on stack. No register allocation across statements. Full `LINK A6` frame. `MOVE.L D0, -4(A6)` / `MOVE.L -4(A6), D0` store-reload pairs. |
|
||||
| **-O1** | `-O` | Basic register allocation. Dead code removed. Constants folded. `MOVEQ` for small values. Redundant stack traffic eliminated. |
|
||||
| **-O2** | `-O2` | Aggressive CSE (common subexpression elimination). Loop invariants hoisted. `-fomit-frame-pointer` implied. Loop induction variable optimization. |
|
||||
| **-Os** | `-Os` | `-O2` but favoring smaller code. May use `BSR` instead of inlining. `DBRA` loops preferred over unrolled sequences. |
|
||||
| **-O3** | `-O3` | Function inlining (`-finline-functions`). `__builtin_memcpy` expansion. Loop unrolling appears only when `-funroll-loops` is also given. |
|
||||
|
||||
**How to identify**:
|
||||
- **-O0**: Distinctive store-immediate-reload pattern. Look for `MOVE.L D0, -N(A6)` followed immediately by `MOVE.L -N(A6), D0` — the compiler stores then reloads the same value.
|
||||
- **-O2+**: Variables stay in registers across compound statements. The `LINK A6` instruction is absent in most functions.
|
||||
- **-O3**: You'll find expanded inline code where a function call would normally appear. Look for repeated code blocks with slightly different register assignments.
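
The `-O0` fingerprint above can be counted automatically over a text disassembly. This is a rough sketch under stated assumptions: the listing uses the operand syntax shown in this article (`MOVE.L D0, -$04(A6)`), one instruction per line, and the function name `count_store_reload_pairs` is hypothetical.

```python
import re

# Store to a frame slot, e.g. "MOVE.L D0, -$04(A6)" (the "$" is optional).
STORE = re.compile(r"^\s*MOVE\.L\s+(D[0-7]),\s*(-\$?[0-9A-Fa-f]+\(A6\))")
# Reload from a frame slot, e.g. "MOVE.L -$04(A6), D0".
LOAD = re.compile(r"^\s*MOVE\.L\s+(-\$?[0-9A-Fa-f]+\(A6\)),\s*(D[0-7])")


def count_store_reload_pairs(lines):
    """Count adjacent store/reload pairs that use the same register and slot."""
    pairs = 0
    for current, following in zip(lines, lines[1:]):
        store = STORE.match(current)
        load = LOAD.match(following)
        if not (store and load):
            continue
        same_reg = store.group(1) == load.group(2)
        same_slot = store.group(2) == load.group(1)
        if same_reg and same_slot:
            pairs += 1
    return pairs
```

A high count relative to function size suggests unoptimized output; a count of zero proves little by itself, since optimized code from any compiler avoids the pattern.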
|
||||
|
||||
### Tail-Call Optimization
|
||||
|
||||
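GCC applies tail-call (sibling-call) optimization where it can. It is most consistent in GCC 3.x and later, so in 2.95.x binaries treat it as a supporting hint rather than proof: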
|
||||
|
||||
```asm
|
||||
; Instead of:
|
||||
BSR _helper_func
|
||||
RTS
|
||||
|
||||
; GCC generates:
|
||||
BRA _helper_func ; JMP to helper — no return, no stack growth
|
||||
```
|
||||
|
||||
The `BRA` to another function (not a local label) is GCC's tail-call signature. SAS/C rarely does this.
|
||||
|
||||
---
|
||||
|
||||
## Startup Code — libnix vs clib2 vs ixemul
|
||||
|
||||
### libnix Startup (Most Common)
|
||||
|
||||
```asm
|
||||
; libnix gcrt0.S — minimal startup:
|
||||
_start:
|
||||
MOVEA.L 4.W, A6 ; SysBase
|
||||
JSR ___startup_SysBase ; store SysBase, init libnix internals
|
||||
|
||||
; Open dos.library
|
||||
LEA .dosname(PC), A1
|
||||
MOVEQ #0, D0
|
||||
    JSR     -$228(A6)            ; OpenLibrary (LVO -552, fixed in exec.library's jump table)
|
||||
|
||||
; Parse CLI args
|
||||
JSR ___parse_args ; sets up __argc, __argv globals
|
||||
|
||||
; Call main()
|
||||
JSR _main
|
||||
|
||||
; Exit
|
||||
MOVE.L D0, -(SP)
|
||||
JSR ___exit
|
||||
|
||||
.dosname: .asciz "dos.library"
|
||||
```
|
||||
|
||||
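**Finding `main()`**: Locate `_start`, then find the `JSR _main` call. When symbols are present the target is labeled `_main`. In a stripped binary the label is gone because the reference was resolved at link time, but the call is still the last `JSR` before the `MOVE.L D0, -(SP)` / exit sequence, so its target can be renamed by hand.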
|
||||
|
||||
### ixemul Startup (Unix-like)
|
||||
|
||||
ixemul provides a much richer Unix-like environment. The startup code is substantially larger and includes `__init_env`, `__parse_shell_args`, and signal setup. ixemul binaries require `ixemul.library` at runtime — a unique dependency that strongly identifies the binary.
|
||||
|
||||
---
|
||||
|
||||
## Same C Function — GCC Output
|
||||
|
||||
```asm
|
||||
; CountWords() — GCC 2.95.3, -O2, -fomit-frame-pointer:
|
||||
; C prototype: ULONG CountWords(CONST_STRPTR str)
|
||||
|
||||
_CountWords:
|
||||
MOVEM.L D2-D3, -(SP) ; save only D2-D3 (no LINK, no A2-A6)
|
||||
|
||||
MOVEQ #0, D2 ; D2 = count
|
||||
MOVEQ #0, D3 ; D3 = in_word
|
||||
|
||||
MOVEA.L $0C(SP), A0 ; A0 = str (arg at SP+12, after saved regs)
|
||||
|
||||
BRA.S .L2
|
||||
|
||||
.L5:
|
||||
CMPI.B #' ', (A0) ; compare immediate to memory — GCC style
|
||||
BEQ.S .L3
|
||||
CMPI.B #'\t', (A0)
|
||||
BEQ.S .L3
|
||||
CMPI.B #'\n', (A0)
|
||||
BEQ.S .L3
|
||||
|
||||
TST.B D3
|
||||
BNE.S .L4
|
||||
ADDQ.L #1, D2
|
||||
MOVEQ #1, D3
|
||||
BRA.S .L4
|
||||
|
||||
.L3:
|
||||
MOVEQ #0, D3 ; in_word = 0
|
||||
|
||||
.L4:
|
||||
ADDQ.L #1, A0 ; str++
|
||||
|
||||
.L2:
|
||||
TST.B (A0)
|
||||
BNE.S .L5
|
||||
|
||||
MOVE.L D2, D0 ; return count
|
||||
MOVEM.L (SP)+, D2-D3
|
||||
RTS
|
||||
```
|
||||
|
||||
**GCC-specific observations**:
|
||||
1. **No `LINK` instruction** — frame pointer omitted. Arg accessed as `$0C(SP)` (SP + saved regs + return address).
|
||||
2. **`CMPI.B #' ', (A0)`** — compare-immediate-to-memory instruction. GCC uses `CMPI` where SAS/C uses `MOVEQ`+`CMP`. This is more compact (one instruction vs two).
|
||||
3. **Minimal register save** — only `D2-D3` saved (two registers actually used). SAS/C would save 9 (or at minimum D2-D3 but with LINK).
|
||||
4. **`BRA.S .L4`** — unconditional branch to common `str++` code. GCC's optimizer merges the increment code.
|
||||
5. **SP-relative argument access** — `$0C(SP)` instead of `$08(A5)`. This changes as the stack grows/shrinks within the function.
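
The `$0C(SP)` figure in observations 1 and 5 follows from simple arithmetic, which is worth redoing for every frameless function you annotate. The sketch below assumes 4-byte register saves (`MOVEM.L`), a 4-byte return address, and 4-byte stack arguments; the function name and the `extra_pushed_bytes` parameter are hypothetical.

```python
def sp_arg_offset(saved_regs: int, arg_index: int, extra_pushed_bytes: int = 0) -> int:
    """Offset from SP to stack argument `arg_index` (0-based).

    saved_regs:         registers pushed by the prologue MOVEM.L
    extra_pushed_bytes: anything pushed since the prologue (e.g. outgoing
                        call arguments), which shifts every offset
    """
    return extra_pushed_bytes + 4 * saved_regs + 4 + 4 * arg_index
```

For `_CountWords`, `sp_arg_offset(2, 0)` gives 12, matching `$0C(SP)`. After the function pushes a long for an outgoing call, the same argument sits at `sp_arg_offset(2, 0, 4)`, which is why SP-relative offsets for one variable can differ within a single function.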
|
||||
|
||||
**SAS/C comparison (same function)**:
|
||||
| Aspect | SAS/C | GCC |
|
||||
|---|---|---|
|
||||
| Frame setup | `LINK A5, #-$08` + `MOVEM.L D2-D3, -(SP)` | `MOVEM.L D2-D3, -(SP)` only |
|
||||
| First char compare | `MOVEQ #' ', D0` / `CMP.B (A0), D0` | `CMPI.B #' ', (A0)` |
|
||||
| Arg access | `$08(A5)` — stable throughout function | `$0C(SP)` — changes if SP moves |
|
||||
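| Total instructions | 28 (varies by optimization) | 23 (counted from the listing above) |
| Code size | at least 56 bytes (28 instructions, 2 bytes minimum each) | 58 bytes (summed from the listing above) |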
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### "The Unix Hunk Assumption" — Confusing `.text` with CODE
|
||||
|
||||
```asm
|
||||
; WRONG: treating .text hunk as just "code" and ignoring PC-relative data:
|
||||
; If you see this and think "that's just a weird instruction":
|
||||
LEA .LC0(PC), A0
|
||||
; ... but .LC0 is actually a string embedded in .text:
|
||||
.LC0: DC.B "Hello", 0
|
||||
; These two are in the SAME hunk. IDA may not split them properly.
|
||||
```
|
||||
|
||||
**Fix**: After loading a GCC binary in IDA, search for `LEA xxx(PC), A0` patterns and check if `xxx` resolves to ASCII data. If so, convert the bytes at `xxx` to a string type. For strings that follow a function's `RTS` instruction, create a separate data segment in the `.text` hunk area.
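
The "does `xxx` resolve to ASCII data" check can be scripted before touching the disassembler database. This is a minimal sketch that assumes NUL-terminated 7-bit ASCII (Amiga software also uses ISO-8859-1, which this rejects); the function name `c_string_at` and the `min_len` threshold are hypothetical.

```python
def c_string_at(data: bytes, offset: int, min_len: int = 4):
    """Return the string at `offset` if it looks like printable ASCII, else None."""
    end = data.find(b"\x00", offset)
    if end == -1:
        return None  # no terminator before end of hunk
    raw = data[offset:end]
    if len(raw) < min_len:
        return None  # too short to be a confident match
    printable = all(0x20 <= b < 0x7F or b in (0x09, 0x0A) for b in raw)
    return raw.decode("ascii") if printable else None
```

Short matches deserve suspicion: many opcode words are themselves printable bytes (`RTS`, `$4E75`, reads as `Nu`), so the check is only meaningful at addresses that an instruction actually references.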
|
||||
|
||||
### "The Missing Frame" — Assuming Every Function Has LINK
|
||||
|
||||
```asm
|
||||
; WRONG: looking for LINK/UNLK to find function boundaries
|
||||
; GCC function with no frame pointer:
|
||||
_myfunc:
|
||||
MOVEM.L D2-D4, -(SP)
|
||||
; ... 200 lines of code ...
|
||||
MOVEM.L (SP)+, D2-D4
|
||||
RTS
|
||||
; If you search for LINK, you'll never find this function's boundary
|
||||
```
|
||||
|
||||
**Fix**: Function boundaries in GCC are marked by `RTS` (return) instructions. A GCC function can start at any address after a previous `RTS`/`RTE`/`ILLEGAL`/`JMP` that terminates execution flow. Use IDA's auto-analysis or Ghidra's function detection, which look for `RTS` boundaries.
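
When auto-analysis misses frameless functions, a crude sweep for `RTS` (`$4E75`) gives candidate start addresses to review by hand. The sketch below is only a heuristic: it assumes word-aligned code and will also fire on `$4E75` appearing inside operand words or embedded data. The function name is hypothetical.

```python
RTS = b"\x4e\x75"  # 68k opcode for RTS


def candidate_function_starts(code: bytes, base: int = 0):
    """List addresses immediately following each word-aligned RTS."""
    starts = []
    for off in range(0, len(code) - 1, 2):
        follows = off + 2
        if code[off:follow_end(off)] == RTS and follows < len(code):
            starts.append(base + follows)
    return starts


def follow_end(off: int) -> int:
    """End index of the 2-byte opcode word starting at `off`."""
    return off + 2
```

Each candidate still needs confirmation, for example by checking that it is the target of a `BSR`/`JSR` or that it begins with a plausible prologue such as `MOVEM.L ..., -(SP)`.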
|
||||
|
||||
### "The A6 Confusion" — GCC Frame Pointer vs Library Base
|
||||
|
||||
```asm
|
||||
; CRITICAL: A6 plays TWO roles in GCC binaries:
|
||||
; Role 1: Frame pointer (when -fno-omit-frame-pointer)
|
||||
; Role 2: Library base (during JSR -$XXX(A6) calls)
|
||||
;
|
||||
; WRONG: seeing LINK A6 and thinking A6 is the exec base:
|
||||
_func:
|
||||
LINK A6, #-$14 ; A6 = FRAME POINTER here
|
||||
    MOVE.L  D2, -(SP)            ; single register: plain MOVE.L, not MOVEM
|
||||
; ...
|
||||
MOVEA.L (_DOSBase).L, A6 ; A6 = DOS BASE now (overwrites frame ptr!)
|
||||
JSR -$2A(A6) ; Read() via DOS base
|
||||
    ; After the JSR, A6 still holds DOSBase (library calls preserve A6), NOT the frame pointer
    ; Code that keeps A6 as frame pointer must save A6 before the call and restore it afterwards
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Pitfalls & Common Mistakes
|
||||
|
||||
### 1. Misidentifying `-fomit-frame-pointer` Code as Hand-Written Assembly
|
||||
|
||||
```asm
|
||||
; GCC -O2 output can look surprisingly like hand-optimized asm:
|
||||
MOVEM.L D2/A2, -(SP)
|
||||
LEA .LC0(PC), A0 ; string reference
|
||||
MOVEA.L (_DOSBase).L, A6
|
||||
MOVE.L (A1), D1
|
||||
JSR -$2A(A6)
|
||||
; The combination of PC-relative string + SP-relative access + per-function save
|
||||
; looks like hand-crafted code. It's just GCC -O2.
|
||||
```
|
||||
|
||||
### 2. Missing `__CTOR_LIST__` Means Missing C++ Globals
|
||||
|
||||
If the binary has `__CTOR_LIST__` / `__DTOR_LIST__` but you don't trace them, you'll miss global C++ objects that execute code before `main()` runs. These constructors can allocate memory, open resources, or register callbacks — essential for understanding program behavior.
|
||||
|
||||
### 3. Tail-Call Optimization Confusion
|
||||
|
||||
```asm
|
||||
; You might incorrectly identify function boundaries here:
|
||||
_funcA:
|
||||
; ... code ...
|
||||
BRA _funcB ; THIS IS A TAIL CALL, not the end of funcA
|
||||
; _funcB inherits funcA's stack frame and returns directly to funcA's caller
|
||||
; The call graph should show: caller → funcA → funcB (not two parallel calls)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Software Known to Be GCC-Compiled
|
||||
|
||||
| Application | Compiler | RE Clues |
|
||||
|---|---|---|
|
||||
| **AmigaAMP** | GCC 2.95.x | `.text`/`.data` hunks; PC-relative strings; libnix startup; plugin architecture via `dlopen`-like mechanism |
|
||||
| **ScummVM (Amiga port)** | GCC 6.x (bebbo) | Modern GCC codegen; large `.text` hunk; C++ vtables with GCC mangling |
|
||||
| **Miami TCP/IP** | GCC 2.95.x | Mixed C/asm; `libnix` startup; `__CTOR_LIST__` for global initializers |
|
||||
| **AmiTCP** | GCC 2.7.x | Early GCC codegen; less aggressive optimization; no tail-call |
|
||||
| **Various 1996–2000 ports** | GCC 2.7.x–2.95.x (GeekGadgets) | Unix-to-Amiga ports; often ixemul-dependent; `.text` hunk naming |
|
||||
| **MUI 3.x custom classes** | Various, including GCC | C++ vtables need GCC-specific handling; BOOPSI dispatch patterns |
|
||||
|
||||
---
|
||||
|
||||
## Historical Context
|
||||
|
||||
GCC on Amiga arrived relatively late. While Lattice/SAS C dominated the late 1980s, the **GeekGadgets** project (1995) brought a complete GCC-based Unix-like environment to AmigaOS, including GCC 2.7.x and later 2.95.x. This opened the door for Unix software ports and attracted developers who preferred GCC's familiar GNU toolchain.
|
||||
|
||||
Key timeline:
|
||||
- **1995**: GeekGadgets — first usable GCC for AmigaOS (2.7.2)
|
||||
- **1999–2001**: GCC 2.95 through 2.95.3 (stable, well-tested, becomes the standard)
|
||||
- **2000s**: Various GCC 3.x/4.x ports (limited adoption due to code size)
|
||||
- **2015–present**: bebbo's GCC 6.5 cross-compiler — modern GCC for retro development
|
||||
|
||||
GCC's PC-relative addressing is a fundamental design difference from SAS/C. It stems from GCC's Unix heritage where position-independent code (PIC) is essential for shared libraries. On AmigaOS, PC-relative code has the practical benefit that the `.text` hunk can be loaded anywhere without relocation — the HUNK loader doesn't need to patch string references.
|
||||
|
||||
The A6 frame pointer choice (rather than A5) comes from the System V m68k ABI, which designated A6 as the frame pointer. GCC followed this convention because the m68k backend was shared across all m68k targets (Sun, HP, Amiga, Atari).
|
||||
|
||||
---
|
||||
|
||||
## Modern Analogies
|
||||
|
||||
| GCC 2.95.x Concept | Modern Equivalent | Notes |
|
||||
|---|---|---|
|
||||
| `-fomit-frame-pointer` | Default in modern compilers (`-O2` on x86-64 omits RBP) | Same tradeoff: faster code vs harder debugging |
|
||||
| PC-relative string addressing | `-fpic` code on modern ELF systems | Same principle: load-time relocation avoidance |
|
||||
| `__CTOR_LIST__` / `__DTOR_LIST__` | `.init_array` / `.fini_array` sections in ELF | Same purpose: global constructor/destructor registration; modern ELF is more structured |
|
||||
| `libnix` minimal runtime | Newlib / picolibc for embedded systems | Both provide compact C runtime for constrained environments |
|
||||
| `ixemul` Unix emulation | Cygwin / MSYS2 DLL (Unix-on-Windows) | Both provide Unix API layer on top of non-Unix kernel |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
**Q: How do I tell GCC 2.95.x from GCC 6.x (bebbo) in a binary?**
|
||||
A: GCC 2.95.x uses gcc-specific HUNK_SYMBOL patterns (`.Lxxx` local labels). GCC 6.x with bebbo's toolchain uses `vasm`/`vlink` which generate `CODE`/`DATA` hunk names (Amiga standard, not `.text`). GCC 6.x also applies more aggressive optimizations: heavy loop unrolling and extensive inlining point to modern GCC. Auto-vectorization is not a useful marker, because m68k has no vector instructions to target.
|
||||
|
||||
**Q: Why are there no `__CTOR_LIST__` entries in my GCC binary?**
|
||||
A: `__CTOR_LIST__` only has entries if the binary uses C++ with global objects, or if C code registers functions with `__attribute__((constructor))`. Pure C programs without global constructors won't have it.
|
||||
|
||||
**Q: How do I find `main()` in a stripped GCC binary?**
|
||||
A: Search for the libnix startup signature: `MOVEA.L 4.W, A6` / `JSR ___startup_SysBase`. The `JSR` that follows argument parsing, immediately before the `MOVE.L D0, -(SP)` / exit sequence, is the call to `main()`. Even in stripped binaries, the startup code is typically at the beginning of `.text` and the call pattern is consistent.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [13_toolchain/gcc_amiga.md](../../../13_toolchain/gcc_amiga.md) — GCC setup and compilation
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick compiler identification
|
||||
- [cpp_vtables_reversing.md](../cpp_vtables_reversing.md) — GCC C++ vtable layout and RTTI
|
||||
- [startup_code.md](../../../04_linking_and_libraries/startup_code.md) — libnix/clib2 startup internals
|
||||
- *bebbo's amiga-gcc*: https://codeberg.org/bebbo/amiga-gcc
|
||||
- *GeekGadgets*: GCC 2.95 for AmigaOS (archived documentation)
|
||||
- See also: [sasc.md](sasc.md), [vbcc.md](vbcc.md) — compare with other compilers
|
||||
153
05_reversing/static/compilers/lattice_c.md
Normal file
|
|
@ -0,0 +1,153 @@
|
|||
[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md) · [Compilers](README.md)
|
||||
|
||||
# Lattice C 3.x/4.x — Reverse Engineering Field Manual
|
||||
|
||||
## Overview
|
||||
|
||||
**Lattice C** (versions 3.x–4.x, 1985–1989) is the direct predecessor to SAS/C. When SAS Institute acquired the Lattice C product line in 1988, they rebranded version 5.0 as "SAS/C". Lattice C 3.x and 4.x binaries represent the first generation of commercial C compilers for AmigaOS. Their code generation is recognizably similar to SAS/C but with less aggressive optimization and some distinct early patterns.
|
||||
|
||||
Key constraints:
|
||||
- **The transition point**: Lattice C 3.x → 4.x → SAS/C 5.x form a continuous evolution. Code from 3.x looks noticeably "early" — simpler register allocation, less peephole optimization, longer function prologues.
|
||||
- **LINK A5 + D2-D5/A2-A3 save** — Lattice C 3.x typically saves fewer registers than SAS/C (D2-D5 + A2-A3, 6 registers total) but more than Aztec C (5 regs, data only).
|
||||
- **Startup code evolution** — Lattice C 3.x's `lc.o` startup is simpler than SAS/C's `c.o` — may not handle Workbench launches correctly, may not support `argc`/`argv` parsing.
|
||||
- **Hunk names**: `CODE`, `DATA`, `BSS` (same as SAS/C — established this convention)
|
||||
|
||||
```asm
|
||||
; Lattice C 3.x function prologue (less aggressive than SAS/C):
|
||||
_func:
|
||||
LINK A5, #-$14
|
||||
MOVEM.L D2-D5/A2-A3, -(SP) ; 4 data + 2 address = 6 registers
|
||||
; Compare: SAS/C saves D2-D7/A2-A4 (9 registers)
|
||||
; Compare: Aztec C saves D3-D7 only (5 registers, data only)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Binary Identification — Lattice C vs SAS/C
|
||||
|
||||
| Criterion | Lattice C 3.x | Lattice C 4.x | SAS/C 5.x/6.x |
|
||||
|---|---|---|---|
|
||||
| **Register save** | D2-D5, A2-A3 (6 regs) | D2-D6, A2-A3 (7 regs) | D2-D7, A2-A4 (9 regs) |
|
||||
| **D6/D7 usage** | Rarely used | Sometimes used | Frequently used |
|
||||
| **Peephole optimization** | Minimal | Moderate | Aggressive |
|
||||
| **MOVEQ for small values** | Inconsistent | Common | Always |
|
||||
| **Stack frame** | LINK A5 always | LINK A5 always | LINK A5 always |
|
||||
| **Library calls** | `JSR -$XXX(A6)` | `JSR -$XXX(A6)` | `JSR -$XXX(A6)` |
|
||||
| **Startup** | `lc.o` (simpler) | `lc.o` (improved) | `c.o` (full-featured) |
|
||||
| **Era** | 1985–1987 | 1987–1989 | 1988–1996 |
|
||||
|
||||
### Evolutionary Markers
|
||||
|
||||
The Lattice→SAS/C evolution is visible in the binary:
|
||||
|
||||
1. **Register save set grows** — 6→7→9 registers as the optimizer learned to use more registers effectively
|
||||
2. **MOVEQ adoption** — Lattice 3.x uses `MOVE.L #0, D0`; Lattice 4.x uses `MOVEQ #0, D0`; SAS/C always uses MOVEQ
|
||||
3. **Library call density** — Lattice 3.x loads A6 before every single library call; SAS/C may reuse A6 across calls
|
||||
4. **Stack frame size** — Lattice 3.x often allocates oversized frames (locals * sizeof(LONG) rounded up to nice boundary)
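
Because the save set is the main marker in the table above, it is convenient to decode the prologue `MOVEM.L` mask directly from the bytes. The sketch below handles only the `MOVEM.L <list>, -(SP)` form (opcode `$48E7`), where the register mask is stored bit-reversed (bit 15 = D0, bit 0 = A7); the function name is hypothetical.

```python
import struct

# Mask bit order for the predecrement form: bit 15 = D0 ... bit 8 = D7,
# bit 7 = A0 ... bit 0 = A7.
PREDEC_ORDER = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]

MOVEM_L_PUSH = 0x48E7  # MOVEM.L <list>, -(SP)


def decode_movem_push(code: bytes, offset: int = 0):
    """Return the register list of a MOVEM.L push, or None if not one."""
    if len(code) < offset + 4:
        return None
    opcode, mask = struct.unpack_from(">HH", code, offset)
    if opcode != MOVEM_L_PUSH:
        return None
    return [reg for bit, reg in enumerate(PREDEC_ORDER) if mask & (0x8000 >> bit)]
```

The length of the returned list maps onto the table: 6 registers suggests Lattice C 3.x, 7 suggests 4.x, and 9 (`$48E7 $3F38`) matches the SAS/C-style save set. Treat the count as one marker among several, not as a verdict.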
|
||||
|
||||
---
|
||||
|
||||
## Historical Context
|
||||
|
||||
Lattice, Inc. was an early cross-platform compiler vendor. Their C compiler for the Amiga was the first commercially viable option, shipping in 1985. Commodore itself used Lattice C for some system development before adopting SAS/C.
|
||||
|
||||
Key timeline:
|
||||
- **1985**: Lattice C 3.0 — first commercial Amiga C compiler
|
||||
- **1986**: Lattice C 3.1 — improved optimizer, bug fixes
|
||||
- **1987**: Lattice C 4.0 — major update, AmigaOS 1.2 support
|
||||
- **1988**: SAS Institute acquires Lattice C product line
|
||||
- **1989**: Rebranded as SAS/C 5.0
|
||||
|
||||
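A C-compiled binary from 1985–1989 that shows the patterns above is likely Lattice C, although Aztec C was a common alternative in the same period. After 1989, the brand transitioned to SAS/C, though Lattice C was still sold through existing channels for a time.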
|
||||
|
||||
Software likely compiled with Lattice C:
|
||||
- Commodore's early Amiga utilities (1985–1986)
|
||||
- Early third-party tools like `DiskMon`, `CLImate`, `Memacs`
|
||||
- Amiga 1000 launch-era software
|
||||
- Early versions of `ARP` (AmigaDOS Replacement Project) components
|
||||
- Early `WShell` / `ZShell` versions
|
||||
|
||||
---
|
||||
|
||||
## Same C Function — Lattice C Output
|
||||
|
||||
```asm
|
||||
; CountWords() — Lattice C 4.x:
|
||||
; (Notably simpler than SAS/C — less aggressive optimizer)
|
||||
|
||||
_CountWords:
|
||||
LINK A5, #-$08
|
||||
    MOVEM.L D2-D6/A2-A3, -(SP)   ; fixed 7-register save set (D2-D6 + A2-A3)
    ; Note: D4-D6 and A2-A3 are saved even though they are not used; Lattice C saves a fixed set

    MOVEQ   #0, D2               ; D2 = count
    MOVEQ   #0, D3               ; D3 = in_word

    MOVEA.L $08(A5), A0          ; A0 = str

    BRA     .loop_test           ; Lattice C uses BRA (long), not BRA.S

.loop_body:
    MOVE.L  #' ', D0             ; MOVE.L for char constant (should be MOVEQ!)
    CMP.B   (A0), D0
    BEQ     .not_word            ; BEQ (long), not BEQ.S

    MOVE.L  #'\t', D0
    CMP.B   (A0), D0
    BEQ     .not_word

    MOVE.L  #'\n', D0
    CMP.B   (A0), D0
    BEQ     .not_word

    TST.B   D3
    BNE     .next_char

    ADDQ.L  #1, D2
    MOVEQ   #1, D3
    BRA     .next_char

.not_word:
    MOVEQ   #0, D3

.next_char:
    ADDQ.L  #1, A0

.loop_test:
    TST.B   (A0)
    BNE     .loop_body

    MOVE.L  D2, D0
    MOVEM.L (SP)+, D2-D6/A2-A3
|
||||
UNLK A5
|
||||
RTS
|
||||
```
|
||||
|
||||
**Lattice C observations**:
|
||||
1. **`MOVE.L #' ', D0`** instead of `MOVEQ #' ', D0`: Lattice C doesn't always use MOVEQ for constants that fit in 8 bits. `MOVE.L #imm, Dn` is 6 bytes and 12 cycles on a 68000 versus 2 bytes and 4 cycles for `MOVEQ`, so each constant load wastes 4 bytes and 8 cycles.
|
||||
2. **`BRA`/`BEQ`/`BNE`** (long, 4-byte) instead of `BRA.S`/`BEQ.S`/`BNE.S` (short, 2-byte) — Lattice C's branch target distance calculation is conservative.
|
||||
3. **A2 saved but unused** — Lattice C saves a fixed register set rather than analyzing which registers are actually needed.
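
Observation 1 can be turned into a byte-level scan: find every `MOVE.L #imm, Dn` whose immediate would have fit in a `MOVEQ`. The sketch below assumes a plain linear sweep over a code hunk at word alignment, so it can misread operand words or data as opcodes; the function name is hypothetical. The opcode pattern is `$203C` with the destination register in bits 11-9.

```python
import struct


def moveq_candidates(code: bytes):
    """Find MOVE.L #imm, Dn instructions whose immediate fits in MOVEQ.

    Returns a list of (offset, data_register_number, immediate).
    """
    hits = []
    for off in range(0, len(code) - 5, 2):
        (opcode,) = struct.unpack_from(">H", code, off)
        if opcode & 0xF1FF != 0x203C:  # MOVE.L #<imm32>, Dn
            continue
        (imm,) = struct.unpack_from(">i", code, off + 2)  # signed 32-bit
        if -128 <= imm <= 127:  # MOVEQ range
            hits.append((off, (opcode >> 9) & 7, imm))
    return hits
```

Many hits in one binary fit the Lattice C 3.x/4.x profile described here; a binary where every small constant is already a `MOVEQ` fits SAS/C or GCC better.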
|
||||
|
||||
---
|
||||
|
||||
## Differences from SAS/C — Summary
|
||||
|
||||
```
|
||||
Lattice C 3.x/4.x → SAS/C 5.x/6.x improvements visible in disassembly:
|
||||
✓ MOVEQ substituted for MOVE.L #small_const
|
||||
✓ BRA.S/BEQ.S/BNE.S used where target is within 8-bit range
|
||||
✓ Dead register saves eliminated (per-function save analysis)
|
||||
✓ Common subexpression elimination (CSE) more aggressive
|
||||
✓ Loop induction variables kept in registers, not on stack
|
||||
✓ Struct copy inlined as MOVE.L (A0)+, (A1)+ for small structs
|
||||
✓ Tail-call optimization in some cases (rare but present in SAS/C 6.x)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick identification
|
||||
- *Lattice C 3.x/4.x Manual* (archive.org)
|
||||
- See also: [sasc.md](sasc.md) — SAS/C (direct successor)
|
||||
- See also: [aztec_c.md](aztec_c.md) — contemporary competitor
|
||||
1006
05_reversing/static/compilers/sasc.md
Normal file
File diff suppressed because it is too large
321
05_reversing/static/compilers/stormc.md
Normal file
|
|
@ -0,0 +1,321 @@
|
|||
[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md) · [Compilers](README.md)
|
||||
|
||||
# StormC / StormC++ — Reverse Engineering Field Manual
|
||||
|
||||
## Overview
|
||||
|
||||
**StormC** (by Haage & Partner, 1996–2000) was a native Amiga IDE with integrated C and C++ compiler. It occupies a unique position in Amiga RE: it's the **only native Amiga compiler with full C++ support** (exceptions, RTTI, STL), yet its C++ ABI is **incompatible with GCC's C++ ABI** — StormC uses its own name mangling, vtable layout, and exception handling mechanism. For the RE practitioner, StormC binaries look like SAS/C at the C level (A5 frame pointer, absolute strings) but diverge significantly when C++ constructs appear.
|
||||
|
||||
Key constraints:
|
||||
- **A5 frame pointer** — StormC follows the SAS/C convention (`LINK A5, #-N`), making C-level code appear SAS/C-compatible.
|
||||
- **C++ ABI is unique** — StormC's name mangling, vtable layout, RTTI, and exception handling differ from both GCC and the Itanium C++ ABI. StormC++ libraries cannot link with GCC C++ code.
|
||||
- **Native IDE integration** — StormC embeds project metadata (source paths, build configs) in the binary via custom HUNK_DEBUG entries.
|
||||
- **PowerPC support (v3+)** — StormC 3.0+ could target PPC (WarpOS/PowerUP). PPC code sections use a different hunk type and appear as foreign code in 68k disassembly.
|
||||
- **Hunk names**: `CODE`, `DATA` (Amiga standard, SAS/C-compatible)
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Source (.c / .cpp)"
|
||||
SRC["C/C++ source"]
|
||||
end
|
||||
subgraph "StormC IDE"
|
||||
IDE["Project Manager"]
|
||||
EDITOR["GUI Editor"]
|
||||
COMPILER["StormC Compiler"]
|
||||
LINKER["StormLink"]
|
||||
end
|
||||
subgraph "Binary Output"
|
||||
HUNK["Amiga HUNK executable"]
|
||||
CODE["CODE hunk — 68k code"]
|
||||
PPC["PPC_CODE (optional, v3+)"]
|
||||
SYMBOL["HUNK_SYMBOL — StormC mangled names"]
|
||||
DEBUG["HUNK_DEBUG — project metadata + line info"]
|
||||
end
|
||||
|
||||
SRC --> IDE
|
||||
IDE --> COMPILER --> LINKER
|
||||
LINKER --> HUNK
|
||||
HUNK --> CODE & PPC
|
||||
HUNK --> SYMBOL & DEBUG
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Binary Identification
|
||||
|
||||
### C-Level Code (SAS/C-Compatible)
|
||||
|
||||
At the C level, StormC output is deliberately SAS/C-compatible:
|
||||
|
||||
```asm
|
||||
; StormC C function (looks identical to SAS/C):
|
||||
_my_c_function:
|
||||
LINK A5, #-$10 ; A5 frame pointer
|
||||
MOVEM.L D2-D7/A2-A4, -(SP) ; 9-reg save — same as SAS/C
|
||||
; ... function body ...
|
||||
MOVEM.L (SP)+, D2-D7/A2-A4
|
||||
UNLK A5
|
||||
RTS
|
||||
```
|
||||
|
||||
**How to distinguish from SAS/C**: Without symbols, C-level StormC code is nearly indistinguishable from SAS/C. Look for:
|
||||
1. **Project metadata in HUNK_DEBUG** — StormC embeds source file paths and project names
|
||||
2. **StormC-specific startup code** — different library open sequence
|
||||
3. **C++ markers** — if you see C++ constructs with non-GCC mangling, it's StormC
|
||||
|
||||
### C++ Level — Where StormC Diverges
|
||||
|
||||
StormC++ uses its own ABI:
|
||||
|
||||
```asm
|
||||
; StormC++ virtual method dispatch (different from GCC!):
|
||||
MOVEA.L obj_ptr(A5), A0 ; A0 = object pointer (A5 is the frame pointer)
|
||||
MOVE.L (A0), D0 ; D0 = vtable pointer (at offset +$00)
|
||||
MOVEA.L D0, A1
|
||||
MOVEA.L $XX(A1), A1 ; A1 = method pointer stored in vtable slot XX
JSR (A1) ; call virtual method
|
||||
; No offset_to_top, no RTTI pointer before vtable!
|
||||
```
|
||||
|
||||
### Name Mangling — StormC vs GCC
|
||||
|
||||
| Construct | StormC++ Mangled | GCC 2.95.x Mangled |
|
||||
|---|---|---|
|
||||
| `Window::Draw()` | `Draw__6Window` | `Draw__6Window` — *can be identical for simple cases* |
|
||||
| `Window::SetPos(int,int)` | `SetPos__6WindowFii` | `SetPos__6Windowii` (no `F`) |
|
||||
| `operator new(unsigned long)` | `__nw__FUl` | `__nw__FUl` (may match) |
|
||||
| Constructor | `__ct__6Window` | `__6Window` (GCC uses different prefix) |
|
||||
| Destructor | `__dt__6Window` | `_._6Window` or `_$_6Window` (GCC 2.x uses a separate destructor prefix, chosen by target configuration) |
|
||||
|
||||
**Key disambiguator**: StormC prepends `__ct__` and `__dt__` to constructor and destructor names. GCC 2.95.x never emits these prefixes, so finding them in `HUNK_SYMBOL` rules out GCC.
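The prefix check is easy to script when triaging a symbol dump. The following is a minimal sketch, assuming the names have already been extracted from `HUNK_SYMBOL` and follow the `__ct__<length><Class>` shape shown in the table above. `classify_special_member` is a hypothetical helper name, not part of any existing tool, and the optional third leading underscore is an assumption about how the linker may prefix symbols.

```python
import re

# Matches the StormC-style prefixes from the table above, e.g. "__ct__6Window".
# The digits are assumed to give the length of the class name that follows.
_SPECIAL = re.compile(r"^_{2,3}(ct|dt)__(\d+)(.*)$")

def classify_special_member(symbol):
    """Return ("constructor" | "destructor", class_name), or None if no match."""
    m = _SPECIAL.match(symbol)
    if m is None:
        return None
    kind, length, rest = m.group(1), int(m.group(2)), m.group(3)
    if len(rest) < length:
        return None  # truncated, or not really a mangled name
    role = "constructor" if kind == "ct" else "destructor"
    return role, rest[:length]
```

Any hit is a strong hint that the binary contains StormC++ code; an empty result proves nothing, because symbols are often stripped.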
|
||||
|
||||
### Vtable Layout Differences
|
||||
|
||||
```
|
||||
GCC 2.95.x vtable layout: StormC++ vtable layout:
|
||||
┌──────────────────────┐ ┌──────────────────────┐
|
||||
│ offset_to_top = 0 │ vtable[-2] │ (no offset_to_top) │
|
||||
├──────────────────────┤ ├──────────────────────┤
|
||||
│ RTTI pointer │ vtable[-1] │ (RTTI pointer or 0) │
|
||||
├──────────────────────┤ ← vptr ├──────────────────────┤ ← vptr
|
||||
│ virtual destructor │ vtable[0] │ first virtual method │ vtable[0]
|
||||
├──────────────────────┤ ├──────────────────────┤
|
||||
│ virtual method 1 │ vtable[1] │ second virtual meth │ vtable[1]
|
||||
├──────────────────────┤ ├──────────────────────┤
|
||||
│ ... │ │ ... │
|
||||
└──────────────────────┘ └──────────────────────┘
|
||||
```
|
||||
|
||||
> [!WARNING]
|
||||
> StormC++ vtables start at the first virtual function. There is no `offset_to_top` field at `vtable[-2]`. If your struct layout assumes the GCC layout, all vtable offsets will be wrong by 2 entries.
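
The two-entry shift can be reproduced numerically. This is a small sketch, assuming 4-byte vtable entries and a struct anchored at the first longword of the table; the `header_slots` parameter is illustrative (2 for the GCC layout drawn above, 0 for a table that starts at the first virtual function).

```python
ENTRY_SIZE = 4  # one 32-bit pointer per vtable entry on m68k

def method_index(byte_offset, header_slots):
    """Map a byte offset from the start of the table to a virtual-method index.

    Returns None when the offset is misaligned or lands in the header area.
    """
    if byte_offset < 0 or byte_offset % ENTRY_SIZE:
        return None
    slot = byte_offset // ENTRY_SIZE - header_slots
    return slot if slot >= 0 else None
```

With these assumptions, offset `$08` is method 2 when no header is present but method 0 under the two-entry header, which is exactly the mislabeling the warning describes.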
|
||||
|
||||
---
|
||||
|
||||
## Library Call Patterns
|
||||
|
||||
StormC uses SAS/C-compatible library calls:
|
||||
|
||||
```asm
|
||||
MOVEA.L _DOSBase, A6 ; load from global
|
||||
MOVE.L filename, D1
|
||||
MOVE.L #MODE_OLDFILE, D2
|
||||
JSR -$1E(A6) ; Open()
|
||||
```
|
||||
|
||||
The difference is in **how** `_DOSBase` is initialized — StormC's startup code may use different symbol naming or library open order.
|
||||
|
||||
---
|
||||
|
||||
## C++ Exception Handling
|
||||
|
||||
StormC 3.0+ supports C++ exceptions with a custom unwinding mechanism:
|
||||
|
||||
```asm
|
||||
; Exception handling setup (simplified):
|
||||
; StormC registers an exception handler frame on the stack:
|
||||
PEA .exception_handler ; handler address
|
||||
MOVE.L ___current_exception_frame, -(SP)
|
||||
MOVE.L SP, ___current_exception_frame
|
||||
|
||||
; ... try block code ...
|
||||
|
||||
; Cleanup on normal exit:
|
||||
MOVE.L (SP)+, ___current_exception_frame
|
||||
ADDQ.L #4, SP ; discard handler
|
||||
|
||||
.exception_handler:
|
||||
; Exception recovery code
|
||||
```
|
||||
|
||||
This is structurally different from GCC's exception handling (which uses DWARF2 unwinding tables or setjmp/longjmp). In the binary, look for a global `___current_exception_frame` variable being pushed/popped in functions with try/catch blocks.
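A byte-level scan for the push/update pair can shortlist candidate functions. The sketch below assumes the global is accessed with absolute-long addressing, so that `MOVE.L (xxx).L, -(SP)` encodes as `$2F39` and `MOVE.L SP, (xxx).L` as `$23CF`; a compiler using A4-relative small data would not match. `find_frame_registrations` is a hypothetical name, and the result is a heuristic that can also fire on data bytes.

```python
import struct

MOVE_L_ABS_TO_PUSH = 0x2F39  # MOVE.L (xxx).L, -(SP)
MOVE_L_SP_TO_ABS = 0x23CF    # MOVE.L SP, (xxx).L

def find_frame_registrations(code):
    """Return [(offset, address)] where a global is pushed, then set to SP."""
    hits = []
    # m68k instructions are word-aligned, so step two bytes at a time.
    for off in range(0, len(code) - 11, 2):
        op1, addr1, op2, addr2 = struct.unpack_from(">HIHI", code, off)
        if (op1 == MOVE_L_ABS_TO_PUSH and op2 == MOVE_L_SP_TO_ABS
                and addr1 == addr2):
            hits.append((off, addr1))
    return hits
```

Every address that shows up repeatedly in the results is a candidate for the exception-frame global; confirm it by checking for the matching restore at the end of the try block.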
|
||||
|
||||
---
|
||||
|
||||
## Startup Code
|
||||
|
||||
StormC's startup differs from SAS/C `c.o`:
|
||||
|
||||
```asm
|
||||
; StormC startup (typical pattern):
|
||||
_start:
|
||||
MOVEA.L 4.W, A6 ; SysBase
|
||||
MOVE.L A6, ___SysBase
|
||||
|
||||
; StormC may use different library open order:
|
||||
JSR ___OpenStormCLibs ; open DOS, Intuition, etc.
|
||||
|
||||
; C++ static constructors (if C++ code present):
|
||||
JSR ___init_cpp ; calls __ct__ functions
|
||||
|
||||
; Call main()
|
||||
BSR _main
MOVE.L D0, ___ReturnCode ; save main()'s result before cleanup clobbers D0

; C++ static destructors:
JSR ___exit_cpp ; calls __dt__ functions

; Cleanup
JSR ___CloseStormCLibs
MOVE.L ___ReturnCode, D0 ; exit code handed back to the caller
|
||||
RTS
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Same C Function — StormC Output
|
||||
|
||||
```asm
|
||||
; CountWords() — StormC 4.0, C mode, -O2:
|
||||
; (Structurally identical to SAS/C — StormC's C codegen mirrors SAS/C)
|
||||
|
||||
_CountWords:
|
||||
LINK A5, #-$08
|
||||
MOVEM.L D2-D3, -(SP)
|
||||
|
||||
MOVEQ #0, D2 ; count
|
||||
MOVEQ #0, D3 ; in_word
|
||||
|
||||
MOVEA.L $08(A5), A0 ; str (arg1 at A5+8)
|
||||
|
||||
BRA.S .loop_test
|
||||
|
||||
.loop_body:
|
||||
MOVEQ #' ', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
MOVEQ #'\t', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
MOVEQ #'\n', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
|
||||
TST.B D3
|
||||
BNE.S .next_char
|
||||
ADDQ.L #1, D2
|
||||
MOVEQ #1, D3
|
||||
BRA.S .next_char
|
||||
|
||||
.not_word:
|
||||
MOVEQ #0, D3
|
||||
|
||||
.next_char:
|
||||
ADDQ.L #1, A0
|
||||
|
||||
.loop_test:
|
||||
TST.B (A0)
|
||||
BNE.S .loop_body
|
||||
|
||||
MOVE.L D2, D0
|
||||
MOVEM.L (SP)+, D2-D3
|
||||
UNLK A5
|
||||
RTS
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### "The GCC-C++ Assumption" — Using GCC Vtable Layout on StormC++
|
||||
|
||||
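Applying GCC vtable offsets to StormC++ binaries misidentifies every virtual method by 2 slots, because the GCC layout reserves two leading entries (`offset_to_top` and the RTTI pointer) that a StormC++ vtable does not start with. Always determine the C++ compiler BEFORE applying vtable layout assumptions.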
|
||||
|
||||
### "The StormC-C++ Silence" — Missing C++ in What Looks Like C
|
||||
|
||||
StormC C code looks identical to SAS/C. But if the binary was compiled with StormC++ (C++ mode), global constructors run before `main()`, exceptions unwind, and objects have vtables — all invisible at the C codegen level. Check `HUNK_SYMBOL` for `__ct__` and `__dt__` prefixes.
|
||||
|
||||
---
|
||||
|
||||
## Pitfalls & Common Mistakes
|
||||
|
||||
### 1. Linking StormC++ Objects with GCC Code
|
||||
|
||||
StormC++ and GCC C++ share NO ABI compatibility. Name mangling, vtable layout, RTTI, and exception handling all differ. If you're patching a binary and need to add C++ code, you must use the same compiler that produced the original.
|
||||
|
||||
### 2. PowerPC Code Sections (StormC 3+)
|
||||
|
||||
```asm
|
||||
; In the HUNK structure, PPC code appears as a separate hunk type:
|
||||
; If your disassembler only handles HUNK_CODE ($03E9), PPC sections
|
||||
; will appear as unknown hunk types. StormC PPC sections use custom
|
||||
; hunk types for WarpOS/PowerUP code.
|
||||
```
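A loader or script can flag such sections instead of silently skipping them. The following is a minimal sketch that classifies a hunk type longword against the standard AmigaDOS IDs and reports anything else as unknown. The table is deliberately partial, and `describe_hunk` is a hypothetical helper name.

```python
# Standard AmigaDOS hunk type IDs (partial list).
STANDARD_HUNKS = {
    0x3E7: "HUNK_UNIT", 0x3E8: "HUNK_NAME", 0x3E9: "HUNK_CODE",
    0x3EA: "HUNK_DATA", 0x3EB: "HUNK_BSS", 0x3EC: "HUNK_RELOC32",
    0x3F0: "HUNK_SYMBOL", 0x3F1: "HUNK_DEBUG", 0x3F2: "HUNK_END",
    0x3F3: "HUNK_HEADER",
}
TYPE_MASK = 0x3FFFFFFF  # bits 30 and 31 carry memory flags (chip/fast)

def describe_hunk(type_longword):
    """Name a hunk type, ignoring the memory-flag bits."""
    hunk_id = type_longword & TYPE_MASK
    return STANDARD_HUNKS.get(hunk_id, "unknown (0x%X)" % hunk_id)
```

Anything reported as unknown in a StormC 3+ binary deserves a manual look before you conclude that the file is corrupt.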
|
||||
|
||||
---
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Software Known to Be StormC-Compiled
|
||||
|
||||
| Application | Version | Notes |
|
||||
|---|---|---|
|
||||
| **AmigaWriter** | StormC 3/4 | Word processor with C++ document model |
|
||||
| **Various MUI applications** | StormC 3+ | MUI class wizard generated C++ classes |
|
||||
| **WarpOS/PowerUP software** | StormC 3+ | Mixed 68k/PPC binaries — check for PPC hunk sections |
|
||||
| **Late-era Amiga games** | StormC 3/4 | C++ game engines with 68k-optimized inner loops |
|
||||
|
||||
---
|
||||
|
||||
## Historical Context
|
||||
|
||||
StormC arrived at a pivotal moment: the Amiga market had shrunk, SAS/C was abandoned after 6.58, and developers wanted a modern IDE. Haage & Partner (known for AmigaOS 3.5/3.9) positioned StormC as the future of native Amiga development. It offered features no other native compiler had: a GUI debugger, C++ with exceptions, PowerPC support, and integrated MUI class generation.
|
||||
|
||||
However, the PowerPC era fragmented quickly (WarpOS vs PowerUP), the Amiga market collapsed, and Haage & Partner ceased operations. StormC 4.0 was the last release. Today, GCC (cross-compilation) and VBCC dominate, but StormC binaries remain in the wild — particularly late-1998 to 2000 era C++ applications.
|
||||
|
||||
---
|
||||
|
||||
## Modern Analogies
|
||||
|
||||
| StormC Concept | Modern Equivalent |
|
||||
|---|---|
|
||||
| Native IDE with built-in compiler | Xcode with Clang, Visual Studio with MSVC |
|
||||
| Proprietary C++ ABI | MSVC's C++ ABI (incompatible with Itanium/GCC ABI) |
|
||||
| Mixed 68k/PPC binaries | Universal Binaries (Intel + ARM) on macOS |
|
||||
| MUI class generation wizard | Qt Creator's class wizard, Visual Studio's MFC wizard |
|
||||
|
||||
---
|
||||
|
||||
## FPGA / Emulation Impact
|
||||
|
||||
- **PowerPC sections**: If the binary contains PPC hunk sections (StormC 3+), a 68k-only FPGA core cannot execute them — a PowerPC emulation layer (like WarpOS emulation in WinUAE) is required.
|
||||
- **C++ exception handling**: StormC's custom exception mechanism uses a linked list of exception frames on the stack — the 68000 core must support `MOVE.L SP, An` correctly (standard ISA support, no issues).
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
**Q: How do I tell StormC from SAS/C if both use LINK A5?**
|
||||
A: Check `HUNK_SYMBOL` — SAS/C uses `_name` with `=APS` stabs; StormC uses `__ct__`/`__dt__` prefixes for C++. Check `HUNK_DEBUG` for project metadata strings (StormC embeds source paths). Check startup code — StormC's `___OpenStormCLibs` vs SAS/C's `_OpenLibraries`.
|
||||
|
||||
**Q: Can I link StormC objects with SAS/C objects?**
|
||||
A: For C-only code, possibly yes if the calling conventions match. For C++ code, absolutely not — the ABIs are incompatible.
|
||||
|
||||
**Q: Does StormC support `__saveds`?**
|
||||
A: Yes — StormC supports SAS/C calling convention keywords for compatibility: `__saveds`, `__stdargs`, `__reg`, `__interrupt`.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [13_toolchain/stormc.md](../../../13_toolchain/stormc.md) — StormC usage and features
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick identification
|
||||
- [cpp_vtables_reversing.md](../cpp_vtables_reversing.md) — C++ vtable layouts (GCC focus — StormC differences noted)
|
||||
- See also: [sasc.md](sasc.md), [gcc.md](gcc.md) — compare with other compilers
|
||||
327
05_reversing/static/compilers/vbcc.md
Normal file
|
|
@@ -0,0 +1,327 @@
|
|||
[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md) · [Compilers](README.md)
|
||||
|
||||
# VBCC — Reverse Engineering Field Manual
|
||||
|
||||
## Overview
|
||||
|
||||
**VBCC** (Volker Barthelmann's C Compiler) is a portable, retargetable ISO C89 compiler that typically produces some of the smallest binaries among Amiga compilers. Its key RE characteristics are: **no frame pointer** (SP-relative access only), **per-function register saves** (only what's actually used), **PC-relative string addressing**, and a distinctive **`__reg()`** calling convention for AmigaOS library calls. VBCC generates clean, tight code that can look deceptively like hand-optimized assembly.
|
||||
|
||||
Key constraints:
|
||||
- **No LINK instruction** — VBCC never uses `LINK A5` or `LINK A6`. Locals are accessed via `$offset(SP)`. Function boundaries are defined by `MOVEM.L ... -(SP)` at entry and `RTS` at exit.
|
||||
- **Minimal register saves** — Unlike SAS/C (9 registers always) or GCC (per-function but often substantial), VBCC saves only the exact registers used. A leaf function with no locals has no prologue at all.
|
||||
- **Tail-call optimization** — VBCC uses `BRA.S` to common epilogue blocks and `BRA` to tail-call other functions more aggressively than any other Amiga compiler.
|
||||
- **`__MERGED` hunks**: When the small data model is active, VBCC places initialized and uninitialized data together in a single `__MERGED` section so that all of it can be reached through one base register.
|
||||
- **Hunk names**: `CODE`, `DATA`, `BSS` (+ optional `__MERGED` for small-data)
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Source (.c)"
|
||||
SRC["C source (C89)"]
|
||||
end
|
||||
subgraph "VBCC Compiler Pipeline"
|
||||
VC["vc (driver)"]
|
||||
VBCCM68K["vbccm68k (code generator)"]
|
||||
VASM["vasm (assembler)"]
|
||||
VLINK["vlink (linker)"]
|
||||
end
|
||||
subgraph "Binary Output"
|
||||
HUNK["Amiga HUNK executable"]
|
||||
CODE["CODE hunk"]
|
||||
DATA["DATA hunk"]
|
||||
MERGED["__MERGED (optional, small-data)"]
|
||||
end
|
||||
|
||||
SRC --> VC
|
||||
VC --> VBCCM68K --> VASM --> VLINK
|
||||
VLINK --> HUNK
|
||||
HUNK --> CODE & DATA
|
||||
HUNK --> MERGED
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Binary Identification — The VBCC Signature
|
||||
|
||||
### Function Prologue — Nothing or Minimal
|
||||
|
||||
```asm
|
||||
; VBCC leaf function (no locals, no calls):
|
||||
_simple_func:
|
||||
; NO prologue at all
|
||||
; ... function body ...
|
||||
RTS
|
||||
|
||||
; VBCC function with locals:
|
||||
_moderate_func:
|
||||
MOVEM.L D2-D3/A2, -(SP) ; saves ONLY the 3 registers used
|
||||
; ... function body ...
|
||||
MOVEM.L (SP)+, D2-D3/A2
|
||||
RTS
|
||||
|
||||
; VBCC large function:
|
||||
_large_func:
|
||||
MOVEM.L D2-D5/A2-A3, -(SP) ; per-function exact save
|
||||
LEA -$80(SP), SP ; allocate stack frame
|
||||
; ... function body ...
|
||||
LEA $80(SP), SP
|
||||
MOVEM.L (SP)+, D2-D5/A2-A3
|
||||
RTS
|
||||
```
|
||||
|
||||
**Key differentiator from GCC**: Both VBCC and GCC use per-function register saves, but VBCC's code is consistently tighter. VBCC uses `BRA.S label` to share common epilogue/cleanup code, where GCC duplicates it. VBCC uses `MOVEQ` and `ADDQ` aggressively for small constants.
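Because the save set is the main prologue fingerprint, it helps to decode it straight from the opcode bytes. `MOVEM.L <list>, -(SP)` encodes as `$48E7` followed by a 16-bit register mask, and in predecrement mode that mask is bit-reversed. A minimal sketch (the helper name `decode_movem_push` is hypothetical):

```python
REGS = ["D%d" % i for i in range(8)] + ["A%d" % i for i in range(8)]

def decode_movem_push(mask):
    """Decode the register mask word of MOVEM.L <list>, -(SP).

    In predecrement mode the mask is bit-reversed: bit 15 is D0, bit 0 is A7.
    """
    return [REGS[i] for i in range(16) if mask & (0x8000 >> i)]
```

For example, the bytes `48E7 3020` decode to D2, D3 and A2, the three-register save shown in `_moderate_func` above. The matching restore, `MOVEM.L (SP)+, <list>`, uses the normal bit order, so it needs a separate decoder.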
|
||||
|
||||
### String Addressing
|
||||
|
||||
Like GCC, VBCC uses PC-relative string addressing:
|
||||
|
||||
```asm
|
||||
LEA .str_hello(PC), A0
|
||||
JSR _Printf
|
||||
|
||||
.str_hello: DC.B "Hello", $0A, $00
|
||||
```
|
||||
|
||||
### The `__reg()` Calling Convention — Unique VBCC Fingerprint
|
||||
|
||||
VBCC's `__reg()` keyword places C variables in named CPU registers without inline assembly:
|
||||
|
||||
```c
|
||||
/* VBCC source: */
|
||||
BPTR __reg("d0") MyOpen(__reg("d1") CONST_STRPTR name,
|
||||
__reg("d2") LONG accessMode);
|
||||
```
|
||||
|
||||
```asm
|
||||
; Generated code for Open("foo", MODE_OLDFILE):
|
||||
MOVEA.L _DOSBase, A6
|
||||
LEA .str_foo(PC), A0
|
||||
MOVE.L A0, D1 ; name → D1
|
||||
MOVE.L #1005, D2 ; MODE_OLDFILE → D2 (1005 exceeds MOVEQ's 8-bit range)
|
||||
JSR -$1E(A6) ; Open() LVO
|
||||
```
|
||||
|
||||
The register assignment for OS calls is dictated by the library's ABI, so every compiler loads D1 and D2 before `Open()`. What stands out in VBCC output is that application-internal functions declared with `__reg()` also take their arguments in specific registers (D1, D2, D3, etc.) with no stack access and no inline assembly stubs. The `__reg()` assignments are visible only through this register usage pattern.
|
||||
|
||||
---
|
||||
|
||||
## Library Call Patterns
|
||||
|
||||
VBCC library calls are compact and direct:
|
||||
|
||||
```asm
|
||||
; VBCC library call — minimal code:
|
||||
MOVEA.L (_DOSBase).L, A6 ; load library base (absolute with relocation)
|
||||
MOVE.L fh(SP), D1 ; arg from stack
|
||||
MOVE.L buf(SP), D2
|
||||
MOVE.L len(SP), D3
|
||||
JSR -$2A(A6) ; Read()
|
||||
|
||||
; Return value check:
|
||||
TST.L D0
|
||||
BMI.S .error
|
||||
```
|
||||
|
||||
VBCC differs from SAS/C here: SAS/C would load args through A5-relative offsets (`$08(A5)`). VBCC uses SP-relative offsets. Since SP may change within the function (pushing args), VBCC carefully maintains SP offsets.
|
||||
|
||||
### `#pragma amicall` — VBCC Library Call Pragmas
|
||||
|
||||
```c
|
||||
#pragma amicall(DOSBase, 0x1E, Open(d1, d2))
|
||||
// VBCC pragma format is simpler than SAS/C:
|
||||
// - Library base name (identifier, not a string)
|
||||
// - LVO in hex
|
||||
// - Function name with argument register list
|
||||
```
|
||||
|
||||
In the binary, these pragmas produce the same `JSR -$XXX(A6)` patterns as any other compiler — the pragma just controls argument register assignment.
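Mapping the displacement back to a function name is a table lookup. The sketch below covers only four `dos.library` entries as an illustration; a real table would be generated from the FD files of the library whose base is in A6. `lookup_lvo` is a hypothetical helper name.

```python
# A few dos.library offsets; extend from the FD files for the library at hand.
DOS_LVOS = {-30: "Open", -36: "Close", -42: "Read", -48: "Write"}
LVO_STEP = 6  # each jump-table entry is a 6-byte JMP instruction

def lookup_lvo(displacement, table=DOS_LVOS):
    """Name the library function behind JSR displacement(A6), if known."""
    if displacement >= 0 or displacement % LVO_STEP:
        return None  # not a valid jump-table slot
    return table.get(displacement, "slot %d" % (-displacement // LVO_STEP))
```

Note that the pragma writes the offset as a positive hex number (`0x1E`) while the disassembly shows the negative displacement (`-$1E`), so negate one of them before comparing.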
|
||||
|
||||
---
|
||||
|
||||
## Optimization Patterns
|
||||
|
||||
VBCC prioritizes **code density** over raw speed. Its signatures:
|
||||
|
||||
| Pattern | VBCC Style | SAS/C Equivalent |
|
||||
|---|---|---|
|
||||
| **Shared epilogue** | `BRA.S .epilogue` from multiple exit points | Duplicated epilogue at each return |
|
||||
| **Tail calls** | `BRA _other_func` (discard own frame first) | `JSR _other_func` / `RTS` |
|
||||
| **Small constant loading** | `MOVEQ #N, Dn` whenever possible | `MOVE.L #N, Dn` for some small values |
|
||||
| **Stack frame** | `LEA -$N(SP), SP` (16-bit displacement, so frames below 32 KB) | `LINK A5, #-N` |
|
||||
| **Loop termination** | `DBRA Dn, loop` (when counter fits in 16 bits) | `SUBQ.L #1, Dn` / `BNE loop` |
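
The small-constant row has a hard limit behind it: `MOVEQ` carries an 8-bit immediate that is sign-extended to 32 bits. A quick check, useful when validating a disassembly or a hand-written patch (the function name is hypothetical):

```python
def fits_moveq(value):
    """True if value can be loaded with MOVEQ (8-bit immediate, sign-extended)."""
    return -128 <= value <= 127
```

`MODE_OLDFILE` (1005), for instance, does not fit and needs a full `MOVE.L #imm, Dn`, whereas loop counters and boolean flags almost always do.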
|
||||
|
||||
### Cross-Module Optimization
|
||||
|
||||
VBCC supports cross-module optimization: at its highest optimization level the compiler processes several source files together and can inline and reorder functions across them. In the binary, this means function layout may NOT match source file order, and small functions may be inlined into callers that live in other source files.
|
||||
|
||||
---
|
||||
|
||||
## Same C Function — VBCC Output
|
||||
|
||||
```asm
|
||||
; CountWords() — VBCC, -O -speed:
|
||||
; C prototype: ULONG CountWords(CONST_STRPTR str)
|
||||
|
||||
_CountWords:
|
||||
MOVEM.L D2-D3, -(SP) ; only D2, D3 needed
|
||||
|
||||
MOVEQ #0, D2 ; D2 = count
|
||||
MOVEQ #0, D3 ; D3 = in_word
|
||||
|
||||
MOVEA.L $0C(SP), A0 ; A0 = str (arg at SP + 12)
|
||||
|
||||
BRA.S .loop_test
|
||||
|
||||
.loop_body:
|
||||
CMPI.B #' ', (A0) ; *str == ' '?
|
||||
BEQ.S .not_word
|
||||
CMPI.B #'\t', (A0)
|
||||
BEQ.S .not_word
|
||||
CMPI.B #'\n', (A0)
|
||||
BEQ.S .not_word
|
||||
|
||||
TST.B D3
|
||||
BNE.S .next_char
|
||||
|
||||
ADDQ.L #1, D2 ; count++
|
||||
MOVEQ #1, D3 ; in_word = TRUE
|
||||
BRA.S .next_char
|
||||
|
||||
.not_word:
|
||||
MOVEQ #0, D3 ; in_word = FALSE
|
||||
|
||||
.next_char:
|
||||
ADDQ.L #1, A0 ; str++
|
||||
|
||||
.loop_test:
|
||||
TST.B (A0)
|
||||
BNE.S .loop_body
|
||||
|
||||
MOVE.L D2, D0 ; return count
|
||||
MOVEM.L (SP)+, D2-D3
|
||||
RTS
|
||||
```
|
||||
|
||||
**VBCC-specific observations**:
|
||||
1. **`MOVEM.L D2-D3, -(SP)`** — only 2 registers saved. Minimal.
|
||||
2. **`BRA.S .loop_test`** — unconditional jump into the loop test, which sits at the bottom of the loop, so an empty string exits without running the body.
|
||||
3. **`BRA.S .next_char`** — shared increment code reached from two paths.
|
||||
4. **Identical to GCC** in this function because the function is simple enough that optimization differences don't show. For more complex functions (with multiple return paths, struct access, switch statements), VBCC's shared-epilogue and tail-call patterns emerge.
|
||||
|
||||
```
|
||||
Cross-Compiler Comparison (CountWords, bytes of code):
|
||||
SAS/C -O2: ~52 bytes (LINK A5 + 9-reg save + epilogue overhead)
|
||||
GCC -O2: ~48 bytes (no LINK, minimal save, CMPI.B)
|
||||
VBCC -speed:~46 bytes (no LINK, minimal save, aggressive BRA sharing)
|
||||
DICE C: ~48 bytes (similar to VBCC)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### "The Missing Frame Trap" — Assuming LINK for Function Boundaries
|
||||
|
||||
```asm
|
||||
; VBCC function boundaries are RTS-delimited, not LINK-delimited.
|
||||
; If your IDA script searches for LINK to find functions, you'll miss ALL VBCC functions.
|
||||
|
||||
; VBCC function entry could be any of:
|
||||
; 1. MOVEM.L ..., -(SP) (most common)
|
||||
; 2. LEA -$XX(SP), SP (large frame)
|
||||
; 3. First instruction after previous RTS (leaf functions)
|
||||
; 4. TST.L D0 / BEQ ... (function that doesn't save any regs)
|
||||
```
|
||||
|
||||
### "The Register Ghost" — `__reg()` Without Symbols
|
||||
|
||||
Without source-level `__reg()` declarations, VBCC function arguments appear to use arbitrary register assignments. This can look like a custom ABI. The pattern is actually the VBCC `__reg()` convention encoded via `<proto/*.h>` headers during compilation.
|
||||
|
||||
---
|
||||
|
||||
## Pitfalls & Common Mistakes
|
||||
|
||||
### 1. Confusing VBCC and GCC Output
|
||||
|
||||
Both omit frame pointers and use per-function saves. Disambiguate by:
|
||||
- **Hunk names**: VBCC uses `CODE`/`DATA`; GCC uses `.text`/`.data` (usually)
|
||||
- **`__MERGED` hunk**: not produced by GCC. SAS/C's near-data model uses the same hunk name, so treat it as evidence against GCC rather than as proof of VBCC
|
||||
- **Function naming**: VBCC emits names like `_funcname`; GCC emits `.Lxxx` internal labels
|
||||
- **BRA density**: VBCC has more `BRA.S` instructions (shared epilogues); GCC tends to duplicate code
|
||||
|
||||
### 2. Misreading SP-Relative Offsets
|
||||
|
||||
```asm
|
||||
; At function entry (after MOVEM.L D2-D3, -(SP)):
|
||||
; SP points 8 bytes below entry SP (D2 and D3 pushed)
|
||||
; Arg1 is at $0C(SP) (8 bytes regs + 4 bytes return addr)
|
||||
; But after LEA -$10(SP), SP:
|
||||
; Arg1 is now at $1C(SP) (8 regs + 4 ret + 16 locals)
|
||||
; The offset CHANGES when SP is modified — unlike A5-relative offsets
|
||||
```
|
||||
|
||||
Track every instruction that moves SP: `LEA +/-$N(SP), SP`, `ADDQ`/`SUBQ` on SP, and each push or pop (`MOVE.L ..., -(SP)`, `PEA`, `MOVEM.L`). Each one shifts ALL subsequent SP-relative offsets.
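The arithmetic from the listing above can be wrapped in a helper so that annotations stay consistent across a function. This is a sketch under the assumptions stated in the comments (32-bit arguments, 4 bytes per saved register); `stack_arg_offset` is a hypothetical name.

```python
RETURN_ADDRESS_BYTES = 4

def stack_arg_offset(saved_regs, local_bytes, arg_index=0, pushed_bytes=0):
    """SP-relative offset of a 32-bit stack argument in a frameless function.

    saved_regs   -- registers pushed by the entry MOVEM.L (4 bytes each)
    local_bytes  -- bytes reserved by LEA -$N(SP), SP
    pushed_bytes -- temporary pushes currently outstanding (outgoing arguments)
    """
    return (pushed_bytes + local_bytes + 4 * saved_regs
            + RETURN_ADDRESS_BYTES + 4 * arg_index)
```

With two saved registers the first argument sits at `$0C(SP)`; after a 16-byte local frame it moves to `$1C(SP)`, matching the listing. One outstanding 4-byte push moves it again to `$20(SP)`.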
|
||||
|
||||
---
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Software Known to Be VBCC-Compiled
|
||||
|
||||
| Application | Notes |
|
||||
|---|---|
|
||||
| **ScummVM (some ports)** | Large C codebase; VBCC's strict C89 catches portability issues |
|
||||
| **Modern Amiga utilities** | Many 2000s+ CLI tools use VBCC for small binary size |
|
||||
| **AROS system components** | VBCC is a supported AROS build compiler |
|
||||
| **MUI 5 custom classes** | Tight BOOPSI dispatch benefits from VBCC's register allocation |
|
||||
| **AmigaOS 4 system libraries** | Hyperion's SDK supports VBCC for OS4 development |
|
||||
|
||||
---
|
||||
|
||||
## Historical Context
|
||||
|
||||
VBCC was created by Volker Barthelmann in the mid-1990s as a lightweight alternative to GCC's growing complexity. While GCC was the "heavy" compiler with C++ support, VBCC targeted developers who wanted a fast, standards-compliant C89 compiler that produced small binaries.
|
||||
|
||||
Unlike SAS/C (commercial, dead since 1996) and GCC (open source but complex), VBCC occupies a unique niche: actively maintained, free for personal use, with a clean codebase. Together with its companion tools, the `vlink` linker and the `vasm` assembler, it forms a complete toolchain that has become a de facto standard for modern Amiga development alongside bebbo's GCC fork.
|
||||
|
||||
---
|
||||
|
||||
## Modern Analogies
|
||||
|
||||
| VBCC Concept | Modern Equivalent |
|
||||
|---|---|
|
||||
| `__reg()` | `register ... asm("d0")` in GCC/Clang (GNU C extension) |
|
||||
| Per-function register save | Clang's `-O2` with aggressive register allocation |
|
||||
| Cross-module optimization | LTO (Link-Time Optimization) in modern compilers |
|
||||
| `vlink` with `vasm` | LLVM's integrated `lld` linker with `clang` |
|
||||
| Config-driven target system | LLVM's `TargetRegistry` and target description files |
|
||||
|
||||
---
|
||||
|
||||
## FPGA / Emulation Impact
|
||||
|
||||
- **No `LINK`/`UNLK`**: VBCC binaries don't use these instructions, reducing test coverage needs for frame pointer ops on FPGA cores.
|
||||
- **Aggressive `LEA` for stack frames**: `LEA -$N(SP), SP` must correctly update SP in a single instruction — verify your FPGA core handles LEA with SP destination correctly.
|
||||
- **Cross-module optimization**: No runtime impact; all inlining and merging happens at build time.
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
**Q: How do I distinguish VBCC from GCC output?**
|
||||
A: Check hunk names — VBCC uses `CODE`/`DATA`, GCC typically uses `.text`/`.data`. Check for a `__MERGED` hunk (not produced by GCC, though SAS/C near data uses the same name). Check internal labels: VBCC uses `_name` format; GCC uses `.Lxxx`. Check BRA density — VBCC shares epilogues more aggressively.
|
||||
|
||||
**Q: Does VBCC support C++?**
|
||||
A: No. If you find C++ constructs (vtables, `new`/`delete`, name mangling), it's NOT VBCC.
|
||||
|
||||
**Q: Can VBCC and GCC object files be mixed?**
|
||||
A: No. They use different calling conventions for internal runtime functions. Link the entire project with one compiler. Assembly (`vasm`) can be mixed with VBCC C code using `vlink`.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [13_toolchain/vbcc.md](../../../13_toolchain/vbcc.md) — VBCC usage and `__reg()` details
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick identification
|
||||
- [13_toolchain/vasm_vlink.md](../../../13_toolchain/vasm_vlink.md) — vasm/vlink toolchain
|
||||
- VBCC homepage: http://sun.hasenbraten.de/vbcc/
|
||||
- See also: [sasc.md](sasc.md), [gcc.md](gcc.md) — compare with other compilers
|
||||