Mirror of https://github.com/alfishe/amiga-bootcamp.git, synced 2026-06-12 16:16:28 +00:00.

Commit `0aafceb31e` (parent `05452c6c12`): "More information. Especially about RE and AREXX automation". 30 changed files with 8826 additions and 66 deletions.
| Hex | Dec | Constant | Payload | Description |
|---|---|---|---|---|
| `$3E9` | 1001 | `HUNK_CODE` | `[tag] [size_longs] [code_bytes × size×4]` | **Machine-code section.** The loader allocates RAM, copies the bytes, then applies any `HUNK_RELOC32` that follows. Intended for 68k instructions, although hand-written programs often embed data (tables, copper lists) here as well. |
| `$3EA` | 1002 | `HUNK_DATA` | `[tag] [size_longs] [data_bytes × size×4]` | **Initialized read/write data.** Global variables with non-zero values, string literals, jump tables, etc. Any embedded pointers to other hunks require `HUNK_RELOC32` fixups. |
| `$3EB` | 1003 | `HUNK_BSS` | `[tag] [size_longs]` *(no data bytes)* | **Uninitialized data (zero-fill).** Only the size is stored; the file contains no data bytes. The loader allocates zeroed memory (`AllocMem(..., MEMF_CLEAR)`), so a 64 KB zero array costs only the 8-byte tag/size pair on disk. **No relocation follows BSS hunks:** there are no initialized values to fix up. |

> [!NOTE]
> **HUNK_DATA trailing space:** Data hunks have been observed with trailing uninitialized storage (`ds.b`/`ds.w`/`ds.l` declarations) that is not counted in the hunk length declared in the `HUNK_DATA` record itself, but is included in the `HUNK_HEADER` size table. The OS loader allocates memory based on the header size table, so the extra space is available at runtime even though the hunk's own `num_longs` field doesn't include it.
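The CODE/DATA/BSS layouts above are regular enough to walk with a few lines of code. The sketch below is illustrative rather than a full loader. `walk_hunks` is a hypothetical helper. It assumes the standard `dos/doshunks.h` values for the tags not listed in this table (`HUNK_RELOC32` = `$3EC`, `HUNK_END` = `$3F2`, `HUNK_HEADER` = `$3F3`), and it raises an error on any other hunk type.

```python
import struct

HUNK_CODE, HUNK_DATA, HUNK_BSS = 0x3E9, 0x3EA, 0x3EB
HUNK_RELOC32, HUNK_END, HUNK_HEADER = 0x3EC, 0x3F2, 0x3F3
KIND = {HUNK_CODE: "CODE", HUNK_DATA: "DATA", HUNK_BSS: "BSS"}


def walk_hunks(data: bytes) -> list:
    """Return [(kind, size_in_bytes, reloc32_count), ...] for a load file.

    Sketch only: understands HUNK_HEADER, CODE, DATA, BSS, RELOC32 and END,
    and raises ValueError for any other hunk type.
    """
    def u32(off: int) -> int:
        return struct.unpack_from(">I", data, off)[0]

    if u32(0) != HUNK_HEADER:
        raise ValueError("not an AmigaDOS load file")
    off = 4
    while (nlongs := u32(off)) != 0:      # resident library names (normally none)
        off += 4 + nlongs * 4
    off += 4                              # terminating zero
    first, last = u32(off + 4), u32(off + 8)
    off += 12                             # table_size, first_hunk, last_hunk
    for _ in range(last - first + 1):     # per-hunk size table
        size_entry = u32(off)
        off += 4
        if size_entry >> 30 == 3:         # both memory bits set: extra attribute longword follows
            off += 4

    hunks, current = [], None
    while off < len(data):
        tag = u32(off) & 0x3FFFFFFF       # strip memory flags (bits 30-31)
        off += 4
        if tag in (HUNK_CODE, HUNK_DATA):
            size = (u32(off) & 0x3FFFFFFF) * 4
            current = [KIND[tag], size, 0]
            hunks.append(current)
            off += 4 + size               # skip the size field and the payload
        elif tag == HUNK_BSS:
            current = [KIND[tag], (u32(off) & 0x3FFFFFFF) * 4, 0]
            hunks.append(current)
            off += 4                      # BSS has no payload in the file
        elif tag == HUNK_RELOC32:
            while (count := u32(off)) != 0:
                current[2] += count
                off += 8 + count * 4      # count, target hunk, offsets
            off += 4                      # terminating zero count
        elif tag == HUNK_END:
            current = None
        else:
            raise ValueError(f"unhandled hunk type {tag:#x} at offset {off - 4:#x}")
    return [tuple(h) for h in hunks]
```

For example, a file with a one-longword `HUNK_CODE` carrying a single RELOC32 entry, followed by a four-longword `HUNK_BSS`, yields `[("CODE", 4, 1), ("BSS", 16, 0)]`.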
---
| `$3F7` | 1015 | `HUNK_DREL32` | — | WORD (16-bit) | **Compact 32-bit reloc.** Same semantics as `HUNK_RELOC32` but count, hunk index, and offsets are stored as 16-bit WORDs, halving the table size. Valid only when all hunk offsets fit in 16 bits (hunk < 64 KB). Generated by BLink. |
| `$3F8` | 1016 | `HUNK_DREL16` | — | WORD (16-bit) | Compact 16-bit reloc with WORD-sized fields. Very rare. |
| `$3F9` | 1017 | `HUNK_DREL8` | — | WORD (16-bit) | Compact 8-bit reloc with WORD-sized fields. Essentially unused. |
| `$3FC` | 1020 | `HUNK_RELOC32SHORT` | — | WORD (16-bit) | **Compact absolute 32-bit reloc with WORD offsets.** Semantically identical to `HUNK_RELOC32` with WORD fields. Default output of vasm/vlink when all offsets fit in 16 bits. Preferred over `HUNK_DREL32` in OS 3.x-era tools. **After the table, if the total WORD count is odd, a padding WORD (`$0000`) restores longword alignment** before the next hunk record. |
| `$3FD` | 1021 | `HUNK_RELRELOC32` | — | LONG (32-bit) | **PC-relative 32-bit reloc.** Patch: `*(LONG*)(base+off) += target_base − (base+off+4)`. Used by GCC `-fPIC` and PIC shared libraries. |
| `$3FE` | 1022 | `HUNK_ABSRELOC16` | — | LONG (32-bit) | **Absolute 16-bit fixup.** Patches a UWORD with the low 16 bits of the target's absolute address. Required for `MOVE.W #abs_addr,Dn` patterns. Rare. |
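To make the WORD layout and its padding rule concrete, here is a hypothetical decoder for a `HUNK_RELOC32SHORT` body. It assumes the same grouping as `HUNK_RELOC32` (count, target hunk index, offsets, repeated until a zero count), with every field stored as a WORD as the table describes. `parse_reloc32short` is an illustrative name, not a library API.

```python
import struct


def parse_reloc32short(data: bytes, off: int):
    """Decode a HUNK_RELOC32SHORT body; 'off' points just past the $3FC tag.

    Returns ({target_hunk: [offsets...]}, offset_of_next_hunk_record).
    """
    start = off
    relocs = {}
    while True:
        (count,) = struct.unpack_from(">H", data, off)
        off += 2
        if count == 0:                          # zero count terminates the table
            break
        (target,) = struct.unpack_from(">H", data, off)
        offsets = struct.unpack_from(f">{count}H", data, off + 2)
        relocs.setdefault(target, []).extend(offsets)
        off += 2 + 2 * count                    # target WORD + offset WORDs
    if ((off - start) // 2) % 2:                # odd number of WORDs consumed
        off += 2                                # skip the $0000 padding WORD
    return relocs, off
```

Because the tag itself is a longword, counting WORDs from just after the tag is enough to decide whether the padding WORD is present.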
| Hex | Dec | Constant | Payload | Description |
|---|---|---|---|---|
| `$3F0` | 1008 | `HUNK_SYMBOL` | `[tag] { [namelen_longs] [name_bytes…] [value] } … [0]` | **Local symbol table.** Maps label names → offsets within this hunk. Consumed by MonAm, wack, IDA Pro. Terminated by `namelen=0`. |
| `$3F1` | 1009 | `HUNK_DEBUG` | `[tag] [size_longs] [format_tag] [data_bytes…]` | **Opaque debug block.** The leading `format_tag` longword identifies the debug data encoding; see [Debug Format Tags](#debug-format-tags) below for the full table and [`hunk_debug_info.md`](hunk_debug_info.md) for the stabs record layout. |
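The `HUNK_SYMBOL` payload pattern in the table can be decoded with a short loop. This sketch assumes `off` points just past the `$3F0` tag and that names are NUL-padded to a whole number of longwords. `parse_hunk_symbol` is a hypothetical helper.

```python
import struct


def parse_hunk_symbol(data: bytes, off: int):
    """Decode a HUNK_SYMBOL body; 'off' points just past the $3F0 tag.

    Returns ([(name, value), ...], offset_after_terminator).
    """
    symbols = []
    while True:
        (nlongs,) = struct.unpack_from(">I", data, off)
        off += 4
        if nlongs == 0:                       # namelen == 0 ends the table
            return symbols, off
        nlongs &= 0x00FFFFFF                  # HUNK_EXT keeps a type in the top byte; expected 0 here
        name = data[off:off + nlongs * 4].rstrip(b"\0").decode("latin-1")
        off += nlongs * 4                     # names are NUL-padded to a longword boundary
        (value,) = struct.unpack_from(">I", data, off)
        off += 4
        symbols.append((name, value))
```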
#### Debug Format Tags

The first longword after the size field in a `HUNK_DEBUG` block is a 4-character ASCII **format tag** identifying the debug data encoding:

| Format tag (hex) | ASCII | Compiler / Assembler | Description |
|---|---|---|---|
| `$3D415053` | `=APS` | SAS/C 6.x | SAS/C stabs debug symbols |
| `$3D474343` | `=GCC` | GCC (m68k-amigaos) | GCC stabs debug symbols |
| `$3D574152` | `=WAR` | Storm C / Warp C | Storm C / Warp C debug symbols |
| `$48434C4E` | `HCLN` | Devpac | Devpac assembler: compressed line-number record (HiSoft Compressed Line Numbers) |
| `$48454144` | `HEAD` | Devpac | Devpac assembler — start of source file marker |
| `$4C494E45` | `LINE` | Generic / multiple | Line-number debug info (used by several assemblers) |
| `$4F444546` | `ODEF` | BAsm | BAsm assembler debug symbols |
| `$4F505453` | `OPTS` | SAS/C | SAS/C compiler options record |
| `$5A4D4147` | `ZMAG` | GNU tools (ld) | GNU `ZMAGIC` (a.out-style) debug hunk |

> [!NOTE]
> `dos.library` v31+ treats **any** hunk ID whose lower 29 bits exceed `HUNK_ABSRELOC16` (`$3FE` / 1022) as a `HUNK_DEBUG` block and silently skips it, unless bit 29 is set, which causes `ERROR_BAD_HUNK`. This allows compilers to emit custom debug hunk types that newer loaders ignore transparently.
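Because the format tag is just four ASCII bytes, identifying a debug block comes down to a byte-order conversion plus a lookup in the Debug Format Tags table. The helpers below are a sketch. `debug_format` and `debug_format_at` are hypothetical names, and the producer strings are copied from the Compiler / Assembler column. They assume the `[tag] [size_longs] [format_tag]` order given for `HUNK_DEBUG`.

```python
import struct

# Compiler / Assembler column of the Debug Format Tags table
DEBUG_FORMATS = {
    0x3D415053: "SAS/C 6.x",
    0x3D474343: "GCC (m68k-amigaos)",
    0x3D574152: "Storm C / Warp C",
    0x48434C4E: "Devpac",
    0x48454144: "Devpac",
    0x4C494E45: "Generic / multiple",
    0x4F444546: "BAsm",
    0x4F505453: "SAS/C",
    0x5A4D4147: "GNU tools (ld)",
}


def debug_format(format_tag: int) -> str:
    """Render a HUNK_DEBUG format tag as 'ASCII (producer)'."""
    ascii_tag = format_tag.to_bytes(4, "big").decode("latin-1")
    return f"{ascii_tag} ({DEBUG_FORMATS.get(format_tag, 'unknown')})"


def debug_format_at(data: bytes, off: int) -> str:
    """'off' points at a HUNK_DEBUG tag; the format tag follows the size longword."""
    tag, _size_longs, format_tag = struct.unpack_from(">III", data, off)
    if tag & 0x3FFFFFFF != 0x3F1:
        raise ValueError("not a HUNK_DEBUG block")
    return debug_format(format_tag)
```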
---
| `$3FB` | 1019 | `HUNK_INDEX` | **Symbol index for `HUNK_LIB`.** A packed string table plus a per-unit map of exported symbol names → unit byte offsets. Lets the linker locate a function without scanning every object in the archive. Always immediately follows the `HUNK_LIB` it describes. |

### Hunk ID Bit Masking

After the initial `HUNK_HEADER`, the OS loader (`dos.library`) treats only the **lower 29 bits** of each hunk type longword as the type code. Bits 30–31 encode memory placement flags (see [Memory Placement Flags](#memory-placement-flags) below), and bit 29 acts as a separate flag described in point 2. This has two important consequences:

1. **Unknown hunk types become debug.** `dos.library` v31+ treats any hunk ID whose lower 29 bits exceed `HUNK_ABSRELOC16` (`$3FE` / 1022) as a `HUNK_DEBUG` block and silently skips it. This allows compilers to emit custom debug hunk types that newer loaders ignore without error.
2. **Bit 29 set → load failure.** If a hunk ID has bit 29 set but is not a recognized code/data/BSS type, the loader **fails** with `ERROR_BAD_HUNK` rather than treating it as debug.

```c
/* Typical loader logic (dos.library v31+), simplified */
hunk_id = read_uint32(f);
if (hunk_id == HUNK_HEADER) { ... }     /* first hunk only: full 32 bits */

/* After HUNK_HEADER: mask memory flags, check range */
hunk_id &= 0x3FFFFFFF;                  /* strip bits 30-31, keep bits 0-29 */
if (hunk_id > HUNK_ABSRELOC16) {        /* unknown type */
    if (hunk_id & (1UL << 29))          /* bit 29 set? */
        return ERROR_BAD_HUNK;          /* hard error */
    /* else: treat as HUNK_DEBUG and skip silently */
}
```

> [!NOTE]
> The mask `& 0x3FFFFFFF` keeps 30 bits (the 29-bit type code plus bit 29), so bit 29 can still be tested. The practical rule: after `HUNK_HEADER`, the memory flag bits 30–31 are stripped before the type code comparison.
### Memory Placement Flags

- NDK39: `dos/doshunks.h` — hunk type constants
- http://amigadev.elowar.com/read/ADCD_2.1/Libraries_Manual_guide/node01E0.html
- Community reference: http://sun.hasenbraten.de/vlink/release/vlink.pdf (HUNK format appendix)
- http://amiga-dev.wikidot.com/file-format:hunk — HUNK format reference with Python parsing code, debug format tags, and dos.library v31+ compatibility notes

| File | Topic |
|---|---|
| [methodology.md](methodology.md) | Step-by-step RE workflow for Amiga HUNK binaries |
| [ida_setup.md](ida_setup.md) | IDA Pro configuration for 68k/Amiga analysis |
| [ghidra_setup.md](ghidra_setup.md) | Ghidra configuration for 68k/Amiga analysis & decompilation |
| [compiler_fingerprints.md](compiler_fingerprints.md) | Compiler identification by code patterns |
| [library_reconstruction.md](library_reconstruction.md) | Reconstructing unknown library JMP tables |
| [static/code_vs_data_disambiguation.md](static/code_vs_data_disambiguation.md) | Distinguishing code bytes from data — IDA/Ghidra workflows |
| [patching_techniques.md](patching_techniques.md) | Surgical binary patching methods |
| [unpacking_and_decrunching.md](unpacking_and_decrunching.md) | Executable unpacking, decruncher architecture, and manual extraction |
| [case_studies/](case_studies/) | Real-world RE walkthroughs |
| [case_studies/ramdrive_device.md](case_studies/ramdrive_device.md) | ramdrive.device RE walkthrough |

### Per-Compiler Reverse Engineering — Binary Field Manuals

| File | Topic |
|---|---|
| [static/compilers/README.md](static/compilers/README.md) | Compiler identification flowchart and comparison matrix |
| [static/compilers/sasc.md](static/compilers/sasc.md) | **SAS/C 5.x/6.x** — LINK A5 + 9-reg save, absolute strings, `_LibBase` globals |
| [static/compilers/gcc.md](static/compilers/gcc.md) | **GCC 2.95.x** — `.text` hunk, A6 frame pointer, PC-relative strings, `__CTOR_LIST__` |
| [static/compilers/vbcc.md](static/compilers/vbcc.md) | **VBCC** — No frame pointer, per-function saves, `__reg()`, `__MERGED` hunks |
| [static/compilers/stormc.md](static/compilers/stormc.md) | **StormC / StormC++** — SAS/C-compatible C, unique C++ ABI, PPC support |
| [static/compilers/aztec_c.md](static/compilers/aztec_c.md) | **Manx Aztec C** — D3-D7 save only (5 regs), D2 scratch, pre-1990 era |
| [static/compilers/lattice_c.md](static/compilers/lattice_c.md) | **Lattice C 3.x/4.x** — SAS/C predecessor, simpler optimizer, 6-reg save |
| [static/compilers/dice_c.md](static/compilers/dice_c.md) | **DICE C** — No frame pointer, `_mainCRTStartup`, fast compile speed |

### Language-Specific Reverse Engineering

| File | Topic |
|---|---|
| [static/asm68k_binaries.md](static/asm68k_binaries.md) | Hand-written assembly reverse engineering — demos, games, bootblocks |
| [static/ansi_c_reversing.md](static/ansi_c_reversing.md) | ANSI C reverse engineering — struct recovery, control flow, library anchoring |
| [static/cpp_vtables_reversing.md](static/cpp_vtables_reversing.md) | C++ OOP reverse engineering — vtables, inheritance, RTTI, name mangling |
| [static/other_languages.md](static/other_languages.md) | Non-C languages — AMOS, Blitz Basic, Amiga E, Modula-2, FORTH, ARexx |

## Core Principles

1. **Know the ABI first** — All library calls go through `JSR LVO(A6)`. Before reversing any function, identify which library A6 holds: the `lib_Node.ln_Name` pointer sits at `base+$0A` and points to the library's name string.
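Following that pointer by hand is easy to get wrong, so here is a sketch for a flat memory dump, such as one saved from an emulator. It assumes `mem[0]` corresponds to Amiga address 0, which depends on how the dump was taken. `library_name` is a hypothetical helper. The `$0A` offset comes from the `struct Node` that begins every `struct Library`: `ln_Succ` (4 bytes), `ln_Pred` (4), `ln_Type` (1), `ln_Pri` (1), then `ln_Name`.

```python
import struct

LN_NAME = 0x0A  # ln_Succ(4) + ln_Pred(4) + ln_Type(1) + ln_Pri(1)


def library_name(mem: bytes, base: int) -> str:
    """Return the name of the Library at 'base' in a flat memory dump.

    Assumes mem[0] corresponds to Amiga address 0, so addresses index mem directly.
    """
    (name_ptr,) = struct.unpack_from(">I", mem, base + LN_NAME)
    end = mem.index(b"\0", name_ptr)
    return mem[name_ptr:end].decode("latin-1")
```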
| Tool | Purpose |
|---|---|
| IDA Pro 7.x | Primary static disassembler (no native M68k decompilation) |
| IDA Amiga plugin | HUNK loader, HUNK_SYMBOL import |
| Ghidra + ghidra-amiga | Powerful disassembler and C-pseudocode decompiler for M68k |
| `hunkinfo` | Quick hunk/symbol/reloc dump |
| wack / MonAm | On-device debugger |

## References
## Overview

`ramdrive.device` is the Amiga's built-in RAM disk device. It provides a RAM-based disk drive (`RAD:`) that can survive a warm reboot (Ctrl-Amiga-Amiga). This makes it an excellent target for reverse engineering to understand **Resident Modules**, **Exec Device Architecture**, and **Memory Survival** techniques.

Analysing it teaches:

- Exec device initialization and the `Resident` structure.
- `BeginIO` dispatch logic.
- Persistence mechanisms across system resets.

---

## Resident Structure (`ROMTag`)

Like other Amiga libraries and devices, `ramdrive.device` is described by a `struct Resident` (defined in `exec/resident.h`):

```c
struct Resident {
    UWORD  rt_MatchWord;            /* $4AFC (RTC_MATCHWORD) */
    struct Resident *rt_MatchTag;   /* Pointer to self */
    APTR   rt_EndSkip;              /* Pointer to end of module */
    UBYTE  rt_Flags;                /* e.g. RTF_AFTERDOS | RTF_COLDBOOT */
    UBYTE  rt_Version;              /* Version of module */
    UBYTE  rt_Type;                 /* NT_DEVICE */
    BYTE   rt_Pri;                  /* Priority */
    char  *rt_Name;                 /* "ramdrive.device" */
    char  *rt_IdString;             /* ID string */
    APTR   rt_Init;                 /* Pointer to Init routine */
};
```
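Because the structure is packed big-endian with no padding, it maps onto a single `struct` format string. It is 26 bytes long, which is why `rt_Name` sits at offset 14. The parser below is a sketch. `parse_resident` and the `Resident` tuple are hypothetical names, and `load_addr` is the address at which `image[0]` is mapped (for example `$F80000` for a 512 KB Kickstart).

```python
import struct
from typing import NamedTuple

RTC_MATCHWORD = 0x4AFC
RESIDENT_FMT = ">HIIBBBbIII"   # 26 bytes, field order as in struct Resident above


class Resident(NamedTuple):
    match_word: int
    match_tag: int
    end_skip: int
    flags: int
    version: int
    type: int
    pri: int
    name_ptr: int
    id_string_ptr: int
    init: int


def parse_resident(image: bytes, off: int, load_addr: int) -> Resident:
    """Decode a RomTag at image[off], where image[0] is mapped at load_addr."""
    rt = Resident(*struct.unpack_from(RESIDENT_FMT, image, off))
    # A genuine RomTag starts with RTC_MATCHWORD and rt_MatchTag points back at itself.
    if rt.match_word != RTC_MATCHWORD or rt.match_tag != load_addr + off:
        raise ValueError(f"no valid RomTag at offset {off:#x}")
    return rt
```

The self-pointer check matters in practice: `$4AFC` is also the `ILLEGAL` opcode and can appear in code or data by chance.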
### Reset Survival Mechanism

The primary challenge for `ramdrive.device` is ensuring its memory is not reclaimed by the system after a reset.

1. **Memory Allocation**: When first initialized, it allocates a large block for disk data.
2. **Validation**: It writes a "magic cookie" and a checksum at the start of this block.
3. **Reset Registration**: It registers the block with exec's reset-survival hooks. A `MemList` describing the memory goes on `ExecBase->KickMemPtr`, a `ROMTag` goes on `ExecBase->KickTagPtr`, and `ExecBase->KickCheckSum` is refreshed with `SumKickData()`.
4. **Warm Reboot**: On a reset, exec validates `KickCheckSum`, re-allocates the memory listed in `KickMemPtr`, and initializes the `KickTagPtr` modules together with the RomTags it finds by scanning ROM for `RTC_MATCHWORD` ($4AFC). When `ramdrive.device` initializes again, it checks the block's magic cookie and checksum.
5. **Re-binding**: If valid, the device re-binds the existing data block instead of allocating a new one.

---
mask = len(rom) - 1                       # assumes a power-of-two dump (256 KB or 512 KB)
for i in range(0, len(rom) - 25, 2):      # leave room for a full 26-byte struct Resident
    rt_matchword = struct.unpack_from(">H", rom, i)[0]
    if rt_matchword != 0x4AFC:            # RTC_MATCHWORD
        continue
    rt_matchtag = struct.unpack_from(">I", rom, i + 2)[0]
    if (rt_matchtag & mask) != i:         # rt_MatchTag must point back at the tag itself
        continue
    rt_name = struct.unpack_from(">I", rom, i + 14)[0]
    name = rom[rt_name & mask:].split(b"\0", 1)[0].decode("latin-1")
    print(f"RomTag @ ROM+{i:#x}: {name}")
EOF
```

The RomTag for `ramdrive.device` has `rt_Type = NT_DEVICE` and `rt_Name = "ramdrive.device"`.

---

## Device Structure Layout
| −30 | `BeginIO` | Queue or execute an IORequest |
| −36 | `AbortIO` | Cancel pending IORequest |

---

## IORequest Command Handling

`BeginIO` is the heart of the driver. It dispatches on `io_Command`:

```c
void BeginIO(struct IORequest *ior) {
    struct IOStdReq *io = (struct IOStdReq *)ior;
---

## Deep Analysis: Checksum Verification

When disassembling the initialization routine, look for the verification pattern that identifies a valid "surviving" RAM disk:

```asm
; Typical checksum verification pattern (illustrative: the magic value,
; Length and Expected are placeholders, not values from a real ROM)
CheckSum:
    move.l  (a0)+, d1           ; Get magic cookie
    cmpi.l  #$ABCDEF01, d1      ; Verify magic
    bne.s   Invalid
    moveq   #0, d2              ; Clear the running sum
    move.w  #(Length/4)-1, d0   ; DBF stops at -1, so load longword count - 1
Loop:
    add.l   (a0)+, d2           ; Sum up the block
    dbf     d0, Loop
    cmp.l   Expected, d2        ; Compare with the stored checksum
    bne.s   Invalid
```
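When you have dumped a candidate block (for example from an emulator after a warm reboot), the same check can be reproduced offline. This Python sketch mirrors the illustrative assembly pattern. The magic value, the block length and the source of the expected sum are placeholders carried over from that pattern, not documented `ramdrive.device` constants, and `block_is_valid` is a hypothetical helper.

```python
import struct

MAGIC = 0xABCDEF01  # placeholder from the asm pattern, not a documented ramdrive value


def block_is_valid(block: bytes, n_longs: int, expected: int) -> bool:
    """Magic longword first, then a 32-bit wrapping sum of n_longs longwords."""
    (magic,) = struct.unpack_from(">I", block, 0)
    if magic != MAGIC:
        return False
    longs = struct.unpack_from(f">{n_longs}I", block, 4)
    return (sum(longs) & 0xFFFFFFFF) == expected   # ADD.L wraps modulo 2**32
```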
---

## Disassembly Landmarks in IDA

After loading the Kickstart ROM in IDA with the M68k processor module (a ROM dump is a raw binary, not a HUNK file, so load it at its base address, e.g. `$F80000` for a 512 KB ROM):

1. **Search for string `"ramdrive.device"`** → finds the `ROMTag`.
2. **`RT_INIT` pointer** → points to the initialization function (or, if `RTF_AUTOINIT` is set in `rt_Flags`, to an init table that exec's `InitResident()` processes).
3. **`RT_INIT` logic** → calls `MakeLibrary` then `AddDevice`.
4. **Library Base** → Follow the `rd_Device` base to find the `BeginIO` entry point at offset -30.
5. **Switch Table** → `BeginIO` typically uses a jump table (JMP) or a series of `CMPI` / `BEQ` to dispatch commands.
---

## References

- NDK39: `exec/devices.h`, `exec/io.h`, `devices/trackdisk.h`, `exec/resident.h`
- [io_requests.md](../../06_exec_os/io_requests.md) — IORequest structure and dispatch
- `10_devices/trackdisk_device.md` — TD_* command codes
- Kickstart 3.1 ROM dump (required for disassembly)
- [IRA Disassembly of ramdrive.device](http://aminet.net/package/dev/asm/ramdrive_src) — Reference for instruction patterns.
---

## StormC / StormC++

Native IDE with C++ support. C-level code mimics SAS/C:

```asm
; StormC C function (SAS/C compatible):
    LINK    A5, #-$10
    MOVEM.L D2-D7/A2-A4, -(SP)
```

**Distinguishing features:**

- HUNK_DEBUG contains project metadata/source paths
- C++ methods use custom mangling: `__ct__6Window` (constructor), `__dt__` (destructor)
- C++ Vtable layout starts directly at first method (no `offset_to_top`)
- May contain `PPC_CODE` hunks for WarpOS binaries

---
## Lattice C 3.x/4.x

The predecessor to SAS/C (1985-1989 era).

```asm
; Lattice C 3.x/4.x:
    LINK    A5, #-$14
    MOVEM.L D2-D5/A2-A3, -(SP)   ; Saves fewer regs than SAS/C
```

**Distinguishing features:**

- Saves only 6-7 registers instead of SAS/C's 9.
- Uses `MOVE.L #small_val, D0` instead of `MOVEQ`.
- Uses long branches (`BRA`, `BEQ`) instead of short branches (`BRA.S`).

---

## DICE C

Lean, fast compiler by Matt Dillon.

```asm
; DICE C:
    MOVEM.L D2-D4/A2-A3, -(SP)   ; Per-function save, no frame pointer
```

**Distinguishing features:**

- Extremely similar to VBCC/GCC (no frame pointer).
- Entry point is uniquely named `_mainCRTStartup`.
- Often uses `ADDQ.L #4, SP` to clean up stack arguments.
- Uses `MOVEA.L (_LibBase).L, A6` for library calls.

---
## Hand-Written Assembly (Assembler-Only Code)

Unlike compiler-generated code with predictable prologues and calling sequences, hand-written 68000 assembly (common in demos, games, and bootblocks) is unconstrained.

**Distinguishing features:**

- **Little or no `LINK` / `SUBQ.L #N,SP`** stack-frame setup anywhere in the binary.
- **Custom Hardware Base Pointers:** Authors often dedicate a register (typically `A4` or `A5`) to `$DFF000` (custom chip base) for the entire program: `LEA $DFF000, A4`.
- **Ad-hoc Calling Conventions:** Parameters passed in arbitrary registers. `A6` might be used as a data pointer rather than the library base.
- **Maximally Specified Saves:** `MOVEM.L D0-D7/A0-A6, -(SP)` used aggressively for interrupt handlers or per-routine saves, rather than the compiler's minimal necessary set.
- **Self-Modifying Code (SMC):** Instructions patched at runtime, e.g. `MOVE.W #imm, target+2` (an absolute address fixed up by `HUNK_RELOC32`) or `LEA target+2(PC), A0` followed by `MOVE.W #imm, (A0)`. PC-relative addressing cannot be used as a destination on the 68000.
- **Hardware Register Banging:** Direct immediate access to `$DFFxxx` (custom chips), `$BFExxx` (CIAA), and `$BFDxxx` (CIAB).
- **PC-Relative Data Tables:** `LEA table(PC), An` used for copper lists, sprite data, and audio samples mixed within the `CODE` hunk.

> [!TIP]
> For a deep dive into reversing hand-written Amiga assembly, see the **[Hand-Written Assembly Field Manual](static/asm68k_binaries.md)**.
### Assembler Toolchain Fingerprints

Because macro assemblers translate mnemonics 1:1, they lack the rigid calling conventions of C compilers. However, the choice of assembler (and the era it belongs to) can leave subtle forensic clues in the binary:

| Assembler | Era / Usage | Binary Fingerprints & Output Characteristics |
|---|---|---|
| **ASM-One** (and ASM-Pro) | 1990s Demoscene Standard | **Literal Translation:** Early versions did not automatically optimize `MOVE.L #0, Dn` to `MOVEQ`.<br>**Section Merging:** Often outputs a single giant `CODE` hunk containing data, copper lists, and BSS because coders frequently omitted `SECTION` directives.<br>**Symbols:** `HUNK_SYMBOL` tables lack the `_` prefix typical of C linkers.<br>**Relocation Ordering:** Unlike external linkers that group `HUNK_RELOC32` arrays strictly by target hunk, ASM-One's single-pass compilation often emits a single massive relocation block at the end of the file in sequential generation order. |
| **Seka / K-Seka** | 1980s Early Demoscene | **The Literal Extreme:** Absolutely zero optimization. What you write is what you get.<br>**Compact output:** Often used for bootblocks and 4K intros; does not generate standard Amiga hunks natively unless explicitly coded to do so. |
| **Devpac (HiSoft)** | 1980s-90s "Pro" Standard | **Disciplined Hunks:** Devpac encouraged proper `SECTION CODE,CODE` and `SECTION DATA,DATA` usage, resulting in cleanly separated binary hunks.<br>**Optimization:** Featured early peephole optimization (short branches, `MOVEQ`).<br>**Debug Hunks:** Devpac injects its own debug structures. Look for `HUNK_DEBUG` ($03F1) blocks containing the `"HCLN"` (HiSoft Compressed Line Numbers) or `"LINE"` ASCII signatures. Relocation types sometimes attributed to Devpac, such as `HUNK_DREL32` ($03F7), are standard `dos/doshunks.h` definitions that other tools (e.g. BLink) also emit, so they are weak evidence on their own. |
| **PhxAss** | Late 90s Performance | **Aggressive Optimization:** Automatically shrinks `MOVE.L #imm,Dn` to `MOVEQ`, `JSR`/`JMP` to `BSR`/`BRA`, and absolute `LEA` to PC-relative `LEA x(PC)` where possible.<br>**Object Linking:** Typically produced object files that were then linked with `Blink`. `Blink` leaves its own structural fingerprints, strictly ordering `HUNK_RELOC32` offsets in ascending order per target hunk, and cleanly terminating relocation arrays. |
| **Barfly** | 1990s High-Speed | Extremely fast. Output binaries are functionally similar to PhxAss, often utilizing external linkers and producing highly optimized instruction sequences. |
| **vasm** | Modern Cross-Assembler | Can emulate the syntax and output style of legacy assemblers, e.g. Devpac (`vasmm68k_mot -m68000 -Fhunkexe -devpac`) or PhxAss (`-phxass`), which can make its output hard to distinguish from the assembler it is configured to mimic. |
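The `MOVEQ` observation in the ASM-One and PhxAss rows can be checked mechanically: count `MOVE.L #imm,Dn` instructions whose immediate would have fit in a `MOVEQ` (-128..127). This is a rough heuristic sketch, and `movel_imm_candidates` is a hypothetical helper. It scans every even offset of a code hunk, so embedded data can produce false positives. Lattice C shows the same habit, so combine the result with the other clues.

```python
import struct


def movel_imm_candidates(code: bytes) -> list:
    """Find MOVE.L #imm,Dn whose immediate would fit MOVEQ (-128..127).

    Returns [(offset, data_register, immediate), ...].
    """
    hits = []
    for off in range(0, len(code) - 5, 2):         # need 6 bytes: opcode + 32-bit immediate
        (op,) = struct.unpack_from(">H", code, off)
        if op & 0xF1FF == 0x203C:                   # MOVE.L #<data>,Dn (Dn in bits 9-11)
            (imm,) = struct.unpack_from(">i", code, off + 2)
            if -128 <= imm <= 127:
                hits.append((off, (op >> 9) & 7, imm))
    return hits
```

A binary with many hits leans toward a literal-translating assembler (or Lattice C); an aggressively optimizing assembler such as PhxAss should leave few or none.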
---

## Quick Fingerprint Checklist

```
□ Does function prologue use LINK A5?   → SAS/C, StormC (C mode), or Lattice C
  ↳ Saves D2-D5/A2-A3?                  → Lattice C
  ↳ Has __ct__/__dt__ or project paths? → StormC
  ↳ Saves D2-D7/A2-A4, absolute refs?   → SAS/C
□ Does function prologue use LINK A6?   → GCC with -fno-omit-frame-pointer
□ No LINK at all, just MOVEM.L?         → GCC, VBCC, or DICE C
  ↳ Entry point is _mainCRTStartup?     → DICE C
□ PC-relative globals (LEA x(PC))?      → GCC -fpic, VBCC, or DICE C
□ Absolute globals + HUNK_RELOC32?      → SAS/C, StormC, Lattice, or GCC without -fpic
□ HUNK_SYMBOL has __main, __exit?       → GCC/libnix
□ HUNK_SYMBOL has _c_start, _main?      → SAS/C
```
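Several checklist items can be counted directly from a code hunk. The sketch below uses hypothetical helper names. It looks for the `LINK A5` (`$4E55`), `LINK A6` (`$4E56`) and `MOVEM.L <list>,-(SP)` (`$48E7`) opcode words and decodes the register mask; in the predecrement form, bit 15 is D0 and bit 0 is A7. Like any linear byte scan it can misfire on data, so treat the counts as hints.

```python
import struct

REGS = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]


def movem_predec_regs(mask: int) -> list:
    """Decode a MOVEM.L <list>,-(SP) mask (bit 15 = D0 ... bit 0 = A7)."""
    return [REGS[i] for i in range(16) if mask & (0x8000 >> i)]


def prologue_stats(code: bytes) -> dict:
    """Count LINK A5 / LINK A6 and collect MOVEM.L ...,-(SP) register lists."""
    stats = {"link_a5": 0, "link_a6": 0, "movem_saves": []}
    for off in range(0, len(code) - 3, 2):        # need opcode + one extension word
        op, ext = struct.unpack_from(">HH", code, off)
        if op == 0x4E55:                           # LINK A5,#disp
            stats["link_a5"] += 1
        elif op == 0x4E56:                         # LINK A6,#disp
            stats["link_a6"] += 1
        elif op == 0x48E7:                         # MOVEM.L <regs>,-(A7)
            stats["movem_saves"].append(movem_predec_regs(ext))
    return stats
```

For instance, `movem_predec_regs(0x3F38)` decodes to D2-D7/A2-A4, the nine-register SAS/C save, and `0x3C30` decodes to D2-D5/A2-A3, the Lattice C pattern.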
- GCC m68k-amigaos port (bebbo): https://github.com/bebbo/amiga-gcc
- VBCC manual: http://www.ibaug.de/vbcc/doc/vbcc.html
- Aztec C 68k manual (archive.org)
- **Amiga ROM Kernel Reference Manual (RKRM): Includes and Autodocs** — Definitive source for standard Commodore Hunk IDs (`HUNK_CODE`, `HUNK_RELOC32`, `HUNK_DEBUG`).
- **AmigaDOS Executable Format Documentation** — Details the loader's behavior of skipping unrecognized hunk blocks, which allowed for proprietary debugger extensions.
- **HiSoft Devpac Amiga Assembler Manual** — Primary source for understanding `"HCLN"` (HiSoft Compressed Line Numbers) and Devpac's use of `HUNK_DEXT` / `HUNK_DREL32` structures.
- **Amiga Development Wiki (Wikidot)** — Excellent community repository documenting the exact bit-layout of the reverse-engineered `HCLN` compression scheme.
- **English Amiga Board (EAB) / Aminet Archives** — Primary historical source for the demoscene evolution of ASM-One, Seka, and PhxAss, including their specific linking behaviors and macro habits.
- **[Per-Compiler RE Field Manuals](static/compilers/README.md)** — In-depth binary analysis for each compiler
  - [SAS/C](static/compilers/sasc.md) · [GCC](static/compilers/gcc.md) · [VBCC](static/compilers/vbcc.md) · [StormC](static/compilers/stormc.md) · [Aztec C](static/compilers/aztec_c.md) · [Lattice C](static/compilers/lattice_c.md) · [DICE C](static/compilers/dice_c.md)
- [code_vs_data_disambiguation.md](static/code_vs_data_disambiguation.md) — distinguishing code bytes from data/variables
`05_reversing/ghidra_setup.md` (new file, 79 lines)
# Ghidra Setup for Amiga 68k Binaries

## Requirements

| Component | Version / Notes |
|---|---|
| Ghidra | 10.x or 11.x recommended |
| Processor module | M68k — included in Ghidra standard install |
| HUNK loader & Amiga extensions | **ghidra-amiga** by BartmanAbyss |

---

## Step 1: Install the Amiga Extension

Ghidra natively supports the M68000 architecture and includes a decompiler for it, but it does not understand the AmigaOS executable format (HUNK) out of the box.

1. Download the latest release of `ghidra-amiga` from: https://github.com/BartmanAbyss/ghidra-amiga
2. Open Ghidra.
3. Go to `File → Install Extensions...`
4. Click the green `+` (Add extension) button.
5. Select the downloaded `.zip` file (do not extract it).
6. Restart Ghidra.

This extension provides:

- A complete Amiga HUNK format loader.
- Custom chipset register definitions mapped to `$DFF000`.
- OS library LVO (Library Vector Offset) definitions.
- Analyzer scripts specifically for resolving Amiga binaries.

---

## Step 2: Importing and Analyzing

1. Create a new project or open an existing one.
2. Select `File → Import File...` and choose your Amiga executable or library.
3. The format should be detected automatically as `Amiga Executable` (provided by the extension).
4. Double-click the imported file to open it in the CodeBrowser.
5. When prompted to analyze, click **Yes**.
6. Before starting the analysis, make sure the `Amiga` analyzers (provided by the extension) are enabled in the analysis options list.
---

## Step 3: Decompilation and M68k Specifics

Unlike IDA Pro (which lacks Hex-Rays support for M68k), **Ghidra's built-in decompiler supports the Motorola 68000 family.**

- The `ghidra-amiga` extension assists the decompiler by automatically annotating library calls (like `exec/AllocMem` or `dos/Open`) when it detects `JSR`s through negative offsets from `A6`.
- The decompiler then renders these `JSR` instructions as C pseudocode function calls with the correct parameters, which makes Ghidra especially useful for analyzing C/C++-compiled Amiga software.

---

## Step 4: Custom Hardware Registers ($DFF000)

The `ghidra-amiga` extension automatically creates memory blocks for Amiga custom chips and CIA registers.

1. Go to `Window → Memory Map`. You will see `custom` ($DFF000), `ciaa` ($BFE001), and `ciab` ($BFD000) mapped into the address space.
2. The extension automatically defines the Amiga Custom Chip data types.
3. When analyzing code that bangs the hardware (e.g., `move.w d0, $096(A4)`), if Ghidra knows `A4` holds `$DFF000`, it formats the access as `custom->dmacon` in the C pseudocode.
4. If it fails to detect the base register, set the register value manually: highlight the start of the function, right-click, select `Set Register Values` (or press `Ctrl-R`), and define `A4 = 0xDFF000`.
---

## Step 5: Dynamic Analysis

In this workflow, Ghidra is used for **static analysis** only.

For dynamic debugging, the workflow is the same as with IDA:

1. Do your mapping and decompilation in Ghidra.
2. Note hunk-relative offsets rather than absolute addresses. The loader relocates every hunk at load time, so runtime addresses differ from the image base Ghidra uses.
3. Run the binary in WinUAE and drop into the native debugger (`Shift+F12`) to set breakpoints and step through the hardware state live.
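Converting a Ghidra address to the one you will see in WinUAE is plain arithmetic once you know where the hunk was loaded. The sketch assumes you already have two values: the BPTR of the hunk's segment, taken from the process's segment list, and the start address of that hunk in Ghidra's Memory Map. Both helper names are hypothetical. It relies on the AmigaDOS convention that a BPTR is a longword address (multiply by 4), and that each segment begins with the BPTR to the next segment, followed by the hunk contents.

```python
def bptr_to_addr(bptr: int) -> int:
    """AmigaDOS BPTRs are longword addresses: multiply by 4 for a byte address."""
    return bptr << 2


def runtime_address(ghidra_addr: int, ghidra_hunk_base: int, seg_bptr: int) -> int:
    """Map an address inside one hunk from Ghidra's view to the running program."""
    # A seglist entry starts with the BPTR to the next segment;
    # the hunk's contents begin in the following longword.
    hunk_start = bptr_to_addr(seg_bptr) + 4
    return hunk_start + (ghidra_addr - ghidra_hunk_base)
```

For example, a segment BPTR of `$7800` (byte address `$1E000`) places Ghidra's `$21234` in a hunk based at `$21000` at runtime address `$1E238`.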
---

## References

- [ghidra-amiga by BartmanAbyss](https://github.com/BartmanAbyss/ghidra-amiga) — The definitive Amiga loader and extension suite for Ghidra.
- [Ghidra Official Website](https://ghidra-sre.org/)
- [vscode-amiga-debug](https://github.com/BartmanAbyss/vscode-amiga-debug) — Excellent extension for source-level Amiga debugging if you are writing modern Amiga patches.
| Component | Version / Notes |
|---|---|
| IDA Pro | 7.0+ (provides standard M68k disassembly, no Hex-Rays support) |
| Processor module | M68k — included in IDA standard install |
| HUNK loader | Included in some IDA builds; community plugin if absent |

---

---

## Step 8: Mapping Custom Hardware Registers

When reversing games or hardware-banging software, you will frequently encounter direct accesses to `$DFF000` (custom chips), `$BFE001` (CIAA), and `$BFD000` (CIAB).

To make these readable in IDA:

1. Ensure the Amiga NDK headers are loaded (from Step 6).
2. Open the `Structures` tab and make sure the `Custom` structure (from `hardware/custom.h`) is defined.
3. Jump to address `$DFF000` in the IDA view (you may need to create a dummy data segment at `$DFF000` if one doesn't exist).
4. Apply the `Custom` struct to the data at `$DFF000` (`Alt+Q`).
5. When you see an instruction like `MOVE.W D0, $096(A4)` where you know `A4` points to `$DFF000`, press `T` (Struct offset) to map the displacement to the human-readable `dmacon` register.

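The struct-offset step in item 5 can be sketched in plain Python: decode the instruction, add the sign-extended displacement to the known value of the address register, and look the result up in the register map. This sketch handles only the `MOVE.W Dn, d16(An)` form from item 5. The four-entry `CUSTOM_REGS` table is a hypothetical subset of `hardware/custom.h`, and the function names are also hypothetical.

```python
import struct
from typing import Dict, Optional, Tuple

CUSTOM_BASE = 0xDFF000
# Hypothetical subset of the custom-chip register map (offset -> name).
CUSTOM_REGS = {0x096: "DMACON", 0x09A: "INTENA", 0x09C: "INTREQ", 0x180: "COLOR00"}


def decode_move_w_dn_d16_an(code: bytes) -> Optional[Tuple[int, int, int]]:
    """Decode MOVE.W Dn,d16(An); return (dn, an, displacement) or None."""
    if len(code) < 4:
        return None
    opword, disp = struct.unpack(">Hh", code[:4])   # 'h' sign-extends d16
    if opword >> 12 != 0b0011:                      # size bits 11 = MOVE.W
        return None
    dst_reg, dst_mode = (opword >> 9) & 7, (opword >> 6) & 7
    src_mode, src_reg = (opword >> 3) & 7, opword & 7
    if dst_mode != 0b101 or src_mode != 0b000:      # dest d16(An), source Dn
        return None
    return src_reg, dst_reg, disp


def annotate(code: bytes, an_values: Dict[int, int]) -> Optional[str]:
    """Rewrite the instruction with a register name when An's value is known."""
    decoded = decode_move_w_dn_d16_an(code)
    if decoded is None:
        return None
    dn, an, disp = decoded
    if an not in an_values:
        return None
    offset = ((an_values[an] + disp) & 0xFFFFFFFF) - CUSTOM_BASE
    name = CUSTOM_REGS.get(offset)
    return f"MOVE.W D{dn}, {name}" if name else None


# 39 40 00 96 encodes MOVE.W D0, $96(A4)
print(annotate(bytes.fromhex("39400096"), {4: CUSTOM_BASE}))
```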
> [!TIP]
> **Automating with IDAPython:** Instead of mapping structures manually, you can use the Python scripts included in this repository to bulk-define all custom chip and CIA registers for your target chipset.
>
> Choose the script matching your target chipset:
> - **[`scripts/ida9_amiga_ocs.py`](scripts/ida9_amiga_ocs.py)** (A1000, A500, A2000)
> - **[`scripts/ida9_amiga_ecs.py`](scripts/ida9_amiga_ecs.py)** (A500+, A600, A3000)
> - **[`scripts/ida9_amiga_aga.py`](scripts/ida9_amiga_aga.py)** (A1200, A4000, CD32)
>
> Load your binary in IDA 9.x, go to `File > Script file...` (or `Alt-F7`), and select the script. It creates the `HW_CUSTOM`, `HW_CIAA`, and `HW_CIAB` segments, defines the register data items, and applies the hardware register names, so `MOVE.W D0, $DFF096` is displayed as `MOVE.W D0, DMACON`. Picking the script that matches the target chipset also exposes chipset mismatches. For example, if an OCS game is analyzed with the OCS script, any access to an AGA-only register stays unnamed and stands out.

---

## Step 9: Dynamic Analysis Workflow

IDA Pro is primarily used for **static analysis** in standard Amiga workflows. Do not attempt to use IDA's Remote GDB debugger out of the box, as standard WinUAE does not contain a GDB stub.

**The Golden Amiga Reversing Workflow:**

1. Use **IDA Pro** to build the map: label variables, identify routines, and find the target logic (e.g., the copy protection check).
2. Note the instruction's offset within its hunk, or its position relative to a known byte signature. The loader can place hunks at different addresses from run to run.
3. Run the software in **WinUAE**.
4. Press `Shift+F12` to drop into the **WinUAE native debugger**.
5. Set breakpoints (`f <address>`) at the runtime addresses that correspond to your findings in IDA.
6. Step through the live hardware state in WinUAE, where the custom chip registers and DMA timing are emulated.

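When a hunk offset is not enough to find the target, a short byte signature with wildcards is a common fallback. This can happen, for example, when the code has been unpacked or copied elsewhere in memory. This is a minimal sketch of such a search over raw bytes, such as a memory dump saved from the emulator or the executable file itself. The `??` wildcard syntax and the function names are assumptions of this sketch, not a WinUAE feature.

```python
from typing import List, Optional


def parse_signature(pattern: str) -> List[Optional[int]]:
    """'4A 79 ?? ?? 66' -> [0x4A, 0x79, None, None, 0x66] (None = any byte)."""
    return [None if tok == "??" else int(tok, 16) for tok in pattern.split()]


def find_signature(data: bytes, pattern: str) -> List[int]:
    """Return every offset in data where the wildcard signature matches."""
    sig = parse_signature(pattern)
    hits = []
    for start in range(len(data) - len(sig) + 1):
        if all(b is None or data[start + i] == b for i, b in enumerate(sig)):
            hits.append(start)
    return hits


# TST.W (abs).L followed by BNE.S; the 32-bit operand address is wildcarded
dump = bytes.fromhex("00 4A79 00C0 1234 6608 4A79 00C0 1234 6708")
print(find_signature(dump, "4A 79 ?? ?? ?? ?? 66"))
```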

---

## Step 10: Patching Workflows

IDA's internal 68k assembler is notoriously finicky for generating inline patches directly in the database. If you need to neutralize a check (e.g., replacing a `BNE` with `NOP`s):

1. **Live testing:** In the WinUAE debugger, use the `a <address>` command to assemble new instructions live in memory, or `W <address> <value>` to write hex bytes directly. Test the patch live before committing it to disk.
2. **Permanent patching:** Once the offset and replacement bytes are confirmed, use a dedicated hex editor (such as HxD or ImHex) on the executable file on disk, or write a small Python patcher script that seeks to the offset and writes the bytes. A hunk-relative offset must first be converted to a file offset by adding the file position where that hunk's code bytes begin.
3. **Advanced payload patching:** For large patches that don't fit inline, use `vasm` to assemble a payload block. Place it in a new hunk, which also means updating the `HUNK_HEADER` hunk range and size table, or write it over dead code. Then redirect execution to it with a `JMP`.

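The following is a minimal sketch of such a patcher. It assumes you already know the file offset of the branch, and all function names and the file path are hypothetical. It refuses to write unless the bytes on disk match what it expects. It also sizes the `NOP` run to the branch form: `BNE.S` is 2 bytes, `BNE.W` is 4 bytes, and the 68020+ `BNE.L` is 6 bytes.

```python
NOP = b"\x4e\x71"  # 68k NOP opcode


def bne_nop_patch(code: bytes) -> tuple:
    """Return (expected, replacement) that NOPs out the BNE at the start of code."""
    opword = int.from_bytes(code[:2], "big")
    if opword & 0xFF00 != 0x6600:
        raise ValueError(f"not a BNE opcode: {opword:#06x}")
    disp8 = opword & 0xFF
    # disp8 == 0x00 -> BNE.W, 0xFF -> BNE.L (68020+), anything else -> BNE.S
    length = 4 if disp8 == 0x00 else 6 if disp8 == 0xFF else 2
    if len(code) < length:
        raise ValueError("not enough bytes for the full instruction")
    return bytes(code[:length]), NOP * (length // 2)


def apply_patch(data: bytearray, offset: int, expected: bytes, replacement: bytes) -> None:
    """Patch data in place after verifying the original bytes."""
    if len(expected) != len(replacement):
        raise ValueError("replacement must be the same length as the original")
    current = bytes(data[offset:offset + len(expected)])
    if current != expected:
        raise ValueError(f"unexpected bytes at {offset:#x}: {current.hex()}")
    data[offset:offset + len(replacement)] = replacement


def patch_file(path: str, offset: int, expected: bytes, replacement: bytes) -> None:
    """Read the whole executable, patch it in memory, then write it back."""
    with open(path, "rb") as f:
        data = bytearray(f.read())
    apply_patch(data, offset, expected, replacement)
    with open(path, "wb") as f:
        f.write(data)


expected, replacement = bne_nop_patch(bytes.fromhex("6612"))  # BNE.S *+$14
print(expected.hex(), "->", replacement.hex())
# patch_file("Game", 0x1A2C, expected, replacement)  # hypothetical file and offset
```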

---

## Step 11: Decompilation Alternatives (Ghidra)

> [!WARNING]
> **Hex-Rays does not support M68k.** The official Hex-Rays decompiler *does not* support the Motorola 68000 architecture. IDA Pro provides excellent disassembly and cross-referencing for Amiga binaries, but it **cannot** generate C pseudocode for them.

If C pseudocode generation is a strict requirement for your workflow, use **Ghidra**:

1. Ghidra officially supports the 68000 architecture for both disassembly and its integrated decompiler.
2. Use the **[ghidra-amiga](https://github.com/BartmanAbyss/ghidra-amiga)** plugin by BartmanAbyss, which provides a HUNK loader, Amiga custom chipset register mappings, and OS library base tracking designed for the Ghidra decompiler engine.

---

@@ -67,6 +67,13 @@ If using the HUNK plugin:

- HUNK_SYMBOL entries become IDA names automatically
- HUNK_RELOC32 entries become IDA fixups

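In practice, a fixup works like this: for each `HUNK_RELOC32` entry, the loader adds the runtime base address of the target hunk to the longword stored at the given offset. The following is a minimal sketch under three assumptions. `reloc_body` holds the longwords that follow the `$3EC` tag, as groups of count, target hunk, and offsets, terminated by a zero count. `hunk_bases` lists the chosen load addresses. Other relocation hunk types are ignored. The function name is hypothetical.

```python
import struct
from typing import List


def apply_reloc32(hunk: bytearray, reloc_body: bytes, hunk_bases: List[int]) -> int:
    """Apply one HUNK_RELOC32 block to hunk in place; return the number of fixups."""
    pos, applied = 0, 0

    def next_long() -> int:
        nonlocal pos
        (value,) = struct.unpack_from(">I", reloc_body, pos)
        pos += 4
        return value

    while True:
        count = next_long()
        if count == 0:                    # a zero count terminates the block
            return applied
        target = next_long()              # hunk the stored pointers refer to
        for _ in range(count):
            offset = next_long()          # byte offset of a longword in this hunk
            (old,) = struct.unpack_from(">I", hunk, offset)
            struct.pack_into(">I", hunk, offset, (old + hunk_bases[target]) & 0xFFFFFFFF)
            applied += 1


hunk = bytearray(struct.pack(">II", 0x4E754E75, 0x00000010))  # pointer at offset 4
reloc = struct.pack(">IIII", 1, 1, 4, 0)  # 1 fixup into hunk 1 at offset 4, then end
print(apply_reloc32(hunk, reloc, [0x20000, 0x30000]), hunk[4:].hex())
```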
> [!NOTE]
> **Alternative: Native Disassemblers**
> If you are working directly on an Amiga or via emulation (WinUAE), native tools are highly effective:
> - **Interactive disassemblers**: *ReSource* allows interactive tracing and is well aware of AmigaOS structures.
> - **Command-line disassemblers**: *IRA* (Interactive Reassembler) is excellent for generating reassemblable source code from HUNK binaries.
> - **Assembler environments**: *AsmOne* provides a fully integrated debugging, disassembling, and patching environment.

### 2.2 Set Processor

`Options → General → Processor type = Motorola 680x0`

@@ -209,9 +216,23 @@ Look for timer/protection mechanisms:


---

## Limitations: The Decompilation Problem

Decompilation (generating C/C++ source code from assembly) is a common modern RE workflow with Hex-Rays or Ghidra. Even where a 68k decompiler such as Ghidra's is available, the Amiga ecosystem presents severe challenges for it:

- **Heavy reliance on hand-written assembly**: Many Amiga games and demos avoided C compilers entirely. Decompiling highly optimized 68000 assembly that programs the custom chip registers directly yields poor, unreadable C.
- **Custom calling conventions**: Unlike modern standard ABIs (e.g., cdecl, fastcall), Amiga software frequently used register-based arguments (e.g., D0–D1 for data, A0–A1 for pointers) tailored to specific routines.

> [!NOTE]
> **Historical Context**
> Straightforward decompilation of Amiga games is largely a myth. Successful "decompilation" projects, such as Tom Morton's *GLFrontier* (a port of the Atari ST/Amiga game *Frontier*), rely heavily on custom-built decompilation solutions tailored precisely to the game's specific binary patterns, rather than generic tools.

---

## References

- [ida_setup.md](ida_setup.md) — IDA configuration details
- [compiler_fingerprints.md](compiler_fingerprints.md) — compiler identification
- [lvo_table.md](../04_linking_and_libraries/lvo_table.md) — complete LVO tables
- [code_vs_data_disambiguation.md](static/code_vs_data_disambiguation.md) — distinguishing code bytes from data
- NDK39: all `.fd` and `include/` files

128
05_reversing/scripts/ida9_amiga_aga.py
Normal file
@@ -0,0 +1,128 @@
import ida_name
import ida_segment
import ida_bytes


def create_segment(name, start_ea, size, sclass="DATA"):
    end_ea = start_ea + size
    if not ida_segment.getseg(start_ea):
        ida_segment.add_segm(0, start_ea, end_ea, name, sclass)
        print(f"Created segment {name} at 0x{start_ea:X}")
    else:
        print(f"Segment already exists at 0x{start_ea:X}")


def define_registers():
    custom_base = 0xDFF000
    ciaa_base = 0xBFE000
    ciab_base = 0xBFD000

    create_segment("HW_CUSTOM", custom_base, 0x200, "HW")
    create_segment("HW_CIAA", ciaa_base, 0x1000, "HW")
    create_segment("HW_CIAB", ciab_base, 0x1000, "HW")

    # Advanced Graphics Architecture (AGA) Register Map
    custom_registers = {
        0x000: "BLTDDAT", 0x002: "DMACONR", 0x004: "VPOSR", 0x006: "VHPOSR",
        0x008: "DSKDATR", 0x00A: "JOY0DAT", 0x00C: "JOY1DAT", 0x00E: "CLXDAT",
        0x010: "ADKCONR", 0x012: "POT0DAT", 0x014: "POT1DAT", 0x016: "POTGOR",
        0x018: "SERDATR", 0x01A: "DSKBYTR", 0x01C: "INTENAR", 0x01E: "INTREQR",
        0x020: "DSKPTH", 0x022: "DSKPTL", 0x024: "DSKLEN", 0x026: "DSKDAT",
        0x028: "REFPTR", 0x02A: "VPOSW", 0x02C: "VHPOSW", 0x02E: "COPCON",
        0x030: "SERDAT", 0x032: "SERPER", 0x034: "POTGO", 0x036: "JOYTEST",
        0x038: "STREQU", 0x03A: "STRVBL", 0x03C: "STRHOR", 0x03E: "STRLONG",
        0x040: "BLTCON0", 0x042: "BLTCON1", 0x044: "BLTAFWM", 0x046: "BLTALWM",
        0x048: "BLTCPTH", 0x04A: "BLTCPTL", 0x04C: "BLTBPTH", 0x04E: "BLTBPTL",
        0x050: "BLTAPTH", 0x052: "BLTAPTL", 0x054: "BLTDPTH", 0x056: "BLTDPTL",
        0x058: "BLTSIZE", 0x05A: "BLTCON0L", 0x05C: "BLTSIZV", 0x05E: "BLTSIZH",
        0x060: "BLTCMOD", 0x062: "BLTBMOD", 0x064: "BLTAMOD", 0x066: "BLTDMOD",
        0x070: "BLTCDAT", 0x072: "BLTBDAT", 0x074: "BLTADAT", 0x078: "SPRHDAT",
        0x07C: "DENISEID", 0x07E: "DSKSYNC",
        0x080: "COP1LCH", 0x082: "COP1LCL", 0x084: "COP2LCH", 0x086: "COP2LCL",
        0x088: "COPJMP1", 0x08A: "COPJMP2", 0x08C: "COPINS",
        0x08E: "DIWSTRT", 0x090: "DIWSTOP", 0x092: "DDFSTRT", 0x094: "DDFSTOP",
        0x096: "DMACON", 0x098: "CLXCON", 0x09A: "INTENA", 0x09C: "INTREQ",
        0x09E: "ADKCON",
        # AUDIO
        0x0A0: "AUD0LCH", 0x0A2: "AUD0LCL", 0x0A4: "AUD0LEN", 0x0A6: "AUD0PER", 0x0A8: "AUD0VOL", 0x0AA: "AUD0DAT",
        0x0B0: "AUD1LCH", 0x0B2: "AUD1LCL", 0x0B4: "AUD1LEN", 0x0B6: "AUD1PER", 0x0B8: "AUD1VOL", 0x0BA: "AUD1DAT",
        0x0C0: "AUD2LCH", 0x0C2: "AUD2LCL", 0x0C4: "AUD2LEN", 0x0C6: "AUD2PER", 0x0C8: "AUD2VOL", 0x0CA: "AUD2DAT",
        0x0D0: "AUD3LCH", 0x0D2: "AUD3LCL", 0x0D4: "AUD3LEN", 0x0D6: "AUD3PER", 0x0D8: "AUD3VOL", 0x0DA: "AUD3DAT",
        # BITPLANES (AGA supports up to 8 bitplanes)
        0x0E0: "BPL1PTH", 0x0E2: "BPL1PTL", 0x0E4: "BPL2PTH", 0x0E6: "BPL2PTL",
        0x0E8: "BPL3PTH", 0x0EA: "BPL3PTL", 0x0EC: "BPL4PTH", 0x0EE: "BPL4PTL",
        0x0F0: "BPL5PTH", 0x0F2: "BPL5PTL", 0x0F4: "BPL6PTH", 0x0F6: "BPL6PTL",
        0x0F8: "BPL7PTH", 0x0FA: "BPL7PTL", 0x0FC: "BPL8PTH", 0x0FE: "BPL8PTL",
        0x100: "BPLCON0", 0x102: "BPLCON1", 0x104: "BPLCON2", 0x106: "BPLCON3",
        0x108: "BPL1MOD", 0x10A: "BPL2MOD", 0x10C: "BPLCON4",
        0x110: "BPL1DAT", 0x112: "BPL2DAT", 0x114: "BPL3DAT", 0x116: "BPL4DAT",
        0x118: "BPL5DAT", 0x11A: "BPL6DAT", 0x11C: "BPL7DAT", 0x11E: "BPL8DAT",
        # SPRITES
        0x120: "SPR0PTH", 0x122: "SPR0PTL", 0x124: "SPR1PTH", 0x126: "SPR1PTL",
        0x128: "SPR2PTH", 0x12A: "SPR2PTL", 0x12C: "SPR3PTH", 0x12E: "SPR3PTL",
        0x130: "SPR4PTH", 0x132: "SPR4PTL", 0x134: "SPR5PTH", 0x136: "SPR5PTL",
        0x138: "SPR6PTH", 0x13A: "SPR6PTL", 0x13C: "SPR7PTH", 0x13E: "SPR7PTL",
        # SPRITE DATA
        0x140: "SPR0POS", 0x142: "SPR0CTL", 0x144: "SPR0DATA", 0x146: "SPR0DATB",
        0x148: "SPR1POS", 0x14A: "SPR1CTL", 0x14C: "SPR1DATA", 0x14E: "SPR1DATB",
        0x150: "SPR2POS", 0x152: "SPR2CTL", 0x154: "SPR2DATA", 0x156: "SPR2DATB",
        0x158: "SPR3POS", 0x15A: "SPR3CTL", 0x15C: "SPR3DATA", 0x15E: "SPR3DATB",
        0x160: "SPR4POS", 0x162: "SPR4CTL", 0x164: "SPR4DATA", 0x166: "SPR4DATB",
        0x168: "SPR5POS", 0x16A: "SPR5CTL", 0x16C: "SPR5DATA", 0x16E: "SPR5DATB",
        0x170: "SPR6POS", 0x172: "SPR6CTL", 0x174: "SPR6DATA", 0x176: "SPR6DATB",
        0x178: "SPR7POS", 0x17A: "SPR7CTL", 0x17C: "SPR7DATA", 0x17E: "SPR7DATB",
        # COLOR PALETTE
        0x180: "COLOR00", 0x182: "COLOR01", 0x184: "COLOR02", 0x186: "COLOR03",
        0x188: "COLOR04", 0x18A: "COLOR05", 0x18C: "COLOR06", 0x18E: "COLOR07",
        0x190: "COLOR08", 0x192: "COLOR09", 0x194: "COLOR10", 0x196: "COLOR11",
        0x198: "COLOR12", 0x19A: "COLOR13", 0x19C: "COLOR14", 0x19E: "COLOR15",
        0x1A0: "COLOR16", 0x1A2: "COLOR17", 0x1A4: "COLOR18", 0x1A6: "COLOR19",
        0x1A8: "COLOR20", 0x1AA: "COLOR21", 0x1AC: "COLOR22", 0x1AE: "COLOR23",
        0x1B0: "COLOR24", 0x1B2: "COLOR25", 0x1B4: "COLOR26", 0x1B6: "COLOR27",
        0x1B8: "COLOR28", 0x1BA: "COLOR29", 0x1BC: "COLOR30", 0x1BE: "COLOR31",
        # ECS / AGA specific
        0x1C0: "HTOTAL", 0x1C2: "HSSTOP", 0x1C4: "HBSTRT", 0x1C6: "HBSTOP",
        0x1C8: "VTOTAL", 0x1CA: "VSSTOP", 0x1CC: "VBSTRT", 0x1CE: "VBSTOP",
        0x1DC: "BEAMCON0", 0x1DE: "HSSTRT", 0x1E0: "VSSTRT", 0x1E4: "DIWHIGH",
        0x1FC: "FMODE",
    }

    # 8520 CIA registers are 0x100 apart; register slot 0xB is not connected
    ciaa_registers = {
        0x001: "CIAA_PRA", 0x101: "CIAA_PRB", 0x201: "CIAA_DDRA", 0x301: "CIAA_DDRB",
        0x401: "CIAA_TALO", 0x501: "CIAA_TAHI", 0x601: "CIAA_TBLO", 0x701: "CIAA_TBHI",
        0x801: "CIAA_TODLO", 0x901: "CIAA_TODMID", 0xA01: "CIAA_TODHI",
        0xC01: "CIAA_SDR", 0xD01: "CIAA_ICR", 0xE01: "CIAA_CRA", 0xF01: "CIAA_CRB",
    }

    ciab_registers = {
        0x000: "CIAB_PRA", 0x100: "CIAB_PRB", 0x200: "CIAB_DDRA", 0x300: "CIAB_DDRB",
        0x400: "CIAB_TALO", 0x500: "CIAB_TAHI", 0x600: "CIAB_TBLO", 0x700: "CIAB_TBHI",
        0x800: "CIAB_TODLO", 0x900: "CIAB_TODMID", 0xA00: "CIAB_TODHI",
        0xC00: "CIAB_SDR", 0xD00: "CIAB_ICR", 0xE00: "CIAB_CRA", 0xF00: "CIAB_CRB",
    }

    count = 0

    # Custom Chips are 16-bit words mapped at DFFxxx
    for offset, name in custom_registers.items():
        addr = custom_base + offset
        ida_bytes.create_word(addr, 2)
        if ida_name.set_name(addr, name, ida_name.SN_CHECK):
            count += 1

    # CIAA registers are mapped to odd bytes
    for offset, name in ciaa_registers.items():
        addr = ciaa_base + offset
        ida_bytes.create_byte(addr, 1)
        if ida_name.set_name(addr, name, ida_name.SN_CHECK):
            count += 1

    # CIAB registers are mapped to even bytes
    for offset, name in ciab_registers.items():
        addr = ciab_base + offset
        ida_bytes.create_byte(addr, 1)
        if ida_name.set_name(addr, name, ida_name.SN_CHECK):
            count += 1

    print(f"Amiga AGA Script: Successfully mapped {count} hardware registers.")


if __name__ == "__main__":
    define_registers()
130
05_reversing/scripts/ida9_amiga_ecs.py
Normal file
@@ -0,0 +1,130 @@
import ida_name
import ida_segment
import ida_bytes


def create_segment(name, start_ea, size, sclass="DATA"):
    end_ea = start_ea + size
    if not ida_segment.getseg(start_ea):
        ida_segment.add_segm(0, start_ea, end_ea, name, sclass)
        print(f"Created segment {name} at 0x{start_ea:X}")
    else:
        print(f"Segment already exists at 0x{start_ea:X}")


def define_registers():
    custom_base = 0xDFF000
    ciaa_base = 0xBFE000
    ciab_base = 0xBFD000

    create_segment("HW_CUSTOM", custom_base, 0x200, "HW")
    create_segment("HW_CIAA", ciaa_base, 0x1000, "HW")
    create_segment("HW_CIAB", ciab_base, 0x1000, "HW")

    # Enhanced Chip Set (ECS) Register Map (Includes OCS)
    custom_registers = {
        0x000: "BLTDDAT", 0x002: "DMACONR", 0x004: "VPOSR", 0x006: "VHPOSR",
        0x008: "DSKDATR", 0x00A: "JOY0DAT", 0x00C: "JOY1DAT", 0x00E: "CLXDAT",
        0x010: "ADKCONR", 0x012: "POT0DAT", 0x014: "POT1DAT", 0x016: "POTGOR",
        0x018: "SERDATR", 0x01A: "DSKBYTR", 0x01C: "INTENAR", 0x01E: "INTREQR",
        0x020: "DSKPTH", 0x022: "DSKPTL", 0x024: "DSKLEN", 0x026: "DSKDAT",
        0x028: "REFPTR", 0x02A: "VPOSW", 0x02C: "VHPOSW", 0x02E: "COPCON",
        0x030: "SERDAT", 0x032: "SERPER", 0x034: "POTGO", 0x036: "JOYTEST",
        0x038: "STREQU", 0x03A: "STRVBL", 0x03C: "STRHOR", 0x03E: "STRLONG",
        0x040: "BLTCON0", 0x042: "BLTCON1", 0x044: "BLTAFWM", 0x046: "BLTALWM",
        0x048: "BLTCPTH", 0x04A: "BLTCPTL", 0x04C: "BLTBPTH", 0x04E: "BLTBPTL",
        0x050: "BLTAPTH", 0x052: "BLTAPTL", 0x054: "BLTDPTH", 0x056: "BLTDPTL",
        0x058: "BLTSIZE",
        0x05A: "BLTCON0L", 0x05C: "BLTSIZV", 0x05E: "BLTSIZH",  # ECS
        0x060: "BLTCMOD", 0x062: "BLTBMOD", 0x064: "BLTAMOD", 0x066: "BLTDMOD",
        0x070: "BLTCDAT", 0x072: "BLTBDAT", 0x074: "BLTADAT",
        0x078: "SPRHDAT",  # ECS
        0x07C: "DENISEID",  # ECS
        0x07E: "DSKSYNC",
        0x080: "COP1LCH", 0x082: "COP1LCL", 0x084: "COP2LCH", 0x086: "COP2LCL",
        0x088: "COPJMP1", 0x08A: "COPJMP2", 0x08C: "COPINS",
        0x08E: "DIWSTRT", 0x090: "DIWSTOP", 0x092: "DDFSTRT", 0x094: "DDFSTOP",
        0x096: "DMACON", 0x098: "CLXCON", 0x09A: "INTENA", 0x09C: "INTREQ",
        0x09E: "ADKCON",
        # AUDIO
        0x0A0: "AUD0LCH", 0x0A2: "AUD0LCL", 0x0A4: "AUD0LEN", 0x0A6: "AUD0PER", 0x0A8: "AUD0VOL", 0x0AA: "AUD0DAT",
        0x0B0: "AUD1LCH", 0x0B2: "AUD1LCL", 0x0B4: "AUD1LEN", 0x0B6: "AUD1PER", 0x0B8: "AUD1VOL", 0x0BA: "AUD1DAT",
        0x0C0: "AUD2LCH", 0x0C2: "AUD2LCL", 0x0C4: "AUD2LEN", 0x0C6: "AUD2PER", 0x0C8: "AUD2VOL", 0x0CA: "AUD2DAT",
        0x0D0: "AUD3LCH", 0x0D2: "AUD3LCL", 0x0D4: "AUD3LEN", 0x0D6: "AUD3PER", 0x0D8: "AUD3VOL", 0x0DA: "AUD3DAT",
        # BITPLANES (ECS supports up to 6 bitplanes; BPL7/BPL8 are AGA-only)
        0x0E0: "BPL1PTH", 0x0E2: "BPL1PTL", 0x0E4: "BPL2PTH", 0x0E6: "BPL2PTL",
        0x0E8: "BPL3PTH", 0x0EA: "BPL3PTL", 0x0EC: "BPL4PTH", 0x0EE: "BPL4PTL",
        0x0F0: "BPL5PTH", 0x0F2: "BPL5PTL", 0x0F4: "BPL6PTH", 0x0F6: "BPL6PTL",
        0x100: "BPLCON0", 0x102: "BPLCON1", 0x104: "BPLCON2",
        0x106: "BPLCON3",  # ECS
        0x108: "BPL1MOD", 0x10A: "BPL2MOD",
        0x110: "BPL1DAT", 0x112: "BPL2DAT", 0x114: "BPL3DAT", 0x116: "BPL4DAT",
        0x118: "BPL5DAT", 0x11A: "BPL6DAT",
        # SPRITES
        0x120: "SPR0PTH", 0x122: "SPR0PTL", 0x124: "SPR1PTH", 0x126: "SPR1PTL",
        0x128: "SPR2PTH", 0x12A: "SPR2PTL", 0x12C: "SPR3PTH", 0x12E: "SPR3PTL",
        0x130: "SPR4PTH", 0x132: "SPR4PTL", 0x134: "SPR5PTH", 0x136: "SPR5PTL",
        0x138: "SPR6PTH", 0x13A: "SPR6PTL", 0x13C: "SPR7PTH", 0x13E: "SPR7PTL",
        # SPRITE DATA
        0x140: "SPR0POS", 0x142: "SPR0CTL", 0x144: "SPR0DATA", 0x146: "SPR0DATB",
        0x148: "SPR1POS", 0x14A: "SPR1CTL", 0x14C: "SPR1DATA", 0x14E: "SPR1DATB",
        0x150: "SPR2POS", 0x152: "SPR2CTL", 0x154: "SPR2DATA", 0x156: "SPR2DATB",
        0x158: "SPR3POS", 0x15A: "SPR3CTL", 0x15C: "SPR3DATA", 0x15E: "SPR3DATB",
        0x160: "SPR4POS", 0x162: "SPR4CTL", 0x164: "SPR4DATA", 0x166: "SPR4DATB",
        0x168: "SPR5POS", 0x16A: "SPR5CTL", 0x16C: "SPR5DATA", 0x16E: "SPR5DATB",
        0x170: "SPR6POS", 0x172: "SPR6CTL", 0x174: "SPR6DATA", 0x176: "SPR6DATB",
        0x178: "SPR7POS", 0x17A: "SPR7CTL", 0x17C: "SPR7DATA", 0x17E: "SPR7DATB",
        # COLOR PALETTE
        0x180: "COLOR00", 0x182: "COLOR01", 0x184: "COLOR02", 0x186: "COLOR03",
        0x188: "COLOR04", 0x18A: "COLOR05", 0x18C: "COLOR06", 0x18E: "COLOR07",
        0x190: "COLOR08", 0x192: "COLOR09", 0x194: "COLOR10", 0x196: "COLOR11",
        0x198: "COLOR12", 0x19A: "COLOR13", 0x19C: "COLOR14", 0x19E: "COLOR15",
        0x1A0: "COLOR16", 0x1A2: "COLOR17", 0x1A4: "COLOR18", 0x1A6: "COLOR19",
        0x1A8: "COLOR20", 0x1AA: "COLOR21", 0x1AC: "COLOR22", 0x1AE: "COLOR23",
        0x1B0: "COLOR24", 0x1B2: "COLOR25", 0x1B4: "COLOR26", 0x1B6: "COLOR27",
        0x1B8: "COLOR28", 0x1BA: "COLOR29", 0x1BC: "COLOR30", 0x1BE: "COLOR31",
        # ECS specific
        0x1C0: "HTOTAL", 0x1C2: "HSSTOP", 0x1C4: "HBSTRT", 0x1C6: "HBSTOP",
        0x1C8: "VTOTAL", 0x1CA: "VSSTOP", 0x1CC: "VBSTRT", 0x1CE: "VBSTOP",
        0x1DC: "BEAMCON0", 0x1DE: "HSSTRT", 0x1E0: "VSSTRT", 0x1E4: "DIWHIGH",
    }

    # 8520 CIA registers are 0x100 apart; register slot 0xB is not connected
    ciaa_registers = {
        0x001: "CIAA_PRA", 0x101: "CIAA_PRB", 0x201: "CIAA_DDRA", 0x301: "CIAA_DDRB",
        0x401: "CIAA_TALO", 0x501: "CIAA_TAHI", 0x601: "CIAA_TBLO", 0x701: "CIAA_TBHI",
        0x801: "CIAA_TODLO", 0x901: "CIAA_TODMID", 0xA01: "CIAA_TODHI",
        0xC01: "CIAA_SDR", 0xD01: "CIAA_ICR", 0xE01: "CIAA_CRA", 0xF01: "CIAA_CRB",
    }

    ciab_registers = {
        0x000: "CIAB_PRA", 0x100: "CIAB_PRB", 0x200: "CIAB_DDRA", 0x300: "CIAB_DDRB",
        0x400: "CIAB_TALO", 0x500: "CIAB_TAHI", 0x600: "CIAB_TBLO", 0x700: "CIAB_TBHI",
        0x800: "CIAB_TODLO", 0x900: "CIAB_TODMID", 0xA00: "CIAB_TODHI",
        0xC00: "CIAB_SDR", 0xD00: "CIAB_ICR", 0xE00: "CIAB_CRA", 0xF00: "CIAB_CRB",
    }

    count = 0

    # Custom Chips are 16-bit words mapped at DFFxxx
    for offset, name in custom_registers.items():
        addr = custom_base + offset
        ida_bytes.create_word(addr, 2)
        if ida_name.set_name(addr, name, ida_name.SN_CHECK):
            count += 1

    # CIAA registers are mapped to odd bytes
    for offset, name in ciaa_registers.items():
        addr = ciaa_base + offset
        ida_bytes.create_byte(addr, 1)
        if ida_name.set_name(addr, name, ida_name.SN_CHECK):
            count += 1

    # CIAB registers are mapped to even bytes
    for offset, name in ciab_registers.items():
        addr = ciab_base + offset
        ida_bytes.create_byte(addr, 1)
        if ida_name.set_name(addr, name, ida_name.SN_CHECK):
            count += 1

    print(f"Amiga ECS Script: Successfully mapped {count} hardware registers.")


if __name__ == "__main__":
    define_registers()
122
05_reversing/scripts/ida9_amiga_ocs.py
Normal file
@@ -0,0 +1,122 @@
import ida_name
import ida_segment
import ida_bytes


def create_segment(name, start_ea, size, sclass="DATA"):
    end_ea = start_ea + size
    if not ida_segment.getseg(start_ea):
        ida_segment.add_segm(0, start_ea, end_ea, name, sclass)
        print(f"Created segment {name} at 0x{start_ea:X}")
    else:
        print(f"Segment already exists at 0x{start_ea:X}")


def define_registers():
    custom_base = 0xDFF000
    ciaa_base = 0xBFE000
    ciab_base = 0xBFD000

    create_segment("HW_CUSTOM", custom_base, 0x200, "HW")
    create_segment("HW_CIAA", ciaa_base, 0x1000, "HW")
    create_segment("HW_CIAB", ciab_base, 0x1000, "HW")

    # Original Chip Set (OCS) Register Map
    custom_registers = {
        0x000: "BLTDDAT", 0x002: "DMACONR", 0x004: "VPOSR", 0x006: "VHPOSR",
        0x008: "DSKDATR", 0x00A: "JOY0DAT", 0x00C: "JOY1DAT", 0x00E: "CLXDAT",
        0x010: "ADKCONR", 0x012: "POT0DAT", 0x014: "POT1DAT", 0x016: "POTGOR",
        0x018: "SERDATR", 0x01A: "DSKBYTR", 0x01C: "INTENAR", 0x01E: "INTREQR",
        0x020: "DSKPTH", 0x022: "DSKPTL", 0x024: "DSKLEN", 0x026: "DSKDAT",
        0x028: "REFPTR", 0x02A: "VPOSW", 0x02C: "VHPOSW", 0x02E: "COPCON",
        0x030: "SERDAT", 0x032: "SERPER", 0x034: "POTGO", 0x036: "JOYTEST",
        0x038: "STREQU", 0x03A: "STRVBL", 0x03C: "STRHOR", 0x03E: "STRLONG",
        0x040: "BLTCON0", 0x042: "BLTCON1", 0x044: "BLTAFWM", 0x046: "BLTALWM",
        0x048: "BLTCPTH", 0x04A: "BLTCPTL", 0x04C: "BLTBPTH", 0x04E: "BLTBPTL",
        0x050: "BLTAPTH", 0x052: "BLTAPTL", 0x054: "BLTDPTH", 0x056: "BLTDPTL",
        0x058: "BLTSIZE",
        0x060: "BLTCMOD", 0x062: "BLTBMOD", 0x064: "BLTAMOD", 0x066: "BLTDMOD",
        0x070: "BLTCDAT", 0x072: "BLTBDAT", 0x074: "BLTADAT",
        0x07E: "DSKSYNC",
        0x080: "COP1LCH", 0x082: "COP1LCL", 0x084: "COP2LCH", 0x086: "COP2LCL",
        0x088: "COPJMP1", 0x08A: "COPJMP2", 0x08C: "COPINS",
        0x08E: "DIWSTRT", 0x090: "DIWSTOP", 0x092: "DDFSTRT", 0x094: "DDFSTOP",
        0x096: "DMACON", 0x098: "CLXCON", 0x09A: "INTENA", 0x09C: "INTREQ",
        0x09E: "ADKCON",
        # AUDIO
        0x0A0: "AUD0LCH", 0x0A2: "AUD0LCL", 0x0A4: "AUD0LEN", 0x0A6: "AUD0PER", 0x0A8: "AUD0VOL", 0x0AA: "AUD0DAT",
        0x0B0: "AUD1LCH", 0x0B2: "AUD1LCL", 0x0B4: "AUD1LEN", 0x0B6: "AUD1PER", 0x0B8: "AUD1VOL", 0x0BA: "AUD1DAT",
        0x0C0: "AUD2LCH", 0x0C2: "AUD2LCL", 0x0C4: "AUD2LEN", 0x0C6: "AUD2PER", 0x0C8: "AUD2VOL", 0x0CA: "AUD2DAT",
        0x0D0: "AUD3LCH", 0x0D2: "AUD3LCL", 0x0D4: "AUD3LEN", 0x0D6: "AUD3PER", 0x0D8: "AUD3VOL", 0x0DA: "AUD3DAT",
        # BITPLANES (OCS supports up to 6 bitplanes; BPL7/BPL8 are AGA-only)
        0x0E0: "BPL1PTH", 0x0E2: "BPL1PTL", 0x0E4: "BPL2PTH", 0x0E6: "BPL2PTL",
        0x0E8: "BPL3PTH", 0x0EA: "BPL3PTL", 0x0EC: "BPL4PTH", 0x0EE: "BPL4PTL",
        0x0F0: "BPL5PTH", 0x0F2: "BPL5PTL", 0x0F4: "BPL6PTH", 0x0F6: "BPL6PTL",
        0x100: "BPLCON0", 0x102: "BPLCON1", 0x104: "BPLCON2",
        0x108: "BPL1MOD", 0x10A: "BPL2MOD",
        0x110: "BPL1DAT", 0x112: "BPL2DAT", 0x114: "BPL3DAT", 0x116: "BPL4DAT",
        0x118: "BPL5DAT", 0x11A: "BPL6DAT",
        # SPRITES
        0x120: "SPR0PTH", 0x122: "SPR0PTL", 0x124: "SPR1PTH", 0x126: "SPR1PTL",
        0x128: "SPR2PTH", 0x12A: "SPR2PTL", 0x12C: "SPR3PTH", 0x12E: "SPR3PTL",
        0x130: "SPR4PTH", 0x132: "SPR4PTL", 0x134: "SPR5PTH", 0x136: "SPR5PTL",
        0x138: "SPR6PTH", 0x13A: "SPR6PTL", 0x13C: "SPR7PTH", 0x13E: "SPR7PTL",
        # SPRITE DATA
        0x140: "SPR0POS", 0x142: "SPR0CTL", 0x144: "SPR0DATA", 0x146: "SPR0DATB",
        0x148: "SPR1POS", 0x14A: "SPR1CTL", 0x14C: "SPR1DATA", 0x14E: "SPR1DATB",
        0x150: "SPR2POS", 0x152: "SPR2CTL", 0x154: "SPR2DATA", 0x156: "SPR2DATB",
        0x158: "SPR3POS", 0x15A: "SPR3CTL", 0x15C: "SPR3DATA", 0x15E: "SPR3DATB",
        0x160: "SPR4POS", 0x162: "SPR4CTL", 0x164: "SPR4DATA", 0x166: "SPR4DATB",
        0x168: "SPR5POS", 0x16A: "SPR5CTL", 0x16C: "SPR5DATA", 0x16E: "SPR5DATB",
        0x170: "SPR6POS", 0x172: "SPR6CTL", 0x174: "SPR6DATA", 0x176: "SPR6DATB",
        0x178: "SPR7POS", 0x17A: "SPR7CTL", 0x17C: "SPR7DATA", 0x17E: "SPR7DATB",
        # COLOR PALETTE
        0x180: "COLOR00", 0x182: "COLOR01", 0x184: "COLOR02", 0x186: "COLOR03",
        0x188: "COLOR04", 0x18A: "COLOR05", 0x18C: "COLOR06", 0x18E: "COLOR07",
        0x190: "COLOR08", 0x192: "COLOR09", 0x194: "COLOR10", 0x196: "COLOR11",
        0x198: "COLOR12", 0x19A: "COLOR13", 0x19C: "COLOR14", 0x19E: "COLOR15",
        0x1A0: "COLOR16", 0x1A2: "COLOR17", 0x1A4: "COLOR18", 0x1A6: "COLOR19",
        0x1A8: "COLOR20", 0x1AA: "COLOR21", 0x1AC: "COLOR22", 0x1AE: "COLOR23",
        0x1B0: "COLOR24", 0x1B2: "COLOR25", 0x1B4: "COLOR26", 0x1B6: "COLOR27",
        0x1B8: "COLOR28", 0x1BA: "COLOR29", 0x1BC: "COLOR30", 0x1BE: "COLOR31",
    }

    # 8520 CIA registers are 0x100 apart; register slot 0xB is not connected
    ciaa_registers = {
        0x001: "CIAA_PRA", 0x101: "CIAA_PRB", 0x201: "CIAA_DDRA", 0x301: "CIAA_DDRB",
        0x401: "CIAA_TALO", 0x501: "CIAA_TAHI", 0x601: "CIAA_TBLO", 0x701: "CIAA_TBHI",
        0x801: "CIAA_TODLO", 0x901: "CIAA_TODMID", 0xA01: "CIAA_TODHI",
        0xC01: "CIAA_SDR", 0xD01: "CIAA_ICR", 0xE01: "CIAA_CRA", 0xF01: "CIAA_CRB",
    }

    ciab_registers = {
        0x000: "CIAB_PRA", 0x100: "CIAB_PRB", 0x200: "CIAB_DDRA", 0x300: "CIAB_DDRB",
        0x400: "CIAB_TALO", 0x500: "CIAB_TAHI", 0x600: "CIAB_TBLO", 0x700: "CIAB_TBHI",
        0x800: "CIAB_TODLO", 0x900: "CIAB_TODMID", 0xA00: "CIAB_TODHI",
        0xC00: "CIAB_SDR", 0xD00: "CIAB_ICR", 0xE00: "CIAB_CRA", 0xF00: "CIAB_CRB",
    }

    count = 0

    # Custom Chips are 16-bit words mapped at DFFxxx
    for offset, name in custom_registers.items():
        addr = custom_base + offset
        ida_bytes.create_word(addr, 2)
        if ida_name.set_name(addr, name, ida_name.SN_CHECK):
            count += 1

    # CIAA registers are mapped to odd bytes
    for offset, name in ciaa_registers.items():
        addr = ciaa_base + offset
        ida_bytes.create_byte(addr, 1)
        if ida_name.set_name(addr, name, ida_name.SN_CHECK):
            count += 1

    # CIAB registers are mapped to even bytes
    for offset, name in ciab_registers.items():
        addr = ciab_base + offset
        ida_bytes.create_byte(addr, 1)
        if ida_name.set_name(addr, name, ida_name.SN_CHECK):
            count += 1

    print(f"Amiga OCS Script: Successfully mapped {count} hardware registers.")


if __name__ == "__main__":
    define_registers()
605
05_reversing/static/ansi_c_reversing.md
Normal file
@@ -0,0 +1,605 @@
# ANSI C Reverse Engineering — Reconstructing C from m68k Assembly

## Overview

The vast majority of Amiga applications and libraries were written in C: SAS/C, GCC, VBCC, StormC, and Aztec C dominated the ecosystem from 1988 onward. Reversing C binaries means recognizing the **language semantics** underneath the compiler's code generation: struct field access patterns, switch-statement jump tables, `malloc`/`free` lifetime tracking, and control-flow reconstruction. Unlike hand-written assembly, C binaries leave a rich trail of standard-library calls, predictable stack-frame layouts, and relocatable data references. These serve as anchors for reconstructing the original source-level intent.

C reverse engineering on Amiga has one huge advantage over other platforms: **much of the OS itself is written in C**, with assembly for Exec and other hot paths. Nearly every data structure (`struct Task`, `struct MsgPort`, `struct IORequest`) is publicly documented in the NDK headers. When you see `MOVE.W $20(A0), D0` and A0 is a library base, offset `$20` is the `lib_OpenCnt` field. This tight coupling between disassembly patterns and known OS structures makes Amiga C RE uniquely tractable.

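Offsets like `$20` can be derived mechanically from the NDK definitions. The following is a minimal sketch. It assumes the usual m68k C layout rule: 16- and 32-bit members start on even addresses, while byte members are packed. The field lists transcribe `struct Node` and `struct Library` from `exec/nodes.h` and `exec/libraries.h`. `m68k_layout` is a hypothetical helper.

```python
import struct
from typing import Dict, List, Tuple

# (field name, struct format code): B/b = byte, H = UWORD, I = ULONG or 32-bit pointer
NODE = [("ln_Succ", "I"), ("ln_Pred", "I"), ("ln_Type", "B"), ("ln_Pri", "b"),
        ("ln_Name", "I")]
LIBRARY = [("lib_Flags", "B"), ("lib_pad", "B"), ("lib_NegSize", "H"),
           ("lib_PosSize", "H"), ("lib_Version", "H"), ("lib_Revision", "H"),
           ("lib_IdString", "I"), ("lib_Sum", "I"), ("lib_OpenCnt", "H")]


def m68k_layout(fields: List[Tuple[str, str]]) -> Tuple[Dict[str, int], int]:
    """Return ({field: offset}, total size) using 68k alignment (words on even addresses)."""
    offsets, pos = {}, 0
    for name, code in fields:
        size = struct.calcsize(">" + code)
        if size > 1 and pos % 2:
            pos += 1                     # pad to an even address
        offsets[name] = pos
        pos += size
    return offsets, pos + (pos % 2)      # struct size rounded up to even


# struct Library starts with an embedded struct Node (lib_Node, 14 bytes),
# so the Node fields can simply be prepended to the Library fields.
offsets, size = m68k_layout(NODE + LIBRARY)
for field in ("lib_IdString", "lib_Sum", "lib_OpenCnt"):
    print(f"{field:<14} ${offsets[field]:02X}")
print("sizeof(struct Library) =", size)
```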

```mermaid
graph TB
    subgraph "C Source (.c)"
        STRUCT["struct Gadget { ... }"]
        SWITCH["switch(expr) { case: }"]
        MALLOC["p = AllocMem(size, flags);"]
        LOOP["for(i=0; i<n; i++)"]
        CALLBACK["Hook callback"]
    end
    subgraph "m68k Disassembly"
        OFF["MOVE.L (A0), D0<br/>MOVE.L $1C(A0), D1<br/>→ struct field access at offsets"]
        JT["MOVE.W offset(PC,Dn.W), D0<br/>JMP (PC, D0.W)<br/>→ switch jump table"]
        ALLOC["JSR -$C6(A6)<br/>→ AllocMem() LVO"]
        DBCC["MOVEQ #n-1, D0<br/>loop: ... DBRA D0, loop"]
        HOOK["h_Entry: MOVE.L A2, -(SP)<br/>→ hook dispatch"]
    end
    STRUCT --> OFF
    SWITCH --> JT
    MALLOC --> ALLOC
    LOOP --> DBCC
    CALLBACK --> HOOK
```


---

## Architecture: C-to-Assembly Mapping

### The Standard C ABI on Amiga

| Concern | Convention | Notes |
|---|---|---|
| **Return value** | D0 (32-bit), D0/D1 (64-bit), or hidden pointer in A0 | Struct returns: caller allocates space, passes pointer in A0 |
| **Scratch registers** | D0, D1, A0, A1 | Caller-saved; callee may destroy |
| **Preserved registers** | D2–D7, A2–A6 | Callee must save/restore if used |
| **Frame pointer** | A5 (SAS/C) or A6 (GCC) | Used for local variable and argument access; GCC's `-fomit-frame-pointer` removes it |
| **Stack growth** | Toward lower addresses | `LINK A5, #-N` allocates N bytes |
| **Library base** | A6 | Set to library base before `JSR LVO(A6)` |
| **Argument passing** | Stack by default, pushed right-to-left, so the first argument is at `8(A5)` after `LINK A5`. Register-argument variants use D0/D1/A0/A1. OS library calls use the registers listed in the `.fd` files. | See [register_conventions.md](../../04_linking_and_libraries/register_conventions.md) |

|
||||
### SAS/C Calling Convention Variants
|
||||
|
||||
SAS/C supports multiple calling conventions within a single binary. Recognizing them is essential for correct function boundary and parameter analysis:
|
||||
|
||||
| Convention | Keyword | Register Save | Parameter Passing | Prologue Pattern |
|
||||
|---|---|---|---|---|
|
||||
| **Standard** | `__stdargs` (default) | D2–D7, A2–A5 | D0, D1, then stack (right-to-left) | `LINK A5, #-N` / `MOVEM.L D2-D7/A2-A4, -(SP)` |
|
||||
| **Register args** | `__reg` / `__regargs` | D2–D7, A2–A5 | First 2 integer args in D0, D1; rest on stack | Same as standard but D0/D1 hold parameters |
|
||||
| **Save all** | `__saveds` | D2–D7, A2–A6 (every non-scratch register) | All on stack | `MOVEM.L D2-D7/A2-A6, -(SP)` at entry — distinctive 13-register save |
|
||||
| **Interrupt** | `__interrupt` | D0–D7, A0–A6 (all registers) | All on stack | Full register save + `RTE` instead of `RTS` |
|
||||
| **No stack check** | `__no_stack_check` | Varies | Varies | Omits the stack overflow check call at function entry |
|
||||
|
||||
**How to identify in disassembly**:

```asm
; __saveds function (typical for hooks and callbacks that may be entered
; while A4 holds another program's value, e.g. when called by the OS):
_saveds_func:
        MOVE.L  A4, -(SP)           ; preserve caller's A4 (may be part of a larger MOVEM)
        LEA     _LinkerDB, A4       ; reload near-data base = __saveds signature
        ; ... function body: globals accessed as offset(A4) ...
        MOVE.L  (SP)+, A4           ; restore caller's A4
        RTS

; __regargs function (first args in registers):
_reg_func:
        LINK    A5, #-local_size
        MOVEM.L D2-D4, -(SP)        ; may save fewer registers
        ; D0/D1 = first two integer args, A0/A1 = first two pointer args
        ; ... function body ...
        MOVEM.L (SP)+, D2-D4
        UNLK    A5
        RTS

; __stdargs function (standard C calling convention):
_std_func:
        LINK    A5, #-local_size
        MOVEM.L D2-D7/A2-A4, -(SP)  ; standard SAS/C save set
        ; Arguments on stack: 8(A5) = arg1, 12(A5) = arg2, ...
```

### Compiler-Specific Frame Layouts

| Compiler | Frame Pointer | Prologue | Epilogue | String Addressing |
|---|---|---|---|---|
| **SAS/C 6.x** | A5 | `LINK A5, #-N` / `MOVEM.L D2-D7/A2-A4, -(SP)` | `MOVEM.L (SP)+, D2-D7/A2-A4` / `UNLK A5` / `RTS` | Absolute (`MOVE.L #string, D1`) |
| **GCC 2.95.x** | A6 (optional) | `LINK A6, #-N` or `SUBQ.L #N, SP` | `UNLK A6` / `RTS` or `ADDQ.L #N, SP` / `RTS` | PC-relative (`LEA string(PC), A0`) |
| **VBCC** | None (typical) | `MOVEM.L D2-D4, -(SP)` (only used regs) | `MOVEM.L (SP)+, D2-D4` / `RTS` | PC-relative |
| **StormC** | A5 | `LINK A5, #-N` | `UNLK A5` / `RTS` | Absolute (similar to SAS/C) |
| **Aztec C** | A5 | `LINK A5, #-N` / `MOVEM.L D3-D7, -(SP)` | `MOVEM.L (SP)+, D3-D7` / `UNLK A5` / `RTS` | Absolute |

### Common C Constructs → Assembly

<!-- TODO: Expand — comprehensive pattern catalog, 30+ constructs -->

| C Construct | Typical m68k Pattern |
|---|---|
| `x = y + z` | `MOVE.L y(FP), D0` / `ADD.L z(FP), D0` / `MOVE.L D0, x(FP)` |
| `if (cond)` | `TST.L cond` / `BEQ skip` |
| `if (!ptr)` | `MOVE.L ptr, D0` / `BEQ null_case` (the `MOVE` itself sets the Z flag) |
| `for (i=0; i<n; i++)` | `MOVEQ #0, D7` / `loop: ...` / `ADDQ.L #1, D7` / `CMP.L n, D7` / `BLT loop` |
| `while (*p++)` with `char *p` | `loop: TST.B (A0)+` / `BNE loop` (load, post-increment, and test in one instruction) |
| `switch (x)` | `CMP`/`Bcc` chain for sparse cases; for dense cases `ADD.W D0, D0` / `MOVE.W jt(PC, D0.W), D0` / `JMP jt(PC, D0.W)` |
| `struct->field` | `MOVE.L $offset(A0), D0`; the offset equals the sizes of the preceding fields plus any alignment padding |
| `array[i]` (32-bit elements) | `LEA array, A0` / `MOVE.L i, D0` / `ASL.L #2, D0` / `MOVE.L 0(A0, D0.L), D1` |
| `malloc(size)` → `AllocMem` | `MOVE.L size, D0` / `MOVE.L #MEMF_CLEAR, D1` / `JSR -$C6(A6)` with A6 = SysBase |
| `free(ptr)` → `FreeMem` | `MOVE.L ptr, A1` / `MOVE.L size, D0` / `JSR -$D2(A6)` with A6 = SysBase; unlike `free()`, `FreeMem()` needs the original size |
| `do { ... } while (cond)` | `loop: ...` / `TST cond` / `BNE loop` (test at bottom) |
| `goto label` | `BRA label` (unconditional) |
| `setjmp` / `longjmp` | `JSR _setjmp` / `JSR _longjmp`; saves/restores the callee-saved registers, SP, and the return address |
| `fn_ptr(args)` (function pointer call) | `MOVE.L fn_ptr, A0` / `JSR (A0)` |
| `printf(fmt, ...)` | Push args right-to-left, `JSR _printf`, then pop them (`ADDQ.L #n, SP` or `LEA n(SP), SP`); a direct call into the linked C runtime, not an LVO call |
| `strcpy` | `loop: MOVE.B (A0)+, (A1)+` / `BNE loop` (the `MOVE` sets Z when it copies the terminating NUL) |
| `memcpy` (large) | `MOVE.L (A0)+, (A1)+` / `SUBQ.L #1, D0` / `BNE loop` |
| `memset` (zero/pattern fill) | `MOVE.L D0, (A0)+` (or `CLR.L (A0)+` for zero fill) in a loop |
| `strncmp` | `CMPM.B (A0)+, (A1)+` / `DBNE D0, loop` (length-bounded); `strcmp` adds a NUL test inside the loop |
| `bsearch` / custom binary search | Midpoint calculation via `ADD.L`/`ASR.L`, compare, branch |
| `qsort` callback | Passes comparison function pointer; calls `JSR (A2)` per comparison |
| `static` local variable | Stored in the DATA hunk (BSS if zero-initialized), not on the stack; accessed via absolute, PC-relative, or A4-relative (small data) addressing |
| `const` global (read-only data) | May be placed in CODE hunk alongside instructions |
| `volatile` access | Every source-level access produces its own load or store; the value is never cached in a register between accesses |

### The BPTR: AmigaOS's Unique Pointer Type

**BPTR** (BCPL pointer) is a legacy from BCPL/TRIPOS that persists throughout AmigaOS. Understanding it is essential for DOS-related reverse engineering:

```c
/* BPTR definition from NDK headers:
 * A BPTR stores a longword-aligned address shifted right by 2 bits.
 * BADDR(bptr)   converts BPTR -> real address: bptr << 2
 * MKBADDR(addr) converts real address -> BPTR: addr >> 2
 */
#define BADDR(bptr)   ((APTR)((ULONG)(bptr) << 2))
#define MKBADDR(addr) ((BPTR)((ULONG)(addr) >> 2))
```

**In disassembly**:

```asm
; DOS call returning a BPTR (e.g., Lock() returns a BPTR file lock):
        JSR     -$54(A6)            ; Lock(name=D1, mode=D2), returns BPTR in D0
        ; D0 now contains a BPTR, NOT a usable address!
        MOVE.L  D0, lock_bptr(FP)   ; store BPTR

; To dereference a BPTR to access the underlying struct:
        MOVE.L  lock_bptr(FP), D0   ; D0 = BPTR
        LSL.L   #2, D0              ; D0 = real address (BPTR << 2)
        MOVE.L  D0, A0              ; A0 = real address of FileLock struct
        ; Now you can access A0->fl_Key, A0->fl_Volume, etc.

; To use this BPTR with another DOS call (here: release it when done):
        MOVE.L  lock_bptr(FP), D1   ; pass BPTR directly to Examine(), UnLock(), etc.
        JSR     -$5A(A6)            ; UnLock(lock=D1), accepts BPTR directly
```

**Key RE identification**:

- `LSL.L #2, Dn` before the value is used as an address = BPTR → address conversion (`BADDR`)
- `LSR.L #2, Dn` before a value is stored in a DOS structure = address → BPTR conversion (`MKBADDR`)
- BPTRs are passed directly to DOS library calls without conversion (the library does the conversion internally)
- Common BPTR uses: file handles (`Open()`), file and directory locks (`Lock()`), seglists (`LoadSeg()`), and many fields of `struct Process` and other DOS structures (e.g. `pr_CurrentDir`)

> [!WARNING]
> Mistaking a BPTR for a real pointer and dereferencing it without the `<< 2` conversion will access the wrong address — 4× lower than intended. This is one of the most common errors in Amiga C RE.

### AmigaOS-Specific C Patterns

| OS Pattern | Disassembly Signature |
|---|---|
| **Hook callback** (`struct Hook`) | `h_Entry: MOVE.L A2, -(SP)` / ... / `RTS` — A2=object, A1=message, A0=hook |
| **Tag list processing** (`TagItem *`) | `loop: MOVE.L (A0)+, D0` / `BEQ end` — iterate `ti_Tag`+`ti_Data` pairs until `TAG_DONE` (0) |
| **BPTR dereference** | `LSL.L #2, D0` (BPTR→address) or `LSR.L #2, D0` (address→BPTR) — see BPTR section above |
| **LVO dispatch** | `JSR -$XXX(A6)` — library vector table call; offset encodes function |
| **Forbid/Permit pairs** | `JSR -$84(A6)` (Forbid) / `JSR -$8A(A6)` (Permit) — critical section markers |
| **Signal wait loops** | `MOVE.L sigmask, D0` / `JSR -$13E(A6)` (Wait) — blocking on signal bits |
| **Message port patterns** | `JSR -$16E(A6)` (PutMsg) / `JSR -$174(A6)` (GetMsg) / `JSR -$17A(A6)` (ReplyMsg) / `JSR -$180(A6)` (WaitPort) |
| **Exec list traversal** | `MOVE.L (A0), A0` — follow `ln_Succ` (offset $00); `struct Node` / `struct List` iteration |
| **Device I/O** | `MOVE.L io, A1` / `JSR -$1C8(A6)` (DoIO) or `JSR -$1CE(A6)` (SendIO) + wait |
| **Resource tracking** | `JSR -$1F2(A6)` (OpenResource) followed by resource-specific dispatch |

---

## Decision Guide: C Binary Analysis Workflow

```mermaid
graph TD
    BIN["C binary loaded"]
    ID_COMPILER["Identify compiler<br/>(fingerprints)"]
    MAP_LIBS["Map library calls<br/>via LVO xrefs"]
    FIND_MAIN["Find main() or<br/>entry point"]
    RECONSTRUCT["Reconstruct types,<br/>structs, locals"]
    ANNOTATE["Annotate IDA database"]
    VERIFY["Dynamic verification<br/>(FS-UAE debugger)"]

    BIN --> ID_COMPILER
    ID_COMPILER --> MAP_LIBS
    MAP_LIBS --> FIND_MAIN
    FIND_MAIN --> RECONSTRUCT
    RECONSTRUCT --> ANNOTATE
    ANNOTATE --> VERIFY
```

### When to Use C-Focused RE vs Alternatives

| Scenario | Approach |
|---|---|
| Binary has `LINK A5` / `JSR LVO(A6)` patterns | Standard C RE (this article) |
| Binary has no library calls, direct hardware access | See [asm68k_binaries.md](asm68k_binaries.md) |
| Binary has `__vtbl` references, `new`/`delete` patterns | See [cpp_vtables_reversing.md](cpp_vtables_reversing.md) |
| Binary is from AMOS, Blitz, or other non-C language | See [other_languages.md](other_languages.md) |
| Binary is packed/crunched | Unpack first; then re-evaluate |
| Binary is a shared library (.library) | Standard C RE + library structure analysis (RomTag, JMP table, MakeLibrary) |

---

## Methodology

### Phase 1: Compiler Identification

<!-- TODO: Expand — detailed cross-reference to compiler_fingerprints.md, compiler-specific prologue/epilogue catalogs, IDA Python script to auto-detect compiler from function entry patterns -->

Before anything else, determine the compiler. The register conventions, string addressing mode, and library call patterns differ substantially between SAS/C, GCC, and VBCC. See [compiler_fingerprints.md](../compiler_fingerprints.md) and [m68k_codegen_patterns.md](m68k_codegen_patterns.md) for the complete catalog.

### Phase 2: Library Call Anchoring

<!-- TODO: Expand — library call mapping walkthrough, LVO resolution, FD file cross-referencing -->

Every `JSR LVO(A6)` is an anchor point. Cross-reference the LVO offset against the [exec LVO table](../../14_references/exec_lvo_table.md) or [dos LVO table](../../14_references/dos_lvo_table.md). Once you know the function:

- **Input**: the registers loaded just before the call (the FD file names each one) tell you the parameter types
- **Output**: D0 return value tells you what was computed
- **Context**: The surrounding code tells you *why* the call was made

### Phase 3: Struct Reconstruction

<!-- TODO: Expand — struct layout inference, offset pattern recognition, IDA struct type creation workflow -->

C struct access patterns are systematic: `MOVE.L $08(A0), D0` then `MOVE.L $0C(A0), D1`. Repeated, non-overlapping offsets from the same base register suggest struct fields, and the operand size (`.B`, `.W`, `.L`) suggests each field's width. See [struct_recovery.md](struct_recovery.md) for the complete methodology.

### Phase 4: Call Graph Reconstruction

<!-- TODO: Expand — function boundary identification, call tree, indirect call resolution, library call grouping by library -->

- **Identify function boundaries**: LINK/UNLK pairs, SUBQ/ADDQ pairs, or standalone RTS-terminated blocks
- **Build caller-callee matrix**: Every JSR target becomes a node; every JSR source is an edge
- **Resolve indirect calls**: `JSR (A0)` where A0 was loaded from a vtable or function pointer table
- **Group by library context**: Which library is A6 set to before each JSR LVO block?

### Phase 5: Type Inference

<!-- TODO: Expand — pointer vs integer disambiguation, signed/unsigned detection, struct pointer typing from field access patterns -->

- **Pointer vs integer**: A value used as a base register for offset addressing is a pointer. A value only used in arithmetic is an integer.
- **Signed vs unsigned**: `BLT`/`BGE`/`BGT`/`BLE` after a compare = signed; `BCS`/`BCC`/`BHI`/`BLS` = unsigned
- **Struct pointer typing**: Consistent offset patterns (+$00, +$04, +$08...) with known library struct sizes reveal the type
- **BPTR detection**: `LSL.L #2` before the value is used as an address = BPTR (BCPL pointer)

### Phase 6: Dynamic Verification

<!-- TODO: Expand — FS-UAE debugger: verify struct sizes, check library base values, confirm function parameter counts, breakpoint on AllocMem to track allocations -->

---

## Tool-Specific Workflows

<!-- TODO: Expand — detailed walkthroughs for each tool -->

### IDA Pro

<!-- TODO: IDA-specific: Hex-Rays decompiler output interpretation, struct type import from NDK headers, LVO enum creation, FLIRT signature generation for Amiga libraries -->

### Ghidra

<!-- TODO: Ghidra-specific: AmigaOS plugin setup, 68k decompiler quirks, struct import from C headers, script-based library call annotation -->

### FS-UAE Debugger

<!-- TODO: Dynamic verification methodology: breakpoint on specific LVOs, trace library call sequence, verify alloc/free pairing -->

---

## Best Practices

<!-- TODO: Numbered list of actionable recommendations -->

1. **Identify the compiler before anything else** — it determines your prologue/epilogue patterns, string addressing mode, and register conventions
2. **Map library calls first** — every `JSR LVO(A6)` is a documented function with known parameters; use this to type inputs and outputs
3. **Reconstruct structs from offset patterns** — consistent offset sequences reveal field layout
4. **Cross-reference NDK headers** — AmigaOS structs are publicly documented; match your discovered offsets to known structures
5. **Use the relocation table** — `HUNK_RELOC32` entries tell you exactly which longwords hold absolute addresses patched at load time, and which hunk each one points into
6. **Track A6 assignments** — each library call block sets A6 to a specific library base; identify which library is in use
7. **Decompile library calls to C prototypes** — rename `JSR -$C6(A6)` to `AllocMem()` in IDA, not `sub_1234`
8. **Verify with dynamic analysis** — breakpoint on suspicious code paths in FS-UAE to confirm your static analysis
9. **Document register conventions per function** — build a register map to catch type errors early
10. **Leverage symbol and debug hunks** — `HUNK_SYMBOL`, if present, gives you function and global names; compiler-specific `HUNK_DEBUG` blocks can add line numbers and more

---

## Antipatterns

<!-- TODO: Add named antipatterns with broken/fixed code pairs -->

### 1. The Global Confusion

**Wrong**: Treating every absolute address as a global variable.

**Why**: SAS/C uses absolute addressing for globals (relocated at load), GCC uses PC-relative, and some addresses are actually hardware registers. Confusing `$DFF000` with a C global variable leads to nonsense decompilation.

<!-- TODO: Add bad/good code pair -->

### 2. The Void Pointer Over-Generalization

**Wrong**: Marking all unknown pointers as `void *`.

**Why**: Without type information, you lose the ability to see struct field access patterns. A pointer that's always offset by `+$08`, `+$0C`, `+$1C` is almost certainly a typed struct pointer.

<!-- TODO: Add bad/good code pair -->

### 3. The Missing Return

**Wrong**: Assuming every `RTS` marks the end of a meaningful function.

**Why**: Compilers sometimes tail-duplicate, merge epilogues, or generate multiple return points. A single C function may produce 3–5 `RTS` instructions in the assembly.

<!-- TODO: Add bad/good code pair -->

### 4. The Single-Library Assumption

**Wrong**: Assuming A6 always holds the same library base throughout the program.

**Why**: Real C programs switch A6 between exec, dos, intuition, graphics, and custom libraries. A `JSR -$C6(A6)` at one point in the code may call `AllocMem` (exec), while the same `JSR -$C6(A6)` after an A6 switch calls something entirely different (with A6 = DOSBase it is `Delay()`). You must track A6 reloads.

<!-- TODO: Add bad/good code pair -->

### 5. The BPTR Blindness

**Wrong**: Treating `LSL.L #2, D0` followed by a memory access through the result as a puzzling bit-shift.

**Why**: BCPL legacy: AmigaOS uses BCPL pointers (BPTRs) for file handles, locks, and other DOS structures. A BPTR is an address divided by 4, and the `LSL.L #2` turns it back into a real longword-aligned address. Missing this means you misidentify what DOS API calls return.

<!-- TODO: Add bad/good code pair -->

### 6. The Tag List Blind Spot

**Wrong**: Seeing a loop that processes `(A0)+` pairs and dismissing it as a custom iterator.

**Why**: Tag lists (`TagItem` arrays of `ti_Tag`/`ti_Data` pairs terminated by `TAG_DONE=0`) are used pervasively in AmigaOS. This is one of the most common patterns in Amiga C and a strong indicator you're looking at an OS API call setup.

<!-- TODO: Add bad/good code pair -->

### 7. The Signal Confusion

**Wrong**: Assuming a `Wait()` call with a magic constant is waiting on a single event.

**Why**: Signal bits are allocated dynamically via `AllocSignal()`. A `MOVE.L #$00001000, D0` / `JSR -$13E(A6)` (Wait) doesn't tell you what it's waiting for unless you trace where that signal bit was allocated and who sends it.

<!-- TODO: Add bad/good code pair -->

### 8. The Inline Copy Assumption

**Wrong**: Identifying every `MOVE.L (A0)+, (A1)+` loop as a custom `memcpy`.

**Why**: Compilers inline `memcpy` for small fixed sizes, but the same pattern also appears in struct copy operations, array initialization, and DMA buffer filling. The context (source/destination, loop count, surrounding code) tells you which.

<!-- TODO: Add bad/good code pair -->

---

## Pitfalls

### 1. Register Variable Aliasing

<!-- TODO: Add worked example -->

SAS/C with `__register` or GCC with `register` keyword may keep variables in registers across function calls, breaking the standard "arguments go on stack" mental model.

### 2. Inlined `memcpy` / `strcpy`

<!-- TODO: Add worked example — distinguishing compiler-inlined copy from hand-written copy -->

Compilers often inline small copies as `MOVE.L (A0)+, (A1)+` loops. These look like custom struct copy routines but are really compiler-generated `memcpy`.

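### 3. Structure Padding

<!-- TODO: Add worked example — padding bytes creating misleading offset gaps -->

Amiga compilers follow the 68000 alignment rules by default: `UWORD`, `ULONG`, and pointer fields are aligned to even addresses only, not to multiples of 4. Padding therefore appears mainly when an odd number of `UBYTE` fields precedes a word or longword field. The classic example is `struct Node`, whose `ln_Name` pointer sits at offset `$0A`. Reversers who expect 4-byte alignment from other platforms miscount fields because of this; GCC's `-malign-int` option is the exception that switches to 32-bit alignment.
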
### 4. Compiler Optimizations That Break Pattern Recognition

<!-- TODO: Expand — tail-call optimization (JMP instead of JSR/RTS), loop unrolling, constant propagation, dead code elimination, function inlining -->

### 5. Library Base Switching

<!-- TODO: A6 changes between exec→dos→intuition→graphics; missing a reload means misidentifying every subsequent library call. Show how to detect A6 reloads (MOVE.L _IntuitionBase, A6). -->

### 6. Mixed C and Assembly in the Same Binary

<!-- TODO: Many Amiga programs use C for UI/logic and asm for performance. The boundary between C and asm sections requires switching analysis strategies. -->

### 7. SAS/C `__saveds` vs `__stdargs` vs `__reg` Calling Conventions

<!-- TODO: SAS/C supports multiple calling conventions within the same binary. `__saveds` (reloads the A4 near-data base at entry), `__stdargs` (default, stack arguments), `__reg` (register params). Each generates different prologue/epilogue patterns. -->

### 8. GCC `__asm__` Inline Assembly Blocks

<!-- TODO: GCC inline asm can inject hand-tuned instructions into otherwise standard C prologue/epilogue frames. These look like compiler bugs but are intentional optimizations. -->

### 9. BSS vs DATA Confusion

<!-- TODO: BSS (zero-initialized) vs DATA (explicitly initialized) globals look identical in disassembly but differ in initialization — important for understanding program state at startup. -->

### 10. CLI vs WB Startup Path

<!-- TODO: Programs compiled for CLI (argc/argv) vs Workbench (WBStartup message) have fundamentally different entry points. Reversing the wrong path leads to misunderstanding the program's initialization. -->

### 11. SAS/C `#pragma` Anomalies

<!-- TODO: SAS/C pragmas can change code generation (amiga-align, donotcombine, etc.). Without knowing the pragmas, some codegen choices look like bugs. -->

---

## Use-Case Cookbook

### Pattern 1: Identifying `main()` Across Compilers

<!-- TODO: Step-by-step — entry point chain (startup code → main), SAS/C vs GCC vs VBCC main identification, argc/argv reconstruction. IDA Python script. -->

### Pattern 2: Reconstructing a `struct List` Traversal

<!-- TODO: Step-by-step — MinList/List identification from ln_Succ/ln_Pred offsets, loop pattern recognition, IDA struct type application. -->

### Pattern 3: Mapping AllocMem/FreeMem Pairs to Find Memory Leaks

<!-- TODO: Step-by-step — cross-reference all AllocMem calls, trace the returned pointer through the function, find matching FreeMem, flag unpaired allocations. IDA Python script. -->

### Pattern 4: Recovering `switch` Statement Cases

<!-- TODO: Step-by-step — dense vs sparse switch detection, jump table extraction, case value reconstruction from CMP chains, IDA/Ghidra switch idioms. -->

### Pattern 5: Reconstructing a Hook Callback Dispatch Chain

<!-- TODO: Step-by-step — identify Hook struct, trace h_Entry to the actual callback, map the callback parameters (A0=Hook*, A2=object, A1=message), identify what triggers the hook. -->

### Pattern 6: Identifying Open/Close Resource Pairs

<!-- TODO: Step-by-step — OpenLibrary paired with CloseLibrary, OpenDevice with CloseDevice, OpenResource with... nothing (resources are never closed). Tracking resource lifetimes. -->

### Pattern 7: Recovering the Startup Code Chain

<!-- TODO: Step-by-step — tracing from HUNK_HEADER entry to main(): startup module init (_StdOpen, _StdClose, etc.), CLI vs WB branch, argument setup, finding the real program logic. -->

### Pattern 8: Tracing Tag List Construction

<!-- TODO: Step-by-step — identifying TagItem arrays in DATA/BSS, mapping ti_Tag values to NDK constants (TAG_xxx), understanding what the tag list configures. -->

### Pattern 9: Decompiling Device I/O Sequences

<!-- TODO: Step-by-step — CreateIORequest → OpenDevice → DoIO/SendIO+WaitIO+CheckIO → CloseDevice → DeleteIORequest lifecycle reconstruction. -->

### Pattern 10: Reconstructing a Message Port Protocol

<!-- TODO: Step-by-step — CreateMsgPort → PutMsg/GetMsg → signal-based Wait → ReplyMsg → DeleteMsgPort; identifying the message format (custom struct with embedded Message). -->

### Pattern 11: Identifying SAS/C `__saveds` Functions

<!-- TODO: Step-by-step — `__saveds` prologue pattern (A4 reload via LEA _LinkerDB, A4), typical use in callback hooks and other entry points called from outside the program, how to recognize and rename. -->

### Pattern 12: Differentiating `printf` Variants from Disassembly

<!-- TODO: Step-by-step — printf/Printf/kprintf/RawDoFmt/VPrintf — each has a different parameter convention, stack layout, and output destination. Identify which is which from the call site. -->

---

## Real-World Examples

<!-- TODO: Reference specific Amiga C applications with documented RE findings -->

### Applications

<!-- TODO: Directory Opus (SAS/C) — module system, ARexx integration; FinalWriter (SAS/C) — large C codebase with custom memory management; AmigaAMP (GCC) — plugin architecture, decoder interface; SimpleMail (VBCC) — lightweight compiled C; MUI applications (various compilers) — BOOPSI-based OOP in C. -->

### Libraries

<!-- TODO: reqtools.library, asl.library — C libraries with public FD files, good for learning library structure RE; mpega.library — GCC-compiled, FPU-heavy, decoder pipeline reconstruction. -->

### Games

<!-- TODO: "Settlers" (SAS/C) — large C game with custom blitter routines; "Frontier: Elite II" (hand-ported C) — mixed C and assembly; "Worms" (various compilers) — C core with assembly effects. -->

---

## Cross-Platform Comparison

| Platform | C RE Challenge | Amiga Difference |
|---|---|---|
| **DOS (Watcom/Borland)** | Segment juggling, near/far pointers | Amiga flat 32-bit address space simplifies pointer tracking |
| **Mac OS (MPW C)** | A5-world jump table, segmented loader | Amiga A6 per-library base is more modular |
| **Unix (GCC)** | Position-independent code (PIC), PLT/GOT | Amiga executables are non-PIC; relocations are explicit |
| **Windows (MSVC)** | `__stdcall` vs `__cdecl`, SEH frames | Amiga C also mixes conventions (`__stdargs`, `__regargs`, register-based LVO calls), but has no SEH-style exception frames |
| **Embedded ARM** | Thumb interworking, constant pools | m68k has no Thumb equivalent; constants are inline |
| **Linux (GCC)** | vDSO, IFUNC resolvers, symbol versioning | Amiga OS calls are flat JMP table; no symbol versioning |
| **Classic Mac OS (CodeWarrior)** | Transition vectors, mixed 68k/PPC | Amiga 68k is simpler; no mixed-ISA binaries until PowerUP/WarpOS |

---

## Historical Context — Why C Dominated Amiga Development

<!-- TODO: Expand — SAS/C's dominance (1988+), late arrival of GCC (mid-1990s), VBCC as lightweight alternative, Aztec C's early role, StormC bringing IDE + C/C++ integration. The transition from assembly to C as hardware got faster (68020+ made C performance acceptable). The role of NDK headers + autodocs in making C the standard Amiga programming language. -->

---

## Modern Analogies

<!-- TODO: Expand — connect Amiga C RE to modern developer experience -->

| Amiga C Concept | Modern Analogy | Where It Holds / Breaks |
|---|---|---|
| LVO dispatch table | Dynamic linker PLT/GOT | Holds: indirect function call table; breaks: LVO is static ABI, PLT is runtime-resolved |
| A6 library base | `this` pointer / vtable dispatch | Holds: base register for method/function lookup; breaks: A6 is shared, `this` is per-object |
| TagItem arrays | Named parameters / option structs in C | Holds: extensible key-value config passing; breaks: TagItems are untyped until consumed |
| BPTR | Handle / opaque pointer | Holds: abstracted pointer type; breaks: BPTR carries encoding (>>2), modern handles are transparent |
| `OpenLibrary` with version | `dlopen` with version check | Holds: runtime library loading; breaks: Amiga libraries are shared singletons |
| `Forbid`/`Permit` | `spin_lock` / `mutex_lock` | Holds: critical section entry/exit; breaks: Forbid disables ALL multitasking, not just one resource |
| Resident modules (RomTag) | Shared library constructors / `.init_array` | Holds: auto-initialized code at load time; breaks: RomTags are persistent kernel objects |

---

## FAQ

### Q1: How do I tell SAS/C from GCC output without looking at strings?

<!-- TODO -->

### Q2: Why does this function have no `LINK` but accesses locals?

<!-- TODO: Frame-pointer omission (GCC -fomit-frame-pointer, VBCC), leaf functions, register-only functions. -->

### Q3: How do I recover the original struct field names?

<!-- TODO: Match offset patterns to NDK headers; when no header exists, derive from usage context. -->

### Q4: How do I identify which library A6 currently points to?

<!-- TODO: Trace the most recent `MOVE.L _LibBase, A6` or library open sequence; use `lib_Node.ln_Name` at base+$0A if known. -->

### Q5: Why are there two different calling conventions in the same binary?

<!-- TODO: SAS/C __saveds vs __stdargs vs __reg; GCC vs asm hand-tuned sections; mixed compiler object files. -->

### Q6: How do I identify `printf` format strings in DATA?

<!-- TODO: Format string signature (%s, %d, %lx), xref from _printf call, format string argument position in the call setup. -->

### Q7: How do I decompile a BPTR-based DOS call sequence?

<!-- TODO: BPTR → real address conversion, Lock/UnLock/Examine/ExNext loop, DosPacket vs library call distinction. -->

### Q8: What does `JSR -$1CE(A6)` mean without the LVO table?

<!-- TODO: Negative offset from A6 = library LVO call. Calculate which LVO from the offset; match to exec/dos/intuition LVO tables. -->

### Q9: How do I handle programs compiled with Lattice C (pre-SAS/C)?

<!-- TODO: Lattice C 3.x/4.x patterns, differences from SAS/C 5.x/6.x, Manx Aztec C similarities. -->

### Q10: How do I identify custom `AllocMem` wrappers?

<!-- TODO: Many programs wrap AllocMem in their own allocator (pool, zone, slab). Identify the wrapper by its AllocMem call, then trace all callers to map the allocation strategy. -->

### Q11: How do I tell if a struct is from the OS or custom?

<!-- TODO: Match field offsets against NDK struct definitions; custom structs won't match known OS layouts. OS structs have predictable field orders documented in the RKRM. -->

### Q12: What are the SAS/C `#pragma` directives that affect codegen?

<!-- TODO: amiga-align, donotcombine, amicall, stackextent, etc. — how to detect their effects in disassembly. -->

---

## FPGA / Emulation Impact

<!-- TODO: Expand — compiler-emitted code assumptions about CPU speed, self-modifying code in C (rare but exists via function pointer tables in writable memory), timing-dependent code generated by older compilers, 68000 vs 68020+ code path differences in compiler output. -->

---

## References
|
||||
|
||||
- [compiler_fingerprints.md](../compiler_fingerprints.md) — Compiler identification
|
||||
- [**Per-Compiler RE Field Manuals**](compilers/README.md) — In-depth per-compiler analysis:
|
||||
- [SAS/C](compilers/sasc.md) · [GCC](compilers/gcc.md) · [VBCC](compilers/vbcc.md) · [StormC](compilers/stormc.md) · [Aztec C](compilers/aztec_c.md) · [Lattice C](compilers/lattice_c.md) · [DICE C](compilers/dice_c.md)
|
||||
- [m68k_codegen_patterns.md](m68k_codegen_patterns.md) — Code generation idiom catalog
|
||||
- [struct_recovery.md](struct_recovery.md) — Struct layout reconstruction
|
||||
- [api_call_identification.md](api_call_identification.md) — Library call recognition
|
||||
- [asm68k_binaries.md](asm68k_binaries.md) — Hand-written assembly RE
|
||||
- [cpp_vtables_reversing.md](cpp_vtables_reversing.md) — C++ OOP RE
|
||||
- [register_conventions.md](../../04_linking_and_libraries/register_conventions.md) — m68k ABI on AmigaOS
|
||||
- [library_structure.md](../../04_linking_and_libraries/library_structure.md) — Library internals and JMP table layout
|
||||
- [startup_code.md](../../04_linking_and_libraries/startup_code.md) — CLI vs WB startup
|
||||
- [exec_lvo_table.md](../../14_references/exec_lvo_table.md) — exec.library LVO offsets
|
||||
- [dos_lvo_table.md](../../14_references/dos_lvo_table.md) — dos.library LVO offsets
|
||||
- *Amiga ROM Kernel Reference Manual: Libraries* — Function signatures
|
||||
- *Amiga ROM Kernel Reference Manual: Devices* — Device I/O protocol
|
||||
925
05_reversing/static/asm68k_binaries.md
Normal file
# Hand-Written Assembly Reverse Engineering — Pure m68k Binaries

## Overview

Unlike compiler-generated code with predictable prologues, frame-pointer conventions, and library-call idioms, hand-written 68000 assembly is **unconstrained**. The author may use any register for any purpose, invent ad-hoc calling conventions, self-modify code, or jump into the middle of instructions. This is the norm for Amiga demos, most pre-1990 games, trackmos, bootblock intros, and hardware-banging utilities — and it demands a fundamentally different reversing strategy than C/C++ binaries.

```mermaid
graph TB
    subgraph "Compiler Binary"
        CPROLOGUE["LINK A5, #-N<br/>MOVEM.L D2-D7,-(SP)"]
        CEXIT["UNLK A5<br/>RTS"]
        CLIB["JSR LVO(A6)<br/>predictable ABI"]
    end
    subgraph "Hand-Written Assembly"
        ACUSTOM["Custom calling convention<br/>any register = any purpose"]
        AJMP["JMP (A0) / JMP $1234.W<br/>opaque control flow"]
        ASMC["Self-modifying code<br/>move.w #imm, (next_insn+2)"]
        AHW["Hardware register banging<br/>MOVE.W D0, $DFF180"]
    end
    CPROLOGUE -.->|"absent"| ACUSTOM
    CEXIT -.->|"unpredictable"| AJMP
    CLIB -.->|"may skip OS"| AHW
```

---

## Architecture

### What Makes Hand-Written Assembly Different

| Trait | Compiler Output | Hand-Written Assembly |
|---|---|---|
| **Function boundaries** | `LINK`/`UNLK` or `SUBQ`/`ADDQ` pairs | No universal marker; code may flow into data |
| **Calling convention** | Standard ABI (A6=lib base, D0/D1=scratch, A0/A1=scratch) | Author-defined per routine; may repurpose any register |
| **Strings** | `dc.b "text",0` with cross-reference chains | May be XOR-obfuscated, embedded mid-instruction, or stored as bitmaps |
| **Library calls** | `JSR LVO(A6)`, with the library base loaded from a relocated global | May call via absolute address, JMP table, or custom trap |
| **Loop structures** | `DBcc Dn, label` (counted) or `TST/BEQ` (conditional) | May unroll completely, use address-range compares, or rely on raster timing |
| **Data embedding** | Separate `DATA` hunk | Routinely mixed with code; data tables inside branch-not-taken paths |

### Common Environments

- **Bootblock intros** (1024 bytes, including a 12-byte header): exec is running at entry (A6 = SysBase), but most intros take over the machine at once and bang the hardware directly
- **Trackmos / demos**: Often take over the system entirely; disable multitasking; use custom copper lists and blitter queues
- **Games (pre-1992)**: Usually bypass `graphics.library` for speed; hit hardware registers directly
- **Hardware drivers**: Heavy CIA/custom chip register manipulation; interrupt-driven
- **Virus / bootblock payloads**: Deliberately obfuscated; anti-debugging tricks
- **Cracktros / trainer menus**: Small (<4 KB), pre-launch patches to game code, often packed
- **Trackdisk loaders**: Custom DMA-driven disk reading; Rob Northen (RNC) loaders, raw MFM decoders
- **Non-HUNK binaries**: Raw absolute-load code at fixed addresses (e.g., `$C00000`, the A500 trapdoor "Slow RAM" range)
- **ROM-resident code**: Kickstart modules, expansion ROMs (DiagROM, SCSI controller firmware)
- **Self-relocating code**: Code that copies and patches itself to run at any address

### The Assembly Author's Toolkit — Common Patterns Across the Demoscene

These patterns recur across many hand-written Amiga productions. Recognizing them speeds up function identification and purpose deduction.

#### Hardware Base Pointer Convention

Many authors dedicate one address register to `$DFF000` for the entire program lifetime. The choice of register is often an **author fingerprint**:

| Register | Common Users | Notes |
|---|---|---|
| **A4** | Popular in demoscene productions | `LEA $DFF000, A4` at program start; all hardware writes use `MOVE.W Dn, offset(A4)`. Clashes with the A4 small-data base used by SAS/C in mixed C+asm code |
| **A5** | Some demos, trackmos | May conflict with SAS/C A5 frame pointer convention in mixed C+asm code |
| **A6** | Code that no longer calls OS libraries | Conflicts with the library-base convention, so it appears once the program has abandoned the OS (or reloads A6 around every library call) |

```asm
; The classic demoscene init pattern:
; Save OS state, take over the machine
    MOVE.W  $DFF01C, old_intena ; save INTENA state (read via INTENAR)
    MOVE.W  #$7FFF, $DFF09A     ; disable all interrupts
    MOVE.W  #$7FFF, $DFF09C     ; clear all interrupt requests
    LEA     $DFF000, A4         ; A4 = custom chip base for entire program
; Now all hardware writes are: MOVE.W D0, $XXX(A4)
; On exit, OR the saved value with #$8000 (SET bit) before writing it back to INTENA
```

#### Custom Register Offset Tables

Precomputed offset tables indexed by an effect number dispatch to effect handlers without a chain of compares:

```asm
; Effect dispatcher via offset table:
effect_dispatch:
    MOVE.W  effect_num(PC), D0
    ADD.W   D0, D0                        ; word index
    MOVE.W  effect_offsets(PC, D0.W), D0  ; offset relative to the table start
    JMP     effect_offsets(PC, D0.W)      ; table base + offset = handler
                                          ; (8-bit displacement: table must be close)
effect_offsets:
    DC.W    fx_plasma - effect_offsets
    DC.W    fx_rotozoom - effect_offsets
    DC.W    fx_vector3d - effect_offsets
    DC.W    fx_tunnel - effect_offsets
```
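
Disassemblers rarely follow this kind of dispatch automatically. Once the table address and entry count are known, computing the targets is simple. The count usually comes from a bounds check on the index before the dispatch. A minimal sketch, assuming `image` holds the hunk's bytes as loaded at address `base`; `resolve_word_offset_table` is a hypothetical helper:

```python
import struct


def resolve_word_offset_table(image, table_addr, count, base=0):
    """Return handler addresses for a table of signed 16-bit offsets,
    each relative to the start of the table (as in `DC.W fx - table`)."""
    start = table_addr - base
    targets = []
    for i in range(count):
        (delta,) = struct.unpack_from(">h", image, start + 2 * i)
        targets.append(table_addr + delta)
    return targets


# Synthetic example: 16 bytes of code, then a 4-entry table at offset 16.
image = bytes(16) + struct.pack(">4h", 8, 20, 40, -6)
print([hex(t) for t in resolve_word_offset_table(image, 16, 4)])
```

Feed the resulting addresses to the disassembler as code entry points. Then re-check the count: a count that is too large turns the bytes after the table into bogus targets.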

#### Cycle-Counted Sequences

Instruction sequences timed to exact 68000 CPU cycles for per-scanline effects:

```asm
; Color change per scanline, one COLOR00 write per iteration
; (68000 timings, no chip-bus wait states; A4 = $DFF000):
;   MOVE.W (A0)+, $180(A4) = 16 cycles
;   DBF D7, loop           = 10 cycles (taken), 14 cycles (count expired)
; A full scanline is ~227 color clocks / 454 CPU cycles on PAL,
; so this loop fits at most ~17 color writes per scanline (454 / 26)
raster_colors:
    MOVE.W  (A0)+, $180(A4)     ; write next color to COLOR00 ($DFF180)
    DBF     D7, raster_colors   ; 10 cycles when taken
```
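
The budget arithmetic in those comments generalizes. Sum the cycle costs of one loop iteration, then divide that into the cycles available per line. This sketch uses a hand-entered timing table (68000 instruction timings, zero wait states). It ignores chip-bus contention, which delays CPU accesses to chip memory and custom registers when DMA is busy. `color_writes_per_line` is a hypothetical helper.

```python
# 68000 cycle counts (no wait states) for instructions seen in raster loops.
CYCLES = {
    "MOVE.W (An)+,d16(An)": 16,
    "MOVE.W Dn,(An)": 8,
    "DBF (taken)": 10,
}
PAL_CYCLES_PER_LINE = 454  # ~227 color clocks x 2 CPU clocks each


def color_writes_per_line(loop_body, cycles_per_line=PAL_CYCLES_PER_LINE):
    """Cycle cost of one loop iteration and how many iterations fit in a scanline."""
    cost = sum(CYCLES[insn] for insn in loop_body)
    return cost, cycles_per_line // cost


print(color_writes_per_line(["MOVE.W (An)+,d16(An)", "DBF (taken)"]))  # (26, 17)
print(color_writes_per_line(["MOVE.W Dn,(An)"]))  # fully unrolled: (8, 56)
```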

#### MOVEM.L Bulk Save/Restore

60-byte register dumps (all of D0-D7/A0-A6) are standard in interrupt handlers that must leave the interrupted code untouched. Routines with a non-standard register contract save a narrower, unusual subset instead:

```asm
; Save D0-D7 and A0-A6 to stack (15 registers × 4 = 60 bytes):
    MOVEM.L D0-D7/A0-A6, -(SP)
    ; ... body of interrupt handler or complex effect ...
    MOVEM.L (SP)+, D0-D7/A0-A6
    RTE
```

#### Hand-Optimized Idioms That Confuse Disassemblers

| Idiom | What It Does | Disassembly Trap |
|---|---|---|
| `ADD.W Dn, Dn` | Same result as `ASL.W #1, Dn` (multiply by 2), but faster on the 68000 | IDA shows `ADD.W`; the shift intent is invisible |
| `SUB.L Dn, Dn` | Clears Dn, like `MOVEQ #0, Dn` (`SUB.W Dn, Dn` clears only the low word) | Same 32-bit result (SUB also clears X); the choice reveals author style |
| `OR.B Dn, Dn` / `Scc` chain | Compare Dn to zero, then set conditionally | Disassembler shows raw ops, not intent |
| `MOVE SR, Dn` | Save condition codes across branches | Used instead of recomputing flags; rare in compiler output. Privileged on 68010+, so such code traps in user mode on later CPUs |
| `SWAP Dn` / `MOVE.W Dn, ...` | Access upper word of 32-bit register | Common in 16-bit coordinate manipulation |
| `EXT.L Dn` | Sign-extend word to long | Indicates 16-bit signed value widening to 32-bit |
| `MOVEQ #0, Dn` over `CLR.L Dn` | Both clear Dn in 2 bytes; `MOVEQ` takes 4 cycles vs 6 for `CLR.L Dn` on the 68000 | Neither works on address registers (`SUBA.L An, An` does); a consistent preference is an author fingerprint |

### Control Flow Archetypes

<!-- TODO: Expand — Mermaid diagrams for each archetype -->

| Archetype | Signature Pattern | Typical In |
|---|---|---|
| **State machine via jump table** | `MOVE.W state(PC), D0` / `ADD.W D0, D0` / `MOVE.W jt(PC, D0.W), D0` / `JMP jt(PC, D0.W)` | Game AI, effect sequencers, menu systems |
| **VBlank-driven frame loop** | `MOVE.L $6C.W, old_vbl` / `MOVE.L #my_vbl, $6C.W` / main loop waits on flag set by VBlank | Demos, games, any framed application |
| **Copper-interrupt-driven** | `MOVE.L #copper_irq, $6C.W` (Level 3; the handler tests the COPER bit in INTREQR) / per-scanline effect changes | Raster bars, multiplexed sprites, palette splits |
| **Blitter-continuation via interrupt** | Enables the blitter-finished interrupt (INTENA bit BLIT, Level 3); the handler starts the next blit in the queue | Demos with complex blitter pipelines |
| **Custom event loop (no exec)** | Polling loop reading CIA / custom chip registers directly; no `Wait()` / `WaitPort()` | Games bypassing OS, bootblock intros |
| **Audio-driver callback chain** | Audio interrupts (Level 4, vector `$70`) fire when a channel has latched its buffer; the handler queues the next sample block | Sample streamers and software mixers; Protracker-style replayers usually tick from VBlank or a CIA timer instead |

#### Protracker Replayer — Reference Architecture

One of the most commonly found audio subsystems in Amiga binaries; understanding its internals saves hours of reverse engineering. The sketch below shows a simplified takeover-style installation (system-friendly replayers register their timer interrupt through `ciab.resource` instead):

```asm
; Standard Protracker replayer entry points:
;
; mt_init  — initialize replayer with module data pointer
; mt_music — call once per tick to advance pattern playback
; mt_end   — shutdown replayer, restore system state
;
; Registration pattern (CIA-B timer, machine already taken over):

    ; Save the old Level 6 vector and install the replayer interrupt
    MOVE.L  $78.W, old_level6       ; $78 = Level 6 autovector (CIA-B arrives as EXTER)
    MOVE.L  #mt_irq, $78.W
    ; Program CIA-B Timer A for the tempo:
    ; 125 BPM on PAL = 709379 Hz E-clock / 50 ticks/s = 14187 = $376B
    MOVE.B  #$7F, $BFDD00           ; CIA-B ICR: disable all CIA-B interrupt sources
    MOVE.B  #$6B, $BFD400           ; CIA-B Timer A low byte (TALO)
    MOVE.B  #$37, $BFD500           ; CIA-B Timer A high byte (TAHI)
    MOVE.B  #$11, $BFDE00           ; CIA-B CRA: force load + start, continuous mode
    MOVE.B  #$81, $BFDD00           ; CIA-B ICR: enable Timer A interrupt
    MOVE.W  #$E000, $DFF09A         ; INTENA: SET + INTEN (master) + EXTER

; The interrupt handler (mt_irq):
mt_irq:
    MOVEM.L D0-D7/A0-A6, -(SP)      ; save all registers
    TST.B   $BFDD00                 ; reading CIA-B ICR clears its interrupt flags
    BSR     mt_music                ; advance replayer state
    MOVE.W  #$2000, $DFF09C         ; INTREQ: clear EXTER (Level 6)
    MOVEM.L (SP)+, D0-D7/A0-A6      ; restore all registers
    RTE
```

**Key identification markers**:
- Writes to CIA-B registers (`$BFD400`/`$BFD500` Timer A, `$BFDD00` ICR, `$BFDE00` CRA): CIA timer setup
- `MOVE.L #handler, $78.W`: Level 6 interrupt vector installation (CIA-B interrupts arrive as EXTER, Level 6)
- `MOVEM.L D0-D7/A0-A6, -(SP)` in the handler — all registers saved (standard for audio ISRs)
- Audio register writes (`$DFF0A0`–`$DFF0DA`): AUDxLCH/LCL, LEN, PER, VOL, DAT
- Signature `mt_` or `_mt_` function names in HUNK_SYMBOL if available
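
The timer value is derived from two things. The first is the CIA E-clock, which is the CPU clock divided by 10: 709379 Hz on PAL and 715909 Hz on NTSC. The second is the Protracker tempo convention that 125 BPM means 50 ticks per second (ticks/s = BPM × 2 / 5). The sketch below computes the reload value and the low/high bytes written to TALO/TAHI. `cia_timer_for_bpm` is a hypothetical helper. It truncates the division, and an individual replayer may round differently.

```python
PAL_ECLOCK = 709379   # CIA E-clock in Hz (PAL CPU clock / 10)
NTSC_ECLOCK = 715909  # CIA E-clock in Hz (NTSC CPU clock / 10)


def cia_timer_for_bpm(bpm, eclock=PAL_ECLOCK):
    """Timer reload value for a Protracker-style tempo (125 BPM = 50 ticks/s).
    Returns (value, low_byte, high_byte)."""
    value = (eclock * 5) // (2 * bpm)   # eclock / (bpm * 2 / 5), truncated
    if not 0 < value <= 0xFFFF:
        raise ValueError("tempo out of range for a 16-bit CIA timer")
    return value, value & 0xFF, value >> 8


value, lo, hi = cia_timer_for_bpm(125)
print(f"{value} = ${value:04X}: TALO=${lo:02X}, TAHI=${hi:02X}")
```

Running it in reverse is often more useful during RE. A `$376B` written to a CIA-B timer is a strong hint of a PAL replayer running at 125 BPM.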

---

## Identification: Detecting Hand-Written Assembly

> [!WARNING]
> Skip this section if you already know the binary is hand-written. The identification rules are covered in [m68k_codegen_patterns.md](m68k_codegen_patterns.md) and [compiler_fingerprints.md](../compiler_fingerprints.md).

### Heuristics That Suggest Assembly

<!-- TODO: Expand — pattern catalog with IDA script snippets, binary scoring system -->

- **No `LINK` or `SUBQ.L #N,SP`** in the entire binary
- **No `JSR LVO(A6)` patterns** — library calls are `JSR absolute_address` or `JMP (table, Dn.W)`
- **Hardware register constants** (`$DFF000`–`$DFF1FE` custom chips, `$BFE001`–`$BFEF01` CIA-A, `$BFD000`–`$BFDF00` CIA-B) appear as absolute operands
- **`MOVEM.L` used aggressively** for per-routine save/restore with non-standard register sets
- **`RTE` anywhere in the code**: interrupt servers added through exec return with `RTS`, so an `RTE` usually means the author installed a raw exception or interrupt vector
- **`ORI #$0700, SR`** / `ANDI #$F8FF, SR` — direct interrupt level manipulation
- **`JMP (A0)` or `JSR (A0)`** with dynamically computed target — jump tables, state machines
- **`LEA offset(PC), An`** used for data tables rather than `MOVE.L #absolute_address, An` — PC-relative addressing for position-independent data
- **`STOP #$2xxx`** — wait for interrupt without OS involvement
- **`MOVE USP, An` / `MOVE An, USP`** — user stack pointer manipulation, almost never generated by compilers
- **`MOVEC`** (68010+) to/from VBR, SFC, DFC — supervisor-level register access
- **`RESET` instruction** — rarely used outside hand-written hardware init code

### Binary Scoring: Assembly Confidence

<!-- TODO: Add scoring table — each heuristic contributes points toward a "hand-written confidence" score -->
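
A crude version of the scoring idea can be automated by counting opcode words that compilers rarely emit. The weights and the per-rule cap below are invented for illustration and are not calibrated. The scan also looks at every aligned word, so operand and data words that happen to match (e.g. `$4E73` inside a table) inflate the counts. Treat the result as a triage hint only. `asm_score` is a hypothetical helper.

```python
import struct

# Hypothetical weights: positive points suggest hand-written code,
# negative points suggest a compiler (LINK-based stack frames).
OPCODE_RULES = [
    ("LINK",     lambda w: 0x4E50 <= w <= 0x4E57, -3),
    ("RTE",      lambda w: w == 0x4E73, 3),
    ("RESET",    lambda w: w == 0x4E70, 4),
    ("STOP",     lambda w: w == 0x4E72, 3),
    ("MOVE USP", lambda w: 0x4E60 <= w <= 0x4E6F, 4),
    ("MOVEC",    lambda w: w in (0x4E7A, 0x4E7B), 4),
]
MAX_HITS = 5  # cap each rule so one false-positive-rich table can't dominate


def asm_score(code):
    """Count suspicious opcode words in a code hunk and return (score, counts)."""
    counts = {name: 0 for name, _, _ in OPCODE_RULES}
    for off in range(0, len(code) - 1, 2):
        (word,) = struct.unpack_from(">H", code, off)
        for name, matches, _ in OPCODE_RULES:
            if matches(word):
                counts[name] += 1
    score = sum(weight * min(counts[name], MAX_HITS) for name, _, weight in OPCODE_RULES)
    return score, counts


print(asm_score(bytes.fromhex("4E55FFF84E5D4E75")))  # LINK A5,#-8 / UNLK A5 / RTS
```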

---

## Decision Guide: Choosing Your Approach

```mermaid
graph TD
    START["Binary loaded in IDA/Ghidra"]
    HAS_SYMBOLS{"Has HUNK_SYMBOL<br/>debug info?"}
    HAS_OS_CALLS{"Uses OS library<br/>calls?"}
    HAS_STRINGS{"Has readable<br/>strings?"}
    HAS_CRUNCHER{"Packed / crunched<br/>(PowerPacker, Imploder)?"}
    PURE_ASM["Pure assembly methodology"]

    START --> HAS_CRUNCHER
    HAS_CRUNCHER -->|"Yes"| UNPACK["Unpack first<br/>→ see exe_crunchers.md"]
    HAS_CRUNCHER -->|"No"| HAS_SYMBOLS
    HAS_SYMBOLS -->|"Yes"| NAMED["Name functions from symbols<br/>then trace logic"]
    HAS_SYMBOLS -->|"No"| HAS_OS_CALLS
    HAS_OS_CALLS -->|"Yes"| OS_ANCHOR["Anchor on library calls<br/>→ identify callers by xref"]
    HAS_OS_CALLS -->|"No"| HAS_STRINGS
    HAS_STRINGS -->|"Yes"| STR_ANCHOR["Anchor on string xrefs<br/>→ trace outward"]
    HAS_STRINGS -->|"No"| PURE_ASM
```

### When to Use Pure Assembly Methodology vs When to Fall Back

<!-- TODO: Expand — decision matrix -->

| Scenario | Recommended Approach |
|---|---|
| Binary has zero library calls, heavy custom registers | Pure assembly methodology (this article) |
| Binary has some library calls mixed with hardware banging | Hybrid: anchor on library xrefs first, then pure asm for hardware sections |
| Binary is packed/crunched | Unpack first, then re-evaluate |
| Binary has HUNK_SYMBOL debug info | Standard RE workflow with named functions |
| Binary is a ROM module (Kickstart) | ROM-specific workflow: start from the resident tags (`RT_MATCHWORD` = `$4AFC`) that exec scans for, and follow each `rt_Init` |

---

## Methodology

### Phase 1: Triage

1. **Dump hunk structure**: `hunkinfo` shows CODE/DATA/BSS layout and relocation entries. Raw binaries (no HUNK header) skip directly to step 7.
2. **Scan for hardware registers**: absolute hardware addresses are stored as big-endian longwords, so search the file for byte patterns such as `00 DF F0`/`00 DF F1` (custom chips), `00 BF Ex 01` (CIA-A) and `00 BF Dx 00` (CIA-B) rather than grepping for the text `$DFF`. A binary that touches `$DFF000`–`$DFF1FE` directly is almost certainly hand-written or a game bypassing the OS.
3. **Find the entry point**: a HUNK executable starts at the first byte of its first hunk (hunk 0); a ROM module is found through its resident tag (`RT_MATCHWORD` = `$4AFC`) and entered via `rt_Init`; a bootblock is read by Kickstart into an allocated buffer (no fixed address) and entered at byte offset 12, with A6 = SysBase and A1 = the trackdisk I/O request.
4. **Identify interrupt vectors**: look for writes to the absolute addresses `$64`–`$7C` (the Level 1–7 interrupt autovectors; `$60` is the spurious-interrupt vector). On 68010+ the vector table starts at the VBR rather than address 0, so careful code reads the VBR with `MOVEC` first. Hand-written binaries often overwrite these vectors directly.
5. **Detect cruncher/packer**: Scan for known decrunch stub signatures. Treat these as hints and confirm with a dedicated unpacker:

   | Cruncher | Signature Bytes (at or near start) | Notes |
   |---|---|---|
   | **PowerPacker** | `PP20` ID in crunched data files; crunched executables begin with a decrunch stub | Executables decrunch themselves; `powerpacker.library` is used by programs that load `PP20` data. The original size is stored with the crunched data |
   | **Imploder** | `IMP!` ID in imploded data (compatible variants use other IDs, e.g. `ATN!`) | Crunched executables begin with a decrunch stub; several Imploder-compatible variants exist |
   | **Shrinkler** | Context-mixing LZ; no fixed magic | Very high compression ratio; decrunch is slow (minutes on 7 MHz) |
   | **ByteKiller** | `BRA.S` over data, then `MOVEM.L` pattern | Simple LZ variant; common in 1988–1990 productions |
   | **CrunchMania** | `CrM!` (or `CrM2`) ID | One of the fastest decrunchers; popular for 4K intros |
   | **TetraPack** | Multi-part header | Compresses data+relocs separately |

6. **Check for overlay system**: Look for `HUNK_OVERLAY` or custom overlay loader at entry. The overlay manager swaps code segments from disk — the binary on disk is larger than what's in memory at any moment.
7. **Identify non-HUNK binary type**:
   - **Bootblock**: Exactly 1024 bytes (2 disk blocks): a `DOS` ID longword, a checksum and the root block number, then code from offset 12. Kickstart loads it into a buffer it allocates, not to a fixed address
   - **Absolute-load blob**: Loaded to a fixed address (e.g., `$C00000`, the A500 trapdoor "Slow RAM" range)
   - **ROM module**: Has `RT_MATCHWORD` resident tag; part of Kickstart or expansion ROM
   - **Trackmo loader**: The bootblock holds only a small custom loader, which reads the rest of the demo from raw tracks and bypasses AmigaDOS
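
Step 2 can be scripted. Absolute-long operands are stored big-endian after the opcode, so `MOVE.W #$0F00, $DFF180` contains the bytes `00 DF F1 80`. A word-aligned scan for longwords in the custom-chip and CIA ranges finds most direct hardware accesses. The sketch below works under those assumptions. The register-name tables are a small subset, and data that happens to contain such values will also match, so review each hit in the disassembler. Accesses through a base register (`MOVE.W D0, $180(A4)`) store only the 16-bit offset and are not found this way. The function names are hypothetical.

```python
import struct

CUSTOM_NAMES = {0x002: "DMACONR", 0x01C: "INTENAR", 0x040: "BLTCON0",
                0x058: "BLTSIZE", 0x080: "COP1LC", 0x088: "COPJMP1",
                0x096: "DMACON", 0x09A: "INTENA", 0x09C: "INTREQ",
                0x180: "COLOR00"}
# CIA registers, $100 apart: CIA-A on odd addresses, CIA-B on even addresses.
CIA_REGS = ["PRA", "PRB", "DDRA", "DDRB", "TALO", "TAHI", "TBLO", "TBHI",
            "TODLO", "TODMID", "TODHI", "unused", "SDR", "ICR", "CRA", "CRB"]


def name_hw_address(addr):
    """Name a custom-chip or CIA address, or return None."""
    if 0xDFF000 <= addr <= 0xDFF1FE:
        off = addr - 0xDFF000
        return CUSTOM_NAMES.get(off, f"custom+${off:03X}")
    if 0xBFE001 <= addr <= 0xBFEF01 and (addr & 0xFF) == 0x01:
        return "CIAA " + CIA_REGS[(addr >> 8) & 0xF]
    if 0xBFD000 <= addr <= 0xBFDF00 and (addr & 0xFF) == 0x00:
        return "CIAB " + CIA_REGS[(addr >> 8) & 0xF]
    return None


def scan_hw_refs(code):
    """Word-aligned scan for big-endian longwords that look like hardware addresses."""
    hits = []
    for off in range(0, len(code) - 3, 2):
        (value,) = struct.unpack_from(">I", code, off)
        name = name_hw_address(value)
        if name:
            hits.append((off, value, name))
    return hits


# MOVE.W #$0F00, $DFF180 followed by MOVE.B #$7F, $BFDD00
print(scan_hw_refs(bytes.fromhex("33FC0F0000DFF180" "13FC007F00BFDD00")))
```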

### Phase 2: Map Control Flow

- **Chase `JMP`/`JSR` chains** from entry point outward. Mark each reached address. When you stop finding new addresses, the unreachable remainder is potential data or SMC target.
- **Identify jump tables**: `JMP (A0, Dn.W)`, or `MOVE.W table(PC, Dn.W), D0` followed by `JMP table(PC, D0.W)`. Bound the number of entries from the range check on the index register that precedes the dispatch. IDA needs these jump tables specified manually.
- **Cross-reference data tables**: values loaded via `LEA table(PC), An`. These tables are often copper lists, sprite control words, or audio sample pointers.
- **Detect self-modifying code**: Any write (`MOVE`, `ADD`, `BSET`, ...) whose destination lies within the CODE hunk boundaries is an SMC candidate; an `LEA` pointing into code that is later used as a write destination is the usual setup. Flag and verify with dynamic analysis.
- **Identify interrupt service routines**: Trace from vector table addresses. ISRs installed directly in the vector table end with `RTE`, not `RTS` (interrupt servers added through exec return with `RTS`). They typically save/restore many registers at entry/exit.
- **Map copper list interactions**: `COP1LC`/`COP2LC` writes set the copper list pointers. The copper reloads from `COP1LC` at the start of every vertical blank, and jumps immediately when `COPJMP1`/`COPJMP2` (`$DFF088`/`$DFF08A`) is strobed. A `MOVE.L #new_list, $DFF080` (COP1LC write) therefore takes effect from the next frame; demos that switch lists mid-frame load `COP2LC` and strobe `COPJMP2`, often from inside the copper list itself.
- **Trace blitter wait loops**: `BTST #6, $DFF002` / `BNE wait` — the standard "wait for blitter" pattern, polling the blitter-busy bit (DMAB_BLTDONE, bit 14 of DMACONR, which is bit 6 of the byte at `$DFF002`). Careful code performs a dummy read of `$DFF002` first, because early Agnus revisions can report the busy bit late.
- **Flag unreachable code**: Code between `RTS`/`RTE`/`JMP` that isn't directly branched to — potential data, SMC target, or second-stage code loaded later.
- **Identify Level 3 interrupt chains**: Music replayers and blitter queues commonly hook into the vertical blank interrupt (Level 3). The handler dispatches to multiple subscribers — find the dispatch loop to understand the full interrupt architecture.

### Phase 3: Reconstruct Calling Conventions

- **Map per-routine register usage**: For each identified function, track:
  - Which registers are **preserved** (saved/restored via `MOVEM.L` or stack pushes). The `MOVEM.L` save mask encodes this explicitly.
  - Which registers are **destroyed** (modified without save). These are the function's scratch/output registers.
  - Which registers hold **input parameters**. Look for registers that are read before they are written.
  - Which registers hold **return values**. D0 is conventional even in hand-written code, but not guaranteed.
- **Identify custom ABIs**: Some authors consistently use, for example, A2 = data segment base, A3 = copper list cursor, A4 = hardware base (`$DFF000`), D7 = scratch counter. Such conventions tend to stay stable across a single author's body of work.
- **Build a register allocation map**: Color-coded table of which registers carry which meaning across the program. This is the single most valuable artifact for understanding hand-written asm.
- **Detect authorial fingerprints**: Consistent register conventions + coding idioms (e.g., always using `MOVEQ #0, Dn` over `CLR.L Dn`) suggest a single author or codebase reuse. This matters for provenance and for predicting conventions in unreversed sections.
- **Watch for `USP` manipulation**: `MOVE USP, An` / `MOVE An, USP` is almost never generated by compilers. It indicates the author is using the User Stack Pointer for a second stack (common in context-switching code, coroutines, or task systems).
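
The save mask can be decoded straight from the instruction's register-list extension word. For most addressing modes, bit 0 = D0 through bit 15 = A7. For the predecrement mode `-(An)` the order is reversed: bit 0 = A7 through bit 15 = D0. So `MOVEM.L D0-D7/A0-A6, -(SP)` assembles to `$48E7 $FFFE`, while the matching `MOVEM.L (SP)+, D0-D7/A0-A6` is `$4CDF $7FFF`. Here is a small decoder; the function name is hypothetical:

```python
REGS = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]


def movem_registers(mask, predecrement):
    """Decode a MOVEM register-list mask into register names (D0..A7 order).
    Normal modes: bit 0 = D0 ... bit 15 = A7.
    Predecrement -(An): reversed, bit 0 = A7 ... bit 15 = D0."""
    names = []
    for bit in range(16):
        if mask & (1 << bit):
            names.append(REGS[15 - bit] if predecrement else REGS[bit])
    return sorted(names, key=REGS.index)


print(movem_registers(0x3F3E, predecrement=True))  # MOVEM.L D2-D7/A2-A6, -(SP)
```

Comparing the decoded save and restore lists for each routine gives its preserved-register set directly. A mismatch between the two usually points at a deliberate output register or a bug.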

### Phase 4: Reconstruct Data Structures

<!-- TODO: Expand — struct reconstruction for non-C binaries -->

- **Copper list format**: every copper instruction is two 16-bit words. IR1 bit 0 = 0 marks a MOVE (register offset + value); IR1 bit 0 = 1 marks a WAIT or SKIP (beam position + compare masks, with IR2 bit 0 selecting WAIT = 0 or SKIP = 1)
- **Sprite control words**: `SPRxPOS`/`SPRxCTL` word pairs, attached sprite mode detection
- **Blitter minterm lookup tables**: Precomputed blitter operation descriptions
- **Audio sample tables**: Period/waveform pointer/volume structures for music replayers
- **Custom module formats**: Pattern data, sample lists, effect command tables for Protracker/Soundtracker variants
- **Bitmap/bitplane layouts**: Interleaved vs linear, planar depth detection from blitter source/dest usage
- **Custom BSS-like allocations**: Large zeroed regions used as frame buffers, audio buffers, or look-up tables

### Phase 5: Hardware Interaction Mapping

<!-- TODO: Expand — custom chip register usage analysis -->

For each custom chip register touched, document:
- **Which register** (address)
- **From where** (code location)
- **In what sequence** (interaction with other register writes)
- **Purpose** (deduced from context: blitter setup, copper list switch, audio start, sprite positioning)

Build a **hardware register access matrix** (the code addresses below are placeholders):

<!-- TODO: Add table template -->

| Register | Writes From | Reads From | Deduced Purpose |
|---|---|---|---|
| `$DFF040` (BLTCON0) | `$01234`, `$05678` | — | Blitter operation setup (channel enables + minterm) |
| `$DFF096` (DMACON) | `$00123` | none (write-only; state is read back via DMACONR at `$DFF002`) | DMA channel enable/disable |
| ... | ... | ... | ... |
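
Once the accesses have been collected, the matrix can be generated instead of typed. They might come from disassembler cross-references or an emulator trace. The collection step is not shown, and the input format below is an assumption. This sketch groups (code address, register, `'R'`/`'W'`) records into the table layout above and leaves the purpose column for the analyst. `access_matrix` and the small name table are hypothetical.

```python
from collections import defaultdict

REG_NAMES = {0xDFF002: "DMACONR", 0xDFF040: "BLTCON0", 0xDFF058: "BLTSIZE",
             0xDFF096: "DMACON", 0xDFF180: "COLOR00"}


def _fmt(pcs):
    """Format a set of code addresses as a sorted, comma-separated cell."""
    return ", ".join(f"`${pc:05X}`" for pc in sorted(pcs)) or "-"


def access_matrix(accesses):
    """accesses: iterable of (code_address, hw_address, 'R' or 'W') tuples."""
    seen = defaultdict(lambda: {"R": set(), "W": set()})
    for pc, hw, kind in accesses:
        seen[hw][kind].add(pc)
    rows = ["| Register | Writes From | Reads From | Deduced Purpose |",
            "|---|---|---|---|"]
    for hw in sorted(seen):
        name = REG_NAMES.get(hw, "?")
        rows.append(f"| `${hw:06X}` ({name}) | {_fmt(seen[hw]['W'])} | "
                    f"{_fmt(seen[hw]['R'])} | |")
    return "\n".join(rows)


print(access_matrix([(0x1234, 0xDFF040, "W"), (0x5678, 0xDFF040, "W"),
                     (0x0123, 0xDFF096, "W"), (0x4567, 0xDFF002, "R")]))
```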

### Phase 6: Annotate

<!-- TODO: Expand — IDA/Ghidra annotation workflow for asm binaries -->

- **Rename functions**: Descriptive names based on deduced purpose (`vbl_irq_handler`, `blitter_queue_submit`, `copper_list_build`)
- **Add comments**: Document register conventions at function entry, magic constants, hardware register purposes
- **Create struct types**: For custom data structures discovered in Phase 4
- **Mark non-code regions**: Force IDA/Ghidra to treat copper lists, sprite data, audio samples as data, not code
- **Cross-reference hardware registers**: Create named constants for all `$DFFxxx`/`$BFExxx`/`$BFDxxx` addresses in the database
- **Build a call graph**: Mermaid diagram of the full control flow for documentation

### Phase 7: Dynamic Verification

<!-- TODO: Expand — FS-UAE debugger methodology -->

- **Breakpoint on custom chip registers**: Verify that register writes occur at expected times
- **Watchpoint on memory buffers**: Confirm copper list format, audio sample layout
- **Trace mode**: Follow execution through a single frame to verify control flow reconstruction
- **Modify-and-test**: Patch the binary and run it — if it breaks, your understanding was incomplete
- **Compare static vs dynamic**: Does the code path you predicted match what actually executes?

---

## Tool-Specific Workflows

<!-- TODO: Expand — detailed walkthroughs for each tool -->

### IDA Pro

<!-- TODO: IDA-specific: HUNK loader quirks, auto-analysis overrides, scripting for jump table resolution, dealing with data-in-code sections, creating custom register name enums -->

### Ghidra

<!-- TODO: Ghidra-specific: Amiga plugin capabilities, 68k SLEIGH processor module limitations, script-based annotation, bookmarking hardware registers -->

### FS-UAE Debugger

<!-- TODO: FS-UAE debugger: attaching to running demo, breakpoints on custom chip addresses, memory watchpoints, trace output parsing, cycle-count verification -->

### Command-Line Pre-Analysis Pipeline

<!-- TODO: hunkinfo → custom Python scanner → IDA/Ghidra import workflow -->

---

## Best Practices

<!-- TODO: Numbered list of actionable recommendations -->

1. **Never assume the ABI** — document the actual calling convention before tracing callers
2. **Start from the entry point and work outward** — don't try to understand everything at once
3. **Identify hardware register usage before control flow** — knowing which chips are used narrows the purpose
4. **Treat every `MOVE` to an absolute address inside the code hunk as a potential self-modifying code write**, until proven otherwise
5. **Build a mermaid diagram of the control flow** — it reveals dead code, missing connections, and loop structures
6. **Cross-reference relocation entries with code** — relocs tell you which addresses matter
7. **Don't trust auto-analysis on mixed code/data sections** — manually define code/data boundaries
8. **Run the binary in an emulator** — some behaviors (self-modifying code paths, copper effects) are invisible in static analysis
9. **Look for known signatures first** — Protracker replayers, decrunch stubs, common macro libraries leave distinctive patterns
10. **Document your register map as you work** — it prevents costly re-analysis when you realize A3 was actually a struct pointer

---

## Antipatterns

### 1. The Compiler Assumption

**Wrong**: Assuming `A6` holds a library base, `D0`/`D1` are scratch, and `A0`/`A1` are pointer temps.

**Why it fails**: Hand-written code may use `A6` as a general-purpose data register, `D6` as a frame pointer, or any other non-standard assignment. The author may have declared their own calling convention documented nowhere.

<!-- TODO: Add bad/good code pair -->

### 2. The Prologue Scanner

**Wrong**: Scanning for `LINK A5` or `SUBQ.L #N,SP` to find function boundaries.

**Why it fails**: Hand-written assembly may have no standard function entry/exit markers. A routine might start with `MOVEM.L`, a label, or just fall through from the previous block.

<!-- TODO: Add bad/good code pair -->

### 3. The String Hop

**Wrong**: Assuming `LEA _string(PC), A0` means A0 points to a C string.

**Why it fails**: Hand-written code may use `LEA` to point to bytecode tables, sprite data, copper lists, or packed structures. The "string" might be a custom encoding.

<!-- TODO: Add bad/good code pair -->

### 4. The Register Reuse Confusion

**Wrong**: Assuming a register used in one context retains the same meaning throughout the program.

**Why it fails**: Hand-written asm aggressively reuses registers. The same D0 might be a loop counter in one block, an audio sample value in the next, and a scratch temporary in a third — all within 50 instructions. You must re-derive register meaning at each basic block.

<!-- TODO: Add bad/good code pair -->

### 5. The Disassembly Loop Trap

**Wrong**: Letting IDA's auto-analysis recursively disassemble from every possible entry point.

**Why it fails**: Mixed code/data sections cause IDA to decode data as instructions, creating phantom functions from copper lists or audio samples. This pollutes the symbol table with nonsense and obscures real control flow.

<!-- TODO: Add bad/good code pair -->

### 6. The Constant-as-Code Mistake

**Wrong**: Treating jump table offsets, copper list data, or sprite control words as instructions.

**Why it fails**: IDA/Ghidra cannot tell that the word pair `$0180, $0000` is a copper MOVE to COLOR00 rather than `BCLR D0, D0` followed by an `ORI.B` that swallows the next word as its immediate. Without manual intervention, hardware data tables get disassembled into garbage.

<!-- TODO: Add bad/good code pair -->

### 7. The One-Pass Delusion

**Wrong**: Attempting linear top-to-bottom analysis and expecting to understand everything on the first pass.

**Why it fails**: Hand-written asm often uses forward references, self-modifying code patched by an earlier init routine, or data tables that only make sense after you understand the code that consumes them. Reverse engineering is inherently iterative.

<!-- TODO: Add bad/good code pair -->

### 8. The MOVEM Black Box

**Wrong**: Treating `MOVEM.L D0-D7/A0-A6, -(SP)` / `MOVEM.L (SP)+, D0-D7/A0-A6` as opaque blocks.

**Why it fails**: Understanding which registers are saved and restored tells you the function's register contract. A routine that saves D5-D7/A4-A5 preserves those across its call — they likely carry important state (frame counter, hardware base pointer, data cursor).

<!-- TODO: Add bad/good code pair -->

---

## Pitfalls

### 1. Assuming the OS Is Present

<!-- TODO: Expand — add worked example from real bootblock/demo code -->

```asm
; This works on a running system:
    MOVE.L  4.W, A6             ; SysBase
    JSR     -552(A6)            ; OpenLibrary (LVO -$228)
```

```asm
; But in a trackmo that has taken over the machine, the OS may be shut down
; or its memory overwritten, so $4.W no longer leads to a usable SysBase.
; (In a bootblock exec is alive, but dos.library is not yet usable.)
; The code might be:
    MOVE.L  #$DFF000, A5        ; custom chip base, not SysBase
    JSR     _custom_init(PC)    ; custom initialization
```

### 2. Misreading Jump Tables

Hand-written jump tables frequently use PC-relative indirect jumps with custom offsets that IDA doesn't auto-resolve.

<!-- TODO: Add worked example — MOVE.W jt(PC, D0.W), D0 / JMP jt(PC, D0.W) walkthrough -->

### 3. Self-Modifying Code Deception

```asm
; The code you see is NOT what executes:
    MOVE.W  #$0010, next_insn+2 ; patch the immediate operand (absolute, relocated;
                                ; PC-relative EAs cannot be write destinations)
    ; ... other code: if the patch lands too close, the 68000 prefetch queue
    ;     (or the 68020+ instruction cache) may still hold the old bytes
next_insn:
    CMPI.W  #$0000, D0          ; actually executes as CMPI.W #$0010, D0
```

<!-- TODO: Expand with detection methodology — FS-UAE trace comparison, pattern scanning -->

### 4. Copper List Misidentification

Copper instructions are pairs of 16-bit words that disassemblers happily decode as 68k instructions:

```asm
; A copper list decoded as instructions by IDA:
;   DC.W $0180, $0000, $0182, $0FFF, $FFFF, $FFFE
;   → BCLR D0, D0 / ORI.B #$82, D0 / ... (garbage!)
;
; Correct interpretation (every copper instruction is two words):
;   $0180, $0000 = MOVE #$0000 to COLOR00 ($DFF180): black background
;   $0182, $0FFF = MOVE #$0FFF to COLOR01 ($DFF182): white
;   $FFFF, $FFFE = WAIT for an unreachable beam position: end of copper list
```

**Detection methodology**:
1. `COP1LC`/`COP2LC` writes give you the copper list address — start your data definition there
2. Copper instructions come in **pairs of 16-bit words**. IR1 (first word) encodes the register offset or the beam position; IR2 (second word) is the data or the compare masks.
3. **WAIT/SKIP**: IR1 bit 0 = 1. IR1 holds VP (bits 15–8) and HP (bits 7–1); IR2 holds the blitter-finished-disable bit (bit 15), the VP/HP compare masks (bits 14–8 and 7–1), and bit 0 selects WAIT (0) or SKIP (1).
4. **MOVE**: IR1 bit 0 = 0. IR1 is the register offset from `$DFF000` (e.g., `$0180` = COLOR00 at `$DFF180`); IR2 is the value to write.
5. A `$FFFF, $FFFE` pair terminates the list (a WAIT for a beam position that is never reached).
6. Mark the entire copper list address range as **data**, not code. Create an array of 4-byte copper instruction structs in IDA/Ghidra.
|
||||
|
||||
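Steps 2–5 translate directly into a small decoder. Below is a minimal Python sketch. It assumes the list has already been read as big-endian 16-bit words, treats `$FFFF, $FFFE` as the end marker, and reports HP as `IR1 & $FE`, the way it is usually written in copper source. `decode_copper` is an illustrative name.

```python
def decode_copper(words: list[int]) -> list[tuple]:
    """Decode copper instruction pairs into ("MOVE", addr, value),
    ("WAIT"/"SKIP", vp, hp, ve_mask, he_mask) or ("END",) tuples."""
    out = []
    for i in range(0, len(words) - 1, 2):
        ir1, ir2 = words[i], words[i + 1]
        if ir1 & 1 == 0:                       # MOVE: IR1 = register offset
            out.append(("MOVE", 0xDFF000 + (ir1 & 0x1FE), ir2))
        elif ir1 == 0xFFFF and ir2 == 0xFFFE:  # conventional end of list
            out.append(("END",))
            break
        else:                                  # WAIT (IR2 bit 0 = 0) or SKIP (= 1)
            kind = "SKIP" if ir2 & 1 else "WAIT"
            out.append((kind, ir1 >> 8, ir1 & 0xFE, (ir2 >> 8) & 0x7F, ir2 & 0xFE))
    return out

for insn in decode_copper([0x0180, 0x0000, 0x0182, 0x0FFF, 0x2C07, 0xFFFE, 0xFFFF, 0xFFFE]):
    print(insn)
```

Running this over the range found in step 1 also gives you the list length, which is exactly the span to convert into copper structs in step 6.
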
### 5. CIA Timer Code Confusion

CIA register access (`$BFE001`–`$BFEF01` for CIAA, on odd addresses; `$BFD000`–`$BFDF00` for CIAB, on even addresses) looks like any other memory access, but the TOD clock read sequence and timer control register patterns are distinctive:

```asm
; CIA-A Timer A setup (often used for timing in games/demos):
    MOVE.B  #$7F, $BFED01       ; CIA-A ICR: disable all CIA-A interrupt sources
    MOVE.B  #$81, $BFED01       ; CIA-A ICR: enable Timer A interrupt
    MOVE.B  #low_byte, $BFE401  ; CIA-A Timer A low byte
    MOVE.B  #high_byte, $BFE501 ; CIA-A Timer A high byte

; CIA-B Timer A setup (used by Protracker CIA replayers):
    MOVE.B  #$7F, $BFDD00       ; CIA-B ICR: disable all sources
    MOVE.B  #$81, $BFDD00       ; CIA-B ICR: enable Timer A
    MOVE.B  #lo, $BFD400        ; CIA-B Timer A low byte
    MOVE.B  #hi, $BFD500        ; CIA-B Timer A high byte
    MOVE.B  #$11, $BFDE00       ; CIA-B CRA: force load + start, continuous mode

; Common mistake: assuming every CIA-A read is keyboard input.
; MOVE.B $BFE801, D0 → CIAA TODLO, the low byte of the VSync-driven
;                      TOD counter (timing, not input).
; Keyboard data arrives in CIAA SDR (serial data register) at $BFEC01.
```

**Key CIA registers for RE identification**:

| Register | Address | Purpose |
|---|---|---|
| CIAA ICR | `$BFED01` | Interrupt Control Register — enables/disables CIA-A interrupts |
| CIAA Timer A Lo | `$BFE401` | Timer A low byte |
| CIAA Timer A Hi | `$BFE501` | Timer A high byte |
| CIAA SDR | `$BFEC01` | Serial data register (keyboard data arrives here) |
| CIAA CRA | `$BFEE01` | Control Register A (Timer A start/stop, run mode) |
| CIAB ICR | `$BFDD00` | Interrupt Control Register — enables CIA-B interrupts (used by Protracker!) |
| CIAB Timer A Lo | `$BFD400` | Timer A low byte (Protracker tempo control) |
| CIAB Timer A Hi | `$BFD500` | Timer A high byte |
| CIAB CRA | `$BFDE00` | Control Register A (Timer A start/stop, run mode) |

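Each CIA exposes 16 registers spaced `$100` apart (CIAA on odd addresses from `$BFE001`, CIAB on even addresses from `$BFD000`), so any CIA address can be named mechanically. Below is a small Python sketch of that mapping. It uses the usual 8520 register names and only accepts the canonical addresses, not the mirrors that partial address decoding also responds to. `cia_register` is an illustrative helper.

```python
from typing import Optional

CIA_REGS = ["PRA", "PRB", "DDRA", "DDRB", "TALO", "TAHI", "TBLO", "TBHI",
            "TODLO", "TODMID", "TODHI", "UNUSED", "SDR", "ICR", "CRA", "CRB"]

def cia_register(addr: int) -> Optional[str]:
    """Name a CIA register from an absolute address, or return None."""
    if 0xBFE001 <= addr <= 0xBFEF01 and addr & 0xFF == 0x01:
        chip = "CIAA"           # CIAA sits on the low data byte: odd addresses
    elif 0xBFD000 <= addr <= 0xBFDF00 and addr & 0xFF == 0x00:
        chip = "CIAB"           # CIAB sits on the high data byte: even addresses
    else:
        return None
    return f"{chip} {CIA_REGS[(addr >> 8) & 0xF]}"

for a in (0xBFED01, 0xBFEE01, 0xBFE801, 0xBFEC01, 0xBFD400, 0xBFDE00):
    print(f"${a:06X} -> {cia_register(a)}")
```

Running every absolute operand in the range `$BFD000`–`$BFEF01` through a mapping like this quickly separates timer setup from TOD reads and keyboard handling.
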
### 6. Blitter Queue Confusion

Blitter register writes (`BLTCON0`, `BLTSIZE`, etc.) look like ordinary memory stores to IDA. Without understanding that these are I/O registers, the disassembly shows meaningless `MOVE.W D0, abs_addr` sequences:

```asm
; This looks like garbage writes to random addresses:
    MOVE.W  #$09F0, $DFF040     ; BLTCON0 = use channels A and D, minterm $F0 (D = A)
    MOVE.W  #$0000, $DFF042     ; BLTCON1 = no fill, no line mode
    MOVE.W  #$FFFF, $DFF044     ; BLTAFWM = first word mask (all bits)
    MOVE.W  #$FFFF, $DFF046     ; BLTALWM = last word mask (all bits)
    MOVE.L  #src, $DFF050       ; BLTAPT = source A pointer
    MOVE.L  #dst, $DFF054       ; BLTDPT = destination D pointer
    MOVE.W  #0, $DFF064         ; BLTAMOD = source A modulo (0 = linear)
    MOVE.W  #0, $DFF066         ; BLTDMOD = dest D modulo
    MOVE.W  #(h<<6)|w, $DFF058  ; BLTSIZE = start blit! (writing this triggers DMA)

; But this is a standard blitter rectangle copy. The order of the setup
; writes is the author's choice; only BLTSIZE has a fixed position:
; it is always LAST, because writing it starts the blit.
```

**How to identify a blitter operation**:

1. The sequence ends with a write to `$DFF058` (BLTSIZE), which is the trigger. (ECS and later chipsets also offer `BLTSIZV`/`BLTSIZH` at `$DFF05C`/`$DFF05E`; there the `BLTSIZH` write starts the blit.)
2. `BLTCON0` ($DFF040) encodes the A-channel shift and the active channels in its high byte (bits 15–12 = ASH, bit 11 = USEA, bit 10 = USEB, bit 9 = USEC, bit 8 = USED) and the minterm in its low byte (bits 7–0).
3. Pointer registers ($DFF048–$DFF054: BLTCPT, BLTBPT, BLTAPT, BLTDPT) hold source/destination addresses. These are your key to understanding what data is being moved.
4. The blit size `(h<<6)|w` in BLTSIZE: height in upper 10 bits, width in lower 6 bits (width is in words; 0 = 64 words, and a height of 0 = 1024 lines)
5. Blitter wait: `BTST #6, $DFF002` tests bit 6 of the high byte of DMACONR, i.e. DMACONR bit 14 (`DMAB_BLTDONE`, the blitter-busy flag), and loops until the blitter has finished.

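Items 2 and 4 can be checked mechanically when annotating a blit. Below is a minimal Python sketch that splits the two trigger-side values into their fields using the bit layout listed above. The function names are illustrative.

```python
def decode_bltcon0(value: int) -> dict:
    """Split a BLTCON0 word into A-shift, enabled channels and minterm."""
    channels = "".join(name for bit, name in ((11, "A"), (10, "B"), (9, "C"), (8, "D"))
                       if value & (1 << bit))
    return {"ash": value >> 12, "channels": channels, "minterm": value & 0xFF}

def decode_bltsize(value: int) -> tuple[int, int]:
    """Return (height_lines, width_words); a 0 field encodes 1024 lines / 64 words."""
    height = (value >> 6) & 0x3FF
    width = value & 0x3F
    return (height or 1024, width or 64)

print(decode_bltcon0(0x09F0))           # A -> D copy with minterm $F0
print(decode_bltsize((200 << 6) | 20))  # 200 lines x 20 words (320 pixels)
```

A minterm of `$F0` with only A and D enabled is a plain copy; `$CA` with A, B, C and D is the classic cookie-cut used for masked bobs.
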
### 7. MOVEM Register Tracking Across Long Spans

<!-- TODO: A MOVEM.L save at the top of a function and a matching restore 200 instructions later is easy to miss. Missing it means you think registers survive the call when they're actually clobbered. -->

### 8. Code Embedded in Interrupt Vector Table

<!-- TODO: The vector table at $60-$7C (hunk offset) may contain short code sequences instead of pointers. A `BRA.W` at the vector location is valid — it jumps directly to the handler without an intermediate pointer. IDA may treat these as separate functions. -->

### 9. Dual-Playfield Register Set Confusion

<!-- TODO: Dual playfield uses separate sets of bitplane pointers (BPL1PT vs BPLxPT). Writes to both sets look like redundant operations but serve different playfields. -->

### 10. Stack-Based State Machines

Some hand-written code uses the stack as a state machine — pushing return addresses that represent state transitions, using `RTS` as a computed goto:

```asm
; Instead of a switch statement, the author pushes state transition addresses:
    MOVE.L  #STATE_IDLE, -(SP)      ; push initial state
    ...
STATE_DISPATCH:
    RTS                             ; "return" to the state on top of stack

STATE_IDLE:
    ; ... handle idle ...
    MOVE.L  #STATE_PLAYING, -(SP)   ; push next state
    BRA     STATE_DISPATCH

STATE_PLAYING:
    ; ... handle playing ...
    MOVE.L  #STATE_PAUSED, -(SP)    ; push next state
    BRA     STATE_DISPATCH
```

This pattern breaks all standard call/return analysis because `RTS` doesn't return to a caller — it jumps to the next state. IDA/Ghidra see `RTS` as a function exit and stop disassembling.

**Detection**: Look for `MOVE.L #addr, -(SP)` or `PEA addr(PC)` (push effective address) followed by `RTS` (or a branch to an `RTS`). These are state pushes, not function call setups.

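The detection rule can be turned into a rough static scan. Below is a minimal sketch that assumes raw big-endian code bytes with a known load address. It recognizes only two encodings, `PEA d16(PC)` (`$487A`) and `MOVE.L #imm, -(SP)` (`$2F3C`), followed by `RTS` (`$4E75`) within a few words. It does not follow a branch to an `RTS`, and because it checks every word offset, it will also report false positives from data. `find_state_pushes` is a hypothetical helper.

```python
import struct

PEA_PC, MOVE_L_IMM_PUSH, RTS = 0x487A, 0x2F3C, 0x4E75

def find_state_pushes(code: bytes, base: int, window: int = 8) -> list[tuple[int, int, int]]:
    """Return (push_addr, pushed_target, rts_addr) candidates."""
    hits = []
    for off in range(0, len(code) - 3, 2):
        op = struct.unpack_from(">H", code, off)[0]
        if op == PEA_PC:
            disp = struct.unpack_from(">h", code, off + 2)[0]
            target, length = base + off + 2 + disp, 4  # PC = address of the extension word
        elif op == MOVE_L_IMM_PUSH and off + 6 <= len(code):
            target, length = struct.unpack_from(">I", code, off + 2)[0], 6
        else:
            continue
        # Look for an RTS within `window` words after the push
        end = min(len(code) - 1, off + length + 2 * window)
        for r in range(off + length, end, 2):
            if struct.unpack_from(">H", code, r)[0] == RTS:
                hits.append((base + off, target, base + r))
                break
    return hits

# PEA target(PC) / RTS / target: NOP   (assembled by hand, loaded at $1000)
code = bytes.fromhex("487A0004" "4E75" "4E71")
print([tuple(hex(v) for v in hit) for hit in find_state_pushes(code, 0x1000)])
```

Each reported target is a state entry point that recursive descent never reached, so it is a good candidate for a manual "make code" plus a comment naming the state.
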
### 11. Absolute Address Dependencies

Code that assumes a fixed load address (common in non-HUNK binaries such as trackloaded games) will break if relocated. For HUNK binaries, relocation entries (`HUNK_RELOC32`) tell you which absolute addresses must be patched at load time. Non-HUNK binaries lack relocation metadata entirely.

```asm
; Absolute dependency example, works only when loaded at $C00000:
    LEA     $C01000, A0         ; data at fixed offset from load address
    JSR     $C00500             ; subroutine at fixed address within binary

; Position-independent equivalents (no relocation needed):
    LEA     _data(PC), A0       ; PC-relative
    JSR     _subroutine(PC)     ; PC-relative
```

**Critical**: Amiga bootblocks are *not* loaded at a fixed address (`$7C00` is the PC boot-sector address, not an Amiga one). Kickstart reads the two boot sectors into a buffer it allocates and jumps to offset 12, so well-behaved bootblock code is PC-relative. Bootblocks that need absolute addresses usually copy themselves to a fixed location first (SADDAM uses `$7F000`, for example). Analyze that copy at its target address, or patch every absolute reference if you rebase the code for analysis.

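Raw (non-HUNK) images have no relocation table, but you can find candidate absolute references by scanning for longwords that point back into the image's own address range. Below is a rough sketch. It assumes a big-endian image at a known load address. Every hit needs manual confirmation, because data values can fall inside the range by chance. `find_absolute_refs` and `rebase` are illustrative helpers.

```python
import struct

def find_absolute_refs(image: bytes, load_addr: int) -> list[tuple[int, int]]:
    """Return (offset, value) for every word-aligned longword in the image
    whose value lies inside [load_addr, load_addr + len(image))."""
    end = load_addr + len(image)
    refs = []
    for off in range(0, len(image) - 3, 2):
        (value,) = struct.unpack_from(">I", image, off)
        if load_addr <= value < end:
            refs.append((off, value))
    return refs

def rebase(image: bytes, refs: list[tuple[int, int]], old: int, new: int) -> bytes:
    """Patch the confirmed references so the image can be analyzed at `new`."""
    out = bytearray(image)
    for off, value in refs:
        struct.pack_into(">I", out, off, value - old + new)
    return bytes(out)

# LEA $C01000, A0 (41F9 00C0 1000) and JSR $C00500 (4EB9 00C0 0500), loaded at $C00000
image = bytes.fromhex("41F900C01000" "4EB900C00500") + bytes(0x2000 - 12)
refs = find_absolute_refs(image, 0xC00000)
print([(off, hex(v)) for off, v in refs])
relocated = rebase(image, refs, 0xC00000, 0x40000)
```

Rebasing is only safe after every hit has been confirmed, which is why the scan and the patch are kept as separate steps.
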
---

## Use-Case Cookbook

### Pattern 1: Finding the Main Loop in a Demo

<!-- TODO: Step-by-step walkthrough — follow entry point, find VBlank handler, identify frame counter increment, trace back to main loop that waits on frame counter. IDA Python script to automate. -->

### Pattern 2: Identifying a Custom Interrupt Handler

<!-- TODO: Walkthrough — grep for writes to $6C.W/$68.W/$70.W (vector table), trace to the handler code, identify RTE, document register saving convention. IDA Python to auto-detect. -->

### Pattern 3: Reconstructing a Jump Table

<!-- TODO: Walkthrough — find MOVE.W jt(PC, Dn.W), D0 / ADD.W D0, D0 / JMP (PC, D0.W) pattern, count entries, resolve offsets, rename targets. IDA Python script. -->

### Pattern 4: Detecting Self-Modifying Code with IDAPython

<!-- TODO: Walkthrough — scan for instructions that compute addresses within the CODE segment and write to them, flag as potential SMC, cross-reference with dynamic trace. -->

### Pattern 5: Identifying a Protracker Replay Routine

The most commonly found audio subsystem in Amiga binaries. Here's the full identification workflow:

1. **Find the CIA interrupt vector write**: Search for `MOVE.L #xxx, $78.W`, which installs the Level 6 autovector handler. CIA-B interrupts arrive on Level 6, and a hardware-banging Protracker replayer uses that interrupt for tempo. (On 68010+ the vector table may be relocated via the VBR, and OS-friendly replayers use `AddICRVector()` on `ciab.resource` instead.)
2. **Identify the CIA-B timer setup**: `MOVE.B #$7F, $BFDD00` / `MOVE.B #$81, $BFDD00` configures CIA-B to generate Timer A interrupts.
3. **Trace to the interrupt handler**: The handler saves ALL registers (`MOVEM.L D0-D7/A0-A6, -(SP)`), acknowledges the interrupt (reads the CIA-B ICR at `$BFDD00` and writes `$2000` to INTREQ at `$DFF09C`), calls the replayer tick function, then restores all registers and does `RTE`.
4. **Find the audio register writes**: Look for writes to `$DFF0A0`–`$DFF0DA` (AUDxLC/LEN/PER/VOL/DAT, one `$10`-byte block per channel). The pattern `MOVE.L sample_ptr, $DFF0A0` / `MOVE.W length, $DFF0A4` / `MOVE.W period, $DFF0A6` / `MOVE.W vol, $DFF0A8` is the per-channel audio update.
5. **Identify effect command dispatch**: Replayers isolate the command nibble (`ANDI.W #$0F, D0` or similar), then either run a `CMP.B #n, D0` / `BEQ` chain (as the widely circulated Protracker replay source does) or index a jump table (`ADD.W D0, D0` / `MOVE.W effect_table(PC, D0.W), D0` / `JMP effect_table(PC, D0.W)`) to reach the arpeggio, portamento, vibrato, etc. handlers.
6. **Map the pattern data layout**: The replayer reads pattern data sequentially, often one longword per channel per row. Standard format: 4 bytes per note. The sample number is split across the high nibbles of bytes 0 and 2, the 12-bit period spans the low nibble of byte 0 plus byte 1, the effect command is the low nibble of byte 2, and byte 3 is the effect parameter.

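The 4-byte note layout from step 6 is easy to get wrong by a nibble, so it helps to decode a few rows by hand and compare them with a tracker's display. Below is a minimal sketch of the standard Protracker/MOD cell layout. `decode_note` is an illustrative name. A custom replayer may use a different layout, which is exactly what this comparison reveals.

```python
def decode_note(cell: bytes) -> tuple[int, int, int, int]:
    """Decode one 4-byte Protracker note cell into
    (sample_number, period, effect_command, effect_param)."""
    b0, b1, b2, b3 = cell
    sample = (b0 & 0xF0) | (b2 >> 4)   # high nibble in byte 0, low nibble in byte 2
    period = ((b0 & 0x0F) << 8) | b1   # 12-bit Amiga period (428 = C-2)
    return sample, period, b2 & 0x0F, b3

# Sample $12, period 428, effect C (set volume) with parameter $40
print(decode_note(bytes([0x11, 0xAC, 0x2C, 0x40])))
```
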
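**IDAPython script fragment** to find Level 6 vector installations (a first clue, not proof, of a CIA-timed replayer):

```python
# IDAPython (IDA 7.x/8.x API; find_binary is deprecated in newer releases).
# MOVE.L #handler, $78.W (Level 6 autovector) encodes as 21FC xxxx xxxx 0078.
import ida_bytes, ida_idaapi, ida_search

PATTERN = "21 FC ?? ?? ?? ?? 00 78"
ea = ida_search.find_binary(0, ida_idaapi.BADADDR, PATTERN, 16, ida_search.SEARCH_DOWN)
while ea != ida_idaapi.BADADDR:
    handler = ida_bytes.get_dword(ea + 2)  # the immediate long is the handler address
    print(f"Level 6 vector write at ${ea:08X} -> handler ${handler:08X}")
    ea = ida_search.find_binary(ea + 1, ida_idaapi.BADADDR, PATTERN, 16, ida_search.SEARCH_DOWN)
```
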
### Pattern 6: Reversing a Bootblock Virus

Bootblock viruses are the ideal entry point for learning Amiga RE: they're small (1024 bytes), self-contained, and exercise key system mechanisms.

#### Lamer Exterminator (October 1989)

- **Size**: 1024 bytes (exactly 2 disk blocks)
- **Residence**: Installs itself in memory, hooks system vectors
- **Infection vector**: Writes itself to any write-enabled disk's bootblock during disk access
- **Damage routine**: After activation, overwrites a random disk block with the string `"LAMER!"` repeated 84 times, destroying whatever data that block held
- **CoolCapture**: Uses the CoolCapture vector for post-reset survival; after a warm reset, the virus re-activates from the captured state
- **Detection text**: Sometimes leaves identifiable strings in the bootblock

#### SADDAM Bootblock Virus

- **Size**: 1024 bytes
- **Residence**: Copies itself to `$7F000` in memory (4 KB below the 512 KB Chip RAM boundary at `$80000`)
- **Interrupt hooking**: Hooks Level 3 interrupt (Vertical Blank/Copper/Blitter) via the interrupt vector table
- **Infection trigger**: First "read Rootblock" command after a reset; this infects any disk accessed after boot
- **Stealth**: Writes the original bootblock back to disk when the rootblock is read (hiding its presence)
- **System modification**: Clears `CoolCapture`, `KickTagPtr`, and `KickCheckSum`, which unhooks any other reset-resident code (such as a resident virus checker or a rival virus)
- **Anti-detection text**: Contains the misleading string `"A2000 MB Memory Controller V2"` to disguise itself as a hardware ROM
- **Damage trigger**: After ~30,000 interrupt calls, crashes the system by showing an alert in a Level 3 interrupt context

#### Common Virus RE Workflow

1. **Extract the bootblock**: The first 1024 bytes of an infected disk (blocks 0–1)
2. **Determine load address**: Kickstart loads the bootblock into a buffer it allocates (the address is not fixed) and calls the code at offset 12 with A6 = SysBase and A1 = the trackdisk `IOStdReq`. Treat the code as position-independent until it copies itself somewhere fixed.
3. **Identify the infection mechanism**: Look for `DoIO()` / `SendIO()` calls to `trackdisk.device` for writing back to disk
4. **Find the residency mechanism**: `CoolCapture`, `KickTagPtr` manipulation, or RAM copy to `$7F000` + vector hooking
5. **Trace the trigger condition**: What event activates the virus? Timer count, disk access count, specific command?
6. **Document the payload**: Does it corrupt data? Display a message? Overwrite bootblocks?

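Before disassembling, it helps to confirm that the extracted block is a bootable DOS bootblock. Such a block has `DOS` plus a flags byte at offset 0, a checksum at offset 4, and code from offset 12. The checksum is the complement of the 32-bit sum of all 256 longwords, skipping the checksum itself, with end-around carry. Below is a minimal Python sketch of that check. The function names are illustrative.

```python
import struct

def bootblock_checksum(block: bytes) -> int:
    """Compute the checksum Kickstart expects at offset 4 of a 1024-byte bootblock."""
    if len(block) != 1024:
        raise ValueError("a bootblock is exactly 1024 bytes")
    total = 0
    for index, (word,) in enumerate(struct.iter_unpack(">I", block)):
        if index == 1:              # skip the stored checksum itself
            continue
        total += word
        if total > 0xFFFFFFFF:      # end-around carry: wrap and add 1
            total = (total & 0xFFFFFFFF) + 1
    return ~total & 0xFFFFFFFF

def is_bootable(block: bytes) -> bool:
    """True if the block starts with 'DOS' and carries a matching checksum."""
    stored = struct.unpack_from(">I", block, 4)[0]
    return block[:3] == b"DOS" and stored == bootblock_checksum(block)

block = bytearray(b"DOS\x00" + bytes(1020))
struct.pack_into(">I", block, 4, bootblock_checksum(bytes(block)))
print(hex(struct.unpack_from(">I", block, 4)[0]), is_bootable(bytes(block)))
```

A virus has to keep this checksum valid to be booted, so a mismatch usually means a damaged or partially overwritten block rather than an active infection.
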
### Pattern 7: Finding the Decrunch Stub in a Packed Demo

The decrunch stub is the gateway to the real binary. You must find and understand it before any further analysis:

**Identification by signature**:

| Cruncher | Magic/Pattern | Decrunch Stub Size | Notes |
|---|---|---|---|
| **PowerPacker** | `$42` followed by LEA/MOVE pattern near entry | ~200–300 bytes | Uses powerpacker.library; `ppDecrunch()` is the library call |
| **Imploder** | Entry has `MOVE.L D0, -(SP)` / `LEA xxx(PC), A0` pattern | ~300–400 bytes | ATN!Imploder; slower decompression, better ratio than early PP |
| **Shrinkler** | Entry starts with context-mixing setup code | ~2KB | Extremely high ratio; decrunch takes minutes on 7 MHz 68000 |
| **ByteKiller** | Short BRA.S over header data, then MOVEM.L pattern | ~100 bytes | Simple LZ variant; very common in 1988–1991 productions |
| **CrunchMania** | String `"CrM!"` (or `"CrM2"`) in the packed-data header | ~150 bytes | Fastest decruncher; popular for 4K intros |

**Decrunch strategy**:

1. Identify the stub: The first code that executes after the entry point. It reads packed data and expands it to a destination address.
2. Let the stub run in an emulator: Set a breakpoint after the decrunch loop completes (look for the `JMP` or `JSR` to the unpacked entry point).
3. Dump the decrunched memory: The real binary is now in RAM. Save it for static analysis.
4. Optionally: Write an unpacker script. For known formats, run the original cruncher's decruncher against the packed data in a standalone tool.

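Packed data files, and many packed executables, carry a magic string near the data they describe, so a quick scan is worthwhile before stepping through a stub. This is a hedged sketch. The magic values below are the commonly cited ones (`PP20` for PowerPacker data, `IMP!` for Imploder, `CrM!`/`CrM2` for CrunchMania) and should be confirmed against a known sample. Some packers store no magic at all in executables. `find_cruncher_magic` is an illustrative name.

```python
MAGICS = {
    b"PP20": "PowerPacker data",
    b"IMP!": "Imploder",
    b"CrM!": "CrunchMania",
    b"CrM2": "CrunchMania (variant)",
}

def find_cruncher_magic(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, cruncher) for every known magic string found in data."""
    hits = []
    for magic, name in MAGICS.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, name))
            start = data.find(magic, start + 1)
    return sorted(hits)

print(find_cruncher_magic(b"\x00\x00PP20\x00IMP!"))
```

A hit tells you where the packed data starts, which is also the range the stub's source pointer should reference.
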
**Rob Northen Copylock / Trace Vector Decoder (TVD)**:

A special case that appears like a cruncher but is actually a protection system:

- Encrypted code is executed one instruction at a time using the 68000 **trace exception**
- The trace handler (exception vector 9, at address `$24`) decrypts the next instruction before it runs (often re-encrypting the previous one); because the trace bit stays set in the stacked SR, each `RTE` resumes tracing
- This prevents static disassembly — you only see the encrypted bytes and the trace handler, not the real code
- **Detection**: `ORI #$8000, SR` or `MOVE #$8000, SR` (setting the trace bit, SR bit 15) in the entry code, plus a custom handler at vector `$24`
- **Solution**: Let it execute in FS-UAE with a trace logger, or single-step through and record each decrypted instruction

### Pattern 8: Identifying a Custom Memory Allocator

<!-- TODO: Walkthrough — game/demo custom heap management; find the alloc/free routines by looking for LINK-like constructs (linked list of free blocks) without library calls; track the allocation pattern to understand memory layout. -->

### Pattern 9: Reconstructing a Blitter Queue

<!-- TODO: Walkthrough — identify blitter register write sequences (BLTCON0, BLTSIZE), find the queue submission routine, map the queue data structure, trace blitter-interrupt continuation. -->

### Pattern 10: Recovering a Sprite Multiplexer

<!-- TODO: Walkthrough — copper list sprite pointer updates per raster line, sprite control word pairs, attached sprite mode detection, mapping which logical sprite occupies which scanline range. -->

### Pattern 11: Extracting a Custom Module Replayer

<!-- TODO: Walkthrough — identifying pattern data format, sample table layout, effect command dispatch; documenting the custom format to enable playback or conversion to standard Protracker MOD. -->

### Pattern 12: Tracing a Trackloader

<!-- TODO: Walkthrough — trackdisk.device bypass, raw MFM decoding in software, custom DSKSYNC-based sync word detection, multi-revolution loading strategies, Rob Northen loader identification. -->

---

## Real-World Examples

### Demo Productions — RE Challenge Highlights

| Production | Group | Year | Key RE Challenge |
|---|---|---|---|
| **Arte** | Sanity | 1993 | Dense blitter queue system; effects dispatched via jump table with per-effect copper list switching; multi-part architecture with custom module loader |
| **Desert Dream** | Kefrens | 1993 | Multi-part trackmo with per-part custom loaders; heavy copper wizardry (raster bars, palette splits, sprite multiplexing); custom Protracker variant replayer |
| **Nexus 7** | Andromeda | 1994 | 3D vector engine with custom math routines (no FPU); object system with update/render phases; blitter-filled polygons |
| **Enigma** | Phenomena | 1991 | Modular effect system — each effect is a self-contained subroutine registered in a dispatch table; custom memory management across effect transitions |
| **State of the Art** | Spaceballs | 1992 | Morphing effects, rotate-zoomer, vector balls; heavy use of precomputed tables; custom blitter queue for compositing |
| **Hardwired** | Crionics & Silents | 1991 | Early 3D vector engine; spreadsheet-generated sine tables identified by their perfect mathematical precision; copper-chunky display mode simulation |

### Games

| Title | Year | Key RE Challenge |
|---|---|---|
| **Shadow of the Beast** | 1989 | 13-level parallax scrolling using dual playfield + sprite overlays; custom blitter queues for sprite rendering; 512-color still images via palette-split copper lists |
| **Turrican II** | 1991 | Sprite multiplexer with 20+ sprites on screen; copper-driven status bar split; large state machine for enemy AI |
| **Lotus Turbo Challenge 2** | 1991 | Software road rendering with copper sky gradient; blitter-driven car sprite compositing; 2-player split-screen via copper screen split |
| **Cannon Fodder** | 1993 | OS-friendly (uses graphics.library!) but still hits hardware for scrolling; custom memory allocator for soldier/bullet objects |
| **Pinball Dreams** | 1992 | Multi-ball physics engine; copper-driven score display; custom module replayer with sound effects mixing into music channels |

### Bootblock Intros — The Art of 1024 Bytes

Bootblock intros compress entire demoscene effects into two disk sectors:

- **Red Sector Inc. (RSI)** bootblocks: Often include a simple scrolltext, starfield, and a logo — all in 1024 bytes of raw m68k
- **Tristar & Red Sector Inc. (TRSI)** bootblocks: More advanced effects (copper bars, vector objects)
- **SADDAM virus**: A case study in anti-RE techniques within a bootblock — misleading strings, interrupt hooking, stealth write-back
- **Lamer Exterminator**: One of the most infamous Amiga viruses, studied for its CoolCapture survival mechanism

---

## Cross-Platform Comparison

| Platform | Assembly RE Challenge | Amiga Analog |
|---|---|---|
| **C64 (6502)** | Zero-page usage, self-modifying code, raster interrupts | Custom chip register banging, copper-synced code |
| **Atari ST (68000)** | Similar CPU but different hardware registers | Amiga custom chips vs ST's simpler shifter/blitter |
| **DOS (x86)** | Segment:offset addressing, BIOS/DOS interrupt vectors | Amiga library JMP tables, exec interrupt vectors |
| **NES (6502)** | Tight mapper constraints, PPU timing loops | Similar raster-sync challenges in demos |
| **Arcade (68000)** | Shared CPU family, custom hardware | Same CPU, different memory maps and custom chips |
| **SNES (65816)** | Hardware register banging, HDMA (like copper) | Copper list is the direct analog of SNES HDMA channels |
| **Genesis/Mega Drive (68000)** | Same CPU, VDP register interface, Z80 coprocessor | Closest analog — 68000 + custom video hardware, similar register-banging style |
| **Game Boy (Z80-like)** | Tight memory (8KB), scanline interrupts, OAM DMA | Similar to bootblock constraints — extreme optimization in tiny space |

---

## Historical Context — Why Hand-Written Assembly Dominated

Before 1990, there were few practical alternatives to assembly for Amiga software that needed to be fast:

| Factor | Detail |
|---|---|
| **Compiler quality** | Early compilers (Lattice C, later sold as SAS/C, and Manx Aztec C) generated code 5–20× slower than hand-tuned assembly for graphics/audio |
| **Hardware gap** | A 7 MHz 68000 with 512 KB Chip RAM left almost no margin for inefficient code; games and demos needed every CPU cycle |
| **OS overhead** | The AmigaOS graphics.library added measurable overhead (layer locking, clipping rectangle checks). Games bypassed it entirely and wrote directly to `$DFFxxx` registers |
| **Demoscene culture** | Assembly was the "real" language of the demoscene. Using a compiler was considered lazy — the code *itself* was the art form |
| **Size constraints** | Bootblocks (1024 bytes), 4K intros, and single-disk demos imposed hard size limits. Assembly gave precise control over every byte |
| **Custom chip intimacy** | Copper lists, blitter queues, and audio DMA are fundamentally low-level. High-level languages abstracted away the very features that made Amiga programming distinctive |

**The transition**: By 1992–1994, faster CPUs (68020+), more RAM, and mature compilers (SAS/C 6.x, GCC 2.x ports) made C viable for commercial software. But the demoscene stayed with assembly into the late 1990s, and AGA productions on 68060 accelerators continue to use hand-written assembly today.

---

## Modern Analogies

<!-- TODO: Expand — connect hand-written asm concepts to modern developer experience -->

| Hand-Written Asm Concept | Modern Analogy | Where It Holds / Breaks |
|---|---|---|
| Cycle-counted raster effects | GPU fragment shader dispatch | Holds: per-pixel/per-scanline execution; breaks: asm is imperative timing, shaders are data-parallel |
| Custom blitter queue | GPU command buffer / DMA transfer list | Holds: structured descriptor-based hardware offload; breaks: blitter is in-order, GPUs reorder |
| Hardware register banging | MMIO device drivers in embedded systems | Holds: same concept — memory-mapped I/O; breaks: Amiga registers are video/audio, not peripherals |
| Self-modifying code | JIT compilation (V8, LuaJIT, WASM engines) | Holds: code generation at runtime; breaks: SMC patches existing code, JIT generates new code |
| Copper list | G-sync / FreeSync adaptive refresh + shader constants per scanline | Holds: timing-sensitive display updates; breaks: copper is a programmable coprocessor, not a protocol |
| Stack-based state machine | Coroutine dispatch / async/await | Holds: non-linear control flow; breaks: stack manipulation vs language-level async |
| Position-independent code | ASLR + PIE executables | Holds: same goal (run anywhere); breaks: asm PIC is manual, modern PIC is linker/loader assisted |

---

## FAQ

### Q1: How do I know if a function is an interrupt handler vs a regular subroutine?

<!-- TODO -->

### Q2: What's the best way to detect self-modifying code?

<!-- TODO -->

### Q3: How do I handle code that mixes data and instructions?

<!-- TODO -->

### Q4: How do I tell code from data in a mixed section?

<!-- TODO: Heuristics: what does the byte sequence look like as both code and data? Which interpretation produces more cross-references? Check against known data formats (copper list, sprite, audio). -->

### Q5: How do I handle encrypted or obfuscated code?

<!-- TODO: Detection (high entropy, no readable strings), decryption routine identification (XOR loop at entry point), dynamic extraction via emulator memory dump, dealing with layered encryption (decryptor decrypts next decryptor). -->

### Q6: How do I deal with copper-synced code?

<!-- TODO: Code that runs at specific scanlines via copper WAIT; the same function may execute multiple times per frame at different raster positions; execution context matters — what's the beam position, which bitplane is being displayed, what's in the color registers? -->

### Q7: What about self-relocating code?

<!-- TODO: How to detect (code copies itself, patches absolute addresses), how to trace the relocation table, how to produce a static IDA database that matches the relocated layout. -->

### Q8: How do I identify custom chip register usage patterns?

<!-- TODO: Group registers by chip (blitter, copper, audio, sprite, bitplane), identify common write sequences (blitter setup = BLTCON0→BLTAPT→BLTBPT→...→BLTSIZE), build a state machine of expected register write order for each chip. -->

### Q9: Can `MOVE.W D0, $DFF000` use absolute short addressing to reach the custom registers?

<!-- TODO: No. Absolute short is a sign-extended 16-bit address, so on the 68000's 24-bit bus it only reaches $000000–$007FFF and $FF8000–$FFFFFF; the custom chips at $DFF000 need absolute long. Hand-written code therefore loads the base once (LEA $DFF000, A6) and uses d16(A6), which is 2 bytes shorter per access than absolute long and 4 cycles faster. Absolute short does appear for low-memory vectors such as $6C.W and 4.W. -->

### Q10: How do I trace blitter operations without hardware?

<!-- TODO: Blitter emulation in FS-UAE debugger; reading blitter register state at breakpoints; deriving source/dest/minterm from BLTCON0/BLTCON1; calculating blit size from BLTSIZE; understanding blitter nasty mode (BLTPRI) and its effect on CPU synchronization. -->

### Q11: What's the difference between a software interrupt and a hardware interrupt in the code?

<!-- TODO: Hardware interrupts set by custom chips (INTREQ bits), software interrupts triggered by CPU writing to INTREQ, the distinction matters for understanding the event source. TRAP #N instructions are yet another category. -->

### Q12: How do I identify which demo group or author wrote this?

<!-- TODO: Stylistic fingerprints — register conventions (e.g., A4=hardware base), macro library signatures (Photon's startup code), code layout (effects as subroutines vs inline), comment strings in the binary, known author-specific optimization tricks. -->

### Q13: How do I reverse engineer an audio driver / module replayer?

<!-- TODO: Audio interrupt (Level 4) analysis, sample period calculation, sample pointer advancement, volume/period effect command dispatch, identifying Protracker vs NoiseTracker vs Soundtracker vs custom format differences. -->

### Q14: What do I do when IDA creates 500 phantom functions from copper data?

<!-- TODO: Batch-undefine approach, scripting to identify copper list boundaries, creating a copper list data type, converting undefined bytes to copper instruction arrays. -->

---

## FPGA / Emulation Impact

<!-- TODO: Expand — timing-critical code (cycle-exact loops), self-modifying code on FPGA (cache coherency), copper-synced code verification on MiSTer, blitter timing accuracy requirements for demos that push blitter bandwidth limits, 68000 vs 68020+ behavior differences (MOVE from SR is privileged on 68010+, loop mode on 68010, etc.) -->

---

## References

- [m68k_codegen_patterns.md](m68k_codegen_patterns.md) — Compiler codegen fingerprint catalog
- [compiler_fingerprints.md](../compiler_fingerprints.md) — Compiler identification at a glance
- [string_xref_analysis.md](string_xref_analysis.md) — String cross-reference methodology
- [hunk_reconstruction.md](hunk_reconstruction.md) — HUNK binary reconstruction
- [struct_recovery.md](struct_recovery.md) — Struct layout reconstruction
- [api_call_identification.md](api_call_identification.md) — Library call recognition
- [exe_crunchers.md](../../03_loader_and_exec_format/exe_crunchers.md) — Decruncher identification and unpacking
- [code_vs_data_disambiguation.md](code_vs_data_disambiguation.md) — Distinguishing code bytes from data/variables
- [copper_programming.md](../../08_graphics/copper_programming.md) — Copper list format and programming
- [blitter_programming.md](../../08_graphics/blitter_programming.md) — Blitter operation reference
- [paula_audio.md](../../01_hardware/ocs_a500/paula_audio.md) — Audio hardware register reference
- [custom_registers.md](../../01_hardware/ocs_a500/custom_registers.md) — Complete custom chip register map
- *M68000 Family Programmer's Reference Manual* — Instruction set and timing
- *Amiga Hardware Reference Manual* — Custom chip register map and DMA cycles
- *Amiga Disk Drives Inside & Out* (Abt Electronics) — Trackloader and MFM encoding reference

# Code vs Data Disambiguation — Telling Instructions from Variables

## Overview

Disassemblers are not oracles. IDA Pro and Ghidra decide which bytes are code and which are data with heuristics built on recursive descent; simpler tools such as objdump use a plain linear sweep. On the Amiga, these heuristics fail routinely: jump tables between functions, PC-relative strings embedded in `.text`, hand-written assembly where data lives in branch-not-taken slots, FORTH-style threaded code where the "program" is a data structure, and copper lists that happen to decode as valid 68k instructions. Every Amiga reverse engineer eventually stares at IDA and thinks: *"Is this real code, or did the disassembler just hallucinate a function out of a color table?"*

This article provides a **systematic methodology** for distinguishing code from data in Amiga m68k binaries — covering automated detection techniques, manual disambiguation workflows, Amiga-specific failure modes, and tool-specific procedures for both IDA Pro and Ghidra.

```mermaid
graph TD
    BYTES["Raw bytes in memory / file"]

    subgraph "Auto-Analysis Methods"
        LINEAR["Linear Sweep<br/>disassemble sequentially<br/>from entry point"]
        RECURSIVE["Recursive Descent<br/>follow control flow<br/>from known entry points"]
        SIG["Signature Matching<br/>FLIRT / function start patterns"]
    end

    subgraph "Failure Modes"
        F1["Jump table between<br/>functions → data as code"]
        F2["PC-relative data in .text<br/>→ data as code"]
        F3["Unreferenced code<br/>→ code as data"]
        F4["SMC target / decryptor<br/>→ code as data"]
        F5["Copper list / color table<br/>→ data as code"]
    end

    subgraph "Manual Disambiguation"
        M1["Cross-reference analysis<br/>is anyone reading this?"]
        M2["Relocation table check<br/>HUNK_RELOC32 targets"]
        M3["Instruction validity<br/>valid 68k encoding?"]
        M4["Control flow analysis<br/>does execution reach here?"]
        M5["Entropy / pattern check<br/>ASCII? Repeated pattern?"]
    end

    BYTES --> LINEAR & RECURSIVE & SIG
    LINEAR --> F1
    RECURSIVE --> F2 & F3
    SIG --> F1
    F1 & F2 & F3 & F4 & F5 --> M1 & M2 & M3 & M4 & M5
```

---

## How Disassemblers Decide — And Why They Fail

### Recursive Descent (IDA and Ghidra Default)

Recursive descent starts from known entry points (offset 0 of the first code hunk, exported symbols, interrupt vectors) and follows all control-flow edges (`JSR`, `BSR`, `BRA`, `Bcc`, `DBcc`), ending a path wherever execution cannot fall through (`RTS`, `RTE`, `JMP`, `BRA`). Any byte not reached by tracing from an entry point is left as undefined data.

**Why it fails on Amiga binaries**:

- **Jump tables** (`MOVE.W jt(PC, Dn.W), D0` / `JMP jt(PC, D0.W)`) — the target of the indexed `JMP` is only known at runtime. Unless the disassembler recognizes the switch idiom, the case bodies stay undefined, and if you (or a later heuristic pass) start disassembly inside the table, its offsets decode as bogus instructions. Define the table explicitly.
- **Computed calls** (`JSR (A0)`, `JSR $00(A0, D0.W)`) — the disassembler cannot trace through a register-indirect call. Functions reached only via function pointers are invisible.
- **Callback chains** — exec library hooks, interrupt server chains, and BOOPSI method dispatch all use indirect calls through function pointers. None are reachable via static control-flow tracing.

### Linear Sweep (objdump-Style Tools and Aggressive Code Finders)

Linear sweep disassembles everything sequentially from a starting address, instruction by instruction, regardless of control flow. IDA and Ghidra fall into the same trap when you force disassembly over a region (`C` in IDA, `D` in Ghidra) or enable aggressive "find more code" passes. If the sweep hits a `DC.B 0` in the middle of a code section, it will decode garbage instructions from there onward.

**Why it fails on Amiga binaries**:

- **PC-relative data in `.text`** — GCC embeds strings and jump tables in the code hunk. After a function's `RTS`, the next bytes might be `DC.B "Hello, World!", 0`. Linear sweep decodes the words `$4865 $6C6C $6F2C $2057` as an illegal opcode (`$4865`), `BGE.S`, `BLE.S`, and `MOVEA.L (SP), A0` — complete nonsense.
- **Padding words** — compilers and linkers may pad functions or hunks to longword boundaries with `$0000` or `$4E71` (NOP) words. A `$0000` word decodes as `ORI.B #imm, D0`, which swallows the *next* word as its immediate operand; if that word is the first opcode of a real function, the sweep loses that instruction and can stay out of phase with the real instruction stream.
- **Data hunks loaded as code** — If a DATA hunk is accidentally loaded into IDA as code, linear sweep will decode global variables as instructions. A global string `"dos.library"` becomes a chain of short branches (`BCC.S`, `BGE.S`, `BHI.S`, `BSR.S`) — plausible-looking but meaningless.

---

## Amiga-Specific Failure Modes

### 1. Jump Tables Between Functions (SAS/C, GCC `switch`)

```asm
; SAS/C switch statement — dense case jump table:
_cmd_dispatch:
    CMPI.W  #MAX_CMD, D0
    BHI.S   .default
    ADD.W   D0, D0                  ; word index
    MOVE.W  .jt(PC, D0.W), D0       ; fetch offset from table
    JMP     .jt(PC, D0.W)           ; jump to .jt + offset

.jt:
    DC.W    .case_open - .jt        ; ← DATA, not code!
    DC.W    .case_close - .jt       ; ← DATA, not code!
    DC.W    .case_read - .jt        ; ← DATA, not code!
    DC.W    .case_write - .jt       ; ← DATA, not code!

; If IDA treats .jt as code, a small positive offset such as $0012 decodes as:
;   ORI.B   #<next entry>, (A2)     ; valid-but-wrong, and it swallows the next entry
; ...
```

**Why it fools disassemblers**: Jump table entries are word-sized offsets, and most small 16-bit values are valid 68k opcodes. A linear sweep (or a manual `C` inside the table) decodes each entry as an instruction. Recursive descent never decodes the table, but it also cannot compute the `JMP` target, so unless the switch idiom is recognized, the case bodies reached only through the table remain undefined.

**Detection**: SAS/C-style jump tables follow a `MOVE.W table(PC, Dn.W), Dn` / `JMP table(PC, Dn.W)` pair. In IDA, undefine the table region (`U`) and define it as words (`D` until it shows `dc.w`), then create code at each case target (`C`).

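Once the table base is known, decoding the entries offline helps pin down where the table ends. A minimal sketch, assuming the layout shown in the example (signed 16-bit entries relative to the table start) and a code hunk already extracted into a `bytes` object; `decode_word_jump_table` and its end-of-table heuristic are illustrative, not part of any tool:

```python
import struct

def decode_word_jump_table(code: bytes, table_off: int, max_entries: int):
    """Decode a table of signed 16-bit offsets relative to the table start.

    Returns the list of case-target offsets inside `code`. Stops at the first
    entry whose target is odd, outside the hunk, or inside the part of the
    table decoded so far (a heuristic end-of-table check).
    """
    targets = []
    for i in range(max_entries):
        pos = table_off + 2 * i
        if pos + 2 > len(code):
            break
        (rel,) = struct.unpack_from(">h", code, pos)   # big-endian signed word
        target = table_off + rel
        table_end = pos + 2
        if target % 2 or not (0 <= target < len(code)) or table_off <= target < table_end:
            break
        targets.append(target)
    return targets
```

The loop stops at the first odd, out-of-range, or table-internal target, which usually marks where the table ends and code resumes. When a `CMPI.W #MAX, D0` bound precedes the switch, pass `MAX + 1` as `max_entries` instead of relying on the heuristic alone.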
### 2. PC-Relative Data Embedded in `.text` (GCC, VBCC, DICE C)

```
GCC .text hunk layout:
┌──────────────────────┐
│ _func1:              │ ← code
│   MOVEM.L D2,-(SP)   │
│   ...                │
│   RTS                │
├──────────────────────┤
│ .LC0:                │ ← data (string constant)
│   DC.B "dos.library" │
│   DC.B 0             │
├──────────────────────┤
│ .LC1:                │ ← data (jump table)
│   DC.L .L5           │
│   DC.L .L6           │
├──────────────────────┤
│ _func2:              │ ← code
│   LINK A6, #-$10     │
│   ...                │
└──────────────────────┘
```

**Why it fools disassemblers**: The string `"dos.library"` (`$64 $6F $73 $2E ...`) decodes mostly as short branches: `$646F` = `BCC.S`, `$6C69` = `BGE.S`, `$6272` = `BHI.S`, `$6172` = `BSR.S` (only `$732E` and `$7900` are illegal on the 68000). Linear sweep marches straight through strings into whatever follows, creating phantom functions.

**Detection**:

1. Check whether the "instruction" bytes are printable ASCII (`$20`–`$7E`) ending in a `$00` terminator.
2. Cross-reference backwards: is a `LEA xxx(PC), An` or `PEA xxx(PC)` pointing at this exact address? If yes, it's a string.
3. Check HUNK_RELOC32: an absolute reference whose relocated longword points here tells you *how* the address is used (follow it back to the instruction or table that holds it). No relocation proves nothing, because PC-relative references never need one.

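Detection step 1 is easy to automate. The sketch below (a hypothetical helper, not from any tool) lists NUL-terminated printable runs in a hunk image so you can check each offset against `LEA`/`PEA` targets:

```python
def find_c_strings(blob: bytes, min_len: int = 4):
    """Return (offset, text) for every NUL-terminated run of printable ASCII
    ($20-$7E plus tab, LF, CR) that is at least `min_len` characters long."""
    printable = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    results = []
    start = None
    for i, b in enumerate(blob):
        if b in printable:
            if start is None:
                start = i                 # a candidate string begins here
        else:
            # Only a $00 terminator closes a C string; other bytes discard the run
            if b == 0 and start is not None and i - start >= min_len:
                results.append((start, blob[start:i].decode("ascii")))
            start = None
    return results
```

Because `$4E75` (RTS) is itself printable (`"Nu"`), a string that directly follows an `RTS` is reported with a two-character prefix; the true start is whichever offset a `LEA`/`PEA` actually targets.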
### 3. Branch-Not-Taken Data (Hand-Written Assembly)

```asm
; Classic hand-written asm pattern: data parked after an unconditional exit
    CMPI.W  #MAX_ENTRIES, D0
    BCC.S   .invalid
    ADD.W   D0, D0
    MOVE.W  .data_table(PC, D0.W), D1
    RTS

.data_table:                        ; ← never executed — reached only via PC-relative load
    DC.W    $0120                   ; these ARE data
    DC.W    $0340
    DC.W    $0560
    DC.W    $60FE                   ; ← looks exactly like BRA.S *, but it's still data

; Linear sweep decodes $0120 / $0340 / $0560 as
; BTST D0,-(A0) / BCHG D1,D0 / BCHG D2,-(A0)
```

**Detection**: Look for `RTS` / `RTE` / `JMP` / `BRA` instructions. Anything after an unconditional control transfer that is not the target of a branch, call, or jump-table entry is suspect data, especially when a PC-relative load (`MOVE`, `LEA`) points at it.

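A first pass over a code hunk can list every spot that directly follows a return. This naive sketch assumes you have already collected branch, call, and jump-table targets into `known_targets` (a hypothetical input); it only checks `RTS`/`RTE` and does not decode instruction lengths, so a `$4E75` inside an immediate operand or inside data produces false hits:

```python
import struct

RTS, RTE = 0x4E75, 0x4E73

def suspect_data_after_returns(code: bytes, known_targets: set):
    """Report word offsets that directly follow an RTS/RTE opcode word and
    are not a known branch/call/jump-table target."""
    hits = []
    # Stop early enough that at least one word follows the RTS/RTE
    for off in range(0, len(code) - 3, 2):
        (word,) = struct.unpack_from(">H", code, off)
        nxt = off + 2
        if word in (RTS, RTE) and nxt not in known_targets:
            hits.append(nxt)
    return hits
```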
### 4. Unreferenced Valid Code (Callback, Interrupt Handler)

The opposite problem: **real code that looks like data** because no static control flow reaches it.

```asm
; Interrupt handler — installed at runtime by writing the level 3
; autovector at $6C directly (bypassing the OS).
; No JSR/BRA in the binary points here. IDA sees unreferenced bytes.
_vblank_handler:
    MOVEM.L D0-D7/A0-A6, -(SP)      ; valid code!
    MOVE.W  #$0020, $DFF09C         ; clear VERTB in INTREQ
    ...
    MOVEM.L (SP)+, D0-D7/A0-A6
    RTE

; This handler is referenced ONLY by a runtime MOVE.L #_vblank_handler, $6C.W
; That is a data write: IDA records at most a data/offset reference to the
; handler and does not treat it as a code entry point.
```

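**Detection**: Search for `MOVE.L #$XXXXXXXX, $6C.W`, or more generally `MOVE.L #$XXXXXXXX, $XXXX.W` with a destination in the vector area (`$0008`–`$03FC`). These are vector table installations, and the `$XXXXXXXX` immediate is a function pointer. On a 68010+ the table may have been moved with `VBR`, so also look for writes relative to a register loaded by `MOVEC VBR, An`. OS-friendly handlers installed with `AddIntServer()` or `SetIntVector()` are referenced from the `is_Code` field of an `Interrupt` structure instead. Also search `HUNK_SYMBOL` for callback-named symbols (e.g., `h_Entry`, `intr_code`, `isr_`).
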
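The two absolute forms of that install are easy to spot in raw bytes: `MOVE.L #imm, $xxxx.W` encodes as `$21FC` and `MOVE.L #imm, $xxxxxxxx.L` as `$23FC`, each followed by the 32-bit immediate and then the destination address. A sketch (hypothetical helper name) that scans a hunk image for such writes into the vector area:

```python
import struct

def find_vector_installs(code: bytes):
    """Scan for MOVE.L #imm, abs.W ($21FC) and MOVE.L #imm, abs.L ($23FC)
    whose destination is a longword in the vector area $0008-$03FC.
    Returns (offset, immediate, vector_address) tuples."""
    hits = []
    for off in range(0, len(code) - 7, 2):
        (opcode,) = struct.unpack_from(">H", code, off)
        if opcode == 0x21FC and off + 8 <= len(code):
            # abs.W is sign-extended by the CPU, so read it as a signed word
            imm, dest = struct.unpack_from(">Ih", code, off + 2)
        elif opcode == 0x23FC and off + 10 <= len(code):
            imm, dest = struct.unpack_from(">II", code, off + 2)
        else:
            continue
        if 0x0008 <= dest <= 0x03FC and dest % 4 == 0:
            hits.append((off, imm, dest))
    return hits
```

In an executable that has not been relocated, `immediate` is an offset into whichever hunk the matching HUNK_RELOC32 entry names; mark that offset as a code entry point. VBR-relative and OS-level (`is_Code`) installs are not covered by this scan.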
### 5. Copper Lists — The Ultimate Data-as-Code Trap

```asm
; A perfectly valid copper list:
    DC.W    $0180, $0000            ; COLOR00 = black
    DC.W    $0182, $0FFF            ; COLOR01 = white
    DC.W    $FFFF, $FFFE            ; end of list (WAIT for an unreachable position)

; Linear sweep decodes:
    BCLR    D0, D0                  ; $0180 (valid, nonsensical)
    ORI.B   #$82, D0                ; $0000, swallowing $0182 as its immediate
    DC.W    $0FFF                   ; illegal: BSET with an invalid EA
; $FFFF, $FFFE are line-F opcodes (a trap on a 68000); finally, IDA gives up
```

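**Detection**: Copper lists are sequences of word pairs. A MOVE has an even first word in `$0000`–`$01FE` (a custom-register offset, e.g. `$0180` = COLOR00); a WAIT or SKIP has an odd first word (bit 0 set), and bit 0 of its second word distinguishes WAIT (0) from SKIP (1). Lists conventionally end with `$FFFF,$FFFE`. A run of such pairs in a CODE hunk, especially at an address the code writes to `$DFF080` (COP1LC), is almost certainly a copper list.
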
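The pair rules above translate directly into a check. A heuristic sketch, assuming the candidate bytes start on a pair boundary; `looks_like_copper_list` and `min_pairs` are illustrative names, and a WAIT/SKIP test this loose will accept some random data, so treat a hit as a lead to confirm through COP1LC writes:

```python
import struct

def looks_like_copper_list(blob: bytes, min_pairs: int = 3) -> bool:
    """True if `blob` holds at least `min_pairs` plausible Copper instructions
    before the $FFFF,$FFFE terminator (or the end of the buffer)."""
    pairs = 0
    for off in range(0, len(blob) - 3, 4):
        w1, w2 = struct.unpack_from(">HH", blob, off)
        if (w1, w2) == (0xFFFF, 0xFFFE):
            break                       # conventional end of list
        if w1 & 1 == 0 and w1 > 0x01FE:
            return False                # MOVE whose destination is not a custom register
        # Odd first word: WAIT (w2 bit 0 = 0) or SKIP (w2 bit 0 = 1), always accepted
        pairs += 1
    return pairs >= min_pairs
```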
### 6. Self-Modifying Code (SMC) Targets

```asm
; Decryptor writes instructions into a buffer, then jumps to it
    LEA     .encrypted_code(PC), A0
    LEA     _decrypt_buffer, A1
    MOVE.W  #CODE_LONGS-1, D1       ; CODE_LONGS: size of the encrypted block in longwords
.loop:
    MOVE.L  (A0)+, D0
    EOR.L   #$12345678, D0          ; decrypt
    MOVE.L  D0, (A1)+               ; write to buffer
    DBRA    D1, .loop
    ; on a 68020+, the caches must be flushed here (e.g. exec CacheClearU())
    JMP     _decrypt_buffer         ; jump to decrypted code

.encrypted_code:
    DC.L    $DEADBEEF, $CAFEBABE    ; ← DATA that BECOMES code at runtime
```

**Detection**: Look for loops that read from one address and write to another in a tight pattern (`MOVE.L (A0)+, (A1)+` / `DBRA`). The source is encrypted code, the destination is a runtime code buffer. Also look for calls to `CacheClearU()` before a `JMP`/`JSR` to a writable memory region.

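Because the decryptor is usually tiny, you can often reproduce it offline instead of running the binary. This sketch assumes the exact scheme in the example above (each big-endian longword XORed with one fixed 32-bit key); real protections often rotate or chain the key, so adapt the loop body to whatever the disassembled decryptor actually does. `xor_decrypt_longs` is a hypothetical name:

```python
import struct

def xor_decrypt_longs(blob: bytes, key: int) -> bytes:
    """Mirror of MOVE.L (A0)+,D0 / EOR.L #key,D0 / MOVE.L D0,(A1)+.
    Trailing bytes that do not fill a whole longword are copied unchanged."""
    out = bytearray()
    whole = len(blob) - len(blob) % 4
    for off in range(0, whole, 4):
        (value,) = struct.unpack_from(">I", blob, off)
        out += struct.pack(">I", value ^ key)
    out += blob[whole:]
    return bytes(out)
```

Since XOR is its own inverse, the same function encrypts and decrypts. The output can be loaded into IDA or Ghidra as a raw binary at the buffer address (`_decrypt_buffer` in the example) and disassembled normally.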
---

## Systematic Detection Techniques

### Technique 1: Cross-Reference Analysis (Most Reliable)

```
If a byte is:
- The target of a JSR/BSR/BRA/Bcc   → CODE
- The target of a LEA/PEA           → DATA (or code-as-data, e.g., callback pointer)
- The target of a MOVE.L #xxx, An   → Could be either; check how An is used
- Not referenced at all             → Indeterminate; use other techniques

In IDA:    View → Open Subviews → Cross References (or press X on the address)
In Ghidra: Right-click → References → Show References to Address
```

> [!NOTE]
> Zero cross-references does NOT mean "definitely data." Interrupt handlers, callback functions, and dynamically-dispatched code may have no static references.

### Technique 2: Relocation Table Analysis

Amiga HUNK binaries contain explicit relocation entries (`HUNK_RELOC32`) that tell the loader which longwords to patch. This is a powerful disambiguation tool:

| Patched Longword Lives In | Relocated Value Points Into | Implication |
|---|---|---|
| **CODE** | CODE hunk | Absolute operand such as `JSR abs.L` / `LEA abs.L` → target is code if it is called or jumped to, otherwise constant data |
| **CODE** | DATA/BSS hunk | Global variable reference → the longword is an instruction operand, so the bytes around it are code |
| **DATA** | CODE hunk | Function pointer in vtable/callback array → the target IS code |
| **DATA** | DATA/BSS hunk | Pointer chain (e.g., linked list head) → data |

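```bash
# Dump relocations with a hunk dump tool such as hunkinfo
# (the exact output format varies by tool and version):
hunkinfo binary.exe | grep RELOC32
# Each HUNK_RELOC32 block names a target hunk plus a list of offsets in the
# current hunk; the longword stored at each offset is the target-hunk offset.
```

**Key insight**: If a longword in the DATA hunk has a `HUNK_RELOC32` pointing into the CODE hunk, that longword is almost always a **function pointer**, and the CODE hunk target IS real code. The exception is constant data the compiler placed in the code hunk (such as GCC's strings in `.text`), so glance at the target before converting it.
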
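If no dump tool is at hand, the relocation tables are easy to read directly. The sketch below assumes a plain executable (no resident library names in HUNK_HEADER, no overlays, and only CODE/DATA/BSS/RELOC32/SYMBOL/DEBUG/END blocks) and yields one tuple per patched longword; `reloc32_sites` is an illustrative name:

```python
import struct

HUNK_CODE, HUNK_DATA, HUNK_BSS = 0x3E9, 0x3EA, 0x3EB
HUNK_RELOC32, HUNK_SYMBOL, HUNK_DEBUG = 0x3EC, 0x3F0, 0x3F1
HUNK_END, HUNK_HEADER = 0x3F2, 0x3F3

def reloc32_sites(data: bytes):
    """Yield (src_hunk, src_offset, target_hunk, target_offset) per HUNK_RELOC32
    entry. target_offset is the unrelocated longword stored at src_offset."""
    pos = 0

    def read_long() -> int:
        nonlocal pos
        (value,) = struct.unpack_from(">I", data, pos)
        pos += 4
        return value

    if read_long() != HUNK_HEADER:
        raise ValueError("not a HUNK executable")
    while (name_longs := read_long()) != 0:    # resident library names (normally none)
        pos += name_longs * 4
    read_long()                                 # table size
    first, last = read_long(), read_long()
    for _ in range(last - first + 1):           # per-hunk sizes with memory-type bits
        if read_long() >> 30 == 3:              # both bits set: extra flags longword follows
            read_long()

    hunk, contents = first - 1, {}
    while pos < len(data):
        tag = read_long() & 0x3FFFFFFF          # strip memory-type bits
        if tag in (HUNK_CODE, HUNK_DATA):
            hunk += 1
            size = (read_long() & 0x3FFFFFFF) * 4
            contents[hunk] = data[pos:pos + size]
            pos += size
        elif tag == HUNK_BSS:
            hunk += 1
            read_long()                         # size only, no bytes in the file
        elif tag == HUNK_RELOC32:
            while (count := read_long()) != 0:
                target = read_long()
                for _ in range(count):
                    src_off = read_long()
                    (value,) = struct.unpack_from(">I", contents[hunk], src_off)
                    yield hunk, src_off, target, value
        elif tag == HUNK_SYMBOL:
            while (name_longs := read_long()) != 0:
                pos += name_longs * 4 + 4       # name, then the symbol value
        elif tag == HUNK_DEBUG:
            pos += read_long() * 4
        elif tag != HUNK_END:
            raise ValueError(f"unsupported hunk type {tag:#x}")
```

Classify each tuple with the table above: for example, a site in a DATA hunk whose target hunk is a CODE hunk marks `target_offset` in that hunk as a function entry (or as constant data the compiler placed in `.text`).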
### Technique 3: m68k Instruction Validity Check

Not every 16-bit word is a valid 68k opcode, and a valid opcode must be followed by the right number of extension words. A quick validity filter:

| Check | Code Indicator | Data Indicator |
|---|---|---|
| **First word decodes?** | Valid 68k opcode in first 16 bits | Unassigned encoding, line-A (`$Axxx`, always a trap), or line-F (`$Fxxx`, a trap on CPUs without an FPU/coprocessor) |
| **Length consistency?** | Instructions of 2–10 bytes on the 68000 (up to 22 on the 68020+) chain without gaps | Random distribution of word values |
| **Address register usage?** | Reasonable A0-A6 use | Random An register selection |
| **Branch targets?** | Target exists and is word-aligned | Target in data section or odd (misaligned) |
| **Privileged instructions?** | Only in supervisor-mode code (ROM, interrupt handlers) | `MOVE to SR`, `STOP`, `RESET` in user code → likely data |

```python
# Python: quick heuristic check of whether a short byte window looks like 68k code
def looks_like_code(window: bytes) -> bool:
    """Return True if a short window (e.g. 48 bytes) scores as more code-like than data-like.

    Word values scored as code indicators:
      LINK An, #-N          -> $4E50-$4E57 (followed by a displacement word)
      MOVEM.L regs, -(SP)   -> $48E7 (followed by a register mask)
      RTS / RTE             -> $4E75 / $4E73
      LEA d16(PC), An       -> $41FA / $43FA / ... / $4FFA
    MOVEQ #N, Dn ($7000-$7EFF, bit 8 clear) is deliberately not scored:
    the same words are just as common as plain data values.
    """
    code_indicators = 0
    data_indicators = 0

    # Runs of NUL bytes are weak evidence of data (zero fields, padding);
    # code also contains $00 bytes in small displacements and immediates.
    if b"\x00\x00\x00\x00" in window:
        data_indicators += 1

    # Mostly printable ASCII is a strong string signal
    ascii_count = sum(1 for b in window if 0x20 <= b <= 0x7E)
    if window and ascii_count > len(window) * 0.6:
        data_indicators += 3

    # Check every aligned word for common prologue/epilogue opcodes
    for i in range(0, len(window) - 1, 2):
        word = (window[i] << 8) | window[i + 1]
        if 0x4E50 <= word <= 0x4E57 or word in (0x48E7, 0x4E75, 0x4E73):
            code_indicators += 2
        elif (word & 0xF1FF) == 0x41FA:   # LEA d16(PC), An for any An
            code_indicators += 1

    return code_indicators > data_indicators
```

### Technique 4: Entropy Analysis

Code and data have different byte-value distributions:

| Property | Typical Code | Typical Data |
|---|---|---|
| **Null bytes ($00)** | Present but scattered (high bytes of small displacements and immediates, e.g. `MOVEQ #0` = `$7000`); long runs are rare | Common, often in runs (NULL terminators, zero-initialized fields, padding) |
| **ASCII characters ($20–$7E)** | A sizeable share, because many opcode bytes are printable (`$41` LEA, `$4E`, `$61` BSR, `$70` MOVEQ) | >80% for strings, much lower for binary data |
| **Repeated patterns** | Rare (compiler unrolling creates repetition but not identical sequences) | Common (array of identical structs, lookup tables) |
| **$4E byte (opcode prefix)** | Very common (`4E75`=RTS, `4E71`=NOP, `4E56`=LINK A6, `4EBA`=JSR d16(PC), `4EAE`=JSR d16(A6) library call) | No particular concentration |

**Quick check**: Ghidra's listing overview bar can display an entropy view; in IDA, read the region with `idc.get_bytes()` and compute a byte histogram or Shannon entropy yourself. A near-uniform distribution (entropy close to 8 bits per byte) suggests compressed or encrypted data; a distribution clustered around specific opcode bytes suggests code.

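Both measurements are a few lines of Python on bytes pulled from any tool (for example `idc.get_bytes()` in IDA). A sketch with illustrative helper names; the 256-byte window size is an arbitrary choice:

```python
import math
from collections import Counter

def shannon_entropy(blob: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 = constant, 8.0 = uniform)."""
    if not blob:
        return 0.0
    n = len(blob)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(blob).values())

def window_profile(blob: bytes, size: int = 256):
    """Yield (offset, entropy, share_of_0x4E_bytes) for each fixed-size window."""
    for off in range(0, len(blob), size):
        chunk = blob[off:off + size]
        yield off, shannon_entropy(chunk), chunk.count(0x4E) / len(chunk)
```

Windows whose entropy sits near 8.0 are candidates for packed or encrypted payloads; a window with an elevated `$4E` share is more likely code.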
### Technique 5: Structural Pattern Matching

Certain structures are unambiguous:

```
DC.B "string", 0          → ASCII > 80%, zero-terminated         → DATA (string)
DC.L addr1, addr2, -1     → Valid addresses + -1 terminator       → DATA (function table)
$4AFC, ptr-to-self, ...   → RomTag (rt_MatchWord + rt_MatchTag)   → DATA (resident tag)
$000003F3 size, hunks...  → HUNK_HEADER                           → DATA (file header)
```

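The RomTag test can be made precise: `rt_MatchWord` (`RTC_MATCHWORD`, `$4AFC`) is immediately followed by `rt_MatchTag`, a pointer to the match word itself. A sketch (hypothetical helper name) that checks that self-reference. In a ROM image, pass the ROM's base address; in an unrelocated executable hunk the stored value is a hunk-relative offset, so the default `base=0` applies:

```python
import struct

RTC_MATCHWORD = 0x4AFC

def find_romtags(hunk: bytes, base: int = 0):
    """Return offsets of $4AFC words whose following longword (rt_MatchTag)
    equals the address of the $4AFC itself, assuming the image sits at `base`."""
    hits = []
    for off in range(0, len(hunk) - 5, 2):
        (word,) = struct.unpack_from(">H", hunk, off)
        if word != RTC_MATCHWORD:
            continue
        (match_tag,) = struct.unpack_from(">I", hunk, off + 2)
        if match_tag == base + off:
            hits.append(off)
    return hits
```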
---

## IDA Pro Workflow — Manual Disambiguation

### Undefine and Redefine

```
U key   → Undefine (removes code/data designation)
C key   → Convert to Code (forces disassembly)
D key   → Convert to Data (cycles: byte → word → dword; shown as dc.b / dc.w / dc.l)
A key   → Convert to ASCII string
P key   → Create Procedure (function)
Alt+P   → Edit Function (adjust bounds)
```

### Batch Operations

```python
# IDA Python: find MOVE.W table(PC,Dn.W),Dn / JMP table(PC,Dn.W) switch pairs
# and redefine a given offset table as words.
import idautils
import idc

def find_switch_jumps():
    """Yield (move_ea, jmp_ea) for PC-indexed MOVE/JMP pairs in code segments.
    Mnemonics and operands are compared case-insensitively because IDA's
    68k module prints them in lowercase."""
    for seg_ea in idautils.Segments():
        if 'CODE' not in idc.get_segm_name(seg_ea).upper():
            continue
        end = idc.get_segm_end(seg_ea)
        for ea in idautils.Heads(seg_ea, end):
            if not idc.is_code(idc.get_full_flags(ea)):
                continue
            if idc.print_insn_mnem(ea).upper() != 'JMP':
                continue
            if 'PC' not in idc.print_operand(ea, 0).upper():
                continue
            # Found a PC-indexed jump: the table load should be the previous instruction
            prev_ea = idc.prev_head(ea)
            if idc.print_insn_mnem(prev_ea).upper().startswith('MOVE'):
                yield prev_ea, ea

def define_word_table(table_ea, count):
    """Undefine `count` words at table_ea and recreate them as dc.w.
    Table address and entry count come from your own analysis
    (e.g. the CMPI.W bound before the switch)."""
    idc.del_items(table_ea, idc.DELIT_SIMPLE, count * 2)
    for i in range(count):
        idc.create_word(table_ea + i * 2)

for move_ea, jmp_ea in find_switch_jumps():
    print(f"switch at {jmp_ea:#x}, table load at {move_ea:#x}")
```

### Creating Data Structures from Code

When you identify a data region within a code hunk:

1. **Jump table**: Undefine (`U`), then define as words (`D` key twice → `dc.w`)
2. **String table**: Undefine, then place cursor on first byte, press `A`
3. **Function pointer table**: Undefine, then define as doublewords (`D` key 3 times → `dc.l`), then manually create offsets (`Ctrl+R` → select target segment)
4. **Copper list**: Undefine, define as words (`dc.w`), add comment "Copper list — N entries"

---

## Ghidra Workflow — Manual Disambiguation

```
C key          → Clear Code Bytes (undefine)
D key          → Disassemble (force code)
F key          → Create Function
B key          → Cycle integer data type (byte → word → dword → qword)
T key          → Choose Data Type… dialog
P key          → Create pointer
Alt+← / Alt+→  → Navigate back / forward
Right-click    → Data → Choose Data Type
```

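### Ghidra-Specific: Keeping Manual Fixes

Ghidra's auto-analysis can re-disassemble a region you cleared if something still references it as code, and the optional *Aggressive Instruction Finder* analyzer actively hunts for code in undefined gaps. Leaving a fixed region undefined is therefore not enough: define it as data. The disassembler does not overwrite defined data; it reports the conflict with an error bookmark instead.

```
Analysis → Auto Analyze… → uncheck "Aggressive Instruction Finder"
Right-click region → Data → word / string / Choose Data Type…
```
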
---

## Amiga-Specific Patterns Quick Reference

### Definitely DATA

| Pattern | Why |
|---|---|
| `DC.B` with >80% printable ASCII, null-terminated | String |
| `DC.W` pairs with an even first word in `$0000`–`$01FE` (MOVE) or an odd first word (WAIT/SKIP), ending in `$FFFF,$FFFE` | Copper list |
| `DC.L` terminated by `-1` ($FFFFFFFF) | Function pointer table (e.g., the vector array passed to `MakeLibrary()`) |
| `DC.L` where every entry is a valid address in CODE0 | Function pointer table (library function table, vtable, dispatch table) |
| Repeated `DC.L $00000000` for >4 entries | BSS surrogate / zero-initialized array |
| `$4AFC` followed by self-referential pointer | RomTag / resident module header |
| `$000003F3` (HUNK_HEADER) | HUNK file header |
| `$000003E9` (HUNK_CODE) / `$000003EA` (HUNK_DATA) | HUNK hunk header |

### Definitely CODE

| Pattern | Why |
|---|---|
| `$4E56 $xxxx` (LINK A6) | Function prologue |
| `$48E7 $xxxx` (MOVEM.L to -(SP)) | Register save (the matching restore is `$4CDF`) |
| `$4E75` (RTS) | Function return |
| `$4E73` (RTE) | Interrupt return |
| `$60xx` (BRA.S) / `$6000 $xxxx` (BRA) | Control flow |
| `$4EBA $xxxx` (JSR d16(PC)) | Function call |
| `$4EAE $xxxx` (JSR d16(A6)) | Library call through the base in A6 (`$xxxx` is the negative LVO) |
| `$51C8 $FFxx` (DBRA D0) | Loop counter (backward branch) |
| `$4E71` (NOP) | Padding between aligned functions (but only if adjacent to code) |

### Could Be Either — Need Context

| Pattern | Why Ambiguous |
|---|---|
| `$7000`–`$7EFF` with bit 8 clear (MOVEQ) | Just as plausible as a plain 16-bit data value; a lone MOVEQ-shaped word proves nothing |
| `$2xxx` (MOVE.L) | Most words from `$2000` to `$2FFF` decode as some MOVE.L/MOVEA.L form, so any constant or pointer half in that range looks like an instruction |
| `$4AFC` (ILLEGAL) | Could be RTC_MATCHWORD (RomTag) OR an intentional crash/trap in debug code |
| Long runs of `$00000000` | Could be BSS, padding, OR a table of NULL function pointers |
| `$FFFF $FFFE` | Could be Copper end-of-list or an end-of-data sentinel |

---

## Named Antipatterns

### "The ASCII Instruction"

**Problem**: A string like `"AllocMem"` ($41 $6C $6C $6F $63 $4D $65 $6D) decodes, one 16-bit word at a time, as:
```asm
    DC.W    $416C                   ; "Al": no legal 68000 decoding
    BGE.S   *+$71                   ; "lo": odd branch target
    BLS.S   *+$4F                   ; "cM"
    BCS.S   *+$6F                   ; "em"
```
Each word of the string becomes a (mostly valid, always nonsensical) 68k instruction. Because the NUL-terminated string is 9 bytes long, whatever follows it starts at an odd offset unless the assembler padded it, and a disassembler that keeps going chains these phantom instructions straight into the next item.

**Fix**: Select the region, press `U` then `A` in IDA (or `T` → choose a string type in Ghidra). The disassembler will drop the phantom code and mark the region as a string.

### "The Phantom Function"

**Problem**: A color table or copper list inadvertently decodes as a function with a plausible epilogue:
```asm
; Color/copper data at $00012000:
    DC.W    $0180, $0000            ; → BCLR D0, D0 / ORI.B #<next word>, D0
    DC.W    $4E71                   ; → swallowed as the ORI.B immediate (#$71)
    DC.W    $4E75                   ; → RTS (IDA says "this is a function!")
```
IDA sees `BCLR / ORI.B / RTS` and, once anything points it at this address, creates a function. There is no function here — it's a table whose values happen to include the RTS opcode.

**Fix**: Check if the "function" is cross-referenced. If no `JSR`/`BSR` targets it, and it lives in what should be a data region (especially at an address written by `MOVE.L #table, $DFF080` — COP1LC), undefine it.

### "The Jump Table Desert"

**Problem**: Between two real functions, IDA shows 20 "instructions" that are actually a switch jump table. Each table entry is a 2-byte offset that happens to be a valid opcode.

**Fix**: Locate the switch pattern (`MOVE.W offset(PC, Dn.W), D0` / `JMP (PC, D0.W)`). The offset table starts at the label used by the `MOVE.W`. Undefine from that label to the next verified function start.

### "The Decryptor Blind Spot"

**Problem**: The first 256 bytes of the binary are a decryptor that unpacks the real code into RAM and jumps to it. The REAL code is stored as encrypted data — IDA sees random bytes, Ghidra sees no control flow, both fail to disassemble.

**Fix**:
1. Analyze the decryptor manually (it's short — usually 100–300 bytes of real code)
2. Run the binary in WinUAE or FS-UAE, using the emulator's built-in debugger with a breakpoint after the decryptor loop
3. Dump the decrypted memory region
4. Load the dump into IDA as a second binary, or patch the original in IDA

### "The Vector Table Mirage"

**Problem**: The 68k exception vector table at `$000000`–`$0003FF` contains 256 longwords, and every one of them is an address (vector 0 is the initial SSP, vector 1 the reset PC, the rest point to handlers). IDA may try to disassemble the table itself as code, but the vector table ITSELF is data — it's a list of function pointers, not code.

**Fix**: In a RAM dump or any image based at `$000000`, define the first 1024 bytes as `dc.l` (doublewords), not code. The longwords ARE function pointers, but the table is data. A Kickstart ROM image is different: it does not begin with a full vector table, and only its first two longwords act as the reset SSP and PC while the ROM overlay is active at boot.

---

## Decision Flowchart

```mermaid
graph TD
    START["Bytes at address X<br/>undefined or suspect"]

    XREF{"Any cross-references<br/>to this address?"}
    JSR_BRA{"Reached via JSR/BRA/Bcc?"}
    LEA_PEA{"Reached via LEA/PEA?"}
    MOVE_L{"Referenced by<br/>MOVE.L #addr?"}
    NO_XREF{"No xrefs at all"}

    INST{"Decodes as valid<br/>68k instructions?"}
    PATTERN{"Matches known<br/>data pattern?"}
    ASCII{">60% printable ASCII?"}
    ENTROPY{"Byte entropy<br/>distribution?"}

    CODE["→ CODE<br/>Press C (IDA) or D (Ghidra)"]
    DATA["→ DATA<br/>IDA: U, then D or A"]
    INVESTIGATE["→ INVESTIGATE<br/>Check relocation table<br/>Check runtime behavior"]

    START --> XREF
    XREF --> JSR_BRA
    XREF --> LEA_PEA
    XREF --> MOVE_L
    XREF --> NO_XREF

    JSR_BRA -->|"Yes"| CODE
    LEA_PEA -->|"Yes"| DATA
    MOVE_L -->|"Check how pointer is used"| INVESTIGATE
    NO_XREF --> INST

    INST -->|"No"| DATA
    INST -->|"Yes"| PATTERN

    PATTERN -->|"String/Copper/RomTag"| DATA
    PATTERN -->|"Unknown"| ASCII

    ASCII -->|"Yes"| DATA
    ASCII -->|"No"| ENTROPY

    ENTROPY -->|"Opcode-like<br/>distribution"| INVESTIGATE
    ENTROPY -->|"Uniform/random"| DATA
```

---

## Tool-Specific Scripts

### IDA Python: Find All Undefined Regions in CODE Hunk

```python
# ida_find_undefined.py
# Finds all undefined bytes between functions in the CODE hunk
# and suggests whether they're likely code or data.

import idautils
import idc

def analyze_undefined_regions():
    """Find and classify all undefined regions in CODE segments."""
    for seg_ea in idautils.Segments():
        seg_name = idc.get_segm_name(seg_ea)
        if 'CODE' not in seg_name.upper():
            continue

        ea = seg_ea
        end = idc.get_segm_end(seg_ea)

        while ea < end:
            if not idc.is_code(idc.get_full_flags(ea)):
                # Found undefined/data region in CODE: extend it to the next code byte
                region_start = ea
                region_end = ea
                while region_end < end and not idc.is_code(idc.get_full_flags(region_end)):
                    region_end += 1

                size = region_end - region_start
                if size >= 4:  # skip tiny gaps
                    # Classify the region from its first 64 bytes
                    bytes_data = idc.get_bytes(region_start, min(size, 64))
                    classification = classify_bytes(bytes_data)
                    print(f"Undefined region at ${region_start:08X}: "
                          f"{size} bytes → likely {classification}")
                ea = region_end
            else:
                ea = idc.next_head(ea, end)

def classify_bytes(data):
    """Classify bytes as ASCII_STRING, COPPER_LIST, FUNCTION_TABLE, CODE_LIKE or UNKNOWN_DATA."""
    if not data:
        return "EMPTY"

    # ASCII check
    ascii_count = sum(1 for b in data if 0x20 <= b <= 0x7E)
    if len(data) > 4 and ascii_count / len(data) > 0.7:
        return "ASCII_STRING"

    # Copper list check: every word pair is a MOVE (even first word <= $1FE)
    # or a WAIT/SKIP (odd first word); all-zero regions are excluded
    pairs = [(int.from_bytes(data[i:i + 2], "big"), int.from_bytes(data[i + 2:i + 4], "big"))
             for i in range(0, len(data) - 3, 4)]
    if (len(pairs) >= 2 and any(w1 for w1, _ in pairs)
            and all(w1 & 1 or w1 <= 0x01FE for w1, _ in pairs)):
        return "COPPER_LIST"

    # -1 longword terminator → function table
    longs = [int.from_bytes(data[i:i + 4], "big") for i in range(0, len(data) - 3, 4)]
    if len(longs) >= 2 and 0xFFFFFFFF in longs:
        return "FUNCTION_TABLE"

    # Opcode prefix check on the high byte of each word
    opcode_prefixes = {0x4E, 0x48, 0x60, 0x61, 0x70, 0x2F, 0x20}
    prefix_count = sum(1 for b in data[::2] if b in opcode_prefixes)
    if prefix_count >= 2:
        return "CODE_LIKE"

    return "UNKNOWN_DATA"

analyze_undefined_regions()
```

### Ghidra Script: Auto-Detect Jump Tables

```java
// GhidraScript: AutoDetectJumpTables.java
// Reports PC-indexed JMP patterns preceded by a MOVE (likely switch tables).
// Marking the offset tables as data is left as a TODO.

import ghidra.app.script.GhidraScript;
import ghidra.program.model.listing.Instruction;
import ghidra.program.model.listing.InstructionIterator;

public class AutoDetectJumpTables extends GhidraScript {
    @Override
    public void run() throws Exception {
        InstructionIterator instructions = currentProgram.getListing().getInstructions(true);

        for (Instruction insn : instructions) {
            if (monitor.isCancelled()) {
                break;
            }
            // Compare case-insensitively: Ghidra's 68000 language uses lowercase mnemonics
            if (!insn.getMnemonicString().equalsIgnoreCase("jmp")) {
                continue;
            }
            String op0 = insn.getDefaultOperandRepresentation(0).toUpperCase();
            if (!(op0.contains("PC") && op0.contains("D"))) {
                continue;
            }
            // Found a PC-indexed jump: the table load should be the previous instruction
            Instruction prev = insn.getPrevious();
            if (prev != null && prev.getMnemonicString().toLowerCase().startsWith("move")) {
                println("Potential jump table: load at " + prev.getAddress()
                        + ", jump at " + insn.getAddress());
                // TODO: parse the PC-relative displacement, compute the table bounds,
                // clear the bytes and apply word data types
            }
        }
    }
}
```

---

## Cross-Platform Comparison

| Platform | Code/Data Disambiguation | Amiga Difference |
|---|---|---|
| **x86/x86-64 (PE/ELF)** | Variable-length instructions (1–15 bytes); `.text`/`.data` are usually separate sections, though compilers may still embed jump tables in `.text`; PLT/GOT boundaries are clear | Amiga 68k instructions are always multiples of 2 bytes — easier to scan but easier for data to masquerade as code |
| **ARM (Thumb)** | Mixed 16/32-bit instruction encoding; literal pools are marked in ELF objects by mapping symbols (`$a`, `$t`, `$d`) | m68k has no Thumb equivalent and HUNK files carry no mapping symbols; byte-level disambiguation is simpler but tooling is less sophisticated |
| **Modern C++ (ELF)** | `.rodata` clearly separates read-only data from `.text`; CFI unwind tables provide function boundary hints | Amiga HUNK format doesn't separate read-only data from code; PC-relative strings live in CODE hunks |
| **Embedded (bare-metal)** | Vector table at fixed address, similar to 68k exception vectors; often no section separation | Same vector table disambiguation challenge; Amiga adds copper lists and custom chip register tables |
| **Classic Mac OS (68k)** | Code lives in `CODE` resources, separate from data resource types in the resource fork; globals live in the A5 world | Mac resource forks provide cleaner code/data separation than Amiga HUNKs |

---

## FPGA / Emulation Impact

- **SMC detection on FPGA**: A plain 68000 has no instruction cache, but it does prefetch ahead of the executing instruction, so code that patches the very next instruction can behave differently on a naive model. Cores that model a 68020+ (256-byte instruction cache) or add their own caching must flush or bypass that cache when code writes to a region that will later be executed. Detect SMC by monitoring writes to memory regions that also have instruction fetches.
- **Copper list execution**: Copper lists should NEVER be executed by the 68000 — they are data for the Copper coprocessor. If you see the 68000 fetching instructions from copper list addresses, suspect a CPU control-flow problem first (a corrupted vector, return address, or jump target) rather than the list itself.
- **Code/Data bus confusion**: On real Amiga hardware, the Copper can only fetch from Chip RAM, which it shares with the CPU. If your FPGA core incorrectly lets the Copper read from Fast RAM (CPU-only memory), a copper list that software accidentally left in Fast RAM would run on your core even though it fails on a real Amiga, hiding the bug or producing visual artifacts that real hardware never shows.

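The write/fetch monitoring idea can be prototyped in software (an emulator hook or testbench model) before committing it to HDL. A minimal sketch with a hypothetical `SmcMonitor` class that tracks word addresses written by the CPU and flags the first instruction fetch from each:

```python
class SmcMonitor:
    """Flag instruction fetches from addresses written since their last fetch.
    Word granularity, as on the 68000 bus."""

    def __init__(self):
        self.dirty = {}     # word address -> PC of the instruction that wrote it
        self.events = []    # (fetch address, writer PC) for each SMC hit

    def on_write(self, addr: int, pc: int) -> None:
        # Byte writes to an odd address dirty the containing word
        self.dirty[addr & ~1] = pc

    def on_fetch(self, addr: int) -> bool:
        writer = self.dirty.pop(addr & ~1, None)
        if writer is None:
            return False
        self.events.append((addr & ~1, writer))
        return True
```

In a core, the `True` branch of `on_fetch` is where a cache line would be invalidated or the prefetch refilled; tracking whole lines or pages instead of words keeps the bookkeeping affordable.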
---

## FAQ

**Q: How do I tell if a region is a jump table or just weird code?**
A: Compiler switch tables usually follow a `MOVE.W table(PC, Dn.W), Dn` / `JMP table(PC, Dn.W)` pattern, and the entries are self-relative offsets (e.g., `.case1 - .jt`). If the "instructions" at the table location are a run of small signed 16-bit values (small positives such as `$0008`–`$0200` when the cases follow the table, `$FFxx`-style negatives when they precede it), and each value added to the table base lands on a plausible instruction start, it's a jump table.

**Q: Why does IDA create functions at $00000000?**
A: IDA creates code wherever it sees what it takes for a code reference. A longword whose HUNK_RELOC32 entry resolves to offset 0 of a hunk points at that hunk's start (for example, the base of a table in a DATA hunk), which is not necessarily a function. Undefine it: offset 0 of the first HUNK_CODE is the entry point, but offset 0 of a HUNK_DATA is not code. A NULL pointer never carries a relocation, so it cannot cause this by itself.

**Q: How do I handle binaries where EVERYTHING is in one hunk?**
A: Demos, trackmos, and bootblock intros often have a single CODE hunk containing code, data, copper lists, and sample data. Use the techniques in [asm68k_binaries.md](asm68k_binaries.md) — identify hardware register writes to find copper list addresses, identify `LEA xxx(PC), An` to find data references, and trace audio register writes to find sample data.

**Q: Why does Ghidra show "Instruction not recognized" for what I know is valid 68k code?**
A: Check that the language variant matches the target CPU. Plain 68000 code decodes under the 68000 and 68020/68030 variants, but 68020+ instructions (32-bit `MULS.L`/`DIVS.L`, bitfield instructions, scaled index modes) need a 68020+ variant, and ColdFire variants reject many classic 68k encodings. Some FPU and MMU instructions may not be covered by every variant.

**Q: My binary has a hunk labeled CODE but it only contains $00 bytes. What's happening?**
A: A HUNK_CODE always stores its bytes in the file (only HUNK_BSS is zero-filled by the loader), so these are real zeros on disk: typically `DS.B`/`DCB.B` space that an assembler emitted into a code section instead of a BSS section, or a buffer the program fills at runtime (a decrunch target or relocated overlay). The HUNK_HEADER size longwords carry only memory-type bits (bit 30 = chip, bit 31 = fast); there is no `MEMF_CLEAR` flag there. Treat the region as data until something jumps into it.

---

## References

- [asm68k_binaries.md](asm68k_binaries.md) — Hand-written assembly RE (heavy code/data mixing)
- [ansi_c_reversing.md](ansi_c_reversing.md) — C binary RE (compiler patterns help code identification)
- [compilers/README.md](compilers/README.md) — Per-compiler field manuals (prologue signatures for function detection)
- [compiler_fingerprints.md](../compiler_fingerprints.md) — Quick compiler ID for narrowing code patterns
- [hunk_format.md](../../03_loader_and_exec_format/hunk_format.md) — HUNK structure for relocation-based disambiguation
- [hunk_relocation.md](../../03_loader_and_exec_format/hunk_relocation.md) — Relocation mechanics
- [methodology.md](../methodology.md) — General RE workflow
- IDA Pro: *The IDA Pro Book* (Eagle, 2nd Edition) — chapters on manipulating the disassembly (converting code and data)
- Ghidra: *Ghidra Software Reverse Engineering for Beginners* — sections on defining code and data

102
05_reversing/static/compilers/README.md
Normal file
# Per-Compiler Reverse Engineering — Binary Field Manuals

## Overview

This section provides **compiler-specific reverse engineering field manuals**. Each article answers one question: *"I have a binary produced by this compiler — what does it look like in IDA/Ghidra, and how do I read it?"* Rather than discussing compiler usage (see [13_toolchain](../../../13_toolchain/README.md) for that), these articles focus exclusively on **binary output**: hunk naming conventions, prologue/epilogue patterns, stack frame layouts, string addressing modes, startup code, optimization patterns, and debug info formats.

Every article includes the **same C function compiled by that article's compiler**. Read together, they form a side-by-side comparison that reveals exactly how `for` loops, `switch` statements, struct access, and AmigaOS library calls differ at the assembly level.

## Compiler Identification Decision Flowchart

```mermaid
graph TD
    BIN["m68k binary loaded in disassembler"]
    HUNK{"Hunk names?"}
    L_A5{"LINK A5 present?"}
    L_A6{"LINK A6 present?"}
    STR_PC{"PC-relative strings?"}
    REG_SAVE{"MOVEM.L save set size?"}

    BIN --> HUNK
    HUNK -->|"CODE/DATA/BSS"| L_A5
    HUNK -->|".text/.data/.bss"| L_A6
    HUNK -->|"CODE/DATA + __MERGED"| VBCC["→ VBCC"]
    HUNK -->|"_STORM_ symbol prefix"| STORMC["→ StormC"]

    L_A5 -->|"Yes"| REG_SAVE
    L_A5 -->|"No, LINK absent"| STR_PC

    STR_PC -->|"Yes + _mainCRTStartup"| DICE["→ DICE C"]
    STR_PC -->|"Yes, minimal per-function saves"| VBCC3["→ VBCC"]

    L_A6 -->|"Yes"| GCC["→ GCC 2.95.x"]
    L_A6 -->|"No, LINK absent"| GCC2["→ GCC (-fomit-frame-pointer) or VBCC"]

    REG_SAVE -->|"D2-D7/A2-A4 (9 regs)"| SASC["→ SAS/C (or StormC)"]
    REG_SAVE -->|"D3-D7 subset, D2 not saved"| AZTEC["→ Aztec C"]
    REG_SAVE -->|"D2-D5/A2-A3 (6 regs)"| LATTICE["→ Lattice C"]
    REG_SAVE -->|"D2-D7/A2-A6 (11 regs)"| GENSASC["→ SAS/C __saveds"]
    REG_SAVE -->|"Minimal, per-function"| VBCC4["→ VBCC"]
```
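
The same decision order can be written as a small function. This is a rough sketch only. The `BinaryTraits` fields and `guess_compiler()` are hypothetical names, feature extraction (hunk names, prologues, symbols) is assumed to happen in an earlier step, and the thresholds simply mirror the flowchart and the matrix in this document.

```c
#include <string.h>

/* Hypothetical per-binary features, filled in by some earlier analysis step. */
typedef struct {
    const char *first_hunk_name;  /* e.g. "CODE" or ".text" */
    int has_merged_hunk;          /* a hunk named "__MERGED" exists */
    int has_storm_symbols;        /* symbols carry the _STORM_ prefix */
    char link_reg;                /* '5' = LINK A5, '6' = LINK A6, 0 = no LINK */
    int pc_relative_strings;      /* strings reached via LEA x(PC),An */
    int has_mainCRTStartup;       /* _mainCRTStartup symbol present */
    unsigned save_count;          /* registers in the typical MOVEM.L save */
    int saves_d2;                 /* D2 is part of that save set */
} BinaryTraits;

/* Asks the flowchart's questions in order and returns a best guess. */
const char *guess_compiler(const BinaryTraits *t) {
    if (t->has_merged_hunk)
        return "VBCC";
    if (t->has_storm_symbols)
        return "StormC";
    if (strcmp(t->first_hunk_name, ".text") == 0)
        return t->link_reg == '6' ? "GCC 2.95.x" : "GCC (-fomit-frame-pointer) or VBCC";
    if (t->link_reg == '5') {
        if (!t->saves_d2)
            return "Aztec C";      /* D3-D7 subset, D2 never saved */
        if (t->save_count == 9)
            return "SAS/C";        /* D2-D7/A2-A4 */
        if (t->save_count == 6)
            return "Lattice C";    /* D2-D5/A2-A3 */
        return "A5-frame compiler (check startup code)";
    }
    if (t->pc_relative_strings && t->has_mainCRTStartup)
        return "DICE C";
    return "VBCC";
}
```

A real tool would score several functions per binary rather than trusting one prologue, because hand-written assembly and library code linked into the same executable can follow different conventions.
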
## Quick Identification Matrix

| Criterion | SAS/C 6.x | GCC 2.95.x | VBCC | StormC | Aztec C | Lattice C | DICE C |
|---|---|---|---|---|---|---|---|
| **Hunk names** | `CODE`, `DATA`, `BSS` | `.text`, `.data`, `.bss` | `CODE`, `DATA`, `BSS` + `__MERGED` | `CODE`, `DATA` (Amiga standard) | `CODE`, `DATA`, `BSS` | `CODE`, `DATA`, `BSS` | `CODE`, `DATA`, `BSS` |
| **Frame pointer** | A5 (`LINK A5, #-N`) | A6 (or none with `-fomit-frame-pointer`) | None (rarely A5) | A5 (`LINK A5, #-N`) | A5 (`LINK A5, #-N`) | A5 (`LINK A5, #-N`) | None typically |
| **String addressing** | Absolute + relocated | PC-relative | PC-relative | Absolute | Absolute | Absolute | PC-relative |
| **Register save set** | D2-D7/A2-A4 (9 regs) | D2-D3/A2 (per-function) | Only used regs | D2-D7/A2-A4 (9 regs) | D3-D7 (5 regs) | D2-D5/A2-A3 (6 regs) | Per-function |
| **Startup entry** | `_start` / `c.o` | `_start` / `libnix` | `_start` / `startup.o` | `_STORM_` prefix | `_start` / `aztec.o` | `_start` / `lc.o` | `_mainCRTStartup` |
| **Library call style** | `JSR -$XXX(A6)` after loading global | `JSR -$XXX(A6)` with tighter code | `JSR -$XXX(A6)` via `__reg()` | `JSR -$XXX(A6)` SAS/C-like | `JSR -$XXX(A6)` | `JSR -$XXX(A6)` | `JSR -$XXX(A6)` |
| **Era** | 1988–1996 | 1995–present | 1995–present | 1996–2000 | 1985–1992 | 1985–1989 | 1992–1995 |
| **RE article** | [sasc.md](sasc.md) | [gcc.md](gcc.md) | [vbcc.md](vbcc.md) | [stormc.md](stormc.md) | [aztec_c.md](aztec_c.md) | [lattice_c.md](lattice_c.md) | [dice_c.md](dice_c.md) |
## Articles

| File | Compiler | Key RE Distinguishing Feature |
|---|---|---|
| [sasc.md](sasc.md) | SAS/C 5.x/6.x | `LINK A5` + 9-register MOVEM.L save — the most common Amiga C prologue |
| [gcc.md](gcc.md) | GCC 2.95.x | `LINK A6` (or no frame pointer) + PC-relative strings + `__CTOR_LIST__`/`__DTOR_LIST__` arrays |
| [vbcc.md](vbcc.md) | VBCC | No frame pointer + per-function register save + `__reg()` calling convention + `__MERGED` hunks |
| [stormc.md](stormc.md) | StormC / StormC++ | A5 frame pointer + C++ vtable differences from GCC + integrated debug info |
| [aztec_c.md](aztec_c.md) | Manx Aztec C | `LINK A5` + D3-D7 only (5 regs) — distinct from SAS/C 9-reg save |
| [lattice_c.md](lattice_c.md) | Lattice C 3.x/4.x | Predecessor to SAS/C; less aggressive optimization, different startup stub |
| [dice_c.md](dice_c.md) | DICE C | No frame pointer + PC-relative strings + `_mainCRTStartup` entry point |

## Cross-Compiler Comparison — Same C Function

Every per-compiler article includes this reference function compiled by that compiler:

```c
/* Reference function used in all compiler comparison tables */
#include <exec/types.h>   /* ULONG, BOOL, CONST_STRPTR, TRUE, FALSE */

ULONG CountWords(CONST_STRPTR str) {
    ULONG count = 0;
    BOOL in_word = FALSE;

    while (*str) {
        if (*str == ' ' || *str == '\t' || *str == '\n') {
            in_word = FALSE;
        } else if (!in_word) {
            count++;
            in_word = TRUE;
        }
        str++;
    }
    return count;
}
```
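
It helps to know what each listing should compute before reading it, and the reference function can be compiled and checked on a host machine for that. This sketch swaps `<exec/types.h>` for stand-in typedefs. That substitution is an assumption made only so the code builds without an Amiga SDK; `BOOL` is kept as a 16-bit type because the NDK defines it as a `short`. The function body is unchanged.

```c
#include <stdint.h>

/* Host stand-ins for <exec/types.h> (only for building off-Amiga). */
typedef uint32_t ULONG;
typedef int16_t BOOL;                   /* exec/types.h: typedef short BOOL */
typedef const unsigned char *CONST_STRPTR;
#define TRUE  1
#define FALSE 0

/* Reference function, body identical to the version above. */
ULONG CountWords(CONST_STRPTR str) {
    ULONG count = 0;
    BOOL in_word = FALSE;

    while (*str) {
        if (*str == ' ' || *str == '\t' || *str == '\n') {
            in_word = FALSE;
        } else if (!in_word) {
            count++;
            in_word = TRUE;
        }
        str++;
    }
    return count;
}
```

For example, `CountWords((CONST_STRPTR)"two words\n")` returns 2. That is the value every compiler's listing should leave in D0 for the same input.
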
Each article shows the full assembly output, annotated with which patterns are compiler-specific and which are universal m68k idioms.

## See Also

- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick compiler identification guide
- [ansi_c_reversing.md](../ansi_c_reversing.md) — General C reverse engineering methodology
- [m68k_codegen_patterns.md](../m68k_codegen_patterns.md) — m68k code generation idiom catalog
- [startup_code.md](../../../04_linking_and_libraries/startup_code.md) — CLI vs WB startup internals
- [13_toolchain/](../../../13_toolchain/README.md) — Compiler usage and configuration (not RE)

125
05_reversing/static/compilers/aztec_c.md
Normal file
# Manx Aztec C — Reverse Engineering Field Manual

## Overview

**Manx Aztec C** (versions 3.x–5.x, 1985–1992) was one of the first widely used C compilers for the Amiga and was popular before Lattice C (later SAS/C) came to dominate the market. It targets early AmigaOS (1.1–1.3) and produces code with a distinctive **`LINK A5, #-N` + `MOVEM.L D3-D7, -(SP)`** prologue, saving only 5 data registers (D3–D7) instead of SAS/C's 9 registers. This narrower save set is the single most reliable Aztec C fingerprint.

Key constraints:

- **5-register data save (D3–D7 only)** — Aztec C preserves fewer registers than any other Amiga C compiler. D2 is considered scratch by Aztec, while SAS/C, GCC, and VBCC all preserve D2.
- **A5 frame pointer** — standard `LINK A5, #-N` convention, like SAS/C.
- **Absolute string addressing** — like SAS/C, strings are in DATA with `HUNK_RELOC32` relocation.
- **Early library conventions** — Aztec C libraries may use a different initialization pattern than later `RTF_AUTOINIT` libraries.
- **Early AmigaOS focus** — code may assume OS 1.1/1.2 behavior that changed in 2.0+.

```asm
; Aztec C function prologue (THE signature):
_func:
        LINK    A5, #-$14       ; allocate frame
        MOVEM.L D3-D7, -(SP)    ; save D3-D7 ONLY (5 regs!)
        ; Note: D2 is NOT saved (unlike SAS/C, GCC, VBCC)
        ; Note: A2-A4 are not saved (unlike SAS/C)
```

---

## Binary Identification

| Criterion | Aztec C | SAS/C |
|---|---|---|
| **Register save** | `D3-D7` (5 regs) | `D2-D7/A2-A4` (9 regs) |
| **D2 preservation** | NOT preserved — call-clobbered | Preserved — callee-saved |
| **A2-A4 preservation** | NOT preserved by default | Always preserved |
| **Frame pointer** | A5 (`LINK A5`) | A5 (`LINK A5`) |
| **String addressing** | Absolute + relocation | Absolute + relocation |
| **Startup module** | `aztec.o` | `c.o` |
| **Hunk names** | `CODE`, `DATA`, `BSS` | `CODE`, `DATA`, `BSS` |
| **Era** | 1985–1992 (OS 1.1–1.3) | 1988–1996 (OS 1.2–3.1) |
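
When working from raw bytes instead of a disassembly listing, keep in mind that the register mask following `MOVEM.L <list>, -(SP)` (opcode `$48E7`) is stored bit-reversed. Here is a minimal C sketch that normalizes the mask and applies the D2 and A2-A4 tests from the table above. The helper and enum names are hypothetical, and the classification is only as good as the table's heuristics.

```c
#include <stdint.h>

/* In MOVEM <list>,-(An) the mask is reversed: bit 15 = D0 ... bit 8 = D7,
 * bit 7 = A0 ... bit 0 = A7. Convert it to the normal order used by
 * MOVEM (An)+,<list>: bit 0 = D0 ... bit 7 = D7, bit 8 = A0 ... bit 15 = A7. */
uint16_t movem_predec_to_normal(uint16_t mask) {
    uint16_t out = 0;
    for (int i = 0; i < 16; i++)
        if (mask & (1u << i))
            out |= (uint16_t)(1u << (15 - i));
    return out;
}

typedef enum { SAVE_OTHER, SAVE_SASC, SAVE_AZTEC } SaveStyle;

/* Classify the mask word that follows a $48E7 opcode. */
SaveStyle classify_save_set(uint16_t predec_mask) {
    uint16_t m = movem_predec_to_normal(predec_mask);
    int saves_d2    = (m >> 2) & 1;      /* bit 2 = D2 */
    int saves_d3_d7 = (m >> 3) & 0x1F;   /* bits 3-7 = D3-D7 */
    int saves_a2_a4 = (m >> 10) & 7;     /* bits 10-12 = A2-A4 */
    if (m == 0x1CFC)                     /* exactly D2-D7/A2-A4 */
        return SAVE_SASC;
    if (saves_d3_d7 && !saves_d2 && !saves_a2_a4)
        return SAVE_AZTEC;               /* D3-D7 (or a subset), no D2 */
    return SAVE_OTHER;
}
```

For example, the Aztec prologue `MOVEM.L D3-D7, -(SP)` encodes as `$48E7 $1F00`, and `classify_save_set(0x1F00)` reports `SAVE_AZTEC`.
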
### Detecting D2 as a Scratch Register

The most distinctive Aztec C behavior: **D2 is call-clobbered**. After a function call, Aztec C must reload D2 if it still needs the value held there. SAS/C, GCC, and VBCC all preserve D2 across calls.

```asm
; Aztec C: D2 is NOT preserved across calls
        MOVE.L  #value, D2      ; D2 = important value
        BSR     _some_func      ; D2 may be destroyed!
        MOVE.L  #value, D2      ; reload the value (Aztec C treats D2 as scratch)
; SAS/C would NOT need this reload — D2 is callee-saved there
```

---

## Historical Context

Manx Software Systems produced Aztec C for multiple platforms (CP/M, DOS, Macintosh, Amiga, Atari ST). The Amiga version was one of the earliest C compilers available for the machine, appearing in the Amiga 1000 era. Its 5-register save convention (D3-D7 only) reflects the era's emphasis on minimizing prologue/epilogue overhead on the ~7 MHz 68000.

Aztec C was superseded by Lattice C (which became SAS/C) in the late 1980s, though Manx continued to release versions into the early 1990s. Most Aztec C binaries date from 1985–1989, the Amiga's formative years.

Software known to use Aztec C:

- Early Amiga utilities (1985–1987 era)
- Some Commodore-developed tools
- ABasiC (the BASIC interpreter bundled with early Amigas)
- Early versions of certain games ported from other platforms

---

## Same C Function — Aztec C Output

```asm
; CountWords() — Manx Aztec C 5.x:
; (Note: smaller register save set, but structurally similar to SAS/C)

_CountWords:
        LINK    A5, #-$08
        MOVEM.L D3-D4, -(SP)    ; ONLY D3-D4 (not D2-D3!)

        MOVEQ   #0, D3          ; D3 = count
        MOVEQ   #0, D4          ; D4 = in_word

        MOVEA.L $08(A5), A0     ; A0 = str (first stack arg, above saved A5 + return address)

        BRA.S   .loop_test

.loop_body:
        MOVEQ   #' ', D0
        CMP.B   (A0), D0
        BEQ.S   .not_word
        MOVEQ   #$09, D0        ; '\t'
        CMP.B   (A0), D0
        BEQ.S   .not_word
        MOVEQ   #$0A, D0        ; '\n'
        CMP.B   (A0), D0
        BEQ.S   .not_word

        TST.B   D4
        BNE.S   .next_char
        ADDQ.L  #1, D3
        MOVEQ   #1, D4
        BRA.S   .next_char

.not_word:
        MOVEQ   #0, D4

.next_char:
        ADDQ.L  #1, A0

.loop_test:
        TST.B   (A0)
        BNE.S   .loop_body

        MOVE.L  D3, D0
        MOVEM.L (SP)+, D3-D4
        UNLK    A5
        RTS
```
**Aztec C observations**: The function body is nearly identical to SAS/C, but notice D2 is **not used**: Aztec C skips D2 and starts local register allocation at D3. If you see functions that never touch D2, it's likely Aztec C (or early Lattice C).

---

## References

- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick identification
- Aztec C 68k Manual (archive.org)
- See also: [sasc.md](sasc.md), [lattice_c.md](lattice_c.md) — compare with other compilers

135
05_reversing/static/compilers/dice_c.md
Normal file
# DICE C — Reverse Engineering Field Manual

## Overview

**DICE C** (by Matt Dillon, 1992–1995) was a fast, lean C compiler for AmigaOS known for its very high compilation speed, often 10–50× faster than SAS/C. It was the compiler of choice for rapid development cycles and produced tight, no-frills code. Its key RE characteristics: **no frame pointer** (like GCC/VBCC), **PC-relative string addressing** (like GCC), and **minimal register saves** (per-function, like VBCC). DICE C binaries look most similar to VBCC output but with some distinctive patterns.

Key constraints:

- **No frame pointer** — DICE C omits the frame pointer by default. Functions use SP-relative addressing.
- **PC-relative strings** — Like GCC and VBCC, DICE uses `LEA string(PC), A0`.
- **Extremely fast compilation** — DICE's speed came from a simpler optimizer; the binary output is clean but not as aggressively optimized as SAS/C -O2 or GCC -O2.
- **Custom startup** — `_mainCRTStartup` (not `_start`) is the typical entry point name.
- **Hunk names**: `CODE`, `DATA`, `BSS` (Amiga standard)

```asm
; DICE C function — no frame pointer, PC-relative, per-function save:
_func:
        MOVEM.L D2-D4/A2-A3, -(SP)  ; save only what's used
        ; ... function body, SP-relative access ...
        MOVEM.L (SP)+, D2-D4/A2-A3
        RTS
```

---

## Binary Identification

| Criterion | DICE C | SAS/C | GCC | VBCC |
|---|---|---|---|---|
| **Frame pointer** | None | A5 always | A6 or none | None |
| **String addressing** | PC-relative | Absolute + reloc | PC-relative | PC-relative |
| **Register save** | Per-function | Fixed 9 regs | Per-function | Per-function |
| **Startup entry** | `_mainCRTStartup` | `_start` | `_start` | `_start` |
| **Hunk names** | `CODE`, `DATA`, `BSS` | `CODE`, `DATA`, `BSS` | `.text`, `.data`, `.bss` | `CODE`, `DATA`, `BSS` |
| **Optimizer** | Moderate | Aggressive | Aggressive | Aggressive (peephole) |
| **Compile speed** | Very fast | Moderate | Slow | Fast |

### Key Distinguishing Patterns

1. **`_mainCRTStartup` entry point** — unique to DICE C. No other Amiga compiler uses this name for the startup entry.
2. **`ADDQ.L #4, SP` argument cleanup** — DICE C often uses `ADDQ` to pop arguments after function calls, where SAS/C would use `LEA`.
3. **Conservative optimization** — DICE C may not perform CSE or loop-invariant code motion as aggressively as SAS/C or GCC.

---

## Library Call Patterns

```asm
; DICE C library call:
        MOVEA.L (_SysBase).L, A6
        JSR     -$C6(A6)        ; AllocMem
; DICE C may not cache A6 — reloads from global for each call block
```

`MOVEA.L (_SysBase).L, A6` is the absolute-long addressing mode (relocated through `HUNK_RELOC32`) written in explicit `(address).L` notation. A listing that shows `MOVEA.L _SysBase, A6` can contain exactly the same instruction. Whether the `().L` suffix appears depends on the disassembler's or assembler's syntax, so treat it as a display detail rather than a DICE C fingerprint. The more telling sign is A6 being reloaded from the global before each group of library calls.

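
Library calls in a raw code hunk can be counted with a short scan. The C sketch below uses a hypothetical `find_lib_calls()` helper. It only recognizes `JSR d16(A6)` (opcode `$4EAE`), whose displacement is the library vector offset, and it flags an absolute-long `MOVEA.L abs.L, A6` (opcode `$2C79` plus a 32-bit address) immediately before it. A6 loads through other addressing modes, such as A4-relative or register-to-register moves, are not detected.

```c
#include <stddef.h>
#include <stdint.h>

static uint16_t be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

typedef struct {
    size_t offset;          /* hunk offset of the JSR */
    int lvo;                /* signed displacement, e.g. -198 = AllocMem */
    int a6_loaded_abs_l;    /* previous instruction was MOVEA.L abs.L,A6 */
} LibCall;

/* Record every JSR d16(A6) found on a word boundary. Returns the count. */
size_t find_lib_calls(const uint8_t *buf, size_t len, LibCall *out, size_t max) {
    size_t n = 0;
    for (size_t off = 0; off + 4 <= len && n < max; off += 2) {
        if (be16(buf + off) != 0x4EAE)
            continue;
        out[n].offset = off;
        out[n].lvo = (int16_t)be16(buf + off + 2);
        /* MOVEA.L abs.L,A6 is 6 bytes: $2C79 followed by the address */
        out[n].a6_loaded_abs_l = off >= 6 && be16(buf + off - 6) == 0x2C79;
        n++;
    }
    return n;
}
```

Grouping the hits by how often `a6_loaded_abs_l` is set gives a rough measure of how aggressively a binary reloads A6, which is the behavior described above.
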
---

## Historical Context

**Matt Dillon** (later known for DragonFly BSD and the HAMMER filesystem) wrote DICE C as a side project while developing Amiga software. Its claim to fame was compiling the entire DICE C compiler itself in **under 10 seconds** on a stock Amiga 3000, a feat SAS/C needed minutes for.

DICE C was particularly popular in the Amiga demoscene and shareware community, where fast edit-compile-test cycles mattered more than squeezing every last cycle out of the generated code. It also shipped with a suite of development tools including a linker, librarian, and debugger.

DICE C's development effectively ended when Matt Dillon moved to FreeBSD development in the mid-1990s. The final version was released as freeware.

Software known or likely to use DICE C:

- **DICE C itself** (self-hosting — compiled with DICE C)
- Various Amiga shareware utilities (1992–1995 era)
- Some demoscene tools and intros
- Early Amiga networking utilities

---

## Same C Function — DICE C Output

```asm
; CountWords() — DICE C:
; (No frame pointer, PC-relative strings, per-function save)

_CountWords:
        MOVEM.L D2-D3, -(SP)    ; save D2-D3 only

        MOVEQ   #0, D2          ; D2 = count
        MOVEQ   #0, D3          ; D3 = in_word

        MOVEA.L $0C(SP), A0     ; A0 = str (SP+12, after saved regs + ret addr)

        BRA.S   .loop_test

.loop_body:
        MOVEQ   #' ', D0
        CMP.B   (A0), D0
        BEQ.S   .not_word
        MOVEQ   #$09, D0        ; '\t'
        CMP.B   (A0), D0
        BEQ.S   .not_word
        MOVEQ   #$0A, D0        ; '\n'
        CMP.B   (A0), D0
        BEQ.S   .not_word

        TST.B   D3
        BNE.S   .next_char
        ADDQ.L  #1, D2
        MOVEQ   #1, D3
        BRA.S   .next_char

.not_word:
        MOVEQ   #0, D3

.next_char:
        ADDQ.L  #1, A0

.loop_test:
        TST.B   (A0)
        BNE.S   .loop_body

        MOVE.L  D2, D0
        MOVEM.L (SP)+, D2-D3
        RTS
```

**DICE C observations**: For this simple function, DICE C's output is nearly identical to GCC and VBCC. The distinction emerges in:

- **Startup code naming** (`_mainCRTStartup` vs `_start`)
- **Argument cleanup patterns** (`ADDQ.L #4, SP` after calls)
- **Less aggressive CSE** in more complex functions

---

## References

- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick identification
- DICE C distribution (Aminet: `dev/c/dice`)
- Matt Dillon's DICE C documentation (archive.org)
- See also: [sasc.md](sasc.md), [gcc.md](gcc.md), [vbcc.md](vbcc.md) — compare with other compilers

741
05_reversing/static/compilers/gcc.md
Normal file
# GCC 2.95.x — Reverse Engineering Field Manual

## Overview

**GCC 2.95.x** for m68k-amigaos (variants: GeekGadgets, bebbo's modern port, and the original GCC 2.95.3) is the second most common compiler encountered in Amiga reverse engineering, particularly for software from 1995 onward. Unlike SAS/C's rigid "always LINK A5" convention, GCC is far more flexible — it uses **A6** as frame pointer when enabled, defaults to **no frame pointer at all**, uses **PC-relative string addressing**, and generates per-function `MOVEM.L` save sets (saving only the registers actually used, not a fixed set).

Key constraints to internalize immediately:

- **No default frame pointer** — GCC optimizes away the frame pointer whenever possible. Locals and arguments are accessed via `$offset(SP)`. This makes function boundary detection harder initially but produces tighter code.
- **A6 is the frame pointer, not A5** — when `-fno-omit-frame-pointer` is used. This is the primary visual disambiguator from SAS/C.
- **PC-relative everything** — strings are addressed via `LEA string(PC), A0`. Constants live in the `.text` (code) hunk alongside instructions. No `HUNK_RELOC32` for string references.
- **`__CTOR_LIST__` / `__DTOR_LIST__`** — GCC-specific global constructor/destructor arrays, emitted for C++ static constructors and for C functions marked `__attribute__((constructor))` or `__attribute__((destructor))`.
- **`.text` / `.data` / `.bss` hunk names** — Unix convention, unlike SAS/C's Amiga-native `CODE`/`DATA`/`BSS`.

```mermaid
graph TB
    subgraph "Source (.c / .cpp)"
        SRC["C/C++ source"]
    end
    subgraph "GCC Compiler Pipeline"
        CC1["cc1 (C frontend)"]
        CC1PLUS["cc1plus (C++ frontend)"]
        AS["vasm / GNU as"]
        LD["vlink / GNU ld"]
        LIBNIX["libnix / clib2 (startup)"]
    end
    subgraph "Binary Output"
        HUNK["Amiga HUNK executable"]
        TEXT[".text hunk — code + PC-relative data"]
        DATA[".data hunk — initialized globals"]
        BSS[".bss hunk — zero-filled globals"]
        CTOR["__CTOR_LIST__ / __DTOR_LIST__ arrays"]
        SYMBOL["HUNK_SYMBOL — GCC mangled names"]
    end

    SRC --> CC1 & CC1PLUS
    CC1 & CC1PLUS --> AS --> LD
    LIBNIX --> LD
    LD --> HUNK
    HUNK --> TEXT & DATA & BSS
    HUNK --> CTOR
    HUNK --> SYMBOL
```

---

## Binary Identification — The GCC Signature

### Hunk Names (Unix Convention)

```
Hunk 0: .text   (code + read-only data including strings and jump tables)
Hunk 1: .data   (initialized global variables)
Hunk 2: .bss    (zero-initialized globals)
```

> [!NOTE]
> **The `.text` hunk name is the single fastest way to identify GCC output.** SAS/C, Aztec, Lattice, and StormC all use `CODE`/`DATA`/`BSS`. Only GCC (and sometimes VBCC with certain linker scripts) produces `.text`/`.data`/`.bss`. However, some GCC ports have been configured to emit Amiga-standard names — check multiple indicators.

### Function Prologue — The Minimalist Approach

GCC's prologue varies dramatically based on how many registers the function actually uses:

```asm
; GCC with -fomit-frame-pointer (default) — leaf function, no locals:
_leaf_func:
        ; NO prologue at all — just starts executing
        ; ... function body ...
        RTS

; GCC — function with a few locals, no calls:
_modest_func:
        MOVEM.L D2/A2, -(SP)            ; save ONLY the 2 registers actually used
        ; ... function body ...
        MOVEM.L (SP)+, D2/A2
        RTS

; GCC with -fno-omit-frame-pointer:
_frame_func:
        LINK    A6, #-N                 ; A6 frame pointer — NOT A5!
        MOVEM.L D2-D3/A2-A3, -(SP)      ; only actually-used regs
        ; ... function body ...
        MOVEM.L (SP)+, D2-D3/A2-A3
        UNLK    A6                      ; UNLK A6, not UNLK A5
        RTS

; GCC — large function with many locals:
_large_func:
        MOVEM.L D2-D7/A2-A5, -(SP)      ; 10 regs here: the set grows to match the function's needs
        LEA     -$400(SP), SP           ; allocate a large frame (instead of SUBA/ADDA)
        ; ... function body ...
        LEA     $400(SP), SP
        MOVEM.L (SP)+, D2-D7/A2-A5
        RTS
```

**Key identification**: the register save set is **per-function, tailored to actual usage**. If you see `MOVEM.L D2-D3/A2, -(SP)` in one function and `MOVEM.L D2-D7/A2-A4, -(SP)` in another, it's GCC (or VBCC). SAS/C always saves the same fixed set.

### String Addressing — PC-Relative

```asm
; GCC string reference — PC-relative:
        LEA     .LC0(PC), A0    ; A0 = "Hello, World!\n"
        JSR     _Printf         ; call Printf(A0)

; ... later in the same .text hunk:
.LC0:
        DC.B    "Hello, World!", $0A, 0
```

**Critical RE implication**: GCC strings live in `.text` next to the code that references them. In IDA, the string appears as inline data within the code segment, and the `LEA` instruction creates an offset-type **data** cross-reference (`o`) to it. This means:

1. Strings are **not separately relocatable**; they move with the code hunk
2. No `HUNK_RELOC32` entry points at them, so relocation-driven analysis will not find them; follow the `LEA` cross-references instead
3. The `LEA` pattern is unambiguous: `LEA $XXXX(PC), An` (a signed 16-bit displacement) where the target is ASCII data

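
To confirm that a particular `LEA d16(PC), An` really points at a string, the C sketch below resolves the PC-relative target inside a `.text` image and checks for a printable, NUL-terminated run. `lea_pc_string_target()` is a hypothetical helper. The printable-ASCII test (which also accepts tab and newline) and the `min_len` threshold are assumptions you would tune per binary.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the target offset if the instruction at lea_off is LEA d16(PC),An
 * and its target is a NUL-terminated printable string of at least min_len
 * characters; otherwise returns -1. */
long lea_pc_string_target(const uint8_t *text, size_t len,
                          size_t lea_off, size_t min_len) {
    if (lea_off + 4 > len)
        return -1;
    uint16_t op = (uint16_t)((text[lea_off] << 8) | text[lea_off + 1]);
    if ((op & 0xF1FF) != 0x41FA)                 /* not LEA d16(PC),An */
        return -1;
    int16_t disp = (int16_t)((text[lea_off + 2] << 8) | text[lea_off + 3]);
    long target = (long)lea_off + 2 + disp;      /* PC = extension word address */
    if (target < 0 || (size_t)target >= len)
        return -1;

    size_t i = (size_t)target, n = 0;
    while (i < len && text[i] != 0) {
        unsigned char c = text[i];
        if ((c < 0x20 || c > 0x7E) && c != '\n' && c != '\t')
            return -1;                           /* non-text byte */
        i++;
        n++;
    }
    if (i == len || n < min_len)                 /* no terminator, or too short */
        return -1;
    return target;
}
```

Running it over every candidate from a `LEA (PC)` scan separates string references from jump tables and other PC-relative data that also lives in `.text`.
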
---

## Calling Conventions

GCC uses a simpler calling convention model than SAS/C — one primary convention with variations controlled by function attributes. However, what GCC lacks in convention count it makes up for in **register allocation flexibility**: every function gets a customized stack frame and register save set based on exactly which variables the compiler decides to keep in registers.

### Primary Convention (cdecl, the GCC default)

| Aspect | GCC Convention |
|---|---|
| **Return value** | D0 (32-bit integer/pointer), D0:D1 (64-bit `long long`), FP0 (float/double on FPU systems). Structs > 8 bytes: caller allocates space, passes hidden pointer in **A0**. |
| **First 2 integer args** | With the default cdecl convention, these go on the stack like every other argument. Only builds that use a register-argument option (such as `-mregparm`, offered by some Amiga GCC ports) pass them in D0, D1, which are **caller-saved** (the callee may destroy them). The breakdown below assumes this register layout. |
| **All remaining args** | Pushed onto the stack **right-to-left** before the call. The **caller** cleans the stack after the call returns (cdecl convention). |
| **Callee-saved registers** | D2-D7, A2-A5 — but GCC saves **only the subset actually used** by the function. This is the key identifiability feature. |
| **Caller-saved registers** | D0, D1, A0, A1 — destroyed across calls. If the caller needs these values after a call, it must save them itself. |
| **Frame pointer** | A6 when not omitted (`-fno-omit-frame-pointer`); otherwise SP-relative access for both locals and incoming stack args. |
| **Library base** | A6 — loaded per-library at call sites. GCC neither preserves A6 across library calls nor uses A6 for any other purpose during library call sequences. |

> [!NOTE]
> Unlike SAS/C's `#pragma libcall` which bakes the register assignment into the pragma, GCC uses inline assembly stubs (`<inline/exec.h>`, `<inline/dos.h>`) or the `__asm()` keyword to set up library calls. In the binary, the result looks identical — `MOVE.L args, Dn` / `JSR -$XXX(A6)` — but the surrounding code pattern differs (GCC is tighter, fewer redundant loads).

### Parameter Passing — Detailed Breakdown

Understanding exactly which parameter lands in which register vs which stack slot is essential for reconstructing function prototypes in IDA/Ghidra. The layout below assumes the register-argument variant (arg1/arg2 in D0/D1). With the default all-stack convention, arg1 is the first stack argument and every later argument moves up one slot.

```
Caller side (before BSR/JSR _func):      Stack layout on entry to _func (just after BSR):

    MOVE.L  argN, -(SP)  ─┐              ┌──────────────────────┐
    ...                   │ stack,       │ argN (pushed first)  │ SP+4*(N-2)
    MOVE.L  arg4, -(SP)   │ right-to-    │ ...                  │
    MOVE.L  arg3, -(SP)  ─┘ left         │ arg4                 │ SP+8
    MOVE.L  arg2, D1     ─┐              │ arg3 (pushed last)   │ SP+4  ← first stack arg
    MOVE.L  arg1, D0     ─┘ registers    │ return address       │ SP+0
    BSR     _func                        └──────────────────────┘
    ADD.L   #(N-2)*4, SP  ← caller cleans
```

**Identifying parameters in disassembly:**

| Parameter | Location in Callee | How to Find It |
|---|---|---|
| **arg1** | D0 (may be moved to a callee-saved reg immediately) | Look for `MOVE.L D0, Dn` early in the function |
| **arg2** | D1 (same — often moved to a callee-saved reg) | Look for `MOVE.L D1, Dn` after D0 is saved |
| **arg3** | `$04(SP)` on entry, or `$08(A6)` after `LINK A6` | First stack arg; once registers are saved, its SP offset grows by the size of the save area |
| **arg4+** | `$08(SP)`, `$0C(SP)`... on entry, or `$0C(A6)`, `$10(A6)`... | Sequential 4-byte slots above arg3 |

**With frame pointer (A6):**

```asm
; Function with LINK A6, #-$10 and MOVEM.L D2-D4, -(SP):
_func:
        LINK    A6, #-$10       ; push old A6, A6 = SP, then SP -= 16 (locals)
        MOVEM.L D2-D4, -(SP)    ; save 3 regs (12 bytes, below the locals)

; Now the frame looks like:
;   $00(A6) = caller's saved A6
;   $04(A6) = return address
;   $08(A6) = arg3 (first stack arg at A6+8)
;   $0C(A6) = arg4              ; A6+12
;   $10(A6) = arg5              ; A6+16
;   -$10(A6) .. -$01(A6) = locals

        MOVE.L  $08(A6), D2     ; D2 = arg3 (typical: move to callee-saved)
        ; ...
        MOVEM.L (SP)+, D2-D4
        UNLK    A6
        RTS
```

**Without frame pointer (default -O2):**

```asm
; Function with only MOVEM.L D2-D3, -(SP):
_func:
        MOVEM.L D2-D3, -(SP)    ; save 2 regs (8 bytes)

; Now args are at:
;   $0C(SP) = arg3  (12 = 4 ret addr + 8 saved regs)
;   $10(SP) = arg4  ; SP+16

        MOVE.L  $0C(SP), D2     ; D2 = arg3
        ; ...
        MOVEM.L (SP)+, D2-D3
        RTS
```

> [!WARNING]
> **SP-relative offsets are unstable.** If the function uses `ADDQ.L/SUBQ.L` on SP, `PEA`, or pushes temporary values, the SP-relative offset for the same argument shifts. With A6-relative addressing (frame pointer enabled), offsets are constant throughout the function body.
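
Tracking that shift by hand is error-prone, so here is a minimal C sketch of the bookkeeping. `stack_arg_index()` is a hypothetical helper. The caller is assumed to know the size of the `MOVEM.L` save area and how many bytes have been pushed or reserved on the stack since then. It returns a 0-based index among the stack arguments: index 0 is the first stack argument, which the register-argument layout above calls arg3.

```c
/* Map an SP-relative operand back to a stack-argument slot.
 * At entry, 0(SP) is the return address and 4(SP) the first stack argument.
 * saved_bytes:  size of the MOVEM.L save area (4 per register).
 * extra_pushed: bytes pushed or reserved since then (PEA, MOVE -(SP),
 *               SUBQ/LEA on SP), at the point of the access.
 * Returns the 0-based stack-argument index, or -1 if the operand points at
 * the return address, the save area, or locals. */
int stack_arg_index(int sp_disp, int saved_bytes, int extra_pushed) {
    int entry_disp = sp_disp - saved_bytes - extra_pushed; /* offset as seen at entry */
    if (entry_disp < 4 || (entry_disp - 4) % 4 != 0)
        return -1;
    return (entry_disp - 4) / 4;
}
```

For the `$0C(SP)` access in the example above, `stack_arg_index(0x0C, 8, 0)` yields 0. After one `PEA`, the same argument is addressed as `$10(SP)`, and `stack_arg_index(0x10, 8, 4)` still yields 0.
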
### Special Argument Types

| Type | Convention | Disassembly Pattern |
|---|---|---|
| **64-bit `long long`** | D0:D1, with the most-significant 32 bits in D0 and the least-significant in D1 (m68k is big-endian). As a stack argument it occupies two consecutive longwords, high half at the lower address. | `MOVE.L D0, D2` / `MOVE.L D1, D3` — pair of moves to callee-saved regs |
| **Struct ≤ 8 bytes** | Passed in D0:D1 (if first param) or on stack. | Look for byte-field extraction: `ANDI.L #$FF, D0` / `LSR.L #8, D0` |
| **Struct > 8 bytes** | Caller allocates space, passes hidden pointer in **A0**. Callee copies if needed. | `MOVEA.L A0, A2` — A0 moved to callee-saved address reg early in prologue |
| **`float` (FPU)** | FP0 (if FPU codegen enabled). With `-msoft-float`, passed as 32-bit integer in D0 or stack. | `FMOVE.S X, FP0` vs `MOVE.L #$3F800000, D0` (1.0f as integer) |
| **`double` (FPU)** | FP0 (FPU). With `-msoft-float`, passed as 64-bit pair in D0:D1 or on stack. | `FMOVE.D X, FP0` vs D0:D1 pair |
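
Two of these rows are easy to check mechanically while annotating a listing. The C sketch below uses hypothetical helper names. It joins a D0/D1 pair into one 64-bit value with D0 as the most-significant half, and it reinterprets a soft-float immediate such as `$3F800000` as an IEEE-754 single. It assumes a host whose `float` is IEEE-754 single precision and shares byte order with `uint32_t`, which holds on common platforms.

```c
#include <stdint.h>
#include <string.h>

/* Rebuild a 64-bit value from the m68k register pair: D0 = high, D1 = low. */
uint64_t join_d0_d1(uint32_t d0, uint32_t d1) {
    return ((uint64_t)d0 << 32) | d1;
}

/* Reinterpret a 32-bit immediate (e.g. from MOVE.L #$3F800000, D0)
 * as the IEEE-754 single-precision value it encodes. */
float imm_to_float(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);   /* bit copy, no numeric conversion */
    return f;
}
```

`memcpy` is used instead of a pointer cast so the bit reinterpretation does not violate C's aliasing rules.
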
### GCC Register Allocation — Recognizing Register vs Stack Variables

GCC's register allocator is the single most important thing to understand when reading GCC output, because it determines whether a C variable appears as a persistent register value or a frame-relative stack slot.

#### How GCC Assigns Registers to Variables

GCC 2.95.x uses a **priority-based** allocator: a local pass within basic blocks, followed by a global pass over a register-conflict graph. The heuristic, simplified:

1. **Most-referenced variables get registers first.** A loop counter used 50 times wins over a flag set once.
2. **Address-taken variables go to stack.** If a variable's address is taken (`&x`), it MUST live in memory — GCC can't keep it in a register.
3. **D2-D7 used for integer/pointer values.** Data registers are the first choice for arithmetic and pointer-sized values.
4. **A2-A5 used for pointer chasing and base addresses.** Address registers are preferred for `struct->field` access and array indexing.
5. **Register pressure causes spilling.** If a function uses more live variables than available registers, the least-frequently-used variable gets evicted to a stack slot.

#### Identifying Register Variables in Disassembly

```asm
; GCC -O2 function with register-allocated locals
; (register-argument layout: str arrives in D0):
_count_words:
        MOVEM.L D2/A2, -(SP)    ; D2 and A2 saved → they WILL be used as locals

        MOVEA.L D0, A2          ; A2 = str (arg1 moved to a callee-saved address reg)
        MOVEQ   #0, D2          ; D2 = count (initialized to 0, stays in D2)
        MOVEQ   #0, D1          ; D1 = in_word (scratch — destroyed across calls)

.loop:
        TST.B   (A2)            ; A2 used as pointer (not reloaded from stack)
        BEQ.S   .done
        CMPI.B  #' ', (A2)
        BNE.S   .not_space
        MOVEQ   #0, D1          ; D1 modified directly — no stack write
.not_space:
        ; ...
        ADDQ.L  #1, D2          ; D2 incremented in-register — no stack read/modify/write
        BRA.S   .loop

.done:
        MOVE.L  D2, D0          ; return count (from D2, not from a stack load)
        MOVEM.L (SP)+, D2/A2
        RTS
```

Only address registers can be dereferenced with `(An)`, so a pointer variable such as `str` ends up in A2-A5 rather than in a data register.

**Key signs a variable lives in a register:**

- The register is saved in the prologue → it's being used as a named local
- The variable's value is modified with `ADDQ`, `SUBQ`, `MOVEQ` operating on that register — never with `MOVE $offset(A6), Dn` / modify / `MOVE Dn, $offset(A6)`
- The variable is read **without a preceding stack load** and written **without a following stack store**
- At function exit, the value returns from the register, not from a reload

#### Identifying Stack Variables in Disassembly
|
||||
|
||||
```asm
; Same function compiled -O0 (everything on stack; no callee-saved regs needed):
_count_words_O0:
LINK A6, #-$08 ; 8 bytes of locals

MOVE.L $08(A6), D0 ; load arg1 from stack
MOVE.L D0, -$04(A6) ; copy to local: str
CLR.L -$08(A6) ; count = 0 (on stack)

.loop:
MOVEA.L -$04(A6), A0 ; load str from stack
TST.B (A0)
BEQ.S .done
; ... modify count ...
ADDQ.L #1, -$08(A6) ; count++: read-modify-write of the stack slot
BRA.S .loop

.done:
MOVE.L -$08(A6), D0 ; return count (load from stack)
UNLK A6
RTS
```
|
||||
|
||||
**Key signs a variable lives on the stack:**
|
||||
- Every read is a load from the frame: `MOVE.L $offset(A6), Dn`, or an instruction that uses `$offset(A6)` directly as its source
|
||||
- Every write ends with a store to the frame: `MOVE.L Dn, $offset(A6)`
|
||||
- Increments either operate on the slot directly (`ADDQ.L #1, -$08(A6)`) or appear as a load→add→store sequence; either way the value round-trips through memory
|
||||
- The same frame offset (`-$04(A6)`) appears in multiple load/store instructions
|
||||
- Variables are never held in callee-saved registers across statements
|
||||
|
||||
#### Recognizing Spilled Registers
|
||||
|
||||
When register pressure exceeds available registers, GCC **spills** a variable temporarily to the stack:
|
||||
|
||||
```asm
|
||||
; D2 holds 'count', but we need D2 for a DIVU operation:
|
||||
MOVE.L D2, -$04(A6) ; spill count to stack
|
||||
MOVE.L denominator, D2
|
||||
DIVU D2, D0 ; D0.L / D2.W → D0 (quotient in low word, remainder in high word); D2 is unchanged
|
||||
MOVE.L -$04(A6), D2 ; reload count from stack
|
||||
```
|
||||
|
||||
**Spill identification**: look for a `MOVE.L Dn, $offset(A6)` followed later by `MOVE.L $offset(A6), Dn`, where `Dn` is used for a different purpose in between. The slot is normally in the local-variable area: a negative offset from A6, or a non-negative offset from SP when the frame pointer is omitted.
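The check described above can be partly automated over a text listing. The sketch below assumes one instruction per line in the simplified Motorola syntax used in this document, not the exact output of any particular disassembler. It flags a register that is stored to a frame slot, overwritten, and later reloaded from the same slot. An `-O0` store immediately followed by a reload, with nothing written in between, is deliberately *not* reported.

```python
import re

# MOVE.L Dn, off(A6|SP)   and   MOVE.L off(A6|SP), Dn
STORE = re.compile(r"MOVE\.L\s+(D[0-7]),\s*(-?\$[0-9A-F]+\((?:A6|SP)\))", re.I)
LOAD = re.compile(r"MOVE\.L\s+(-?\$[0-9A-F]+\((?:A6|SP)\)),\s*(D[0-7])", re.I)

def find_spills(lines: list[str]) -> list[tuple[int, int, str, str]]:
    """Return (store_index, reload_index, register, slot) for store/reuse/reload triples."""
    spills = []
    for i, line in enumerate(lines):
        m = STORE.search(line)
        if not m:
            continue
        reg, slot = m.group(1).upper(), m.group(2).upper()
        reused = False
        for j in range(i + 1, len(lines)):
            ld = LOAD.search(lines[j])
            if ld and ld.group(1).upper() == slot and ld.group(2).upper() == reg:
                if reused:                     # register held something else meanwhile
                    spills.append((i, j, reg, slot))
                break
            # Crude "register written" test: reg is the last operand (comment stripped).
            operands = lines[j].split(";")[0].strip()
            if re.search(rf",\s*{reg}\s*$", operands, re.I):
                reused = True
    return spills

listing = [
    "MOVE.L D2, -$04(A6)   ; spill count",
    "MOVE.L D1, D2         ; D2 reused for the divisor",
    "DIVU D2, D0",
    "MOVE.L -$04(A6), D2   ; reload count",
]
print(find_spills(listing))
```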
|
||||
|
||||
#### Register Allocation Quick-Reference
|
||||
|
||||
| Pattern | Register Variable | Stack Variable | Spilled Variable |
|
||||
|---|---|---|---|
|
||||
| **Prologue saves it** | ✅ Saved in MOVEM | ❌ Not saved specifically | ✅ Saved in MOVEM |
|
||||
| **Read pattern** | Value already in Dn — no load | `MOVE.L $offset, Dn` before every use | `MOVE.L Dn, $offset` (store) then later `MOVE.L $offset, Dn` (load) |
|
||||
| **Write pattern** | `MOVEQ/ADDQ/SUBQ Dn` — register direct | `MOVE Dn, $offset` + `ADDQ $offset` or separate modify+store | `MOVE.L Dn, $offset` (spill); `MOVE.L $offset, Dn` (reload) |
|
||||
| **Typical compiler** | GCC -O2, -Os, -O3 | GCC -O0; SAS/C with low optimization | GCC under register pressure; SAS/C with many locals |
|
||||
| **RE effort** | Harder — must track register lifetime | Easier — named stack slot = stable location | Hardest — intermittent storage |
|
||||
|
||||
### Function Call Setup Patterns
|
||||
|
||||
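The call-site patterns below assume a register-parameter convention in which the first two integer/pointer arguments travel in D0 and D1 (as with the `-mregparm` option of Amiga GCC ports). GCC's default m68k convention passes every argument on the stack. In either case, the call site reveals which arguments the caller passed in registers and which it pushed to the stack: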
|
||||
|
||||
```asm
|
||||
; Calling a function with 2 or fewer args (register-only):
|
||||
MOVE.L filename, D0 ; arg1 in D0
|
||||
MOVEQ #MODE_OLDFILE, D1 ; arg2 in D1
|
||||
BSR _OpenFile ; no stack setup, no cleanup
|
||||
|
||||
; Calling a function with 4 args (2 register + 2 stack):
|
||||
MOVE.L count, -(SP) ; arg4 pushed first (right-to-left!)
|
||||
MOVE.L buffer, -(SP) ; arg3 pushed second
|
||||
MOVE.L fh, D1 ; arg2 in D1
|
||||
MOVE.L #1024, D0 ; arg1 in D0
|
||||
BSR _ReadData
|
||||
ADDQ.L #8, SP ; caller cleans 8 bytes of stack args
|
||||
|
||||
; Calling a varargs function (all args on stack — no register args):
|
||||
MOVE.L arg3, -(SP)
|
||||
MOVE.L arg2, -(SP)
|
||||
MOVE.L arg1, -(SP)
|
||||
BSR _Printf
|
||||
LEA $0C(SP), SP ; caller cleans 12 bytes
|
||||
```
|
||||
|
||||
> [!NOTE]
|
||||
> **Varargs functions** (like `Printf`, `sprintf`, custom `Format()`) force ALL arguments onto the stack in GCC 2.95.x — even the first two. This is a reliable disambiguator: if you see a call with 3+ stack pushes and NO register args, the target is likely a varargs function.
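The cleanup instruction after a call is a cheap way to count stack-passed arguments. This is a small sketch that assumes every stack argument is a 32-bit longword (true for ints and pointers, not for a `double` passed on the stack). It also assumes cleanup uses one of the two forms shown above. Other forms, such as `ADDA.W`, are not handled.

```python
import re

ADDQ_SP = re.compile(r"ADDQ\.[WL]\s+#(\d+),\s*SP", re.I)          # ADDQ.L #8, SP
LEA_SP = re.compile(r"LEA\s+\$([0-9A-F]+)\(SP\),\s*SP", re.I)     # LEA $0C(SP), SP

def stack_longs_after_call(cleanup: str) -> int:
    """Number of 32-bit stack arguments implied by a post-call SP adjustment."""
    m = ADDQ_SP.search(cleanup)
    if m:
        nbytes = int(m.group(1))          # ADDQ immediate is decimal here
    else:
        m = LEA_SP.search(cleanup)
        if not m:
            return 0                      # no recognised cleanup: no stack args seen
        nbytes = int(m.group(1), 16)      # LEA displacement written in hex
    return nbytes // 4

print(stack_longs_after_call("ADDQ.L #8, SP"))     # two pushed args
print(stack_longs_after_call("LEA $0C(SP), SP"))   # three pushed args
```

A call with three or more stack longwords and no register setup is the varargs signature described in the note.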
|
||||
|
||||
### `__attribute__((interrupt))` — Interrupt Handler
|
||||
|
||||
```asm
|
||||
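; GCC interrupt handler on a raw exception vector (AmigaOS interrupt servers added
; with AddIntServer() are ordinary subroutines that end in RTS instead):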
|
||||
_int_handler:
|
||||
MOVEM.L D0-D7/A0-A6, -(SP) ; save ALL regs
|
||||
; ... handler body ...
|
||||
MOVEM.L (SP)+, D0-D7/A0-A6
|
||||
RTE ; Return From Exception
|
||||
```
|
||||
|
||||
### `__attribute__((noreturn))` — No-Return Functions
|
||||
|
||||
```asm
|
||||
; GCC noreturn function — NO RTS at end:
|
||||
_exit_func:
|
||||
; ... cleanup ...
|
||||
JSR _exit ; call exit(): it never returns
|
||||
; No RTS — compiler knows this never returns
|
||||
; May be followed by ILLEGAL or DC.B 0 padding
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Library Call Patterns
|
||||
|
||||
### GCC Library Call Style
|
||||
|
||||
```asm
; GCC library call: characteristic patterns
; 1. Library base loaded once, may be reused across calls
MOVEA.L (_SysBase).L, A6 ; load from absolute address (or PC-relative)

; 2. Arguments set up with minimal register traffic
MOVE.L D3, D0 ; byteSize already in D3, just move it to D0
MOVE.L #$10001, D1 ; requirements = MEMF_PUBLIC|MEMF_CLEAR

; 3. LVO call
JSR -$C6(A6) ; AllocMem(byteSize in D0, requirements in D1)

; 4. Return value used immediately
MOVEA.L D0, A0 ; ptr → A0 for immediate use
```
|
||||
|
||||
Compared to SAS/C:
|
||||
- GCC is more likely to reuse A6 across multiple library calls without reloading
|
||||
- GCC uses `MOVE.L Dreg, D1` (register-to-register) where SAS/C would reload from stack
|
||||
- GCC may use `LEA (xxx).L, A0` to take the address of a global, or `MOVEA.L (xxx).L, A0` to load the pointer stored in it
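When reading these call sites, it helps to map the negative offset back to a function name. The lookup below is a minimal sketch covering only a handful of well-known exec.library and dos.library entries. A real tool would load the full tables from the `.fd` files shipped with the NDK. You must already know which library base A6 holds at the call. The function does not track that for you.

```python
import re
from typing import Optional

# Library Vector Offsets: negative byte offsets from the library base.
LVO = {
    "exec": {-132: "Forbid", -138: "Permit", -198: "AllocMem", -210: "FreeMem",
             -294: "FindTask", -414: "CloseLibrary", -552: "OpenLibrary"},
    "dos": {-30: "Open", -36: "Close", -42: "Read", -48: "Write",
            -54: "Input", -60: "Output"},
}

JSR_A6 = re.compile(r"JSR\s+(-?)\$([0-9A-F]+)\(A6\)", re.I)

def resolve_lvo(insn: str, library: str) -> Optional[str]:
    """Name of the function called by 'JSR -$xx(A6)', given which base A6 holds."""
    m = JSR_A6.search(insn)
    if not m:
        return None
    offset = int(m.group(2), 16)
    if m.group(1) == "-":
        offset = -offset
    return LVO.get(library, {}).get(offset)

print(resolve_lvo("JSR -$C6(A6)", "exec"))   # AllocMem
print(resolve_lvo("JSR -$2A(A6)", "dos"))    # Read
```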
|
||||
|
||||
### Position-Independent Code (`-fPIC`)
|
||||
|
||||
```asm
|
||||
; GCC -fPIC: PC-relative indirection through GOT-like table
|
||||
LEA _GLOBAL_OFFSET_TABLE_(PC), A4 ; A4 = GOT base
|
||||
MOVEA.L (_SysBase@GOT)(A4), A6 ; load SysBase via GOT slot
|
||||
JSR -$C6(A6) ; AllocMem
|
||||
```
|
||||
|
||||
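When `-fPIC` is enabled, globals are accessed through a GOT (Global Offset Table), similar to ELF shared libraries. This sketch uses A4 as the table base, loaded with `LEA xxx(PC), A4` at function entry. The register GCC actually reserves for PIC depends on the target configuration. On AmigaOS, the more common A4-based pattern comes from small-data (`-fbaserel`) code, where A4 points into the data hunk and globals appear as `offset(A4)` operands.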
|
||||
|
||||
---
|
||||
|
||||
## C++ Support — What It Means for RE
|
||||
|
||||
### Global Constructors and Destructors
|
||||
|
||||
GCC 2.95.x emits two arrays for C++ global object initialization:
|
||||
|
||||
```
|
||||
__CTOR_LIST__ format:
|
||||
┌──────────────────────┐
|
||||
│ count (N) │ __CTOR_LIST__[0]
|
||||
├──────────────────────┤
|
||||
│ constructor_1 │ function pointer
|
||||
├──────────────────────┤
|
||||
│ constructor_2 │
|
||||
├──────────────────────┤
|
||||
│ ... │
|
||||
├──────────────────────┤
|
||||
│ 0x00000000 │ Terminator (NULL)
|
||||
└──────────────────────┘
|
||||
|
||||
__DTOR_LIST__ — identical format for destructors.
|
||||
```
|
||||
|
||||
**In disassembly**:
|
||||
```asm
; The startup code processes __CTOR_LIST__ before calling main().
; Simplified: libgcc's own version walks the list in reverse order and treats
; a count of -1 as "scan to the NULL terminator".
_do_global_ctors:
MOVEM.L D2/A2, -(SP) ; use D2/A2: D0/A0 are scratch and may not survive a call
MOVEA.L #__CTOR_LIST__, A2 ; A2 = ctor array
MOVE.L (A2)+, D2 ; D2 = count
SUBQ.L #1, D2
BMI.S .done

.ctor_loop:
MOVEA.L (A2)+, A1 ; A1 = ctor function pointer
JSR (A1) ; call ctor
DBRA D2, .ctor_loop
.done:
MOVEM.L (SP)+, D2/A2
RTS
```
|
||||
|
||||
**RE importance**: If you see `__CTOR_LIST__` in the symbol table or a constructor-processing loop in the startup code, the binary was compiled with GCC and likely contains C++ code. SAS/C does not use this mechanism.
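To enumerate the constructors statically, decode the list from the hunk that holds it. This sketch assumes you already have that hunk's bytes and the offset of `__CTOR_LIST__`; both are hypothetical inputs here. It follows libgcc's convention that a leading count of -1 means "scan until the NULL terminator".

```python
import struct

def parse_ctor_list(blob: bytes, offset: int) -> list[int]:
    """Constructor pointers stored at blob[offset:] as big-endian longwords."""
    (count,) = struct.unpack_from(">I", blob, offset)
    ptrs: list[int] = []
    pos = offset + 4
    if count == 0xFFFFFFFF:              # -1: length not recorded, scan to NULL
        while True:
            (p,) = struct.unpack_from(">I", blob, pos)
            if p == 0:
                break
            ptrs.append(p)
            pos += 4
    else:
        for i in range(count):
            ptrs.append(struct.unpack_from(">I", blob, pos + 4 * i)[0])
    return ptrs

blob = struct.pack(">IIII", 2, 0x1000, 0x2000, 0)
print([hex(p) for p in parse_ctor_list(blob, 0)])
```

In the file image, these longwords are unrelocated offsets. Pair them with the `HUNK_RELOC32` entries to learn which hunk each one points into. libgcc's startup calls them from the last entry to the first.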
|
||||
|
||||
### Vtable Layout (GCC 2.95.x m68k C++)
|
||||
|
||||
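See [cpp_vtables_reversing.md](../cpp_vtables_reversing.md) for the complete GCC C++ vtable/RTTI layout. Key points for compiler identification are listed below. Note that the `_ZTV` symbol, `offset_to_top` and the RTTI slot at `vtable[-1]` belong to the Itanium C++ ABI used by GCC 3.x and later, including bebbo's GCC 6.x. GCC 2.95.x emits older-style vtable symbols such as `_vt$6Window` or `_vt.6Window`.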
|
||||
- Vtable symbol naming: `_ZTV6Window` (GCC mangled)
|
||||
- RTTI pointer at `vtable[-1]`
|
||||
- `offset_to_top` at `vtable[-2]`
|
||||
- C++ name mangling follows GCC 2.95 conventions (different from StormC++)
|
||||
|
||||
---
|
||||
|
||||
## Optimization Level Fingerprints
|
||||
|
||||
| Level | Flag | Binary Characteristics |
|
||||
|---|---|---|
|
||||
| **-O0** | Default | Every variable on stack. No register allocation across statements. Full `LINK A6` frame. `MOVE.L D0, -4(A6)` / `MOVE.L -4(A6), D0` store-reload pairs. |
|
||||
| **-O1** | `-O` | Basic register allocation. Dead code removed. Constants folded. `MOVEQ` for small values. Redundant stack traffic eliminated. |
|
||||
| **-O2** | `-O2` | Aggressive CSE (common subexpression elimination). Loop invariants hoisted. Loop induction variable optimization. `-fomit-frame-pointer` is not implied on m68k and is usually added explicitly. |
|
||||
| **-Os** | `-Os` | `-O2` but favoring smaller code. May use `BSR` instead of inlining. `DBRA` loops preferred over unrolled sequences. |
|
||||
| **-O3** | `-O3` | `-O2` plus function inlining (`-finline-functions`). `__builtin_memcpy` expansion. Loop unrolling appears only when `-funroll-loops` is also given. |
|
||||
|
||||
**How to identify**:
|
||||
- **-O0**: Distinctive store-then-reload pattern. Look for `MOVE.L D0, -N(A6)` followed immediately by `MOVE.L -N(A6), D0`: the compiler stores a value and then reloads the same value.
|
||||
- **-O2+**: Variables stay in registers across compound statements. When `-fomit-frame-pointer` is used, the `LINK A6` instruction is absent in most functions.
|
||||
- **-O3**: You'll find expanded inline code where a function call would normally appear. Look for repeated code blocks with slightly different register assignments.
|
||||
|
||||
### Tail-Call Optimization
|
||||
|
||||
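GCC 3.0 and later (including bebbo's GCC 6.x) apply sibling-call (tail-call) optimization at `-O2`; GCC 2.95.x only turns self-recursive tail calls into loops: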
|
||||
|
||||
```asm
|
||||
; Instead of:
|
||||
BSR _helper_func
|
||||
RTS
|
||||
|
||||
; GCC generates:
|
||||
BRA _helper_func ; JMP to helper — no return, no stack growth
|
||||
```
|
||||
|
||||
The `BRA` to another function (not a local label) is GCC's tail-call signature. SAS/C rarely does this.
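Given a set of known function entry labels, tail calls can be flagged mechanically. Those labels might come from `HUNK_SYMBOL` or earlier analysis; that input is an assumption here. A tail call is an unconditional `BRA`/`JMP` whose target is another function's entry rather than a local label. The sketch below works over the simplified text listings used in this document.

```python
import re

# BRA / BRA.S / BRA.W / JMP followed by a symbolic target and an optional comment.
BRANCH = re.compile(r"^\s*(?:BRA(?:\.[SWBL])?|JMP)\s+([A-Za-z_.$][\w.$]*)\s*(?:;.*)?$", re.I)

def tail_calls(lines: list[str], current: str, functions: set[str]) -> list[str]:
    """Targets of BRA/JMP in `current` that enter a different known function."""
    targets = []
    for line in lines:
        m = BRANCH.match(line)
        # A branch to our own entry is a loop (tail recursion), not a call elsewhere.
        if m and m.group(1) in functions and m.group(1) != current:
            targets.append(m.group(1))
    return targets

body = ["    MOVE.L D2, D0", "    BRA.S .loop", "    BRA _helper_func ; tail call"]
print(tail_calls(body, "_funcA", {"_funcA", "_helper_func"}))
```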
|
||||
|
||||
---
|
||||
|
||||
## Startup Code — libnix vs clib2 vs ixemul
|
||||
|
||||
### libnix Startup (Most Common)
|
||||
|
||||
```asm
|
||||
; libnix gcrt0.S — minimal startup:
|
||||
_start:
|
||||
MOVEA.L 4.W, A6 ; SysBase
|
||||
JSR ___startup_SysBase ; store SysBase, init libnix internals
|
||||
|
||||
; Open dos.library
|
||||
LEA .dosname(PC), A1
|
||||
MOVEQ #0, D0
|
||||
JSR -$228(A6) ; OpenLibrary (LVO -552)
|
||||
|
||||
; Parse CLI args
|
||||
JSR ___parse_args ; sets up __argc, __argv globals
|
||||
|
||||
; Call main()
|
||||
JSR _main
|
||||
|
||||
; Exit
|
||||
MOVE.L D0, -(SP)
|
||||
JSR ___exit
|
||||
|
||||
.dosname: .asciz "dos.library"
|
||||
```
|
||||
|
||||
**Finding `main()`**: Locate `_start` and find the `JSR _main` call. In unstripped GCC/libnix binaries, the `_main` name usually survives in `HUNK_SYMBOL`. In stripped binaries, the call is still easy to spot because it is the last call before the exit path.
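Because the startup sequence uses fixed encodings, you can locate it in a raw code hunk without a disassembler. The sketch below searches for `MOVEA.L 4.W, A6` (encoded `2C78 0004`) and for `JSR -$228(A6)` (OpenLibrary, encoded `4EAE FDD8`). It assumes `code` holds one hunk's bytes already extracted from the executable. The `JSR _main` that follows is usually an absolute `JSR` whose address needs relocation, so follow the control flow from these hits rather than searching for a fixed pattern.

```python
MOVEA_4W_A6 = bytes.fromhex("2C780004")   # MOVEA.L $4.W, A6  (load SysBase)
JSR_OPENLIB = bytes.fromhex("4EAEFDD8")   # JSR -$228(A6)     (exec OpenLibrary)

def find_all(code: bytes, pattern: bytes) -> list[int]:
    """Even offsets of every occurrence (68k instructions are word-aligned)."""
    hits = []
    start = 0
    while True:
        i = code.find(pattern, start)
        if i < 0:
            return hits
        if i % 2 == 0:
            hits.append(i)
        start = i + 1

def startup_candidates(code: bytes) -> dict[str, list[int]]:
    return {"sysbase_load": find_all(code, MOVEA_4W_A6),
            "openlibrary_call": find_all(code, JSR_OPENLIB)}

# MOVEA.L 4.W,A6 / LEA d16(PC),A1 / MOVEQ #0,D0 / JSR -$228(A6) / RTS
code = bytes.fromhex("2C780004" "43FA0010" "7000" "4EAEFDD8" "4E75")
print(startup_candidates(code))
```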
|
||||
|
||||
### ixemul Startup (Unix-like)
|
||||
|
||||
ixemul provides a much richer Unix-like environment. The startup code is substantially larger and includes `__init_env`, `__parse_shell_args`, and signal setup. ixemul binaries require `ixemul.library` at runtime — a unique dependency that strongly identifies the binary.
|
||||
|
||||
---
|
||||
|
||||
## Same C Function — GCC Output
|
||||
|
||||
```asm
|
||||
; CountWords() — GCC 2.95.3, -O2, -fomit-frame-pointer:
|
||||
; C prototype: ULONG CountWords(CONST_STRPTR str)
|
||||
|
||||
_CountWords:
|
||||
MOVEM.L D2-D3, -(SP) ; save only D2-D3 (no LINK, no A2-A6)
|
||||
|
||||
MOVEQ #0, D2 ; D2 = count
|
||||
MOVEQ #0, D3 ; D3 = in_word
|
||||
|
||||
MOVEA.L $0C(SP), A0 ; A0 = str (arg at SP+12, after saved regs)
|
||||
|
||||
BRA.S .L2
|
||||
|
||||
.L5:
|
||||
CMPI.B #' ', (A0) ; compare immediate to memory — GCC style
|
||||
BEQ.S .L3
|
||||
CMPI.B #'\t', (A0)
|
||||
BEQ.S .L3
|
||||
CMPI.B #'\n', (A0)
|
||||
BEQ.S .L3
|
||||
|
||||
TST.B D3
|
||||
BNE.S .L4
|
||||
ADDQ.L #1, D2
|
||||
MOVEQ #1, D3
|
||||
BRA.S .L4
|
||||
|
||||
.L3:
|
||||
MOVEQ #0, D3 ; in_word = 0
|
||||
|
||||
.L4:
|
||||
ADDQ.L #1, A0 ; str++
|
||||
|
||||
.L2:
|
||||
TST.B (A0)
|
||||
BNE.S .L5
|
||||
|
||||
MOVE.L D2, D0 ; return count
|
||||
MOVEM.L (SP)+, D2-D3
|
||||
RTS
|
||||
```
|
||||
|
||||
**GCC-specific observations**:
|
||||
1. **No `LINK` instruction** — frame pointer omitted. Arg accessed as `$0C(SP)` (SP + saved regs + return address).
|
||||
2. **`CMPI.B #' ', (A0)`**: compare-immediate-to-memory instruction. GCC uses `CMPI` where SAS/C uses `MOVEQ`+`CMP`. That is one instruction instead of two, although both forms occupy 4 bytes.
|
||||
3. **Minimal register save**: only `D2-D3` are saved (the two registers actually used). SAS/C's version of this function saves the same two registers but also sets up a `LINK A5` frame.
|
||||
4. **`BRA.S .L4`** — unconditional branch to common `str++` code. GCC's optimizer merges the increment code.
|
||||
5. **SP-relative argument access**: `$0C(SP)` instead of `$08(A5)`. The offset changes whenever the function pushes or pops data, so the same argument can appear at different SP offsets within one function.
|
||||
|
||||
**SAS/C comparison (same function)**:
|
||||
| Aspect | SAS/C | GCC |
|
||||
|---|---|---|
|
||||
| Frame setup | `LINK A5, #-$08` + `MOVEM.L D2-D3, -(SP)` | `MOVEM.L D2-D3, -(SP)` only |
|
||||
| First char compare | `MOVEQ #' ', D0` / `CMP.B (A0), D0` | `CMPI.B #' ', (A0)` |
|
||||
| Arg access | `$08(A5)` — stable throughout function | `$0C(SP)` — changes if SP moves |
|
||||
| Total instructions | 28 (varies by optimization) | 23 |
|
||||
| Code size | ≥56 bytes (28 instructions of at least 2 bytes each) | 58 bytes |
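The GCC instruction count and size can be checked by hand. The sketch below lists each instruction of the GCC `CountWords` listing with its 68000 encoding size (opcode word plus any extension words) and sums the sizes.

```python
# (instruction, size in bytes) for the GCC CountWords listing, in order.
GCC_COUNTWORDS = [
    ("MOVEM.L D2-D3,-(SP)", 4),    # opcode + register mask
    ("MOVEQ #0,D2", 2),
    ("MOVEQ #0,D3", 2),
    ("MOVEA.L $0C(SP),A0", 4),     # opcode + 16-bit displacement
    ("BRA.S .L2", 2),
    ("CMPI.B #' ',(A0)", 4),       # opcode + immediate word
    ("BEQ.S .L3", 2),
    ("CMPI.B #'\\t',(A0)", 4),
    ("BEQ.S .L3", 2),
    ("CMPI.B #'\\n',(A0)", 4),
    ("BEQ.S .L3", 2),
    ("TST.B D3", 2),
    ("BNE.S .L4", 2),
    ("ADDQ.L #1,D2", 2),
    ("MOVEQ #1,D3", 2),
    ("BRA.S .L4", 2),
    ("MOVEQ #0,D3", 2),
    ("ADDQ.L #1,A0", 2),
    ("TST.B (A0)", 2),
    ("BNE.S .L5", 2),
    ("MOVE.L D2,D0", 2),
    ("MOVEM.L (SP)+,D2-D3", 4),
    ("RTS", 2),
]

count = len(GCC_COUNTWORDS)
size = sum(nbytes for _, nbytes in GCC_COUNTWORDS)
print(count, size)  # 23 58
```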
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### "The Unix Hunk Assumption" — Confusing `.text` with CODE
|
||||
|
||||
```asm
|
||||
; WRONG: treating .text hunk as just "code" and ignoring PC-relative data:
|
||||
; If you see this and think "that's just a weird instruction":
|
||||
LEA .LC0(PC), A0
|
||||
; ... but .LC0 is actually a string embedded in .text:
|
||||
.LC0: DC.B "Hello", 0
|
||||
; These two are in the SAME hunk. IDA may not split them properly.
|
||||
```
|
||||
|
||||
**Fix**: After loading a GCC binary in IDA, search for `LEA xxx(PC), A0` patterns and check if `xxx` resolves to ASCII data. If so, convert the bytes at `xxx` to a string type. For strings that follow a function's `RTS` instruction, create a separate data segment in the `.text` hunk area.
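For `LEA d16(PC), An`, the target is the address of the extension word plus the sign-extended displacement. The sketch below assumes `code` is a code hunk's bytes, with offset 0 at the hunk start. It recognises the `41FA`-family opcodes (`LEA d16(PC),A0` through `A7`), computes the target, and reports a NUL-terminated printable string if one is there. The minimum length of 4 is an arbitrary threshold.

```python
import struct
from typing import Optional

def lea_pc_target(code: bytes, offset: int) -> Optional[int]:
    """If code[offset] is LEA d16(PC),An return the target offset, else None."""
    opcode, disp = struct.unpack_from(">Hh", code, offset)
    if opcode & 0xF1FF != 0x41FA:        # mask out the An field (bits 9-11)
        return None
    return offset + 2 + disp             # PC = address of the extension word

def ascii_at(code: bytes, target: int, min_len: int = 4) -> Optional[str]:
    """Printable NUL-terminated string starting at target, if any."""
    end = code.find(b"\x00", target)
    if end < 0:
        return None
    raw = code[target:end]
    if len(raw) >= min_len and all(0x20 <= b < 0x7F or b in (0x09, 0x0A) for b in raw):
        return raw.decode("ascii")
    return None

# LEA .LC0(PC),A0 ; RTS ; padding ; .LC0: "Hello",0
code = bytes.fromhex("41FA0006") + bytes.fromhex("4E75") + b"\x00\x00" + b"Hello\x00"
target = lea_pc_target(code, 0)
print(target, ascii_at(code, target))
```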
|
||||
|
||||
### "The Missing Frame" — Assuming Every Function Has LINK
|
||||
|
||||
```asm
|
||||
; WRONG: looking for LINK/UNLK to find function boundaries
|
||||
; GCC function with no frame pointer:
|
||||
_myfunc:
|
||||
MOVEM.L D2-D4, -(SP)
|
||||
; ... 200 lines of code ...
|
||||
MOVEM.L (SP)+, D2-D4
|
||||
RTS
|
||||
; If you search for LINK, you'll never find this function's boundary
|
||||
```
|
||||
|
||||
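**Fix**: Function boundaries in GCC code are marked by the end of execution flow: an `RTS`, or a tail-call `BRA`/`JMP` to another function. A new function can start at any address after such an instruction (or after `RTE`/`ILLEGAL`). IDA's auto-analysis and Ghidra's function detection seed functions from `BSR`/`JSR` targets and follow control flow to these exits, so they find frameless functions that a search for `LINK` would miss.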
|
||||
|
||||
### "The A6 Confusion" — GCC Frame Pointer vs Library Base
|
||||
|
||||
```asm
; CRITICAL: A6 plays TWO roles in GCC binaries:
; Role 1: Frame pointer (when -fno-omit-frame-pointer)
; Role 2: Library base (during JSR -$XXX(A6) calls)
;
; WRONG: seeing LINK A6 and thinking A6 is the exec base:
_func:
LINK A6, #-$14 ; A6 = FRAME POINTER here
MOVEM.L D2, -(SP)
; ...
MOVE.L A6, -(SP) ; preserve the frame pointer across the call
MOVEA.L (_DOSBase).L, A6 ; A6 = DOS BASE now
JSR -$2A(A6) ; Read() via DOS base (libraries preserve A6)
MOVEA.L (SP)+, A6 ; A6 = frame pointer again
; Inside this window, offsets from A6 are library vectors, not locals.
; GCC reloads the library base from its global before the next library call.
```
|
||||
|
||||
---
|
||||
|
||||
## Pitfalls & Common Mistakes
|
||||
|
||||
### 1. Misidentifying `-fomit-frame-pointer` Code as Hand-Written Assembly
|
||||
|
||||
```asm
|
||||
; GCC -O2 output can look surprisingly like hand-optimized asm:
|
||||
MOVEM.L D2/A2, -(SP)
|
||||
LEA .LC0(PC), A0 ; string reference
|
||||
MOVEA.L (_DOSBase).L, A6
|
||||
MOVE.L (A1), D1
|
||||
JSR -$2A(A6)
|
||||
; The combination of PC-relative string + SP-relative access + per-function save
|
||||
; looks like hand-crafted code. It's just GCC -O2.
|
||||
```
|
||||
|
||||
### 2. Ignoring `__CTOR_LIST__` Means Missing C++ Globals
|
||||
|
||||
If the binary has `__CTOR_LIST__` / `__DTOR_LIST__` but you don't trace them, you'll miss global C++ objects that execute code before `main()` runs. These constructors can allocate memory, open resources, or register callbacks — essential for understanding program behavior.
|
||||
|
||||
### 3. Tail-Call Optimization Confusion
|
||||
|
||||
```asm
|
||||
; You might incorrectly identify function boundaries here:
|
||||
_funcA:
|
||||
; ... code ...
|
||||
BRA _funcB ; TAIL CALL: funcA ends here; _funcB is a separate function
|
||||
; _funcB reuses funcA's return address and returns directly to funcA's caller
|
||||
; The call graph should show: caller → funcA → funcB (not two parallel calls)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Software Known to Be GCC-Compiled
|
||||
|
||||
| Application | Compiler | RE Clues |
|
||||
|---|---|---|
|
||||
| **AmigaAMP** | GCC 2.95.x | `.text`/`.data` hunks; PC-relative strings; libnix startup; plugin architecture via `dlopen`-like mechanism |
|
||||
| **ScummVM (Amiga port)** | GCC 6.x (bebbo) | Modern GCC codegen; large `.text` hunk; C++ vtables with GCC mangling |
|
||||
| **Miami TCP/IP** | GCC 2.95.x | Mixed C/asm; `libnix` startup; `__CTOR_LIST__` for global initializers |
|
||||
| **AmiTCP** | GCC 2.7.x | Early GCC codegen; less aggressive optimization; no tail-call |
|
||||
| **Various 1996–2000 ports** | GCC 2.95.x (GeekGadgets) | Unix-to-Amiga ports; often ixemul-dependent; `.text` hunk naming |
|
||||
| **MUI 3.x custom classes** | Various, including GCC | C++ vtables need GCC-specific handling; BOOPSI dispatch patterns |
|
||||
|
||||
---
|
||||
|
||||
## Historical Context
|
||||
|
||||
GCC on Amiga arrived relatively late. While Lattice/SAS C dominated the late 1980s, the **GeekGadgets** project (1995) brought a complete GCC-based Unix-like environment to AmigaOS, including GCC 2.7.x and later 2.95.x. This opened the door for Unix software ports and attracted developers who preferred GCC's familiar GNU toolchain.
|
||||
|
||||
Key timeline:
|
||||
- **1995**: GeekGadgets — first usable GCC for AmigaOS (2.7.2)
|
||||
- **1999–2001**: GCC 2.95.x (2.95.3 released 2001), stable and well-tested, becomes the standard
|
||||
- **2000s**: Various GCC 3.x/4.x ports (limited adoption due to code size)
|
||||
- **2015–present**: bebbo's GCC 6.5 cross-compiler — modern GCC for retro development
|
||||
|
||||
GCC's PC-relative addressing is a fundamental design difference from SAS/C. It stems from GCC's Unix heritage where position-independent code (PIC) is essential for shared libraries. On AmigaOS, PC-relative code has the practical benefit that the `.text` hunk can be loaded anywhere without relocation — the HUNK loader doesn't need to patch string references.
|
||||
|
||||
The A6 frame pointer choice (rather than A5) comes from the System V m68k ABI, which designated A6 as the frame pointer. GCC followed this convention because the m68k backend was shared across all m68k targets (Sun, HP, Amiga, Atari).
|
||||
|
||||
---
|
||||
|
||||
## Modern Analogies
|
||||
|
||||
| GCC 2.95.x Concept | Modern Equivalent | Notes |
|
||||
|---|---|---|
|
||||
| `-fomit-frame-pointer` | Default in modern compilers (`-O2` on x86-64 omits RBP) | Same tradeoff: faster code vs harder debugging |
|
||||
| PC-relative string addressing | `-fpic` code on modern ELF systems | Same principle: load-time relocation avoidance |
|
||||
| `__CTOR_LIST__` / `__DTOR_LIST__` | `.init_array` / `.fini_array` sections in ELF | Same purpose: global constructor/destructor registration; modern ELF is more structured |
|
||||
| `libnix` minimal runtime | Newlib / picolibc for embedded systems | Both provide compact C runtime for constrained environments |
|
||||
| `ixemul` Unix emulation | Cygwin / MSYS2 DLL (Unix-on-Windows) | Both provide Unix API layer on top of non-Unix kernel |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
**Q: How do I tell GCC 2.95.x from GCC 6.x (bebbo) in a binary?**
|
||||
A: GCC 2.95.x uses gcc-specific HUNK_SYMBOL patterns (`.Lxxx` local labels). GCC 6.x with bebbo's toolchain uses `vasm`/`vlink`, which generate `CODE`/`DATA` hunk names (Amiga standard, not `.text`). GCC 6.x also applies more aggressive optimizations. Heavy loop unrolling and extensive inlining point to modern GCC. Auto-vectorization is not a useful clue, because the 68k has no vector unit.
|
||||
|
||||
**Q: Why are there no `__CTOR_LIST__` entries in my GCC binary?**
|
||||
A: `__CTOR_LIST__` is populated only if the binary uses C++ with global objects, or C code with functions marked `__attribute__((constructor))`. Pure C programs with neither usually won't have it.
|
||||
|
||||
**Q: How do I find `main()` in a stripped GCC binary?**
|
||||
A: Search for the libnix startup signature: `MOVEA.L 4.W, A6` / `JSR ___startup_SysBase`. The `JSR` that follows the argument-parsing call is `_main`. Even in stripped binaries, the startup code is typically at the beginning of `.text` and the call pattern is consistent.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [13_toolchain/gcc_amiga.md](../../../13_toolchain/gcc_amiga.md) — GCC setup and compilation
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick compiler identification
|
||||
- [cpp_vtables_reversing.md](../cpp_vtables_reversing.md) — GCC C++ vtable layout and RTTI
|
||||
- [startup_code.md](../../../04_linking_and_libraries/startup_code.md) — libnix/clib2 startup internals
|
||||
- *bebbo's amiga-gcc*: https://codeberg.org/bebbo/amiga-gcc
|
||||
- *GeekGadgets*: GCC 2.95 for AmigaOS (archived documentation)
|
||||
- See also: [sasc.md](sasc.md), [vbcc.md](vbcc.md) — compare with other compilers
|
||||
153
05_reversing/static/compilers/lattice_c.md
Normal file
|
|
@ -0,0 +1,153 @@
|
|||
[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md) · [Compilers](README.md)
|
||||
|
||||
# Lattice C 3.x/4.x — Reverse Engineering Field Manual
|
||||
|
||||
## Overview
|
||||
|
||||
**Lattice C** (versions 3.x–4.x, 1985–1989) is the direct predecessor to SAS/C. When SAS Institute acquired the Lattice C product line in 1988, they rebranded version 5.0 as "SAS/C". Lattice C 3.x and 4.x binaries represent the first generation of commercial C compilers for AmigaOS. Their code generation is recognizably similar to SAS/C but with less aggressive optimization and some distinct early patterns.
|
||||
|
||||
Key constraints:
|
||||
- **The transition point**: Lattice C 3.x → 4.x → SAS/C 5.x form a continuous evolution. Code from 3.x looks noticeably "early" — simpler register allocation, less peephole optimization, longer function prologues.
|
||||
- **LINK A5 + D2-D5/A2-A3 save** — Lattice C 3.x typically saves fewer registers than SAS/C (D2-D5 + A2-A3, 6 registers total) but more than Aztec C (5 regs, data only).
|
||||
- **Startup code evolution** — Lattice C 3.x's `lc.o` startup is simpler than SAS/C's `c.o` — may not handle Workbench launches correctly, may not support `argc`/`argv` parsing.
|
||||
- **Hunk names**: `CODE`, `DATA`, `BSS` (same as SAS/C — established this convention)
|
||||
|
||||
```asm
|
||||
; Lattice C 3.x function prologue (less aggressive than SAS/C):
|
||||
_func:
|
||||
LINK A5, #-$14
|
||||
MOVEM.L D2-D5/A2-A3, -(SP) ; 4 data + 2 address = 6 registers
|
||||
; Compare: SAS/C saves D2-D7/A2-A4 (9 registers)
|
||||
; Compare: Aztec C saves D3-D7 only (5 registers, data only)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Binary Identification — Lattice C vs SAS/C
|
||||
|
||||
| Criterion | Lattice C 3.x | Lattice C 4.x | SAS/C 5.x/6.x |
|
||||
|---|---|---|---|
|
||||
| **Register save** | D2-D5, A2-A3 (6 regs) | D2-D6, A2-A3 (7 regs) | D2-D7, A2-A4 (9 regs) |
|
||||
| **D6/D7 usage** | Rarely used | Sometimes used | Frequently used |
|
||||
| **Peephole optimization** | Minimal | Moderate | Aggressive |
|
||||
| **MOVEQ for small values** | Inconsistent | Common | Always |
|
||||
| **Stack frame** | LINK A5 always | LINK A5 always | LINK A5 always |
|
||||
| **Library calls** | `JSR -$XXX(A6)` | `JSR -$XXX(A6)` | `JSR -$XXX(A6)` |
|
||||
| **Startup** | `lc.o` (simpler) | `lc.o` (improved) | `c.o` (full-featured) |
|
||||
| **Era** | 1985–1987 | 1987–1989 | 1988–1996 |
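The register-save row can be checked directly from the prologue bytes. `MOVEM.L regs,-(SP)` encodes as `48E7` followed by a 16-bit mask, and the matching `MOVEM.L (SP)+,regs` encodes as `4CDF` plus a mask. The predecrement mask is bit-reversed relative to the normal one. The sketch below decodes the mask and compares it against the save sets listed in the table above. A match is a hint, not proof: other compilers or hand-written code can save the same set.

```python
REGS = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]

def decode_movem_mask(mask: int, predecrement: bool) -> list[str]:
    """Registers named by a MOVEM mask. Normal order: bit 0 = D0 ... bit 15 = A7."""
    if predecrement:
        # For MOVEM ...,-(An) the mask is reversed: bit 0 = A7 ... bit 15 = D0.
        mask = int(f"{mask:016b}"[::-1], 2)
    return [REGS[i] for i in range(16) if mask & (1 << i)]

# Save sets from the identification table (heuristic only).
SAVE_SETS = {
    ("D2", "D3", "D4", "D5", "A2", "A3"): "Lattice C 3.x",
    ("D2", "D3", "D4", "D5", "D6", "A2", "A3"): "Lattice C 4.x",
    ("D2", "D3", "D4", "D5", "D6", "D7", "A2", "A3", "A4"): "SAS/C 5.x/6.x",
    ("D3", "D4", "D5", "D6", "D7"): "Aztec C",
}

def guess_compiler(prologue_mask: int) -> str:
    """Classify the mask word that follows a 48E7 (MOVEM.L regs,-(SP)) opcode."""
    regs = tuple(decode_movem_mask(prologue_mask, predecrement=True))
    return SAVE_SETS.get(regs, "unknown (per-function save set?)")

print(decode_movem_mask(0x3F38, predecrement=True))
print(guess_compiler(0x3F38))   # 48E7 3F38 = MOVEM.L D2-D7/A2-A4,-(SP)
```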
|
||||
|
||||
### Evolutionary Markers
|
||||
|
||||
The Lattice→SAS/C evolution is visible in the binary:
|
||||
|
||||
1. **Register save set grows** — 6→7→9 registers as the optimizer learned to use more registers effectively
|
||||
2. **MOVEQ adoption** — Lattice 3.x uses `MOVE.L #0, D0`; Lattice 4.x uses `MOVEQ #0, D0`; SAS/C always uses MOVEQ
|
||||
3. **Library call density** — Lattice 3.x loads A6 before every single library call; SAS/C may reuse A6 across calls
|
||||
4. **Stack frame size** — Lattice 3.x often allocates oversized frames (locals * sizeof(LONG) rounded up to nice boundary)
|
||||
|
||||
---
|
||||
|
||||
## Historical Context
|
||||
|
||||
Lattice, Inc. was an early cross-platform compiler vendor. Their C compiler for the Amiga was the first commercially viable option, shipping in 1985. Commodore itself used Lattice C for some system development before adopting SAS/C.
|
||||
|
||||
Key timeline:
|
||||
- **1985**: Lattice C 3.0 — first commercial Amiga C compiler
|
||||
- **1986**: Lattice C 3.1 — improved optimizer, bug fixes
|
||||
- **1987**: Lattice C 4.0 — major update, AmigaOS 1.2 support
|
||||
- **1988**: SAS Institute acquires Lattice C product line
|
||||
- **1989**: Rebranded as SAS/C 5.0
|
||||
|
||||
Any binary from 1985–1989 is likely Lattice C. After 1989, the brand transitioned to SAS/C, though Lattice C was still sold through existing channels for a time.
|
||||
|
||||
Software likely compiled with Lattice C:
|
||||
- Commodore's early Amiga utilities (1985–1986)
|
||||
- Early third-party tools like `DiskMon`, `CLImate`, `Memacs`
|
||||
- Amiga 1000 launch-era software
|
||||
- Early versions of `ARP` (AmigaDOS Replacement Project) components
|
||||
- Early `WShell` / `ZShell` versions
|
||||
|
||||
---
|
||||
|
||||
## Same C Function — Lattice C Output
|
||||
|
||||
```asm
|
||||
; CountWords() — Lattice C 4.x:
|
||||
; (Notably simpler than SAS/C — less aggressive optimizer)
|
||||
|
||||
_CountWords:
|
||||
LINK A5, #-$08
|
||||
MOVEM.L D2-D4/A2, -(SP) ; 4 regs saved (D2-D4 + A2)
|
||||
; Note: D4 and A2 are saved even though neither is used: Lattice C saves a fixed set
|
||||
|
||||
MOVEQ #0, D2 ; D2 = count
|
||||
MOVEQ #0, D3 ; D3 = in_word
|
||||
|
||||
MOVEA.L $08(A5), A0 ; A0 = str
|
||||
|
||||
BRA .loop_test ; Lattice C uses BRA (long), not BRA.S
|
||||
|
||||
.loop_body:
|
||||
MOVE.L #' ', D0 ; MOVE.L for char constant (should be MOVEQ!)
|
||||
CMP.B (A0), D0
|
||||
BEQ .not_word ; BEQ (long), not BEQ.S
|
||||
|
||||
MOVE.L #'\t', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ .not_word
|
||||
|
||||
MOVE.L #'\n', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ .not_word
|
||||
|
||||
TST.B D3
|
||||
BNE .next_char
|
||||
|
||||
ADDQ.L #1, D2
|
||||
MOVEQ #1, D3
|
||||
BRA .next_char
|
||||
|
||||
.not_word:
|
||||
MOVEQ #0, D3
|
||||
|
||||
.next_char:
|
||||
ADDQ.L #1, A0
|
||||
|
||||
.loop_test:
|
||||
TST.B (A0)
|
||||
BNE .loop_body
|
||||
|
||||
MOVE.L D2, D0
|
||||
MOVEM.L (SP)+, D2-D4/A2
|
||||
UNLK A5
|
||||
RTS
|
||||
```
|
||||
|
||||
**Lattice C observations**:
|
||||
1. **`MOVE.L #' ', D0`** instead of `MOVEQ #' ', D0`: Lattice C doesn't always use MOVEQ for constants that fit in 8 bits. On a 68000 this wastes 4 bytes and 8 cycles per constant load (6 bytes and 12 cycles versus 2 bytes and 4 cycles).
|
||||
2. **`BRA`/`BEQ`/`BNE`** (long, 4-byte) instead of `BRA.S`/`BEQ.S`/`BNE.S` (short, 2-byte) — Lattice C's branch target distance calculation is conservative.
|
||||
3. **D4 and A2 saved but unused**: Lattice C saves a fixed register set rather than analyzing which registers are actually needed.
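The combined cost of the first two habits in the listing above can be tallied directly. This sketch uses standard 68000 figures. `MOVE.L #imm, Dn` takes 6 bytes and 12 cycles, versus 2 bytes and 4 cycles for `MOVEQ`. A branch with a 16-bit displacement takes 4 bytes, versus 2 for the `.S` form. Branch cycle costs are left out because they depend on whether each branch is taken.

```python
MOVE_L_IMM = (6, 12)   # bytes, cycles: MOVE.L #imm,Dn
MOVEQ = (2, 4)         # bytes, cycles: MOVEQ #imm,Dn
BCC_WORD_BYTES = 4     # Bcc/BRA with 16-bit displacement
BCC_SHORT_BYTES = 2    # Bcc.S/BRA.S with 8-bit displacement

def lattice_overhead(const_loads: int, long_branches: int) -> tuple[int, int]:
    """Extra (bytes, cycles) compared with MOVEQ and short branches."""
    extra_bytes = const_loads * (MOVE_L_IMM[0] - MOVEQ[0])
    extra_bytes += long_branches * (BCC_WORD_BYTES - BCC_SHORT_BYTES)
    extra_cycles = const_loads * (MOVE_L_IMM[1] - MOVEQ[1])
    return extra_bytes, extra_cycles

# Lattice CountWords: 3 MOVE.L #char loads and 7 word-displacement branches
# (BRA .loop_test, 3x BEQ .not_word, BNE .next_char, BRA .next_char, BNE .loop_body).
print(lattice_overhead(3, 7))  # (26, 24)
```

The cycle figure applies to each pass through all three compares, so it accumulates once per character of input.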
|
||||
|
||||
---
|
||||
|
||||
## Differences from SAS/C — Summary
|
||||
|
||||
```
|
||||
Lattice C 3.x/4.x → SAS/C 5.x/6.x improvements visible in disassembly:
|
||||
✓ MOVEQ substituted for MOVE.L #small_const
|
||||
✓ BRA.S/BEQ.S/BNE.S used where target is within 8-bit range
|
||||
✓ Dead register saves eliminated (per-function save analysis)
|
||||
✓ Common subexpression elimination (CSE) more aggressive
|
||||
✓ Loop induction variables kept in registers, not on stack
|
||||
✓ Struct copy inlined as MOVE.L (A0)+, (A1)+ for small structs
|
||||
✓ Tail-call optimization in some cases (rare but present in SAS/C 6.x)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick identification
|
||||
- *Lattice C 3.x/4.x Manual* (archive.org)
|
||||
- See also: [sasc.md](sasc.md) — SAS/C (direct successor)
|
||||
- See also: [aztec_c.md](aztec_c.md) — contemporary competitor
|
||||
1006
05_reversing/static/compilers/sasc.md
Normal file
File diff suppressed because it is too large
321
05_reversing/static/compilers/stormc.md
Normal file
|
|
@ -0,0 +1,321 @@
|
|||
[← Home](../../../README.md) · [Reverse Engineering](../../README.md) · [Static Analysis](../README.md) · [Compilers](README.md)
|
||||
|
||||
# StormC / StormC++ — Reverse Engineering Field Manual
|
||||
|
||||
## Overview
|
||||
|
||||
**StormC** (by Haage & Partner, 1996–2000) was a native Amiga IDE with integrated C and C++ compiler. It occupies a unique position in Amiga RE: it's the **only native Amiga compiler with full C++ support** (exceptions, RTTI, STL), yet its C++ ABI is **incompatible with GCC's C++ ABI** — StormC uses its own name mangling, vtable layout, and exception handling mechanism. For the RE practitioner, StormC binaries look like SAS/C at the C level (A5 frame pointer, absolute strings) but diverge significantly when C++ constructs appear.
|
||||
|
||||
Key constraints:
|
||||
- **A5 frame pointer** — StormC follows the SAS/C convention (`LINK A5, #-N`), making C-level code appear SAS/C-compatible.
|
||||
- **C++ ABI is unique** — StormC's name mangling, vtable layout, RTTI, and exception handling differ from both GCC and the Itanium C++ ABI. StormC++ libraries cannot link with GCC C++ code.
|
||||
- **Native IDE integration** — StormC embeds project metadata (source paths, build configs) in the binary via custom HUNK_DEBUG entries.
|
||||
- **PowerPC support (v3+)** — StormC 3.0+ could target PPC (WarpOS/PowerUP). PPC code sections use a different hunk type and appear as foreign code in 68k disassembly.
|
||||
- **Hunk names**: `CODE`, `DATA` (Amiga standard, SAS/C-compatible)
|
||||
|
||||
```mermaid
graph TB
    subgraph "Source (.c / .cpp)"
        SRC["C/C++ source"]
    end
    subgraph "StormC IDE"
        IDE["Project Manager"]
        EDITOR["GUI Editor"]
        COMPILER["StormC Compiler"]
        LINKER["StormLink"]
    end
    subgraph "Binary Output"
        HUNK["Amiga HUNK executable"]
        CODE["CODE hunk: 68k code"]
        PPC["PPC_CODE (optional, v3+)"]
        SYMBOL["HUNK_SYMBOL: StormC mangled names"]
        DEBUG["HUNK_DEBUG: project metadata + line info"]
    end

    SRC --> IDE
    IDE --> COMPILER --> LINKER
    LINKER --> HUNK
    HUNK --> CODE & PPC
    HUNK --> SYMBOL & DEBUG
```
|
||||
|
||||
---
|
||||
|
||||
## Binary Identification
|
||||
|
||||
### C-Level Code (SAS/C-Compatible)
|
||||
|
||||
At the C level, StormC output is deliberately SAS/C-compatible:
|
||||
|
||||
```asm
|
||||
; StormC C function (looks identical to SAS/C):
|
||||
_my_c_function:
|
||||
LINK A5, #-$10 ; A5 frame pointer
|
||||
MOVEM.L D2-D7/A2-A4, -(SP) ; 9-reg save — same as SAS/C
|
||||
; ... function body ...
|
||||
MOVEM.L (SP)+, D2-D7/A2-A4
|
||||
UNLK A5
|
||||
RTS
|
||||
```
|
||||
|
||||
**How to distinguish from SAS/C**: Without symbols, C-level StormC code is nearly indistinguishable from SAS/C. Look for:
|
||||
1. **Project metadata in HUNK_DEBUG** — StormC embeds source file paths and project names
|
||||
2. **StormC-specific startup code** — different library open sequence
|
||||
3. **C++ markers** — if you see C++ constructs with non-GCC mangling, it's StormC
|
||||
|
||||
### C++ Level — Where StormC Diverges
|
||||
|
||||
StormC++ uses its own ABI:
|
||||
|
||||
```asm
|
||||
; StormC++ virtual method dispatch (different from GCC!):
|
||||
MOVEA.L obj_ptr(FP), A0 ; A0 = object pointer
|
||||
MOVE.L (A0), D0 ; D0 = vtable pointer (at offset +$00)
|
||||
MOVEA.L D0, A1
|
||||
MOVEA.L $XX(A1), A1 ; A1 = method pointer at byte offset $XX in the vtable
JSR (A1) ; call the virtual method
; No offset_to_top before the vtable; vtable[-1] holds the RTTI pointer or 0
|
||||
```
|
||||
|
||||
### Name Mangling — StormC vs GCC
|
||||
|
||||
| Construct | StormC++ Mangled | GCC 2.95.x Mangled |
|
||||
|---|---|---|
|
||||
| `Window::Draw()` | `Draw__6Window` | `Draw__6Window` — *can be identical for simple cases* |
|
||||
| `Window::SetPos(int,int)` | `SetPos__6WindowFii` | `SetPos__6Windowii` (no `F`) |
|
||||
| `operator new(unsigned long)` | `__nw__FUl` | `__nw__FUl` (may match) |
|
||||
| Constructor | `__ct__6Window` | `__6Window` (empty function name before `__`) |
| Destructor | `__dt__6Window` | `_$_6Window` (`_._6Window` on targets that allow `.` in symbols) |
|
||||
|
||||
**Key disambiguator**: StormC prepends `__ct__` and `__dt__` to constructor/destructor names, following the cfront convention. GCC 2.95.x instead mangles constructors with an empty function name (`__6Window`) and destructors with the special `_$_` (or `_._`) prefix.
|
||||
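The disambiguating prefixes can be checked mechanically when triaging a symbol table. Below is a minimal sketch. It assumes exactly the conventions in the table above: StormC `__ct__`/`__dt__`, GCC 2.95.x `__<len><class>` constructors, and `_$_`/`_._` destructors. It also assumes names arrive with the linker's extra leading underscore already removed. It is a heuristic classifier, not a demangler, and it ignores operators, templates and nested classes.

```python
def parse_len_name(s):
    """Split a length-prefixed name such as '6WindowFii' into ('Window', 'Fii')."""
    i = 0
    while i < len(s) and s[i].isdigit():
        i += 1
    if i == 0:
        return None
    n = int(s[:i])
    name = s[i:i + n]
    if len(name) != n:
        return None
    return name, s[i + n:]


def classify(sym):
    """Return (style, kind, class_name), or None if no assumed pattern fits."""
    # StormC / cfront style: __ct__<len><class>..., __dt__<len><class>...
    for prefix, kind in (("__ct__", "ctor"), ("__dt__", "dtor")):
        if sym.startswith(prefix):
            parsed = parse_len_name(sym[len(prefix):])
            return ("stormc", kind, parsed[0]) if parsed else None
    # GCC 2.95.x destructors: _$_<len><class> (or _._ where '.' is allowed)
    for prefix in ("_$_", "_._"):
        if sym.startswith(prefix):
            parsed = parse_len_name(sym[len(prefix):])
            return ("gcc2", "dtor", parsed[0]) if parsed else None
    # GCC 2.95.x constructors: empty function name, i.e. __<len><class>...
    if sym.startswith("__") and sym[2:3].isdigit():
        parsed = parse_len_name(sym[2:])
        return ("gcc2", "ctor", parsed[0]) if parsed else None
    # Plain member functions look alike in both: <method>__<len><class>...
    head, sep, tail = sym.partition("__")
    if head and sep and tail[:1].isdigit():
        parsed = parse_len_name(tail)
        if parsed:
            return ("either", "method", parsed[0])
    return None
```

Plain methods come back as `"either"`, because for simple signatures the two schemes can coincide.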
|
||||
### Vtable Layout Differences
|
||||
|
||||
```
|
||||
GCC 2.95.x vtable layout: StormC++ vtable layout:
|
||||
┌──────────────────────┐ ┌──────────────────────┐
|
||||
│ offset_to_top = 0 │ vtable[-2] │ (no offset_to_top) │
|
||||
├──────────────────────┤ ├──────────────────────┤
|
||||
│ RTTI pointer │ vtable[-1] │ (RTTI pointer or 0) │
|
||||
├──────────────────────┤ ← vptr ├──────────────────────┤ ← vptr
|
||||
│ virtual destructor │ vtable[0] │ first virtual method │ vtable[0]
|
||||
├──────────────────────┤ ├──────────────────────┤
|
||||
│ virtual method 1 │ vtable[1] │ second virtual meth │ vtable[1]
|
||||
├──────────────────────┤ ├──────────────────────┤
|
||||
│ ... │ │ ... │
|
||||
└──────────────────────┘ └──────────────────────┘
|
||||
```
|
||||
|
||||
> [!WARNING]
|
||||
> StormC++ vtables start at the first virtual function. There is no `offset_to_top` field at `vtable[-2]`. If your struct layout assumes the GCC layout, all vtable offsets will be wrong by 2 entries.
|
||||
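The safest habit is to take slot numbers from the displacement in the dispatch code, which is relative to `_vptr`. Convert to absolute addresses only with an explicit header size. This is a minimal sketch. The one-word StormC header (RTTI pointer or 0) is an assumption read off the diagram above, so adjust `HEADER_WORDS` once you have confirmed it on a real binary.

```python
# Words stored in front of the first function pointer, per layout.
# "gcc": offset_to_top + RTTI pointer (Itanium-style layout).
# "stormc": ASSUMPTION, one word (RTTI pointer or 0) as drawn in the diagram.
HEADER_WORDS = {"gcc": 2, "stormc": 1}


def slot_from_displacement(disp):
    """Slot index for the d16 in MOVEA.L d16(An),Am when An holds the _vptr."""
    if disp < 0 or disp % 4:
        raise ValueError("method slots are non-negative multiples of 4 from _vptr")
    return disp // 4


def entry_address(vtable_symbol, slot, layout):
    """Address of `slot` when you only know where the vtable *symbol* starts."""
    return vtable_symbol + 4 * (HEADER_WORDS[layout] + slot)


def skew_bytes(assumed, actual):
    """How far a lookup lands from the real entry if the wrong header size is assumed."""
    return 4 * (HEADER_WORDS[assumed] - HEADER_WORDS[actual])
```

Slot numbers taken from `slot_from_displacement` are layout-independent. Only `entry_address` needs to know which compiler built the table.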
|
||||
---
|
||||
|
||||
## Library Call Patterns
|
||||
|
||||
StormC uses SAS/C-compatible library calls:
|
||||
|
||||
```asm
|
||||
MOVEA.L _DOSBase, A6 ; load from global
|
||||
MOVE.L filename, D1
|
||||
MOVE.L #MODE_OLDFILE, D2
|
||||
JSR -$1E(A6) ; Open()
|
||||
```
|
||||
|
||||
The difference is in **how** `_DOSBase` is initialized — StormC's startup code may use different symbol naming or library open order.
|
||||
|
||||
---
|
||||
|
||||
## C++ Exception Handling
|
||||
|
||||
StormC 3.0+ supports C++ exceptions with a custom unwinding mechanism:
|
||||
|
||||
```asm
|
||||
; Exception handling setup (simplified):
|
||||
; StormC registers an exception handler frame on the stack:
|
||||
PEA .exception_handler ; handler address
|
||||
MOVE.L ___current_exception_frame, -(SP)
|
||||
MOVE.L SP, ___current_exception_frame
|
||||
|
||||
; ... try block code ...
|
||||
|
||||
; Cleanup on normal exit:
|
||||
MOVE.L (SP)+, ___current_exception_frame
|
||||
ADDQ.L #4, SP ; discard handler
|
||||
|
||||
.exception_handler:
|
||||
; Exception recovery code
|
||||
```
|
||||
|
||||
This is structurally different from GCC's exception handling (which uses DWARF2 unwinding tables or setjmp/longjmp). In the binary, look for a global `___current_exception_frame` variable being pushed/popped in functions with try/catch blocks.
|
||||
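A small Python model makes the stack-linked frame chain concrete. It only mirrors the push/pop/unwind order of the instructions in the listing. The class and method names are hypothetical, and the model ignores the destructor calls and exception-type matching a real runtime performs. The list stands in for the 68k stack, and `current` plays the role of `___current_exception_frame`.

```python
class FrameChain:
    """Toy model of a handler chain linked through saved frame pointers."""

    def __init__(self):
        self.stack = []       # simulated stack; the end of the list is the top
        self.current = None   # ___current_exception_frame (index) or None

    def push_frame(self, handler):
        self.stack.append(handler)        # PEA .exception_handler
        self.stack.append(self.current)   # MOVE.L ___current_exception_frame,-(SP)
        self.current = len(self.stack) - 1  # MOVE.L SP,___current_exception_frame

    def pop_frame(self):
        self.current = self.stack.pop()   # MOVE.L (SP)+,___current_exception_frame
        self.stack.pop()                  # ADDQ.L #4,SP (discard handler)

    def throw(self):
        """Return the innermost handler and unlink its frame (None if unhandled)."""
        if self.current is None:
            return None
        handler = self.stack[self.current - 1]
        previous = self.stack[self.current]
        del self.stack[self.current - 1:]  # unwind SP to below this frame
        self.current = previous
        return handler
```

Nested `try` blocks unwind innermost-first, which is the property to look for when following `___current_exception_frame` through a binary.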
|
||||
---
|
||||
|
||||
## Startup Code
|
||||
|
||||
StormC's startup differs from SAS/C `c.o`:
|
||||
|
||||
```asm
|
||||
; StormC startup (typical pattern):
|
||||
_start:
|
||||
MOVEA.L 4.W, A6 ; SysBase
|
||||
MOVE.L A6, ___SysBase
|
||||
|
||||
; StormC may use different library open order:
|
||||
JSR ___OpenStormCLibs ; open DOS, Intuition, etc.
|
||||
|
||||
; C++ static constructors (if C++ code present):
|
||||
JSR ___init_cpp ; calls __ct__ functions
|
||||
|
||||
; Call main()
|
||||
BSR _main
|
||||
|
||||
; C++ static destructors:
|
||||
JSR ___exit_cpp ; calls __dt__ functions
|
||||
|
||||
; Cleanup
|
||||
JSR ___CloseStormCLibs
|
||||
MOVE.L D0, ___ReturnCode
|
||||
RTS
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Same C Function — StormC Output
|
||||
|
||||
```asm
|
||||
; CountWords() — StormC 4.0, C mode, -O2:
|
||||
; (Structurally identical to SAS/C — StormC's C codegen mirrors SAS/C)
|
||||
|
||||
_CountWords:
|
||||
LINK A5, #-$08
|
||||
MOVEM.L D2-D3, -(SP)
|
||||
|
||||
MOVEQ #0, D2 ; count
|
||||
MOVEQ #0, D3 ; in_word
|
||||
|
||||
MOVEA.L $08(A5), A0 ; str (arg1 at A5+8)
|
||||
|
||||
BRA.S .loop_test
|
||||
|
||||
.loop_body:
|
||||
MOVEQ #' ', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
MOVEQ #'\t', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
MOVEQ #'\n', D0
|
||||
CMP.B (A0), D0
|
||||
BEQ.S .not_word
|
||||
|
||||
TST.B D3
|
||||
BNE.S .next_char
|
||||
ADDQ.L #1, D2
|
||||
MOVEQ #1, D3
|
||||
BRA.S .next_char
|
||||
|
||||
.not_word:
|
||||
MOVEQ #0, D3
|
||||
|
||||
.next_char:
|
||||
ADDQ.L #1, A0
|
||||
|
||||
.loop_test:
|
||||
TST.B (A0)
|
||||
BNE.S .loop_body
|
||||
|
||||
MOVE.L D2, D0
|
||||
MOVEM.L (SP)+, D2-D3
|
||||
UNLK A5
|
||||
RTS
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### "The GCC-C++ Assumption" — Using GCC Vtable Layout on StormC++
|
||||
|
||||
Applying GCC vtable offsets to StormC++ binaries will misidentify every virtual method by 2 slots and assume an `offset_to_top` field that StormC++ never emits. Always determine the C++ compiler BEFORE applying vtable layout assumptions.
|
||||
|
||||
### "The StormC-C++ Silence" — Missing C++ in What Looks Like C
|
||||
|
||||
StormC C code looks identical to SAS/C. But if the binary was compiled with StormC++ (C++ mode), global constructors run before `main()`, exceptions unwind, and objects have vtables — all invisible at the C codegen level. Check `HUNK_SYMBOL` for `__ct__` and `__dt__` prefixes.
|
||||
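When the binary still carries `HUNK_SYMBOL` ($3F0) blocks, a short parser is enough to pull out the `__ct__`/`__dt__` names. This minimal sketch assumes the standard block layout:

- a name length in longwords;
- the name, zero-padded to a longword boundary;
- a value longword;
- a 0 longword ending the block.

`build_hunk_symbol` is a hypothetical helper that only produces test input. Matching with `in` rather than `startswith` tolerates the extra leading underscore the linker may add.

```python
import struct

HUNK_SYMBOL = 0x3F0


def parse_hunk_symbol(buf, pos):
    """Parse a HUNK_SYMBOL block whose tag longword is at buf[pos].
    Returns ([(name, value), ...], offset just past the terminating 0)."""
    (tag,) = struct.unpack_from(">L", buf, pos)
    if (tag & 0x3FFFFFFF) != HUNK_SYMBOL:
        raise ValueError(f"not HUNK_SYMBOL: {tag:#x}")
    pos += 4
    symbols = []
    while True:
        (head,) = struct.unpack_from(">L", buf, pos)
        pos += 4
        if head == 0:
            return symbols, pos
        nbytes = 4 * (head & 0x00FFFFFF)  # high byte is the symbol type (0 here)
        name = buf[pos:pos + nbytes].rstrip(b"\0").decode("latin-1")
        pos += nbytes
        (value,) = struct.unpack_from(">L", buf, pos)
        pos += 4
        symbols.append((name, value))


def cpp_lifecycle_symbols(symbols):
    """Keep only StormC-style constructor/destructor names."""
    return [(n, v) for n, v in symbols if "__ct__" in n or "__dt__" in n]


def build_hunk_symbol(entries):
    """Hypothetical helper: encode (name, value) pairs as a HUNK_SYMBOL block."""
    out = [struct.pack(">L", HUNK_SYMBOL)]
    for name, value in entries:
        raw = name.encode("latin-1")
        raw += b"\0" * (-len(raw) % 4)
        out.append(struct.pack(">L", len(raw) // 4) + raw + struct.pack(">L", value))
    out.append(struct.pack(">L", 0))
    return b"".join(out)
```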
|
||||
---
|
||||
|
||||
## Pitfalls & Common Mistakes
|
||||
|
||||
### 1. Linking StormC++ Objects with GCC Code
|
||||
|
||||
StormC++ and GCC C++ share NO ABI compatibility. Name mangling, vtable layout, RTTI, and exception handling all differ. If you're patching a binary and need to add C++ code, you must use the same compiler that produced the original.
|
||||
|
||||
### 2. PowerPC Code Sections (StormC 3+)
|
||||
|
||||
```asm
|
||||
; In the HUNK structure, PPC code appears as a separate hunk type:
|
||||
; If your disassembler only handles HUNK_CODE ($03E9), PPC sections
|
||||
; will appear as unknown hunk types. StormC PPC sections use custom
|
||||
; hunk types for WarpOS/PowerUP code.
|
||||
```
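A quick way to spot such sections is to walk the hunk list and stop at the first tag your tool does not recognise. This is a minimal sketch that understands only the common hunk types listed in `KNOWN`. The test data uses $4E9 purely as a stand-in for an unrecognised PPC hunk. Extended memory-attribute words in the header size table are not handled.

```python
import struct

HUNK_HEADER = 0x3F3
KNOWN = {
    0x3E9: "HUNK_CODE", 0x3EA: "HUNK_DATA", 0x3EB: "HUNK_BSS",
    0x3EC: "HUNK_RELOC32", 0x3F0: "HUNK_SYMBOL", 0x3F1: "HUNK_DEBUG",
    0x3F2: "HUNK_END",
}


def walk_hunks(buf):
    """Yield (offset, name) for each block after HUNK_HEADER; stop at the first unknown type."""
    def long_at(p):
        return struct.unpack_from(">L", buf, p)[0]

    if long_at(0) != HUNK_HEADER:
        raise ValueError("not an AmigaOS executable (no HUNK_HEADER)")
    pos = 4
    while long_at(pos):                   # resident library names (normally none)
        pos += 4 + 4 * long_at(pos)
    pos += 4                              # terminating 0
    first, last = long_at(pos + 4), long_at(pos + 8)
    pos += 12 + 4 * (last - first + 1)    # table size, first, last, size table
    while pos + 4 <= len(buf):
        tag = long_at(pos) & 0x3FFFFFFF   # top two bits may carry memory flags
        name = KNOWN.get(tag)
        if name is None:
            yield pos, f"UNKNOWN ({tag:#x})"
            return
        yield pos, name
        pos += 4
        if tag in (0x3E9, 0x3EA, 0x3F1):  # size in longs, then payload
            pos += 4 + 4 * (long_at(pos) & 0x3FFFFFFF)
        elif tag == 0x3EB:                # BSS: size only, no payload
            pos += 4
        elif tag == 0x3EC:                # [count, hunk, offsets...] groups, 0-terminated
            while (count := long_at(pos)) != 0:
                pos += 8 + 4 * count
            pos += 4
        elif tag == 0x3F0:                # [len, name, value] entries, 0-terminated
            while (n := long_at(pos)) != 0:
                pos += 8 + 4 * (n & 0x00FFFFFF)
            pos += 4
        # HUNK_END carries no payload
```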
|
||||
|
||||
---
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Software Known to Be StormC-Compiled
|
||||
|
||||
| Application | Version | Notes |
|
||||
|---|---|---|
|
||||
| **AmigaWriter** | StormC 3/4 | Word processor with C++ document model |
|
||||
| **Various MUI applications** | StormC 3+ | MUI class wizard generated C++ classes |
|
||||
| **WarpOS/PowerUP software** | StormC 3+ | Mixed 68k/PPC binaries — check for PPC hunk sections |
|
||||
| **Late-era Amiga games** | StormC 3/4 | C++ game engines with 68k-optimized inner loops |
|
||||
|
||||
---
|
||||
|
||||
## Historical Context
|
||||
|
||||
StormC arrived at a pivotal moment: the Amiga market had shrunk, SAS/C was abandoned after 6.58, and developers wanted a modern IDE. Haage & Partner (known for AmigaOS 3.5/3.9) positioned StormC as the future of native Amiga development. It offered features no other native compiler had: a GUI debugger, C++ with exceptions, PowerPC support, and integrated MUI class generation.
|
||||
|
||||
However, the PowerPC era fragmented quickly (WarpOS vs PowerUP), the Amiga market collapsed, and Haage & Partner ceased operations. StormC 4.0 was the last release. Today, GCC (cross-compilation) and VBCC dominate, but StormC binaries remain in the wild — particularly late-1998 to 2000 era C++ applications.
|
||||
|
||||
---
|
||||
|
||||
## Modern Analogies
|
||||
|
||||
| StormC Concept | Modern Equivalent |
|
||||
|---|---|
|
||||
| Native IDE with built-in compiler | Xcode with Clang, Visual Studio with MSVC |
|
||||
| Proprietary C++ ABI | MSVC's C++ ABI (incompatible with Itanium/GCC ABI) |
|
||||
| Mixed 68k/PPC binaries | Universal Binaries (Intel + ARM) on macOS |
|
||||
| MUI class generation wizard | Qt Creator's class wizard, Visual Studio's MFC wizard |
|
||||
|
||||
---
|
||||
|
||||
## FPGA / Emulation Impact
|
||||
|
||||
- **PowerPC sections**: If the binary contains PPC hunk sections (StormC 3+), a 68k-only FPGA core cannot execute them — a PowerPC emulation layer (like WarpOS emulation in WinUAE) is required.
|
||||
- **C++ exception handling**: StormC's custom exception mechanism keeps a linked list of exception frames on the stack and saves and restores SP through a global (`MOVE.L SP, ___current_exception_frame`). This is ordinary 68000 ISA usage and needs no special core support.
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
**Q: How do I tell StormC from SAS/C if both use LINK A5?**
|
||||
A: Check `HUNK_SYMBOL` — SAS/C uses `_name` with `=APS` stabs; StormC uses `__ct__`/`__dt__` prefixes for C++. Check `HUNK_DEBUG` for project metadata strings (StormC embeds source paths). Check startup code — StormC's `___OpenStormCLibs` vs SAS/C's `_OpenLibraries`.
|
||||
|
||||
**Q: Can I link StormC objects with SAS/C objects?**
|
||||
A: For C-only code, possibly yes if the calling conventions match. For C++ code, absolutely not — the ABIs are incompatible.
|
||||
|
||||
**Q: Does StormC support `__saveds`?**
|
||||
A: Yes — StormC supports SAS/C calling convention keywords for compatibility: `__saveds`, `__stdargs`, `__reg`, `__interrupt`.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [13_toolchain/stormc.md](../../../13_toolchain/stormc.md) — StormC usage and features
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick identification
|
||||
- [cpp_vtables_reversing.md](../cpp_vtables_reversing.md) — C++ vtable layouts (GCC focus — StormC differences noted)
|
||||
- See also: [sasc.md](sasc.md), [gcc.md](gcc.md) — compare with other compilers
|
||||
327
05_reversing/static/compilers/vbcc.md
Normal file
|
|
@ -0,0 +1,327 @@
|
|||
|
||||
|
||||
# VBCC — Reverse Engineering Field Manual
|
||||
|
||||
## Overview
|
||||
|
||||
**VBCC** (Volker Barthelmann's C Compiler) is a portable, retargetable ISO C89 compiler that produces the smallest binaries among Amiga compilers. Its key RE characteristics are: **no frame pointer** (SP-relative access only), **per-function register saves** (only what's actually used), **PC-relative string addressing**, and a distinctive **`__reg()`** calling convention for AmigaOS library calls. VBCC generates clean, tight code that can look deceptively like hand-optimized assembly.
|
||||
|
||||
Key constraints:
|
||||
- **No LINK instruction** — VBCC never uses `LINK A5` or `LINK A6`. Locals are accessed via `$offset(SP)`. Function boundaries are defined by `MOVEM.L ... -(SP)` at entry and `RTS` at exit.
|
||||
- **Minimal register saves** — Unlike SAS/C (9 registers always) or GCC (per-function but often substantial), VBCC saves only the exact registers used. A leaf function with no locals has no prologue at all.
|
||||
- **Tail-call optimization** — VBCC uses `BRA.S` to common epilogue blocks and `BRA` to tail-call other functions more aggressively than any other Amiga compiler.
|
||||
- **`__MERGED` hunks**: with the small data model active, the DATA and BSS sections are merged into a single `__MERGED` hunk so they can be addressed relative to A4.
|
||||
- **Hunk names**: `CODE`, `DATA`, `BSS` (+ optional `__MERGED` for small-data)
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Source (.c)"
|
||||
SRC["C source (C89)"]
|
||||
end
|
||||
subgraph "VBCC Compiler Pipeline"
|
||||
VC["vc (driver)"]
|
||||
VBCCM68K["vbccm68k (code generator)"]
|
||||
VASM["vasm (assembler)"]
|
||||
VLINK["vlink (linker)"]
|
||||
end
|
||||
subgraph "Binary Output"
|
||||
HUNK["Amiga HUNK executable"]
|
||||
CODE["CODE hunk"]
|
||||
DATA["DATA hunk"]
|
||||
MERGED["__MERGED (optional, small-data)"]
|
||||
end
|
||||
|
||||
SRC --> VC
|
||||
VC --> VBCCM68K --> VASM --> VLINK
|
||||
VLINK --> HUNK
|
||||
HUNK --> CODE & DATA
|
||||
HUNK --> MERGED
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Binary Identification — The VBCC Signature
|
||||
|
||||
### Function Prologue — Nothing or Minimal
|
||||
|
||||
```asm
|
||||
; VBCC leaf function (no locals, no calls):
|
||||
_simple_func:
|
||||
; NO prologue at all
|
||||
; ... function body ...
|
||||
RTS
|
||||
|
||||
; VBCC function with locals:
|
||||
_moderate_func:
|
||||
MOVEM.L D2-D3/A2, -(SP) ; saves ONLY the 3 registers used
|
||||
; ... function body ...
|
||||
MOVEM.L (SP)+, D2-D3/A2
|
||||
RTS
|
||||
|
||||
; VBCC large function:
|
||||
_large_func:
|
||||
MOVEM.L D2-D5/A2-A3, -(SP) ; per-function exact save
|
||||
LEA -$80(SP), SP ; allocate stack frame
|
||||
; ... function body ...
|
||||
LEA $80(SP), SP
|
||||
MOVEM.L (SP)+, D2-D5/A2-A3
|
||||
RTS
|
||||
```
|
||||
|
||||
**Key differentiator from GCC**: Both VBCC and GCC use per-function register saves, but VBCC's code is consistently tighter. VBCC uses `BRA.S label` to share common epilogue/cleanup code, where GCC duplicates it. VBCC uses `MOVEQ` and `ADDQ` aggressively for small constants.
|
||||
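Reading the saved-register list straight from the MOVEM mask word makes the "saves only what it uses" signature easy to measure across a whole CODE hunk. Below is a small decoder based on the 68000 MOVEM mask layout. Note that the bit order reverses between the predecrement form (`$48E7`, used at entry) and the postincrement form (`$4CDF`, used at exit).

```python
REGS = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]


def movem_predec_regs(mask):
    """Mask word of MOVEM.L <list>,-(SP) ($48E7): bit 15 = D0 ... bit 0 = A7."""
    return [r for i, r in enumerate(REGS) if mask & (0x8000 >> i)]


def movem_postinc_regs(mask):
    """Mask word of MOVEM.L (SP)+,<list> ($4CDF): bit 0 = D0 ... bit 15 = A7."""
    return [r for i, r in enumerate(REGS) if mask & (1 << i)]
```

For example, `$48E7 $3020` decodes to `D2, D3, A2` (the three-register VBCC save above). The SAS/C-style `D2-D7/A2-A4` save has mask `$3F38` and always names nine registers.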
|
||||
### String Addressing
|
||||
|
||||
Like GCC, VBCC uses PC-relative string addressing:
|
||||
|
||||
```asm
|
||||
LEA .str_hello(PC), A0
|
||||
JSR _Printf
|
||||
|
||||
.str_hello: DC.B "Hello", $0A, 00
|
||||
```
|
||||
|
||||
### The `__reg()` Calling Convention — Unique VBCC Fingerprint
|
||||
|
||||
VBCC's `__reg()` keyword places C variables in named CPU registers without inline assembly:
|
||||
|
||||
```c
|
||||
/* VBCC source: */
|
||||
BPTR __reg("d0") MyOpen(__reg("d1") CONST_STRPTR name,
|
||||
__reg("d2") LONG accessMode);
|
||||
```
|
||||
|
||||
```asm
|
||||
; Generated code for Open("foo", MODE_OLDFILE):
|
||||
MOVEA.L _DOSBase, A6
|
||||
LEA .str_foo(PC), A0
|
||||
MOVE.L A0, D1 ; name → D1
|
||||
MOVE.L #1005, D2 ; MODE_OLDFILE (1005) → D2; too large for MOVEQ (-128..127)
|
||||
JSR -$1E(A6) ; Open() LVO
|
||||
```
|
||||
|
||||
**VBCC can produce this register-to-argument mapping without inline assembly stubs, but the call site alone does not prove VBCC.** SAS/C's `#pragma libcall` and GCC's `<inline/*.h>` macros produce the same `JSR -$XXX(A6)` sequence. The `__reg()` assignments are visible only through the register usage pattern: functions that take args in specific registers (D1, D2, D3, etc.) without stack access.
|
||||
|
||||
---
|
||||
|
||||
## Library Call Patterns
|
||||
|
||||
VBCC library calls are compact and direct:
|
||||
|
||||
```asm
|
||||
; VBCC library call — minimal code:
|
||||
MOVEA.L (_DOSBase).L, A6 ; load library base (absolute with relocation)
|
||||
MOVE.L fh(SP), D1 ; arg from stack
|
||||
MOVE.L buf(SP), D2
|
||||
MOVE.L len(SP), D3
|
||||
JSR -$2A(A6) ; Read()
|
||||
|
||||
; Return value check:
|
||||
TST.L D0
|
||||
BMI.S .error
|
||||
```
|
||||
|
||||
VBCC differs from SAS/C here. SAS/C loads arguments through A5-relative offsets (`$08(A5)`), while VBCC uses SP-relative offsets. Because SP moves whenever the function pushes arguments or resizes its frame, each SP-relative offset is valid only for the SP value at that exact instruction.
|
||||
|
||||
### `#pragma amicall` — VBCC Library Call Pragmas
|
||||
|
||||
```c
|
||||
#pragma amicall(DOSBase, 0x1E, Open(d1, d2))
|
||||
// VBCC pragma format is simpler than SAS/C:
|
||||
// - Library base name (identifier, not a string)
|
||||
// - LVO in hex
|
||||
// - Function name with argument register list
|
||||
```
|
||||
|
||||
In the binary, these pragmas produce the same `JSR -$XXX(A6)` patterns as any other compiler — the pragma just controls argument register assignment.
|
||||
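Because every compiler ends up with the same `JSR -$XXX(A6)` call sites, an LVO scan is compiler-neutral and makes a good first pass. In this minimal sketch, `JSR d16(A6)` is matched as the opcode word `$4EAE` followed by a signed 16-bit displacement. The LVO table holds only a handful of exec/dos entries. Which library A6 points to must be supplied by you, for example from the preceding `MOVEA.L _DOSBase, A6`. Word-aligned scanning can also hit `$4EAE` inside operands, so confirm each hit in a disassembler.

```python
import struct

# A few well-known LVOs (negative byte offsets from the library base).
LVOS = {
    "exec.library": {-198: "AllocMem", -210: "FreeMem",
                     -414: "CloseLibrary", -552: "OpenLibrary"},
    "dos.library": {-30: "Open", -36: "Close", -42: "Read", -48: "Write"},
}
JSR_D16_A6 = 0x4EAE  # JSR d16(A6)


def find_library_calls(code, library):
    """Return (offset, displacement, name) for each JSR d16(A6) in big-endian code."""
    table = LVOS[library]
    calls = []
    for off in range(0, len(code) - 3, 2):
        op, disp = struct.unpack_from(">Hh", code, off)
        if op == JSR_D16_A6:
            calls.append((off, disp, table.get(disp, f"LVO {disp}")))
    return calls
```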
|
||||
---
|
||||
|
||||
## Optimization Patterns
|
||||
|
||||
VBCC prioritizes **code density** over raw speed. Its signatures:
|
||||
|
||||
| Pattern | VBCC Style | SAS/C Equivalent |
|
||||
|---|---|---|
|
||||
| **Shared epilogue** | `BRA.S .epilogue` from multiple exit points | Duplicated epilogue at each return |
|
||||
| **Tail calls** | `BRA _other_func` (discard own frame first) | `JSR _other_func` / `RTS` |
|
||||
| **Small constant loading** | `MOVEQ #N, Dn` whenever possible | `MOVE.L #N, Dn` for some small values |
|
||||
| **Stack frame** | `LEA -$N(SP), SP` (any frame size; no frame pointer) | `LINK A5, #-N` |
|
||||
| **Loop termination** | `DBRA Dn, loop` (when counter fits in 16 bits) | `SUBQ.L #1, Dn` / `BNE loop` |
|
||||
|
||||
### Cross-Module Optimization
|
||||
|
||||
VBCC supports cross-module optimization: when several source files are compiled together in one compiler run, vbcc can inline and reorder functions across file boundaries. In the binary, this means function layout may NOT match source file order, and small static functions may be inlined into their callers.
|
||||
|
||||
---
|
||||
|
||||
## Same C Function — VBCC Output
|
||||
|
||||
```asm
|
||||
; CountWords() — VBCC, -O -speed:
|
||||
; C prototype: ULONG CountWords(CONST_STRPTR str)
|
||||
|
||||
_CountWords:
|
||||
MOVEM.L D2-D3, -(SP) ; only D2, D3 needed
|
||||
|
||||
MOVEQ #0, D2 ; D2 = count
|
||||
MOVEQ #0, D3 ; D3 = in_word
|
||||
|
||||
MOVEA.L $0C(SP), A0 ; A0 = str (arg at SP + 12)
|
||||
|
||||
BRA.S .loop_test
|
||||
|
||||
.loop_body:
|
||||
CMPI.B #' ', (A0) ; *str == ' '?
|
||||
BEQ.S .not_word
|
||||
CMPI.B #'\t', (A0)
|
||||
BEQ.S .not_word
|
||||
CMPI.B #'\n', (A0)
|
||||
BEQ.S .not_word
|
||||
|
||||
TST.B D3
|
||||
BNE.S .next_char
|
||||
|
||||
ADDQ.L #1, D2 ; count++
|
||||
MOVEQ #1, D3 ; in_word = TRUE
|
||||
BRA.S .next_char
|
||||
|
||||
.not_word:
|
||||
MOVEQ #0, D3 ; in_word = FALSE
|
||||
|
||||
.next_char:
|
||||
ADDQ.L #1, A0 ; str++
|
||||
|
||||
.loop_test:
|
||||
TST.B (A0)
|
||||
BNE.S .loop_body
|
||||
|
||||
MOVE.L D2, D0 ; return count
|
||||
MOVEM.L (SP)+, D2-D3
|
||||
RTS
|
||||
```
|
||||
|
||||
**VBCC-specific observations**:
|
||||
1. **`MOVEM.L D2-D3, -(SP)`** — only 2 registers saved. Minimal.
|
||||
2. **`BRA.S .loop_test`**: jumps straight to the loop condition, which sits at the bottom of the loop (loop rotation), so the test runs once per iteration.
|
||||
3. **`BRA.S .next_char`** — shared increment code reached from two paths.
|
||||
4. **Identical to GCC** in this function because the function is simple enough that optimization differences don't show. For more complex functions (with multiple return paths, struct access, switch statements), VBCC's shared-epilogue and tail-call patterns emerge.
|
||||
|
||||
```
|
||||
Cross-Compiler Comparison (CountWords, approximate bytes of code):
SAS/C -O2:   ~64 bytes (LINK/UNLK A5 + MOVEQ/CMP.B pairs + epilogue overhead)
GCC -O2:     ~58 bytes (no LINK, minimal save, CMPI.B; same code as VBCC here)
VBCC -speed: ~58 bytes (no LINK, minimal save, CMPI.B; the listing above)
DICE C:      similar to VBCC
|
||||
```
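Byte counts like these are easy to check by summing the encoded size of each instruction in a listing. This sketch uses a hand-written size table for the 68000 forms in the VBCC listing above: the opcode word plus any extension words. Applied to that listing, it gives 58 bytes.

```python
# Encoded sizes in bytes on the 68000 for the forms used in the listing.
SIZE = {
    "MOVEM.L list,-(SP)": 4, "MOVEM.L (SP)+,list": 4,  # opcode + register mask
    "MOVEQ": 2, "ADDQ": 2, "TST.B": 2, "MOVE.L Dn,Dn": 2, "RTS": 2,
    "Bcc.S": 2,                  # BRA.S/BEQ.S/BNE.S: 8-bit displacement in opcode
    "MOVEA.L d16(SP),An": 4,     # opcode + 16-bit displacement
    "CMPI.B #imm,(An)": 4,       # opcode + immediate word
}

VBCC_COUNTWORDS = [
    "MOVEM.L list,-(SP)", "MOVEQ", "MOVEQ", "MOVEA.L d16(SP),An", "Bcc.S",
    "CMPI.B #imm,(An)", "Bcc.S", "CMPI.B #imm,(An)", "Bcc.S",
    "CMPI.B #imm,(An)", "Bcc.S",
    "TST.B", "Bcc.S", "ADDQ", "MOVEQ", "Bcc.S",  # count++ path
    "MOVEQ",                                     # .not_word
    "ADDQ",                                      # .next_char
    "TST.B", "Bcc.S",                            # .loop_test
    "MOVE.L Dn,Dn", "MOVEM.L (SP)+,list", "RTS",
]


def code_size(listing):
    """Total encoded size of a listing given as instruction-form names."""
    return sum(SIZE[form] for form in listing)
```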
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### "The Missing Frame Trap" — Assuming LINK for Function Boundaries
|
||||
|
||||
```asm
|
||||
; VBCC function boundaries are RTS-delimited, not LINK-delimited.
|
||||
; If your IDA script searches for LINK to find functions, you'll miss ALL VBCC functions.
|
||||
|
||||
; VBCC function entry could be any of:
|
||||
; 1. MOVEM.L ..., -(SP) (most common)
|
||||
; 2. LEA -$XX(SP), SP (large frame)
|
||||
; 3. First instruction after previous RTS (leaf functions)
|
||||
; 4. TST.L D0 / BEQ ... (function that doesn't save any regs)
|
||||
```
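A simple entry-point heuristic for RTS-delimited code treats offset 0 and the word after every `RTS` ($4E75) as candidate starts. It then tags each candidate by its first instruction. This is a minimal sketch. Constants or data embedded in the CODE hunk can contain $4E75 and produce false starts, so treat the output as candidates to confirm in a disassembler.

```python
import struct

RTS = 0x4E75
ENTRY_KINDS = {
    0x48E7: "movem",      # MOVEM.L <list>,-(SP)
    0x4FEF: "lea-frame",  # LEA d16(SP),SP
    0x4E55: "link-a5",    # LINK A5,#d16 (SAS/C, StormC), kept for contrast
}


def function_starts(code, base=0):
    """Return (address, kind) for each candidate function start."""
    words = [w for (w,) in struct.iter_unpack(">H", code[: len(code) & ~1])]
    if not words:
        return []
    starts = [0] + [2 * (i + 1) for i, w in enumerate(words)
                    if w == RTS and i + 1 < len(words)]
    return [(base + off, ENTRY_KINDS.get(words[off // 2], "leaf/other"))
            for off in starts]
```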
|
||||
|
||||
### "The Register Ghost" — `__reg()` Without Symbols
|
||||
|
||||
Without source-level `__reg()` declarations, VBCC function arguments appear to use arbitrary register assignments. This can look like a custom ABI. The pattern is actually the VBCC `__reg()` convention encoded via `<proto/*.h>` headers during compilation.
|
||||
|
||||
---
|
||||
|
||||
## Pitfalls & Common Mistakes
|
||||
|
||||
### 1. Confusing VBCC and GCC Output
|
||||
|
||||
Both omit frame pointers and use per-function saves. Disambiguate by:
|
||||
- **Hunk names**: VBCC uses `CODE`/`DATA`; GCC uses `.text`/`.data` (usually)
|
||||
- **`__MERGED` hunk**: small-data output of the VBCC toolchain; a strong VBCC indicator, though other code linked with vlink can carry it too
|
||||
- **Function naming**: a weak signal; both compilers export C functions as `_funcname`, and GCC's `.Lxxx` labels are assembler-local, so they rarely survive into the binary
|
||||
- **BRA density**: VBCC has more `BRA.S` instructions (shared epilogues); GCC tends to duplicate code
|
||||
|
||||
### 2. Misreading SP-Relative Offsets
|
||||
|
||||
```asm
|
||||
; At function entry (after MOVEM.L D2-D3, -(SP)):
|
||||
; SP points 8 bytes below entry SP (D2 and D3 pushed)
|
||||
; Arg1 is at $0C(SP) (8 bytes regs + 4 bytes return addr)
|
||||
; But after LEA -$10(SP), SP:
|
||||
; Arg1 is now at $1C(SP) (8 regs + 4 ret + 16 locals)
|
||||
; The offset CHANGES when SP is modified — unlike A5-relative offsets
|
||||
```
|
||||
|
||||
Track every `LEA +/-$N(SP), SP` instruction — each one shifts ALL subsequent SP-relative offsets.
|
||||
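Tracking these offsets by hand is error-prone. A tiny model that applies each SP adjustment in program order reproduces the numbers above. This sketch assumes 4-byte stack arguments, and you supply each instruction's SP delta yourself: negative for pushes and allocations, positive for pops and releases.

```python
def offsets_through(events, arg_index=0):
    """Track the SP-relative offset of a 4-byte stack argument.
    events: (instruction, sp_delta) pairs in program order."""
    depth = 4 + 4 * arg_index  # at entry: return address at 0(SP), args above it
    out = []
    for insn, delta in events:
        depth -= delta         # SP moving down pushes the argument further away
        out.append((insn, depth))
    return out
```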
|
||||
---
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Software Known to Be VBCC-Compiled
|
||||
|
||||
| Application | Notes |
|
||||
|---|---|
|
||||
|
||||
| **Modern Amiga utilities** | Many 2000s+ CLI tools use VBCC for small binary size |
|
||||
| **AROS system components** | VBCC is a supported AROS build compiler |
|
||||
| **MUI 5 custom classes** | Tight BOOPSI dispatch benefits from VBCC's register allocation |
|
||||
| **AmigaOS 4 system libraries** | Hyperion's SDK supports VBCC for OS4 development |
|
||||
|
||||
---
|
||||
|
||||
## Historical Context
|
||||
|
||||
VBCC was created by Volker Barthelmann in the mid-1990s as a lightweight alternative to GCC's growing complexity. While GCC was the "heavy" compiler with C++ support, VBCC targeted developers who wanted a fast, standards-compliant C89 compiler that produced small binaries.
|
||||
|
||||
Unlike SAS/C (commercial, dead since 1996) and GCC (open source but complex), VBCC occupies a unique niche: actively maintained, free for personal use, with a clean codebase. Its `vlink` linker and `vasm` assembler companion tools form a complete toolchain that has become the de facto standard for modern Amiga development alongside GCC bebbo.
|
||||
|
||||
---
|
||||
|
||||
## Modern Analogies
|
||||
|
||||
| VBCC Concept | Modern Equivalent |
|
||||
|---|---|
|
||||
| `__reg()` | `register ... asm("d0")` in GCC/Clang (GNU C extension) |
|
||||
| Per-function register save | Clang's `-O2` with aggressive register allocation |
|
||||
| Cross-module optimization | LTO (Link-Time Optimization) in modern compilers |
|
||||
| `vlink` with `vasm` | LLVM's integrated `lld` linker with `clang` |
|
||||
| Config-driven target system | LLVM's `TargetRegistry` and target description files |
|
||||
|
||||
---
|
||||
|
||||
## FPGA / Emulation Impact
|
||||
|
||||
- **No `LINK`/`UNLK`**: VBCC binaries don't use these instructions, reducing test coverage needs for frame pointer ops on FPGA cores.
|
||||
- **Aggressive `LEA` for stack frames**: `LEA -$N(SP), SP` must correctly update SP in a single instruction — verify your FPGA core handles LEA with SP destination correctly.
|
||||
- **Cross-module optimization**: No runtime impact; all inlining and reordering happens at build time.
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
**Q: How do I distinguish VBCC from GCC output?**
|
||||
A: Check hunk names: VBCC uses `CODE`/`DATA`, GCC typically uses `.text`/`.data`. Check for a `__MERGED` hunk (VBCC small-data output). Check BRA density, since VBCC shares epilogues more aggressively. Symbol names help little, because both compilers export C functions as `_name`.
|
||||
|
||||
**Q: Does VBCC support C++?**
|
||||
A: No. If you find C++ constructs (vtables, `new`/`delete`, name mangling), it's NOT VBCC.
|
||||
|
||||
**Q: Can VBCC and GCC object files be mixed?**
|
||||
A: No. They use different calling conventions for internal runtime functions. Link the entire project with one compiler. Assembly (`vasm`) can be mixed with VBCC C code using `vlink`.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- [13_toolchain/vbcc.md](../../../13_toolchain/vbcc.md) — VBCC usage and `__reg()` details
|
||||
- [compiler_fingerprints.md](../../compiler_fingerprints.md) — Quick identification
|
||||
- [13_toolchain/vasm_vlink.md](../../../13_toolchain/vasm_vlink.md) — vasm/vlink toolchain
|
||||
- VBCC homepage: http://sun.hasenbraten.de/vbcc/
|
||||
- See also: [sasc.md](sasc.md), [gcc.md](gcc.md) — compare with other compilers
|
||||
745
05_reversing/static/cpp_vtables_reversing.md
Normal file
|
|
@ -0,0 +1,745 @@
|
|||
|
||||
|
||||
# C++ Reverse Engineering — Vtables, Inheritance, and OOP Reconstruction
|
||||
|
||||
## Overview
|
||||
|
||||
C++ on the Amiga — primarily via StormC, GCC 2.95.x, and SAS/C with limited C++ support — compiles object-oriented constructs into predictable patterns in the binary. Virtual method dispatch goes through **vtables** (arrays of function pointers at fixed offsets from the object pointer), constructors chain through inheritance hierarchies, and name mangling encodes the full class-qualified signature into linker symbols. Reversing C++ binaries means reconstructing the **class hierarchy** from these artifacts — recovering which methods are virtual, how many levels of inheritance exist, and where each class appears in the vtable dispatch graph.
|
||||
|
||||
Unlike modern platforms with rich RTTI and exception metadata, Amiga C++ binaries are typically **stripped and lean**. RTTI is often disabled (`-fno-rtti`), and exception support is minimal. The vtable is your primary reconstruction tool — it encodes the entire polymorphic structure of the program.
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "C++ Source"
|
||||
CLASS["class Window : public Gadget {<br/> virtual void Draw();<br/> virtual void HandleEvent();<br/>};"]
|
||||
end
|
||||
subgraph "Binary Layout"
|
||||
OBJ["Window object<br/>───────────<br/>+00: _vptr → vtable"]
|
||||
VTABLE["Window vtable (Itanium layout, offsets from vptr)<br/>───────────<br/>-08: offset_to_top<br/>-04: RTTI ptr → typeinfo<br/>+00: ~Window() D1<br/>+04: ~Window() D0<br/>+08: Draw()<br/>+0C: HandleEvent()"]
RTTI["RTTI<br/>───────────<br/>type_info struct<br/>mangled name<br/>base class list"]
end
OBJ -->|"+$00"| VTABLE
VTABLE -->|"-$04 from vptr"| RTTI
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Architecture: C++ to m68k Mapping
|
||||
|
||||
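### Vtable Layout: Complete (Itanium C++ ABI, GCC 3+ on m68k)

The layout below is the Itanium C++ ABI, adapted for 32-bit m68k. GCC adopted it in 3.0, and modern m68k-amigaos cross-compilers use it. GCC 2.95.x predates that ABI: it uses older symbol names (for example `__6Window` constructors) and has no D0/D1/D2 destructor split, so confirm the compiler version before applying this layout. The vtable pointer (`_vptr`) resides at object offset `+$00` and points to the **first virtual function** entry, not the start of the vtable itself.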
|
||||
|
||||
```
|
||||
Full GCC Vtable Layout (m68k, 32-bit, all entries 4 bytes):
|
||||
┌─────────────────────┐ ← vtable_start (symbol address, e.g. _ZTV6Window)
|
||||
│ offset_to_top = 0 │ vtable[-2] — always 0 for most-derived class
|
||||
├─────────────────────┤
|
||||
│ RTTI pointer │ vtable[-1] — _ZTI6Window (type_info for Window)
|
||||
├─────────────────────┤ ← _vptr points here (vtable_start + 8)
|
||||
│ Window::~Window() │ vtable[0] — virtual destructor (D1 complete)
|
||||
├─────────────────────┤
|
||||
│ Window::~Window() │ vtable[1] — virtual destructor (D0 deleting)
|
||||
├─────────────────────┤
|
||||
│ Window::Draw() │ vtable[2] — first user virtual method
|
||||
├─────────────────────┤
|
||||
│ Window::HandleEvt() │ vtable[3] — second user virtual method
|
||||
├─────────────────────┤
|
||||
│ ... │ vtable[n] — more virtual methods
|
||||
└─────────────────────┘
|
||||
```
|
||||
|
||||
| Vtable Offset (from vptr) | Vtable Offset (from start) | Contents | Notes |
|
||||
|---|---|---|---|
|
||||
| `-8` (vptr − 2) | `+$00` | `offset_to_top` | Always 0 for most-derived; non-zero in multiple inheritance for non-primary bases |
|
||||
| `-4` (vptr − 1) | `+$04` | RTTI pointer (`type_info*`) | Points to `type_info` struct with mangled class name; NULL if `-fno-rtti` |
|
||||
| `+0` (vptr + 0) | `+$08` | Destructor variant 1 | In-charge non-deleting destructor (D1); cleans up object, does NOT call FreeMem |
|
||||
| `+4` (vptr + 1) | `+$0C` | Destructor variant 2 | In-charge deleting destructor (D0); cleans up AND calls operator delete |
|
||||
| `+8` (vptr + 2) | `+$10` | First user virtual method | Declaration order in the class body |
|
||||
| `+12` (vptr + 3) | `+$14` | Second user virtual method | ...continues for all declared virtuals |
|
||||
|
||||
### Virtual Method Dispatch
|
||||
|
||||
```asm
|
||||
; In C++: obj->Draw()
|
||||
; Becomes:
|
||||
MOVEA.L obj_ptr(FP), A0 ; load object pointer ('this')
MOVE.L A0, -(SP) ; pass 'this' as the first (stack) argument
MOVEA.L (A0), A1 ; A1 = _vptr (object offset +$00)
MOVEA.L $08(A1), A1 ; Draw() = vtable[2], after the D1/D0 destructor slots
JSR (A1) ; call via function pointer
ADDQ.L #4, SP ; pop 'this'
|
||||
```
|
||||
|
||||
The signature pattern: **`MOVE.L (A0), An` followed by `MOVE.L $offset(An), target` then `JSR (target)`** — this is the C++ vtable dispatch fingerprint.
|
||||
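The fingerprint can be matched directly on opcode words. This minimal sketch finds the three-instruction sequence `MOVEA.L (Ax),Ay` / `MOVEA.L d16(Ay),Az` / `JSR (Az)`. It reports the slot as `d16 / 4` relative to the `_vptr`. Compilers that schedule other instructions in between, or load the vptr into a data register first, will not be matched.

```python
import struct


def find_vcalls(code):
    """Return (offset, slot) for each vtable dispatch triple in big-endian code."""
    words = [w for (w,) in struct.iter_unpack(">H", code[: len(code) & ~1])]
    hits = []
    for i in range(len(words) - 3):
        w1, w2, disp, w3 = words[i:i + 4]
        if (w1 & 0xF1F8) != 0x2050:              # MOVEA.L (Ax),Ay
            continue
        vptr_reg = (w1 >> 9) & 7                 # Ay now holds the _vptr
        if (w2 & 0xF1F8) != 0x2068 or (w2 & 7) != vptr_reg:
            continue                             # MOVEA.L d16(Ay),Az
        target = (w2 >> 9) & 7
        if w3 != (0x4E90 | target):              # JSR (Az)
            continue
        if disp & 0x8000:                        # negative: header, not a method
            continue
        hits.append((2 * i, disp // 4))
    return hits
```

For example, `$2250 $2269 $0008 $4E91` (`MOVEA.L (A0),A1` / `MOVEA.L $08(A1),A1` / `JSR (A1)`) is reported as a call through slot 2.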
|
||||
### Constructor Pattern — Full Lifecycle
|
||||
|
||||
```asm
|
||||
; C++: new Window()
|
||||
; Generates:
|
||||
; 1. Allocate memory (operator new → AllocMem)
|
||||
; 2. Call base class constructor (Gadget::Gadget)
|
||||
; 3. Store vtable pointer at object+$00
|
||||
; 4. Initialize Window-specific members
|
||||
; 5. Return object pointer in D0
|
||||
|
||||
MOVEA.L 4.W, A6 ; A6 = SysBase (AllocMem is in exec.library)
MOVE.L #sizeof_Window, D0
|
||||
MOVE.L #MEMF_CLEAR, D1
|
||||
JSR -$C6(A6) ; AllocMem
|
||||
MOVE.L D0, A2 ; save object ptr
|
||||
; Call base constructor (Gadget::Gadget):
|
||||
MOVE.L A2, -(SP)
|
||||
JSR _Gadget_ctor ; calls SUPER::ctor
|
||||
ADDQ.L #4, SP
|
||||
; Install vtable:
|
||||
LEA _Window_vtable+8, A0 ; skip the offset_to_top and RTTI words
MOVE.L A0, (A2) ; _vptr = &vtable[0]
|
||||
; Initialize Window members:
|
||||
MOVE.W #$00FF, $14(A2) ; this->width = 255
|
||||
; Return this:
|
||||
MOVE.L A2, D0
|
||||
RTS
|
||||
```
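Every constructor stores a vtable address, so the vtables themselves usually sit in a DATA hunk as runs of longwords that `HUNK_RELOC32` fixes up to point into the CODE hunk. This minimal sketch assumes you have already parsed the DATA hunk's relocations into a dict mapping offset to target hunk, and it reports runs of consecutive code pointers. Under the Itanium layout such a run starts where `_vptr` points. The `offset_to_top` word (no relocation) and the RTTI pointer (relocated into data, not code) sit just before it.

```python
def vtable_candidates(relocs, code_hunk, min_len=2):
    """relocs: {offset_in_data_hunk: target_hunk_index} from HUNK_RELOC32.
    Returns (start_offset, entry_count) for each run of >= min_len
    consecutive longwords that are all relocated into `code_hunk`."""
    code_ptrs = sorted(off for off, hunk in relocs.items() if hunk == code_hunk)
    runs = []
    start = prev = None
    for off in code_ptrs + [None]:  # trailing None flushes the final run
        if off is not None and prev is not None and off == prev + 4:
            prev = off
            continue
        if start is not None:
            count = (prev - start) // 4 + 1
            if count >= min_len:
                runs.append((start, count))
        start = prev = off
    return runs
```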
|
||||
|
||||
### Destructor Pattern — Multiple Variants
|
||||
|
||||
GCC generates up to **three distinct destructor functions** per class. Understanding which is which is critical for vtable reconstruction:
|
||||
|
||||
| Variant | Itanium mangled name (GCC 3+) | Purpose | Vtable slot | Calls operator delete? |
|---|---|---|---|---|
| **D2** (base-object, not-in-charge) | `_ZN6WindowD2Ev` | Destroys this subobject only; called by derived class destructors | None; called directly by derived dtors | No |
| **D1** (complete-object, in-charge non-deleting) | `_ZN6WindowD1Ev` | Destroys the full object; does NOT free memory | **vtable[0]** | No |
| **D0** (deleting) | `_ZN6WindowD0Ev` | Destroys the full object AND calls operator delete | **vtable[1]** | Yes (`JSR _ZdlPv`, operator delete) |
|
||||
|
||||
```asm
|
||||
; D2: Not-in-charge (base-object) destructor:
_ZN6WindowD2Ev: ; no vtable entry points here directly
LINK A6, #0
; Destroy Window-specific members
MOVE.L 8(A6), -(SP) ; pass 'this' (first stack argument)
JSR _ZN6GadgetD2Ev ; call base class D2 destructor
ADDQ.L #4, SP
UNLK A6
RTS

; D1: In-charge non-deleting (cleans up, no operator delete):
_ZN6WindowD1Ev: ; vtable[0] = this function
LINK A6, #0
MOVEA.L 8(A6), A1 ; A1 = this
LEA _ZTV6Window+8, A0 ; &vtable[0], past offset_to_top and RTTI
MOVE.L A0, (A1) ; _vptr = &Window vtable (restore most-derived)
; Destroy Window-specific members
MOVE.L A1, -(SP)
JSR _ZN6GadgetD2Ev ; call base D2
ADDQ.L #4, SP
UNLK A6
RTS ; NO operator delete call!

; D0: In-charge deleting (cleans up AND frees memory):
_ZN6WindowD0Ev: ; vtable[1] = this function
LINK A6, #0
MOVE.L 8(A6), -(SP) ; push this
BSR _ZN6WindowD1Ev ; call D1 to do the cleanup
JSR _ZdlPv ; operator delete(this), which calls FreeVec/FreeMem
ADDQ.L #4, SP
UNLK A6
RTS
|
||||
```
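
When the binary still carries Itanium-mangled symbols (GCC 3.x+), the variant is spelled out in the name as `C1`/`C2`/`D0`/`D1`/`D2`. The sketch below labels them; it assumes simple, non-nested, non-template class names, and `classify_cdtor` is a hypothetical helper rather than part of any tool:

```python
import re

# Itanium-ABI (GCC 3.x+) constructor/destructor variant codes.
VARIANTS = {
    'C1': 'complete-object constructor',
    'C2': 'base-object constructor',
    'D0': 'deleting destructor (calls operator delete)',
    'D1': 'complete-object destructor',
    'D2': 'base-object destructor',
}

# Optional leading underscores (Amiga adds one), then _ZN<len>
_PREFIX = re.compile(r'_*_ZN(\d+)')


def classify_cdtor(symbol):
    """Return (class_name, variant) for _ZN<len><Class><C|D><n>E..., else None."""
    m = _PREFIX.match(symbol)
    if not m:
        return None
    n = int(m.group(1))
    start = m.end()
    cls = symbol[start:start + n]
    kind = symbol[start + n:start + n + 2]
    # Must be followed by the 'E' that closes the nested name
    if len(cls) != n or kind not in VARIANTS or symbol[start + n + 2:start + n + 3] != 'E':
        return None
    return cls, VARIANTS[kind]
```

Nested or templated names fall through to `None`, which is the safe outcome when triaging a symbol table.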

### Inheritance Hierarchy in the Binary

#### Single Inheritance

```
Gadget object:
  +00: _vptr → Gadget_vtable
  +04: gadget_member_1
  +08: gadget_member_2

Window object (extends Gadget):
  +00: _vptr → Window_vtable   ← overwrites Gadget's vptr
  +04: gadget_member_1         ← inherited
  +08: gadget_member_2         ← inherited
  +0C: window_member_1         ← new in Window
  +10: window_member_2         ← new in Window
```

#### Multiple Inheritance

<!-- TODO: Expand — full diagram of multiple base classes, multiple _vptr fields, this-adjustment thunks for each base -->

```
Window object (extends Gadget AND Drawable):
  +00: _vptr → Window_vtable            (primary: Gadget subobject)
  +04: gadget_member_1
  +08: gadget_member_2
  +0C: _vptr → Window_Drawable_vtable   (secondary: Drawable subobject)
  +10: drawable_member_1
  +14: window_member_1

this-adjustment thunk for a Drawable virtual that Window overrides,
reached through a Drawable* (which points at object+0C):
  ADDI.L  #-$0C, 4(SP)             ; this -= $0C: Drawable* → Window*
                                   ; (register-based: LEA -$0C(A0), A0)
  JMP     _Window_Drawable_method  ; tail-call the real implementation
```

#### Virtual Inheritance (Diamond Problem)

<!-- TODO: Expand — VBASE pointer, virtual base offset table, shared subobject layout -->

### Name Mangling — GCC 2.95.x Reference

The GCC 2.95.x mangling scheme (the pre-3.0 GNU scheme derived from the Cfront/ARM convention; the `_Z`-prefixed Itanium C++ ABI scheme replaced it in GCC 3.0) encodes the qualified name and parameter types into linker symbols. This is your primary source for recovering class names and method signatures:

| Source Declaration | GCC 2.95.x Mangled Symbol | Decode |
|---|---|---|
| `Window::Draw(void)` | `Draw__6Window` | `Draw` method of class `Window` (`6` = length of the class name); an empty parameter list adds nothing |
| `Window::SetPos(int, int)` | `SetPos__6Windowii` | `SetPos` method, takes two `int` parameters |
| `Window::SetPos(long, long)` | `SetPos__6Windowll` | Same method name, different mangling for `long` |
| `operator new(unsigned long)` | `__nw__FUl` | `new` operator, takes `unsigned long` (size); `F` marks a non-member function |
| `operator delete(void *)` | `__dl__FPv` | `delete` operator, takes `void*` |
| `Window::~Window(void)` | `_$_6Window` (or `_._6Window`) | Destructor; the `$` form is used where the assembler accepts `$` in labels, which is common on Amiga GCC builds |
| Static class member function | `GetCount__6Window` or `GetCount__S6Window` | An `S` after the `__` marks a static member; otherwise a body that never touches `this` is the clue |
| `Window::operator+(Window const &)` | `__pl__6WindowRC6Window` | `__pl` = operator+, `RC6Window` = `Window const &` parameter |
| `Window::Window(int, int)` | `__6Windowii` | Constructor: nothing before the `__`, then the class name and parameters |

**Demangling helper** (Python):

```python
# Quick-and-dirty GCC 2.95.x (GNU v2) demangler for simple Amiga member functions
import re

TYPE_MAP = {'i': 'int', 'l': 'long', 'v': 'void', 'c': 'char', 's': 'short',
            'f': 'float', 'd': 'double', 'Pv': 'void *', 'Ul': 'unsigned long'}


def demangle_gcc295(sym):
    # Example: SetPos__6Windowii → Window::SetPos(int, int)
    m = re.match(r'([A-Za-z_]\w*?)__(\d+)(\w*)$', sym)
    if not m:
        return sym                          # not a simple member-function symbol
    method = m.group(1)
    class_len = int(m.group(2))
    class_name = m.group(3)[:class_len]
    codes = m.group(3)[class_len:]          # parameter type codes after the class
    params, i = [], 0
    while i < len(codes):
        for width in (2, 1):                # try two-letter codes (Pv, Ul) first
            if codes[i:i + width] in TYPE_MAP:
                params.append(TYPE_MAP[codes[i:i + width]])
                i += width
                break
        else:
            return sym                      # unsupported type code: leave it mangled
    return f"{class_name}::{method}({', '.join(params) or 'void'})"
```

### Name Mangling — StormC Differences

StormC uses a different mangling scheme from GCC:

| C++ Construct | GCC 2.95.x | StormC |
|---|---|---|
| Method `Draw()` on class `Window` | `Draw__6Window` | `Draw_Window` or `Window_Draw` |
| Operator `new` | `__nw__FUl` | `__nw_Ul` or inline to `AllocMem` |
| Destructor | `_$_6Window` | `_dtor_Window` or `~Window` |
| RTTI | `__ti6Window` (GCC 3.x+: `_ZTI6Window`) | `_Type_Window` or absent |
| Vtable | `_vt$6Window` (GCC 3.x+: `_ZTV6Window`) | `_VTable_Window` or `Window_VTable` |

StormC binaries typically use **fewer mangled symbols** because StormC often inlines trivial methods and may not emit RTTI or vtable symbols with predictable names. Look for the constructor pattern (vtable store at offset +00) as an alternative anchor.

### RTTI Structure Format (GCC)

The `type_info` struct provides class identity and, for derived classes, the inheritance chain. The layout below uses the Itanium C++ ABI class names emitted by GCC 3.0 and later. GCC 2.95.x records the same kind of information in an older set of classes (such as `__si_type_info` and `__class_type_info`), so match on the shape of the data rather than on these names:

```cpp
/* Itanium C++ ABI type_info layout on m68k (GCC 3.x+): */

struct type_info {
    /* +00: vtable pointer for type_info itself (the matching __*_type_info class) */
    void * _vptr_type_info;
    /* +04: mangled class name (null-terminated string) */
    const char * _name;            // e.g. "6Window" (name-length prefix + "Window")
};

/* Single inheritance class type info: */
struct __si_class_type_info : public type_info {
    /* +08: pointer to base class type_info */
    const type_info * _base_type;
};

/* Multiple inheritance class type info: */
struct __vmi_class_type_info : public type_info {
    /* +08: flags (0x1 = non-diamond repeated base, 0x2 = diamond-shaped) */
    unsigned int _flags;
    /* +0C: number of direct base classes */
    unsigned int _base_count;
    /* +10: array of __base_class_type_info entries */
    struct __base_class_type_info {
        const type_info * _base_type;  // pointer to base class type_info
        long _offset_flags;            // bits 0-7: flags (0x1 = virtual, 0x2 = public)
                                       // bits 8-31: signed offset of the base subobject
                                       // (virtual base: offset of its vbase-offset
                                       //  entry in the vtable)
    } _base_info[];
};
```
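
Reading `_offset_flags` by hand is error-prone because the offset shares the word with the flag bits. A minimal Python sketch, assuming you have already read the 32-bit big-endian value out of the binary (`decode_offset_flags` is a hypothetical helper name):

```python
def decode_offset_flags(value):
    """Split an Itanium __base_class_type_info::__offset_flags word.

    The low 8 bits are flags; the upper 24 bits are a signed offset,
    recovered with an arithmetic shift right by 8.
    """
    value &= 0xFFFFFFFF
    if value & 0x80000000:          # reinterpret as a signed 32-bit integer
        value -= 1 << 32
    return {
        'offset': value >> 8,       # Python's >> on negative ints is arithmetic
        'virtual': bool(value & 0x1),
        'public': bool(value & 0x2),
    }
```

A word read with `struct.unpack_from('>I', blob, off)[0]` can be passed straight in.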

> [!WARNING]
> Many Amiga C++ binaries are built with `-fno-rtti` to save space, so RTTI is often absent. When present, it's a goldmine. When absent, rely on vtable structure and mangled names exclusively.

---

## Decision Guide

```mermaid
graph TD
    BIN["Binary loaded"]
    VPTR{"_vptr store pattern<br/>MOVE.L vtable, (An)"}
    VTABLE_DISPATCH{"Vtable dispatch<br/>MOVE.L (A0),An / JSR (An)"}
    MANGLED{"Mangled symbol<br/>names in HUNK_SYMBOL"}
    NEW_DELETE{"new/delete wrappers<br/>AllocMem+ctor sequence"}
    THUNKS{"this-adjustment<br/>thunks present?"}

    BIN --> VPTR
    VPTR -->|"Yes"| CXX_CONFIRMED["C++ confirmed"]
    VPTR -->|"No"| VTABLE_DISPATCH
    VTABLE_DISPATCH -->|"Yes"| CXX_CONFIRMED
    VTABLE_DISPATCH -->|"No"| MANGLED
    MANGLED -->|"Yes"| CXX_CONFIRMED
    MANGLED -->|"No"| NEW_DELETE
    NEW_DELETE -->|"Yes"| CXX_LIKELY["Likely C++ — no virtuals"]
    NEW_DELETE -->|"No"| NOT_CXX["Probably C or asm<br/>→ see other guides"]
    CXX_CONFIRMED --> THUNKS
    THUNKS -->|"Yes"| MULTI_INHERIT["Multiple/virtual<br/>inheritance present"]
    THUNKS -->|"No"| SINGLE_INHERIT["Single inheritance<br/>or no inheritance"]
```
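
The graph is small enough to encode directly, which is handy when triaging many binaries in a batch. A sketch under the assumption that each observation has already been reduced to a yes/no answer (the function and argument names are illustrative):

```python
def classify_binary(vptr_store, vtable_dispatch, mangled_names,
                    new_ctor_wrappers, thunks):
    """Follow the decision graph above; every argument is a True/False observation."""
    # Any one of the three strong markers confirms C++
    if vptr_store or vtable_dispatch or mangled_names:
        if thunks:
            return 'C++: multiple/virtual inheritance present'
        return 'C++: single inheritance or no inheritance'
    # Allocation + constructor call without virtuals is only suggestive
    if new_ctor_wrappers:
        return 'Likely C++ (no virtuals)'
    return 'Probably C or asm'
```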

### When to Use C++ RE vs Alternatives

| Scenario | Approach |
|---|---|
| Vtable dispatch patterns present | C++ RE (this article) |
| No vtables, but name mangling suggests classes | C++ without virtual methods (use ANSI C RE as base + class struct reconstruction) |
| MUI BOOPSI class (C-implemented OOP) | C RE + BOOPSI dispatcher analysis |
| Pure C with function pointer tables | See [ansi_c_reversing.md](ansi_c_reversing.md) — not C++ vtables |

---

## Methodology

### Phase 1: Detect C++ Usage

<!-- TODO: Expand — detection heuristics, scoring system, IDA Python detector script -->

Signs of C++:

- **Vtable store**: `LEA _vtable, A0` / `MOVE.L A0, (A1)` at object construction
- **Vtable dispatch**: `MOVE.L (A0), A1` / `MOVE.L $0N(A1), A0` / `JSR (A0)`
- **Name mangling**: Symbol names containing `__` with class name and parameter type encoding
- **`new`/`delete` calls**: Wrappers around `AllocMem`/`FreeMem` with constructor/destructor calls
- **`this` pointer**: First argument passed in A0 or on stack, used as base for member access
- **this-adjustment thunks**: a `this` adjustment (`LEA -offset(A0), A0`, `SUBQ.L #n, A0` for offsets up to 8, or `ADDI.L #-offset, 4(SP)` when `this` is on the stack) followed by `JMP real_method`
- **RTTI structures**: `type_info` with `.name` pointer
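
The dispatch marker can be searched mechanically. The sketch below scans raw big-endian code bytes for exactly the three-instruction shape listed above, with any register choice, and reports the `d16` slot displacement; it assumes the code hunk has already been extracted, and it will miss variants with interleaved instructions or a direct `JSR d16(An)`:

```python
import struct


def find_vtable_dispatch(code):
    """Find MOVEA.L (Ax),Ay / MOVEA.L d16(Ay),Az / JSR (Az) in 68k code.

    Returns (byte_offset, slot_displacement) tuples.
    """
    hits = []
    for off in range(0, len(code) - 7, 2):          # 68k instructions are word-aligned
        w1, w2, disp, w3 = struct.unpack_from('>HHhH', code, off)
        if (w1 & 0xF1F8) != 0x2050:                 # MOVEA.L (Ax),Ay
            continue
        y = (w1 >> 9) & 7                           # Ay = loaded vptr
        if (w2 & 0xF1F8) != 0x2068 or (w2 & 7) != y:  # MOVEA.L d16(Ay),Az
            continue
        z = (w2 >> 9) & 7                           # Az = method pointer
        if w3 != (0x4E90 | z):                      # JSR (Az)
            continue
        hits.append((off, disp))
    return hits
```

Dividing the displacement by 4 gives the vtable slot index under the 4-byte-entry layout used throughout this article.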

### Phase 2: Reconstruct Vtables

<!-- TODO: Expand — vtable extraction methodology, IDA/Ghidra scripts -->

1. Search for `LEA xxx, A0` / `MOVE.L A0, (An)` pairs — each LEA target is a vtable candidate
2. At each vtable address, enumerate the function pointers (4-byte entries). Each entry is an absolute pointer carrying a `HUNK_RELOC32` fixup, so the run of relocated longwords marks the table's likely extent (adjacent vtables can run together)
3. Cross-reference each function pointer back to its implementation
4. Map vtable index → method name (from mangled symbols or manual deduction)
5. Identify the destructor: look for `operator delete`/`FreeMem` calls, base destructor chains, and (Itanium ABI) a pair of adjacent destructor slots. It is `vtable[0]` only when the destructor is the first virtual function declared
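
Step 2 can lean on the relocation table. A sketch, assuming you have already parsed the hunk bytes and collected its `HUNK_RELOC32` patch offsets into a set (that parsing is not shown, and `read_vtable` is a hypothetical helper):

```python
import struct


def read_vtable(hunk, relocs, start, max_entries=256):
    """Collect vtable slot values starting at `start` in a hunk's raw bytes.

    `relocs` is the set of byte offsets patched by HUNK_RELOC32. A slot must
    be a relocated longword, so the walk stops at the first slot without one.
    """
    entries = []
    off = start
    while off + 4 <= len(hunk) and off in relocs and len(entries) < max_entries:
        (target,) = struct.unpack_from('>I', hunk, off)
        entries.append(target)      # unrelocated value = offset into the target hunk
        off += 4
    return entries
```

The returned values are offsets into whichever hunk each relocation names (normally the code hunk), ready for step 3.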

### Phase 3: Recover Class Hierarchy

<!-- TODO: Expand — inheritance graph reconstruction, base class identification, root class discovery -->

- **Single inheritance**: Objects share the same `_vptr` offset (+00); derived class vtable extends base class vtable entries
- **Multiple inheritance**: Multiple `_vptr` fields at different offsets in the object; `this` adjustment thunks
- **Virtual base classes**: Shared base via pointer indirection (virtual base offset table)
- **Common base detection**: Look for identical vtable prefix sequences across multiple classes
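
Common-base detection can be approximated by comparing vtables slot by slot. A rough heuristic sketch, assuming each vtable has already been reduced to a list of target addresses; overridden slots (always including the destructor) differ, so it scores the fraction of matching slots instead of requiring an exact prefix:

```python
def likely_bases(vtables, min_ratio=0.5):
    """Suggest (base, derived, ratio) pairs from shared vtable slots.

    `vtables` maps a name to its list of slot targets. A base vtable is no
    longer than the derived one, and non-overridden slots point at the same
    functions at the same indices.
    """
    pairs = []
    for base, b_slots in vtables.items():
        for derived, d_slots in vtables.items():
            if base == derived or not b_slots or len(b_slots) > len(d_slots):
                continue
            same = sum(1 for a, b in zip(b_slots, d_slots) if a == b)
            ratio = same / len(b_slots)
            if ratio >= min_ratio:
                pairs.append((base, derived, ratio))
    return sorted(pairs, key=lambda p: -p[2])   # strongest candidates first
```

Equal-length vtables can be reported in both directions; the constructor chain from Phase 4 breaks such ties.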

### Phase 4: Match Constructors to Classes

<!-- TODO: Expand — constructor identification, this-pointer initialization, vtable store as constructor anchor -->

Constructors are entry points that:

1. Call the base-class constructor first (recursively; the root class calls none)
2. Store a vtable pointer
3. Initialize member variables after the vtable store
4. May return `this` in D0; constructors under the Itanium ABI (GCC 3.x+) return `void`, so don't rely on it

### Phase 5: Reconstruct Class Member Layout

<!-- TODO: Expand — field offset extraction from constructor init sequences and method body access patterns, alignment rules -->

### Global Constructor & Destructor Arrays (GCC)

GCC 2.95.x emits two function-pointer tables: the startup code runs `__CTOR_LIST__` before calling `main()` and `__DTOR_LIST__` at program exit.

```
__CTOR_LIST__ format:
┌──────────────────────┐
│ count (N) or -1      │ __CTOR_LIST__[0] = number of entries
│                      │ (-1: count them up to the NULL terminator)
├──────────────────────┤
│ constructor_func_1   │ __CTOR_LIST__[1] = function pointer
├──────────────────────┤
│ constructor_func_2   │ __CTOR_LIST__[2]
├──────────────────────┤
│ ...                  │
├──────────────────────┤
│ 0x00000000           │ Terminator (NULL)
└──────────────────────┘

__DTOR_LIST__ has the SAME format.
```
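
**Startup processing** (a sketch of what the startup code does before `main()` is called; it follows libgcc's generic `__do_global_ctors`, and individual Amiga startup modules may differ):

```asm
; Walk __CTOR_LIST__ last-to-first, as libgcc's __do_global_ctors does.
; D2/A2 are callee-saved, so they survive the constructor calls.
    LEA     __CTOR_LIST__, A2
    MOVE.L  (A2)+, D2               ; load count; A2 -> first function pointer
    BLE.S   no_ctors                ; 0 = empty (the -1 "NULL-terminated" form is not handled here)
    MOVE.L  D2, D0
    LSL.L   #2, D0
    ADDA.L  D0, A2                  ; A2 -> just past the last pointer
ctor_loop:
    MOVEA.L -(A2), A1               ; load constructor function pointer (walking backwards)
    JSR     (A1)                    ; call it
    SUBQ.L  #1, D2
    BNE.S   ctor_loop
no_ctors:
```

**In the RE workflow**: If you find `__CTOR_LIST__` and `__DTOR_LIST__` in the symbol table:

1. Each function pointer in `__CTOR_LIST__` is a static-initialization function, typically one per translation unit (GCC 2.x gives them a `_GLOBAL_$I$` prefix), which constructs that unit's global C++ objects
2. Trace each one to find which classes it initializes (look for vtable stores and constructor calls)
3. The matching teardown functions (`_GLOBAL_$D$` prefix) live in `__DTOR_LIST__`; don't assume their indices line up with `__CTOR_LIST__`
4. With libgcc's scheme, constructors run last-to-first and the startup code runs destructors first-to-last at program exit, so units are torn down in the **reverse order** of their construction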
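
A sketch of reading the table statically, assuming `blob` holds the big-endian bytes of `__CTOR_LIST__` taken from the data hunk (before relocation, so the pointers are hunk offsets). It mirrors libgcc's `__do_global_ctors`, including the `-1` count form; Amiga startup modules that implement their own loop may use a different order:

```python
import struct


def ctor_call_order(blob, start=0):
    """Return constructor pointers in the order __do_global_ctors calls them.

    Word 0 is the entry count, or 0xFFFFFFFF meaning "count until the NULL".
    """
    (count,) = struct.unpack_from('>I', blob, start)
    if count == 0xFFFFFFFF:
        count = 0
        while struct.unpack_from('>I', blob, start + 4 * (count + 1))[0] != 0:
            count += 1
    ptrs = [struct.unpack_from('>I', blob, start + 4 * i)[0]
            for i in range(1, count + 1)]
    return list(reversed(ptrs))     # libgcc walks from the last entry down
```

Tracing the returned functions in this order reproduces the run-time initialization sequence that Pitfall 6 warns about.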

> [!NOTE]
> `__CTOR_LIST__` and `__DTOR_LIST__` are emitted even with `-nostdlib` (as described in the RastPort article on C++ without standard library). They're part of the GCC ABI, not the standard library.

### Phase 6: Recognize Operator Overloads

Common operator patterns in m68k disassembly:

| Operator | Assembly Signature | Notes |
|---|---|---|
| `operator=` | `CMP.L src, this` / `BEQ skip` (self-assignment guard), then a member-by-member copy | The self-assignment check is a strong marker, but compiler-generated versions often omit it |
| `operator new` | `MOVE.L size, D0` / `JSR AllocVec`: thin wrapper returning the new block in D0 | The size argument is usually `sizeof(Class)`, which helps confirm class identity |
| `operator delete` | `JSR FreeVec` (or `FreeMem`); the destructor runs in the caller or in the deleting destructor, not here | May be a single instruction if inlined |
| `operator==` | `CMP.L` per member, then `SEQ D0` plus a normalization step (`NEG.B D0`, `AND.L #1, D0`, or similar) so D0 is 0 or 1 | Boolean return in D0 is distinctive |
| `operator+` | Builds the result in a caller-provided temporary whose address arrives as a hidden pointer argument, combining both operands | Returns a new object by value |
| `operator[]` | `MOVE.L index, D0` / `ASL.L #element_shift, D0` then an indexed access such as `MOVE.L (A0,D0.L), D0` | Index scaling → base+offset load |
| `operator++` (prefix) | Increments member, returns `*this` in D0 | Returns reference to modified object |
| `operator++` (postfix) | Saves a copy of the old value, increments member, returns the copy | Postfix has dummy `int` parameter in mangling |

### Phase 7: Dynamic Verification

<!-- TODO: Expand — FS-UAE debugger: breakpoint on constructors to count live objects, trace vtable dispatch at runtime to confirm method mapping, verify object lifetimes -->

---

## Tool-Specific Workflows

<!-- TODO: Expand — detailed walkthroughs for each tool -->

### IDA Pro

<!-- TODO: Vtable struct type creation, Hex-Rays C++ decompilation quirks, class hierarchy plugin (ClassInformer equivalent for m68k), mangled name demangling script -->

### Ghidra

<!-- TODO: Ghidra C++ class recovery, structure editor for class layout, script for vtable-to-class mapping -->

---

## Best Practices

<!-- TODO: Numbered list -->

1. **Start from the vtable** — it's the most information-dense artifact in a C++ binary
2. **Identify the destructor first** — it anchors the vtable; everything else chains from it
3. **Match constructors to vtables** — the last vtable a constructor stores at offset +00 names its class (base constructors store their own vtables first)
4. **Use mangled names when available** — they encode class name, method name, and parameter types
5. **Trace `this` through the function** — document which offsets are read/written; those are member fields
6. **Detect multiple inheritance by checking `_vptr` store offsets per constructor** — stores to more than one offset in the same object = multiple inheritance
7. **Don't assume RTTI is present** — it's often stripped; rely on vtable structure instead
8. **Build a class diagram as you work** — manually or via tooling; the relationships become visible from vtable sharing
9. **Verify destructor chains dynamically** — breakpoint on `FreeMem` to see which destructors run in which order
10. **Document the vtable layout in a table** — offset → method name → implementation address; this is your reconstruction artifact

---

## Antipatterns

<!-- TODO: Add named antipatterns with broken/fixed code pairs -->

### 1. The Missing Base Class

**Wrong**: Assuming a vtable with N entries represents a single class with N virtual methods.

**Why**: In single inheritance, the derived vtable contains entries from ALL base classes plus its own additions. A 12-entry vtable might be a 3-level hierarchy with 4 virtual methods per class.

<!-- TODO: Add bad/good code pair -->

### 2. The Flat Vtable Assumption

**Wrong**: Treating all vtable entries as equal without identifying the destructor.

**Why**: The destructor (often the first vtable entry or, under the Itanium ABI, the first pair of entries) is the anchor. Once you identify the destructor chain, you can trace back through the constructor chain to reconstruct the class hierarchy.

<!-- TODO: Add bad/good code pair -->

### 3. The Single Inheritance Blindness

**Wrong**: Assuming `_vptr` is always at offset +00.

**Why**: In multiple inheritance, each base class subobject has its own `_vptr`. An object may have 2–3 vtable pointers at different offsets (+00, +$10, +$20).

<!-- TODO: Add bad/good code pair -->

### 4. The RTTI Assumption

**Wrong**: Relying on RTTI being present to name classes and map hierarchies.

**Why**: Most Amiga C++ projects use `-fno-rtti` to save space. RTTI is the exception, not the rule. You must reconstruct class names from mangled symbols or manual analysis.

<!-- TODO: Add bad/good code pair -->

### 5. The Thunk-as-Function Mistake

**Wrong**: Treating `this`-adjustment thunks as separate virtual methods.

**Why**: A thunk is a 2-instruction trampoline (a `this` adjustment such as `LEA -offset(A0), A0` or `ADDI.L #-offset, 4(SP)`, then `JMP real_method`). Adding it to the vtable count inflates the method inventory and confuses the call graph.

<!-- TODO: Add bad/good code pair -->

### 6. The Virtual Destructor Blind Spot

**Wrong**: Assuming every destructor is a standalone function.

**Why**: GCC 3.x and later (Itanium ABI) generate up to 3 destructor variants: D0 (deleting; frees the object), D1 (complete object, no delete), D2 (base subobject). All three may appear as separate functions near each other, and missing one means missing part of the destructor chain. GCC 2.95.x folds them into one function driven by a hidden `__in_chrg` flag.

<!-- TODO: Add bad/good code pair -->

### 7. The Constructor-as-Init Confusion

**Wrong**: Assuming any function that initializes memory is a constructor.

**Why**: C-style init functions, factory functions, and reset methods all initialize objects but don't set vtable pointers. Only C++ constructors store the vtable. The vtable store is the definitive constructor marker.

<!-- TODO: Add bad/good code pair -->

### 8. The Virtual Inheritance Denial

**Wrong**: Assuming inheritance is always simple enough to reconstruct from vtable layout alone.

**Why**: Virtual inheritance (diamond problem) introduces vbase pointers and offset tables that make the object layout non-linear. Without recognizing the vbase pattern, you'll place fields at wrong offsets and misidentify base class relationships.

<!-- TODO: Add bad/good code pair -->

---

## Pitfalls

### 1. Thunk Confusion

```asm
; "this adjustment" thunk for multiple inheritance:
; when Derived overrides a virtual declared in Base2 and the call is made
; through a Base2* (which points at the Base2 subobject, here +$10),
; the compiler generates:
    LEA     -$10(A0), A0           ; this -= $10: Base2* → Derived*
                                   ; (GCC passes this on the stack: ADDI.L #-$10, 4(SP))
    JMP     _Derived_method        ; tail-call the real method
```

These small thunks are not real methods — they're pointer adjustments. Mistaking them for separate functions inflates the vtable count.

<!-- TODO: Add worked example — how to identify and filter thunks -->
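
One way to filter thunks automatically is to pattern-match the adjust-then-jump shape. A minimal sketch that scans big-endian 68k code for the two adjustment encodings used in this article (`LEA d16(A0),A0` and `ADDI.L #imm,4(SP)`) followed by an absolute `JMP`; the helper name is hypothetical and other encodings (`SUBQ`, `BRA`, PC-relative `JMP`) are deliberately left out:

```python
import struct


def find_thunks(code):
    """Return (offset, this_adjustment, jump_target) for each thunk found.

    Adjust forms:  LEA d16(A0),A0     = 41E8 dddd
                   ADDI.L #imm,4(SP)  = 06AF iiiiiiii 0004
    followed by:   JMP abs.L          = 4EF9 aaaaaaaa
    """
    found = []
    for off in range(0, len(code) - 1, 2):
        (op,) = struct.unpack_from('>H', code, off)
        if op == 0x41E8 and off + 10 <= len(code):
            (adj,) = struct.unpack_from('>h', code, off + 2)
            nxt = off + 4
        elif (op == 0x06AF and off + 14 <= len(code)
              and struct.unpack_from('>H', code, off + 6)[0] == 0x0004):
            (adj,) = struct.unpack_from('>i', code, off + 2)
            nxt = off + 8
        else:
            continue
        jmp, target = struct.unpack_from('>HI', code, nxt)
        if jmp == 0x4EF9:
            found.append((off, adj, target))
    return found
```

The negated adjustment (for example `12` for `-12`) is the offset of the secondary `_vptr`, which is exactly what Pattern 5 needs.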

### 2. Inline Destructor Deception

<!-- TODO: Add worked example -->

When a destructor is trivial, GCC can inline all of the cleanup away, so the deleting-destructor entry in the vtable may contain little more than the call to `operator delete` (and through it `FreeMem`/`FreeVec`), with no visible destructor body. This looks like a bug but is correct: the class has no resources to free.

### 3. RTTI Disabled

<!-- TODO: Add worked example — how to proceed without RTTI -->

Not all C++ compilers for Amiga emitted RTTI. Don't assume `type_info` will be present — GCC `-fno-rtti` disables it, and many Amiga projects used this flag to save space.

### 4. Multiple Destructor Variants

<!-- TODO: GCC generates D0/D1/D2 destructor variants; StormC may use different numbering. Identifying which is which from disassembly alone. -->

### 5. vtable Sharing Between Classes

<!-- TODO: Related classes may share vtable sections; distinguishing shared entries from derived overrides. -->

### 6. Static Initialization Order

<!-- TODO: C++ static constructors run before main(); their initialization order matters but is invisible in static analysis. Dynamic verification needed. -->

### 7. Exception Handling Artifacts

<!-- TODO: If exceptions are enabled (rare on Amiga), frame unwind tables add data sections that look like garbage but are critical for stack unwinding. -->

### 8. Template Instantiation Code Bloat

<!-- TODO: Templates generate duplicate code for each type instantiation. Identifying which functions are template instantiations vs hand-written duplicates. -->

### 9. Compiler-Generated Default Methods

<!-- TODO: Default constructor, copy constructor, assignment operator, destructor — compilers generate these silently. They appear as functions you didn't expect but must account for in the vtable. -->

---

## Use-Case Cookbook

### Pattern 1: Extracting All Vtables from a Binary

<!-- TODO: Step-by-step — IDA Python script to scan for vtable store patterns (LEA vtable, A0 / MOVE.L A0, (An)), collect vtable addresses, enumerate entries, cross-reference implementations. -->
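
Until that script is written, the core scan is easy to sketch outside IDA. The Python below assumes the raw bytes of a code hunk and looks only for the `LEA abs.L,A0` / `MOVE.L A0,(An)` pair used in the constructor example; compilers that pick other registers or a PC-relative `LEA` need extra patterns:

```python
import struct


def find_vtable_stores(code):
    """Find LEA abs.L,A0 (41F9 aaaaaaaa) immediately followed by MOVE.L A0,(An).

    Returns (offset, vtable_address, object_register) tuples. In a hunk file
    the address is the unrelocated value, i.e. an offset into the hunk that
    HUNK_RELOC32 points it at.
    """
    stores = []
    for off in range(0, len(code) - 7, 2):
        lea, addr, move = struct.unpack_from('>HIH', code, off)
        if lea == 0x41F9 and (move & 0xF1FF) == 0x2088:   # MOVE.L A0,(An)
            stores.append((off, addr, (move >> 9) & 7))
    return stores


def collect_vtables(code):
    """Deduplicated vtable candidates, each with the store sites that use it."""
    table = {}
    for off, addr, _reg in find_vtable_stores(code):
        table.setdefault(addr, []).append(off)
    return table
```

Each key of `collect_vtables` is a starting point for the slot walk in Phase 2; a vtable stored from several sites usually means several constructors of the same class.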

### Pattern 2: Reconstructing a Single Inheritance Chain

<!-- TODO: Step-by-step — start from most-derived constructor, trace vtable entries backward through base class constructors, identify shared entries, build hierarchy diagram. -->
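
Once each constructor's base call and vtable store are noted, the chain walk itself is mechanical. A sketch assuming two hand-built dictionaries (the constructor labels in the example are hypothetical):

```python
def inheritance_chain(ctor, base_call, ctor_vtable):
    """Walk from a constructor to the root, returning [(ctor, vtable), ...].

    `base_call` maps a constructor to the base constructor it calls (absent
    for the root); `ctor_vtable` maps a constructor to the vtable it stores.
    """
    chain, seen = [], set()
    while ctor is not None and ctor not in seen:   # `seen` guards against bad data loops
        seen.add(ctor)
        chain.append((ctor, ctor_vtable.get(ctor)))
        ctor = base_call.get(ctor)
    return chain
```

Comparing neighbouring vtables along the returned chain shows which slots each level overrides and which it adds.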

### Pattern 3: Recovering Class Method Names from Mangled Symbols

<!-- TODO: Step-by-step — parse GCC mangled names, extract class name + method name + parameter types, apply to IDA. Python demangler script. -->
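
A fuller sketch than the quick helper above, also covering constructors, destructors, a few operators, `const` methods, and class-typed or qualified parameters. It handles only the subset of GNU v2 codes that appear in this article's tables (no templates, no repeat codes), and `demangle_gnu_v2` is a hypothetical helper, not a library API:

```python
import re

BUILTIN = {'v': 'void', 'c': 'char', 's': 'short', 'i': 'int', 'l': 'long',
           'x': 'long long', 'f': 'float', 'd': 'double', 'b': 'bool'}
OPERATORS = {'nw': 'operator new', 'dl': 'operator delete', 'as': 'operator=',
             'pl': 'operator+', 'mi': 'operator-', 'ml': 'operator*',
             'eq': 'operator==', 'ne': 'operator!=', 'vc': 'operator[]',
             'pp': 'operator++'}


def _length_name(code, i):
    """Read a <decimal length><name> pair starting at code[i]."""
    digits = re.match(r'\d+', code[i:]).group()
    start = i + len(digits)
    n = int(digits)
    return code[start:start + n], start + n


def _type(code, i):
    """Parse one parameter type: modifiers (P, R, C, U) then a builtin or class."""
    mods = []
    while code[i] in 'PRCU':
        mods.append(code[i])
        i += 1
    if code[i].isdigit():
        text, i = _length_name(code, i)
    else:
        text, i = BUILTIN[code[i]], i + 1
    for mod in reversed(mods):        # innermost modifier applies first
        text = {'P': text + ' *', 'R': text + ' &',
                'C': text + ' const', 'U': 'unsigned ' + text}[mod]
    return text, i


def _params(code):
    out, i = [], 0
    while i < len(code):
        text, i = _type(code, i)
        out.append(text)
    return ', '.join(out) or 'void'


def demangle_gnu_v2(sym):
    """Demangle simple GCC 2.x symbols; return `sym` unchanged on failure."""
    try:
        m = re.fullmatch(r'_[$.]_(\d+.*)', sym)               # destructor
        if m:
            cls, _ = _length_name(m.group(1), 0)
            return f'{cls}::~{cls}(void)'
        m = re.fullmatch(r'__([a-z]{2})__(.*)', sym)          # operator
        if m:
            name, rest = OPERATORS[m.group(1)], m.group(2)
        elif re.fullmatch(r'__\d.*', sym):                    # constructor
            name, rest = None, sym[2:]
        else:                                                 # ordinary function
            m = re.fullmatch(r'([A-Za-z_]\w*?)__([\dFC].*)', sym)
            name, rest = m.group(1), m.group(2)
        if rest.startswith('F'):                              # non-member function
            return f'{name}({_params(rest[1:])})'
        const = rest.startswith('C')                          # const member function
        cls, i = _length_name(rest, 1 if const else 0)
        name = name or cls
        return f"{cls}::{name}({_params(rest[i:])}){' const' if const else ''}"
    except (AttributeError, KeyError, IndexError, ValueError):
        return sym
```

Feeding every `HUNK_SYMBOL` name through it and keeping only the ones that change gives a quick inventory of classes and methods to apply in IDA or Ghidra.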

### Pattern 4: Mapping `new`/`delete` to Class Types

<!-- TODO: Step-by-step — identify operator new wrappers, trace the size argument back to sizeof(Class), cross-reference with constructor calls to confirm class identity. -->
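
A small bookkeeping sketch for this pattern, assuming you have recorded each `new` site as a (size argument, constructor called) pair; the helper name and the constructor labels are illustrative:

```python
from collections import defaultdict


def sizes_by_constructor(call_sites):
    """Group operator-new size arguments by the constructor run on the result.

    One size per constructor supports sizeof(Class); several sizes point at a
    misread call site or an array/placement form worth checking by hand.
    """
    sizes = defaultdict(set)
    for size, ctor in call_sites:
        sizes[ctor].add(size)
    return {ctor: sorted(s) for ctor, s in sizes.items()}
```

The confirmed size also bounds the member layout in Phase 5: no field of that class can start at or beyond it.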

### Pattern 5: Reconstructing Multiple Inheritance from Thunks

<!-- TODO: Step-by-step — find all this-adjustment thunks, map the offset to the correct base subobject, reconstruct the full object layout with multiple _vptr fields. -->

### Pattern 6: Identifying a MUI Custom Class (BOOPSI/C++ Hybrid)

<!-- TODO: Step-by-step — MUI classes use BOOPSI dispatch (OM_NEW/OM_SET/OM_GET etc.) but may be implemented in C++. Identifying the BOOPSI dispatcher as a vtable entry, mapping OM_xxx method IDs to vtable indices. -->
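
Dispatcher analysis usually starts by naming the constants the dispatcher compares `MethodID` against. The root-class IDs below come from `intuition/classusr.h`; treating any `0x8042xxxx` value as a MUI method is an assumption based on MUI's tag base, and `label_dispatcher_cases` is a hypothetical helper:

```python
# BOOPSI root-class method IDs (intuition/classusr.h)
BOOPSI_METHODS = {
    0x101: 'OM_NEW', 0x102: 'OM_DISPOSE', 0x103: 'OM_SET', 0x104: 'OM_GET',
    0x105: 'OM_ADDTAIL', 0x106: 'OM_REMOVE', 0x107: 'OM_NOTIFY',
    0x108: 'OM_UPDATE', 0x109: 'OM_ADDMEMBER', 0x10A: 'OM_REMMEMBER',
}


def label_dispatcher_cases(constants):
    """Name the method-ID constants a dispatcher compares msg->MethodID against."""
    labels = {}
    for value in constants:
        if value in BOOPSI_METHODS:
            labels[value] = BOOPSI_METHODS[value]
        elif (value & 0xFFFF0000) == 0x80420000:   # assumed MUI tag base
            labels[value] = 'MUI method (MUIM_*)'
        else:
            labels[value] = 'unknown / custom'
    return labels
```

A dispatcher whose cases are labeled this way but which is reached through a vtable slot rather than a `Hook` is a good sign of the C++/BOOPSI hybrid this pattern describes.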

### Pattern 7: Reconstructing a StormC++ Class Hierarchy

<!-- TODO: Step-by-step — StormC-specific vtable layout, RTTI format, name mangling differences from GCC, constructor signature patterns. -->

### Pattern 8: Tracing Object Lifetimes Through Constructors/Destructors

<!-- TODO: Step-by-step — dynamic analysis: breakpoint on all constructors, log object addresses, breakpoint on destructors, verify every constructed object is destroyed. IDA + FS-UAE workflow. -->
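
The bookkeeping half of this workflow can be sketched independently of the debugger. It assumes the breakpoint hits have been transcribed into `('ctor' | 'dtor', address)` tuples (a hypothetical log format). It counts rather than flags because a derived constructor also runs its base constructor on the same address, and the destructor chain mirrors that:

```python
def check_lifetimes(events):
    """Audit a constructor/destructor trace in execution order.

    Returns (leaked, bad_destroys): addresses constructed but never fully
    destroyed, and destroy events with no matching live construction.
    """
    live, bad = {}, []
    for kind, addr in events:
        if kind == 'ctor':
            live[addr] = live.get(addr, 0) + 1
        elif live.get(addr):                 # any non-'ctor' event counts as a destroy
            live[addr] -= 1
            if not live[addr]:
                del live[addr]
        else:
            bad.append(addr)
    return sorted(live), bad
```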

### Pattern 9: Decompiling Virtual Method Bodies

<!-- TODO: Step-by-step — once vtable is mapped, each method body can be decompiled as a standalone C function with `this` as first parameter. Hex-Rays / Ghidra decompiler workflow for virtual methods. -->

### Pattern 10: Identifying the Root Base Class

<!-- TODO: Step-by-step — the root base class constructor does NOT call another constructor (no recursive SUPER::ctor call). Finding this constructor identifies the inheritance tree root. -->
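
Given a map from each constructor to the base constructor it calls, the roots fall out directly. A sketch with hypothetical constructor labels:

```python
def root_constructors(base_call, constructors):
    """Return (roots, children) for a set of identified constructors.

    `base_call` maps a constructor to the base constructor it calls. A root
    makes no base call; `children` inverts `base_call` so each tree can be
    walked downward from its root.
    """
    roots = sorted(c for c in constructors if c not in base_call)
    children = {}
    for derived, base in base_call.items():
        children.setdefault(base, []).append(derived)
    return roots, {k: sorted(v) for k, v in children.items()}
```

Roots with no children are often standalone classes rather than hierarchy tops.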

---

## Real-World Examples

<!-- TODO: Reference specific Amiga C++ software with documented RE findings -->

### Applications

<!-- TODO: MUI custom classes implemented in C++; StormC IDE itself, built with StormC++; AmigaWriter, C++ word processor; fxPaint, C++ graphics application. (YAM is C with MUI, useful only as a contrast case.) -->

### Games

<!-- TODO: "Foundation" / "Foundation: Gold" — large C++ game; "Earth 2140" — Amiga port using C++; any StormC++-compiled shareware games. -->

### Libraries

<!-- TODO: C++-implemented MUI custom classes from third-party developers (note: NList/NListview are C, not C++). -->

---

## Historical Context — C++ on Amiga

C++ adoption on Amiga was limited by several factors:

| Factor | Impact |
|---|---|
| **Late compiler arrival** | SAS/C 6.x and Amiga ports of GNU g++ already offered C++, but StormC (1996) was the first integrated, Amiga-native C++ IDE. GCC 2.95.x cross-compilers arrived later. Before then, C++ remained a niche choice on the Amiga. |
| **RAM constraints** | Virtual dispatch tables, RTTI data, and template instantiations consumed RAM that a stock A500 (512KB–1MB) couldn't spare |
| **Performance overhead** | Virtual method dispatch on a 7 MHz 68000 (2 indirections + JSR) was measurably slower than a direct function call — problematic for real-time code |
| **SAS/C limited support** | SAS/C had rudimentary C++ support but no exceptions, no RTTI, and no STL. It was effectively "C with classes" |
| **StormC dominance (late 1990s)** | StormC brought a full IDE and usable C++ to Amiga; much late-era Amiga C++ software was compiled with it |
| **GCC cross-compilation** | Developers targeting Amiga from Linux/Windows used GCC 2.95.x m68k cross-compilers, bringing modern C++ (templates, STL) to Amiga |
| **`-fno-rtti -fno-exceptions` were common** | Many Amiga C++ binaries disable RTTI and exceptions to save space and avoid runtime overhead |

**Notable C++ Amiga applications**:

- **YAM (Yet Another Mailer)**: a large MUI application, but written in C rather than C++; useful as a contrast case showing MUI object orientation in plain C
- **StormC IDE**: Self-hosting — the StormC compiler was written in C++ and compiled with itself
- **Foundation / Foundation: Gold**: Large C++ game with custom memory management
- **fxPaint**: C++ graphics application with plugin architecture
- **MUI custom classes**: MUI widgets are BOOPSI classes; well-known ones such as NList, NListview and TextEditor are written in C, so check for the C++ markers above before applying this guide to a given `.mcc`

---

## Modern Analogies

<!-- TODO: Expand -->

| Amiga C++ Concept | Modern Analogy | Where It Holds / Breaks |
|---|---|---|
| Vtable dispatch | C++ vtable (unchanged!) | Holds: exactly the same mechanism; breaks: modern ABIs add more metadata |
| this-adjustment thunk | Multiple inheritance pointer fixup | Holds: same concept; breaks: modern compilers use more compact thunk encoding |
| Name mangling | Itanium C++ ABI mangling | Holds: GCC 3.x+ (including current Amiga GCC builds) uses it; breaks: GCC 2.95.x used the older GNU scheme and StormC its own |
| RTTI type_info | `std::type_info` / `typeid` | Holds: same structure under the Itanium ABI; breaks: GCC 2.95.x used older type_info classes, and Amiga builds often stripped RTTI to save space |
| Virtual destructor | Virtual destructor (unchanged) | Holds: identical concept, and GCC 3.x+ emits the same D0/D1/D2 variants on Amiga as elsewhere; breaks: GCC 2.95.x used one destructor with a hidden in-charge flag |
| operator new/delete | `operator new`/`delete` | Holds: same wrappers around allocator; breaks: Amiga wraps AllocMem/FreeMem directly |

---

## FAQ

### Q1: How do I identify the destructor in a vtable?

<!-- TODO -->

### Q2: Why are there empty "padding" entries in the vtable?

<!-- TODO: Offset-to-top values at negative vtable offsets, RTTI pointers, alignment padding. -->

### Q3: How do I tell virtual inheritance from multiple inheritance?

<!-- TODO: Virtual base offset tables vs multiple _vptr fields; diamond inheritance detection from shared base constructor calls. -->

### Q4: What does `this` adjustment look like and how do I automate detection?

<!-- TODO: LEA -offset(A0), A0 (or ADDI.L #-offset, 4(SP)) / JMP pattern; IDA Python script to detect thunks. -->

### Q5: How do I distinguish between SAS/C C++ and GCC C++ in disassembly?

<!-- TODO: SAS/C uses different name mangling, constructor signature, and vtable layout. Detection heuristics. -->

### Q6: How do I recover the original class name without RTTI?

<!-- TODO: Name mangling in HUNK_SYMBOL, string references in constructors, library OpenLibrary calls that suggest purpose. -->

### Q7: Why do I see three destructors for one class?

<!-- TODO: Itanium-ABI (GCC 3.x+) D0/D1/D2 destructor variants explained — in-charge deleting, in-charge non-deleting, not-in-charge (base subobject). -->

### Q8: How do I identify a C++ exception handler in the binary?

<!-- TODO: Rare on Amiga, but if present: frame unwind tables look like data arrays between functions; setjmp/longjmp used as exception primitive. -->

### Q9: How do I reconstruct template instantiations?

<!-- TODO: Templates generate code clones with different type sizes; identify by matching code structure across functions with different constant values. -->

### Q10: Can I tell if a class inherits from a BOOPSI base (MUI)?

<!-- TODO: BOOPSI dispatch method in vtable, OM_NEW/OM_SET dispatch patterns, MUI-specific method IDs. -->

### Q11: How do I handle mixed C and C++ in the same binary?

<!-- TODO: `extern "C"` functions appear without name mangling; C++ functions have mangled names. The boundary is visible in the symbol table. -->

### Q12: What's the difference between StormC++ and GCC vtable layout?

<!-- TODO: RTTI placement, destructor numbering scheme, name mangling format, this-adjustment thunk encoding. -->

---

## FPGA / Emulation Impact

<!-- TODO: Expand — vtable dispatch is pure CPU code with no custom chip interaction; no special FPGA concerns. Exception: timing-sensitive code in constructors/destructors (rare). -->

---

## References

- [ansi_c_reversing.md](ansi_c_reversing.md) — C binary reverse engineering
- [struct_recovery.md](struct_recovery.md) — Struct layout reconstruction from offsets
- [compiler_fingerprints.md](../compiler_fingerprints.md) — Compiler identification
- [m68k_codegen_patterns.md](m68k_codegen_patterns.md) — Code generation patterns
- [asm68k_binaries.md](asm68k_binaries.md) — Hand-written assembly RE
- [api_call_identification.md](api_call_identification.md) — Library call recognition
- *Itanium C++ ABI* (reference for GCC-compatible vtable layout)
- *StormC++ Manual* — Amiga-native C++ compiler documentation
- NDK 3.9: `include/exec/types.h` — Fundamental type layout
@ -397,3 +397,4 @@ StormC is the primary Amiga C++ compiler. C++ code is identifiable by:
|
|||
- GCC for m68k: https://github.com/bebbo/amiga-gcc
- VBCC manual: http://www.compilers.de/vbcc.html
- *Amiga ROM Kernel Reference Manual: Libraries* — register conventions
- [code_vs_data_disambiguation.md](code_vs_data_disambiguation.md) — distinguishing code bytes from data/variables

679
05_reversing/static/other_languages.md
Normal file
@ -0,0 +1,679 @@

# Non-C Languages — AMOS, Blitz Basic, E, Modula-2, FORTH, and Others

## Overview

While C and assembly dominated Amiga development, a surprising number of applications and games were built in higher-level or domain-specific languages:

- **AMOS Professional** and **Blitz Basic 2** produced hundreds of commercial and shareware titles.
- **Amiga E** (by Wouter van Oortmerssen) was a fast compiled language with C-like performance and Python-like expressiveness.
- **Modula-2** and **Oberon** brought structured programming from ETH Zürich.
- **FORTH** (JForth, HSForth) powered interactive development environments and embedded systems.
- **ARexx** served as the system-wide scripting glue.
- **CanDo** and **AmigaVision** enabled non-programmers to build multimedia applications.

Each of these languages leaves a **distinctive runtime fingerprint** in the binary: an interpreter loop, a tokenized bytecode format, a compiler-specific runtime library, or a unique memory layout. Reversing these binaries requires recognizing the language first, then applying language-specific strategies. For the tokenized and threaded languages, those strategies bear little resemblance to standard C reverse engineering.

```mermaid
graph TB
    subgraph "Languages on Amiga"
        ASM["Assembly<br/>Direct m68k"]
        C_FAM["C / C++<br/>SAS/C, GCC, VBCC, StormC"]
        BASIC["BASIC dialects<br/>AMOS, Blitz Basic, ABasiC"]
        E_LANG["Amiga E<br/>Compiled, fast, modular"]
        MODULA["Modula-2 / Oberon<br/>ETH heritage"]
        FORTH_LANG["FORTH<br/>JForth, HSForth"]
        PASCAL["Pascal<br/>HiSoft, Maxon"]
        SCRIPT["Scripting / Visual<br/>ARexx, CanDo, AmigaVision"]
    end
    subgraph "Binary Fingerprint"
        NATIVE["Native code<br/>HUNK binary, direct m68k"]
        TOKENIZED["Tokenized bytecode<br/>Interpreter loop, opcode tables"]
        HYBRID["Hybrid<br/>Compiled to m68k with runtime"]
        THREADED["Threaded code<br/>FORTH dictionary, NEXT routine"]
    end
    ASM --> NATIVE
    C_FAM --> NATIVE
    E_LANG --> NATIVE
    MODULA --> NATIVE
    PASCAL --> NATIVE
    BASIC --> TOKENIZED
    BASIC --> HYBRID
    SCRIPT --> TOKENIZED
    FORTH_LANG --> THREADED
```

---

## Architecture Overview

### Language Classification by Binary Type

| Category | Languages | Binary Format | RE Strategy |
|---|---|---|---|
| **Native Code** | Amiga E, Modula-2, Oberon, Pascal, HiSoft BASIC | Standard HUNK binary | Standard disassembly with language-specific runtime library recognition |
| **Tokenized Bytecode** | AMOS, ABasiC, CanDo | Custom executable with embedded interpreter | Extract bytecode, identify opcode table, decompile token stream |
| **Hybrid (Compiled + Runtime)** | Blitz Basic 2, AmigaVision | HUNK binary + runtime library + optional tokenized sections | Identify Blitz runtime calls; decompile library calls back to BASIC semantics |
| **Threaded Code** | JForth, HSForth, Yerk | HUNK binary with threaded interpreter | Identify NEXT routine, walk dictionary, reconstruct FORTH words |
| **Scripted / Embedded** | ARexx, Installer | ARexx macro files (`.rexx`), Installer scripts | Plain-text or tokenized script; host application provides runtime |

---

## Language Quick-Reference Table

| Language | Compiler / Runtime | Binary Type | Main Runtime Library | Key Identifier |
|---|---|---|---|---|
| **AMOS** | AMOSPro_Interpreter | Custom + tokenized IFF | `AMOS.library` | `AMOS` or `AMOSPro` string |
| **Blitz Basic 2** | Blitz2 compiler | HUNK native code | `blitzlib.library` | `BB_` prefixed runtime calls |
| **Amiga E** | `ec` / CreativE | HUNK native code | `ec.runtime` | `E_GC`, module export tables |
| **Modula-2** | M2Amiga | HUNK native code | M2 runtime | Module `_init`/`_final` |
| **Oberon** | AmigaOberon / OOC | HUNK native code | Oberon runtime | Type-bound procedure dispatch |
| **FORTH** | JForth, HSForth | HUNK + threaded dict | Built-in | Threaded `NEXT` interpreter |
| **ARexx** | `rexxsyslib.library` | Plain text / tokenized `.rexx` | `rexxsyslib.library` | Script header (leading `/*` comment) |
| **ABasiC** | Metacomco ABasiC | Tokenized | ABasiC runtime | ABasiC token format |
| **HiSoft BASIC** | HiSoft compiler | HUNK native code | HiSoft runtime | HiSoft runtime calls |
| **HiSoft Pascal** | HiSoft compiler | HUNK native code | HiSoft Pascal runtime | Pascal calling convention, string descriptors |
| **Maxon Pascal** | Maxon compiler | HUNK native code | Maxon runtime | Module system, OOP extensions |
| **CanDo** | CanDo runtime | Tokenized deck format | `cando.library` | Deck file magic bytes, card/button descriptors |
| **AmigaVision** | AmigaVision runtime | Hybrid (compiled flows) | AmigaVision runtime | Flow chart bytecode, media references |
| **DevPAC** | DevPAC assembler macros | HUNK native (asm) | None (pure m68k) | DevPAC-specific include/macro signatures |

---

## Language-Specific RE Guides

### AMOS Professional / AMOS Basic

<!-- TODO: Expand — full AMOS reversing guide -->

AMOS compiles programs to tokenized bytecode, which is embedded in a custom executable wrapper. The interpreter is `AMOS.library` or the standalone `AMOSPro_Interpreter`. Tokenized programs contain:

- **Token stream**: 16-bit instruction tokens with inline arguments
- **Sprite/bob data**: Embedded IFF ILBM chunks
- **Sample data**: Embedded IFF 8SVX chunks
- **String table**: Pascal strings (length-prefixed)
- **Bank system**: Separate memory banks for sprites, samples, music, and code, each with its own identifier

**Detection**: Look for `AMOS` or `AMOSPro` strings, or an `OpenLibrary` call for `AMOS.library`. Tokenized executables have a distinct header that records the bank count and sizes.

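A first-pass scan can collect this evidence automatically. The sketch relies on two details beyond the strings named above, both taken from community AMOS format notes. Treat them as assumptions and check them against known samples:

- Saved programs start with a 16-byte version string, for example `AMOS Basic V134 ` or `AMOS Pro101V`.
- The bank section starts with an `AmBs` tag, followed by one `AmBk`, `AmSp`, or `AmIc` tag per bank.

The sketch only reports evidence; it does not parse the token stream.

```python
import sys

HUNK_HEADER = b"\x00\x00\x03\xf3"  # first longword of every loadable Amiga executable

# Assumed AMOS markers (verify against real files before relying on them).
AMOS_SOURCE_PREFIXES = (b"AMOS Bas", b"AMOS Pro")
BANK_SECTION_TAG = b"AmBs"
BANK_TAGS = (b"AmBk", b"AmSp", b"AmIc")


def amos_report(data: bytes) -> dict:
    """Collect AMOS-related evidence from a file's raw bytes."""
    return {
        "is_hunk_executable": data[:4] == HUNK_HEADER,
        "amos_source_header": data[:8] in AMOS_SOURCE_PREFIXES,
        "mentions_amos": b"AMOS" in data,
        "mentions_amospro": b"AMOSPro" in data,
        "bank_section_offset": data.find(BANK_SECTION_TAG),  # -1 if absent
        "bank_tag_count": sum(data.count(tag) for tag in BANK_TAGS),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for key, value in amos_report(fh.read()).items():
            print(f"{key:22} {value}")
```

`is_hunk_executable` separates a compiled or wrapped executable from a raw saved program, which you would otherwise confuse when applying the decision guide later in this document.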
**Decompilation methodology**:

<!-- TODO: Expand — step-by-step AMOS token extraction, opcode table reconstruction, bank extraction, IFF chunk recovery -->

**Decompilation tools**: `AMOSList` (token dumper) and `AMOS2ASCII` converters exist but are incomplete.

**Key Challenge**: AMOS extensions (AMOS 3D, TOME, etc.) add custom opcodes to the token stream, so a decompiler that knows only the core token set fails on them. You must map each extension opcode to its extension name. AMAL (the AMOS Animation Language) is built into AMOS rather than supplied as an extension.

#### AMOS-Specific Pitfalls

<!-- TODO: Extension opcode conflicts, bank type misidentification, compressed bank data, AMOS vs AMOSPro bytecode differences -->

---

### Blitz Basic 2 / AmiBlitz

<!-- TODO: Expand — full Blitz Basic reversing guide -->

Blitz Basic 2 compiles to native m68k code linked with `BlitzLibs` (or `AmiBlitz` libraries). The compiled output:

- Uses standard HUNK format
- Links against `blitzlib.library` or specific Blitz support libraries
- Generates surprisingly efficient code for a BASIC compiler (in some cases it rivals C compiler output)
- Makes library calls as regular `JSR LVO(A6)` sequences, but into Blitz-specific libraries
- Supports inline 68k assembly mixed with BASIC statements

**Detection**: Look for `OpenLibrary` calls for `blitzlib.library`, or for Blitz-specific runtime functions. Common BlitzLib runtime functions:

| BlitzLib Function | Purpose | Notes |
|---|---|---|
| `BB_AllocMem` | Memory allocation | Blitz's internal allocator |
| `BB_FreeMem` | Memory deallocation | Matching free |
| `BB_StrCopy` | String copy | Blitz string handling |
| `BB_StrCmp` | String compare | Case-sensitive comparison |
| `BB_LoadShape` | Load IFF ILBM shape | Used with `LoadShape` statement |
| `BB_LoadSound` | Load IFF 8SVX sample | Used with `LoadSound` statement |
| `BB_DisplayShape` | Blit shape to screen | `ShowShape` / `DisplayShape` statement |
| `BB_QSprite` | QSprite (hardware sprite) management | Blitz sprite system |
| `BB_Poke` / `BB_Peek` | Direct memory access | `Poke.b`, `Peek.w` etc. |
| `BB_Print` | Text output | Blitz `Print` statement |
| `BB_Input` | Text input | Blitz `Input` / `Edit()` function |

> [!NOTE]
> Blitz Basic 2 inline assembly requires A4-A6 to be preserved. Look for `MOVEM.L` save/restore of A4-A6 around code blocks that contain direct hardware access. These are likely inline asm blocks within a Blitz program.

**Decompilation methodology**:

<!-- TODO: Expand — mapping BlitzLib calls back to Blitz statements, For/Next loop detection, Select/Case pattern recognition, Type/End Type struct layout recovery, NewType object system analysis -->

**Key Challenge**: Blitz inlines certain operations; for example, a `For...Next` loop can compile directly to a `DBRA`. Understanding the mapping between Blitz statements and their code-generation patterns is essential.

**BlitzLib to BASIC mapping table**:

<!-- TODO: Common BlitzLib calls and their corresponding Blitz BASIC statements -->

#### Blitz-Specific Pitfalls

<!-- TODO: Inline assembly boundaries in decompilation, AmiBlitz vs Blitz2 library differences, NewType object system (not vtable-based, but custom dispatch), tokenized data sections (images, shapes) embedded in DATA hunks -->

---

### Amiga E

Amiga E compiles to native m68k code via the `ec` compiler (or `CreativE`). Key characteristics:

- **Garbage collector**: `E_GC()` runtime calls interspersed in the code for conservative stack scanning
- **Module system**: `.e` modules compile to `.m` object files with specific export tables
- **Exception handling**: Try/Except generates custom stack unwinding code
- **List comprehension**: Generates iteration patterns with `E_Next()` calls
- **Object system**: Message-passing OOP (not C++ vtables). Objects have a **method table** pointer at +00, but the table maps message IDs to handlers, not fixed-offset virtual methods
- **Concurrency**: Lightweight tasks via `E_Task()`, implemented as cooperative coroutines rather than exec tasks

**Amiga E Object Layout**:

```
E Object:
  +00: method_table_ptr → MethodTable
  +04: instance_var_1
  +08: instance_var_2
  ...

MethodTable:
  +00: class_name (string ptr)
  +04: parent_method_table (ptr or NULL)
  +08: message_handler_count
  +0C: message_id_0
  +10: handler_0 (ptr)
  +14: message_id_1
  +18: handler_1 (ptr)
  ...
```

**Critical difference from C++**: E methods are dispatched by **message ID** (a symbol/integer), not by fixed vtable offset. The method table is a key-value map, not an array. A call like `obj.method()` compiles to a search through the method table for the matching message ID, then a `JSR` to the handler. This is more dynamic than C++ and harder to reconstruct from static analysis alone.

**Detection**: Look for `ec.runtime` library calls, `E_GC`, or module export tables with `.m` format. The E runtime is typically statically linked.

**Decompilation methodology**:

<!-- TODO: Expand — E object system reconstruction (method tables, not vtables), module dependency graph from export tables, GC root tracing for memory layout, exception handler mapping, ANY type dispatch reconstruction -->

**Key Challenge**: E's syntax is so expressive that a single line of E can generate 15–20 m68k instructions. Mapping back to source-level intent requires understanding E's compilation strategies.

#### Amiga E-Specific Pitfalls

<!-- TODO: GC root misidentification (objects held by registers may not look reachable), message-passing vs function call distinction, module export vs import confusion, E's custom string format (not C strings) -->

---

### Modula-2 / Oberon

<!-- TODO: Expand — full Modula-2/Oberon reversing guide -->

Modula-2 (via the Modula-2 Development System or M2Amiga) compiles to native code with distinctive patterns:

- **Module initialization**: Each module has `_init` and `_final` procedures called at load/unload
- **COROUTINES**: `NEWPROCESS`/`TRANSFER` generate custom context-switching code
- **Opaque types**: Information hiding generates accessor functions that only appear for exported types
- **No preprocessor**: No `#include`, no macros; all dependencies are explicit IMPORT/EXPORT
- **Range-checked arrays**: Runtime bounds checking generates additional CMP/Bcc instructions
- **Set operations**: Bit-manipulation patterns for set types with `INCL`/`EXCL` (include/exclude)

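The `INCL`/`EXCL` set operations reduce to single-bit instructions, so recovering a set usually means decoding bit masks. The sketch below assumes a 32-bit `BITSET` held in a data register, with element *i* stored as bit *i*. Under that assumption, `INCL`, `EXCL`, and `IN` correspond to `BSET`, `BCLR`, and `BTST`. Larger sets kept in memory may be laid out differently by each compiler.

```python
def incl(s: int, i: int) -> int:
    """INCL(s, i): set bit i, like BSET Di,Ds."""
    return (s | (1 << i)) & 0xFFFFFFFF


def excl(s: int, i: int) -> int:
    """EXCL(s, i): clear bit i, like BCLR Di,Ds."""
    return s & ~(1 << i) & 0xFFFFFFFF


def member(s: int, i: int) -> bool:
    """i IN s: test bit i, like BTST Di,Ds."""
    return bool((s >> i) & 1)


def decode_bitset(mask: int) -> list[int]:
    """Turn an immediate set constant (e.g. from OR.L #mask,Dn) into element numbers."""
    return [i for i in range(32) if (mask >> i) & 1]


print(decode_bitset(0x15))  # [0, 2, 4]
```

Applying `decode_bitset` to every immediate operand of `AND.L`/`OR.L` that touches a set variable often reveals the source-level set literal, such as `{0, 2, 4}`.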
Oberon (via AmigaOberon or OOC) adds:

- **Type-bound procedures** (method-like dispatch but not virtual tables like C++)
- **Garbage collection** (optional; generates GC safe-points in code)
- **Type extension** (single inheritance without vtables; uses descriptor records)
- **Dynamic arrays**: Runtime-allocated with descriptor blocks

**Detection**: Look for module init signatures, or the distinctive `M2_` / `OB_` runtime function prefixes. Modula-2 modules may use `.mod` or `.m2` file references. Oberon's type-bound procedures use descriptor-based dispatch (function pointer tables at known offsets, but not C++ vtables).

**Decompilation methodology**:

<!-- TODO: Expand — module dependency graph from import/export tables, type-bound procedure dispatch reconstruction, coroutine context-switch analysis, GC root identification (Oberon) -->

---

### FORTH (JForth, HSForth, Yerk)

<!-- TODO: Expand — full FORTH reversing guide -->

FORTH systems on the Amiga typically use **threaded code**. The program is a dictionary of word definitions, each consisting of:

1. A header (link to previous word, name, flags)
2. A code field (pointer to native code for primitives, or to the inner interpreter for colon definitions)
3. A parameter field (list of execution tokens for colon definitions)

**Detection**: Look for the FORTH inner interpreter (the `NEXT` routine). In an indirect-threaded system it typically looks like the following; the register assignment here is illustrative and varies by implementation:

```asm
; NEXT: (IP) → W, advance IP, jump to (W)
NEXT:
    MOVE.L  (A4)+, A5       ; W = next execution token from IP (A4), advance IP
    MOVE.L  (A5), A6        ; load the code field address stored at W
    JMP     (A6)            ; execute it (threaded dispatch)
```

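Additional detection heuristics:

- `LIT` routine: pushes the inline literal that follows it in the thread onto the data stack and advances IP past it.
- `EXIT` routine: pops the saved IP from the return stack (e.g. `MOVE.L (A3)+, A4`), then continues with `NEXT`. It does not `JMP` to the popped value, because IP points into threaded data rather than code.
- Stack and IP registers: their roles vary by implementation.
  - With the `NEXT` above, A4 is IP, and the `EXIT` example assumes A3 is the return-stack pointer.
  - Confirm the data-stack, return-stack, and IP registers from how the primitives use them, rather than assuming a fixed convention.
- Dictionary header format: link field → name length byte + name characters → code field → parameter field.
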
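Because `NEXT` is a short, fixed instruction sequence, you can locate it by scanning for its encoding. The sketch assumes the indirect-threaded shape shown above (`MOVEA.L (IP)+,W` / `MOVEA.L (W),X` / `JMP (X)`), but accepts any address registers.

- A `MOVE.L` to an address register assembles as `MOVEA.L`, so the illustrated routine encodes as `$2A5C $2C55 $4ED6`.
- Direct-threaded or subroutine-threaded systems will not match.
- A `NEXT` interleaved with other instructions (for example, a break-key check) will not match either.

```python
import struct
import sys


def find_itc_next(data: bytes) -> list[tuple[int, int, int, int]]:
    """Find MOVEA.L (Ai)+,Aw / MOVEA.L (Aw),Ax / JMP (Ax) at even offsets.

    Returns (offset, ip_reg, w_reg, x_reg), register numbers 0-7 meaning A0-A7.
    """
    hits = []
    for off in range(0, len(data) - 5, 2):              # 68k code is word-aligned
        w1, w2, w3 = struct.unpack_from(">3H", data, off)
        if (w1 & 0xF1F8) != 0x2058:                      # MOVEA.L (Ai)+,Aw
            continue
        ip_reg, w_reg = w1 & 7, (w1 >> 9) & 7
        if (w2 & 0xF1F8) != 0x2050 or (w2 & 7) != w_reg:  # MOVEA.L (Aw),Ax
            continue
        x_reg = (w2 >> 9) & 7
        if w3 == (0x4ED0 | x_reg):                       # JMP (Ax)
            hits.append((off, ip_reg, w_reg, x_reg))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for off, ip_reg, w_reg, x_reg in find_itc_next(fh.read()):
            print(f"NEXT candidate at file offset ${off:X}: IP=A{ip_reg} W=A{w_reg} X=A{x_reg}")
```

A single hit also tells you which register holds IP, which is the key to decoding `LIT`, `EXIT`, and the branch primitives.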
**Decompilation methodology**:

<!-- TODO: Expand — dictionary walk algorithm, word name extraction, colon definition decompilation (expand execution token list to word names), primitive word catalog, vocabulary/wordlist traversal -->

**Key Challenge**: FORTH code is a data structure, not a call graph. Standard disassemblers see only the inner interpreter loop; the actual program logic is in the dictionary data, which IDA treats as data words, not code. You must write a custom dictionary walker.

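A dictionary walker starts at the newest header and follows link fields back to the oldest word. You have to find the newest header yourself, for example through the variable that holds the latest definition.

The header layout in the sketch is hypothetical. It follows the generic link → name → code field order described above. The name-length mask, flag bits, alignment, and whether links are absolute addresses all differ between dialects, so adjust them before use. The sample image at the end is hand-built.

```python
import struct


def walk_dictionary(image: bytes, latest: int, base: int = 0, max_words: int = 100_000):
    """Yield (offset, name, flags, code_field) from the newest word to the oldest.

    Hypothetical header layout (adjust for the dialect being analysed):
      +0   link        32-bit big-endian address of the previous header, 0 ends the chain
      +4   count       low 5 bits = name length, high 3 bits = flags (e.g. IMMEDIATE)
      +5   name        ASCII characters, then padding to an even address
      ...  code field  32-bit big-endian pointer to native code (or the inner interpreter)
    `base` is the load address that corresponds to offset 0 of `image`.
    """
    seen = set()
    addr = latest
    while addr and len(seen) < max_words:
        off = addr - base
        if off in seen or not 0 <= off <= len(image) - 5:
            break                                   # cycle, or link points outside the image
        seen.add(off)
        link = struct.unpack_from(">I", image, off)[0]
        count = image[off + 4]
        length, flags = count & 0x1F, count >> 5
        name = image[off + 5 : off + 5 + length].decode("latin-1")
        cfa = off + 5 + length
        cfa += cfa & 1                              # pad to an even address
        if cfa + 4 > len(image):
            break
        code_field = struct.unpack_from(">I", image, cfa)[0]
        yield off, name, flags, code_field
        addr = link


# Two hand-built headers: DUP (oldest) at offset 0, SWAP (newest, flag bit set) at 12.
base = 0x10000
image = (struct.pack(">IB", 0, 3) + b"DUP" + struct.pack(">I", 0x1000)
         + struct.pack(">IB", base, 0x84) + b"SWAP" + b"\x00" + struct.pack(">I", 0x2000))
for entry in walk_dictionary(image, latest=base + 12, base=base):
    print(entry)
```

Grouping the resulting `code_field` values is a quick way to classify words. Every colon definition shares the same code field (the inner interpreter's entry), while primitives point to unique native code.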
**FORTH dialects on Amiga**:

<!-- TODO: JForth vs HSForth vs Yerk vs 4th — dictionary format differences, threading model (indirect vs direct vs subroutine), primitive sets -->

---

### ARexx

<!-- TODO: Expand — ARexx script format, tokenized vs source, rexxsyslib.library host interface, ARexx macro extraction from applications -->

**Detection**: ARexx scripts are plain text (`.rexx`) or tokenized by the host application. ARexx requires every program to begin with a comment, so source scripts start with `/*`. The `rexxsyslib.library` provides the interpreter. Scripts are usually separate files rather than part of a HUNK binary, but applications often store macros as resources inside themselves.

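The leading-comment rule makes both cases easy to screen for. The sketch does three things:

- It checks whether a standalone file is ARexx source.
- It lists printable runs starting with `/*` inside a binary, which are candidate embedded macros.
- It pulls host names out of `ADDRESS` statements. This part is a simple heuristic that does not resolve `ADDRESS VALUE` expressions.

The port name `MYAPP.1` used in the test is hypothetical.

```python
import re

# Heuristic: capture the first word after ADDRESS (does not resolve ADDRESS VALUE expr).
ADDRESS_RE = re.compile(rb"\bADDRESS\s+['\"]?([A-Za-z0-9_.]+)", re.IGNORECASE)


def looks_like_arexx(data: bytes) -> bool:
    """ARexx programs must open with a comment, so source files start with '/*'."""
    return data.startswith(b"/*")


def find_embedded_scripts(data: bytes, min_len: int = 16) -> list[tuple[int, bytes]]:
    """Return (offset, text) for printable runs that start with '/*' inside a binary."""
    pattern = rb"/\*[\t\n\r\x20-\x7e]{%d,}" % min_len
    return [(m.start(), m.group()) for m in re.finditer(pattern, data)]


def host_names(script: bytes) -> list[str]:
    """Names following ADDRESS statements, i.e. the host ports the script talks to."""
    return [m.group(1).decode("ascii") for m in ADDRESS_RE.finditer(script)]
```

Expect false positives from C comments in embedded source or help text. Confirm each candidate by looking for REXX keywords such as `SAY`, `PARSE`, or `ADDRESS`.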
**Decompilation methodology**:

<!-- TODO: Detokenization (if tokenized), macro extraction from application resources, ARexx command host identification -->

---

### Pascal (HiSoft Pascal, Maxon Pascal)

<!-- TODO: Expand — HiSoft Pascal compiled output analysis, Maxon Pascal OOP extensions -->

HiSoft Pascal generates native HUNK code with:

- **String descriptors**: Pascal strings are length-prefixed (a length byte followed by the characters)
- **Set operations**: Bit-field operations for Pascal SET types (up to 256 elements)
- **Nested procedures**: A static link (a hidden pointer to the frame of the lexically enclosing procedure) gives access to outer locals. C has no nested procedures, so C binaries never contain this pattern.
- **Runtime checks**: Array bounds, subrange, and NIL pointer checks generate conditional trap code

Maxon Pascal adds OOP extensions with:

- **Object type dispatch**: Not C++ vtables; uses method lookup tables with a different layout
- **Module system**: Similar to Modula-2, with explicit IMPORT/EXPORT

**Detection**: Look for:

- Pascal string descriptor patterns.
- Static-link chains: repeated pointer loads such as `MOVEA.L d(An), An` that walk from the current frame to an enclosing one. The displacement is compiler-specific.
- Runtime check trap sequences.
- Pascal-specific runtime library calls.

After `LINK An`, the longword at `(An)` is the *dynamic* link, i.e. the caller's saved frame pointer. A chain of `MOVE.L (A6), A6` therefore follows the static chain only when each caller is also the lexical parent.

**Key Challenge**: Pascal nested procedures create non-standard call graphs where a procedure has access to its enclosing procedure's local variables via the static link. IDA/Ghidra don't natively understand this; you must trace the static link chain manually.

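The difference between the two links is easiest to see in a small model. This is not HiSoft's actual frame layout; the static-link offset and register use are assumptions.

The model shows why following the dynamic link reads the wrong variable. `Outer` (depth 1) declares `x` at offset -4, and `A` and `B` are both nested inside `Outer`. Because `A` calls `B`, `B`'s caller is a sibling rather than its lexical parent. Each hop in `load_outer` corresponds to one pointer load in the binary.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:
    """Activation record carrying both links a nested-procedure language needs."""
    proc: str
    depth: int                        # lexical nesting depth of the procedure
    static_link: Frame | None         # frame of the lexically enclosing procedure
    dynamic_link: Frame | None        # frame of the caller (what LINK saves)
    slots: dict[int, int] = field(default_factory=dict)  # frame offset -> value


def load_outer(frame: Frame, target_depth: int, offset: int) -> int:
    """Read a variable declared at target_depth: one static-link hop per nesting level.

    In the binary, each hop is one pointer load (e.g. MOVEA.L d(An),An).
    """
    for _ in range(frame.depth - target_depth):
        frame = frame.static_link
    return frame.slots[offset]


# Outer (depth 1) declares x at offset -4; A and B are both nested in Outer; A calls B.
outer = Frame("Outer", 1, None, None, {-4: 42})
a = Frame("A", 2, outer, outer, {-4: 7})
b = Frame("B", 2, outer, a, {})

print(load_outer(b, 1, -4))       # 42: correct, via the static link
print(b.dynamic_link.slots[-4])   # 7: wrong variable, via the dynamic link
```
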
---

### CanDo / AmigaVision

<!-- TODO: Expand — visual programming tools, deck/card metaphor, flow chart bytecode -->

**CanDo**: A hypermedia authoring system using a "deck of cards" metaphor. Programs are stored as "deck" files containing:

- Card descriptors (background, size, color)
- Button/field objects with attached scripts
- Script bytecode (proprietary CanDo scripting language)
- Embedded media (IFF ILBM, 8SVX)

**AmigaVision**: A flowchart-based multimedia authoring tool. Programs are flow charts where nodes are actions (display image, play sound, wait, branch). Stored in a custom format with:

- Flow chart structure (node connections)
- Node type identifiers
- Parameter data per node
- Embedded media references

**Detection**: CanDo deck files have a recognizable header; AmigaVision flow files have a node count + edge table structure. Both reference `cando.library` or the AmigaVision runtime.

**Key Challenge**: These are not traditional programming languages. Reversing them means understanding how the runtime engine interprets the data structure, not disassembling code. The logic lives in the data, as with FORTH, but at a much higher level of abstraction.

---

## Decision Guide: Identifying the Language

```mermaid
graph TD
    BIN["Unknown binary"]
    HUNK{"Standard HUNK<br/>header?"}
    INTERP_STR{"'AMOS' or 'AMOSPro'<br/>string present?"}
    BLITZ_LIB{"blitzlib.library<br/>OpenLibrary call?"}
    E_RUNTIME{"ec.runtime or E_GC<br/>calls present?"}
    FORTH_NEXT{"FORTH NEXT<br/>threaded interpreter?"}
    M2_INIT{"Module init/final<br/>export tables?"}
    PASCAL_STR{"Pascal string<br/>descriptors?"}
    REX_HEADER{"ARexx script<br/>header?"}
    CANDO_HDR{"CanDo deck<br/>magic bytes?"}

    BIN --> HUNK
    HUNK -->|"No"| REX_HEADER
    REX_HEADER -->|"Yes"| AREXX_RESULT["ARexx script<br/>→ detokenize or read as text"]
    REX_HEADER -->|"No"| CANDO_HDR
    CANDO_HDR -->|"Yes"| CANDO_RESULT["CanDo deck<br/>→ card/button extraction"]
    CANDO_HDR -->|"No"| AMOS_CUSTOM["Check for AMOS<br/>tokenized header"]
    HUNK -->|"Yes"| INTERP_STR
    INTERP_STR -->|"Yes"| AMOS_RESULT["AMOS/AMOSPro<br/>→ tokenized bytecode"]
    INTERP_STR -->|"No"| BLITZ_LIB
    BLITZ_LIB -->|"Yes"| BLITZ_RESULT["Blitz Basic 2<br/>→ decompile BlitzLib calls"]
    BLITZ_LIB -->|"No"| E_RUNTIME
    E_RUNTIME -->|"Yes"| E_RESULT["Amiga E<br/>→ ec.runtime analysis"]
    E_RUNTIME -->|"No"| FORTH_NEXT
    FORTH_NEXT -->|"Yes"| FORTH_RESULT["FORTH<br/>→ dictionary walk"]
    FORTH_NEXT -->|"No"| M2_INIT
    M2_INIT -->|"Yes"| M2_RESULT["Modula-2 / Oberon"]
    M2_INIT -->|"No"| PASCAL_STR
    PASCAL_STR -->|"Yes"| PASCAL_RESULT["HiSoft/Maxon Pascal"]
    PASCAL_STR -->|"No"| C_ASM["Likely C or ASM<br/>→ see other guides"]
```

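The byte-level checks in the guide can be automated as a first triage pass. This is a sketch under several assumptions:

- The signature strings are copied from the tables in this document and should be verified against known samples.
- The ARexx test relies on the leading `/*` comment that ARexx requires.
- The FORTH test matches only the illustrative `NEXT` encoding from the FORTH section.
- The Modula-2, Pascal, and CanDo branches need code-level or format-specific checks, so they are reported as "check next" rather than decided.

```python
import struct

HUNK_HEADER = 0x000003F3
# Encoding of the illustrative indirect-threaded NEXT from the FORTH section:
# MOVEA.L (A4)+,A5 / MOVEA.L (A5),A6 / JMP (A6)
NEXT_SEQUENCE = bytes.fromhex("2A5C2C554ED6")


def triage(data: bytes) -> str:
    """Walk the decision guide top to bottom and return the first matching hypothesis."""
    is_hunk = len(data) >= 4 and struct.unpack_from(">I", data)[0] == HUNK_HEADER
    if not is_hunk:
        if data.startswith(b"/*"):
            return "ARexx script"
        return "not HUNK: check for a CanDo deck or AMOS tokenized header"
    if b"AMOS" in data:  # also matches b"AMOSPro"
        return "AMOS/AMOSPro"
    if b"blitzlib.library" in data:
        return "Blitz Basic 2"
    if b"ec.runtime" in data or b"E_GC" in data:
        return "Amiga E"
    if NEXT_SEQUENCE in data:
        return "FORTH (threaded NEXT found)"
    return "native code: check module init/final pairs, Pascal strings, then C/asm"
```
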
---

## Best Practices — Cross-Language RE

<!-- TODO: Numbered list -->

1. **Identify the language before disassembling** — each language has a fundamentally different binary architecture
2. **Don't treat tokenized binaries as native code** — they will confuse every standard tool
3. **For native-code languages, find the runtime library first** — it tells you the language and provides anchor xrefs
4. **FORTH requires custom tooling** — standard disassemblers cannot handle threaded code
5. **BASIC compilers leave recognizable patterns** — library calls, loop structures, and string handling distinguish them from C
6. **Check for embedded media** — AMOS and Blitz binaries often contain IFF chunks (ILBM, 8SVX) that support, but do not prove, a language hypothesis, since C programs embed IFF data too
7. **String format is a strong differentiator** — C strings (null-terminated) vs Pascal strings (length-prefixed) vs E strings (custom format) vs FORTH counted strings
8. **ARexx macros are often plain text** — check the binary's resources for readable script text before disassembling
9. **Mixed-language programs exist** — C core + ARexx scripting + asm hot paths; analyze each section with the appropriate methodology
10. **Build language-specific IDA/Ghidra loaders** — for tokenized/threaded formats, a custom loader that pre-processes the binary saves enormous time

---

## Named Antipatterns

### 1. The C Disassembler Default

**Wrong**: Loading a Blitz Basic or AMOS binary into IDA and treating it like C.

**Why**: In a tokenized AMOS program, the token stream and interpreter loop look like garbled code to a standard disassembler, and you'll waste hours on what is fundamentally not native code. A compiled Blitz program *is* native code, but its runtime-library calls and code-generation idioms follow BASIC semantics, so C-oriented assumptions about calls and strings mislead you.

<!-- TODO: Add bad/good code pair -->

### 2. The Missing Interpreter

**Wrong**: Disassembling only the HUNK code and ignoring the embedded runtime interpreter.

**Why**: AMOS tokenized programs carry a chunk of the interpreter or reference `AMOS.library`. Without understanding the opcode dispatch loop, the token stream is opaque.

<!-- TODO: Add bad/good code pair -->

### 3. The FORTH Data Wall

**Wrong**: Running standard recursive-descent disassembly on a FORTH binary.

**Why**: FORTH dictionaries are data structures, not call graphs. Standard disassembly produces one huge function (the `NEXT` loop) and treats the entire dictionary as data bytes.

<!-- TODO: Add bad/good code pair -->

### 4. The BASIC Loop Assumption

**Wrong**: Seeing `DBRA D0, loop` and assuming it's a C `for` loop.

**Why**: Blitz Basic can compile `For...Next` loops to `DBRA`, but the loop variable semantics differ. `DBRA` decrements only the low word of the register and exits when it reaches -1. The register therefore rarely equals the BASIC loop variable: a counter preloaded with 9 drives a 10-iteration loop. C compilers emit `DBRA` too, so the instruction alone doesn't identify the source language.

<!-- TODO: Add bad/good code pair -->

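The counter arithmetic is easy to get wrong, so it helps to model `DBRA` directly. `DBRA Dn,label` decrements the low word of `Dn` and branches unless the result is -1, so a loop body that ends in `DBRA` runs *preload + 1* times.

The helper `blitz_for_counter` is hypothetical. It assumes a step-1 `For` loop is preloaded with `last - first`, which you should confirm against real Blitz output.

```python
def dbra_iterations(initial_counter: int) -> int:
    """Count how often `body: ... DBRA Dn,body` runs the body for a given preload."""
    counter = initial_counter & 0xFFFF      # DBRA only looks at the low 16 bits
    runs = 0
    while True:
        runs += 1                           # body executes before the DBRA
        counter = (counter - 1) & 0xFFFF    # decrement the low word
        if counter == 0xFFFF:               # reached -1: fall through, loop ends
            return runs


def blitz_for_counter(first: int, last: int) -> int:
    """Hypothetical preload for `For i = first To last` (step 1)."""
    return last - first


print(dbra_iterations(blitz_for_counter(1, 10)))  # 10 iterations while D0 runs 9..0
print(dbra_iterations(0xFFFF))                   # preload -1: 65536 iterations, not 0
```

The second call shows why a "zero-trip" loop needs a guard branch before the `DBRA` body. That guard is one of the places where compiled BASIC and C loops differ visibly.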
### 5. The String Format Switcheroo

**Wrong**: Assuming all strings are null-terminated C strings.

**Why**: Pascal uses length-prefixed strings (1-byte length + chars). Amiga E uses a custom string format with GC metadata. FORTH uses counted strings. Reading past the actual string boundary produces garbage.

<!-- TODO: Add bad/good code pair -->

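A short demonstration shows how quickly the wrong assumption goes wrong. The sample bytes are made up: a 1-byte length prefix and `HELLO`, immediately followed by an unrelated null-terminated string. Pascal strings and FORTH counted strings share the same 1-byte-length layout, so one reader covers both.

```python
def read_c_string(buf: bytes, offset: int) -> bytes:
    """Null-terminated: read until the first zero byte."""
    end = buf.index(b"\x00", offset)
    return buf[offset:end]


def read_counted_string(buf: bytes, offset: int) -> bytes:
    """Length-prefixed (Pascal string / FORTH counted string): first byte is the length."""
    length = buf[offset]
    return buf[offset + 1 : offset + 1 + length]


blob = b"\x05HELLO" + b"WORLD\x00"      # counted "HELLO" followed by other data
print(read_counted_string(blob, 0))      # b'HELLO'
print(read_c_string(blob, 0))            # b'\x05HELLOWORLD': length byte plus neighbour
```

A stray non-printable first byte that equals the visible length is the usual tell that you are looking at a counted string.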
### 6. The Garbage Collector Blindness

**Wrong**: Ignoring `E_GC()` calls as irrelevant runtime noise.

**Why**: Amiga E's garbage collector roots determine which objects survive. Missing a GC root means you misunderstand object lifetimes and may think objects are leaked when they're actually reachable.

<!-- TODO: Add bad/good code pair -->

### 7. The Module Boundary Erasure

**Wrong**: Analyzing Modula-2 or E code without understanding module boundaries.

**Why**: Module `_init`/`_final` pairs and import/export tables define the program's dependency graph. Treating all functions as a flat namespace loses the architectural structure.

<!-- TODO: Add bad/good code pair -->

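Recovering the import graph also recovers initialization order. In Modula-2, the bodies of imported modules run before the body of the module that imports them. The sketch uses Python's `graphlib` (3.9+). The module names and import sets are hypothetical stand-ins for what you would extract from the binary's import/export tables.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical import graph: module -> set of modules it imports.
imports = {
    "Main": {"Screens", "Files"},
    "Screens": {"Memory"},
    "Files": {"Memory"},
    "Memory": set(),
}


def init_order(graph: dict[str, set[str]]) -> list[str]:
    """Order in which module bodies (init code) run: dependencies first."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        # Mutual imports: inspect how the compiler broke the cycle by hand.
        raise ValueError(f"import cycle: {err.args[1]}") from err


order = init_order(imports)
print(order)  # Memory first, Main last; Screens/Files order may vary
```

Comparing this predicted order with the actual sequence of `_init` calls in the startup code is a quick consistency check on the recovered import tables.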
### 8. The ARexx-as-C Mistake

**Wrong**: Loading an ARexx script file into a hex editor and trying to find machine code.

**Why**: ARexx is a scripting language. The file is usually plain text that begins with the `/*` comment every ARexx program requires. Running it through a disassembler produces garbage.

<!-- TODO: Add bad/good code pair -->

---

## Pitfalls

### 1. Mixed-Language Binaries

<!-- TODO: Add worked example -->

Many Amiga applications mix languages: C for the core + ARexx for scripting + assembler for performance-critical routines. A single binary may contain multiple RE challenges requiring different methodologies.

### 2. Custom Token Formats

<!-- TODO: Add worked example -->

Some Blitz Basic variants allow inline assembly, which gets embedded as raw m68k opcodes within the token stream. A pure token decompiler will fail at these boundaries.

### 3. Version-Specific Runtimes

<!-- TODO: Add worked example -->

AMOS 1.3, AMOS Professional, and AMOS Compiler use different token encodings. Blitz Basic 2 and AmiBlitz have different runtime library versions. Always identify the exact language version before decompiling.

### 4. FORTH Dialect Variance

<!-- TODO: JForth, HSForth, Yerk have different dictionary formats, threading models, and primitive sets. A dictionary walker for one may fail on another. -->

### 5. Pascal Static Link Chain Complexity

<!-- TODO: Nested procedures create non-trivial stack frames where A6 chains through enclosing scopes. Missing a static link distorts the entire variable access model. -->

### 6. Amiga E GC Root Misidentification

<!-- TODO: E's garbage collector uses conservative stack scanning. Objects that look unreferenced in static analysis may be held by registers at GC time. -->

### 7. Tokenized BASIC Extension Opcodes

<!-- TODO: Extensions add opcodes to the token set. Without the extension mapping, entire sections of the token stream become unreadable. -->

### 8. ARexx Host Command Context

<!-- TODO: ARexx scripts can send commands to any application. Without knowing the host application, the command set is unknown. The host is identified by the script's "ADDRESS" statement or launch context. -->

### 9. CanDo / AmigaVision Data-Driven Logic

<!-- TODO: These are not code — they're data describing visual layouts and event responses. Standard RE tools don't apply; you need format-specific parsers. -->

### 10. Oberon Type-Bound Procedure vs C++ vtable Confusion

<!-- TODO: Oberon's type-bound procedures use descriptor-based dispatch, not vtables. Applying C++ vtable analysis to Oberon binaries produces incorrect hierarchies. -->

---

## Use-Case Cookbook

### Pattern 1: Identifying AMOS Tokenized Binaries Programmatically

<!-- TODO: Step-by-step — scan for AMOS magic bytes, parse bank header, extract token stream, map opcodes. Python scanner script. -->

### Pattern 2: Walking a FORTH Dictionary

<!-- TODO: Step-by-step — identify link field format, walk from latest definition to oldest, extract word names, classify as primitive/colon/constant/variable. Python dictionary walker script. -->

### Pattern 3: Decompiling Blitz Basic Library Calls to Source Patterns

<!-- TODO: Step-by-step — map BB_ runtime calls to Blitz statements, recognize For/Next DBRA patterns, reconstruct Select/Case from jump tables, recover Type/End Type struct layouts. -->

### Pattern 4: Reconstructing Amiga E Module Dependencies

<!-- TODO: Step-by-step — parse .m export table format, trace IMPORT references, build module dependency graph. Python dependency analyzer. -->

### Pattern 5: Extracting ARexx Macros from Application Binaries

<!-- TODO: Step-by-step — locate macro strings in DATA hunks (BSS hunks carry no bytes on disk), identify ARexx script boundaries, handle tokenized vs source format, map macro names to host commands. -->

### Pattern 6: Decompiling HiSoft Pascal to Source

<!-- TODO: Step-by-step — recover nested procedure scopes from static link chains, map Pascal string descriptors, identify SET operations, reconstruct WITH statement scoping. -->

### Pattern 7: Recovering an AMOS Sprite Bank from a Tokenized Binary

<!-- TODO: Step-by-step — identify bank headers, locate IFF ILBM chunks, extract sprite data, reconstruct AMOS sprite format (hotspot, bitplanes, masks). -->

### Pattern 8: Mapping a FORTH Program's Control Flow

<!-- TODO: Step-by-step — decompile colon definitions to execution token lists, resolve IF/ELSE/THEN from branch tokens, reconstruct BEGIN/UNTIL/WHILE/REPEAT loops, identify DO/LOOP structures. -->

### Pattern 9: Reconstructing an Oberon Type Hierarchy

<!-- TODO: Step-by-step — parse type descriptors, trace type-bound procedure tables, identify type extension records, build inheritance diagram. -->

### Pattern 10: Identifying Language from an Unknown Binary (Blind Triage)

<!-- TODO: Step-by-step — systematic checklist: check magic bytes, scan for library strings, look for interpreter loops, test string formats, score each language hypothesis. -->

---

## Real-World Examples

<!-- TODO: Reference specific Amiga software for each language with documented RE findings -->

### AMOS

<!-- TODO: "Flight of the Amazon Queen" (AMOS), "Valhalla" series, shareware AMOS games — token extraction, bank recovery. -->

### Blitz Basic 2

<!-- TODO: "Skidmarks" / "Super Skidmarks" (Blitz), "Worms" (Blitz), "Gloom" (Blitz + asm) — BlitzLib call mapping. -->

### Amiga E

<!-- TODO: Amiga E compiler itself (self-hosting), "PortablE" cross-platform E, Amiga E games and utilities. -->

### FORTH

<!-- TODO: JForth development environment, HSForth applications, Yerk-based embedded systems. -->

### ARexx

<!-- TODO: ARexx scripts bundled with MUI applications, AmigaGuide ARexx integration, Directory Opus ARexx command set. -->

### Modula-2 / Oberon

<!-- TODO: ETH Oberon system on Amiga, Modula-2 shareware, AmigaOberon applications. -->

---

## Cross-Platform Comparison
|
||||
|
||||
| Platform | Equivalent Language | Amiga Parallel |
|
||||
|---|---|---|
|
||||
| **C64** | Simons' BASIC, COMAL | AMOS / Blitz Basic — tokenized BASIC with extended graphics |
|
||||
| **Atari ST** | ST BASIC, GFA BASIC | Blitz Basic 2 (similar compiled BASIC approach) |
|
||||
| **DOS** | QBasic, Turbo Basic | AMOS (tokenized), Blitz (compiled) |
|
||||
| **Mac OS** | HyperCard, FutureBASIC | AMOS (similar ease-of-use + graphics focus), CanDo (HyperCard analog) |
|
||||
| **Acorn Archimedes** | BBC BASIC V (ARM) | Blitz Basic 2 (fast compiled BASIC with inline asm) |
|
||||
| **Apple IIGS** | ORCA/Pascal, TML Pascal | HiSoft/Maxon Pascal (Wirth-family languages on 16-bit) |
|
||||
| **NeXT** | Objective-C | Amiga E (fast, modular, object-oriented with message-passing) |
|
||||
| **Windows 3.1** | Visual Basic, ToolBook | CanDo / AmigaVision (visual programming with scripting) |
|
||||
|
||||
---
|
||||
|
||||
## Historical Context — Why So Many Languages on Amiga?
|
||||
|
||||
The Amiga's open architecture and lack of a "blessed" development language created a uniquely diverse programming ecosystem:
|
||||
|
||||
| Factor | Effect |
|
||||
|---|---|
|
||||
| **No official language** | Unlike Mac (Object Pascal), DOS (Turbo C/QuickBasic), or ST (GFA BASIC), Commodore didn't push a specific development tool. C, assembly, BASIC, and others coexisted as equals. |
|
||||
| **Beginner accessibility** | AMOS (1990) and Blitz Basic 2 (1991) filled the gap for users who found C intimidating. AMOS alone produced hundreds of shareware games. |
|
||||
| **FORTH migration** | FORTH programmers carried their development culture from 8-bit machines (C64, Spectrum) to Amiga — JForth and HSForth were mature systems. |
|
||||
| **ARexx as system glue** | ARexx (1988) provided system-wide scripting that no other platform matched until AppleScript (1993). Any application could expose an ARexx port. |
|
||||
| **Educational influence** | Modula-2 and Oberon (from ETH Zürich, Niklaus Wirth) brought structured programming to Amiga. HiSoft Pascal ported the Mac's dominant educational language. |
|
||||
| **Multimedia authoring** | CanDo and AmigaVision made application creation accessible to non-programmers — precursors to tools like HyperCard and modern no-code platforms. |
|
||||
| **Late C++ arrival** | Without early C++ compilers, E (1993) filled the niche for a modern, object-oriented compiled language — it was what C++ should have been for Amiga. |
|
||||
|
||||
---
|
||||
|
||||
## Modern Analogies
|
||||
|
||||
<!-- TODO: Expand -->
|
||||
|
||||
| Amiga Language Concept | Modern Analogy | Where It Holds / Breaks |
|
||||
|---|---|---|
|
||||
| AMOS tokenized bytecode | Python `.pyc` bytecode | Holds: interpreted bytecode with embedded media; breaks: AMOS bytecode is undocumented, Python's is open |
|
||||
| Blitz Basic compiled output | Go (compiled, fast, runtime-linked) | Holds: compiled native code with runtime library; breaks: Blitz is tied to one platform |
|
||||
| Amiga E GC + modules | Go / D (GC + fast compilation) | Holds: modern compiled language with GC; breaks: E is single-threaded |
|
||||
| FORTH threaded code | WASM stack machine | Holds: stack-based execution model; breaks: FORTH is memory-mapped, WASM is sandboxed |
|
||||
| ARexx system scripting | AppleScript / VBA | Holds: system-wide IPC scripting; breaks: ARexx is string-based, AppleScript is object-based |
|
||||
| Oberon type-bound procedures | Go interfaces | Holds: non-inheritance-based polymorphism; breaks: Oberon uses explicit descriptors, Go uses implicit satisfaction |
|
||||
| CanDo hypermedia | HyperCard / Powerpoint VBA | Holds: card-based visual programming; breaks: CanDo is a standalone runtime |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### Q1: Which language should I learn to reverse Amiga software most effectively?
|
||||
|
||||
<!-- TODO -->
|
||||
|
||||
### Q2: Is there an automatic decompiler for AMOS or Blitz Basic?
|
||||
|
||||
<!-- TODO -->
|
||||
|
||||
### Q3: How do I tell if a binary is AMOS or Blitz without running it?
|
||||
|
||||
<!-- TODO: String scan for interpreter/library references, check for IFF chunks (AMOS) vs HUNK headers (Blitz), look for token stream vs native code in the first 1KB. -->
|
||||
|
||||
### Q4: Can IDA Pro decompile FORTH binaries?
|
||||
|
||||
<!-- TODO: No, not natively. You need a custom loader that walks the dictionary and creates named functions/words. Python scripts available. -->
|
||||
|
||||
### Q5: How do I extract embedded media from an AMOS program?
|
||||
|
||||
<!-- TODO: IFF chunk scanner, bank header parser, ILBM/8SVX extraction. Python extraction tool. -->
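
Until the full extraction tool is written, a minimal IFF carver can already pull ILBM pictures and 8SVX samples out of a binary. This is a sketch that assumes each embedded chunk is a well-formed IFF `FORM` (the ID `FORM`, a four-byte big-endian length, then a four-character form type). It deliberately ignores any AMOS bank header that may surround the chunk. `find_iff_forms` and `carve_iff_forms` are hypothetical helper names.

```python
import struct

def find_iff_forms(data: bytes) -> list:
    """Return (offset, form_type, size) for each plausible top-level IFF FORM."""
    results = []
    pos = data.find(b"FORM")
    while pos != -1:
        resume = pos + 4
        if pos + 12 <= len(data):
            (length,) = struct.unpack_from(">I", data, pos + 4)  # big-endian length
            form_type = data[pos + 8:pos + 12]
            end = pos + 8 + length
            if length >= 4 and end <= len(data) and all(0x20 <= c <= 0x7E for c in form_type):
                results.append((pos, form_type.decode("ascii"), 8 + length))
                resume = end + (length & 1)  # IFF chunks are padded to an even length
        pos = data.find(b"FORM", resume)
    return results

def carve_iff_forms(data: bytes, prefix: str = "extracted") -> None:
    """Write each FORM found to <prefix>_NNN.<type>, e.g. extracted_000.ilbm."""
    for i, (offset, form_type, size) in enumerate(find_iff_forms(data)):
        with open(f"{prefix}_{i:03d}.{form_type.strip().lower()}", "wb") as out:
            out.write(data[offset:offset + size])
```

Because the scan resumes after the end of each accepted FORM, nested FORMs (for example, the ILBM frames inside a FORM ANIM) are carved as part of their parent rather than as separate files.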

### Q6: What's the difference between JForth and HSForth binary formats?

<!-- TODO: Dictionary header layout, threading model (indirect vs direct threaded), primitive word set, stack register conventions. -->

### Q7: How do I handle Amiga E binaries that mix E and C code?

<!-- TODO: Module boundary identification, calling convention differences, E_GC safe-point recognition in mixed code. -->

### Q8: Can ARexx macros be tokenized? How do I detokenize them?

<!-- TODO: Yes, by the host application. Detokenization requires the host's token table. Some hosts use standard rexxsyslib tokenization. -->

### Q9: How do I distinguish between Modula-2 and Oberon binaries?

<!-- TODO: Type-bound procedure dispatch (Oberon) vs module-only structure (Modula-2), GC safe-points (Oberon), coroutine patterns (Modula-2). -->

### Q10: Is there a Ghidra plugin for any of these languages?

<!-- TODO: Current state of Ghidra support for non-C Amiga languages. Most require custom scripts. -->

### Q11: How do I decompile a Blitz Basic program that uses inline assembly?

<!-- TODO: Inline asm boundaries are marked by BB library calls before/after; the asm block is pure m68k — treat that section with asm RE methodology. -->

### Q12: What are the most common non-C languages found in Amiga games?

<!-- TODO: Blitz Basic 2 (most common), AMOS (second), E (rare), FORTH (embedded systems), custom bytecode engines (common in adventure games). -->

### Q13: How do I reverse engineer a CanDo deck file?

<!-- TODO: Deck header format, card descriptor structure, button/field records, script bytecode mapping, embedded media extraction. -->

### Q14: What tools exist for batch language identification of unknown Amiga binaries?

<!-- TODO: Custom Python scanner that checks: HUNK header, library strings, interpreter patterns, IFF chunks, FORTH dictionary, Pascal descriptors — produces a language confidence score. -->

---

## FPGA / Emulation Impact

<!-- TODO: Expand — tokenized BASIC interpreters are timing-independent (no custom chip banging); compiled BASIC (Blitz) may hit hardware registers; FORTH execution speed depends on threading model (indirect threaded is slower); Amiga E GC pauses may cause timing issues in real-time code. -->

---

## References

- [asm68k_binaries.md](asm68k_binaries.md) — Hand-written assembly reverse engineering
- [ansi_c_reversing.md](ansi_c_reversing.md) — C binary reverse engineering
- [cpp_vtables_reversing.md](cpp_vtables_reversing.md) — C++ OOP reverse engineering
- [compiler_fingerprints.md](../compiler_fingerprints.md) — Compiler identification
- [m68k_codegen_patterns.md](m68k_codegen_patterns.md) — Code generation patterns
- [api_call_identification.md](api_call_identification.md) — Library call recognition
- [hunk_reconstruction.md](hunk_reconstruction.md) — HUNK binary reconstruction
- [rexxsyslib.md](../../11_libraries/rexxsyslib.md) — ARexx library internals
- *AMOS Professional Manual* — François Lionet, Europress Software
- *Blitz Basic 2 Manual* — Mark Sibly, Acid Software
- *Amiga E Manual* — Wouter van Oortmerssen
- *JForth Manual* — Delta Research
- *HiSoft Pascal Manual* — HiSoft
- *CanDo User Manual* — INOVAtronics
- [Amiga E Compiler Source](https://github.com/Amiga-E/ec) — Open-source `ec` compiler
163
05_reversing/unpacking_and_decrunching.md
Normal file
@ -0,0 +1,163 @@
# Executable Unpacking — Decrunching and Memory Extraction

> [!NOTE]
> This article details how to reverse-engineer compressed Amiga binaries by extracting the underlying code from memory. See [Reverse Engineering](README.md) for the broader static analysis methodology.

## Overview

In the 1980s and 90s, disk space and RAM were brutally constrained. Commercial software, cracktros, and demos rarely shipped as plain AmigaOS HUNK executables. Instead, they were heavily compressed ("crunched") to fit on an 880 KB floppy disk and to load faster from slow media.

Executable compression solved this problem. A packer (such as ByteKiller, Imploder, or the more recent Shrinkler) compresses the original code into a payload, wraps it in a tiny "decruncher stub", and outputs a new executable. When run, the stub allocates memory, decompresses the payload, and jumps to the Original Entry Point (OEP) of the software.

For reverse engineers, this is an obstacle. Static analysis tools like IDA Pro cannot meaningfully disassemble compressed data; they see only the decruncher stub followed by a block of high-entropy bytes. To analyze the actual software, you must unpack it. You can either use an automated tool that recognizes the compression format, or manually intercept execution immediately after the decruncher finishes and dump the decompressed code from memory to disk.

---

## The Decruncher Architecture

Executable compression typically follows a standard two-phase lifecycle during execution.

```mermaid
flowchart TD
    A[Disk: Packed Executable] --> B[OS Loader: LoadSeg]
    B --> C[Memory: Decruncher Stub + Payload]
    C -->|1. Execution starts at Stub| D[Stub: Allocates Temp Memory]
    D --> E[Stub: Decompresses Payload]
    E --> F[Memory: Decrunched Original Code]
    F -->|2. Stub Jumps to OEP| G[Execution of Original Software]

    style A fill:#f9f9f9,stroke:#333,stroke-width:2px
    style C fill:#e8f4fd,stroke:#2196f3,stroke-width:2px
    style G fill:#fff9c4,stroke:#f9a825,stroke-width:2px
```

1. **Load Phase**: AmigaOS's `LoadSeg()` loads the executable. Because the file is packed, only the decruncher stub and the packed data payload are loaded into memory.
2. **Execution Phase**: The OS jumps to the first instruction of the decruncher. The decruncher allocates the memory it needs (often via `AllocMem()`), unpacks the data in a tight loop, and then executes a `JMP` to the Original Entry Point.

---

## Automated Unpacking Tools

Before attempting manual unpacking, always try an automated tool first. If the software was compressed with a standard, recognized packer, these tools can extract the original binary for you.

### xfdmaster.library

The standard Amiga solution for automated decrunching is `xfdmaster.library`. It provides a unified API for identifying and unpacking dozens of historical compression formats.

- **When to use**: Standard packers like PowerPacker, Imploder, or ByteKiller.
- **Tools**: Command-line utilities like `xfdDecrunch` (available on Aminet) use this library to unpack files directly on the Amiga or in an emulator.

### Unpacker.library

An alternative to `xfdmaster`, `Unpacker.library` provides similar functionality for recognizing and extracting packed executables.

---

## Manual Unpacking Methodology

When dealing with a custom packer, an unknown format, or heavily obfuscated code (e.g., custom cracktros or copy protection), automated tools will fail. You must unpack the code manually: let the Amiga do the decompression, then freeze it at the exact moment decompression finishes.

> [!IMPORTANT]
> The Motorola 68000 is a **Big-Endian** architecture. When inspecting memory dumps in a hex editor, remember that longwords are stored most-significant-byte first.
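
To see why byte order matters when scripting against dumps, here is the first longword of any HUNK executable decoded both ways with Python's standard `struct` module:

```python
import struct

raw = bytes.fromhex("000003F3")       # first four bytes of a HUNK executable

big = struct.unpack(">I", raw)[0]     # 68000 order: most significant byte first
little = struct.unpack("<I", raw)[0]  # x86 order: wrong for Amiga data

print(f"big-endian:    {big:#010x} ({big})")  # 0x000003f3 (1011) = HUNK_HEADER
print(f"little-endian: {little:#010x}")       # 0xf3030000, meaningless here
```

Always pass `>` (big-endian) format codes when reading 68k words and longwords from a dump on a PC host.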

### Step 1: Locate the Jump to OEP

The goal is to find the exact instruction where the decruncher yields control to the original software. This is almost always a `JMP (An)` or a `JSR` at the very end of the decruncher loop.

1. Open the packed executable in a disassembler (e.g., IDA Pro, ReSource, or IRA).
2. Look for the decompression loop: a tight loop doing continuous memory writes and bit-shifting.
3. Immediately following the loop, find the jump instruction transferring control.

```asm
; Typical Decruncher End
    move.l  (a7)+,d0        ; Restore registers
    movea.l (a7)+,a0
    jmp     (a0)            ; Jump to Original Entry Point (OEP)!
```
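
Finding candidate jump sites can be partly automated. The sketch below is a hypothetical helper, not part of any existing tool. It scans a code buffer for the `JMP (An)` and `JSR (An)` opcode words (`$4ED0` to `$4ED7` and `$4E90` to `$4E97`). In a decruncher stub, a hit shortly after the decompression loop is the most likely OEP jump. Every hit still needs confirming in the disassembler, because data bytes can contain the same bit patterns.

```python
import struct

def find_indirect_jumps(code: bytes, base: int = 0) -> list:
    """Return (address, mnemonic) for every JMP (An) / JSR (An) opcode word."""
    hits = []
    # 68000 instructions are word-aligned, so only even offsets are checked.
    for off in range(0, len(code) - 1, 2):
        (word,) = struct.unpack_from(">H", code, off)
        reg = word & 7                     # low 3 bits select A0-A7
        if (word & 0xFFF8) == 0x4ED0:      # 0100 1110 11 010 rrr = JMP (An)
            hits.append((base + off, f"jmp (a{reg})"))
        elif (word & 0xFFF8) == 0x4E90:    # 0100 1110 10 010 rrr = JSR (An)
            hits.append((base + off, f"jsr (a{reg})"))
    return hits
```

Applied to the three instructions above, assembled as `201F 205F 4ED0` and loaded at `$20000`, it reports a single `jmp (a0)` at `$20004`.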

### Step 2: Intercept Execution

You must run the executable, but prevent it from executing the original software so you can dump the memory safely.

**Using an Assembler/Debugger (e.g., AsmOne or Seka):**

1. Load the executable into memory using the assembler.
2. Place a breakpoint precisely on the `JMP (An)` instruction identified in Step 1.
3. Run the code. The decruncher will execute, decompress the payload into RAM, and then halt just before jumping to it.

### Step 3: Dump to Disk

With execution halted, the payload sits in RAM in its decompressed form (raw 68k machine code).

1. Identify the memory range of the decrunched payload. This is usually tracked by the address registers used in the decompression loop (e.g., the destination pointer).
2. Use the assembler/debugger's save command to write that memory range directly to a file on disk.

```
; Save $20000-$35000 to RAM:unpacked.bin (illustrative; exact syntax varies by assembler/monitor)
> S "RAM:unpacked.bin" $20000 $35000
```

### Step 4: Reassembly and Analysis

The resulting `unpacked.bin` is a raw memory dump. It is no longer an AmigaOS HUNK executable: the OS loader consumed the HUNK headers of the packed file, and the decruncher wrote only bare code and data into memory.

To analyze it:

1. Load the `.bin` file into IDA Pro as a raw binary blob.
2. Set the base address to match where it was loaded in Amiga memory (if the code is not position-independent).
3. Begin static analysis at the OEP.
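
When working with the dump in scripts rather than in IDA, every Amiga address has to be translated to a file offset using the same base address. The following is a minimal wrapper. It assumes the dump was saved starting at the address you pass as `base` (`$20000` in the Step 3 example); the class name `RawDump` is illustrative.

```python
import struct

class RawDump:
    """A raw memory dump plus the Amiga address its first byte came from."""

    def __init__(self, data: bytes, base: int):
        self.data = data
        self.base = base

    def offset(self, addr: int) -> int:
        """Translate an Amiga address into an offset within the dump."""
        off = addr - self.base
        if not 0 <= off < len(self.data):
            raise ValueError(f"address {addr:#x} is outside the dump")
        return off

    def word(self, addr: int) -> int:
        """Big-endian 16-bit word at addr (one 68k opcode word)."""
        return struct.unpack_from(">H", self.data, self.offset(addr))[0]

    def long(self, addr: int) -> int:
        """Big-endian 32-bit longword at addr (pointers, absolute operands)."""
        return struct.unpack_from(">I", self.data, self.offset(addr))[0]
```

For example, `RawDump(open("unpacked.bin", "rb").read(), 0x20000).word(oep)` reads the first opcode word at the OEP, where `oep` is the address found in Step 1.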

---

## Pitfalls & Common Mistakes

### 1. In-Place Decrunching Corruption

Some decrunchers save memory by unpacking the data "in-place", overwriting the packed data with the unpacked data as they go.

**The Problem:** The unpacked data is normally larger than the packed data. A write pointer that starts at the same address as the read pointer and moves forward will therefore eventually overtake it, corrupting packed data that has not been read yet.

**Why it fails:** Safe in-place decrunchers avoid this by arranging the buffers so the write pointer can never pass the read pointer, typically by unpacking backwards from the *end* of the buffer (often with a small safety margin). Manual interference or memory layout shifts can break this delicate balance.

```c
#include <stddef.h>
/* BAD: forward in-place decrunch of a toy RLE stream of [count][byte]
 * pairs. Packed data starts at buf and output is written from buf up. */
void decrunch(unsigned char *buf, size_t packed_len) {
    const unsigned char *src = buf, *end = buf + packed_len;
    unsigned char *dest = buf;
    while (src < end) {
        unsigned char count = *src++;
        unsigned char value = *src++;
        while (count--)
            *dest++ = value; /* dest overtakes src: unread pairs get overwritten! */
    }
}
```

```c
#include <stddef.h>
/* GOOD: backward in-place decrunch of the same stream. Packed data sits at
 * the start of buf; output is written down from buf + unpacked_len, so dest
 * never drops below src as long as every pair expands to at least 2 bytes. */
void decrunch_safe(unsigned char *buf, size_t packed_len, size_t unpacked_len) {
    const unsigned char *src = buf + packed_len;   /* read backwards  */
    unsigned char *dest = buf + unpacked_len;      /* write backwards */
    while (src > buf) {
        unsigned char value = *--src;   /* pairs read in reverse: byte, then count */
        unsigned char count = *--src;
        while (count--)
            *--dest = value;
    }
}
```

### 2. Assuming Position Independence

A common mistake is dumping the raw memory and then assuming it can simply be wrapped back into a HUNK header and executed anywhere.

**Why it fails:** AmigaOS HUNK executables rely on the OS loader (or, for a crunched program, the decruncher, which performs the same fix-ups on the payload) to patch absolute memory addresses (`HUNK_RELOC32`) based on where the program is loaded in RAM. When you dump the memory, the relocations have *already been applied* for that specific address. If you run the raw dump at a different address, every absolute reference (a `JSR` or `MOVE.L` with an absolute address, or a pointer stored in data) points at the wrong memory, which will usually crash the machine.

**The Fix:** You must manually reconstruct the relocation table, or ensure the code is strictly position-independent (PC-relative), which is rare for large C/C++ applications.
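
One way to reconstruct the table is to decrunch the same file twice at two different load addresses (for example, by occupying some memory before the second run), dump both runs, and compare them. Any longword that differs by exactly the difference between the two base addresses is very likely a relocated pointer. The heuristic sketch below makes several assumptions. Both dumps must capture the same program state and be the same length. Relocated longwords must sit at even offsets. Nothing else (timers, counters, self-modifying code) may have changed between the runs. `recover_relocs` is a hypothetical helper.

```python
import struct

def recover_relocs(dump_a: bytes, base_a: int, dump_b: bytes, base_b: int) -> list:
    """Heuristic: offsets of longwords that moved by exactly base_b - base_a."""
    if len(dump_a) != len(dump_b):
        raise ValueError("dumps must be the same length")
    delta = (base_b - base_a) & 0xFFFFFFFF
    if delta == 0:
        raise ValueError("the two dumps must come from different load addresses")
    relocs = []
    off = 0
    while off + 4 <= len(dump_a):
        (a,) = struct.unpack_from(">I", dump_a, off)
        (b,) = struct.unpack_from(">I", dump_b, off)
        if (b - a) & 0xFFFFFFFF == delta:
            relocs.append(off)
            off += 4  # relocated longwords do not overlap; skip the whole one
        else:
            off += 2  # 68000 longword accesses are word-aligned
    return relocs
```

Each returned offset is a candidate `HUNK_RELOC32` entry for a single-hunk rebuild, and the relocated value minus `base_a` gives the target offset inside the dump. Programs that the decruncher spread over several memory blocks need the same comparison per block.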

---

## Summary Best Practices

1. **Always try automated tools first** — `xfdmaster.library` can save hours of manual reverse engineering.
2. **Use a native debugger** — Tools like AsmOne running in an emulator are perfectly suited for intercepting execution.
3. **Beware of relocations** — Raw memory dumps are locked to the address they were unpacked at; do not assume they are relocatable.
4. **Identify the OEP precisely** — The `JMP (An)` at the end of the decruncher is the key to successful extraction.

---

## References

- [Reverse Engineering Methodology](methodology.md)
- [Code vs Data Disambiguation](static/code_vs_data_disambiguation.md)
- Aminet: `util/pack/xfdmaster`

@ -16,6 +16,7 @@ Shared libraries beyond the core exec/dos/graphics/intuition subsystems. These p
| [locale.md](locale.md) | Internationalization: catalog system (.cd/.ct files), locale-aware date/number formatting, character classification |
| [keymap.md](keymap.md) | Keyboard mapping: raw-to-ASCII translation, KeyMap structure, dead keys, rawkey codes, national layouts |
| [rexxsyslib.md](rexxsyslib.md) | ARexx scripting: hosting ARexx ports, command parsing, sending commands, return codes |
| [arexx_integration.md](arexx_integration.md) | ARexx integration guide: exposing app features, dispatch tables, antipatterns, cookbook |
| [mathffp.md](mathffp.md) | Motorola FFP and IEEE 754 floating point |
| [layers.md](layers.md) | Window clipping: ClipRect engine, Simple/Smart/Super refresh, damage repair, backfill hooks, layer locking |
| [diskfont.md](diskfont.md) | **Bitmap fonts deep dive: .font file format (FontContentsHeader), font descriptor files (DiskFontHeader), glyph bitmap layout, FONTS: assign, adding/installing fonts, bitmap vs TrueType/OpenType comparison, color fonts (OS 3.0+), Compugraphic outline fonts, AvailFonts enumeration, font loading pipeline** |

1279
11_libraries/arexx_integration.md
Normal file
File diff suppressed because it is too large

@ -173,4 +173,5 @@ QUIT

- NDK39: `rexx/storage.h`, `rexx/rxslib.h`
- ADCD 2.1: rexxsyslib.library autodocs
- [arexx_integration.md](arexx_integration.md) — Complete ARexx integration guide: dispatch tables, antipatterns, use-case cookbook
- See also: [process_management.md](../07_dos/process_management.md) — process/task message ports

24
README.md
@ -119,8 +119,21 @@ The Amiga's documentation was scattered across out-of-print manuals, Usenet post
|---|---|
| [methodology.md](05_reversing/methodology.md) | General Amiga RE workflow |
| [ida_setup.md](05_reversing/ida_setup.md) | IDA Pro setup for Amiga binaries |
| [ghidra_setup.md](05_reversing/ghidra_setup.md) | Ghidra setup & decompilation |
| [compiler_fingerprints.md](05_reversing/compiler_fingerprints.md) | SAS/C vs GCC vs VBCC codegen patterns |
| [patching_techniques.md](05_reversing/patching_techniques.md) | Binary patching strategies |
| [unpacking_and_decrunching.md](05_reversing/unpacking_and_decrunching.md) | Executable unpacking, decruncher architecture, and manual extraction |
| [case_studies/ramdrive_device.md](05_reversing/case_studies/ramdrive_device.md) | Case Study: ramdrive.device RE walkthrough |

| Per-Compiler RE Field Manuals | Topic |
|---|---|
| [compilers/sasc.md](05_reversing/static/compilers/sasc.md) | SAS/C 5.x/6.x: LINK A5 + 9-reg save, absolute strings, `_LibBase` globals |
| [compilers/gcc.md](05_reversing/static/compilers/gcc.md) | GCC 2.95.x: `.text` hunk, A6 frame pointer, PC-relative strings, `__CTOR_LIST__` |
| [compilers/vbcc.md](05_reversing/static/compilers/vbcc.md) | VBCC: No frame pointer, per-function saves, `__reg()`, `__MERGED` hunks |
| [compilers/stormc.md](05_reversing/static/compilers/stormc.md) | StormC / StormC++: SAS/C-compatible C, unique C++ ABI, PPC support |
| [compilers/aztec_c.md](05_reversing/static/compilers/aztec_c.md) | Manx Aztec C: D3-D7 save only (5 regs), D2 scratch, pre-1990 era |
| [compilers/lattice_c.md](05_reversing/static/compilers/lattice_c.md) | Lattice C 3.x/4.x: SAS/C predecessor, simpler optimizer, 6-reg save |
| [compilers/dice_c.md](05_reversing/static/compilers/dice_c.md) | DICE C: No frame pointer, `_mainCRTStartup`, fast compile speed |

| Static Analysis | Topic |
|---|---|

@ -130,6 +143,7 @@ The Amiga's documentation was scattered across out-of-print manuals, Usenet post
| [m68k_codegen_patterns.md](05_reversing/static/m68k_codegen_patterns.md) | Compiler-specific assembly idioms |
| [string_xref_analysis.md](05_reversing/static/string_xref_analysis.md) | String cross-reference hunting |
| [struct_recovery.md](05_reversing/static/struct_recovery.md) | Recovering C structures from assembly |
| [code_vs_data_disambiguation.md](05_reversing/static/code_vs_data_disambiguation.md) | Distinguishing code bytes from data/variables — Amiga-specific failure modes |

| Dynamic Analysis | Topic |
|---|---|

@ -138,6 +152,15 @@ The Amiga's documentation was scattered across out-of-print manuals, Usenet post
| [live_memory_probing.md](05_reversing/dynamic/live_memory_probing.md) | Live memory inspection |
| [serial_debug.md](05_reversing/dynamic/serial_debug.md) | Serial debug output |

| Language-Specific Analysis | Topic |
|---|---|
| [asm68k_binaries.md](05_reversing/static/asm68k_binaries.md) | ⚠️ Hand-written assembly reversing — demos, games, bootblocks |
| [ansi_c_reversing.md](05_reversing/static/ansi_c_reversing.md) | ⚠️ ANSI C reversing — struct recovery, control flow, library anchoring |
| [cpp_vtables_reversing.md](05_reversing/static/cpp_vtables_reversing.md) | ⚠️ C++ OOP reversing — vtables, inheritance, RTTI, name mangling |
| [other_languages.md](05_reversing/static/other_languages.md) | ⚠️ Non-C languages — AMOS, Blitz Basic, Amiga E, Modula-2, FORTH |

| Case Studies | Topic |
|---|---|
| [ramdrive_device.md](05_reversing/case_studies/ramdrive_device.md) | RAM disk device driver RE |

### 06 — Exec Kernel (OS 3.1/3.2)

@ -228,6 +251,7 @@ The Amiga's documentation was scattered across out-of-print manuals, Usenet post
| [locale.md](11_libraries/locale.md) | Internationalization, catalogs |
| [keymap.md](11_libraries/keymap.md) | Keyboard mapping, MapRawKey |
| [rexxsyslib.md](11_libraries/rexxsyslib.md) | ARexx interface |
| [arexx_integration.md](11_libraries/arexx_integration.md) | ARexx integration guide: exposing app features, dispatch, cookbook |
| [mathffp.md](11_libraries/mathffp.md) | Floating point libraries, FFP, IEEE |
| [layers.md](11_libraries/layers.md) | Window clipping layers |
| [diskfont.md](11_libraries/diskfont.md) | **Bitmap fonts: .font file format, FontContentsHeader, glyph bitmap layout, FONTS: assign, adding fonts, bitmap vs TrueType, Compugraphic outline fonts** |

@ -130,7 +130,7 @@ Articles were scored against [AGENTS.md](../amiga/AGENTS.md) "Deep" criteria:
| Article | Lines | Status | Notes |
|---|---|---|---|
| `exe_crunchers.md` | 617 | ✅ Deep | PowerPacker/Imploder/Shrinkler, decrunch stubs — exemplary article |
| `hunk_format.md` | 536 | ✅ Deep | Complete HUNK binary specification, all 22 hunk type codes |
| `hunk_format.md` | 583 | ✅ Deep | Complete HUNK binary specification, all 22 hunk type codes, debug format tags, bit masking |
| `hunk_relocation.md` | 326 | ✅ Adequate | Relocation mechanics with visual before/after diagrams |
| `overlay_system.md` | 311 | ✅ Adequate | HUNK_OVERLAY tree architecture, runtime manager |
| `exe_load_pipeline.md` | 276 | ✅ Adequate | LoadSeg → relocation → segment chain → CreateProc |

@ -161,17 +161,33 @@ Articles were scored against [AGENTS.md](../amiga/AGENTS.md) "Deep" criteria:
| `methodology.md` | 217 | ✅ Deep | Step-by-step RE workflow: static → dynamic → reconstruction |
| `ida_setup.md` | 190 | ✅ Adequate | IDA Pro config for 68k/Amiga: loaders, plugins |
| `compiler_fingerprints.md` | 183 | ✅ Adequate | Compiler identification by code patterns |
| `static/api_call_identification.md` | 296 | ⚠️ Thin | API call pattern recognition — needs expansion |
| `static/m68k_codegen_patterns.md` | 156 | ⚠️ Thin | 68k code generation idiom catalog — needs expansion |
| `static/library_jmp_table.md` | 142 | ⚠️ Thin | Library LVO table identification techniques — needs expansion |
| `static/struct_recovery.md` | 137 | ⚠️ Thin | Struct layout reconstruction from disassembly — needs expansion |
| `dynamic/live_memory_probing.md` | 132 | ⚠️ Thin | Runtime memory inspection techniques — needs expansion |
| `dynamic/setfunction_patching.md` | 130 | ⚠️ Thin | Dynamic SetFunction interception — needs expansion |
| `static/api_call_identification.md` | 532 | ✅ Deep | API call pattern recognition — complete with Mermaid, decision guide, antipatterns, cookbook |
| `static/m68k_codegen_patterns.md` | 399 | ✅ Deep | 68k code generation idiom catalog — complete with StormC/Aztec, Mermaid flowchart, cookbook |
| `static/library_jmp_table.md` | 381 | ✅ Deep | Library LVO table identification techniques — complete with third-party reconstruction, Python scripts |
| `static/struct_recovery.md` | 278 | ✅ Deep | Struct layout reconstruction from disassembly — complete with IDA Python batch annotator |
| `dynamic/live_memory_probing.md` | 263 | ✅ Deep | Runtime memory inspection techniques — complete with Mermaid, safe probing rules, cookbook |
| `dynamic/setfunction_patching.md` | 323 | ✅ Deep | Dynamic SetFunction interception — complete with before/after Mermaid, trampoline patterns, cookbook |
| `case_studies/ramdrive_device.md` | 129 | ✅ Deep | Real-world RE walkthrough: RAMDrive reverse engineering |
| `static/hunk_reconstruction.md` | 123 | ⚠️ Thin | HUNK binary reconstruction from memory — needs expansion |
| `static/string_xref_analysis.md` | 123 | ⚠️ Thin | String cross-reference analysis — needs expansion |
| `dynamic/serial_debug.md` | 118 | ⚠️ Thin | Serial debug output techniques — needs expansion |
| `dynamic/enforcer_mungwall.md` | 110 | ⚠️ Thin | Enforcer/MungWall runtime error detection — needs expansion |
| `static/hunk_reconstruction.md` | 247 | ✅ Deep | HUNK binary reconstruction from memory — complete with antipatterns, Python extraction scripts |
| `static/string_xref_analysis.md` | 258 | ✅ Deep | String cross-reference analysis — complete with Mermaid, library mapping cookbook, Pascal string handling |
| `dynamic/serial_debug.md` | 178 | ✅ Deep | Serial debug output techniques — complete with Mermaid, baud rate pitfalls, host-side capture |
| `dynamic/enforcer_mungwall.md` | 215 | ✅ Deep | Enforcer/MungWall runtime error detection — complete with decision guide, antipatterns, cross-platform comparison |
| `static/asm68k_binaries.md` | 924 | ⚠️ Adequate | Hand-written assembly RE: demos, games, bootblocks, hardware-banging code — substantial research content added |
| `static/ansi_c_reversing.md` | 603 | ⚠️ Adequate | ANSI C RE: struct recovery, control flow reconstruction, library anchoring — BPTR + SAS/C convention details added |
| `static/cpp_vtables_reversing.md` | 745 | ⚠️ Adequate | C++ OOP RE: vtables, inheritance hierarchies, RTTI, name mangling — GCC vtable layout + C++ ABI details added |
| `static/other_languages.md` | 679 | ⚠️ Adequate | Non-C languages: AMOS, Blitz Basic, Amiga E, Modula-2, FORTH, ARexx — JForth corrected + BlitzLib table + E object layout added |

| Per-Compiler RE | Lines | Status | Notes |
|---|---|---|---|
| `static/compilers/README.md` | 102 | ✅ Adequate | Compiler identification flowchart and comparison matrix |
| `static/compilers/sasc.md` | 1006 | ✅ Adequate | SAS/C 5.x/6.x: LINK A5 + 9-reg save, all 4 calling conventions, register vs stack allocation, call-site patterns, IDA Python detection |
| `static/compilers/gcc.md` | 742 | ✅ Adequate | GCC 2.95.x: `.text` hunk, A6 frame pointer, `__CTOR_LIST__`, tail-call optimization, libnix startup, register allocation/stack variable identification |
| `static/compilers/vbcc.md` | 327 | ✅ Adequate | VBCC: No frame pointer, per-function saves, `__reg()`, cross-module optimization, `__MERGED` hunks |
| `static/compilers/stormc.md` | 321 | ✅ Adequate | StormC / StormC++: SAS/C-compatible C, unique C++ ABI, vtable layout differences, PowerPC support |
| `static/compilers/aztec_c.md` | 125 | ✅ Adequate | Manx Aztec C: D3-D7 save only (5 regs), D2 scratch, pre-1990 era |
| `static/compilers/lattice_c.md` | 153 | ✅ Adequate | Lattice C 3.x/4.x: SAS/C predecessor, evolutionary markers, simpler optimizer |
| `static/compilers/dice_c.md` | 135 | ✅ Adequate | DICE C: No frame pointer, `_mainCRTStartup`, fast compile speed, Matt Dillon's compiler |
| `static/code_vs_data_disambiguation.md` | 698 | ✅ Adequate | Code vs data disambiguation: IDA/Ghidra workflows, Amiga failure modes, detection scripts |

### 06 — Exec Kernel (exec.library)

@ -291,6 +307,7 @@ Articles were scored against [AGENTS.md](../amiga/AGENTS.md) "Deep" criteria:
| `workbench.md` | 194 | ✅ Adequate | WBStartup, AppWindow, AppIcon, AppMenuItem |
| `icon.md` | 188 | ✅ Adequate | .info format, DiskObject, ToolTypes, true-color icons |
| `rexxsyslib.md` | 176 | ✅ Adequate | ARexx hosting, command parsing, return codes |
| `arexx_integration.md` | 1128 | ✅ Adequate | Complete ARexx integration guide: dispatch tables, 6 antipatterns, use-case cookbook, event loop integration |
| `keymap.md` | 162 | ✅ Adequate | Raw-to-ASCII, KeyMap structure, dead keys |

### 12 — Networking

@ -350,14 +367,14 @@ Articles were scored against [AGENTS.md](../amiga/AGENTS.md) "Deep" criteria:

| Status | Count |
|---|---|
| ✅ Deep | 46 |
| ✅ Adequate | 99 |
| ⚠️ Thin | 10 |
| ✅ Deep | 56 |
| ✅ Adequate | 113 |
| ⚠️ Thin | 0 |
| ❌ Pending (Tier 3) | 10 |
| **Total** | **165** |
| **Total** | **179** |

> MUI framework adds 13 additional articles (4 Deep, 9 Adequate) tracked separately above.
> **Progress**: 15 of 25 primary targets complete (60%). Tier 1 fully cleared. 0 Tier 2 remaining. 10 Tier 3 items remain.
> **Progress**: 8 per-compiler RE articles created (1 README + 7 compiler field manuals). 1 code-vs-data disambiguation article added. 1 ARexx integration guide added. 10 Tier 3 items remain.

---
