mirror of
https://github.com/alfishe/amiga-bootcamp.git
synced 2026-06-12 16:16:28 +00:00
More information added
This commit is contained in:
parent
a383d4c065
commit
05452c6c12
10 changed files with 1617 additions and 10 deletions
|
|
@ -98,6 +98,111 @@ Both tools output via `kprintf` to serial port (115200 8N1). Capture on host:
|
|||
screen /dev/cu.usbserial-XXXX 115200
|
||||
# or
|
||||
minicom -D /dev/cu.usbserial-XXXX -b 115200
|
||||
|
||||
---
|
||||
|
||||
## Decision Guide — Enforcer vs MungWall vs Manual Debugging
|
||||
|
||||
| Scenario | Use | Why |
|
||||
|---|---|---|
|
||||
| Random Guru Meditation, unknown cause | Both | Enforcer catches the access violation; MungWall catches the corruption that caused it |
|
||||
| Reproducible crash at known address | Enforcer first | Identifies the exact instruction and register state at the crash |
|
||||
| Heap data corruption (silent, no crash) | MungWall | Guards catch overwrites on FreeMem — may be the only detection |
|
||||
| Use-after-free bugs | Both | MungWall poisons freed blocks; Enforcer traps reads from unmapped freed pages |
|
||||
| 68000 (no MMU) | MungWall only | Enforcer needs an MMU for hardware trapping: a 68020 with a 68851, a 68030, or a non-EC 68040/68060 |
|
||||
| MiSTer FPGA / emulation | Both (if MMU implemented) | Verify MMU implementation supports Enforcer's page-level trapping |
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### 1. "The Ignored Hit"
|
||||
|
||||
**What it looks like** — seeing an Enforcer hit, noting the PC, but dismissing it because "the program still runs":
|
||||
|
||||
```
|
||||
ENFORCER HIT: READ-WORD FROM $00000012
|
||||
PC: $0023AB12
|
||||
```
|
||||
|
||||
**Why it fails:** Enforcer catches the violation and *allows the program to continue* by emulating the access or returning dummy data. The crash may not happen immediately — but the corruption is real. A null pointer read that "works" because Enforcer returned `$00000000` may cause a crash 10 minutes later when that zero propagates to a pointer dereference.
|
||||
|
||||
**Correct:** Every Enforcer hit is a real bug. Fix them all, even if the program appears to survive.
|
||||
|
||||
### 2. "The Missing MungWall on Exit"
|
||||
|
||||
**What it looks like** — running MungWall, seeing clean output during the program, but not checking on program exit:
|
||||
|
||||
```
|
||||
run mungwall
|
||||
myapp
|
||||
; No MungWall output during run — looks clean!
|
||||
; But on exit, all allocations are freed — that's when guards are checked
|
||||
```
|
||||
|
||||
**Why it fails:** MungWall validates guards at `FreeMem()` time, not at corruption time. If the program corrupts a buffer, the corruption is detected only when that buffer is freed — typically at program exit. If you don't capture exit-time output, you miss the report.
|
||||
|
||||
**Correct:** Always capture serial output until the program fully exits and the CLI prompt returns.
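
The deferred detection described above can be modelled on the host. The following is a minimal sketch, not MungWall's actual implementation: the `GuardedHeap` class, the 8-byte wall size, and the `0xBB` fill value are all assumptions chosen for illustration.

```python
"""Host-side model of guard walls that are only checked at free time."""

WALL = b"\xBB" * 8   # hypothetical fill pattern and wall size


class GuardedHeap:
    def __init__(self):
        self.blocks = {}        # handle -> bytearray(wall + payload + wall)
        self.next_handle = 1

    def alloc(self, size):
        handle = self.next_handle
        self.next_handle += 1
        self.blocks[handle] = bytearray(WALL + bytes(size) + WALL)
        return handle

    def write(self, handle, offset, data):
        # Deliberately unchecked, like a raw pointer store on the Amiga.
        start = len(WALL) + offset
        self.blocks[handle][start:start + len(data)] = data

    def free(self, handle):
        # The walls are inspected only here, so damage surfaces at free time.
        block = self.blocks.pop(handle)
        problems = []
        if block[:len(WALL)] != WALL:
            problems.append("header wall damaged")
        if block[-len(WALL):] != WALL:
            problems.append("trailer wall damaged")
        return problems


heap = GuardedHeap()
buf = heap.alloc(16)
heap.write(buf, 12, b"ABCDEFGH")   # 4 bytes fit, 4 spill into the trailer wall
report = heap.free(buf)            # nothing was reported until this call
```

In this model the overrun happens at `write()` but only `free()` can report it, which is why a capture that stops before the program exits can look clean.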
|
||||
|
||||
---
|
||||
|
||||
## Use-Case Cookbook
|
||||
|
||||
### Track Down a Heap Overflow
|
||||
|
||||
1. `run mungwall` — intercepts AllocMem/FreeMem
|
||||
2. `run enforcer QUIET LOG enforcer.log` — catches illegal accesses
|
||||
3. Launch the program
|
||||
4. Reproduce the crash
|
||||
5. Check `enforcer.log` and serial output
|
||||
6. If MungWall reports "Trailer guard CORRUPTED at +132":
|
||||
- The allocation at the reported address + the offset is the corruption site
|
||||
- Walk backward from `FreeMem` PC to find the caller that corrupted it
|
||||
- Set a **hardware write watchpoint** on the guard address using Enforcer's MMU capability
|
||||
|
||||
### Verify All Allocations Are Freed (Leak Detection)
|
||||
|
||||
MungWall can report unfreed allocations at exit:
|
||||
|
||||
```bash
|
||||
run mungwall LEAKCHECK
|
||||
myapp
|
||||
# Output on exit:
|
||||
# MUNGWALL: 3 blocks still allocated (48 bytes total):
|
||||
# $001A2000: size=16, alloc PC=$0023BC44
|
||||
# $001A3000: size=16, alloc PC=$0023BC44
|
||||
# $001A4000: size=16, alloc PC=$0023BD12
|
||||
```
|
||||
|
||||
Cross-reference the alloc PCs with IDA to find the leaking code.
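
When the report is long, grouping blocks by allocation PC shows which call site leaks the most. This is a small host-side sketch that assumes the captured serial log uses exactly the line format shown in the sample above; the `leaks_by_pc` helper is a hypothetical name.

```python
import re
from collections import Counter

# Matches lines such as "$001A2000: size=16, alloc PC=$0023BC44"
LEAK_LINE = re.compile(r"\$([0-9A-Fa-f]+): size=(\d+), alloc PC=\$([0-9A-Fa-f]+)")


def leaks_by_pc(report):
    """Return (alloc_pc, total_bytes) pairs, largest leak first."""
    bytes_per_pc = Counter()
    for _addr, size, pc in LEAK_LINE.findall(report):
        bytes_per_pc[int(pc, 16)] += int(size)
    return bytes_per_pc.most_common()


SAMPLE = """\
# MUNGWALL: 3 blocks still allocated (48 bytes total):
#   $001A2000: size=16, alloc PC=$0023BC44
#   $001A3000: size=16, alloc PC=$0023BC44
#   $001A4000: size=16, alloc PC=$0023BD12
"""

for pc, total in leaks_by_pc(SAMPLE):
    print(f"${pc:08X}: {total} bytes")
```

The PC with the largest total is usually the first address worth looking up in the disassembler.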
|
||||
|
||||
---
|
||||
|
||||
## Cross-Platform Comparison
|
||||
|
||||
| Amiga Concept | Modern Equivalent | Notes |
|
||||
|---|---|---|
|
||||
| Enforcer (MMU trap) | AddressSanitizer (ASan) | Same concept: trap illegal accesses, report PC + registers |
|
||||
| MungWall (heap guards) | `MallocScribble` (macOS) / `MALLOC_CHECK_` (glibc) | Similar idea: fill patterns and heap consistency checks expose overruns and stale reads |
|
||||
| MungWall use-after-free | ASan quarantine / `MALLOC_PERTURB_` | Same: poison freed memory, trap on re-read |
|
||||
| Combined Enforcer + MungWall | `-fsanitize=address` (GCC/Clang) | ASan combines both approaches in one tool |
|
||||
| Serial port output | `ASAN_OPTIONS=log_path=asan.log` | Same: output goes to a separate channel to survive crashes |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### Does Enforcer work on 68000 (A500/A600/A2000)?
|
||||
|
||||
No. Enforcer relies on an MMU to mark address ranges as inaccessible, and the 68000 has no MMU and no page tables, so arbitrary illegal memory accesses cannot be trapped. Use MungWall alone on 68000 systems.
|
||||
|
||||
### Why does Enforcer hit on perfectly valid code?
|
||||
|
||||
False positives are rare but possible: (1) self-modifying code that writes to code segments, (2) ROM shadowing — writing to what appears to be ROM but is actually a RAM mirror, (3) memory-mapped I/O regions that Enforcer doesn't know about (custom expansion hardware).
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
```
|
||||
|
||||
---
|
||||
|
|
|
|||
|
|
@ -4,7 +4,28 @@
|
|||
|
||||
## Overview
|
||||
|
||||
Live memory probing on a running Amiga means directly reading exec structures — `SysBase`, `LibList`, `TaskReady`, `MemList` — to observe system state without a traditional debugger.
|
||||
The Amiga has no `Task Manager`, no `dtrace`, no `/proc`. But it has something better: **every critical OS data structure is reachable from a single pointer at absolute address `$4`.** From `SysBase`, you can walk the library list, enumerate every running task, map every memory region, and even modify kernel structures — all from a user-mode program. No debugger required.
|
||||
|
||||
Live memory probing means reading (and sometimes writing) exec structures directly from a running Amiga without a traditional debugger. This is how tools like Scout, SysInspector, and XOpa work. It's how you verify that a hook is installed, check if a library is loaded, or inspect task state during development. This article covers the key data structures, the traversal patterns, and the safety rules.
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
SYSBASE["SysBase<br/>at absolute $4"]
|
||||
subgraph "Reachable Structures"
|
||||
LIBLIST["LibList<br/>→ every loaded library"]
|
||||
DEVLIST["DeviceList<br/>→ every loaded device"]
|
||||
TASKREADY["TaskReady<br/>→ runnable tasks"]
|
||||
TASKWAIT["TaskWait<br/>→ waiting tasks"]
|
||||
MEMLIST["MemList<br/>→ memory regions"]
|
||||
end
|
||||
SYSBASE --> LIBLIST
|
||||
SYSBASE --> DEVLIST
|
||||
SYSBASE --> TASKREADY
|
||||
SYSBASE --> TASKWAIT
|
||||
SYSBASE --> MEMLIST
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -124,6 +145,116 @@ Permit();
|
|||
|
||||
---
|
||||
|
||||
## Decision Guide — Safe Probing Rules
|
||||
|
||||
| Operation | Required Protection | Risk Without Protection |
|
||||
|---|---|---|
|
||||
| Reading a single field (`lib_Version`) | None — atomic word read | None on 68000–68060 |
|
||||
| Walking a linked list (LibList, TaskReady) | `Forbid()` / `Permit()` | Task switch mid-walk → stale pointer → crash |
|
||||
| Modifying a structure field | `Forbid()` (minimum) or `Disable()` | Other task reads half-written value |
|
||||
| Allocating/freeing memory during probing | `Forbid()` only — don't `Disable()` | `Disable()` blocks interrupts, may deadlock AllocMem |
|
||||
| Walking interrupt-visible data | `Disable()` / `Enable()` | Interrupt modifies structure mid-read |
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### 1. "The Naked List Walk"
|
||||
|
||||
**What it looks like** — walking `TaskReady` without `Forbid()`:
|
||||
|
||||
```c
|
||||
// BROKEN — task switch mid-walk
|
||||
struct Task *t = (struct Task *)SysBase->TaskReady.lh_Head;
|
||||
while (t->tc_Node.ln_Succ) {
|
||||
printf("%s\n", t->tc_Node.ln_Name);
|
||||
t = (struct Task *)t->tc_Node.ln_Succ; // ← may be stale after switch!
|
||||
}
|
||||
```
|
||||
|
||||
**Why it fails:** If the current task's time slice expires during the walk, another task can add or remove nodes from `TaskReady`. The `ln_Succ` pointer you cached is now dangling — pointing to freed or moved memory.
|
||||
|
||||
**Correct:**
|
||||
|
||||
```c
|
||||
Forbid();
|
||||
struct Task *t = (struct Task *)SysBase->TaskReady.lh_Head;
|
||||
while (t->tc_Node.ln_Succ) {
|
||||
    kprintf("%s\n", t->tc_Node.ln_Name);  // debug.lib serial output: never calls Wait(), so Forbid() stays in effect
|
||||
t = (struct Task *)t->tc_Node.ln_Succ;
|
||||
}
|
||||
Permit();
|
||||
```
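
The termination test in the loop depends on how exec lays out a list header: `lh_Head`, then `lh_Tail` (always zero), then `lh_TailPred`, so the last real node's `ln_Succ` points at `lh_Tail`, whose own `ln_Succ` reads as zero. The sketch below walks such a list in a big-endian memory dump on the host. The dump layout, addresses, and helper names are made up for illustration; only the field offsets (`ln_Succ` at +0, `ln_Name` at +10) follow `exec/nodes.h`.

```python
import struct


def read_u32(mem, addr):
    return struct.unpack_from(">I", mem, addr)[0]   # 68k is big-endian


def read_cstring(mem, addr):
    end = mem.index(b"\0", addr)
    return mem[addr:end].decode("latin-1")


def walk_exec_list(mem, list_addr):
    """Return node names, stopping at the lh_Tail sentinel."""
    names = []
    node = read_u32(mem, list_addr)                  # lh_Head
    while True:
        succ = read_u32(mem, node)                   # ln_Succ
        if succ == 0:                                # this "node" is &lh_Tail
            break
        names.append(read_cstring(mem, read_u32(mem, node + 10)))  # ln_Name
        node = succ
    return names


def build_demo_dump():
    """Two-node list at made-up addresses inside a 256-byte dump."""
    mem = bytearray(0x100)
    header, first, second = 0x10, 0x40, 0x60
    struct.pack_into(">III", mem, header, first, 0, second)   # Head, Tail, TailPred
    struct.pack_into(">II", mem, first, second, header)       # ln_Succ, ln_Pred
    struct.pack_into(">II", mem, second, header + 4, first)   # last node -> &lh_Tail
    for node, name_addr, name in ((first, 0x80, b"taskA"), (second, 0x90, b"taskB")):
        struct.pack_into(">I", mem, node + 10, name_addr)
        mem[name_addr:name_addr + len(name)] = name
    return mem, header


demo_mem, demo_list = build_demo_dump()
print(walk_exec_list(demo_mem, demo_list))
```

An empty list needs no special case: `lh_Head` then points straight at `lh_Tail`, and the loop exits on the first test.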
|
||||
|
||||
### 2. "The Disable Trap"
|
||||
|
||||
**What it looks like** — using `Disable()` when only `Forbid()` is needed:
|
||||
|
||||
```c
|
||||
// OVERKILL — Disable blocks ALL interrupts AND task switches
|
||||
Disable();
|
||||
struct Library *lib = FindName(&SysBase->LibList, "intuition.library");
|
||||
UWORD ver = lib->lib_Version; // atomic word read — Forbid is enough!
|
||||
Enable();
|
||||
```
|
||||
|
||||
**Why it fails:** `Disable()` blocks ALL interrupts — including the vertical blank interrupt that drives the system clock. Holding `Disable()` for more than a few hundred cycles causes lost time, missed serial data, and audio dropouts. Forbid is sufficient for list traversal; Disable is only needed when interrupts themselves modify the same data.
|
||||
|
||||
**Correct:** Use the weakest protection that covers your access pattern.
|
||||
|
||||
---
|
||||
|
||||
## Use-Case Cookbook
|
||||
|
||||
### Check If a Hook Is Installed
|
||||
|
||||
```c
|
||||
BOOL is_hook_installed(struct Library *lib, LONG lvo, APTR expected_func) {
|
||||
APTR current = (APTR)(*(ULONG *)((UBYTE *)lib + lvo + 2));
|
||||
return (current == expected_func);
|
||||
}
|
||||
```
|
||||
|
||||
### Live Patch Verification Script
|
||||
|
||||
```c
|
||||
/* Verify a library function is still at its original address */
|
||||
ULONG get_jmp_target(struct Library *lib, LONG lvo) {
|
||||
UBYTE *entry = (UBYTE *)lib + lvo;
|
||||
if (*(UWORD *)entry != 0x4EF9) return 0; // Not a JMP ABS.L
|
||||
return *(ULONG *)(entry + 2);
|
||||
}
|
||||
|
||||
if (get_jmp_target(DOSBase, -48) != original_write_addr)
|
||||
printf("WARNING: dos.library Write() has been patched!\n");
|
||||
```
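
The same check can be run offline against a memory dump pulled from the machine or an emulator. This is a sketch under assumptions: `dump` is taken to be a raw big-endian byte image and `lib_base_offset` the position of the library base inside it; both names are hypothetical.

```python
import struct

JMP_ABS_L = 0x4EF9   # opcode word of JMP (xxx).L


def jmp_target(dump, lib_base_offset, lvo):
    """Decode the 6-byte vector at lib_base + lvo; None if it is not a JMP ABS.L."""
    opcode, target = struct.unpack_from(">HI", dump, lib_base_offset + lvo)
    return target if opcode == JMP_ABS_L else None


# Build a tiny fake image: library base at offset 60, Write() vector at -48.
image = bytearray(64)
struct.pack_into(">HI", image, 60 - 48, JMP_ABS_L, 0x00F80ABC)

target = jmp_target(image, 60, -48)
print("not a JMP vector" if target is None else f"Write -> ${target:08X}")
```

Comparing the decoded target against a known-good value recorded right after boot gives the same answer as the on-target check, without running code on the Amiga.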
|
||||
|
||||
---
|
||||
|
||||
## Cross-Platform Comparison
|
||||
|
||||
| Amiga Concept | Win32 Equivalent | Linux Equivalent | Notes |
|
||||
|---|---|---|---|
|
||||
| `SysBase` at absolute `$4` | `NtCurrentPeb()` or `fs:[0x30]` | No equivalent — kernel/user split blocks direct access | Amiga's flat memory model makes this trivially accessible |
|
||||
| Walking `LibList` | `EnumProcessModules()` | `dl_iterate_phdr()` | Amiga's linked list is directly readable; Win32/Linux require API calls |
|
||||
| `TaskReady` enumeration | `CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS)` | `/proc/[pid]/stat` | Amiga lets you read the scheduler's run queue directly |
|
||||
| `MemList` memory map | `VirtualQuery()` | `/proc/self/maps` | Same result; Amiga reads kernel memory, Linux reads a pseudo-file |
|
||||
| `Forbid()`/`Permit()` protection | `EnterCriticalSection()` | `pthread_mutex_lock()` | Same purpose: prevent concurrent modification |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### Is live probing safe on 68000 (no MMU)?
|
||||
|
||||
Yes — and it's even simpler. On 68000, there's no memory protection at all. Any address is readable. The risks are purely logical: reading a list while it's being modified. `Forbid()` is sufficient on all CPU models.
|
||||
|
||||
### Can live probing crash the system?
|
||||
|
||||
Writing to the wrong address can corrupt OS structures and cause an immediate crash or silent data corruption. Reading is generally safe. The most common crash from reading is dereferencing a `NULL` pointer at the end of a list without checking `ln_Succ` first.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- NDK39: `exec/execbase.h`, `exec/memory.h`, `exec/tasks.h`
|
||||
|
|
|
|||
|
|
@ -111,6 +111,66 @@ MiSTer FPGA: the UART bridge is exposed on the MiSTer IO board or via the DE10-N
|
|||
|
||||
---
|
||||
|
||||
## Decision Guide — Which Debug Output Method?
|
||||
|
||||
| Method | Works During... | Requires | Throughput | Use Case |
|
||||
|---|---|---|---|---|
|
||||
| `kprintf()` | ROM init, crashes | Linking with `debug.lib` | Low (polled) | Kernel-level debugging |
|
||||
| `RawDoFmt + RawPutChar` | Any time after exec init | exec.library only | Medium | Universal: all Kickstart versions |
|
||||
| Direct `SERDAT` write | Anytime, even without OS | Nothing — bare metal | High (custom batching) | Crash handler, bootloader |
|
||||
| `dprintf` (debug.lib) | Application runtime | SAS/C debug.lib | Medium | Application-level tracing |
|
||||
| `serial.device` | Full OS running | serial.device open | High (interrupt-driven) | High-volume data transfer |
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### 1. "The Deadly Debug Printf"
|
||||
|
||||
**What it looks like** — calling `printf()` or `VPrintf()` from inside a `Forbid()`/`Disable()` block:
|
||||
|
||||
```c
|
||||
Forbid();
|
||||
printf("Processing item %d\n", i); // BROKEN — calls dos.library!
|
||||
Permit();
|
||||
```
|
||||
|
||||
**Why it fails:** `printf()` goes through `dos.library` `Write()`, which can call `Wait()` while the handler processes the request. A task that calls `Wait()` inside `Forbid()` or `Disable()` does not keep its protection: exec suspends the forbidden or disabled state until the task runs again, so other tasks (and interrupts) execute in the middle of your "protected" region. The structure you were guarding can change under you, and nothing reports it.
|
||||
|
||||
**Correct:** Use `kprintf()` or `RawDoFmt + RawPutChar` inside Forbid/Disable — both bypass dos.library entirely.
|
||||
|
||||
### 2. "The Baud Rate Mismatch"
|
||||
|
||||
**What it looks like** — the Amiga outputs at 9600 baud but the host is configured for 115200:
|
||||
|
||||
```bash
|
||||
# Host configured for 115200
|
||||
screen /dev/cu.usbserial 115200
|
||||
# Output: ¥φΩ≡ƒ╤ ╚α≡α≤φσ≡ ╔╞╒ ... (garbage)
|
||||
```
|
||||
|
||||
**Why it fails:** The Amiga's default `SERPER` value after reset is for 9600 baud (on NTSC; PAL may differ). The host-side baud rate MUST match exactly. A single bit error in the start bit cascades into every subsequent bit being wrong.
|
||||
|
||||
**Correct:** Set `SERPER` to a known value before output, or cycle through common baud rates on the host side (9600, 19200, 38400, 57600, 115200) until text becomes readable.
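
To pick a `SERPER` value for a given rate, the commonly documented relation is period = clock / baud - 1, where the clock is the colour clock of the machine. The sketch below assumes the usual NTSC and PAL clock figures; `serper_for` and `actual_baud` are hypothetical helper names.

```python
# Colour clock frequencies in Hz (assumed standard values).
COLOR_CLOCK_HZ = {"ntsc": 3_579_545, "pal": 3_546_895}


def serper_for(baud, video="ntsc"):
    """Nearest SERPER period value for the requested baud rate."""
    return round(COLOR_CLOCK_HZ[video] / baud) - 1


def actual_baud(serper, video="ntsc"):
    """Baud rate the hardware really produces for a SERPER value."""
    return COLOR_CLOCK_HZ[video] / (serper + 1)


for rate in (9600, 19200, 38400, 57600, 115200):
    period = serper_for(rate, "pal")
    error = (actual_baud(period, "pal") - rate) / rate * 100
    print(f"{rate:>6} baud: SERPER={period:>3}  error={error:+.2f}%")
```

The error column matters at high rates: the divisor is small, so the achievable rate can be a percent or more away from nominal, and a host adapter with tight tolerance may still show occasional garbage.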
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### Why doesn't kprintf work on my Kickstart 3.1 ROM?
|
||||
|
||||
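`kprintf()` is not a ROM function on any Kickstart release. It lives in the `debug.lib` link library and sends its output through exec's `RawPutChar()`, so an unresolved `kprintf` symbol means the program was not linked against `debug.lib`. If you cannot link it, call `RawDoFmt + RawPutChar` yourself or use the direct hardware approach.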
|
||||
|
||||
### Can I use the serial port without a null-modem cable?
|
||||
|
||||
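No. The Amiga's serial port uses RS-232 voltage levels (roughly ±12 V), not TTL. You need a null-modem cable to another RS-232 port, or a USB-serial adapter with a real RS-232 level shifter. Wiring the port directly to a 3.3 V TTL USB UART can destroy the adapter, and it would not produce readable data anyway because RS-232 signalling is inverted relative to TTL.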
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
|
||||
|
||||
- NDK39: `exec/execbase.h` — `RawDoFmt`, `RawPutChar` LVOs
|
||||
|
|
|
|||
|
|
@ -4,7 +4,32 @@
|
|||
|
||||
## Overview
|
||||
|
||||
`SetFunction()` is the official AmigaOS mechanism for **patching a library's JMP table** at runtime. It installs a custom function at a given LVO, replacing the original, and returns the old function pointer so a trampoline can be constructed.
|
||||
You want to know every file an application opens. Or every byte it writes. Or every memory allocation it makes — with sizes, flags, and call stacks. You could patch the binary. Or you could use the operating system's own hooking mechanism: **`SetFunction()`**.
|
||||
|
||||
`SetFunction()` is AmigaOS's official API for **replacing a library's JMP table entry at runtime.** It atomically swaps the target address of a specific LVO, returning the original pointer so you can construct a trampoline. Every call through that LVO — from every task, in every process — now routes through your code. This is the foundation of Amiga reverse engineering tooling: file system monitors, API tracers, memory debuggers, and anti-piracy checks all begin with `SetFunction()`.
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph "Before SetFunction"
|
||||
APP1["App calls<br/>JSR -48(A6)"]
|
||||
JMP["JMP table[-48]<br/>→ original Write"]
|
||||
ORIG["dos.library<br/>Write_impl()"]
|
||||
end
|
||||
subgraph "After SetFunction"
|
||||
APP2["App calls<br/>JSR -48(A6)"]
|
||||
JMP2["JMP table[-48]<br/>→ my_write_hook"]
|
||||
HOOK["my_write_hook()<br/>log, modify, block"]
|
||||
TRAMP["Trampoline →<br/>original Write"]
|
||||
end
|
||||
APP1 --> JMP
|
||||
JMP --> ORIG
|
||||
APP2 --> JMP2
|
||||
JMP2 --> HOOK
|
||||
HOOK --> TRAMP
|
||||
TRAMP --> ORIG
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -122,6 +147,174 @@ atexit(remove_hook);
|
|||
|
||||
---
|
||||
|
||||
## Decision Guide — SetFunction vs Alternatives
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
Q["Need to intercept<br/>library calls?"]
|
||||
Q -->|"System-wide,<br/>all tasks"| SF["Use SetFunction()"]
|
||||
Q -->|"Single task only"| ALT1["Consider patching<br/>the task's A6/A4"]
|
||||
Q -->|"At load time,<br/>before execution"| ALT2["Binary patch<br/>or HUNK relocation"]
|
||||
SF -->|"Need to call original?"| TRAMP["Write trampoline<br/>save orig pointer"]
|
||||
SF -->|"Block/replace only"| BLOCK["Don't save orig<br/>simpler, no trampoline"]
|
||||
```
|
||||
|
||||
| Approach | Scope | Invasiveness | Use Case |
|
||||
|---|---|---|---|
|
||||
| **SetFunction()** | System-wide | Low (official API) | API tracing, memory debugging, anti-piracy |
|
||||
| **Direct JMP table patch** | System-wide | Medium (bypasses API) | Pre-OS 2.0 compatibility |
|
||||
| **Task A6 replacement** | Single task | Medium | Per-application sandboxing |
|
||||
| **Binary patch (file)** | Single binary | High (modifies disk) | Permanent behavior change, crack intros |
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### 1. "The Leaky Hook"
|
||||
|
||||
**What it looks like** — installing a hook but never removing it:
|
||||
|
||||
```c
|
||||
void setup(void) {
|
||||
Forbid();
|
||||
orig = SetFunction(DOSBase, -48, my_write);
|
||||
Permit();
|
||||
// No atexit() cleanup — hook lives forever
|
||||
}
|
||||
```
|
||||
|
||||
**Why it fails:** When the hooking program exits, `my_write` is unloaded from memory. But the JMP table still points to it. The next task that calls `Write()` jumps into freed memory → Guru Meditation.
|
||||
|
||||
**Correct:** Always register cleanup:
|
||||
|
||||
```c
|
||||
void cleanup(void) {
|
||||
Forbid();
|
||||
SetFunction(DOSBase, -48, orig); // restore original
|
||||
Permit();
|
||||
}
|
||||
// In main():
|
||||
atexit(cleanup);
|
||||
```
|
||||
|
||||
### 2. "The Forbid-Free Patch"
|
||||
|
||||
**What it looks like** — calling `SetFunction()` without `Forbid()`:
|
||||
|
||||
```c
|
||||
// BROKEN: a task switch can run my_write before orig has been assigned
|
||||
orig = SetFunction(DOSBase, -48, my_write);
|
||||
```
|
||||
|
||||
**Why it fails:** The new vector goes live the moment `SetFunction()` patches the JMP table, but `orig` is only assigned after the call returns. Without `Forbid()`, another task can be scheduled in that window and enter `my_write` while `orig` is still `NULL`; the hook then jumps through a null pointer instead of reaching the original `Write()`.
|
||||
|
||||
**Correct:** Always wrap in `Forbid()`/`Permit()`.
|
||||
|
||||
### 3. "The Register Stomper"
|
||||
|
||||
**What it looks like** — a hook that corrupts registers before calling the original:
|
||||
|
||||
```asm
|
||||
_my_write:
|
||||
MOVEM.L D0-D2/A0-A1, -(SP) ; save only D0-D2/A0-A1
|
||||
JSR _log_args
|
||||
MOVEM.L (SP)+, D0-D2/A0-A1
|
||||
MOVEA.L _orig_write, A0
|
||||
JMP (A0) ; D3-D7/A2-A6 may contain garbage!
|
||||
```
|
||||
|
||||
**Why it fails:** The original `Write()` expects `D1`=file, `D2`=buffer, `D3`=length. If your logging code modified D3 and you didn't save/restore it, the original function sees a corrupted length — potentially writing gigabytes or zero bytes. Even worse: the caller may rely on other registers (D4-D7, A2-A5) being preserved per the AmigaOS ABI, and your hook trashed them.
|
||||
|
||||
**Correct:** Save and restore ALL registers the original function might read or the caller expects preserved. The safest approach is `MOVEM.L D0-D7/A0-A6`.
|
||||
|
||||
---
|
||||
|
||||
## Use-Case Cookbook
|
||||
|
||||
### File Access Tracer — Log Every Open
|
||||
|
||||
```c
|
||||
static APTR orig_Open;
|
||||
|
||||
LONG __asm my_Open(register __d1 STRPTR name,
|
||||
register __d2 LONG mode) {
|
||||
LONG result = ((LONG(*)(STRPTR,LONG))orig_Open)(name, mode);
|
||||
if (result) {
|
||||
kprintf("Open: %s mode=%ld → handle=%ld\n", name, mode, result);
|
||||
}
|
||||
return result;
|
||||
}
|
||||
|
||||
void install_file_tracer(void) {
|
||||
Forbid();
|
||||
orig_Open = SetFunction(DOSBase, -30, my_Open);
|
||||
Permit();
|
||||
}
|
||||
```
|
||||
|
||||
### Write Blocker — Prevent All Disk Writes
|
||||
|
||||
```c
|
||||
static APTR orig_Write;
|
||||
static BOOL write_blocked = TRUE;
|
||||
|
||||
LONG __asm my_Write(register __d1 BPTR fh,
|
||||
register __d2 APTR buf,
|
||||
register __d3 LONG len) {
|
||||
if (write_blocked) {
|
||||
        return len;  // pretend success: report every byte as written, write nothing
|
||||
}
|
||||
return ((LONG(*)(BPTR,APTR,LONG))orig_Write)(fh, buf, len);
|
||||
}
|
||||
```
|
||||
|
||||
### Detect SetFunction Itself Being Called (Anti-Anti-Debug)
|
||||
|
||||
Some software detects patching by checking if `SetFunction` returns the expected original address. Counter-patch by hooking `SetFunction` itself:
|
||||
|
||||
```c
|
||||
static APTR orig_SetFunction;
|
||||
|
||||
APTR __asm my_SetFunction(register __a1 struct Library *lib,
|
||||
register __a0 LONG lvo,
|
||||
register __d0 APTR newFunc) {
|
||||
if (lib == DOSBase && lvo == -48) {
|
||||
        return orig_Write;  // lie: hand back the real Write() address so our hook stays hidden
|
||||
}
|
||||
return ((APTR(*)(struct Library*,LONG,APTR))orig_SetFunction)(lib, lvo, newFunc);
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Cross-Platform Comparison
|
||||
|
||||
| Amiga Concept | Win32 Equivalent | Linux Equivalent | Notes |
|
||||
|---|---|---|---|
|
||||
| `SetFunction()` | `DetourAttach()` (Microsoft Detours) | `LD_PRELOAD` + `dlsym(RTLD_NEXT)` | Same idea: intercept library calls transparently |
|
||||
| JMP table modification | IAT hooking | PLT/GOT hooking | Amiga's JMP table is simpler — one 6-byte write vs multi-level indirection |
|
||||
| Trampoline pattern | Detour trampoline | `dlsym(RTLD_NEXT, "write")` | Same: call original after instrumentation |
|
||||
| `Forbid()`/`Permit()` | `SuspendThread` / `ResumeThread` (crude) | Signal blocking (crude) | Amiga's task-level atomicity is unique — no per-thread suspend needed |
|
||||
| System-wide by default | Per-process by default | Per-process by default | Amiga's flat address space means one hook covers everything |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### Does SetFunction work on all library types?
|
||||
|
||||
Yes — `SetFunction()` works on any library with a standard JMP table (exec, dos, graphics, intuition, third-party). It does NOT work on ROM-based resident modules that use a different dispatch mechanism (some Kickstart modules).
|
||||
|
||||
### Can multiple hooks coexist on the same function?
|
||||
|
||||
Yes — in a chain. Each hook saves the "original" pointer (which may itself be a previous hook's trampoline). Removal must happen in reverse order: last hooked = first removed. Removing hooks out of order breaks the chain.
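
Why removal order matters can be seen in a small host-side model. This is a sketch, not exec code: `JumpTable.set_function` only mimics the swap-and-return behaviour of `SetFunction()`, and the hook names are placeholders.

```python
class JumpTable:
    """One library vector; set_function mimics SetFunction()'s swap-and-return."""

    def __init__(self, original):
        self.slot = original

    def set_function(self, new_target):
        old, self.slot = self.slot, new_target
        return old


def call_path(table, saved_orig, loaded):
    """List the code a call visits, following each hook's saved pointer."""
    path, target = [], table.slot
    while target in saved_orig:
        if target not in loaded:
            path.append(target + " (unloaded)")   # jump into freed memory
            return path
        path.append(target)
        target = saved_orig[target]
    path.append(target)
    return path


def install(table, saved_orig, loaded, hook):
    saved_orig[hook] = table.set_function(hook)
    loaded.add(hook)


def remove(table, saved_orig, loaded, hook):
    table.set_function(saved_orig[hook])   # restore what this hook saw as "original"
    loaded.discard(hook)


table, saved, loaded = JumpTable("Write_impl"), {}, set()
install(table, saved, loaded, "hookA")
install(table, saved, loaded, "hookB")
remove(table, saved, loaded, "hookA")      # wrong order: hookB is silently bypassed
remove(table, saved, loaded, "hookB")      # now the vector points at unloaded hookA
print(call_path(table, saved, loaded))
```

In the model, removing the older hook first drops the newer hook from the chain, and the newer hook's own cleanup then reinstalls a pointer to code that no longer exists.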
|
||||
|
||||
### Is SetFunction safe across CPU architectures?
|
||||
|
||||
On 68000–68060, yes. However, 68040+ systems with data cache enabled may cache the old JMP table entry. Always call `CacheClearU()` after `SetFunction()` on 040/060 to flush the data cache and ensure the new target address is visible to the instruction fetch unit.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- NDK39: `exec/execbase.h`
|
||||
|
|
|
|||
|
|
@ -2,9 +2,34 @@
|
|||
|
||||
# Identifying OS API Calls in Disassembly
|
||||
|
||||
## Background: How AmigaOS Library Calls Work
|
||||
## Overview
|
||||
|
||||
Before diving into identification techniques, it helps to understand the mechanics from first principles.
|
||||
You are staring at a disassembly listing of an unknown Amiga binary. You see `JSR (-$48,A6)` and have no idea what it calls. Multiply this by a thousand such instructions across the binary, and you realize: **without a systematic way to identify every OS call, reverse engineering an Amiga program is impossible.**
|
||||
|
||||
The AmigaOS library calling convention encodes every public function as a negative byte offset from a library base pointer — the **Library Vector Offset (LVO)**. If you know which library base lives in `A6` and what LVO is being called, you know exactly what the code does. This article covers the complete methodology: from raw `JSR (-N,A6)` to a fully annotated disassembly where every OS call is named.
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph "Source Code"
|
||||
SRC["Open(name, mode)"]
|
||||
end
|
||||
subgraph "Compiler Output"
|
||||
ASM["JSR -30(A6)"]
|
||||
end
|
||||
subgraph "Runtime Dispatch"
|
||||
JT["JMP table<br/>entry at -30"]
|
||||
IMPL["dos.library<br/>Open_impl()"]
|
||||
end
|
||||
subgraph ".fd File Mapping"
|
||||
FD["##bias 30<br/>Open(name,accessMode)(d1/d2)"]
|
||||
end
|
||||
SRC -->|"compiles to"| ASM
|
||||
ASM -->|"dispatches through"| JT
|
||||
JT --> IMPL
|
||||
FD -.->|"documents the<br/>LVO→name mapping"| ASM
|
||||
```
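
The LVO for each function follows mechanically from the `.fd` file: the `##bias` directive sets the first positive offset, and every function line after it advances the offset by 6 bytes (one JMP table entry). The parser below is a minimal sketch; `parse_fd` is a hypothetical helper and the sample text is a hand-written excerpt, not a verbatim NDK file.

```python
def parse_fd(text):
    """Map negative LVO -> function name from .fd file text."""
    lvos = {}
    bias = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):       # blank line or comment
            continue
        if line.startswith("##"):                  # directive: base, bias, public, ...
            parts = line[2:].split()
            if parts and parts[0] == "bias":
                bias = int(parts[1])
            continue
        if bias is None:
            raise ValueError("function line before ##bias")
        name = line.split("(", 1)[0]
        lvos[-bias] = name                         # private entries consume a slot too
        bias += 6
    return lvos


SAMPLE_FD = """\
##base _DOSBase
##bias 30
##public
Open(name,accessMode)(d1/d2)
Close(file)(d1)
Read(file,buffer,length)(d1/d2/d3)
Write(file,buffer,length)(d1/d2/d3)
##end
"""

for lvo, name in sorted(parse_fd(SAMPLE_FD).items(), reverse=True):
    print(f"{lvo:5d}  {name}")
```

One table per library, built this way, is the input the later lookup steps need.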
|
||||
|
||||
---
|
||||
|
||||
### What is a Shared Library?
|
||||
|
||||
|
|
@ -287,6 +312,217 @@ If you encounter `JSR (-N,A6)` and don't know which library A6 holds:
|
|||
|
||||
---
|
||||
|
||||
## Decision Guide — Which Lookup Method?
|
||||
|
||||
When you encounter `JSR (-N,A6)`, you have multiple ways to resolve the call. Choose based on what you know:
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
Q["JSR -N,A6:<br/>what does it call?"]
|
||||
Q -->|"A6 is a known library base?"| KNOWN["Look up N in<br/>that library's .fd file"]
|
||||
Q -->|"A6 is unknown"| TRACE["Trace A6 back<br/>to its source"]
|
||||
TRACE -->|"Found global+OpenLibrary"| ID["Identify library<br/>from string arg"]
|
||||
ID --> KNOWN
|
||||
TRACE -->|"Can't trace"| HEUR["Heuristic:<br/>LVO in common ranges?"]
|
||||
HEUR -->|"-30 to -300"| DOS["Likely dos.library"]
|
||||
HEUR -->|"-120 to -558"| EXEC_LIKELY["Likely exec.library"]
|
||||
HEUR -->|"Other"| SEARCH["Search all .fd files<br/>for matching bias"]
|
||||
```
|
||||
|
||||
| Method | Speed | Accuracy | When to Use |
|
||||
|---|---|---|---|
|
||||
| **Known A6 + .fd lookup** | Instant | 100% | You've already identified the library base |
|
||||
| **Trace A6 + find OpenLibrary** | ~2 min | 100% | Unknown library base, need certainty |
|
||||
| **LVO range heuristic** | ~10 sec | ~80% | Quick triage, common LVOs overlap |
|
||||
| **Grep all .fd files** | ~1 min | 95% | Unknown library, LVO not in common ranges |
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### 1. "The Kitchen Sink LVO Table"
|
||||
|
||||
**What it looks like** — loading a massive precomputed table covering every LVO in every library into IDA, then blindly applying it without verifying A6:
|
||||
|
||||
```python
|
||||
# BROKEN — applies ALL LVOs globally, ignores which library A6 holds
|
||||
for lvo, name in ALL_LVOS.items():
|
||||
idc.set_name(base + lvo, name) # wrong: base = Who knows?
|
||||
```
|
||||
|
||||
**Why it fails:** A6 could hold `DOSBase`, `GfxBase`, or `IntuitionBase` at any point. An LVO `-30` means `dos.library Open()` only when A6=`DOSBase`. Applied to `GfxBase`, it's `graphics.library BltBitMap()` — completely different. You get a disassembly full of confidently wrong labels.
|
||||
|
||||
**Correct:** Always identify A6's library first, then apply that specific library's LVO map:
|
||||
|
||||
```python
|
||||
if get_name(global_ptr) == "_DOSBase":
|
||||
apply_lvos(lib_base, DOS_LVO)
|
||||
elif get_name(global_ptr) == "_GfxBase":
|
||||
apply_lvos(lib_base, GFX_LVO)
|
||||
```
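
Outside IDA, the same rule can be expressed as a lookup keyed by library first and LVO second. This is a sketch with a deliberately tiny table; `LVO_TABLES` and `resolve` are hypothetical names, and a real table would be generated from the `.fd` files.

```python
# Only a handful of entries, enough to show the collision at -30.
LVO_TABLES = {
    "dos.library": {-30: "Open", -36: "Close", -42: "Read", -48: "Write"},
    "graphics.library": {-30: "BltBitMap"},
    "exec.library": {-198: "AllocMem", -552: "OpenLibrary"},
}


def resolve(library, lvo):
    """Name a call; refuse to guess when the library in A6 is unknown."""
    if library is not None:
        return LVO_TABLES.get(library, {}).get(lvo, "unknown")
    candidates = sorted(
        f"{lib}:{names[lvo]}" for lib, names in LVO_TABLES.items() if lvo in names
    )
    if not candidates:
        return "unknown"
    return "ambiguous: " + ", ".join(candidates)


print(resolve("dos.library", -30))
print(resolve("graphics.library", -30))
print(resolve(None, -30))
```

Returning an explicit "ambiguous" result keeps unresolved calls visible in the listing instead of labelling them with the first match.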
|
||||
|
||||
### 2. "The Ghost Library"
|
||||
|
||||
**What it looks like** — assuming the first `JSR (-N,A6)` after a `MOVEA.L 4.W, A6` uses the exec library base, but A6 was overwritten between the load and the call:
|
||||
|
||||
```asm
|
||||
MOVEA.L 4.W, A6 ; exec base — but this is never used
|
||||
MOVEA.L (_DOSBase).L, A6 ; A6 overwritten with DOS base
|
||||
JSR (-552,A6) ; WRONG assumption: this is NOT exec OpenLibrary
|
||||
; Correct: LVO -552 for dos.library is MaxCli
|
||||
```
|
||||
|
||||
**Why it fails:** The disassembler shows `JSR (-552,A6)` and annotates it as `exec.library OpenLibrary()` because that's the most common match. But A6 was reloaded with `_DOSBase`, so the actual call is `dos.library MaxCli()` at LVO `-552`. Same LVO, different library, completely different behavior.
|
||||
|
||||
**Correct:** Track A6's value at every `JSR`. Never assume A6 is static across a function.
|
||||
|
||||
```asm
|
||||
MOVEA.L 4.W, A6 ; A6 = SysBase (verified: exec at $4)
|
||||
MOVEA.L (_DOSBase).L, A6 ; A6 = DOSBase (verified: global labeled)
|
||||
JSR (-552,A6) ; LVO -552 in dos.library = MaxCli
|
||||
```
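
Tracking A6 can be partly automated over a text listing. The sketch below is a linear scan under stated assumptions: it understands only the two load forms shown above, ignores branches and calls that may change A6, and its `BASE_GLOBALS` map and `annotate_calls` name are hypothetical.

```python
import re

# Labelled library-base globals (assumed to have been identified already).
BASE_GLOBALS = {
    "_DOSBase": "dos.library",
    "_GfxBase": "graphics.library",
    "_IntuitionBase": "intuition.library",
}

LOAD_EXEC = re.compile(r"MOVEA\.L\s+4\.W,\s*A6", re.I)
LOAD_GLOBAL = re.compile(r"MOVEA\.L\s+\((_\w+)\)\.L,\s*A6", re.I)
CALL = re.compile(r"JSR\s+\((-\d+),A6\)", re.I)


def annotate_calls(listing):
    """Return (library_in_A6, lvo) for every JSR through A6, in order."""
    a6 = None                                  # None = unknown base
    calls = []
    for line in listing.splitlines():
        code = line.split(";", 1)[0]           # drop the comment part
        if LOAD_EXEC.search(code):
            a6 = "exec.library"
        elif (m := LOAD_GLOBAL.search(code)):
            a6 = BASE_GLOBALS.get(m.group(1))  # unlabelled global -> None
        elif (m := CALL.search(code)):
            calls.append((a6, int(m.group(1))))
    return calls


LISTING = """\
    MOVEA.L 4.W, A6
    MOVEA.L (_DOSBase).L, A6
    JSR     (-552,A6)
"""

print(annotate_calls(LISTING))
```

Each `(library, lvo)` pair can then be fed to a per-library LVO table; entries with an unknown library are the ones that still need manual tracing.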
|
||||
|
||||
### 3. "The Stale Base"
|
||||
|
||||
**What it looks like** — calling through A6 after `CloseLibrary()`:
|
||||
|
||||
```c
|
||||
// BROKEN
|
||||
CloseLibrary(DOSBase);
|
||||
if (result)
|
||||
    Write(Output(), "done\n", 5);   // dispatches through the stale DOSBase: crash or call into freed memory
|
||||
```
|
||||
|
||||
In disassembly, you see `JSR (-N,A6)` after a `JSR (-414,A6)` (exec `CloseLibrary`). Once the library's open count reaches zero it may be expunged, at which point the old base becomes a dangling pointer. Any subsequent call through it can hit freed memory: a crash or, worse, silent corruption.
|
||||
|
||||
**Correct:** After `CloseLibrary`, zero the base pointer. In the disassembly, flag any `JSR` that follows a `CloseLibrary` sequence as suspicious.
|
||||
|
||||
---
|
||||
|
||||
## Pitfalls
|
||||
|
||||
### 1. LVO Collisions Across Libraries
|
||||
|
||||
**The bug:**
|
||||
|
||||
```asm
|
||||
MOVEA.L (_IntuitionBase).L, A6
|
||||
JSR (-42,A6) ; Is this Read()? No!
|
||||
```
|
||||
|
||||
**Why:** LVO `-42` is `dos.library` `Read()`, `intuition.library` `AddGadget()`, and `graphics.library` `ClearEOL()`. LVOs are only unique within a single library.
|
||||
|
||||
**Correct:** The library base in A6 disambiguates. Always label the library base global first, then resolve LVOs.
|
||||
|
||||
### 2. Private LVOs in Third-Party Libraries
|
||||
|
||||
**The bug:** Using NDK `.fd` files to resolve calls to a third-party library (e.g., `Miami.library`, `muimaster.library`). The NDK doesn't document these — the LVO table won't match.
|
||||
|
||||
**Correct:** Third-party libraries require third-party `.fd` files. Search Aminet for `"libraryname" fd` or reconstruct the LVO table from the library binary itself (see [library_jmp_table.md](library_jmp_table.md)).
|
||||
|
||||
### 3. Inline Variants — Bypassing the JMP Table
|
||||
|
||||
**The bug:**
|
||||
|
||||
```asm
MOVEA.L (_DOSBase).L, A6
MOVEA.L (-28,A6), A0 ; read the target longword of the entry at LVO -30 (2 bytes past the 4EF9 opcode)
JSR (A0) ; call directly: the JSR itself carries no LVO displacement
```

**Why:** Some compilers (especially GCC with `-fbaserel`) inline the JMP table read. Each entry is a 2-byte `4EF9` opcode followed by the 4-byte target address, so the load displacement is the LVO plus 2. The `JSR (A0)` has no static LVO that grep can match. You must trace A0 back to the `MOVEA.L (-28,A6),A0` to recover LVO `-30`.

**Correct:** When you see `JSR (A0)` or `JSR (An)` with a register, check the preceding instructions for a `MOVEA.L (-N,A6), An`; the LVO is `-(N+2)`.
|
||||
|
||||
---
|
||||
|
||||
## Use-Case Cookbook
|
||||
|
||||
### Find All File I/O Operations
|
||||
|
||||
To identify every file open/read/write/close in a binary:
|
||||
|
||||
1. Search for `JSR (-552,A6)` (exec OpenLibrary) and identify the `dos.library` open
|
||||
2. Label the resulting global `_DOSBase`
|
||||
3. Xref `_DOSBase`: every function that reads this global uses dos.library
|
||||
4. Filter for `JSR (-30,A6)` (Open), `JSR (-42,A6)` (Read), `JSR (-48,A6)` (Write), `JSR (-36,A6)` (Close)
|
||||
5. Cross-reference the `D1` register before each Open call to identify **which files** are being opened
|
||||
|
||||
### Find All Memory Allocations
|
||||
|
||||
1. Search for `JSR (-198,A6)` where A6=`SysBase` (exec AllocMem)
|
||||
2. Note `D0` (size) and `D1` (attributes — `MEMF_CHIP` = `$0002`, `MEMF_FAST` = `$0004`, `MEMF_CLEAR` = `$10000`)
|
||||
3. Identify allocations that request Chip RAM — these are for audio buffers, copper lists, or bitplanes
|
||||
4. Trace the returned pointer in `D0` to find what the allocation is used for
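
Step 2 can be sketched as a small decoder for the `D1` attribute word. This is a minimal illustration, assuming only the three flags listed above plus `MEMF_PUBLIC` (`$0001`); the function name `decode_memf` is hypothetical, and any other bits are reported as unknown rather than guessed.

```python
# Exec memory attribute bits: the three named above, plus MEMF_PUBLIC.
MEMF_FLAGS = {
    0x00001: "MEMF_PUBLIC",
    0x00002: "MEMF_CHIP",
    0x00004: "MEMF_FAST",
    0x10000: "MEMF_CLEAR",
}

def decode_memf(d1):
    """Return (known flag names, leftover unknown bits) for an AllocMem D1 value."""
    names = [name for bit, name in sorted(MEMF_FLAGS.items()) if d1 & bit]
    unknown = d1 & ~sum(MEMF_FLAGS)  # summing the dict adds up its keys (the bit masks)
    return names, unknown

if __name__ == "__main__":
    names, unknown = decode_memf(0x10002)
    print(names, hex(unknown))
```

A non-zero second value means the call used attribute bits outside this small table, which is worth checking against `exec/memory.h` by hand.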
|
||||
|
||||
### Trace an Unknown Message Flow
|
||||
|
||||
1. Find `JSR (-366,A6)` (exec PutMsg) — identifies message senders
|
||||
2. Find `JSR (-372,A6)` (exec GetMsg) — identifies message receivers
|
||||
3. Find `JSR (-384,A6)` (exec WaitPort) — identifies blocking receivers
|
||||
4. Trace `A0` before each `PutMsg` to identify **which port** the message targets
|
||||
5. Trace `A0` before each `GetMsg` to identify **which port** the receiver listens on
|
||||
6. If sender and receiver port names match, you've found a communication pair
|
||||
|
||||
### Map an Application's Library Dependencies
|
||||
|
||||
```python
# IDA Python: dump every library used by a binary
import idautils, idc

def find_library_opens():
    """Find all OpenLibrary calls and print library names."""
    for ea in idautils.Heads():
        # IDA's 68k module may print mnemonics in lower case
        if idc.print_insn_mnem(ea).upper() != 'JSR':
            continue
        # Check if A6 register is used (LVO-style call)
        op = idc.print_operand(ea, 0).upper()
        # -552 may be rendered in decimal or as hex ($228)
        if 'A6' in op and ('-552' in op or '-$228' in op):  # OpenLibrary
            # Walk back one instruction to find the LEA that loads the name
            prev = idc.prev_head(ea)
            if idc.print_insn_mnem(prev).upper() == 'LEA':
                str_addr = idc.get_operand_value(prev, 0)
                lib_name = idc.get_strlit_contents(str_addr)
                if lib_name:
                    print(f"  {ea:08X}: OpenLibrary({lib_name.decode()})")

find_library_opens()
```
|
||||
|
||||
---
|
||||
|
||||
## Cross-Platform Comparison
|
||||
|
||||
| AmigaOS Concept | Win32 Equivalent | Linux ELF Equivalent | Notes |
|
||||
|---|---|---|---|
|
||||
| LVO-based library call (`JSR -30(A6)`) | IAT (Import Address Table) thunk | PLT (Procedure Linkage Table) stub | Both use indirection; Amiga's is register+offset, modern OSes use memory-based tables |
|
||||
| `.fd` file (function descriptor) | `.lib` import library + `GetProcAddress` | `.so` ELF symbol table | `.fd` files are human-readable text; PE/ELF symbol tables are binary |
|
||||
| `OpenLibrary("dos.library", 36)` | `LoadLibrary("kernel32.dll")` | `dlopen("libc.so.6", ...)` | Same pattern: load by name, get base pointer, resolve functions |
|
||||
| Library base in A6 | `HMODULE` returned by `LoadLibrary` | Shared object handle from `dlopen` | Amiga uses a dedicated register convention; Win32/Linux keep the handle in an ordinary variable |
|
||||
| JMP table at negative offsets | IAT entries at RVA offsets | `.got.plt` entries | Amiga's table grows downward from base; PE/ELF tables are at positive offsets |
|
||||
| No lazy symbol resolution (ROM libraries are always resident) | Delay-load DLLs | Lazy binding (the default; `LD_BIND_NOW` disables it) | Amiga ROM libraries are always mapped, but `OpenLibrary()` can still return NULL, e.g. on a version mismatch |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### How do I identify library calls without .fd files?
|
||||
|
||||
If the library is a standard AmigaOS library, `.fd` files are in `NDK39/fd/`. For third-party libraries, search the binary for the JMP table using `4EF9` (the `JMP ABS.L` opcode) clustered at regular 6-byte intervals. See [library_jmp_table.md](library_jmp_table.md).
|
||||
|
||||
### What if the binary uses a custom calling convention?
|
||||
|
||||
Some demos and games bypass the OS calling convention entirely — they call library functions directly by address (no LVO indirection). This is usually done for speed or obfuscation. In these cases, identify calls by the address falling within a known library's code segment, not by LVO pattern.
|
||||
|
||||
### Why does the LVO look wrong — it's not a multiple of 6?
|
||||
|
||||
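Check the `.fd` file's `##bias` value. The bias is the absolute value of the LVO, so `##bias 30` means LVO `−30`, which is slot 4 (`30÷6−1`). Disassemblers often print the displacement in hex: `JSR (-$1E,A6)` is LVO `-30`, because `$1E` = 30 decimal. Always work in decimal when matching `.fd` biases. A displacement that is still not a multiple of 6 after conversion is not a standard library vector.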
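
The conversions involved are small enough to script. The sketch below is illustrative only; the helper names are hypothetical, and it assumes the standard 6-byte vector spacing with slot 0 at LVO `-6`.

```python
def lvo_from_displacement(disp16):
    """Interpret a raw 16-bit displacement word (e.g. 0xFFE2) as a signed LVO."""
    disp16 &= 0xFFFF
    return disp16 - 0x10000 if disp16 & 0x8000 else disp16

def slot_from_lvo(lvo):
    """Map an LVO to its 0-based table slot; slot 0 is LVO -6."""
    if lvo >= 0 or lvo % 6:
        raise ValueError(f"{lvo} is not a valid library vector offset")
    return -lvo // 6 - 1

def lvo_from_bias(bias):
    """An .fd ##bias value is the absolute value of the LVO."""
    return -bias

if __name__ == "__main__":
    lvo = lvo_from_displacement(0xFFE2)
    print(lvo, slot_from_lvo(lvo), lvo_from_bias(30))
```

Raising on a non-multiple of 6 is a deliberate choice here: it flags operands that merely look like library calls.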
|
||||
|
||||
### Can the same LVO appear in two different registers?
|
||||
|
||||
Yes. `JSR (-30,A5)` and `JSR (-30,A6)` are different calls if A5 and A6 hold different library bases. The LVO alone does not identify the call — the **register + LVO pair** does.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- NDK39: `fd/` directory — all library `.fd` files (plain text, open in any editor)
|
||||
|
|
|
|||
|
|
@ -4,7 +4,30 @@
|
|||
|
||||
## Overview
|
||||
|
||||
Manually parsing a HUNK binary from a hex dump is a foundational Amiga RE skill. It reveals segment boundaries, symbol tables, and relocation data before any tool processing.
|
||||
A hex dump. Four bytes: `00 00 03 F3`. That's `HUNK_HEADER` — the start of an Amiga executable. Everything that follows — code segments, data, BSS, relocations, symbols — is a stream of tagged longword records. Reading this stream by hand is the first skill every Amiga reverse engineer should develop: it reveals the skeleton of the binary before any tool touches it, and it catches corrupted files, packing wrappers, and obfuscated headers that automated loaders may silently misinterpret.
|
||||
|
||||
This article walks through manual HUNK parsing from a raw hex dump, covering the header, code/data/BSS segments, HUNK_RELOC32 patching, HUNK_SYMBOL extraction, and HUNK_EXT import/export resolution — all with copy-paste Python scripts.
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph "HUNK Stream"
|
||||
HDR["HUNK_HEADER<br/>000003F3"]
|
||||
CODE["HUNK_CODE<br/>000003E9"]
|
||||
DATA["HUNK_DATA<br/>000003EA"]
|
||||
BSS["HUNK_BSS<br/>000003EB"]
|
||||
RELOC["HUNK_RELOC32<br/>000003EC"]
|
||||
SYM["HUNK_SYMBOL<br/>000003F0"]
|
||||
END["HUNK_END<br/>000003F2"]
|
||||
end
|
||||
HDR -->|"num hunks, sizes"| CODE
|
||||
CODE -->|"code bytes"| DATA
|
||||
DATA -->|"data bytes"| BSS
|
||||
BSS -->|"size only"| RELOC
|
||||
RELOC -->|"patch offsets"| SYM
|
||||
SYM -->|"debug names"| END
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -115,6 +138,107 @@ IDA's Amiga loader applies relocations automatically, so all cross-hunk pointers
|
|||
|
||||
---
|
||||
|
||||
## Decision Guide — HUNK Analysis Scenarios
|
||||
|
||||
| Scenario | Tool | Why |
|
||||
|---|---|---|
|
||||
| Quick symbol dump | `hunkinfo` or hex grep for `$3F0` | Instant, no scripting needed |
|
||||
| Unknown / corrupted file | Manual hex walk (Step 1–2) | Identifies problems automated tools hide |
|
||||
| Full symbol + reloc extraction | Python script (Steps 3–4) | Exports everything for external analysis |
|
||||
| Standard RE in IDA | IDA Amiga HUNK loader | Automatic — no manual steps needed |
|
||||
| Obfuscated / packed binary | Manual hex walk first | Detect non-standard headers before IDA silently fails |
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### 1. "The Missing Relocation"
|
||||
|
||||
**What it looks like** — seeing `MOVE.L #$00000000, Dn` in a HUNK_CODE section and assuming the value is zero:
|
||||
|
||||
```asm
|
||||
MOVE.L #$00000000, D1 ; looks like D1 = 0
|
||||
; But HUNK_RELOC32 at this offset changes it at load time!
|
||||
```
|
||||
|
||||
**Why it fails:** `HUNK_RELOC32` lists the offsets of longwords in CODE/DATA that hold hunk-relative addresses; at load time the loader adds the target hunk's base address to each one. A stored `$00000000` may become `$00123456` after relocation. Without checking the relocation table, you're reading pre-patch values, not the values the program sees at run time.
|
||||
|
||||
**Correct:** Always cross-reference every longword in CODE/DATA against the HUNK_RELOC32 table before interpreting it as a value.
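
To make the patching step concrete, here is a minimal sketch of what the loader does for one relocation group. The function name `apply_reloc32` and the sample load address are assumptions for illustration; a real loader takes the base from wherever it placed the target hunk.

```python
import struct

def apply_reloc32(hunk_data, offsets, target_base):
    """Add target_base to each big-endian longword at the given byte offsets."""
    patched = bytearray(hunk_data)
    for off in offsets:
        (value,) = struct.unpack_from(">I", patched, off)
        struct.pack_into(">I", patched, off, (value + target_base) & 0xFFFFFFFF)
    return bytes(patched)

if __name__ == "__main__":
    # MOVE.L #$00000010,D1 is encoded as 223C 0000 0010; the immediate sits at offset 2
    code = bytes.fromhex("223c00000010")
    print(apply_reloc32(code, [2], 0x00123440).hex())
```

The pre-patch immediate (`$10` here) is the offset inside the target hunk, which is why it is still useful during static analysis.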
|
||||
|
||||
### 2. "The End-of-Hunk Confusion"
|
||||
|
||||
**What it looks like** — finding `000003F2` (HUNK_END) and assuming that's the end of the file:
|
||||
|
||||
```hex
|
||||
000003F2 ← HUNK_END of hunk 0
|
||||
000003E9 ← HUNK_CODE of hunk 1 — file continues!
|
||||
```
|
||||
|
||||
**Why it fails:** `HUNK_END` marks the end of a single hunk (one segment), not the end of the file. Multi-hunk executables have multiple `HUNK_END` markers, one per segment. Stopping at the first one loses all remaining hunks.
|
||||
|
||||
**Correct:** Continue parsing after `HUNK_END` until you reach either EOF or the end of the header-declared hunk count.
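
A walker that follows this rule might look like the sketch below. It is a simplified illustration, not a full loader: it assumes a plain load file and handles only the hunk types named in this article (CODE, DATA, BSS, RELOC32, SYMBOL, END); the function names are hypothetical.

```python
import struct

HUNK_CODE, HUNK_DATA, HUNK_BSS = 0x3E9, 0x3EA, 0x3EB
HUNK_RELOC32, HUNK_SYMBOL = 0x3EC, 0x3F0
HUNK_END, HUNK_HEADER = 0x3F2, 0x3F3

def u32(data, pos):
    """Read one big-endian longword."""
    return struct.unpack_from(">I", data, pos)[0]

def walk_hunks(data):
    """Return (tag, file_offset) records until the declared hunk count is reached."""
    if u32(data, 0) != HUNK_HEADER:
        raise ValueError("not a HUNK load file")
    pos = 4
    while (n := u32(data, pos)) != 0:        # resident library names (usually none)
        pos += 4 + n * 4
    pos += 4                                  # skip the name-list terminator
    first, last = u32(data, pos + 4), u32(data, pos + 8)
    num_hunks = last - first + 1
    pos += 12 + num_hunks * 4                 # table size, first, last, size table

    records, ended = [], 0
    while ended < num_hunks and pos < len(data):
        tag = u32(data, pos) & 0x3FFFFFFF
        records.append((tag, pos))
        pos += 4
        if tag in (HUNK_CODE, HUNK_DATA):
            pos += 4 + (u32(data, pos) & 0x3FFFFFFF) * 4
        elif tag == HUNK_BSS:
            pos += 4                          # size only, no payload
        elif tag == HUNK_RELOC32:
            while (count := u32(data, pos)) != 0:
                pos += 8 + count * 4          # count, target hunk, offsets
            pos += 4
        elif tag == HUNK_SYMBOL:
            while (namelen := u32(data, pos)) != 0:
                pos += 8 + namelen * 4        # length, name longs, value
            pos += 4
        elif tag == HUNK_END:
            ended += 1                        # end of one hunk, not of the file
        else:
            raise ValueError(f"unhandled hunk type {tag:#x} at {pos - 4:#x}")
    return records
```

Stopping on the header-declared hunk count rather than on the first `HUNK_END` is the whole point; an unhandled tag raises instead of being skipped, so unusual files are noticed.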
|
||||
|
||||
---
|
||||
|
||||
## Use-Case Cookbook
|
||||
|
||||
### Detect a Packed Binary (Cruncher Wrapper)
|
||||
|
||||
Packed executables often have unusual hunk structures:
|
||||
|
||||
```bash
|
||||
xxd mybinary | head -4
|
||||
# Normal: 0000 03F3 0000 0000 0000 000N ... (HUNK_HEADER, empty name list, num_hunks = N)
# Packed: 0000 03F3 0000 0000 0000 0001 0000 0000 0000 0000 ... (single hunk, huge size)
|
||||
# → single hunks with massive CODE segments = likely decruncher stub
|
||||
```
|
||||
|
||||
### Extract Strings from a HUNK Binary Without Loading
|
||||
|
||||
```python
import re
import struct
import sys

data = open(sys.argv[1], 'rb').read()

# Heuristic scan: any longword equal to a CODE/DATA tag is treated as a hunk start
i = 0
while i + 8 <= len(data):
    tag = struct.unpack_from('>I', data, i)[0]
    if tag in (0x3E9, 0x3EA):  # CODE or DATA
        size = struct.unpack_from('>I', data, i + 4)[0] & 0x3FFFFFFF
        segment = data[i + 8 : i + 8 + size * 4]
        # Extract printable ASCII runs of 4+ characters
        for m in re.finditer(rb'[\x20-\x7E]{4,}', segment):
            print(f'{i + 8 + m.start():08X}: {m.group().decode("ascii")}')
        i += 8 + size * 4  # skip the payload so its bytes are not re-read as tags
    else:
        i += 4
```
|
||||
|
||||
---
|
||||
|
||||
## Cross-Platform Comparison
|
||||
|
||||
| Amiga Concept | Win32 PE Equivalent | Linux ELF Equivalent | Notes |
|
||||
|---|---|---|---|
|
||||
| HUNK_HEADER | PE `MZ` + `PE\0\0` signature | ELF `\x7FELF` magic | Same: file type identifier at offset 0 |
|
||||
| Hunk sizes in longs | PE section `SizeOfRawData` | ELF `p_filesz` | Amiga uses 32-bit longword units; PE/ELF use bytes |
|
||||
| HUNK_RELOC32 | PE `.reloc` section | ELF `.rela.dyn` | Same purpose: load-time address patching |
|
||||
| HUNK_SYMBOL | PDB debug symbols (external) | ELF `.symtab` (embedded) | Amiga debug symbols in-line; PE keeps them separate |
|
||||
| HUNK_EXT import/export | PE Import/Export Directory | ELF `.dynsym` | Same concept: cross-module symbol resolution |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### Why are hunk sizes in longs, not bytes?
|
||||
|
||||
Every field in the HUNK stream is a 32-bit longword, and hunk payloads are padded to a longword boundary. Counting sizes in longwords keeps the whole file longword-aligned and simplifies the loader. Multiply by 4 to get byte sizes.
|
||||
|
||||
### What's the difference between HUNK_UNIT and HUNK_HEADER?
|
||||
|
||||
`HUNK_UNIT` (`$3E7`) marks an object file (`.o`), intended for linking. `HUNK_HEADER` (`$3F3`) marks a linked executable. Object files contain HUNK_EXT symbols for unresolved references; executables have all references resolved.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- NDK39: `dos/doshunks.h`
|
||||
|
|
|
|||
|
|
@ -4,7 +4,31 @@
|
|||
|
||||
## Overview
|
||||
|
||||
Every AmigaOS library has a **JMP table** at negative offsets from its base pointer. Reconstructing this table maps LVOs to function names and is essential for identifying all OS calls made by a binary under analysis.
|
||||
You've loaded a shared library binary into IDA Pro. It has no symbols. The disassembly shows a block of `JMP ABS.L` instructions at a known negative offset from the structure header — but every target is labeled `sub_1234AB`, `sub_5678CD`. You're staring at the library's **JMP table** — the dispatch mechanism for every public function — and it's entirely opaque.
|
||||
|
||||
Reconstructing the JMP table is the critical first step in any library reverse engineering effort. Once done, every `JSR (-N,A6)` in every application that uses this library becomes readable. This article covers the complete methodology: from raw hex dump to a fully annotated JMP table with function names, argument registers, and LVO mappings.
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Library Base Structure"
|
||||
JMP["JMP table<br/>(negative offsets)<br/>────────────────<br/>lib_base - 42: JMP Read<br/>lib_base - 36: JMP Close<br/>lib_base - 30: JMP Open<br/>lib_base - 6: JMP Open() std"]
|
||||
HEADER["struct Library<br/>(positive offsets)<br/>────────────────<br/>lib_base + 0: lib_Node<br/>lib_base + 16: lib_NegSize<br/>lib_base + 20: lib_Version"]
|
||||
PRIVATE["Private Data<br/>lib_base + N..."]
|
||||
end
|
||||
subgraph "LVO Resolution"
|
||||
FD[".fd file<br/>##bias 30 → Open(d1,d2)"]
|
||||
SCRIPT["IDA Python<br/>apply_lvo_names.py"]
|
||||
end
|
||||
CALLER["Application<br/>JSR -30(A6)"]
|
||||
IMPL["Function<br/>Implementation"]
|
||||
|
||||
CALLER -->|"dispatches via<br/>negative offset"| JMP
|
||||
JMP -->|"JMP abs.l →"| IMPL
|
||||
FD -.->|"maps LVO → name"| JMP
|
||||
SCRIPT -.->|"automated<br/>annotation"| JMP
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -126,7 +150,7 @@ In IDA, this appears as `jsr ($fffffffe2,a6)` with displacement `-30` (`$FFFFFFE
|
|||
|
||||
## Common Library Bases and LVO Tables
|
||||
|
||||
See [`../../../[lvo_table.md](../../04_linking_and_libraries/lvo_table.md)](../../../04_linking_and_libraries/lvo_table.md) for complete LVO offset tables for:
|
||||
See [lvo_table.md](../../04_linking_and_libraries/lvo_table.md) for complete LVO offset tables for:
|
||||
- `exec.library`
|
||||
- `dos.library`
|
||||
- `graphics.library`
|
||||
|
|
@ -134,6 +158,221 @@ See [`../../../[lvo_table.md](../../04_linking_and_libraries/lvo_table.md)](../.
|
|||
|
||||
---
|
||||
|
||||
## Reconstructing Unknown Third-Party Library Tables
|
||||
|
||||
When the `.fd` file is unavailable — common for third-party libraries like `muimaster.library`, `reqtools.library`, or `miami.library` — you must reconstruct the table from the binary.
|
||||
|
||||
### Step 1: Locate the Table by Scanning for JMP Opcodes
|
||||
|
||||
A JMP table is a dense cluster of `4EF9` opcodes at 6-byte intervals:
|
||||
|
||||
```python
# IDA Python: find JMP table clusters
import idc

def find_jmp_tables(min_entries=10):
    """Scan for runs of JMP ABS.L (4EF9) at 6-byte spacing."""
    ea = idc.get_inf_attr(idc.INF_MIN_EA)
    max_ea = idc.get_inf_attr(idc.INF_MAX_EA)
    clusters = []
    while ea < max_ea:
        if idc.get_wide_word(ea) == 0x4EF9:  # JMP ABS.L
            # Count consecutive 4EF9 opcodes at 6-byte intervals
            count = 1
            while (ea + 6 * count < max_ea
                   and idc.get_wide_word(ea + 6 * count) == 0x4EF9):
                count += 1
            if count >= min_entries:
                clusters.append((ea, count))
                ea += 6 * count  # skip past the run so it is reported once
                continue
        ea += 2
    return clusters

for start_ea, count in find_jmp_tables():
    print(f"JMP table at {start_ea:#010x}: {count} entries")
```
|
||||
|
||||
### Step 2: Find the Library Base
|
||||
|
||||
The first JMP table entry (the Open() standard at LVO -6) sits 6 bytes before the library base. The library base itself starts with `struct Library` — identifiable by the `lib_Node.ln_Type` field (NT_LIBRARY = 9) at offset `+8`.
|
||||
|
||||
```c
|
||||
/* Verify we found the right structure: */
|
||||
BYTE type = *(BYTE *)(library_base + 8);
|
||||
if (type == 9) { /* NT_LIBRARY — confirmed */ }
|
||||
```
|
||||
|
||||
### Step 3: Extract Function Names from Debug Strings
|
||||
|
||||
Many libraries contain inline debug strings naming each function. Search for printable ASCII near the JMP targets:
|
||||
|
||||
```python
import idc

def extract_function_names_from_strings(lib_base):
    """Look for function name strings near JMP targets."""
    for lvo in range(-6, -300, -6):
        jmp_ea = lib_base + lvo
        if idc.get_wide_word(jmp_ea) != 0x4EF9:
            continue
        target = idc.get_wide_dword(jmp_ea + 2)
        # Search 64 bytes around target for a null-terminated string
        for offset in range(-32, 32):
            name = idc.get_strlit_contents(target + offset)
            if name and name.isalpha():
                print(f"LVO {lvo:+d}: candidate name '{name.decode('ascii')}'")
                break
```
|
||||
|
||||
### Step 4: Verify by Argument Register Usage
|
||||
|
||||
Cross-reference the reconstructed LVO names with the NDK `.fd` register assignments. If `dos_lib.fd` says `Read(file,buffer,length)(d1,d2,d3)` and the function at LVO -42 uses D1, D2, D3 as arguments, the identification is confirmed.
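
Pulling the register list out of an `.fd` prototype line is easy to automate. The following is a rough sketch, assuming the usual `Name(args)(regs)` layout with registers separated by `,` or `/`; the function name `parse_fd_function` is hypothetical.

```python
import re

FD_LINE = re.compile(r"^(\w+)\(([^)]*)\)\(([^)]*)\)")

def parse_fd_function(line):
    """Split an .fd prototype into (name, argument names, registers), or None."""
    m = FD_LINE.match(line.strip())
    if not m:
        return None  # comment, ##directive, or malformed line
    name, args, regs = m.groups()

    def split(field):
        return [part for part in re.split(r"[,/]", field) if part]

    return name, split(args), [reg.upper() for reg in split(regs)]

if __name__ == "__main__":
    print(parse_fd_function("Read(file,buffer,length)(d1,d2,d3)"))
```

The register list can then be compared with the registers the candidate function actually reads before writing to them.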
|
||||
|
||||
---
|
||||
|
||||
## Decision Guide — Manual vs Automated Reconstruction
|
||||
|
||||
| Criterion | Manual (.fd lookup) | Automated (Python script) |
|
||||
|---|---|---|
|
||||
| **When to use** | Known AmigaOS library with available `.fd` | Unknown or third-party library |
|
||||
| **Speed** | ~5 min per library | ~30 sec (script) + verification |
|
||||
| **Accuracy** | 100% (official documentation) | 80–95% (heuristic string matching) |
|
||||
| **Works without .fd** | No | Yes |
|
||||
| **Handles version differences** | No — single `.fd` per OS version | Yes — reads actual binary |
|
||||
| **Best for** | Standard AmigaOS reverse engineering | Third-party library analysis |
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### 1. "The Ghost Entry"
|
||||
|
||||
**What it looks like** — a JMP table entry pointing to an `RTS` instruction:
|
||||
|
||||
```asm
|
||||
JMP sub_RTS_only ; LVO -156 = dos.library ???
|
||||
; at sub_RTS_only:
|
||||
RTS ; empty function — this is a stub
|
||||
```
|
||||
|
||||
**Why it fails:** Some libraries include **private** or **reserved** LVOs that are intentionally empty stubs. Assuming every JMP entry maps to a real function produces wrong annotations. These stubs exist to reserve table slots for future expansion.
|
||||
|
||||
**Correct:** Check the JMP target for more than just `RTS`. If the target has no meaningful code (just `RTS` or `MOVEQ #0,D0; RTS`), mark it as `_reserved_lvo_N` rather than guessing a function name.
|
||||
|
||||
### 2. "The Wrong LVO Increment"
|
||||
|
||||
**What it looks like** — calculating the LVO with a 4-byte stride (`−4 × slot`) instead of `−6 × (slot + 1)`:

```python
# BROKEN: treats the table as an array of 4-byte pointers
lvo = -4 * slot # wrong: library vectors are 6 bytes each
```

**Why it fails:** Classic AmigaOS library vectors are 6 bytes each: the 2-byte `4EF9` opcode plus a 4-byte absolute target. A 4-byte stride fits an array of plain function pointers, not a JMP table, and it misplaces every LVO.

**Correct:** Always use `LVO = −6 × (slot + 1)`. Verify by checking the opcode at the first slot: it should be `4EF9`.
|
||||
|
||||
### 3. "The Unsorted LVO Map"
|
||||
|
||||
**What it looks like** — applying LVO names in arbitrary order and getting some right, some wrong:
|
||||
|
||||
```python
|
||||
# BROKEN: the dict iteration order may not match the table order
|
||||
for lvo, name in LVO_MAP.items(): # Python 3.6+ preserves insertion order, but 3.5 doesn't
|
||||
    apply_name(base + lvo, name)
|
||||
```
|
||||
|
||||
**Why it fails:** LVO maps are inherently ordered — slot 0 maps to `-6`, slot 1 to `-12`, etc. If the map is applied out of order and a duplicate LVO exists, the wrong name gets applied last and overwrites the correct one.
|
||||
|
||||
**Correct:** Iterate in sorted LVO order and verify each entry against the expected JMP opcode before renaming.
|
||||
|
||||
---
|
||||
|
||||
## Use-Case Cookbook
|
||||
|
||||
### Dump an Unknown Library's Full LVO Table
|
||||
|
||||
```python
# IDA Python: extract and dump the JMP table of any library
import idc

def dump_lvo_table(lib_base_addr, num_entries=50):
    # lib_base_addr is the address of a global that holds the library base
    lib_base = idc.get_wide_dword(lib_base_addr)
    print(f"{'LVO':>6} {'Offset':>10} {'Target':>10} {'Function'}")
    print("-" * 60)
    for slot in range(num_entries):
        lvo = -6 * (slot + 1)
        jmp_ea = lib_base + lvo
        opcode = idc.get_wide_word(jmp_ea)
        if opcode != 0x4EF9:
            break  # end of table
        target = idc.get_wide_dword(jmp_ea + 2)
        name = idc.get_name(target) or f"sub_{target:X}"
        print(f"{lvo:+6d} {jmp_ea:#012x} {target:#010x} {name}")

# Usage: point to the _DOSBase global
dump_lvo_table(idc.get_name_ea_simple("_DOSBase"))
```
|
||||
|
||||
### Cross-Reference All Callers of a Specific Library Function
|
||||
|
||||
Once the JMP table is annotated, every `JSR (-30,A6)` in the disassembly where A6=`DOSBase` resolves to `dos_Open`. To find all callers:
|
||||
|
||||
1. Xref the `dos_Open` function implementation (the target of the JMP entry)
|
||||
2. Filter to only those references from `JSR` instructions (not data)
|
||||
3. Each caller is a function that opens files — trace D1 (filename) to see which files
|
||||
|
||||
### Verify a Reconstructed Table Against the Real .fd File
|
||||
|
||||
```bash
# Host-side script: compare IDA output against NDK .fd
python3 << 'EOF'
import re

fd_lvos = {}
with open("NDK39/fd/dos_lib.fd") as f:
    bias = 0
    for line in f:
        m = re.match(r"##bias\s+(\d+)", line)
        if m:
            bias = int(m.group(1))
            continue
        m = re.match(r"(\w+)\(", line)
        if m and bias:
            fd_lvos[-bias] = m.group(1)
            bias += 6  # each function line advances the bias by one 6-byte slot

# Compare with your reconstruction...
print(f"Found {len(fd_lvos)} functions in dos_lib.fd")
EOF
```
|
||||
|
||||
---
|
||||
|
||||
## Cross-Platform Comparison
|
||||
|
||||
| Amiga Concept | Win32 Equivalent | Linux ELF Equivalent | Notes |
|
||||
|---|---|---|---|
|
||||
| JMP table at negative offsets | COM vtable (always at offset 0) | `.plt` section entries | Amiga's negative-offset design allows the library base pointer to serve double duty |
|
||||
| 6-byte JMP ABS.L entries | 4-byte function pointers in vtable | 16-byte PLT stubs (x86-64) | Amiga entries are executable code, not data pointers |
|
||||
| LVO = −6 × (slot + 1) | vtable index (0-based) | GOT entry offset | Amiga uses byte offsets; COM uses index; ELF uses memory offsets |
|
||||
| `.fd` file maps LVO→name | `.idl` / `.h` COM interface definition | ELF symbol table `.dynsym` | `.fd` is human-readable text; COM/ELF use binary metadata |
|
||||
| Library base from `OpenLibrary()` | `CoCreateInstance()` returns interface ptr | `dlopen()` returns handle | Same pattern: opaque handle resolves to function table |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### How do I know when the JMP table ends?
|
||||
|
||||
The table ends when the pattern `4EF9` at 6-byte spacing breaks. Walking toward the base, the last entry (LVO `-6`) is followed directly by the `struct Library` header at offset 0. The size of the negative area is stored in `lib_NegSize` (a UWORD at offset +16 of `struct Library`), so the table holds at most `lib_NegSize / 6` entries.
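
As a rough sketch, the entry count can be read straight from a dumped library header. The code assumes you already have the bytes starting at the library base; `vector_count` is a hypothetical helper name, and integer division is used because the negative area may be padded beyond the last vector.

```python
import struct

NT_LIBRARY = 9  # lib_Node.ln_Type value for libraries

def vector_count(lib_header):
    """Return how many 6-byte vectors a struct Library image declares."""
    ln_type = lib_header[8]                                  # lib_Node.ln_Type at +8
    (neg_size,) = struct.unpack_from(">H", lib_header, 16)   # lib_NegSize at +16
    if ln_type != NT_LIBRARY:
        raise ValueError(f"ln_Type is {ln_type}, expected NT_LIBRARY ({NT_LIBRARY})")
    return neg_size // 6

if __name__ == "__main__":
    header = bytearray(34)
    header[8] = NT_LIBRARY
    struct.pack_into(">H", header, 16, 30 * 6)
    print(vector_count(bytes(header)))
```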
|
||||
|
||||
### Do AmigaOS 1.x libraries use a different entry size?

No. Kickstart 1.2/1.3 libraries use the same 6-byte `JMP ABS.L` vectors (`4EF9` + 4-byte address) as later releases, so `LVO = −6 × (slot + 1)` holds there too. If the entries at 6-byte spacing do not start with `4EF9`, you are probably not looking at a live JMP table; one candidate is the function-pointer array that the library passes to `MakeLibrary()`.
|
||||
|
||||
### Can SetFunction() break my JMP table reconstruction?
|
||||
|
||||
Yes. `SetFunction()` modifies the JMP table in RAM — the `4EF9` target address changes. If you're analyzing a RAM dump rather than a disk binary, some entries may point to patches rather than original functions. Always note whether your analysis target is a cold binary or a live memory snapshot.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
- NDK39: `fd/` directory — all library `.fd` files
|
||||
|
|
|
|||
|
|
@ -4,7 +4,30 @@
|
|||
|
||||
## Overview
|
||||
|
||||
Different Amiga compilers produce distinct code signatures. Recognising these helps quickly identify compiler origin, locate `main()`, and distinguish OS glue from application logic.
|
||||
You've loaded a HUNK binary into IDA Pro. Before you can even begin tracing logic, you need to answer a basic question: **which compiler produced this code?** The answer determines everything else — whether strings are PC-relative or absolute, whether `main()` starts with `LINK A5` or `SUBQ.L #N,SP`, whether `DIVS.L` is a compiler intrinsic or a library call.
|
||||
|
||||
Amiga compilers — SAS/C, GCC, VBCC, StormC, Aztec C — each leave a **fingerprint** in the generated assembly. These fingerprints are consistent enough that a single function prologue can identify the compiler with >90% accuracy. This article catalogs the distinguishing patterns for each major Amiga compiler and provides a systematic methodology for compiler identification from disassembly alone.
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph "Source"
|
||||
C["C source<br/>myapp.c"]
|
||||
end
|
||||
subgraph "Compiler Backend"
|
||||
SAS["SAS/C<br/>LINK A5 + MOVEM<br/>absolute strings"]
|
||||
G["GCC<br/>SUBQ.L + PC-relative<br/>libgcc helpers"]
|
||||
VB["VBCC<br/>MOVEM only<br/>tight loops"]
|
||||
end
|
||||
subgraph "Disassembly Fingerprint"
|
||||
IDA["IDA Pro / Ghidra"]
|
||||
end
|
||||
C --> SAS
|
||||
C --> G
|
||||
C --> VB
|
||||
SAS -->|"LINK A5 #-N<br/>MOVE.L #_str,D1"| IDA
|
||||
G -->|"SUBQ.L #N,SP<br/>LEA _str(PC),A0"| IDA
|
||||
VB -->|"MOVEM.L D2-D4,-(SP)<br/>minimal overhead"| IDA
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -120,6 +143,74 @@ VBCC's OS call inline expansion looks identical to GCC's inline-asm stubs.
|
|||
|
||||
---
|
||||
|
||||
## StormC 3.x / 4.x Patterns
|
||||
|
||||
StormC was the first native Amiga C++ IDE. It used a custom frontend (based on EDG) but generated Amiga hunk output directly.
|
||||
|
||||
### Function Prologue
|
||||
|
||||
```asm
|
||||
; StormC typical function:
|
||||
LINK A6, #-N ; StormC uses A6 as frame pointer by default
|
||||
MOVEM.L D2-D7/A2-A5, -(SP) ; aggressive register save
|
||||
; ...
|
||||
MOVEM.L (SP)+, D2-D7/A2-A5
|
||||
UNLK A6
|
||||
RTS
|
||||
```
|
||||
|
||||
> [!WARNING]
|
||||
> StormC's use of `A6` as a frame pointer conflicts with the OS convention of `A6` = library base. In StormC-compiled code, A6 near `LINK`/`UNLK` is a frame pointer, NOT a library base. This is the #1 misidentification cause when reversing StormC output.
|
||||
|
||||
### Distinguishing from SAS/C
|
||||
|
||||
| Pattern | SAS/C | StormC |
|
||||
|---|---|---|
|
||||
| Frame pointer register | A5 | A6 |
|
||||
| Preserved registers | D2-D7/A2-A3 | D2-D7/A2-A5 |
|
||||
| Startup module | `__main` | `_main` with C++ static constructor calls |
|
||||
|
||||
---
|
||||
|
||||
## Aztec C 5.x Patterns
|
||||
|
||||
Aztec C (Manx) was a popular budget compiler in the late 1980s. Its code generation is primitive compared to SAS/C or GCC.
|
||||
|
||||
### Distinctive Features
|
||||
|
||||
```asm
|
||||
; Aztec C function — no LINK, uses stack offset from SP directly:
|
||||
MOVE.L D2, -(SP) ; save only what's needed
|
||||
; ...
|
||||
MOVE.L (SP)+, D2
|
||||
RTS
|
||||
```
|
||||
|
||||
Aztec C is identifiable by **absence of LINK/UNLK** combined with **16-bit MOVE** instructions where other compilers use 32-bit (e.g., `MOVE.W D0, 4(SP)` instead of `MOVE.L`). It also generates `JSR ___ltoa` and `JSR ___printf` calls with AZTEC-prefixed helper names.
|
||||
|
||||
---
|
||||
|
||||
## Hand-Coded Assembly (Assembler)
|
||||
|
||||
Not all Amiga code came from a compiler. Demos, games, and high-performance libraries were often hand-written in assembler.
|
||||
|
||||
### Telltale Signs
|
||||
|
||||
| Sign | What It Means |
|
||||
|---|---|
|
||||
| `MOVEM.L D0-D7/A0-A6, -(SP)` | No compiler saves ALL registers — this is hand-coded |
|
||||
| `BTST #6, ($BFE001)` | Direct CIA register read — compilers go through `cia.resource` or `graphics.library` |
|
||||
| `MOVE.W #$4000, ($DFF09A)` | Direct custom chip register write — compilers use OS functions |
|
||||
| `LEA _copperlist(PC), A0` + `MOVE.L A0, ($DFF080)` | Hardware banging with PC-relative addressing |
|
||||
| `MOVE SR, D0` / `ANDI #$F8FF, SR` | Privileged status-register manipulation (here clearing the interrupt priority mask); no compiler generates this |
|
||||
| Missing startup stub | No `MOVE.L 4.W, A6` — the code runs bare-metal |
|
||||
|
||||
If you see direct register pokes to `$DFFxxx` or `$BFExxx` without any OS library calls, you're looking at hand-coded assembly, and standard compiler identification doesn't apply.
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
## Distinguishing Compiler Artefacts from Logic
|
||||
|
||||
| Pattern | Compiler | Meaning |
|
||||
|
|
@ -148,6 +239,158 @@ After identifying the startup stub (`MOVE.L 4.W, A6` → `JSR _OpenLibraries`):
|
|||
|
||||
---
|
||||
|
||||
## Decision Guide — Compiler Identification Flowchart
|
||||
|
||||
```mermaid
|
||||
graph TD
|
||||
Q["Unknown function<br/>prologue seen"]
|
||||
Q -->|"LINK A5, #-N"| SAS["SAS/C 6.x<br/>confirmed"]
|
||||
Q -->|"LINK A6, #-N"| BRANCH["Check register save"]
|
||||
BRANCH -->|"Saves A2-A5"| STORM["StormC<br/>confirmed"]
|
||||
BRANCH -->|"Saves A2-A3"| GCC_RARE["GCC with<br/>-fno-omit-frame-pointer"]
|
||||
Q -->|"SUBQ.L #N,SP<br/>or MOVEM only"| BRANCH2["Check string refs"]
|
||||
BRANCH2 -->|"LEA str(PC), A0"| GCC["GCC (bebbo)<br/>confirmed"]
|
||||
BRANCH2 -->|"MOVE.L #_str, D1"| SAS2["SAS/C (leaf<br/>function)"]
|
||||
Q -->|"No LINK/UNLK<br/>16-bit MOVEs"| AZTEC["Aztec C 5.x<br/>confirmed"]
|
||||
Q -->|"Direct $DFFxxx<br/>register writes"| ASM["Hand-coded<br/>assembly"]
|
||||
```
|
||||
|
||||
| Clue | Compiler | Confidence |
|
||||
|---|---|---|
|
||||
| `LINK A5, #-N` + `MOVEM.L D2-D7/A2-A3` | SAS/C 6.x | >95% |
|
||||
| `LINK A6, #-N` + `MOVEM.L D2-D7/A2-A5` | StormC | >90% |
|
||||
| `SUBQ.L #N, SP` + `LEA (PC), An` | GCC (bebbo) | >95% |
|
||||
| `JSR ___divsi3` / `JSR ___modsi3` | GCC (bebbo) | 100% |
|
||||
| `DIVS.L` with no JSR | SAS/C 6.x | >90% |
|
||||
| `MOVE.W D0, 4(SP)` (16-bit stack ops) | Aztec C | >80% |
|
||||
| Direct `$DFFxxx` write, no OS calls | Assembler | 100% |
|
||||
| `MOVEM.L D0-D7/A0-A6` | Assembler | 100% |
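
The clue table translates almost directly into a rule-based classifier. The sketch below is illustrative only: `guess_compiler` is a hypothetical helper, it matches on text rather than decoded instructions, and it inherits whatever uncertainty the heuristics above carry.

```python
def guess_compiler(prologue):
    """Guess the toolchain from the first few disassembly lines of a function."""
    # Upper-case everything and drop whitespace so "LINK A5, #-16" == "link a5,#-16"
    text = "".join(" ".join(prologue).upper().split())
    if "MOVEM.LD0-D7/A0-A6" in text:
        return "hand-coded assembly"
    if "LINKA5" in text:
        return "SAS/C"
    if "LINKA6" in text:
        return "StormC" if "A2-A5" in text else "GCC (frame pointer kept)"
    if "___DIVSI3" in text or "___MODSI3" in text or "(PC)" in text:
        return "GCC"
    if "$DFF" in text:
        return "hand-coded assembly"
    return "unknown"

if __name__ == "__main__":
    print(guess_compiler(["LINK A5, #-16", "MOVEM.L D2-D7/A2-A3, -(SP)"]))
```

Treat the result as a starting hypothesis and confirm it against a second function or the startup stub.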
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### 1. "The Frame Pointer Confusion"
|
||||
|
||||
**What it looks like** — assuming A6 always holds a library base in StormC-compiled code:
|
||||
|
||||
```asm
|
||||
LINK A6, #-24 ; A6 is now a FRAME POINTER, not a library base
|
||||
MOVEM.L D2-D5/A2-A5, -(SP)
|
||||
MOVEA.L (_DOSBase).L, A6 ; A6 reloaded from the global: only now is it a library base
|
||||
JSR (-30,A6) ; this works only because A6 was reloaded
|
||||
```
|
||||
|
||||
**Why it fails:** StormC uses A6 as the C frame pointer. Between `LINK A6` and the library base reload, A6 points to the stack frame, not a library. In that window a `(-N,A6)` operand is a local-variable access, and a `JSR (-N,A6)` would jump into stack memory rather than a JMP table.
|
||||
|
||||
**Correct:** In StormC output, always verify that A6 was reloaded from a known library global before treating `JSR (-N,A6)` as a library call.
|
||||
|
||||
### 2. "The String Reloc Mirage"
|
||||
|
||||
**What it looks like** — seeing `MOVE.L #$XXXXXXXX, D1` and assuming it's an immediate value when it's actually a relocation:
|
||||
|
||||
```asm
|
||||
    MOVE.L  #$00001234, D1      ; in the raw binary, this is $00001234
                                ; after HUNK_RELOC32, it becomes the actual string address
    JSR     (-948,A6)           ; PutStr(D1 = ???)
|
||||
```
|
||||
|
||||
**Why it fails:** Without parsing `HUNK_RELOC32` entries, `#$00001234` looks like a constant. In fact it is an offset into the target hunk, which the loader (`LoadSeg` in dos.library) patches to an absolute address at load time. The disassembled operand alone does not tell you which string it points to; you need the relocation entry, which names the target hunk.
|
||||
|
||||
**Correct:** Always cross-reference `HUNK_RELOC32` entries (see [hunk_reconstruction.md](hunk_reconstruction.md)) before interpreting `MOVE.L #immediate` as a value in SAS/C output.
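
The cross-referencing step can be sketched outside IDA. The snippet below is a minimal sketch, assuming the usual `HUNK_RELOC32` body layout (a count, a target hunk number, then that many big-endian 32-bit offsets, repeated until a zero count). The function names `parse_reloc32` and `relocation_target` are hypothetical helpers, not part of any existing tool.

```python
import struct

HUNK_RELOC32 = 0x3EC  # hunk type ID that precedes the block parsed below


def parse_reloc32(data, pos):
    """Parse a HUNK_RELOC32 body starting at pos (just after the hunk ID).

    Returns (relocs, new_pos), where relocs maps a target hunk number to the
    set of byte offsets in the current hunk that the loader will patch.
    """
    relocs = {}
    while True:
        (count,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if count == 0:  # a zero count terminates the block
            break
        (target_hunk,) = struct.unpack_from(">I", data, pos)
        pos += 4
        offsets = struct.unpack_from(f">{count}I", data, pos)
        pos += 4 * count
        relocs.setdefault(target_hunk, set()).update(offsets)
    return relocs, pos


def relocation_target(relocs, operand_offset):
    """Return the target hunk if the longword at operand_offset is relocated, else None."""
    for target_hunk, offsets in relocs.items():
        if operand_offset in offsets:
            return target_hunk
    return None


# Hypothetical sample: two relocations into hunk 1, at offsets $02 and $10.
sample = struct.pack(">IIIII", 2, 1, 0x02, 0x10, 0)
relocs, end = parse_reloc32(sample, 0)
```

For `MOVE.L #imm, D1` the 32-bit immediate starts two bytes after the opcode word, so the offset to look up is the instruction's hunk offset plus 2. If `relocation_target` returns a hunk number, the operand is an address in that hunk, not a constant.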
|
||||
|
||||
---
|
||||
|
||||
## Use-Case Cookbook
|
||||
|
||||
### Pattern 1: Identify the Compiler from a Single Function
|
||||
|
||||
```ascii
|
||||
┌─────────────────────────────────┐
|
||||
│ 1. Look at function prologue │
|
||||
│ ├─ LINK A5? → SAS/C │
|
||||
│ ├─ LINK A6? → StormC │
|
||||
│ ├─ SUBQ.L #N,SP? → GCC │
|
||||
│ └─ None? → Continue │
|
||||
│ 2. Look at string references │
|
||||
│ ├─ LEA str(PC),An? → GCC │
|
||||
│ └─ MOVE.L #str,An? → SAS/C │
|
||||
│ 3. Look at division │
|
||||
│ ├─ JSR ___divsi3? → GCC │
|
||||
│ └─ DIVS.L? → SAS/C │
|
||||
│ 4. Look at startup stub │
|
||||
│ ├─ JSR ___main? → GCC │
|
||||
│ └─ JSR _main? → SAS/C │
|
||||
└─────────────────────────────────┘
|
||||
```
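
The checklist above can be approximated as a small rule-based classifier. This is a rough sketch, assuming the disassembly is available as plain instruction strings. `guess_compiler` is a hypothetical helper, and the absolute-string rule (`MOVE.L #str,An`) is deliberately left out because it cannot be decided without relocation data.

```python
import re


def guess_compiler(lines):
    """Guess the compiler from a function's disassembly lines.

    `lines` is a list of instruction strings such as "LINK A5, #-24".
    Returns "SAS/C", "StormC", "GCC", or "unknown".
    """
    text = [re.sub(r"\s+", " ", ln.strip().upper()) for ln in lines]

    # Step 1: prologue
    for ins in text:
        if ins.startswith("LINK A5"):
            return "SAS/C"
        if ins.startswith("LINK A6"):
            return "StormC"
    if text and text[0].startswith("SUBQ.L") and text[0].endswith("SP"):
        return "GCC"

    # Step 2: PC-relative string references
    for ins in text:
        if ins.startswith("LEA ") and "(PC)" in ins:
            return "GCC"

    # Step 3: division helpers versus inline DIVS.L
    for ins in text:
        if "___DIVSI3" in ins or "___MODSI3" in ins:
            return "GCC"
        if ins.startswith("DIVS.L"):
            return "SAS/C"

    # Step 4: startup stub (test ___MAIN first, since it contains _MAIN)
    for ins in text:
        if ins.startswith("JSR ___MAIN"):
            return "GCC"
        if ins.startswith("JSR _MAIN"):
            return "SAS/C"

    return "unknown"
```

Treat the result as a first guess to be confirmed against the confidence table, not as a verdict. A single function can match more than one rule, and the order of the checks decides which one wins.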
|
||||
|
||||
### Pattern 2: Find All Functions in a Compiler-Specific Binary
|
||||
|
||||
SAS/C functions start with `LINK A5, #-N` followed by `MOVEM.L`. Search IDA for:
|
||||
```
|
||||
Search → Text → "LINK A5"
|
||||
```
|
||||
Every hit is a function entry point. Press `P` on each to create an IDA function.
|
||||
|
||||
GCC functions start with `SUBQ.L #N,SP` or `MOVEM.L`. Search for:
|
||||
```
|
||||
Search → Text → "MOVEM.L"
|
||||
```
|
||||
Filter to those NOT preceded by `LINK` — those are GCC leaf or non-leaf functions.
|
||||
|
||||
### Pattern 3: Distinguish OS Glue Code from Application Logic
|
||||
|
||||
OS glue (the startup stub + compiler helper functions) precedes `main()` and follows a fixed pattern:
|
||||
|
||||
```asm
|
||||
; Universal OS glue pattern:
|
||||
MOVE.L 4.W, A6 ; SysBase
|
||||
; ... library opens ...
|
||||
JSR _main ; application logic starts HERE
|
||||
; ... library closes ...
|
||||
MOVEQ #0, D0 ; return 0
|
||||
RTS ; back to DOS
|
||||
```
|
||||
|
||||
Everything before the `JSR _main` is compiler/OS glue — skip it when tracing application logic.
|
||||
|
||||
---
|
||||
|
||||
## Cross-Platform Comparison
|
||||
|
||||
| Amiga Concept | Win32 Equivalent | Linux ELF Equivalent | Notes |
|
||||
|---|---|---|---|
|
||||
| SAS/C `LINK A5` prologue | MSVC `push ebp; mov ebp, esp` | GCC `push rbp; mov rbp, rsp` | Same frame-pointer setup, different register |
|
||||
| GCC PC-relative strings | Position-independent code (`/DYNAMICBASE`) | `-fPIC` + GOT-relative access | Same goal: eliminate relocations for security/performance |
|
||||
| SAS/C absolute strings | Non-PIE executables | Non-PIE, absolute addresses | Relocation-heavy; simpler but slower to load |
|
||||
| Compiler fingerprinting | `.rdata` section compiler strings | `.comment` ELF section | Amiga has NO embedded compiler ID — must deduce from code patterns |
|
||||
| `JSR ___divsi3` (libgcc) | `__alldiv` (MSVC runtime) | `__divdi3` (libgcc) | All compilers call helper functions for complex operations |
|
||||
| HUNK_RELOC32 in disassembly | PE `.reloc` section | ELF `.rela.dyn` | Same concept; Amiga relocs are embedded in the hunk stream |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### Can a single binary use multiple compilers?
|
||||
|
||||
Yes — and it's common. An application compiled with SAS/C may link a third-party library compiled with GCC. The startup stub and `main()` follow one compiler's pattern, but library functions (especially if statically linked) may show another compiler's fingerprints. Always identify the compiler for each code segment independently.
|
||||
|
||||
### What about the AmigaOS ROM itself?
|
||||
|
||||
The Kickstart ROM was compiled with Green Hills C (earlier 1.x versions) or Lattice/SAS C (2.0 and later). ROM code is identifiable by its use of **absolute addresses** rather than base-relative addressing. The startup stub is absent; ROM modules are located through RomTag (`struct Resident`) structures.
|
||||
|
||||
### How do I tell SAS/C 5.x from SAS/C 6.x?
|
||||
|
||||
SAS/C 6.x generates `MOVEM.L D2-D7/A2-A3` in prologues, while SAS/C 5.x saves fewer registers (`MOVEM.L D2-D5`). Also, 6.x uses `LINK.W A5, #-N` for small frames and `LINK.L A5, #-N` (a 68020+ instruction) for large ones; 5.x only uses the `.W` form.
|
||||
|
||||
### Does this work for C++ code?
|
||||
|
||||
StormC is the primary Amiga C++ compiler. C++ code is identifiable by:
|
||||
- `JSR ___nw__FUl` (operator new) calls
|
||||
- Virtual function tables — arrays of function pointers in the data hunk
|
||||
- `this` pointer in A0 (StormC convention) for method calls
|
||||
- Static constructor calls in the startup sequence
|
||||
|
||||
## References
|
||||
|
||||
- SAS/C 6.x manual — code generation chapter
|
||||
|
|
|
|||
|
|
@ -4,7 +4,33 @@
|
|||
|
||||
## Overview
|
||||
|
||||
String references are the fastest entry point into a disassembled Amiga binary. Library name strings, error messages, and format strings immediately reveal program intent and identify OS API usage patterns.
|
||||
A binary is a sea of bytes. Most of it is unintelligible machine code. But floating in that sea are islands of ASCII: library names, error messages, format strings, screen titles. Each string is a **label on a code path** — the first thing a reverse engineer should find, because it's the only human-readable content in the entire binary.
|
||||
|
||||
String cross-reference analysis is the fastest entry point into an unknown Amiga binary. Find the `.library` strings → find `OpenLibrary` calls → identify every OS API the program uses. Find error messages → find the error-handling code paths. Find format strings → find printf/logging sites → understand program flow. This article covers the complete string-driven RE methodology.
|
||||
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "String Types"
|
||||
LIB[".library strings<br/>→ OpenLibrary calls"]
|
||||
ERR["Error messages<br/>→ failure code paths"]
|
||||
FMT["Format strings<br/>→ printf/logging"]
|
||||
TITLE["Screen/window titles<br/>→ product identity"]
|
||||
PATH["File path strings<br/>→ file I/O targets"]
|
||||
end
|
||||
subgraph "What They Reveal"
|
||||
API["API usage map"]
|
||||
FLOW["Program flow"]
|
||||
ID["Product name/version"]
|
||||
FILES["File access patterns"]
|
||||
end
|
||||
LIB --> API
|
||||
ERR --> FLOW
|
||||
FMT --> FLOW
|
||||
TITLE --> ID
|
||||
PATH --> FILES
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -112,6 +138,115 @@ for s in idautils.Strings():
|
|||
for ref in refs:
|
||||
func = idc.get_func_name(ref.frm)
|
||||
print(f"{s.ea:#x} [{text!r:40s}] ← {func or 'unknown'} @ {ref.frm:#x}")
|
||||
|
||||
---
|
||||
|
||||
## Decision Guide — String-Driven Entry Points
|
||||
|
||||
| String Type | What to Do First | What It Tells You |
|
||||
|---|---|---|
|
||||
| `".library"` | Xref → find OpenLibrary | Every OS API the program uses |
|
||||
| `"Error:"` / `"Can't"` / `"Failed"` | Xref → error handler | Failure code paths, rare branches |
|
||||
| `"%d"` / `"%s"` / `"%ld"` | Xref → VPrintf/printf | Logging sites, parameter types |
|
||||
| File paths (`"SYS:"`, `"LIBS:"`, `"PROGDIR:"`) | Xref → Open/Lock/LoadSeg | File I/O targets |
|
||||
| Screen/window titles | Xref → OpenScreen/OpenWindow | Application identity, version |
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### 1. "The Dead String"
|
||||
|
||||
**What it looks like** — finding an error string with no cross-references and assuming the code path is unreachable:
|
||||
|
||||
```asm
|
||||
LEA _err_fatal(PC), A0 ; "FATAL: disk error"
|
||||
; No xref to this string — but it's used via computed address!
|
||||
```
|
||||
|
||||
**Why it fails:** Some programs build string addresses dynamically (e.g., through a string table indexed at runtime). IDA won't detect these as xrefs. The string IS used — just not through a static reference.
|
||||
|
||||
**Correct:** For strings without xrefs, check if they're part of a larger string table (consecutive string data). If so, a function loading a base address + computed offset may reference them dynamically.
|
||||
|
||||
### 2. "The Null Bait"
|
||||
|
||||
**What it looks like** — IDA showing a 100-character "string" because it didn't stop at an embedded null:
|
||||
|
||||
```asm
|
||||
; Some SAS/C internal data uses Pascal-style (length-prefixed) strings
    DC.B    $0D, "Hello, World!", 0   ; length byte = 13, then data, then a trailing null
; IDA sees only "Hello, World!" and misses the length byte
|
||||
```
|
||||
|
||||
**Why it fails:** SAS/C uses Pascal-style strings (length byte prefix) for some internal data. IDA's C-style null-terminated string detection stops at the first null and may misinterpret string boundaries.
|
||||
|
||||
**Correct:** Check the byte before the string. If it equals the string length, it's a Pascal string — the string starts at that byte, not after it.
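
The "check the byte before the string" test is easy to automate on raw hunk data. The following is a minimal sketch, assuming the data hunk is available as a `bytes` object. `length_prefixed_string_at` is a hypothetical helper, and the printable-character test is a heuristic, so short matches can be coincidental.

```python
def length_prefixed_string_at(data, offset, min_len=2):
    """Return the string if data[offset] is a length byte followed by exactly
    that many printable Latin-1 characters; otherwise return None."""
    if offset >= len(data):
        return None
    length = data[offset]
    body = data[offset + 1 : offset + 1 + length]
    if length < min_len or len(body) != length:
        return None
    # Printable ASCII or the printable upper half of Latin-1
    if not all(0x20 <= b <= 0x7E or b >= 0xA0 for b in body):
        return None
    return body.decode("latin-1")


# Hypothetical sample: length byte 13, the text, then a trailing null.
blob = bytes([0x0D]) + b"Hello, World!" + b"\x00"
```

Call it with the offset one byte before the address where IDA placed the string. A non-`None` result suggests the item should start at the length byte.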
|
||||
|
||||
---
|
||||
|
||||
## Use-Case Cookbook
|
||||
|
||||
### Map Every OS API Call from Strings Alone
|
||||
|
||||
```python
|
||||
# IDA Python: from .library strings → OpenLibrary → all calls
import idautils, idc

LIBRARIES = {}
for s in idautils.Strings():
    text = str(s)
    if text.endswith('.library'):
        for xref in idautils.XrefsTo(s.ea):
            # Walk forward from xref to find JSR (-552,A6), i.e. OpenLibrary
            ea = xref.frm
            for _ in range(20):
                # Compare case-insensitively: mnemonic case depends on the processor module
                if idc.print_insn_mnem(ea).upper() == 'JSR':
                    op = idc.print_operand(ea, 0)
                    if '-552' in op:
                        # Find where D0 (result) is stored
                        next_ea = idc.next_head(ea)
                        if idc.print_insn_mnem(next_ea).upper().startswith('MOVE'):
                            # Operand 0 is the source (D0); operand 1 is the destination
                            dest = idc.print_operand(next_ea, 1)
                            LIBRARIES[text] = dest
                            print(f"{text} → stored at {dest}")
                        break  # stop at the first OpenLibrary call after this xref
                ea = idc.next_head(ea)
|
||||
```
|
||||
|
||||
### Find All Version Strings
|
||||
|
||||
Version strings often follow the pattern `"$VER: name version (date)"`:
|
||||
|
||||
```bash
|
||||
strings mybinary | grep -i '\$VER:'
|
||||
# Output: $VER: MyApp 1.23 (12.04.1993)
|
||||
```
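
Where a host-side `strings` tool is unavailable, or the version string contains Latin-1 characters, the same search can be done in a few lines of Python. This is a minimal sketch; `find_version_strings` is a hypothetical helper, and it assumes each `$VER:` string ends at a null byte or line break.

```python
import re

# "$VER:" followed by everything up to the next NUL, CR, or LF
VER_PATTERN = re.compile(rb"\$VER:\s*([^\x00\r\n]+)")


def find_version_strings(data):
    """Return all $VER: payloads found in raw bytes, decoded as Latin-1."""
    return [m.group(1).decode("latin-1").strip() for m in VER_PATTERN.finditer(data)]


# Hypothetical sample data standing in for the contents of a binary.
sample = b"\x00\x00$VER: MyApp 1.23 (12.04.1993)\x00junk"
versions = find_version_strings(sample)
```

For a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes in unchanged.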
|
||||
|
||||
---
|
||||
|
||||
## Cross-Platform Comparison
|
||||
|
||||
| Amiga Concept | Win32 Equivalent | Linux Equivalent | Notes |
|
||||
|---|---|---|---|
|
||||
| `.library` strings → OpenLibrary | `.dll` strings → LoadLibrary | `.so` strings → dlopen | Same pattern: string identifies dynamically loaded module |
|
||||
| String xref analysis | `strings.exe` + IDA cross-reference | `strings` + radare2/Ghidra xref | Universal RE technique: strings are the first foothold |
|
||||
| SAS/C Pascal strings | Delphi/BCB short strings | N/A (C-dominated ecosystem) | Pascal-style strings are rare outside Amiga SAS/C |
|
||||
| `$VER:` version string convention | `VS_VERSION_INFO` resource | `.comment` ELF section | Amiga's convention is informal but widely followed |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### Why do some strings have no xrefs in IDA?
|
||||
|
||||
Possible causes: (1) the string is referenced via a computed address (base+index), (2) the string is in a data table accessed by offset, (3) the string is dead code from a library compiled in but never called, (4) IDA's string detection split a long string incorrectly.
|
||||
|
||||
### How do I handle non-ASCII strings (German umlauts, etc.)?
|
||||
|
||||
Amiga uses ISO 8859-1 (Latin-1) encoding. Characters above `$7F` are valid Latin-1 but may display incorrectly in IDA's default ASCII view. Set IDA's string encoding to Latin-1. Wide-string types such as `STRTYPE_C_16` do not apply here, because Latin-1 is a single-byte encoding.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
```
|
||||
|
||||
---
|
||||
|
|
|
|||
|
|
@ -4,7 +4,26 @@
|
|||
|
||||
## Overview
|
||||
|
||||
Amiga executables use OS structures extensively — `ExecBase`, `Node`, `Process`, `IORequest`, etc. This document describes how to recover and annotate these structures in disassembly by matching field access patterns against NDK39 header offsets.
|
||||
You see `MOVE.L ($17A,A6), A0` in disassembly. You know A6 is SysBase. But `+$17A` — what field is that? Without structure definitions, every offset is just a number. With them, `+$17A` becomes `SysBase->LibList` and the disassembly transforms from arithmetic to narrative.
|
||||
|
||||
Amiga executables are built on a deep stack of OS structures — `ExecBase`, `Node`, `List`, `Task`, `Process`, `IORequest`, `Message`, `MsgPort`. Recovering these structures in disassembly means matching **base register + constant offset** patterns against the NDK 3.9 header definitions. This article covers the methodology, the most commonly encountered structures, and the IDA Pro workflows that automate the process.
|
||||
|
||||
```mermaid
|
||||
graph LR
|
||||
subgraph "Disassembly"
|
||||
RAW["MOVE.L ($17A,A6), A0"]
|
||||
end
|
||||
subgraph "NDK Headers"
|
||||
HDR["exec/execbase.h<br/>LibList at +0x17A"]
|
||||
end
|
||||
subgraph "Annotated"
|
||||
ANN["SysBase->LibList.lh_Head<br/>= first library node"]
|
||||
end
|
||||
RAW -->|"match offset"| HDR
|
||||
HDR -->|"apply structure type"| ANN
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -105,6 +124,128 @@ Use `File → Load file → Parse C header file` → select `exec/execbase.h`, `
|
|||
|
||||
---
|
||||
|
||||
## Decision Guide — When to Use Each Approach
|
||||
|
||||
| Approach | Speed | Accuracy | Best For |
|
||||
|---|---|---|---|
|
||||
| **`.fd` file mapping** | Instant | 100% (known libs) | Library function identification — not structure recovery |
|
||||
| **Manual offset matching** | 1–5 min per struct | 100% (verified against NDK) | Small structures or one-off field identification |
|
||||
| **IDA Structure subview + `T` hotkey** | 30 sec | 100% (if struct defined) | Batch annotation of known structures |
|
||||
| **Parse C header file** | 1 min setup | 100% | Importing full NDK type system |
|
||||
| **Heuristic: offset clustering** | ~2 min | 70–90% | Unknown structures — group accesses by register, infer field boundaries |
|
||||
|
||||
---
|
||||
|
||||
## Named Antipatterns
|
||||
|
||||
### 1. "The Offset Hallucination"
|
||||
|
||||
**What it looks like** — assuming `($1A,A6)` is always `lib_Version` because the number looks right:
|
||||
|
||||
```asm
|
||||
    MOVE.W  ($1A,A6), D0        ; looks like version?
; Actually: lib_Version is at +$14 (offset 20), NOT +$1A (offset 26)
; +$1A = low word of lib_IdString (APTR at +$18)
|
||||
```
|
||||
|
||||
**Why it fails:** Hex offsets are deceptive. `$14` and `$1A` differ by only 6 bytes, yet `+$1A` lands past `lib_Revision` (`+$16`) and inside `lib_IdString` (`+$18`). Without the header definition, such wrong-field errors are invisible until runtime.
|
||||
|
||||
**Correct:** Always verify against the NDK header. `lib_Version` is at `+$14` (UWORD), not `+$1A`.
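
Offset checks like this are mechanical enough to script. The sketch below computes the `struct Library` layout from field sizes and reports which field a given offset falls into. The field list is transcribed from memory of `exec/libraries.h` and should be verified against the NDK header; `layout` and `field_at` are hypothetical helpers.

```python
# (name, size in bytes) for struct Library. Assumption: transcribed from
# exec/libraries.h; verify against your NDK copy before relying on it.
LIBRARY_FIELDS = [
    ("lib_Node.ln_Succ", 4),
    ("lib_Node.ln_Pred", 4),
    ("lib_Node.ln_Type", 1),
    ("lib_Node.ln_Pri", 1),
    ("lib_Node.ln_Name", 4),
    ("lib_Flags", 1),
    ("lib_pad", 1),
    ("lib_NegSize", 2),
    ("lib_PosSize", 2),
    ("lib_Version", 2),
    ("lib_Revision", 2),
    ("lib_IdString", 4),
    ("lib_Sum", 4),
    ("lib_OpenCnt", 2),
]


def layout(fields):
    """Return [(offset, size, name)] by summing sizes (these fields need no padding)."""
    out, offset = [], 0
    for name, size in fields:
        out.append((offset, size, name))
        offset += size
    return out


def field_at(fields, offset):
    """Return (field name, byte offset within that field), or None if out of range."""
    for start, size, name in layout(fields):
        if start <= offset < start + size:
            return name, offset - start
    return None
```

With this table, `field_at(LIBRARY_FIELDS, 0x14)` resolves to `lib_Version`, while `0x1A` resolves to the second word of `lib_IdString`.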
|
||||
|
||||
### 2. "The Nested Structure Blur"
|
||||
|
||||
**What it looks like** — accessing `SysBase->LibList.lh_Head` but interpreting it as `SysBase->TaskWait.lh_Head`:
|
||||
|
||||
```asm
|
||||
    MOVEA.L ($17A,A6), A0       ; +$17A = LibList (correct)
; Not: +$1A4 = TaskWait, which is a different list entirely
|
||||
```
|
||||
|
||||
**Why it fails:** `SysBase` has multiple `struct List` fields. `MemList` (`+$142`), `DeviceList` (`+$15E`), `LibList` (`+$17A`), `TaskReady` (`+$196`), `TaskWait` (`+$1A4`) all use the same `lh_Head` access pattern. Without checking the exact offset, you'll identify the wrong list.
|
||||
|
||||
**Correct:** The offset is the discriminator. `+$17A` = LibList, `+$196` = TaskReady. Never guess.
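
A lookup table removes the guesswork. The sketch below maps an `ExecBase` offset to the system list it falls in. The offsets are the ones I recall from `exec/execbase.h` (each `struct List` is 14 bytes) and should be checked against the NDK header; `execbase_list_at` is a hypothetical helper.

```python
# ExecBase system lists. Assumption: offsets as recalled from exec/execbase.h;
# verify against the NDK before relying on them.
EXECBASE_LISTS = {
    0x142: "MemList",
    0x150: "ResourceList",
    0x15E: "DeviceList",
    0x16C: "IntrList",
    0x17A: "LibList",
    0x188: "PortList",
    0x196: "TaskReady",
    0x1A4: "TaskWait",
}
LIST_SIZE = 14  # sizeof(struct List)
LIST_FIELDS = {0: "lh_Head", 4: "lh_Tail", 8: "lh_TailPred"}


def execbase_list_at(offset):
    """Return 'ListName.field' for an ExecBase offset, or None if it is not in a list."""
    for base, name in EXECBASE_LISTS.items():
        if base <= offset < base + LIST_SIZE:
            delta = offset - base
            field = LIST_FIELDS.get(delta)
            return f"{name}.{field}" if field else f"{name}+{delta}"
    return None
```

For the instruction in the example, `execbase_list_at(0x17A)` yields `LibList.lh_Head`.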
|
||||
|
||||
---
|
||||
|
||||
## Use-Case Cookbook
|
||||
|
||||
### Recover an Unknown Allocator — Trace AllocMem → FreeMem Pair
|
||||
|
||||
```asm
|
||||
; Find the alloc:
    MOVE.L  #$1000, D0          ; size = 4096 (too large for MOVEQ's 8-bit immediate)
    MOVE.L  #$10001, D1         ; MEMF_CLEAR | MEMF_PUBLIC
    MOVEA.L 4.W, A6
    JSR     (-198,A6)           ; AllocMem → D0 = ptr
    MOVEA.L D0, A4

; ... later, find the free:
    MOVEA.L A4, A1
    MOVE.L  #$1000, D0
    JSR     (-210,A6)           ; FreeMem(A1=ptr, D0=size)
|
||||
```
|
||||
|
||||
Trace D0 from AllocMem through the function to identify **which struct** is being allocated. If the code then accesses `($14,A4)`, you have a `struct Library` allocation.
|
||||
|
||||
### Batch-Annotate All ExecBase Accesses
|
||||
|
||||
```python
|
||||
# IDA Python: apply ExecBase structure to all SysBase-relative accesses
import idautils
import idc


def apply_execbase_structure():
    sid = idc.get_struc_id("ExecBase")
    if sid == idc.BADADDR:
        idc.import_type(-1, "ExecBase")
        sid = idc.get_struc_id("ExecBase")

    sysbase = idc.get_name_ea_simple("SysBase")
    if sysbase == idc.BADADDR:
        print("SysBase not found")
        return

    # Find all instructions referencing SysBase-relative offsets
    for xref in idautils.XrefsTo(sysbase):
        ea = xref.frm
        # Navigate forward looking for offset(An) operands
        for i in range(10):
            ea = idc.next_head(ea)
            for n in range(2):
                op = idc.print_operand(ea, n)
                # Skip negative displacements: those are library vector (LVO) calls
                if op and '(' in op and 'A6' in op.upper() and '(-' not in op:
                    idc.op_stroff(ea, n, sid, 0)
                    print(f"Applied ExecBase at {ea:#010x}")


apply_execbase_structure()
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Cross-Platform Comparison
|
||||
|
||||
| Amiga Concept | Win32 Equivalent | Linux ELF Equivalent | Notes |
|
||||
|---|---|---|---|
|
||||
| Structure recovery by offset matching | PDB symbol file (debug info) | DWARF debug info `.debug_info` | Amiga lacks embedded debug info — must match offsets manually |
|
||||
| NDK headers as ground truth | Windows SDK headers + PDB | GLibc headers + DWARF | Same idea: header defines layout, disassembly shows access pattern |
|
||||
| `MOVE.L ($14,A6), D0` | `mov eax, [esi+14h]` | `mov rax, [rbp+0x14]` | Universal pattern: base register + constant offset |
|
||||
| IDA `T` hotkey for struct offset | IDA `T` on x86/ARM too | Same | IDA's struct offset annotation is platform-independent |
|
||||
|
||||
---
|
||||
|
||||
## FAQ
|
||||
|
||||
### How do I identify a struct when the base register changes?
|
||||
|
||||
Track register writes backward. If A4 holds a struct pointer but you can't tell which struct, find the last `MOVEA.L ..., A4`. If it came from `AllocMem`, the size in D0 tells you the struct size — match against known struct sizes from NDK. If it came from `OpenLibrary`, it's a library base.
|
||||
|
||||
### What if the OS version changed the struct layout?
|
||||
|
||||
Commodore maintained binary compatibility — fields were appended, never reordered. An offset that works on Kickstart 1.3 also works on 3.1, because the earlier fields are at the same positions. However, fields added in later versions only exist in those versions. Always check `lib_Version` before accessing fields added after OS 1.3.
|
||||
|
||||
---
|
||||
|
||||
## References
|
||||
|
||||
---
|
||||
|
||||
## Exec Node Traversal Loops
|
||||
|
||||
A recurring pattern: walking the `LibList` or `DeviceList`:
|
||||
|
|
|
|||