amiga-bootcamp/05_reversing/static/struct_recovery.md

279 lines
9.2 KiB
Markdown
Raw Permalink Normal View History

[← Home](../../README.md) · [Reverse Engineering](../README.md)
# Recovering Data Structures
## Overview
2026-04-27 18:34:07 -04:00
You see `MOVE.L ($17A,A6), A0` in disassembly. You know A6 is SysBase. But `+$17A` — what field is that? Without structure definitions, every offset is just a number. With them, `+$17A` becomes `SysBase->LibList` and the disassembly transforms from arithmetic to narrative.
Amiga executables are built on a deep stack of OS structures — `ExecBase`, `Node`, `List`, `Task`, `Process`, `IORequest`, `Message`, `MsgPort`. Recovering these structures in disassembly means matching **base register + constant offset** patterns against the NDK 3.9 header definitions. This article covers the methodology, the most commonly encountered structures, and the IDA Pro workflows that automate the process.
```mermaid
graph LR
subgraph "Disassembly"
RAW["MOVE.L ($17A,A6), A0"]
end
subgraph "NDK Headers"
HDR["exec/execbase.h<br/>LibList at +0x17A"]
end
subgraph "Annotated"
ANN["SysBase->LibList.lh_Head<br/>= first library node"]
end
RAW -->|"match offset"| HDR
HDR -->|"apply structure type"| ANN
```
---
---
## The MOVE.L offset(An),Dm Pattern
Structure field accesses appear as:
```asm
MOVEA.L _DOSBase, A6
MOVE.L ($17A,A6), A0 ; SysBase->LibList at offset +0x17A
MOVE.L (A0), A1 ; lh_Head
```
The key is the **base register** and **constant offset**. Match the offset against known structure definitions.
---
## Common Structures and Key Offsets
### `struct ExecBase` (at absolute address `$4`)
| Offset | Field | Type |
|---|---|---|
| +0 | `LibNode` | `struct Library` |
| +0x128 | `TaskReady` | `struct List` |
| +0x132 | `TaskWait` | `struct List` |
| +0x17A | `LibList` | `struct List` |
| +0x182 | `DeviceList` | `struct List` |
| +0x21E | `ChipRevBits0` | `UWORD` |
| +0x280 | `MemList` | `struct List` |
### `struct Node` (8 bytes)
| Offset | Field |
|---|---|
| +0 | `ln_Succ` (next node) |
| +4 | `ln_Pred` (prev node) |
| +8 | `ln_Type` (UBYTE) |
| +9 | `ln_Pri` (BYTE priority) |
| +10 | `ln_Name` (STRPTR) |
List traversal:
```asm
MOVEA.L lh_Head, A0 ; first node
.loop:
TST.L (A0) ; ln_Succ == NULL?
BEQ.S .done
; process node at A0
MOVEA.L (A0), A0 ; A0 = ln_Succ
BRA.S .loop
```
### `struct Process` (extends `struct Task`)
| Offset | Field |
|---|---|
| +0 | `pr_Task` (struct Task) |
| +92 | `pr_MsgPort` |
| +128 | `pr_CLI` (BPTR, non-NULL if CLI) |
| +172 | `pr_SegList` (BPTR to segment list) |
Detection in disassembly:
```asm
MOVE.L ($80,A4), D0 ; pr_CLI at offset +0x80
BEQ.S .wb_launch ; NULL = Workbench
```
### `struct IORequest` / `struct IOStdReq`
| Offset | Field |
|---|---|
| +0 | `io_Message` (struct Message) |
| +20 | `io_Device` |
| +24 | `io_Unit` |
| +28 | `io_Command` (UWORD) |
| +30 | `io_Flags` (UBYTE) |
| +32 | `io_Error` (BYTE) |
| +36 | `io_Length` (ULONG) |
| +40 | `io_Actual` (ULONG) |
| +44 | `io_Data` (APTR) |
| +48 | `io_Offset` (ULONG) |
---
## Annotating Structures in IDA Pro
### Define a structure type:
1. `View → Open Subviews → Local Types``Insert` → paste C struct definition
2. IDA parses the struct and creates a type entry
3. Navigate to the base register in disassembly
4. Press `T` (structure offset) on any `offset(An)` operand
5. Select the struct type → all accesses auto-annotated
### Import NDK39 headers:
Use `File → Load file → Parse C header file` → select `exec/execbase.h`, `exec/tasks.h`, etc. from NDK39.
---
2026-04-27 18:34:07 -04:00
## Decision Guide — When to Use Each Approach
| Approach | Speed | Accuracy | Best For |
|---|---|---|---|
| **`.fd` file mapping** | Instant | 100% (known libs) | Library function identification — not structure recovery |
| **Manual offset matching** | 15 min per struct | 100% (verified against NDK) | Small structures or one-off field identification |
| **IDA Structure subview + `T` hotkey** | 30 sec | 100% (if struct defined) | Batch annotation of known structures |
| **Parse C header file** | 1 min setup | 100% | Importing full NDK type system |
| **Heuristic: offset clustering** | ~2 min | 7090% | Unknown structures — group accesses by register, infer field boundaries |
---
## Named Antipatterns
### 1. "The Offset Hallucination"
**What it looks like** — assuming `($1A,A6)` is always `lib_Version` because the number looks right:
```asm
MOVE.W ($1A,A6), D0 ; looks like version?
; Actually: lib_Version is at +$14 (offset 20), NOT +$1A (offset 26)
; +$1A = lib_Node.ln_Name (upper word of STRPTR)
```
**Why it fails:** Hex offsets are deceptive. `$14` and `$1A` differ by 6 bytes — one field apart in a packed structure. Without the header definition, off-by-one-field errors are invisible until runtime.
**Correct:** Always verify against the NDK header. `lib_Version` is at `+$14` (UWORD), not `+$1A`.
### 2. "The Nested Structure Blur"
**What it looks like** — accessing `SysBase->LibList.lh_Head` but interpreting it as `SysBase->TaskWait.lh_Head`:
```asm
MOVEA.L ($17A,A6), A0 ; +$17A = LibList (correct)
; Not: +$132 = TaskWait — that's a different list entirely
```
**Why it fails:** `SysBase` has multiple `struct List` fields. `LibList` (`+$17A`), `DeviceList` (`+$182`), `TaskReady` (`+$128`), `TaskWait` (`+$132`), `MemList` (`+$280`) — all use the same `lh_Head` access pattern. Without checking the exact offset, you'll identify the wrong list.
**Correct:** The offset is the discriminator. `+$17A` = LibList, `+$128` = TaskReady. Never guess.
---
## Use-Case Cookbook
### Recover an Unknown Allocator — Trace AllocMem → FreeMem Pair
```asm
; Find the alloc:
MOVEQ #$1000, D0 ; size = 4096
MOVE.L #$10002, D1 ; MEMF_CLEAR | MEMF_PUBLIC
MOVEA.L 4.W, A6
JSR (-198,A6) ; AllocMem → D0 = ptr
MOVEA.L D0, A4
; ... later, find the free:
MOVEA.L A4, A1
MOVEQ #$1000, D0
JSR (-210,A6) ; FreeMem(A1=ptr, D0=size)
```
Trace D0 from AllocMem through the function to identify **which struct** is being allocated. If the code then accesses `($14,A4)`, you have a `struct Library` allocation.
### Batch-Annotate All ExecBase Accesses
```python
# IDA Python: apply ExecBase structure to all SysBase-relative accesses
def apply_execbase_structure():
sid = idc.get_struc_id("ExecBase")
if sid == idc.BADADDR:
idc.import_type(-1, "ExecBase")
sid = idc.get_struc_id("ExecBase")
sysbase = idc.get_name_ea_simple("SysBase")
if sysbase == idc.BADADDR:
print("SysBase not found")
return
# Find all instructions referencing SysBase-relative offsets
for xref in idautils.XrefsTo(sysbase):
ea = xref.frm
# Navigate forward looking for offset(An) operands
for i in range(10):
ea = idc.next_head(ea)
for n in range(2):
op = idc.print_operand(ea, n)
if op and '(' in op and 'A6' in op:
idc.op_stroff(ea, n, sid, 0)
print(f"Applied ExecBase at {ea:#010x}")
apply_execbase_structure()
```
---
## Cross-Platform Comparison
| Amiga Concept | Win32 Equivalent | Linux ELF Equivalent | Notes |
|---|---|---|---|
| Structure recovery by offset matching | PDB symbol file (debug info) | DWARF debug info `.debug_info` | Amiga lacks embedded debug info — must match offsets manually |
| NDK headers as ground truth | Windows SDK headers + PDB | GLibc headers + DWARF | Same idea: header defines layout, disassembly shows access pattern |
| `MOVE.L ($14,A6), D0` | `mov eax, [esi+14h]` | `mov rax, [rbp+0x14]` | Universal pattern: base register + constant offset |
| IDA `T` hotkey for struct offset | IDA `T` on x86/ARM too | Same | IDA's struct offset annotation is platform-independent |
---
## FAQ
### How do I identify a struct when the base register changes?
Track register writes backward. If A4 holds a struct pointer but you can't tell which struct, find the last `MOVEA.L ..., A4`. If it came from `AllocMem`, the size in D0 tells you the struct size — match against known struct sizes from NDK. If it came from `OpenLibrary`, it's a library base.
### What if the OS version changed the struct layout?
Commodore maintained binary compatibility — fields were appended, never reordered. An offset that works on Kickstart 1.3 also works on 3.1, because the earlier fields are at the same positions. However, fields added in later versions only exist in those versions. Always check `lib_Version` before accessing fields added after OS 1.3.
---
## References
---
## Exec Node Traversal Loops
A recurring pattern: walking the `LibList` or `DeviceList`:
```asm
; Annotated after struct recovery:
MOVEA.L SysBase, A6
LEA (LibList,A6), A0 ; &SysBase->LibList
MOVEA.L (lh_Head,A0), A1 ; first lib node
.scan:
MOVEA.L (ln_Succ,A1), A2 ; peek next
TST.L A2
BEQ.S .not_found
; compare ln_Name string
MOVEA.L (ln_Name,A1), A0
JSR ___strcmp
TST.L D0
BEQ.S .found
MOVEA.L A2, A1
BRA.S .scan
```
---
## References
- NDK39: `exec/execbase.h`, `exec/tasks.h`, `exec/nodes.h`, `exec/io.h`
- [exec_base.md](../../06_exec_os/exec_base.md) — full ExecBase field listing
- [lists_nodes.md](../../06_exec_os/lists_nodes.md) — MinList/List traversal
- IDA Pro: Structure subview, Local Types, T hotkey for struct offset