[← Home](../../README.md) · [Reverse Engineering](../README.md)
# Hand-Written Assembly Reverse Engineering — Pure m68k Binaries
## Overview
Unlike compiler-generated code with predictable prologues, frame-pointer conventions, and library-call idioms, hand-written 68000 assembly is **unconstrained**. The author may use any register for any purpose, invent ad-hoc calling conventions, self-modify code, or jump into the middle of instructions. This is the norm for Amiga demos, most pre-1990 games, trackmos, bootblock intros, and hardware-banging utilities — and it demands a fundamentally different reversing strategy than C/C++ binaries.
```mermaid
graph TB
subgraph "Compiler Binary"
CPROLOGUE["LINK A5, #-N<br/>MOVEM.L D2-D7,-(SP)"]
CEXIT["UNLK A5<br/>RTS"]
CLIB["JSR LVO(A6)<br/>predictable ABI"]
end
subgraph "Hand-Written Assembly"
ACUSTOM["Custom calling convention<br/>any register = any purpose"]
AJMP["JMP (A0) / JMP $1234.W<br/>opaque control flow"]
ASMC["Self-modifying code<br/>move.w #imm, (next_insn+2)"]
AHW["Hardware register banging<br/>MOVE.W D0, $DFF180"]
end
CPROLOGUE -.->|"absent"| ACUSTOM
CEXIT -.->|"unpredictable"| AJMP
CLIB -.->|"may skip OS"| AHW
```
---
## Architecture
### What Makes Hand-Written Assembly Different
| Trait | Compiler Output | Hand-Written Assembly |
|---|---|---|
| **Function boundaries** | `LINK`/`UNLK` or `SUBQ`/`ADDQ` pairs | No universal marker; code may flow into data |
| **Calling convention** | Standard ABI (A6=lib base, D0/D1=scratch, A0/A1=scratch) | Author-defined per routine; may repurpose any register |
| **Strings** | `dc.b "text",0` with cross-reference chains | May be XOR-obfuscated, embedded mid-instruction, or stored as bitmaps |
| **Library calls** | `JSR LVO(A6)` with reloc entries | May call via absolute address, JMP table, or custom trap |
| **Loop structures** | `DBcc Dn, label` (counted) or `TST/BEQ` (conditional) | May unroll completely, use address-range compares, or rely on raster timing |
| **Data embedding** | Separate `DATA` hunk | Routinely mixed with code; data tables inside branch-not-taken paths |
### Common Environments
- **Bootblock intros** (1024 bytes, no OS): All registers free, hardware banging only
- **Trackmos / demos**: Often take over the system entirely; disable multitasking; use custom copper lists and blitter queues
- **Games (pre-1992)**: Usually bypass `graphics.library` for speed; hit hardware registers directly
- **Hardware drivers**: Heavy CIA/custom chip register manipulation; interrupt-driven
- **Virus / bootblock payloads**: Deliberately obfuscated; anti-debugging tricks
- **Cracktros / trainer menus**: Small (<4 KB), pre-launch patches to game code, often packed
- **Trackdisk loaders**: Custom DMA-driven disk reading; Rob Northen (RNC) loaders, raw MFM decoders
- **Non-HUNK binaries**: Raw absolute-load code at fixed addresses (e.g., `$C00000`, the trapdoor "Slow RAM" expansion)
- **ROM-resident code**: Kickstart modules, expansion ROMs (DiagROM, SCSI controller firmware)
- **Self-relocating code**: Code that copies and patches itself to run at any address
### The Assembly Author's Toolkit — Common Patterns Across the Demoscene
These patterns recur across hundreds of hand-written Amiga productions. Recognizing them accelerates function identification and purpose deduction.
#### Hardware Base Pointer Convention
Most authors dedicate a register to `$DFF000` for the entire program lifetime. The choice of register is often an **author fingerprint**:
| Register | Common Users | Notes |
|---|---|---|
| **A4** | Many demoscene productions | `LEA $DFF000, A4` at program start; all hardware writes use `MOVE.W Dn, $offset(A4)` |
| **A5** | Some demos, trackmos | May conflict with SAS/C A5 frame pointer convention in mixed C+asm code |
| **A6** | Rare — conflicts with exec library base | Only used when the program never calls exec and A6 is freed |
```asm
; The classic demoscene init pattern:
; Save OS registers, take over the machine
MOVE.W $DFF01C, old_intena ; save enabled-interrupt mask (INTENAR, read side of INTENA)
MOVE.W #$7FFF, $DFF09A ; disable all interrupts
MOVE.W #$7FFF, $DFF09C ; clear all interrupt requests
LEA $DFF000, A4 ; A4 = custom chip base for entire program
; Now all hardware writes are: MOVE.W D0, $XXX(A4)
```
#### Custom Register Offset Tables
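A table of 16-bit offsets, indexed by effect number, dispatches to the matching handler. Because the entries are table-relative rather than absolute, the code stays position-independent and needs no relocation entries: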
```asm
; Effect dispatcher via offset table:
effect_dispatch:
MOVE.W effect_num(PC), D0
ADD.W D0, D0 ; word index
MOVE.W effect_offsets(PC, D0.W), D0
JMP effect_offsets(PC, D0.W) ; offsets are relative to the table, so the table is the base
effect_offsets:
DC.W fx_plasma - effect_offsets
DC.W fx_rotozoom - effect_offsets
DC.W fx_vector3d - effect_offsets
DC.W fx_tunnel - effect_offsets
```
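When you meet such a table in a binary, the handler addresses can be recovered offline. The following is a minimal Python sketch, assuming the entries are signed big-endian words relative to the table start (as in the listing above); `resolve_offset_table` and the sample image are hypothetical names for illustration.

```python
import struct

def resolve_offset_table(image: bytes, table_off: int, count: int) -> list[int]:
    """Return handler offsets for `count` signed 16-bit, table-relative entries."""
    entries = struct.unpack_from(f">{count}h", image, table_off)
    return [table_off + rel for rel in entries]

# Hypothetical image: a 4-entry table at offset 0x10.
image = bytearray(0x40)
struct.pack_into(">4h", image, 0x10, 0x0008, 0x0010, 0x0020, -0x0010)
print([hex(t) for t in resolve_offset_table(bytes(image), 0x10, 4)])
```

The entry count is not stored anywhere, so you still have to bound it from the range of index values the dispatcher can produce.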
#### Cycle-Counted Sequences
Instruction sequences timed to exact 68000 CPU cycles for per-scanline effects:
```asm
; Color change per scanline, tight loop (68000 timings, before DMA contention):
; MOVE.W (A0)+, $180(A4) = 16 cycles
; DBF D7, loop = 10 cycles (branch taken), 14 cycles (counter expired)
; A PAL scanline is ~227 color clocks = ~454 CPU cycles,
; so this loop manages about 454 / 26 = 17 color writes per line.
; Unrolled MOVE.W Dn, (An) writes (8 cycles each) approach ~50 per line.
raster_colors:
MOVE.W (A0)+, $180(A4) ; write next color to COLOR00 ($DFF180)
DBF D7, raster_colors ; 10 cycles when taken
```
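The per-line budget is simple integer arithmetic, which helps when you want to check whether a loop you found can plausibly be a per-scanline effect. This is a rough sketch: the cycle counts passed in are assumptions you take from the 68000 timing tables for the instructions involved, and real hardware loses further cycles to DMA contention.

```python
PAL_CYCLES_PER_LINE = 454  # ~227 color clocks x 2 CPU cycles, as quoted above

def writes_per_line(cycles_per_write: int, budget: int = PAL_CYCLES_PER_LINE) -> int:
    """Upper bound on register writes that fit into one scanline."""
    return budget // cycles_per_write

for label, cycles in [("unrolled 8-cycle store", 8),
                      ("16-cycle store + 10-cycle DBF", 26)]:
    print(f"{label}: {writes_per_line(cycles)} writes per line")
```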
#### MOVEM.L Bulk Save/Restore
60-byte register dumps to the stack (15 registers × 4 bytes). Saving everything is the blunt form; routines with a custom register contract save only the unusual subset they clobber:
```asm
; Save D0-D7 and A0-A6 to stack (15 registers × 4 = 60 bytes):
MOVEM.L D0-D7/A0-A6, -(SP)
; ... body of interrupt handler or complex effect ...
MOVEM.L (SP)+, D0-D7/A0-A6
RTE
```
#### Hand-Optimized Idioms That Confuse Disassemblers
| Idiom | What It Does | Disassembly Trap |
|---|---|---|
| `ADD.W Dn, Dn` | `ASL.W #1, Dn` (multiply by 2) | IDA shows `ADD.W` — the shift intent is invisible |
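| `SUB.W Dn, Dn` | Clears the low word of Dn (like `CLR.W Dn`); only `SUB.L Dn, Dn` matches `MOVEQ #0, Dn` | The upper word survives, so do not read it as a full clear; the choice also reveals author style |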
| `OR.B Dn, Dn` / `Scc` chain | Compare Dn to zero, then set conditionally | Disassembler shows raw ops, not intent |
| `MOVE SR, Dn` | Save CCR across branches | Used instead of recomputing flags; rare in compiler output, and privileged on 68010+ (where `MOVE CCR, Dn` replaces it) |
| `SWAP Dn` / `MOVE.W Dn, ...` | Access upper word of 32-bit register | Common in 16-bit coordinate manipulation |
| `EXT.L Dn` | Sign-extend word to long | Indicates 16-bit signed value widening to 32-bit |
| `MOVEQ #0, Dn` over `CLR.L Dn` | Both clear all 32 bits of Dn in 2 bytes; on a 68000 `MOVEQ` takes 4 cycles, `CLR.L` takes 6 | Neither form is wrong; a consistent preference is a speed habit and an author fingerprint |
### Control Flow Archetypes
<!-- TODO: Expand — Mermaid diagrams for each archetype -->
| Archetype | Signature Pattern | Typical In |
|---|---|---|
| **State machine via jump table** | `MOVE.W state(PC), D0` / `ADD.W D0, D0` / `MOVE.W jt(PC, D0.W), D0` / `JMP jt(PC, D0.W)` | Game AI, effect sequencers, menu systems |
| **VBlank-driven frame loop** | `MOVE.L $6C.W, old_vbl` / `MOVE.L #my_vbl, $6C.W` / main loop waits on flag set by VBlank | Demos, games, any framed application |
| **Copper-interrupt-driven** | `MOVE.L #copper_irq, $6C.W` (Level 3 autovector, shared with VBlank and blitter; handler tests the COPER bit in INTREQR) / per-scanline effect changes | Raster bars, multiplexed sprites, palette splits |
| **Blitter-continuation via interrupt** | Sets `INTREQ` bit for blitter, interrupt handler chains to next blit in queue | Demos with complex blitter pipelines |
| **Custom event loop (no exec)** | Polling loop reading CIA / custom chip registers directly; no `Wait()` / `WaitPort()` | Games bypassing OS, bootblock intros |
| **Audio-driver callback chain** | Audio interrupt (Level 4) feeds next sample pair from custom module replayer | Protracker/Soundtracker replayers |
#### Protracker Replayer — Reference Architecture
The most commonly found audio subsystem in Amiga binaries. Understanding its internals saves hours of reverse engineering:
```asm
; Standard Protracker replayer entry points:
;
; mt_init — initialize replayer with module data pointer
; mt_music — call once per frame to advance pattern playback
; mt_end — shutdown replayer, restore system state
;
; Registration pattern (CIA-based timing):
; Save old CIA interrupt vector
MOVE.L $78.W, old_level6 ; Level 6 autovector = CIA-B (EXTER) interrupt
; Install replayer interrupt
MOVE.L #mt_irq, $78.W
; Configure CIA-B Timer A for the desired tempo
MOVE.B #$7F, $BFDD00 ; CIA-B ICR: mask off all CIA-B interrupt sources
MOVE.B #$81, $BFDD00 ; CIA-B ICR: enable Timer A interrupt
; Set timer period (125 BPM on PAL: 1773447 / 125 = 14187 = $376B)
MOVE.B #$6B, $BFD400 ; CIA-B Timer A low byte
MOVE.B #$37, $BFD500 ; CIA-B Timer A high byte
; The interrupt handler (mt_irq):
mt_irq:
MOVEM.L D0-D7/A0-A6, -(SP) ; save all registers
TST.B $BFDD00 ; read CIA-B ICR to clear the pending CIA interrupt
BSR mt_music ; advance replayer state
MOVEM.L (SP)+, D0-D7/A0-A6 ; restore all registers
MOVE.W #$2000, $DFF09C ; acknowledge EXTER (bit 13) in INTREQ
RTE
```
**Key identification markers**:
- Writes to `$BFD400`/`$BFD500` (CIA-B Timer A) and `$BFDD00`/`$BFDE00` (CIA-B ICR and control register A): CIA timer setup
- `MOVE.L #handler, $78.W`: Level 6 interrupt vector installation
- `MOVEM.L D0-D7/A0-A6, -(SP)` in the handler: all registers saved (standard for audio ISRs)
- Audio register writes (`$DFF0A0`–`$DFF0DA`): AUDxLCH/LCL/LEN/PER/VOL
- Signature `mt_` or `_mt_` function names in HUNK_SYMBOL if available
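Timer bytes found next to a CIA-B setup sequence can be turned back into a tempo, which is a quick sanity check that you are looking at a replayer. The sketch below assumes the PAL constant 1773447 that Protracker-style replayers divide by the BPM value; NTSC code and non-Protracker replayers use other constants, and both function names are hypothetical.

```python
PAL_TEMPO_CONSTANT = 1773447  # assumed PAL constant from Protracker-style replayers

def timer_for_bpm(bpm: int) -> tuple[int, int]:
    """Return (low byte, high byte) for CIA Timer A at the given tempo."""
    value = PAL_TEMPO_CONSTANT // bpm
    return value & 0xFF, value >> 8

def bpm_for_timer(lo: int, hi: int) -> int:
    """Invert the calculation for timer bytes found in a binary."""
    return round(PAL_TEMPO_CONSTANT / ((hi << 8) | lo))

lo, hi = timer_for_bpm(125)
print(f"125 BPM -> low=${lo:02X} high=${hi:02X}")
```

A pair of immediates that maps back to a tempo between roughly 32 and 255 BPM is a strong hint that the surrounding code is tempo setup rather than generic timing.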
---
## Identification: Detecting Hand-Written Assembly
> [!WARNING]
> Skip this section if you already know the binary is hand-written. The identification rules are covered in [m68k_codegen_patterns.md](m68k_codegen_patterns.md) and [compiler_fingerprints.md](../compiler_fingerprints.md).
### Heuristics That Suggest Assembly
<!-- TODO: Expand — pattern catalog with IDA script snippets, binary scoring system -->
- **No `LINK` or `SUBQ.L #N,SP`** in the entire binary
- **No `JSR LVO(A6)` patterns** — library calls are `JSR absolute_address` or `JMP (table, Dn.W)`
- **Hardware register constants** (`$DFF000`–`$DFF1FE`, `$BFE001`–`$BFEF01`) appear as immediates
- **`MOVEM.L` used aggressively** for per-routine save/restore with non-standard register sets
- **`RTE` without preceding `MOVE` to SR** — custom interrupt handling
- **`ORI #$0700, SR`** / `ANDI #$F8FF, SR` — direct interrupt level manipulation
- **`JMP (A0)` or `JSR (A0)`** with dynamically computed target — jump tables, state machines
- **`LEA offset(PC), An`** used for data tables rather than `MOVE.L #absolute_address, An` — PC-relative addressing for position-independent data
- **`STOP #$2xxx`** — wait for interrupt without OS involvement
- **`MOVE USP, An` / `MOVE An, USP`** — user stack pointer manipulation, almost never generated by compilers
- **`MOVEC`** (68010+) to/from VBR, SFC, DFC — supervisor-level register access
- **`RESET` instruction** — rarely used outside hand-written hardware init code
### Binary Scoring: Assembly Confidence
<!-- TODO: Add scoring table — each heuristic contributes points toward a "hand-written confidence" score -->
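One way to turn the heuristics above into a number is a raw word scan. The sketch below is a rough first pass under loud assumptions: the weights and caps are invented for illustration, and it matches opcode words without disassembling, so operands and data that happen to contain the same values will produce false hits.

```python
import struct

# (matcher, weight) pairs; the weights are hypothetical and need tuning
# against binaries whose origin you already know.
OPCODE_HINTS = {
    "RTE": (lambda w: w == 0x4E73, 3),
    "STOP": (lambda w: w == 0x4E72, 3),
    "RESET": (lambda w: w == 0x4E70, 4),
    "MOVE USP": (lambda w: 0x4E60 <= w <= 0x4E6F, 4),
    "MOVEC": (lambda w: w in (0x4E7A, 0x4E7B), 3),
    "LINK": (lambda w: 0x4E50 <= w <= 0x4E57, -3),  # compiler-style prologue
}

def asm_confidence(code: bytes) -> tuple[int, dict[str, int]]:
    """Score a code hunk; higher means more likely hand-written."""
    hits = {name: 0 for name in OPCODE_HINTS}
    chip_refs = 0
    for off in range(0, len(code) - 1, 2):
        word = struct.unpack_from(">H", code, off)[0]
        for name, (matches, _) in OPCODE_HINTS.items():
            if matches(word):
                hits[name] += 1
        if off + 4 <= len(code):
            addr = struct.unpack_from(">I", code, off)[0]
            if 0x00DFF000 <= addr <= 0x00DFF1FE:  # absolute custom chip address
                chip_refs += 1
    # Cap each contribution so one noisy pattern cannot dominate the score.
    score = sum(OPCODE_HINTS[n][1] * min(c, 5) for n, c in hits.items())
    score += 2 * min(chip_refs, 10)
    hits["custom chip address"] = chip_refs
    return score, hits

sample = bytes.fromhex("33FC7FFF00DFF09A4E73")  # MOVE.W #$7FFF,$DFF09A / RTE
print(asm_confidence(sample))
```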
---
## Decision Guide: Choosing Your Approach
```mermaid
graph TD
START["Binary loaded in IDA/Ghidra"]
HAS_SYMBOLS{"Has HUNK_SYMBOL<br/>debug info?"}
HAS_OS_CALLS{"Uses OS library<br/>calls?"}
HAS_STRINGS{"Has readable<br/>strings?"}
HAS_CRUNCHER{"Packed / crunched<br/>(PowerPacker, Imploder)?"}
PURE_ASM["Pure assembly methodology"]
START --> HAS_CRUNCHER
HAS_CRUNCHER -->|"Yes"| UNPACK["Unpack first<br/>→ see exe_crunchers.md"]
HAS_CRUNCHER -->|"No"| HAS_SYMBOLS
HAS_SYMBOLS -->|"Yes"| NAMED["Name functions from symbols<br/>then trace logic"]
HAS_SYMBOLS -->|"No"| HAS_OS_CALLS
HAS_OS_CALLS -->|"Yes"| OS_ANCHOR["Anchor on library calls<br/>→ identify callers by xref"]
HAS_OS_CALLS -->|"No"| HAS_STRINGS
HAS_STRINGS -->|"Yes"| STR_ANCHOR["Anchor on string xrefs<br/>→ trace outward"]
HAS_STRINGS -->|"No"| PURE_ASM
```
### When to Use Pure Assembly Methodology vs When to Fall Back
<!-- TODO: Expand — decision matrix -->
| Scenario | Recommended Approach |
|---|---|
| Binary has zero library calls, heavy custom registers | Pure assembly methodology (this article) |
| Binary has some library calls mixed with hardware banging | Hybrid: anchor on library xrefs first, then pure asm for hardware sections |
| Binary is packed/crunched | Unpack first, then re-evaluate |
| Binary has HUNK_SYMBOL debug info | Standard RE workflow with named functions |
| Binary is a ROM module (Kickstart) | ROM-specific workflow (start from the resident tags that exec's startup scan would find) |
---
## Methodology
### Phase 1: Triage
1. **Dump hunk structure**: `hunkinfo` shows CODE/DATA/BSS layout and relocation entries. Raw binaries (no HUNK header) skip directly to step 7.
2. **Scan for hardware registers**: grep for `$DFF`, `$BFE`, `$BFD` patterns. A binary that touches `$DFF000`–`$DFF1FE` directly is almost certainly hand-written or a game bypassing the OS.
3. **Find the entry point**: A HUNK executable starts at the first byte of its first code hunk; a ROM module is located through its resident tag (`RTC_MATCHWORD`, `$4AFC`); a bootblock is entered at offset `$0C`, wherever Kickstart happened to load it (the load address is not fixed).
4. **Identify interrupt vectors**: Absolute addresses `$64`–`$7C` hold the Level 1–7 autovectors (`$60` is the spurious-interrupt vector; on a 68010+ they are relative to VBR). Hand-written binaries often overwrite them, so search for writes to these addresses.
5. **Detect cruncher/packer**: Scan for known decrunch stub signatures:
| Cruncher | Signature Bytes (at or near start) | Notes |
|---|---|---|
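| **PowerPacker** | ASCII `PP20` at the start of crunched data files; crunched executables begin with a decrunch stub | Data files are usually decrunched through powerpacker.library; the original size is stored in the last longword of the crunched data |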
| **Imploder** | `$49` (often) | ATN!Imploder by Animators Of Death; smaller header than PowerPacker |
| **Shrinkler** | Context-mixing LZ; no fixed magic | Very high compression ratio; decrunch is slow (minutes on 7 MHz) |
| **ByteKiller** | `BRA.S` over data, then `MOVEM.L` pattern | Simple LZ variant; common in 1988–1990 productions |
| **CrunchMania** | `CrM!` or `CrM2` text marker | One of the fastest decrunchers; popular for 4K intros |
| **TetraPack** | Multi-part header | Compresses data+relocs separately |
6. **Check for overlay system**: Look for `HUNK_OVERLAY` or custom overlay loader at entry. The overlay manager swaps code segments from disk — the binary on disk is larger than what's in memory at any moment.
7. **Identify non-HUNK binary type**:
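- **Bootblock**: Exactly 1024 bytes (2 disk blocks), starting with a `DOS` signature and a checksum; Kickstart loads it into allocated chip RAM (no fixed address) and enters it at offset `$0C`
- **Absolute-load blob**: Loaded to a fixed address (often `$C00000`, the trapdoor "Slow RAM" expansion)
- **ROM module**: Has a resident tag starting with `RTC_MATCHWORD` (`$4AFC`); part of Kickstart or expansion ROM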
- **Trackmo loader**: First sector contains a custom loader, not a bootblock — the loader then reads the rest of the demo from disk
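The first triage split (HUNK file, bootblock, or raw blob) can be automated before anything is loaded into a disassembler. This is a minimal sketch, assuming the standard `HUNK_HEADER` magic `$000003F3` and the usual bootblock layout (`DOS` signature, then a checksum chosen so that the end-around-carry sum of all 256 longwords is `$FFFFFFFF`); `classify` and its return strings are hypothetical.

```python
import struct

HUNK_HEADER = 0x000003F3

def bootblock_checksum_ok(block: bytes) -> bool:
    """Sum 256 big-endian longwords with end-around carry; valid blocks give $FFFFFFFF."""
    total = 0
    for (value,) in struct.iter_unpack(">I", block[:1024]):
        total += value
        if total > 0xFFFFFFFF:
            total = (total & 0xFFFFFFFF) + 1
    return total == 0xFFFFFFFF

def classify(image: bytes) -> str:
    if len(image) >= 4 and struct.unpack_from(">I", image)[0] == HUNK_HEADER:
        return "hunk executable"
    if len(image) >= 1024 and image[:3] == b"DOS" and bootblock_checksum_ok(image):
        return "bootblock"
    return "raw blob"

# Hypothetical empty bootblock: signature, matching checksum, zero padding.
block = bytearray(1024)
block[:4] = b"DOS\x00"
struct.pack_into(">I", block, 4, 0xFFFFFFFF - 0x444F5300)
print(classify(bytes(block)), classify(b"\x00\x00\x03\xf3"), classify(b"\x60\x00"))
```

Anything reported as a raw blob still needs a load address from context (loader code, absolute references) before analysis can start.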
### Phase 2: Map Control Flow
- **Chase `JMP`/`JSR` chains** from entry point outward. Mark each reached address. When you stop finding new addresses, the unreachable remainder is potential data or SMC target.
- **Identify jump tables**: `JMP (A0, Dn.W)`, or `MOVE.W table(PC, Dn.W), D0` followed by `JMP table(PC, D0.W)`. Count table entries by looking at the range of Dn values. IDA needs manual jump table specification for these.
- **Cross-reference data tables**: values loaded via `LEA table(PC), An`. These tables are often copper lists, sprite control words, or audio sample pointers.
- **Detect self-modifying code**: Any `MOVE`/`LEA` targeting an address within the CODE hunk boundaries is an SMC candidate. Flag and verify with dynamic analysis.
- **Identify interrupt service routines**: Trace from vector table addresses. ISRs end with `RTE`, not `RTS`. They typically save/restore many registers at entry/exit.
- **Map copper list interactions**: `COP1LC`/`COP2LC` writes indicate copper list switches. A `MOVE.L #new_list, $DFF080` (COP1LC write) sets the list the copper restarts from at the next vertical blank; a following write to `COPJMP1` (`$DFF088`) forces the jump immediately. This is how demos switch between effects.
- **Trace blitter wait loops**: `BTST #6, $DFF002` / `BNE wait` is the standard "wait for blitter" pattern. The byte-sized `BTST` tests bit 6 of the high byte of DMACONR, which is bit 14 (`DMAB_BLTDONE`) of the word. It is often preceded by a dummy `TST.B $DFF002` read, a workaround for a busy-flag quirk on early Agnus revisions.
- **Flag unreachable code**: Code between `RTS`/`RTE`/`JMP` that isn't directly branched to — potential data, SMC target, or second-stage code loaded later.
- **Identify Level 3 interrupt chains**: Music replayers and blitter queues commonly hook into the vertical blank interrupt (Level 3). The handler dispatches to multiple subscribers — find the dispatch loop to understand the full interrupt architecture.
### Phase 3: Reconstruct Calling Conventions
- **Map per-routine register usage**: For each identified function, track:
- Which registers are **preserved** (saved/restored via `MOVEM.L` or stack pushes). The `MOVEM.L` save mask encodes this explicitly.
- Which registers are **destroyed** (modified without save). These are the function's scratch/output registers.
- Which registers hold **input parameters**. Look for registers used without prior initialization.
- Which registers hold **return values**. D0 is conventional even in hand-written code, but not guaranteed.
- **Identify custom ABIs**: Some authors consistently use, for example, A2 = data segment base, A3 = copper list cursor, A4 = hardware base ($DFF000), D7 = scratch counter. These conventions tend to be stable across a single author's body of work.
- **Build a register allocation map**: Color-coded table of which registers carry which meaning across the program. This is the single most valuable artifact for understanding hand-written asm.
- **Detect authorial fingerprints**: Consistent register conventions + coding idioms (e.g., always using `MOVEQ #0, Dn` over `CLR.L Dn`) suggest a single author or codebase reuse. This matters for provenance and for predicting conventions in unreversed sections.
- **Watch for `USP` manipulation**: `MOVE USP, An` / `MOVE An, USP` is almost never generated by compilers. It indicates the author is using the User Stack Pointer for a second stack (common in context-switching code, coroutines, or task systems).
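Reading the preserved set straight from a `MOVEM` register-list word is mechanical and worth scripting. A minimal sketch follows, assuming the standard 68000 encoding: bit 0 = D0 up to bit 15 = A7, except in the `-(An)` form, where the mask is stored bit-reversed. `movem_registers` is a hypothetical helper name.

```python
REGS = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]

def movem_registers(mask: int, predecrement: bool) -> list[str]:
    """Decode the 16-bit register-list word that follows a MOVEM opcode."""
    if predecrement:  # -(An) form stores the list bit-reversed
        mask = int(f"{mask:016b}"[::-1], 2)
    return [REGS[bit] for bit in range(16) if mask & (1 << bit)]

# MOVEM.L D5-D7/A4-A5,-(SP) carries the list word $070C.
print(movem_registers(0x070C, predecrement=True))
# MOVEM.L (SP)+,D0-D7/A0-A6 carries the list word $7FFF.
print(movem_registers(0x7FFF, predecrement=False))
```

Comparing the decoded save list with the decoded restore list also catches routines that deliberately restore fewer registers than they saved in order to return a value.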
### Phase 4: Reconstruct Data Structures
<!-- TODO: Expand — struct reconstruction for non-C binaries -->
- **Copper list format**: Every instruction is two 16-bit words (IR1, IR2); MOVE, WAIT, and SKIP are told apart by bit 0 of each word
- **Sprite control words**: `SPRxPOS`/`SPRxCTL` word pairs, attached sprite mode detection
- **Blitter minterm lookup tables**: Precomputed blitter operation descriptions
- **Audio sample tables**: Period/waveform pointer/volume structures for music replayers
- **Custom module formats**: Pattern data, sample lists, effect command tables for Protracker/Soundtracker variants
- **Bitmap/bitplane layouts**: Interleaved vs linear, planar depth detection from blitter source/dest usage
- **Custom BSS-like allocations**: Large zeroed regions used as frame buffers, audio buffers, or look-up tables
### Phase 5: Hardware Interaction Mapping
<!-- TODO: Expand — custom chip register usage analysis -->
For each custom chip register touched, document:
- **Which register** (address)
- **From where** (code location)
- **In what sequence** (interaction with other register writes)
- **Purpose** (deduced from context: blitter setup, copper list switch, audio start, sprite positioning)
Build a **hardware register access matrix**:
<!-- TODO: Add table template -->
| Register | Writes From | Reads From | Deduced Purpose |
|---|---|---|---|
| `$DFF040` (BLTCON0) | `$01234`, `$05678` | none (write-only) | Blitter operation setup |
| `$DFF096` (DMACON) | `$00123` | `$04567` (via DMACONR, `$DFF002`) | DMA channel enable/disable |
| ... | ... | ... | ... |
### Phase 6: Annotate
<!-- TODO: Expand — IDA/Ghidra annotation workflow for asm binaries -->
- **Rename functions**: Descriptive names based on deduced purpose (`vbl_irq_handler`, `blitter_queue_submit`, `copper_list_build`)
- **Add comments**: Document register conventions at function entry, magic constants, hardware register purposes
- **Create struct types**: For custom data structures discovered in Phase 4
- **Mark non-code regions**: Force IDA/Ghidra to treat copper lists, sprite data, audio samples as data, not code
- **Cross-reference hardware registers**: Create named constants for all `$DFFxxx`/`$BFExxx` addresses in the database
- **Build a call graph**: Mermaid diagram of the full control flow for documentation
### Phase 7: Dynamic Verification
<!-- TODO: Expand — FS-UAE debugger methodology -->
- **Breakpoint on custom chip registers**: Verify that register writes occur at expected times
- **Watchpoint on memory buffers**: Confirm copper list format, audio sample layout
- **Trace mode**: Follow execution through a single frame to verify control flow reconstruction
- **Modify-and-test**: Patch the binary and run it — if it breaks, your understanding was incomplete
- **Compare static vs dynamic**: Does the code path you predicted match what actually executes?
---
## Tool-Specific Workflows
<!-- TODO: Expand — detailed walkthroughs for each tool -->
### IDA Pro
<!-- TODO: IDA-specific: HUNK loader quirks, auto-analysis overrides, scripting for jump table resolution, dealing with data-in-code sections, creating custom register name enums -->
### Ghidra
<!-- TODO: Ghidra-specific: Amiga plugin capabilities, 68k SLEIGH processor module limitations, script-based annotation, bookmarking hardware registers -->
### FS-UAE Debugger
<!-- TODO: FS-UAE debugger: attaching to running demo, breakpoints on custom chip addresses, memory watchpoints, trace output parsing, cycle-count verification -->
### Command-Line Pre-Analysis Pipeline
<!-- TODO: hunkinfo → custom Python scanner → IDA/Ghidra import workflow -->
---
## Best Practices
<!-- TODO: Numbered list of actionable recommendations -->
1. **Never assume the ABI** — document the actual calling convention before tracing callers
2. **Start from the entry point and work outward** — don't try to understand everything at once
3. **Identify hardware register usage before control flow** — knowing which chips are used narrows the purpose
4. **Treat every `MOVE` to an absolute address as a potential self-modifying code write** — until proven otherwise
5. **Build a mermaid diagram of the control flow** — it reveals dead code, missing connections, and loop structures
6. **Cross-reference relocation entries with code** — relocs tell you which addresses matter
7. **Don't trust auto-analysis on mixed code/data sections** — manually define code/data boundaries
8. **Run the binary in an emulator** — some behaviors (self-modifying code paths, copper effects) are invisible in static analysis
9. **Look for known signatures first** — Protracker replayers, decrunch stubs, common macro libraries leave distinctive patterns
10. **Document your register map as you work** — it prevents costly re-analysis when you realize A3 was actually a struct pointer
---
## Antipatterns
### 1. The Compiler Assumption
**Wrong**: Assuming `A6` holds a library base, `D0`/`D1` are scratch, and `A0`/`A1` are pointer temps.
**Why it fails**: Hand-written code may use `A6` as a general-purpose data register, `D6` as a frame pointer, or any other non-standard assignment. The author may have declared their own calling convention documented nowhere.
<!-- TODO: Add bad/good code pair -->
### 2. The Prologue Scanner
**Wrong**: Scanning for `LINK A5` or `SUBQ.L #N,SP` to find function boundaries.
**Why it fails**: Hand-written assembly may have no standard function entry/exit markers. A routine might start with `MOVEM.L`, a label, or just fall through from the previous block.
<!-- TODO: Add bad/good code pair -->
### 3. The String Hop
**Wrong**: Assuming `LEA _string(PC), A0` means A0 points to a C string.
**Why it fails**: Hand-written code may use `LEA` to point to bytecode tables, sprite data, copper lists, or packed structures. The "string" might be a custom encoding.
<!-- TODO: Add bad/good code pair -->
### 4. The Register Reuse Confusion
**Wrong**: Assuming a register used in one context retains the same meaning throughout the program.
**Why it fails**: Hand-written asm aggressively reuses registers. The same D0 might be a loop counter in one block, an audio sample value in the next, and a scratch temporary in a third — all within 50 instructions. You must re-derive register meaning at each basic block.
<!-- TODO: Add bad/good code pair -->
### 5. The Disassembly Loop Trap
**Wrong**: Letting IDA's auto-analysis recursively disassemble from every possible entry point.
**Why it fails**: Mixed code/data sections cause IDA to decode data as instructions, creating phantom functions from copper lists or audio samples. This pollutes the symbol table with nonsense and obscures real control flow.
<!-- TODO: Add bad/good code pair -->
### 6. The Constant-as-Code Mistake
**Wrong**: Treating jump table offsets, copper list data, or sprite control words as instructions.
**Why it fails**: IDA/Ghidra cannot tell `$0180` as the first word of a copper MOVE to COLOR00 from `$0180` as a 68000 opcode (it decodes as `BCLR D0, D0`). Without manual intervention, hardware data tables get disassembled into garbage.
<!-- TODO: Add bad/good code pair -->
### 7. The One-Pass Delusion
**Wrong**: Attempting linear top-to-bottom analysis and expecting to understand everything on the first pass.
**Why it fails**: Hand-written asm often uses forward references, self-modifying code patched by an earlier init routine, or data tables that only make sense after you understand the code that consumes them. Reverse engineering is inherently iterative.
<!-- TODO: Add bad/good code pair -->
### 8. The MOVEM Black Box
**Wrong**: Treating `MOVEM.L D0-D7/A0-A6, -(SP)` / `MOVEM.L (SP)+, D0-D7/A0-A6` as opaque blocks.
**Why it fails**: Understanding which registers are saved and restored tells you the function's register contract. A routine that saves D5-D7/A4-A5 preserves those across its call — they likely carry important state (frame counter, hardware base pointer, data cursor).
<!-- TODO: Add bad/good code pair -->
---
## Pitfalls
### 1. Assuming the OS Is Present
<!-- TODO: Expand — add worked example from real bootblock/demo code -->
```asm
; This works on a running system:
MOVE.L 4.W, A6 ; SysBase
JSR -552(A6) ; _LVOOpenLibrary (A1 = library name, D0 = minimum version)
```
```asm
; But once a demo has taken over the machine and reused low memory,
; $4.W may no longer point at a valid ExecBase.
; The code might be:
MOVE.L #$DFF000, A5 ; custom chip base, not SysBase
JSR _custom_init(PC) ; custom initialization
```
### 2. Misreading Jump Tables
Hand-written jump tables frequently use PC-relative indirect jumps with custom offsets that IDA doesn't auto-resolve.
<!-- TODO: Add worked example — MOVE.W jt(PC, D0.W), D0 / JMP (PC, D0.W) walkthrough -->
### 3. Self-Modifying Code Deception
```asm
; The code you see is NOT what executes:
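LEA next_insn(PC), A0 ; PC-relative modes cannot be a write destination
MOVE.L #$4E714E71, (A0) ; overwrite the 4-byte CMPI.W with two NOPs
; (prefetch, and caches on 68020+, can delay when the patch takes effect)
next_insn:
CMPI.W #$0000, D0 ; runs as NOP / NOP once patched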
```
<!-- TODO: Expand with detection methodology — FS-UAE trace comparison, pattern scanning -->
### 4. Copper List Misidentification
Copper instructions are pairs of 16-bit words, and a disassembler will happily decode them as unrelated 68000 instructions:
```asm
; A copper list at $20000 decoded as instructions by IDA:
; DC.W $0180, $0000 → BCLR D0, D0 / start of an ORI.B (garbage!)
; DC.W $0182, $0FFF → swallowed as the ORI.B immediate / invalid opcode
; DC.W $FFFF, $FFFE → line-F opcode or data
;
; Correct interpretation:
; $0180, $0000 = MOVE $0000 to COLOR00 ($DFF180)
; $0182, $0FFF = MOVE $0FFF to COLOR01 ($DFF182)
; $FFFF, $FFFE = END of copper list (WAIT for a position that never occurs)
```
**Detection methodology**:
1. `COP1LC`/`COP2LC` writes give you the copper list address — start your data definition there
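2. Copper instructions come in **pairs of 16-bit words**. IR1 (first word) holds the register offset (MOVE) or the beam position (WAIT/SKIP); IR2 (second word) holds the data (MOVE) or the compare masks (WAIT/SKIP).
3. **WAIT**: IR1 bit 0 = 1. VP is bits 8–15 of IR1 and HP is bits 1–7 of IR1. IR2 carries the VP compare mask (bits 8–14), the HP compare mask (bits 1–7), the blitter-finished-disable flag (bit 15), and bit 0 = 0 for WAIT or 1 for SKIP.
4. **MOVE**: IR1 bit 0 = 0. IR1 is the register offset from `$DFF000` (bits 1–8), IR2 is the value to write.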
5. A `$FFFF, $FFFE` pair terminates the list.
6. Mark the entire copper list address range as **data**, not code. Create an array of 4-byte copper instruction structs in IDA/Ghidra.
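The decoding rules above fit in a few lines of Python, which is handy for dumping a list you have carved out of a binary or a memory snapshot. A minimal sketch, assuming the list is contiguous and big-endian; `decode_copper` and its output format are hypothetical, and it does not follow `COPJMP` strobes into other lists.

```python
import struct

def decode_copper(data: bytes, limit: int = 256) -> list[str]:
    """Decode up to `limit` copper instructions, stopping at the $FFFF,$FFFE terminator."""
    usable = min(len(data) - len(data) % 4, limit * 4)
    out = []
    for ir1, ir2 in struct.iter_unpack(">HH", data[:usable]):
        if ir1 & 1 == 0:
            out.append(f"MOVE #${ir2:04X}, $DFF{ir1 & 0x1FE:03X}")
            continue
        kind = "SKIP" if ir2 & 1 else "WAIT"
        line = f"{kind} VP=${ir1 >> 8:02X} HP=${ir1 & 0xFE:02X} mask=${ir2 & 0x7FFE:04X}"
        if (ir1, ir2) == (0xFFFF, 0xFFFE):
            out.append(line + "  ; end of list")
            break
        out.append(line)
    return out

sample = bytes.fromhex("01800000" "01820FFF" "2C07FFFE" "FFFFFFFE")
print("\n".join(decode_copper(sample)))
```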
### 5. CIA Timer Code Confusion
CIA register access (`$BFE001`–`$BFEF01` for CIAA, `$BFD000`–`$BFDF00` for CIAB) looks like any other memory access, but the TOD clock read sequence and timer control register patterns are distinctive:
```asm
; CIA-A Timer A setup (often used for timing in games/demos):
MOVE.B #$7F, $BFED01 ; CIA-A ICR: mask off all interrupt sources
MOVE.B #$81, $BFED01 ; CIA-A ICR: enable Timer A interrupt
MOVE.B #low_byte, $BFE401 ; CIA-A Timer A low byte
MOVE.B #high_byte, $BFE501 ; CIA-A Timer A high byte
; CIA-B Timer A/B setup (used by Protracker replayers!):
MOVE.B #$7F, $BFDD00 ; CIA-B ICR: mask off all sources
MOVE.B #$81, $BFDD00 ; CIA-B ICR: enable Timer A
MOVE.B #lo, $BFD400 ; CIA-B Timer A low (register 4 from CIA-B base $BFD000)
; Common mistake:
; MOVE.B $BFE801, D0 → reads the CIAA TOD low byte, not the keyboard.
; Keyboard data arrives through the CIAA SDR (serial data register)
; at $BFEC01; the parallel port is CIAA PRB at $BFE101.
```
**Key CIA registers for RE identification**:
| Register | Address | Purpose |
|---|---|---|
| CIAA ICR | `$BFED01` | Interrupt Control Register: enables/disables CIA-A interrupts |
| CIAA Timer A Lo | `$BFE401` | Timer A low byte |
| CIAA Timer A Hi | `$BFE501` | Timer A high byte |
| CIAB ICR | `$BFDD00` | Interrupt Control Register — enables CIA-B interrupts (used by Protracker!) |
| CIAB Timer A Lo | `$BFD400` | Timer A low byte (Protracker tempo control) |
| CIAB Timer A Hi | `$BFD500` | Timer A high byte |
### 6. Blitter Queue Confusion
Blitter register writes (`BLTCON0`, `BLTSIZE`, etc.) look like ordinary memory stores to IDA. Without understanding that these are I/O registers, the disassembly shows meaningless `MOVE.W D0, abs_addr` sequences:
```asm
; This looks like garbage writes to random addresses:
MOVE.W #$09F0, $DFF040 ; BLTCON0 = use A and D channels, minterm=$F0 (D = A)
MOVE.W #$0000, $DFF042 ; BLTCON1 = no fill, no line mode
MOVE.W #$FFFF, $DFF044 ; BLTAFWM = first word mask (all bits)
MOVE.W #$FFFF, $DFF046 ; BLTALWM = last word mask (all bits)
MOVE.L #src, $DFF050 ; BLTAPT = source A pointer
MOVE.L #dst, $DFF054 ; BLTDPT = destination D pointer
MOVE.W #0, $DFF064 ; BLTAMOD = source A modulo (0 = linear)
MOVE.W #0, $DFF066 ; BLTDMOD = dest D modulo
MOVE.W #(h<<6)|w, $DFF058 ; BLTSIZE = start blit! (writing this triggers DMA)
; But this is a standard blitter rectangle copy. The setup registers can be
; written in any order (authors tend to keep one habitual order), with one
; hard rule: BLTSIZE is always LAST, because writing it starts the blit.
```
**How to identify a blitter operation**:
1. The sequence always ends with a write to `$DFF058` (BLTSIZE) — this is the trigger
2. `BLTCON0` ($DFF040) encodes the minterm and active channels (bits 0–7 = minterm, bit 8 = D, bit 9 = C, bit 10 = B, bit 11 = A, bits 12–15 = A-channel shift)
3. Pointer registers ($DFF048–$DFF056) hold source/destination addresses, and they are your key to understanding what data is being moved
4. The blit size `(h<<6)|w` in BLTSIZE: height in upper 10 bits, width in lower 6 bits (width is in words, 0 = 64 words; height 0 = 1024 lines)
5. Blitter wait: `BTST #6, $DFF002` (bit 6 of the high byte of DMACONR = bit 14, DMAB_BLTDONE) polls until the blitter has finished
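Immediates written to `BLTCON0` and `BLTSIZE` can be decoded offline to label each blit in the database. The sketch below assumes the OCS/ECS register layout (minterm in bits 0–7, channel enables in bits 8–11, A shift in bits 12–15; BLTSIZE as height × 64 + width in words); the function names are hypothetical.

```python
def decode_bltcon0(value: int) -> dict:
    """Split a BLTCON0 word into A-shift, enabled channels, and minterm."""
    channels = "".join(
        name for name, bit in zip("ABCD", (11, 10, 9, 8)) if value & (1 << bit)
    )
    return {"a_shift": value >> 12, "channels": channels, "minterm": value & 0xFF}

def decode_bltsize(value: int) -> tuple[int, int]:
    """Return (height in lines, width in words); a zero field means the maximum."""
    height = (value >> 6) & 0x3FF or 1024
    width = value & 0x3F or 64
    return height, width

print(decode_bltcon0(0x09F0))          # the copy from the listing above
print(decode_bltsize((16 << 6) | 2))   # 16 lines, 2 words wide
```

A minterm of `$F0` with channels `AD` is a plain copy; other common values such as `$CA` (cookie-cut) identify masked sprite-style blits.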
### 7. MOVEM Register Tracking Across Long Spans
<!-- TODO: A MOVEM.L save at the top of a function and a matching restore 200 instructions later is easy to miss. Missing it means you think registers survive the call when they're actually clobbered. -->
### 8. Code Embedded in Interrupt Vector Table
<!-- TODO: The vector table at $60-$7C (hunk offset) may contain short code sequences instead of pointers. A `BRA.W` at the vector location is valid — it jumps directly to the handler without an intermediate pointer. IDA may treat these as separate functions. -->
### 9. Dual-Playfield Register Set Confusion
<!-- TODO: Dual playfield uses separate sets of bitplane pointers (BPL1PT vs BPLxPT). Writes to both sets look like redundant operations but serve different playfields. -->
### 10. Stack-Based State Machines
Some hand-written code uses the stack as a state machine — pushing return addresses that represent state transitions, using `RTS` as a computed goto:
```asm
; Instead of a switch statement, the author pushes state transition addresses:
MOVE.L #STATE_IDLE, -(SP) ; push initial state
...
STATE_DISPATCH:
RTS ; "return" to the state on top of stack
STATE_IDLE:
; ... handle idle ...
MOVE.L #STATE_PLAYING, -(SP) ; push next state
BRA STATE_DISPATCH
STATE_PLAYING:
; ... handle playing ...
MOVE.L #STATE_PAUSED, -(SP) ; push next state
BRA STATE_DISPATCH
```
This pattern breaks all standard call/return analysis because `RTS` doesn't return to a caller — it jumps to the next state. IDA/Ghidra see `RTS` as a function exit and stop disassembling.
**Detection**: Look for `MOVE.L #addr, -(SP)` or `PEA addr(PC)` (push effective address) followed by `RTS` (or a branch to an `RTS`). These are state pushes, not function call setups.
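
The detection rule above can be approximated with a raw opcode scan. This is a rough sketch, assuming the code has already been extracted into a byte string; `find_state_pushes` and its `window` parameter are hypothetical, and because the scan does not decode instructions, data that happens to contain these words will produce false positives.

```python
import struct

MOVE_L_IMM_PUSH = 0x2F3C  # MOVE.L #imm32, -(SP)
PEA_PC_REL = 0x487A       # PEA d16(PC)
RTS = 0x4E75
BRA_MASK, BRA = 0xFF00, 0x6000  # BRA.S / BRA.W share the $60xx opcode word


def find_state_pushes(code, window=16):
    """Return (offset, kind) for pushes followed within `window` bytes by RTS or BRA."""
    hits = []
    for off in range(0, len(code) - 1, 2):
        (op,) = struct.unpack_from(">H", code, off)
        if op == MOVE_L_IMM_PUSH:
            size, kind = 6, "MOVE.L #imm,-(SP)"
        elif op == PEA_PC_REL:
            size, kind = 4, "PEA d16(PC)"
        else:
            continue
        # Look just past the push for an RTS, or a branch that may lead to one.
        end = min(len(code) - 1, off + size + window)
        for nxt in range(off + size, end, 2):
            (word,) = struct.unpack_from(">H", code, nxt)
            if word == RTS or (word & BRA_MASK) == BRA:
                hits.append((off, kind))
                break
    return hits
```

Each hit is only a candidate: confirm in the disassembler that the branch really reaches an `RTS` before renaming the pushed address as a state.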
### 11. Absolute Address Dependencies
Code that assumes a fixed load address (common in non-HUNK binaries) will break if relocated. For HUNK binaries, relocation entries tell you which absolute addresses must be patched at load time. Non-HUNK binaries lack relocation metadata entirely.
```asm
; Absolute dependency example — works only at $C00000:
LEA $C01000, A0 ; data at fixed offset from load address
JSR $C00500 ; subroutine at fixed address within binary
; Position-independent equivalent (works at any load address):
LEA _data(PC), A0 ; PC-relative (no relocation needed)
JSR _subroutine(PC) ; PC-relative
```
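**Critical**: The Amiga boot code loads a bootblock into a buffer it allocates, so the bootblock itself must be position-independent. Code that then copies itself to a fixed address (as many bootblock viruses and loaders do) uses absolute JMP/JSR targets that are valid only there. If you relocate such code for analysis, patch all absolute addresses or analyze it in place at the address it expects.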
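
Patching absolute addresses amounts to adding a constant to every 32-bit address field. A minimal sketch follows, assuming you already have the list of offsets that hold absolute addresses (from the HUNK relocation entries, or recovered by hand for a non-HUNK binary); `relocate` is a hypothetical helper name.

```python
import struct


def relocate(image, reloc_offsets, old_base, new_base):
    """Return a copy of `image` with each 32-bit big-endian address rebased."""
    out = bytearray(image)
    delta = new_base - old_base
    for off in reloc_offsets:
        (addr,) = struct.unpack_from(">I", out, off)
        struct.pack_into(">I", out, off, (addr + delta) & 0xFFFFFFFF)
    return bytes(out)


if __name__ == "__main__":
    # LEA $C01000,A0 / JSR $C00500, with address fields at offsets 2 and 8.
    image = bytes.fromhex("41F900C01000 4EB900C00500")
    moved = relocate(image, [2, 8], 0xC00000, 0x040000)
    print(moved.hex())
```

The example rebases the two instructions from the snippet above from `$C00000` to `$040000`; the opcode words are untouched and only the address fields change.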
---
## Use-Case Cookbook
### Pattern 1: Finding the Main Loop in a Demo
<!-- TODO: Step-by-step walkthrough — follow entry point, find VBlank handler, identify frame counter increment, trace back to main loop that waits on frame counter. IDA Python script to automate. -->
### Pattern 2: Identifying a Custom Interrupt Handler
<!-- TODO: Walkthrough — grep for writes to $6C.W/$68.W/$70.W (vector table), trace to the handler code, identify RTE, document register saving convention. IDA Python to auto-detect. -->
### Pattern 3: Reconstructing a Jump Table
<!-- TODO: Walkthrough — find MOVE.W jt(PC, Dn.W), D0 / ADD.W D0, D0 / JMP (PC, D0.W) pattern, count entries, resolve offsets, rename targets. IDA Python script. -->
### Pattern 4: Detecting Self-Modifying Code with IDAPython
<!-- TODO: Walkthrough — scan for instructions that compute addresses within the CODE segment and write to them, flag as potential SMC, cross-reference with dynamic trace. -->
### Pattern 5: Identifying a Protracker Replay Routine
The most commonly found audio subsystem in Amiga binaries. Here's the full identification workflow:
1. **Find the CIA interrupt vector write**: Search for `MOVE.L #xxx, $78.W`, which installs the Level 6 (CIA-B timer) interrupt handler that CIA-timed Protracker replayers use for tempo. Replayers driven from the vertical blank hook Level 3 at `$6C.W` instead.
2. **Identify the CIA-B timer setup**: `MOVE.B #$7F, $BFDD00` / `MOVE.B #$81, $BFDD00`. The first write masks all CIA-B interrupt sources; the second enables the Timer A interrupt.
3. **Trace to the interrupt handler**: The handler saves ALL registers (`MOVEM.L D0-D7/A0-A6, -(SP)`), calls the replayer tick function, then restores all and does `RTE`.
4. **Find the audio register writes**: Look for writes to `$DFF0A0`–`$DFF0D0` (AUDxLCH/LCL/PER/VOL). The pattern `MOVE.L sample_ptr, $DFF0A0` / `MOVE.W period, $DFF0A6` / `MOVE.W vol, $DFF0A8` is the per-channel audio update.
5. **Identify effect command dispatch**: A `MOVE.W effect_cmd, D0` / `ANDI.W #$0F, D0` / `ADD.W D0, D0` / `JMP (effect_table, D0.W)` pattern dispatches to arpeggio, portamento, vibrato, etc. handlers.
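6. **Map the pattern data layout**: The replayer reads pattern data sequentially, typically through `(A0)+` post-increment loads. Standard format: 4 bytes per note. Byte 0 holds the upper nibble of the sample number and the top 4 bits of the 12-bit period, byte 1 holds the low 8 bits of the period, byte 2 holds the lower nibble of the sample number and the effect command, and byte 3 holds the effect parameter.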
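
To check a suspected pattern block against the standard 4-byte note format from step 6, it helps to decode a few cells and see whether the periods and sample numbers look plausible. A minimal sketch, with `decode_note` as a hypothetical helper name:

```python
def decode_note(cell):
    """Decode one 4-byte Protracker pattern cell into its fields."""
    if len(cell) != 4:
        raise ValueError("a pattern cell is exactly 4 bytes")
    b0, b1, b2, b3 = cell
    return {
        "sample": (b0 & 0xF0) | (b2 >> 4),    # high nibble from byte 0, low from byte 2
        "period": ((b0 & 0x0F) << 8) | b1,    # 12-bit period
        "effect": b2 & 0x0F,                  # effect command, indexes the dispatch table
        "param": b3,                          # effect parameter
    }


if __name__ == "__main__":
    print(decode_note(bytes.fromhex("11AC1C40")))
```

The sample cell decodes to sample 17, period 428, effect `$C` with parameter `$40`. If most decoded cells give sample numbers above 31 or implausible periods, the data is probably not in standard Protracker layout.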
**IDA Python script fragment** to auto-detect Protracker replayers:
```python
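import ida_idaapi
import ida_search
import idc

# Search for the Level 6 autovector installation pattern:
#   MOVE.L #handler, $78.W  =  21FC hhhh hhhh 0078
# Names below follow the IDA 7.x API; other releases may expose different ones.
ea = ida_search.find_binary(0, ida_idaapi.BADADDR, "21 FC ? ? ? ? 00 78", 16, ida_search.SEARCH_DOWN)
if ea != ida_idaapi.BADADDR:
    handler = idc.get_wide_dword(ea + 2)
    print(f"Level 6 vector installed at ${ea:08X}, handler at ${handler:08X}")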
```
### Pattern 6: Reversing a Bootblock Virus
Bootblock viruses are the ideal entry point for learning Amiga RE — they're small (1024 bytes), self-contained, and exercise key system mechanisms:
#### Lamer Exterminator (October 1989)
- **Size**: 1024 bytes (exactly 2 disk blocks)
- **Residence**: Installs itself in memory, hooks system vectors
- **Infection vector**: Writes itself to any write-enabled disk's bootblock during disk access
- **Damage routine**: After activation, overwrites victim bootblocks 84 times with the string `"LAMER!"` — this trashes the disk
- **CoolCapture**: Uses the CoolCapture vector for post-reset survival — after a warm reset, the virus re-activates from the captured state
- **Detection text**: Sometimes leaves identifiable strings in the bootblock
#### SADDAM Bootblock Virus
- **Size**: 1024 bytes
- **Residence**: Copies itself to `$7F000` in memory (just below the 512KB Chip RAM boundary)
- **Interrupt hooking**: Hooks Level 3 interrupt (Vertical Blank/Copper/Blitter) via the interrupt vector table
- **Infection trigger**: First "read Rootblock" command after a reset — this infects any disk accessed after boot
- **Stealth**: Writes the original bootblock back to disk when the rootblock is read (hiding its presence)
- **System modification**: Clears `CoolCapture`, `KickTagPtr`, and `KickCheckSum` — disables the system's ability to detect bootblock changes
- **Anti-detection text**: Contains the misleading string `"A2000 MB Memory Controller V2"` to disguise itself as a hardware ROM
- **Damage trigger**: After ~30,000 interrupt calls, crashes the system by showing an alert in a Level 3 interrupt context
#### Common Virus RE Workflow
1. **Extract the bootblock**: The first 1024 bytes of an infected disk (blocks 0–1)
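2. **Determine load address**: The Kickstart ROM loads the bootblock into a buffer it allocates, so the address is not fixed; execution starts at offset `$0C` with A6 = ExecBase and A1 = the trackdisk I/O request. A virus usually copies itself to a fixed address afterwards, so analyze the resident copy at that address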
3. **Identify the infection mechanism**: Look for `DoIO()` / `SendIO()` calls to `trackdisk.device` for writing back to disk
4. **Find the residency mechanism**: `CoolCapture`, `KickTagPtr` manipulation, or RAM copy to `$7F000` + vector hooking
5. **Trace the trigger condition**: What event activates the virus? Timer count, disk access count, specific command?
6. **Document the payload**: Does it corrupt data? Display a message? Overwrite bootblocks?
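
Steps 1 and 3 of the workflow can be sketched in a few lines of Python. The sketch relies on two standard facts that the list above does not spell out: a bootable block sums to `$FFFFFFFF` when all its longwords are added with the carry folded back in, and `DoIO()` / `SendIO()` sit at exec.library offsets -456 and -462. The helper names are hypothetical.

```python
import struct

BOOTBLOCK_SIZE = 1024
# exec.library calls as they appear in code: JSR d16(A6)
IO_CALLS = {
    "DoIO": bytes.fromhex("4EAEFE38"),    # JSR -456(A6)
    "SendIO": bytes.fromhex("4EAEFE32"),  # JSR -462(A6)
}


def bootblock_sum(block):
    """Add all big-endian longwords, folding each carry back into the sum."""
    total = 0
    for (word,) in struct.iter_unpack(">I", block):
        total += word
        if total > 0xFFFFFFFF:
            total = (total & 0xFFFFFFFF) + 1
    return total


def inspect_bootblock(image):
    """Summarize the first 1024 bytes of a disk image."""
    block = image[:BOOTBLOCK_SIZE]
    if len(block) < BOOTBLOCK_SIZE:
        raise ValueError("image is shorter than one bootblock")
    # Code starts at offset 12, after the signature, checksum, and rootblock fields.
    calls = {
        name: [off for off in range(12, BOOTBLOCK_SIZE - 3, 2)
               if block[off:off + 4] == pattern]
        for name, pattern in IO_CALLS.items()
    }
    return {
        "dos_signature": block[:3] == b"DOS",
        "checksum_ok": bootblock_sum(block) == 0xFFFFFFFF,
        "io_calls": calls,
    }
```

A typical call would be `inspect_bootblock(open("infected.adf", "rb").read())`, where the file name is a placeholder. The listed offsets are starting points for the disassembler, since a virus may also reach the device through a saved pointer instead of a direct `JSR`.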
### Pattern 7: Finding the Decrunch Stub in a Packed Demo
The decrunch stub is the gateway to the real binary. Finding and understanding it is prerequisite to all further analysis:
**Identification by signature**:
| Cruncher | Magic/Pattern | Decrunch Stub Size | Notes |
|---|---|---|---|
| **PowerPacker** | `$42` followed by LEA/MOVE pattern near entry | ~200–300 bytes | Uses powerpacker.library; `ppDecrunch()` is the library call |
| **Imploder** | Entry has `MOVE.L D0, -(SP)` / `LEA xxx(PC), A0` pattern | ~300–400 bytes | ATN!Imploder; slower decompression, better ratio than early PP |
| **Shrinkler** | Entry starts with context-mixing setup code | ~2KB | Extremely high ratio; decrunch takes minutes on 7 MHz 68000 |
| **ByteKiller** | Short BRA.S over header data, then MOVEM.L pattern | ~100 bytes | Simple LZ variant; very common in 1988–1991 productions |
| **CrunchMania** | String `"CrM!"` or `"CrM2"` at or near entry | ~150 bytes | Fastest decruncher; popular for 4K intros |
**Decrunch strategy**:
1. Identify the stub: The first code that executes after the entry point. It reads packed data and expands it to a destination address.
2. Let the stub run in an emulator: Set a breakpoint after the decrunch loop completes (look for the `JMP` or `JSR` to the unpacked entry point).
3. Dump the decrunched memory: The real binary is now in RAM. Save it for static analysis.
4. Optionally: Write an unpacker script — for known formats, run the original cruncher's decruncher against the packed data in a standalone tool.
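
For step 2, breakpoint candidates can be listed before starting the emulator. The sketch below scans a stub for register-indirect and absolute-long `JMP`/`JSR` opcodes; it assumes the stub bytes have been cut out of the file already, and `find_exit_jumps` is a hypothetical helper. It does not decode instructions, so operand words that look like opcodes will show up as extra hits.

```python
import struct


def find_exit_jumps(stub):
    """Return (offset, text) for JMP/JSR (An) and JMP/JSR absolute-long opcodes."""
    hits = []
    for off in range(0, len(stub) - 1, 2):
        (op,) = struct.unpack_from(">H", stub, off)
        if (op & 0xFFF8) == 0x4ED0:
            hits.append((off, f"JMP (A{op & 7})"))
        elif (op & 0xFFF8) == 0x4E90:
            hits.append((off, f"JSR (A{op & 7})"))
        elif op in (0x4EF9, 0x4EB9) and off + 6 <= len(stub):
            (target,) = struct.unpack_from(">I", stub, off + 2)
            name = "JMP" if op == 0x4EF9 else "JSR"
            hits.append((off, f"{name} ${target:08X}"))
    return hits
```

The last hit in the stub is usually the best first guess for the jump into the unpacked program, but verify it against the decrunch loop before trusting it.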
**Rob Northen Copylock / Trace Vector Decoder (TVD)**:
A special case that appears like a cruncher but is actually a protection system:
- Encrypted code is executed one instruction at a time using the 68000 **trace exception**
- The trace handler (exception vector 9, at address `$24`) decrypts the next instruction, executes it, then sets the trace bit again
- This prevents static disassembly — you only see the encrypted bytes and the trace handler, not the real code
- **Detection**: `MOVE #$8000, SR` or `ORI #$8000, SR` in the entry code (either one sets the trace bit), plus a custom handler at vector `$24`
- **Solution**: Let it execute in FS-UAE with a trace logger, or single-step through and record each decrypted instruction
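
Both detection signals can be checked with a raw scan of the entry code. This is a sketch under the assumption that the handler is installed with `MOVE.L #handler, $24.W`; installs through an address register will be missed. The function names are hypothetical.

```python
import struct

TRACE_BIT = 0x8000
SR_WRITES = {0x46FC: "MOVE #imm,SR", 0x007C: "ORI #imm,SR"}
MOVE_L_IMM_ABS_W = 0x21FC  # MOVE.L #imm32, (xxx).W


def find_trace_enables(code):
    """Return (offset, mnemonic, immediate) for SR writes that set the trace bit."""
    hits = []
    for off in range(0, len(code) - 3, 2):
        op, imm = struct.unpack_from(">HH", code, off)
        if op in SR_WRITES and imm & TRACE_BIT:
            hits.append((off, SR_WRITES[op], imm))
    return hits


def find_vector_installs(code, vector=0x24):
    """Return (offset, handler) for MOVE.L #handler, vector.W instructions."""
    hits = []
    for off in range(0, len(code) - 7, 2):
        op, handler, target = struct.unpack_from(">HIH", code, off)
        if op == MOVE_L_IMM_ABS_W and target == vector:
            hits.append((off, handler))
    return hits
```

A binary where both functions return hits close to the entry point is a strong candidate for trace-based protection and is better handled dynamically, as described above.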
### Pattern 8: Identifying a Custom Memory Allocator
<!-- TODO: Walkthrough — game/demo custom heap management; find the alloc/free routines by looking for LINK-like constructs (linked list of free blocks) without library calls; track the allocation pattern to understand memory layout. -->
### Pattern 9: Reconstructing a Blitter Queue
<!-- TODO: Walkthrough — identify blitter register write sequences (BLTCON0, BLTSIZE), find the queue submission routine, map the queue data structure, trace blitter-interrupt continuation. -->
### Pattern 10: Recovering a Sprite Multiplexer
<!-- TODO: Walkthrough — copper list sprite pointer updates per raster line, sprite control word pairs, attached sprite mode detection, mapping which logical sprite occupies which scanline range. -->
### Pattern 11: Extracting a Custom Module Replayer
<!-- TODO: Walkthrough — identifying pattern data format, sample table layout, effect command dispatch; documenting the custom format to enable playback or conversion to standard Protracker MOD. -->
### Pattern 12: Tracing a Trackloader
<!-- TODO: Walkthrough — trackdisk.device bypass, raw MFM decoding in software, custom DSKSYNC-based sync word detection, multi-revolution loading strategies, Rob Northen loader identification. -->
---
## Real-World Examples
### Demo Productions — RE Challenge Highlights
| Production | Group | Year | Key RE Challenge |
|---|---|---|---|
| **Arte** | Sanity | 1993 | Dense blitter queue system; effects dispatched via jump table with per-effect copper list switching; multi-part architecture with custom module loader |
| **Desert Dream** | Kefrens | 1993 | Multi-part trackmo with per-part custom loaders; heavy copper wizardry (raster bars, palette splits, sprite multiplexing); custom Protracker variant replayer |
| **Nexus 7** | Andromeda | 1994 | 3D vector engine with custom math routines (no FPU); object system with update/render phases; blitter-filled polygons |
| **Enigma** | Phenomena | 1991 | Modular effect system — each effect is a self-contained subroutine registered in a dispatch table; custom memory management across effect transitions |
| **State of the Art** | Spaceballs | 1992 | Morphing effects, rotate-zoomer, vector balls; heavy use of precomputed tables; custom blitter queue for compositing |
| **Hardwired** | Crionics & Silents | 1991 | Early 3D vector engine; spreadsheet-generated sine tables identified by their perfect mathematical precision; copper-chunky display mode simulation |
### Games
| Title | Year | Key RE Challenge |
|---|---|---|
| **Shadow of the Beast** | 1989 | 13-level parallax scrolling using dual playfield + sprite overlays; custom blitter queues for sprite rendering; 512-color still images via palette-split copper lists |
| **Turrican II** | 1991 | Sprite multiplexer with 20+ sprites on screen; copper-driven status bar split; large state machine for enemy AI |
| **Lotus Turbo Challenge 2** | 1991 | Software road rendering with copper sky gradient; blitter-driven car sprite compositing; 2-player split-screen via copper screen split |
| **Cannon Fodder** | 1993 | OS-friendly (uses graphics.library!) but still hits hardware for scrolling; custom memory allocator for soldier/bullet objects |
| **Pinball Dreams** | 1992 | Multi-ball physics engine; copper-driven score display; custom module replayer with sound effects mixing into music channels |
### Bootblock Intros — The Art of 1024 Bytes
Bootblock intros compress entire demoscene effects into two disk sectors:
- **Red Sector Inc. (RSI)** bootblocks: Often include a simple scrolltext, starfield, and a logo — all in 1024 bytes of raw m68k
- **Tristar & Red Sector Inc. (TRSI)** bootblocks: More advanced effects (copper bars, vector objects)
- **SADDAM virus**: A case study in anti-RE techniques within a bootblock — misleading strings, interrupt hooking, stealth write-back
- **Lamer Exterminator**: The most infamous Amiga virus, studied for its CoolCapture survival mechanism
---
## Cross-Platform Comparison
| Platform | Assembly RE Challenge | Amiga Analog |
|---|---|---|
| **C64 (6502)** | Zero-page usage, self-modifying code, raster interrupts | Custom chip register banging, copper-synced code |
| **Atari ST (68000)** | Similar CPU but different hardware registers | Amiga custom chips vs ST's simpler shifter/blitter |
| **DOS (x86)** | Segment:offset addressing, BIOS/DOS interrupt vectors | Amiga library JMP tables, exec interrupt vectors |
| **NES (6502)** | Tight mapper constraints, PPU timing loops | Similar raster-sync challenges in demos |
| **Arcade (68000)** | Shared CPU family, custom hardware | Same CPU, different memory maps and custom chips |
| **SNES (65816)** | Hardware register banging, HDMA (like copper) | Copper list is the direct analog of SNES HDMA channels |
| **Genesis/Mega Drive (68000)** | Same CPU, VDP register interface, Z80 coprocessor | Closest analog — 68000 + custom video hardware, similar register-banging style |
| **Game Boy (Z80-like)** | Tight memory (8KB), scanline interrupts, OAM DMA | Similar to bootblock constraints — extreme optimization in tiny space |
---
## Historical Context — Why Hand-Written Assembly Dominated
Before 1990, there were few practical alternatives to assembly for Amiga software that needed to be fast:
| Factor | Detail |
|---|---|
| **Compiler quality** | Pre-SAS/C 5.x compilers (Lattice C, Manx Aztec C, early SAS/C) generated code 5–20× slower than hand-tuned assembly for graphics/audio |
| **Hardware gap** | A 7 MHz 68000 with 512 KB Chip RAM had zero margin for inefficient code — games and demos needed every CPU cycle |
| **OS overhead** | The AmigaOS graphics.library added measurable overhead (layer locking, clipping rectangle checks). Games bypassed it entirely and wrote directly to `$DFFxxx` registers |
| **Demoscene culture** | Assembly was the "real" language of the demoscene. Using a compiler was considered lazy — the code *itself* was the art form |
| **Size constraints** | Bootblocks (1024 bytes), 4K intros, and single-disk demos imposed hard size limits. Assembly gave precise control over every byte |
| **Custom chip intimacy** | Copper lists, blitter queues, and audio DMA are fundamentally low-level. High-level languages abstracted away the very features that made Amiga programming distinctive |
**The transition**: By 1992–1994, faster CPUs (68020+), more RAM, and mature compilers (SAS/C 6.x and ports of GCC 2.x) made C viable for commercial software. The demoscene stayed with assembly into the late 1990s, and AGA productions on 68060 accelerators continue to use hand-written assembly today.
---
## Modern Analogies
<!-- TODO: Expand — connect hand-written asm concepts to modern developer experience -->
| Hand-Written Asm Concept | Modern Analogy | Where It Holds / Breaks |
|---|---|---|
| Cycle-counted raster effects | GPU fragment shader dispatch | Holds: per-pixel/per-scanline execution; breaks: asm is imperative timing, shaders are data-parallel |
| Custom blitter queue | GPU command buffer / DMA transfer list | Holds: structured descriptor-based hardware offload; breaks: blitter is in-order, GPUs reorder |
| Hardware register banging | MMIO device drivers in embedded systems | Holds: same concept — memory-mapped I/O; breaks: Amiga registers are video/audio, not peripherals |
| Self-modifying code | JIT compilation (V8, LuaJIT, WASM) | Holds: code generation at runtime; breaks: SMC patches existing code, JIT generates new code |
| Copper list | G-sync / FreeSync adaptive refresh + shader constants per scanline | Holds: timing-sensitive display updates; breaks: copper is a programmable coprocessor, not a protocol |
| Stack-based state machine | Coroutine dispatch / async/await | Holds: non-linear control flow; breaks: stack manipulation vs language-level async |
| Position-independent code | ASLR + PIE executables | Holds: same goal (run anywhere); breaks: asm PIC is manual, modern PIC is linker/loader assisted |
---
## FAQ
### Q1: How do I know if a function is an interrupt handler vs a regular subroutine?
<!-- TODO -->
### Q2: What's the best way to detect self-modifying code?
<!-- TODO -->
### Q3: How do I handle code that mixes data and instructions?
<!-- TODO -->
### Q4: How do I tell code from data in a mixed section?
<!-- TODO: Heuristics: what does the byte sequence look like as both code and data? Which interpretation produces more cross-references? Check against known data formats (copper list, sprite, audio). -->
### Q5: How do I handle encrypted or obfuscated code?
<!-- TODO: Detection (high entropy, no readable strings), decryption routine identification (XOR loop at entry point), dynamic extraction via emulator memory dump, dealing with layered encryption (decryptor decrypts next decryptor). -->
### Q6: How do I deal with copper-synced code?
<!-- TODO: Code that runs at specific scanlines via copper WAIT; the same function may execute multiple times per frame at different raster positions; execution context matters — what's the beam position, which bitplane is being displayed, what's in the color registers? -->
### Q7: What about self-relocating code?
<!-- TODO: How to detect (code copies itself, patches absolute addresses), how to trace the relocation table, how to produce a static IDA database that matches the relocated layout. -->
### Q8: How do I identify custom chip register usage patterns?
<!-- TODO: Group registers by chip (blitter, copper, audio, sprite, bitplane), identify common write sequences (blitter setup = BLTCON0→BLTAPT→BLTBPT→...→BLTSIZE), build a state machine of expected register write order for each chip. -->
### Q9: Why do I see `MOVE.W D0, $180(A5)` instead of an absolute write to `$DFF180`?
<!-- TODO: The custom chips sit at $DFF000, which is outside the range of absolute short addressing (sign-extended 16-bit address: $000000-$007FFF and $FF8000-$FFFFFF), so a direct write needs absolute long. Hand-written code therefore keeps $DFF000 in an address register and uses 16-bit displacements, which is 2 bytes shorter and 4 cycles faster per access than absolute long. Absolute short does appear for low-memory targets such as the exception vectors ($6C.W). -->
### Q10: How do I trace blitter operations without hardware?
<!-- TODO: Blitter emulation in FS-UAE debugger; reading blitter register state at breakpoints; deriving source/dest/minterm from BLTCON0/BLTCON1; calculating blit size from BLTSIZE; understanding blitter nasty mode (BLTPRI) and its effect on CPU synchronization. -->
### Q11: What's the difference between a software interrupt and a hardware interrupt in the code?
<!-- TODO: Hardware interrupts set by custom chips (INTREQ bits), software interrupts triggered by CPU writing to INTREQ, the distinction matters for understanding the event source. TRAP #N instructions are yet another category. -->
### Q12: How do I identify which demo group or author wrote this?
<!-- TODO: Stylistic fingerprints — register conventions (e.g., A4=hardware base), macro library signatures (Photon's startup code), code layout (effects as subroutines vs inline), comment strings in the binary, known author-specific optimization tricks. -->
### Q13: How do I reverse engineer an audio driver / module replayer?
<!-- TODO: Audio interrupt (Level 4) analysis, sample period calculation, sample pointer advancement, volume/period effect command dispatch, identifying Protracker vs NoiseTracker vs Soundtracker vs custom format differences. -->
### Q14: What do I do when IDA creates 500 phantom functions from copper data?
<!-- TODO: Batch-undefine approach, scripting to identify copper list boundaries, creating a copper list data type, converting undefined bytes to copper instruction arrays. -->
---
## FPGA / Emulation Impact
<!-- TODO: Expand — timing-critical code (cycle-exact loops), self-modifying code on FPGA (cache coherency), copper-synced code verification on MiSTer, blitter timing accuracy requirements for demos that push blitter bandwidth limits, 68000 vs 68020+ behavior differences (MOVE SR is privileged on 68010+, loop mode on 68010, etc.) -->
---
## References
- [m68k_codegen_patterns.md](m68k_codegen_patterns.md) — Compiler codegen fingerprint catalog
- [compiler_fingerprints.md](../compiler_fingerprints.md) — Compiler identification at a glance
- [string_xref_analysis.md](string_xref_analysis.md) — String cross-reference methodology
- [hunk_reconstruction.md](hunk_reconstruction.md) — HUNK binary reconstruction
- [struct_recovery.md](struct_recovery.md) — Struct layout reconstruction
- [api_call_identification.md](api_call_identification.md) — Library call recognition
- [exe_crunchers.md](../../03_loader_and_exec_format/exe_crunchers.md) — Decruncher identification and unpacking
- [code_vs_data_disambiguation.md](code_vs_data_disambiguation.md) — distinguishing code bytes from data/variables
- [copper_programming.md](../../08_graphics/copper_programming.md) — Copper list format and programming
- [blitter_programming.md](../../08_graphics/blitter_programming.md) — Blitter operation reference
- [paula_audio.md](../../01_hardware/ocs_a500/paula_audio.md) — Audio hardware register reference
- [custom_registers.md](../../01_hardware/ocs_a500/custom_registers.md) — Complete custom chip register map
- *M68000 Family Programmer's Reference Manual* — Instruction set and timing
- *Amiga Hardware Reference Manual* — Custom chip register map and DMA cycles
- *Amiga Disk Drives Inside & Out* (Abacus) — Trackloader and MFM encoding reference