Hand-Written Assembly Reverse Engineering — Pure m68k Binaries
Overview
Unlike compiler-generated code with predictable prologues, frame-pointer conventions, and library-call idioms, hand-written 68000 assembly is unconstrained. The author may use any register for any purpose, invent ad-hoc calling conventions, self-modify code, or jump into the middle of instructions. This is the norm for Amiga demos, most pre-1990 games, trackmos, bootblock intros, and hardware-banging utilities — and it demands a fundamentally different reversing strategy than C/C++ binaries.
```mermaid
graph TB
subgraph "Compiler Binary"
CPROLOGUE["LINK A5, #-N<br/>MOVEM.L D2-D7,-(SP)"]
CEXIT["UNLK A5<br/>RTS"]
CLIB["JSR LVO(A6)<br/>predictable ABI"]
end
subgraph "Hand-Written Assembly"
ACUSTOM["Custom calling convention<br/>any register = any purpose"]
AJMP["JMP (A0) / JMP $1234.W<br/>opaque control flow"]
ASMC["Self-modifying code<br/>move.w #imm, (next_insn+2)"]
AHW["Hardware register banging<br/>MOVE.W D0, $DFF180"]
end
CPROLOGUE -.->|"absent"| ACUSTOM
CEXIT -.->|"unpredictable"| AJMP
CLIB -.->|"may skip OS"| AHW
```
Architecture
What Makes Hand-Written Assembly Different
| Trait | Compiler Output | Hand-Written Assembly |
|---|---|---|
| Function boundaries | `LINK`/`UNLK` or `SUBQ`/`ADDQ` pairs | No universal marker; code may flow into data |
| Calling convention | Standard ABI (A6 = library base; D0/D1 and A0/A1 = scratch) | Author-defined per routine; may repurpose any register |
| Strings | `dc.b "text",0` with cross-reference chains | May be XOR-obfuscated, embedded mid-instruction, or stored as bitmaps |
| Library calls | `JSR LVO(A6)` with reloc entries | May call via absolute address, JMP table, or custom trap |
| Loop structures | `DBcc Dn, label` (counted) or `TST`/`BEQ` (conditional) | May unroll completely, use address-range compares, or rely on raster timing |
| Data embedding | Separate DATA hunk | Routinely mixed with code; data tables inside branch-not-taken paths |
Common Environments
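- Bootblock intros (1024 bytes, including a 12-byte header): Exec is running when the boot code is called, but intros typically ignore the OS. All registers are free and the code bangs the hardware directly.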
- Trackmos / demos: Often take over the system entirely; disable multitasking; use custom copper lists and blitter queues
- Games (pre-1992): Usually bypass `graphics.library` for speed and hit hardware registers directly
- Hardware drivers: Heavy CIA/custom chip register manipulation; interrupt-driven
- Virus / bootblock payloads: Deliberately obfuscated; anti-debugging tricks
- Cracktros / trainer menus: Small (<4 KB), pre-launch patches to game code, often packed
- Trackdisk loaders: Custom DMA-driven disk reading; Rob Northen (RNC) loaders, raw MFM decoders
- Non-HUNK binaries: Raw absolute-load code at fixed addresses (e.g., `$C00000` for A500 trapdoor "slow" RAM)
- ROM-resident code: Kickstart modules, expansion ROMs (DiagROM, SCSI controller firmware)
- Self-relocating code: Code that copies and patches itself to run at any address
The Assembly Author's Toolkit — Common Patterns Across the Demoscene
These patterns recur across hundreds of hand-written Amiga productions. Recognizing them accelerates function identification and purpose deduction.
Hardware Base Pointer Convention
Most authors dedicate a register to $DFF000 for the entire program lifetime. The choice of register is often an author fingerprint:
| Register | Common Users | Notes |
|---|---|---|
| A4 | Majority of demoscene productions | LEA $DFF000, A4 at program start; all hardware writes use MOVE.W Dn, $offset(A4) |
| A5 | Some demos, trackmos | May conflict with SAS/C A5 frame pointer convention in mixed C+asm code |
| A6 | Rare — conflicts with exec library base | Only used when the program never calls exec and A6 is freed |
```asm
; The classic demoscene init pattern:
; Save OS registers, take over the machine
    MOVE.W  $DFF01C, old_intena ; save INTENA state (read back via INTENAR)
    MOVE.W  #$7FFF, $DFF09A     ; disable all interrupts
    MOVE.W  #$7FFF, $DFF09C     ; clear all interrupt requests
    LEA     $DFF000, A4         ; A4 = custom chip base for entire program
; Now all hardware writes are: MOVE.W D0, $XXX(A4)
```
Custom Register Offset Tables
Precomputed offset tables indexed by an effect number dispatch straight to the right handler, with no compare chain at runtime:
```asm
; Effect dispatcher via offset table:
effect_dispatch:
    MOVE.W  effect_num(PC), D0
    ADD.W   D0, D0                          ; effect number -> word index
    MOVE.W  effect_offsets(PC, D0.W), D0    ; fetch signed offset
    JMP     effect_offsets(PC, D0.W)        ; offsets are relative to the table start
effect_offsets:
    DC.W    fx_plasma - effect_offsets
    DC.W    fx_rotozoom - effect_offsets
    DC.W    fx_vector3d - effect_offsets
    DC.W    fx_tunnel - effect_offsets
```
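Disassemblers often fail to resolve a table like this on their own. In that case you can compute the targets by hand: each entry is a signed 16-bit offset, and you add it to the table's own address. The Python sketch below does that. It assumes you have already pulled out the raw table bytes and know two things: the table address, and the entry count (for example from a bounds check on D0 before the dispatch). `resolve_offset_table` is a hypothetical helper, not an IDA or Ghidra API.

```python
import struct

def resolve_offset_table(table_bytes: bytes, table_addr: int, count: int) -> list[int]:
    """Return handler addresses for a table of signed big-endian word offsets.

    Each entry holds (handler - table_addr), as produced by
    DC.W label - effect_offsets in the listing above.
    """
    targets = []
    for i in range(count):
        (offset,) = struct.unpack_from(">h", table_bytes, i * 2)  # signed 16-bit
        targets.append(table_addr + offset)
    return targets

# Three entries at $1000: +$0008, +$0040, -$0010
print([hex(t) for t in resolve_offset_table(bytes.fromhex("00080040FFF0"), 0x1000, 3)])
```

Negative offsets are legal: a handler placed before the table produces an entry such as `$FFF0`.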
Cycle-Counted Sequences
Instruction sequences timed to exact 68000 CPU cycles for per-scanline effects:
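```asm
; Color change loop: cycle budget on a ~7.09 MHz PAL 68000
;   MOVE.W (A0)+, (A1) = 12 cycles (opcode fetch, color read, register write)
;   DBF D7, loop       = 10 cycles when taken, 14 when the counter expires
; A full PAL scanline is 227.5 color clocks = ~455 CPU cycles, so this loop
; manages ~20 color writes per line (~37 if fully unrolled), before bitplane
; or copper DMA steals any bus cycles. Real raster code first syncs to the
; beam (VPOSR/VHPOSR) or leaves per-line color changes to the copper.
; Setup: A0 = color table, A1 = $DFF180 (COLOR00), D7 = count - 1
raster_colors:
    MOVE.W  (A0)+, (A1)          ; write next color to COLOR00
    DBF     D7, raster_colors    ; 10 cycles when taken
```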
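You can check the budget arithmetic in those comments directly. The sketch below assumes 227.5 color clocks per PAL line and two CPU cycles per color clock. It ignores DMA contention, so the real figures will be lower whenever bitplane or copper DMA takes bus slots. The constant names are illustrative.

```python
PAL_COLOR_CLOCKS_PER_LINE = 227.5
CPU_CYCLES_PER_COLOR_CLOCK = 2            # ~7.09 MHz CPU vs ~3.55 MHz color clock
CPU_CYCLES_PER_LINE = int(PAL_COLOR_CLOCKS_PER_LINE * CPU_CYCLES_PER_COLOR_CLOCK)

# 68000 cycle counts for the instructions in the loop above
MOVE_W_POSTINC_TO_IND = 12                # MOVE.W (An)+,(An)
DBF_TAKEN = 10                            # DBF when the branch is taken

def writes_per_line(cycles_per_write: int) -> int:
    """Register writes that fit in one PAL scanline, ignoring DMA stealing."""
    return CPU_CYCLES_PER_LINE // cycles_per_write

looped = writes_per_line(MOVE_W_POSTINC_TO_IND + DBF_TAKEN)
unrolled = writes_per_line(MOVE_W_POSTINC_TO_IND)
print(f"{CPU_CYCLES_PER_LINE} cycles/line: looped {looped}, unrolled {unrolled}")
```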
MOVEM.L Bulk Save/Restore
60-byte register dumps to the stack (15 registers × 4 bytes) are typical of interrupt handlers and effect routines that must not disturb anything. Narrower masks save only the unusual register subsets a routine actually clobbers:
```asm
; Save D0-D7 and A0-A6 to stack (15 registers × 4 = 60 bytes):
    MOVEM.L D0-D7/A0-A6, -(SP)
; ... body of interrupt handler or complex effect ...
    MOVEM.L (SP)+, D0-D7/A0-A6
    RTE
```
Hand-Optimized Idioms That Confuse Disassemblers
| Idiom | What It Does | Disassembly Trap |
|---|---|---|
| `ADD.W Dn, Dn` | Same as `ASL.W #1, Dn` (multiply by 2), but 4 cycles instead of 8 | IDA shows `ADD.W`; the shift intent is invisible |
| `SUB.L Dn, Dn` | Clears the register, like `MOVEQ #0, Dn` | Same result, but reveals author style |
| `OR.B Dn, Dn` / `Scc` chain | Tests Dn against zero (like `TST.B`), then sets a byte conditionally | Disassembler shows raw ops, not intent |
| `MOVE SR, Dn` | Saves the condition codes across branches | Used instead of recomputing flags; rare in compiler output. Privileged on 68010+, where `MOVE CCR, Dn` replaces it |
| `SWAP Dn` / `MOVE.W Dn, ...` | Accesses the upper word of a 32-bit register | Common in 16-bit coordinate manipulation |
| `EXT.L Dn` | Sign-extends word to long | Indicates a 16-bit signed value widening to 32-bit |
| `MOVEQ #0, Dn` over `CLR.L Dn` | Both clear Dn in 2 bytes; `MOVEQ` takes 4 cycles vs 6 for `CLR.L Dn` on the 68000 | A consistent preference is an author fingerprint; neither form can target an address register |
Control Flow Archetypes
| Archetype | Signature Pattern | Typical In |
|---|---|---|
| State machine via jump table | `MOVE.W state(PC), D0` / `ADD.W D0, D0` / `MOVE.W jt(PC, D0.W), D0` / `JMP jt(PC, D0.W)` | Game AI, effect sequencers, menu systems |
| VBlank-driven frame loop | `MOVE.L $6C.W, old_vbl` / `MOVE.L #my_vbl, $6C.W` (Level 3 autovector) / main loop waits on a flag set by the VBlank handler | Demos, games, any framed application |
| Copper-interrupt-driven | `MOVE.L #copper_irq, $6C.W` (Level 3, shared with VBlank and blitter; the handler tests the COPER bit in INTREQR) / per-scanline effect changes | Raster bars, multiplexed sprites, palette splits |
| Blitter-continuation via interrupt | A finished blit sets the BLIT bit in INTREQ; the Level 3 handler starts the next blit in the queue | Demos with complex blitter pipelines |
| Custom event loop (no exec) | Polling loop reading CIA / custom chip registers directly; no `Wait()` / `WaitPort()` | Games bypassing OS, bootblock intros |
| Audio-driver callback chain | Audio interrupt (Level 4, `$70.W`) feeds the next block of sample data to a channel | Sample players and some module replayers |
Protracker Replayer — Reference Architecture
The most commonly found audio subsystem in Amiga binaries. Understanding its internals saves hours of reverse engineering:
```asm
; Standard Protracker replayer entry points:
;
; mt_init  - initialize replayer with module data pointer
; mt_music - call once per tick to advance pattern playback
;            (once per frame at the default 125 BPM on PAL)
; mt_end   - shut down replayer, restore system state
;
; Registration pattern (CIA-based timing, hardware-banging variant).
; Assumes VBR = 0 (plain 68000); on 68010+ the vector lives at VBR+$78.
; Save old Level 6 interrupt vector
    MOVE.L  $78.W, old_level6       ; Level 6 = CIA-B (EXTER) autovector
; Install replayer interrupt
    MOVE.L  #mt_irq, $78.W
; Configure CIA-B Timer A for the desired tempo
    MOVE.B  #$7F, $BFDD00           ; CIA-B ICR: disable all CIA-B interrupt sources
; Timer period = 1773447 / BPM on PAL: 125 BPM -> 14187 ($376B) E-clock ticks (50 Hz)
    MOVE.B  #$6B, $BFD400           ; CIA-B Timer A low byte (TALO)
    MOVE.B  #$37, $BFD500           ; CIA-B Timer A high byte (TAHI)
    MOVE.B  #$11, $BFDE00           ; CIA-B CRA: force load + start, continuous mode
    MOVE.B  #$81, $BFDD00           ; CIA-B ICR: enable Timer A interrupt
; The interrupt handler (mt_irq):
mt_irq:
    MOVEM.L D0-D7/A0-A6, -(SP)      ; save all registers
    TST.B   $BFDD00                 ; reading ICR clears the CIA-B interrupt flags
    BSR     mt_music                ; advance replayer state
    MOVE.W  #$2000, $DFF09C         ; INTREQ: clear EXTER (Level 6) request
    MOVEM.L (SP)+, D0-D7/A0-A6      ; restore all registers
    RTE
```
Key identification markers:
- Writes to CIA-B registers (`$BFDD00` ICR, `$BFD400`/`$BFD500` Timer A, `$BFDE00` CRA): CIA timer setup
- `MOVE.L #handler, $78.W`: Level 6 interrupt vector installation
- `MOVEM.L D0-D7/A0-A6, -(SP)` in the handler: all registers saved (standard for audio ISRs)
- Audio register writes (`$DFF0A0`–`$DFF0DA`): AUDxLCH/LCL/LEN/PER/VOL
- `mt_`-prefixed function names (`mt_init`, `mt_music`, `mt_end`) in HUNK_SYMBOL if available
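When a timer value shows up in a listing, converting it back to a tempo (or the other way round) confirms you are looking at replayer code. The sketch below assumes the constants commonly cited for ProTracker's CIA timing: 1773447 for PAL and 1789773 for NTSC, which is the E-clock frequency × 2.5. Check them against the specific replayer you are reversing. `cia_timer_for_bpm` is an illustrative helper.

```python
# Commonly cited ProTracker CIA constants: E-clock frequency x 2.5
PAL_CONSTANT = 1773447     # 709379 Hz * 2.5
NTSC_CONSTANT = 1789773    # 715909 Hz * 2.5

def cia_timer_for_bpm(bpm: int, pal: bool = True) -> tuple[int, int, int]:
    """Return (timer value, TALO byte, TAHI byte) for CIA-B Timer A at a tempo."""
    if not 32 <= bpm <= 255:
        raise ValueError("ProTracker tempo range is 32..255 BPM")
    value = (PAL_CONSTANT if pal else NTSC_CONSTANT) // bpm
    return value, value & 0xFF, value >> 8

value, lo, hi = cia_timer_for_bpm(125)
print(f"125 BPM -> {value} (${value:04X}): TALO=${lo:02X}, TAHI=${hi:02X}")
```

At 125 BPM the result is the same `$6B`/`$37` byte pair written in the listing above. That is 709379 / 14187 ≈ 50 ticks per second.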
Identification: Detecting Hand-Written Assembly
Warning
Skip this section if you already know the binary is hand-written. The identification rules are covered in m68k_codegen_patterns.md and compiler_fingerprints.md.
Heuristics That Suggest Assembly
- No `LINK` or `SUBQ.L #N,SP` in the entire binary
- No `JSR LVO(A6)` patterns: library calls are `JSR absolute_address` or `JMP (table, Dn.W)`
- Hardware register constants (custom chips `$DFF000`–`$DFF1FE`, CIA-A `$BFE001`–`$BFEF01`, CIA-B `$BFD000`–`$BFDF00`) appear as immediates or absolute addresses
- `MOVEM.L` used aggressively for per-routine save/restore with non-standard register sets
- `RTE` without a preceding `MOVE` to SR: custom interrupt handling
- `ORI #$0700, SR` / `ANDI #$F8FF, SR`: direct interrupt level manipulation
- `JMP (A0)` or `JSR (A0)` with a dynamically computed target: jump tables, state machines
- `LEA offset(PC), An` used for data tables rather than `MOVE.L #absolute_address, An`: PC-relative addressing for position-independent data
- `STOP #$2xxx`: wait for an interrupt without OS involvement
- `MOVE USP, An` / `MOVE An, USP`: user stack pointer manipulation, almost never generated by compilers
- `MOVEC` (68010+) to/from VBR, SFC, DFC: supervisor-level control register access
- `RESET` instruction: rarely used outside hand-written hardware init code
Binary Scoring: Assembly Confidence
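A crude way to turn the heuristics above into a number is to count hand-written-assembly markers against compiler markers in a raw code hunk. The Python sketch below is an illustration only. It assumes `code` holds the bytes of one code hunk and scans at word alignment. Every hit counts equally, which is an arbitrary choice. Expect false positives, because data words can look like opcodes.

```python
import struct

# Opcode words that compilers almost never emit (exact matches)
ASM_MARKERS = {
    0x4E70: "RESET",
    0x4E72: "STOP #imm",
    0x4E73: "RTE",
    0x4E7A: "MOVEC Rc,Rn",
    0x4E7B: "MOVEC Rn,Rc",
}

def assembly_score(code: bytes) -> int:
    """Positive = looks hand-written, negative = looks compiled."""
    asm_hits = 0
    compiler_hits = 0
    for off in range(0, len(code) - 1, 2):
        (word,) = struct.unpack_from(">H", code, off)
        if word in ASM_MARKERS or 0x4E60 <= word <= 0x4E6F:  # incl. MOVE An,USP / MOVE USP,An
            asm_hits += 1
        elif 0x4E50 <= word <= 0x4E57:                       # LINK An,#disp
            compiler_hits += 1
        # Absolute long reference into custom chip space ($DFF000-$DFF1FF)
        if code[off:off + 3] in (b"\x00\xdf\xf0", b"\x00\xdf\xf1"):
            asm_hits += 1
    return asm_hits - compiler_hits

hand = struct.pack(">HIH", 0x33C0, 0x00DFF180, 0x4E73)        # MOVE.W D0,$DFF180 / RTE
compiled = struct.pack(">HhHH", 0x4E55, -8, 0x4E5D, 0x4E75)   # LINK A5,#-8 / UNLK A5 / RTS
print(assembly_score(hand), assembly_score(compiled))
```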
Decision Guide: Choosing Your Approach
```mermaid
graph TD
START["Binary loaded in IDA/Ghidra"]
HAS_SYMBOLS{"Has HUNK_SYMBOL<br/>debug info?"}
HAS_OS_CALLS{"Uses OS library<br/>calls?"}
HAS_STRINGS{"Has readable<br/>strings?"}
HAS_CRUNCHER{"Packed / crunched<br/>(PowerPacker, Imploder)?"}
PURE_ASM["Pure assembly methodology"]
START --> HAS_CRUNCHER
HAS_CRUNCHER -->|"Yes"| UNPACK["Unpack first<br/>→ see exe_crunchers.md"]
HAS_CRUNCHER -->|"No"| HAS_SYMBOLS
HAS_SYMBOLS -->|"Yes"| NAMED["Name functions from symbols<br/>then trace logic"]
HAS_SYMBOLS -->|"No"| HAS_OS_CALLS
HAS_OS_CALLS -->|"Yes"| OS_ANCHOR["Anchor on library calls<br/>→ identify callers by xref"]
HAS_OS_CALLS -->|"No"| HAS_STRINGS
HAS_STRINGS -->|"Yes"| STR_ANCHOR["Anchor on string xrefs<br/>→ trace outward"]
HAS_STRINGS -->|"No"| PURE_ASM
```
When to Use Pure Assembly Methodology vs When to Fall Back
| Scenario | Recommended Approach |
|---|---|
| Binary has zero library calls, heavy custom registers | Pure assembly methodology (this article) |
| Binary has some library calls mixed with hardware banging | Hybrid: anchor on library xrefs first, then pure asm for hardware sections |
| Binary is packed/crunched | Unpack first, then re-evaluate |
| Binary has HUNK_SYMBOL debug info | Standard RE workflow with named functions |
| Binary is a ROM module (Kickstart) | ROM-specific workflow (seed entry points from the resident tags that exec finds during its ROMTag scan) |
Methodology
Phase 1: Triage
1. Dump hunk structure: `hunkinfo` shows CODE/DATA/BSS layout and relocation entries. Raw binaries (no HUNK header) skip directly to step 7.
2. Scan for hardware registers: grep for `$DFF`, `$BFE`, `$BFD` patterns (as raw bytes, absolute custom chip addresses look like `00 DF F0 xx` / `00 DF F1 xx`). A binary that touches `$DFF000`–`$DFF1FE` directly is almost certainly hand-written or a game bypassing the OS.
3. Find the entry point: a HUNK executable starts at the first byte of its first code hunk. ROM modules are located through their resident tag (`RT_MATCHWORD`, `$4AFC`). A bootblock's code starts at offset 12, after the `DOS` ID, checksum, and root-block longwords. The ROM calls it with A6 = SysBase and A1 = the trackdisk IORequest, at whatever address the ROM loaded it to.
4. Identify interrupt vectors: look for writes to `$64`–`$7C` (the Level 1–7 autovectors; `$60` is the spurious-interrupt vector). These addresses are relative to the VBR, which is 0 on a plain 68000. Hand-written binaries often overwrite these vectors directly.
5. Detect cruncher/packer: scan for known decrunch stub signatures:

   | Cruncher | Signature Bytes (at or near start) | Notes |
   |---|---|---|
   | PowerPacker | `$42` followed by `MOVE.L`/`LEA` pattern | Uses powerpacker.library; header contains original size |
   | Imploder | `$49` (often); `ATN!` | Imploder by Animators Of Death; smaller header than PowerPacker |
   | Shrinkler | Context-mixing LZ; no fixed magic | Very high compression ratio; decrunch is slow (minutes on 7 MHz) |
   | ByteKiller | `BRA.S` over data, then `MOVEM.L` pattern | Simple LZ variant; common in 1988–1990 productions |
   | CrunchMania | `CR![version]` text marker | One of the fastest decrunchers; popular for 4K intros |
   | TetraPack | Multi-part header | Compresses data+relocs separately |

6. Check for overlay system: look for `HUNK_OVERLAY` or a custom overlay loader at the entry point. The overlay manager swaps code segments in from disk, so the binary on disk is larger than what is in memory at any moment.
7. Identify non-HUNK binary type:
   - Bootblock: Exactly 1024 bytes (2 disk blocks) starting with the `DOS` ID. The Kickstart ROM loads it into a buffer it allocates, so the load address is not fixed.
   - Absolute-load blob: Loaded to a fixed address (often `$C00000`, A500 trapdoor "slow" RAM)
   - ROM module: Has an `RT_MATCHWORD` resident tag; part of Kickstart or an expansion ROM
   - Trackmo loader: The bootblock holds a custom loader instead of the standard DOS boot code. That loader then reads the rest of the demo from raw disk tracks.
Phase 2: Map Control Flow
- Chase `JMP`/`JSR` chains from the entry point outward and mark each reached address. When you stop finding new addresses, the unreachable remainder is potential data or an SMC target.
- Identify jump tables: `JMP (A0, Dn.W)`, or `MOVE.W table(PC, Dn.W), D0` followed by `JMP table(PC, D0.W)`. Count table entries from the range of Dn values, usually bounded by a `CMP`/`Bcc` check before the dispatch. IDA needs manual jump table specification for these.
- Cross-reference data tables: values loaded via `LEA table(PC), An`. These tables are often copper lists, sprite control words, or audio sample pointers.
- Detect self-modifying code: any write (`MOVE`, `ADDQ`, `BSET`, ...) is an SMC candidate if its destination lies within the CODE hunk, either directly or through a pointer set up with `LEA`. Flag it and verify with dynamic analysis.
- Identify interrupt service routines: Trace from vector table addresses. ISRs end with `RTE`, not `RTS`, and typically save/restore many registers at entry/exit.
- Map copper list interactions: `COP1LC`/`COP2LC` writes indicate copper list switches. A `MOVE.L #new_list, $DFF080` (COP1LC write) takes effect in one of two ways:
  - At the next vertical blank, when the copper restarts from COP1LC.
  - Immediately, on a write to the `COPJMP1` strobe (`$DFF088`).
  Demos switch effects mid-frame with `COP2LC` plus a `COPJMP2` (`$DFF08A`) strobe, often issued by the copper list itself.
- Trace blitter wait loops: `BTST #6, $DFF002` / `BNE wait` is the standard "wait for blitter" pattern. It polls bit 14 of DMACONR (BBUSY, `DMAB_BLTDONE` in the includes) through the register's high byte. A dummy `TST.B $DFF002` before the loop is commonly added as a workaround for early Agnus chips, which can return a stale busy bit on the first read.
- Flag unreachable code: code after `RTS`/`RTE`/`JMP` that isn't directly branched to is potential data, an SMC target, or second-stage code loaded later.
- Identify Level 3 interrupt chains: Music replayers and blitter queues commonly hook into the vertical blank interrupt (Level 3). The handler dispatches to multiple subscribers; find the dispatch loop to understand the full interrupt architecture.
Phase 3: Reconstruct Calling Conventions
- Map per-routine register usage: For each identified function, track:
  - Which registers are preserved (saved/restored via `MOVEM.L` or stack pushes). The `MOVEM.L` save mask encodes this explicitly.
  - Which registers are destroyed (modified without save). These are the function's scratch/output registers.
  - Which registers hold input parameters. Look for registers that are read before anything in the routine writes them.
  - Which registers hold return values. D0 is conventional even in hand-written code, but not guaranteed.
- Identify custom ABIs: Some authors consistently use, e.g., A2 = data segment base, A3 = copper list cursor, A4 = hardware base (`$DFF000`), D7 = scratch counter. These conventions are stable across a single author's body of work.
- Build a register allocation map: Color-coded table of which registers carry which meaning across the program. This is the single most valuable artifact for understanding hand-written asm.
- Detect authorial fingerprints: Consistent register conventions plus coding idioms (e.g., always using `MOVEQ #0, Dn` over `CLR.L Dn`) suggest a single author or codebase reuse. This matters for provenance and for predicting conventions in unreversed sections.
- Watch for `USP` manipulation: `MOVE USP, An` / `MOVE An, USP` is almost never generated by compilers. It indicates the author is using the User Stack Pointer as a second stack (common in context-switching code, coroutines, or task systems).
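Reading the save mask straight from the opcode bytes is the fastest way to recover a register contract. For example, `MOVEM.L D0-D7/A0-A6,-(SP)` assembles to `$48E7 $FFFE`, and `MOVEM.L (SP)+,D0-D7/A0-A6` assembles to `$4CDF $7FFF`. The catch is that the bit order is reversed in predecrement mode. The helper functions in this Python sketch are illustrative names, not part of any disassembler API.

```python
REG_NAMES = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]

def decode_movem_mask(mask: int, predecrement: bool) -> list[str]:
    """Registers selected by a MOVEM register-list mask word.

    Normal order: bit 0 = D0 ... bit 15 = A7.
    Predecrement mode (MOVEM regs,-(An)) reverses it: bit 0 = A7 ... bit 15 = D0.
    """
    regs = []
    for index, name in enumerate(REG_NAMES):
        bit = 15 - index if predecrement else index
        if mask & (1 << bit):
            regs.append(name)
    return regs

def format_reglist(regs: list[str]) -> str:
    """Compress an ordered register list into assembler syntax, e.g. 'D5-D7/A4-A5'."""
    parts = []
    for kind in "DA":
        nums = [int(r[1]) for r in regs if r[0] == kind]
        start = prev = None
        for n in nums + [None]:               # None flushes the final run
            if start is not None and n == prev + 1:
                prev = n
                continue
            if start is not None:
                parts.append(f"{kind}{start}" if start == prev else f"{kind}{start}-{kind}{prev}")
            start = prev = n
    return "/".join(parts)

print(format_reglist(decode_movem_mask(0xFFFE, predecrement=True)))   # 48E7 FFFE
print(format_reglist(decode_movem_mask(0x070C, predecrement=True)))   # a narrow save mask
```

The second call decodes to `D5-D7/A4-A5`, the kind of narrow mask that points at long-lived state such as a hardware base or a frame counter.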
Phase 4: Reconstruct Data Structures
- Copper list format: every copper instruction is exactly two 16-bit words (IR1, IR2), whether it is a MOVE, WAIT, or SKIP
- Sprite control words: `SPRxPOS`/`SPRxCTL` word pairs, attached sprite mode detection
- Blitter minterm lookup tables: Precomputed blitter operation descriptions
- Audio sample tables: Period/waveform pointer/volume structures for music replayers
- Custom module formats: Pattern data, sample lists, effect command tables for Protracker/Soundtracker variants
- Bitmap/bitplane layouts: Interleaved vs linear, planar depth detection from blitter source/dest usage
- Custom BSS-like allocations: Large zeroed regions used as frame buffers, audio buffers, or look-up tables
Phase 5: Hardware Interaction Mapping
For each custom chip register touched, document:
- Which register (address)
- From where (code location)
- In what sequence (interaction with other register writes)
- Purpose (deduced from context: blitter setup, copper list switch, audio start, sprite positioning)
Build a hardware register access matrix:
| Register | Writes From | Reads From | Deduced Purpose |
|---|---|---|---|
| `$DFF040` (BLTCON0) | `$01234`, `$05678` | - | Blitter operation setup |
| `$DFF096` (DMACON) | `$00123` | - (write-only) | DMA channel enable/disable |
| `$DFF002` (DMACONR) | - | `$04567` | DMA status / blitter-busy polling |
| ... | ... | ... | ... |
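Keeping the matrix in a script instead of by hand makes it easy to regenerate after each pass. This Python sketch assumes you can export register accesses as `(code_addr, reg_addr, 'r' | 'w')` tuples, for example from your own xref dump or an emulator access log. That tuple format is a hypothetical choice, and the register-name table is only a small illustrative subset.

```python
from collections import defaultdict

# Illustrative subset of custom chip register names, keyed by offset from $DFF000
CUSTOM_REG_NAMES = {
    0x002: "DMACONR", 0x040: "BLTCON0", 0x058: "BLTSIZE", 0x080: "COP1LC",
    0x096: "DMACON", 0x09A: "INTENA", 0x09C: "INTREQ", 0x180: "COLOR00",
}

def build_matrix(accesses):
    """Group (code_addr, reg_addr, 'r' | 'w') tuples by register."""
    matrix = defaultdict(lambda: {"r": [], "w": []})
    for code_addr, reg_addr, kind in accesses:
        matrix[reg_addr][kind].append(code_addr)
    return matrix

def render_markdown(matrix) -> str:
    """Emit the matrix as a markdown table; the purpose column is left for the analyst."""
    rows = ["| Register | Writes From | Reads From | Deduced Purpose |", "|---|---|---|---|"]
    for reg in sorted(matrix):
        name = CUSTOM_REG_NAMES.get(reg - 0xDFF000, "?")
        writes = ", ".join(f"${a:05X}" for a in matrix[reg]["w"]) or "-"
        reads = ", ".join(f"${a:05X}" for a in matrix[reg]["r"]) or "-"
        rows.append(f"| ${reg:06X} ({name}) | {writes} | {reads} | |")
    return "\n".join(rows)

matrix = build_matrix([
    (0x01234, 0xDFF040, "w"), (0x05678, 0xDFF040, "w"),
    (0x00123, 0xDFF096, "w"), (0x04567, 0xDFF002, "r"),
])
print(render_markdown(matrix))
```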
Phase 6: Annotate
- Rename functions: Descriptive names based on deduced purpose (`vbl_irq_handler`, `blitter_queue_submit`, `copper_list_build`)
- Add comments: Document register conventions at function entry, magic constants, hardware register purposes
- Create struct types: For custom data structures discovered in Phase 4
- Mark non-code regions: Force IDA/Ghidra to treat copper lists, sprite data, audio samples as data, not code
- Cross-reference hardware registers: Create named constants for all `$DFFxxx`/`$BFExxx`/`$BFDxxx` addresses in the database
- Build a call graph: Mermaid diagram of the full control flow for documentation
Phase 7: Dynamic Verification
- Breakpoint on custom chip registers: Verify that register writes occur at expected times
- Watchpoint on memory buffers: Confirm copper list format, audio sample layout
- Trace mode: Follow execution through a single frame to verify control flow reconstruction
- Modify-and-test: Patch the binary and run it — if it breaks, your understanding was incomplete
- Compare static vs dynamic: Does the code path you predicted match what actually executes?
Tool-Specific Workflows
IDA Pro
Ghidra
FS-UAE Debugger
Command-Line Pre-Analysis Pipeline
Best Practices
- Never assume the ABI — document the actual calling convention before tracing callers
- Start from the entry point and work outward — don't try to understand everything at once
- Identify hardware register usage before control flow — knowing which chips are used narrows the purpose
- Treat every `MOVE` to an absolute address inside the code range as a potential self-modifying code write until proven otherwise
- Build a mermaid diagram of the control flow: it reveals dead code, missing connections, and loop structures
- Cross-reference relocation entries with code — relocs tell you which addresses matter
- Don't trust auto-analysis on mixed code/data sections — manually define code/data boundaries
- Run the binary in an emulator — some behaviors (self-modifying code paths, copper effects) are invisible in static analysis
- Look for known signatures first — Protracker replayers, decrunch stubs, common macro libraries leave distinctive patterns
- Document your register map as you work — it prevents costly re-analysis when you realize A3 was actually a struct pointer
Antipatterns
1. The Compiler Assumption
Wrong: Assuming A6 holds a library base, D0/D1 are scratch, and A0/A1 are pointer temps.
Why it fails: Hand-written code may use A6 as a general-purpose data register, D6 as a frame pointer, or any other non-standard assignment. The author may have declared their own calling convention documented nowhere.
2. The Prologue Scanner
Wrong: Scanning for LINK A5 or SUBQ.L #N,SP to find function boundaries.
Why it fails: Hand-written assembly may have no standard function entry/exit markers. A routine might start with MOVEM.L, a label, or just fall through from the previous block.
3. The String Hop
Wrong: Assuming LEA _string(PC), A0 means A0 points to a C string.
Why it fails: Hand-written code may use LEA to point to bytecode tables, sprite data, copper lists, or packed structures. The "string" might be a custom encoding.
4. The Register Reuse Confusion
Wrong: Assuming a register used in one context retains the same meaning throughout the program.
Why it fails: Hand-written asm aggressively reuses registers. The same D0 might be a loop counter in one block, an audio sample value in the next, and a scratch temporary in a third — all within 50 instructions. You must re-derive register meaning at each basic block.
5. The Disassembly Loop Trap
Wrong: Letting IDA's auto-analysis recursively disassemble from every possible entry point.
Why it fails: Mixed code/data sections cause IDA to decode data as instructions, creating phantom functions from copper lists or audio samples. This pollutes the symbol table with nonsense and obscures real control flow.
6. The Constant-as-Code Mistake
Wrong: Treating jump table offsets, copper list data, or sprite control words as instructions.
Why it fails: IDA/Ghidra cannot tell `$0180` as the first word of a copper MOVE to COLOR00 apart from `$0180` as a 68000 opcode (it decodes as `BCLR D0, D0`). Without manual intervention, hardware data tables get disassembled into garbage.
7. The One-Pass Delusion
Wrong: Attempting linear top-to-bottom analysis and expecting to understand everything on the first pass.
Why it fails: Hand-written asm often uses forward references, self-modifying code patched by an earlier init routine, or data tables that only make sense after you understand the code that consumes them. Reverse engineering is inherently iterative.
8. The MOVEM Black Box
Wrong: Treating MOVEM.L D0-D7/A0-A6, -(SP) / MOVEM.L (SP)+, D0-D7/A0-A6 as opaque blocks.
Why it fails: Understanding which registers are saved and restored tells you the function's register contract. A routine that saves D5-D7/A4-A5 preserves those across its call — they likely carry important state (frame counter, hardware base pointer, data cursor).
Pitfalls
1. Assuming the OS Is Present
```asm
; This works on a running system:
    MOVE.L  4.W, A6             ; SysBase
    JSR     -552(A6)            ; OpenLibrary (A1 = name, D0 = version)
; In a bootblock, exec is up but dos.library is not yet initialized.
; Once a demo has taken over the machine and reused low memory,
; $4.W may not point to a valid ExecBase at all.
; The code might be:
    LEA     $DFF000, A5         ; custom chip base, not SysBase
    JSR     _custom_init(PC)    ; custom initialization
```
2. Misreading Jump Tables
Hand-written jump tables frequently use PC-relative indirect jumps with custom offsets that IDA doesn't auto-resolve.
3. Self-Modifying Code Deception
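```asm
; The code you see is NOT what executes:
    LEA     next_insn+2(PC), A0   ; A0 -> immediate operand of the CMPI below
    MOVE.W  D1, (A0)              ; patch the compare value at runtime
                                  ; (the 68000 cannot write via a PC-relative EA;
                                  ;  68020+ also needs an I-cache flush here)
next_insn:
    CMPI.W  #$0000, D0            ; really compares D0 against the value from D1
```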
4. Copper List Misidentification
Copper instructions are pairs of 16-bit words. Left to auto-analysis, they decode as plausible-looking but meaningless 68000 instructions:

```asm
; A copper list at $20000 decoded as 68000 instructions by IDA:
;   $0180          -> BCLR D0, D0
;   $0000, $0182   -> ORI.B #$82, D0  (ORI swallows the next copper word as its immediate)
;   $0FFF          -> illegal opcode
;   $FFFF, $FFFE   -> Line-F (unimplemented) opcodes
;
; Correct interpretation:
;   $0180, $0000 = MOVE $0000 to COLOR00 ($DFF180): background black
;   $0182, $0FFF = MOVE $0FFF to COLOR01 ($DFF182): white
;   $FFFF, $FFFE = WAIT for VP=$FF, HP=$FE: never satisfied, so it ends the list
```
Detection methodology:
- `COP1LC`/`COP2LC` writes give you the copper list address: start your data definition there
- Copper instructions come in pairs of 16-bit words. IR1 (first word) encodes the operation or register address; IR2 (second word) is the data or the WAIT comparison mask.
  - WAIT: IR1 bit 0 = 1 and IR2 bit 0 = 0. IR1 holds the beam position to wait for: VP in bits 15–8, HP in bits 7–1. IR2 holds the blitter-finished-disable bit (bit 15) and the vertical/horizontal compare-enable masks (bits 14–8 and 7–1). SKIP uses the same layout with IR2 bit 0 = 1.
  - MOVE: IR1 bit 0 = 0. IR1 bits 8–1 are the register offset from `$DFF000` (so `$0180` means `$DFF180`), and IR2 is the value to write.
- A `$FFFF, $FFFE` pair terminates the list.
- Mark the entire copper list address range as data, not code. Create an array of 4-byte copper instruction structs in IDA/Ghidra.
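Once you know where a copper list starts, a few lines of Python decode it using exactly the field layout listed above. This is a minimal sketch. It assumes `data` holds the raw list bytes, starting at the `COP1LC`/`COP2LC` address, and it stops at the conventional `$FFFF, $FFFE` end marker. HP is reported as the raw bits 7–1 of IR1, which is how positions such as `$FE` are usually written.

```python
import struct

def decode_copper(data: bytes) -> list[tuple]:
    """Decode into ('MOVE', reg_offset, value) or ('WAIT' | 'SKIP', vp, hp) tuples."""
    out = []
    for off in range(0, len(data) - 3, 4):
        ir1, ir2 = struct.unpack_from(">HH", data, off)
        if ir1 & 1 == 0:
            # MOVE: IR1 = register offset from $DFF000, IR2 = value to write
            out.append(("MOVE", ir1 & 0x1FE, ir2))
            continue
        # WAIT/SKIP: VP = IR1 bits 15-8, HP = IR1 bits 7-1; IR2 bit 0 selects SKIP
        kind = "SKIP" if ir2 & 1 else "WAIT"
        out.append((kind, ir1 >> 8, ir1 & 0xFE))
        if (ir1, ir2) == (0xFFFF, 0xFFFE):
            break  # end-of-list convention
    return out

copper = struct.pack(">8H", 0x0180, 0x0000, 0x2C07, 0xFFFE, 0x0182, 0x0FFF, 0xFFFF, 0xFFFE)
for insn in decode_copper(copper):
    print(insn)
```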
5. CIA Timer Code Confusion
CIA register access (`$BFE001`–`$BFEF01` for CIA-A on odd addresses, `$BFD000`–`$BFDF00` for CIA-B on even addresses) looks like any other memory access. The TOD clock read sequence and the timer control register patterns are distinctive, though:
```asm
; CIA-A Timer A setup (often used for timing in games/demos):
    MOVE.B  #$7F, $BFED01        ; CIA-A ICR: disable all CIA-A interrupt sources
    MOVE.B  #low_byte, $BFE401   ; CIA-A Timer A low byte (TALO)
    MOVE.B  #high_byte, $BFE501  ; CIA-A Timer A high byte (TAHI)
    MOVE.B  #$11, $BFEE01        ; CIA-A CRA: force load + start, continuous mode
    MOVE.B  #$81, $BFED01        ; CIA-A ICR: enable Timer A interrupt
; CIA-B Timer A setup (used by Protracker replayers!):
    MOVE.B  #$7F, $BFDD00        ; CIA-B ICR: disable all CIA-B interrupt sources
    MOVE.B  #lo, $BFD400         ; CIA-B Timer A low byte (TALO)
    MOVE.B  #hi, $BFD500         ; CIA-B Timer A high byte (TAHI)
    MOVE.B  #$11, $BFDE00        ; CIA-B CRA: force load + start, continuous mode
    MOVE.B  #$81, $BFDD00        ; CIA-B ICR: enable Timer A interrupt
; Common mistake:
;   MOVE.B $BFE801, D0 reads CIA-A TOD low (the VSYNC event counter), not input data.
;   Keyboard data arrives in the CIA-A serial data register, $BFEC01 (SDR);
;   the parallel port data register is $BFE101 (PRB).
```
Key CIA registers for RE identification:
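| Register | Address | Purpose |
|---|---|---|
| CIAA Timer A Lo | `$BFE401` | Timer A low byte |
| CIAA Timer A Hi | `$BFE501` | Timer A high byte |
| CIAA SDR | `$BFEC01` | Serial data register: keyboard data |
| CIAA ICR | `$BFED01` | Interrupt Control Register: enables/disables CIA-A interrupts (Level 2) |
| CIAA CRA | `$BFEE01` | Control Register A: start/stop, run mode, force load for Timer A |
| CIAB Timer A Lo | `$BFD400` | Timer A low byte (Protracker tempo control) |
| CIAB Timer A Hi | `$BFD500` | Timer A high byte |
| CIAB ICR | `$BFDD00` | Interrupt Control Register: enables CIA-B interrupts (Level 6; used by Protracker!) |
| CIAB CRA | `$BFDE00` | Control Register A for Timer A |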
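Every CIA address in the table follows one rule. CIA-A sits on the odd byte lane at `$BFE001`, CIA-B on the even lane at `$BFD000`, and register n lives at base + n × `$100`. That makes a name lookup easy to automate when you annotate a database. The helper names in this Python sketch are illustrative.

```python
from typing import Optional

CIA_BASES = {"A": 0xBFE001, "B": 0xBFD000}
CIA_REGS = ["PRA", "PRB", "DDRA", "DDRB", "TALO", "TAHI", "TBLO", "TBHI",
            "TODLO", "TODMID", "TODHI", "UNUSED", "SDR", "ICR", "CRA", "CRB"]

def cia_address(chip: str, reg: str) -> int:
    """Absolute address of a CIA register, e.g. ('B', 'TALO') -> $BFD400."""
    return CIA_BASES[chip] + CIA_REGS.index(reg) * 0x100

def cia_name(addr: int) -> Optional[str]:
    """Reverse lookup for annotating absolute addresses found in a listing."""
    for chip, base in CIA_BASES.items():
        n, rem = divmod(addr - base, 0x100)
        if rem == 0 and 0 <= n < len(CIA_REGS):
            return f"CIA{chip} {CIA_REGS[n]}"
    return None

for address in (0xBFED01, 0xBFEC01, 0xBFE801, 0xBFDD00, 0xBFDE00):
    print(f"${address:06X} = {cia_name(address)}")
```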
6. Blitter Queue Confusion
Blitter register writes (BLTCON0, BLTSIZE, etc.) look like ordinary memory stores to IDA. Without understanding that these are I/O registers, the disassembly shows meaningless MOVE.W D0, abs_addr sequences:
```asm
; This looks like garbage writes to random addresses:
    MOVE.W  #$09F0, $DFF040      ; BLTCON0 = USEA + USED, minterm $F0 (D = A)
    MOVE.W  #$0000, $DFF042      ; BLTCON1 = no fill, no line mode
    MOVE.W  #$FFFF, $DFF044      ; BLTAFWM = first word mask (all bits)
    MOVE.W  #$FFFF, $DFF046      ; BLTALWM = last word mask (all bits)
    MOVE.L  #src, $DFF050        ; BLTAPT = source A pointer
    MOVE.L  #dst, $DFF054        ; BLTDPT = destination D pointer
    MOVE.W  #0, $DFF064          ; BLTAMOD = source A modulo (0 = linear)
    MOVE.W  #0, $DFF066          ; BLTDMOD = dest D modulo
    MOVE.W  #(h<<6)|w, $DFF058   ; BLTSIZE = start blit! (writing this triggers DMA)
; But this is a standard blitter rectangle copy (A -> D). The setup writes
; may come in any order (the blitter must be idle before they start);
; only BLTSIZE has to be LAST, because writing it starts the blit.
```
How to identify a blitter operation:
- The sequence always ends with a write to `$DFF058` (BLTSIZE): this is the trigger
- `BLTCON0` (`$DFF040`) encodes the A-channel shift (bits 15–12), the channel enables (bit 11 = USEA, bit 10 = USEB, bit 9 = USEC, bit 8 = USED), and the minterm (bits 7–0)
- Pointer registers (`$DFF048`–`$DFF056`: BLTCPT, BLTBPT, BLTAPT, BLTDPT) hold source/destination addresses; these are your key to understanding what data is being moved
- The blit size `(h<<6)|w` in BLTSIZE: height in the upper 10 bits (0 = 1024 lines), width in the lower 6 bits (in words, 0 = 64 words)
- Blitter wait: `BTST #6, $DFF002` tests bit 14 of DMACONR (BBUSY, `DMAB_BLTDONE`) and polls until the blitter has finished
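Decoding the two key words by hand is error-prone. This Python sketch splits BLTCON0 into the fields listed above and converts OCS-style BLTSIZE values in both directions. It does not model ECS's separate BLTSIZV/BLTSIZH registers. The function names are illustrative.

```python
def decode_bltcon0(value: int) -> dict:
    """Split BLTCON0 into A-channel shift, enabled DMA channels, and minterm."""
    channels = "".join(ch for bit, ch in ((11, "A"), (10, "B"), (9, "C"), (8, "D"))
                       if value & (1 << bit))
    return {"ash": value >> 12, "channels": channels, "minterm": value & 0xFF}

def encode_bltsize(height: int, width_words: int) -> int:
    """BLTSIZE = (h << 6) | w; 1024 lines and 64 words wrap to 0."""
    if not (1 <= height <= 1024 and 1 <= width_words <= 64):
        raise ValueError("BLTSIZE limits: 1..1024 lines, 1..64 words")
    return ((height & 0x3FF) << 6) | (width_words & 0x3F)

def decode_bltsize(value: int) -> tuple[int, int]:
    """Inverse of encode_bltsize: (height in lines, width in words)."""
    height = (value >> 6) & 0x3FF or 1024
    width = value & 0x3F or 64
    return height, width

print(decode_bltcon0(0x09F0))          # the BLTCON0 value from the listing above
print(hex(encode_bltsize(200, 20)), decode_bltsize(0x3214))
```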
7. MOVEM Register Tracking Across Long Spans
8. Code Embedded in Interrupt Vector Table
9. Dual-Playfield Register Set Confusion
10. Stack-Based State Machines
Some hand-written code uses the stack as a state machine — pushing return addresses that represent state transitions, using RTS as a computed goto:
```asm
; Instead of a switch statement, the author pushes state transition addresses:
    MOVE.L  #STATE_IDLE, -(SP)      ; push initial state
    ...
STATE_DISPATCH:
    RTS                             ; "return" to the state on top of stack
STATE_IDLE:
    ; ... handle idle ...
    MOVE.L  #STATE_PLAYING, -(SP)   ; push next state
    BRA     STATE_DISPATCH
STATE_PLAYING:
    ; ... handle playing ...
    MOVE.L  #STATE_PAUSED, -(SP)    ; push next state
    BRA     STATE_DISPATCH
```
This pattern breaks all standard call/return analysis because RTS doesn't return to a caller — it jumps to the next state. IDA/Ghidra see RTS as a function exit and stop disassembling.
Detection: Look for MOVE.L #addr, -(SP) or PEA addr(PC) (push effective address) followed by RTS (or a branch to an RTS). These are state pushes, not function call setups.
11. Absolute Address Dependencies
Code that assumes a fixed load address (common in non-HUNK binaries) will break if relocated. For HUNK binaries, relocation entries tell you which absolute addresses must be patched at load time. Non-HUNK binaries lack relocation metadata entirely.
```asm
; Absolute dependency example: works only when loaded at $C00000
    LEA     $C01000, A0     ; data at fixed offset from load address
    JSR     $C00500         ; subroutine at fixed address within binary
; In a HUNK binary, the same references either carry HUNK_RELOC32
; entries or are written position-independently:
    LEA     _data(PC), A0   ; PC-relative (no relocation needed)
    JSR     _subroutine(PC) ; PC-relative
```
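Critical: The ROM loads a bootblock into a buffer at an address it chooses. Working bootblock code is therefore either position-independent (PC-relative), or it first copies itself to a fixed address (as SADDAM does with `$7F000`) and uses absolute `JMP`/`JSR` from there. Absolute-load blobs and trackmo stages rely on absolute addresses throughout. If you relocate such code for analysis, patch all absolute addresses or analyze it in place at the original address.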
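Without relocation metadata, you have to find absolute dependencies yourself. A useful first pass is to list every word-aligned 32-bit value that points back inside the blob's own load range. This Python sketch assumes you already know (or have guessed) the load address. Expect false positives: any data that happens to look like an in-range address will match.

```python
import struct

def absolute_refs(blob: bytes, load_addr: int) -> list[tuple[int, int]]:
    """(offset in blob, referenced address) for in-range word-aligned longwords."""
    end = load_addr + len(blob)
    hits = []
    for off in range(0, len(blob) - 3, 2):
        (value,) = struct.unpack_from(">I", blob, off)
        if load_addr <= value < end:
            hits.append((off, value))
    return hits

# LEA $C01000,A0 / JSR $C00500, followed by 4 KB of zeroed data
blob = struct.pack(">HIHI", 0x41F9, 0xC01000, 0x4EB9, 0xC00500) + bytes(0x1000)
print([(off, hex(addr)) for off, addr in absolute_refs(blob, 0xC00000)])
```

Each hit is a candidate operand to patch before you rebase the blob, or a sign that it must be analyzed at its original address.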
Use-Case Cookbook
Pattern 1: Finding the Main Loop in a Demo
Pattern 2: Identifying a Custom Interrupt Handler
Pattern 3: Reconstructing a Jump Table
Pattern 4: Detecting Self-Modifying Code with IDAPython
Pattern 5: Identifying a Protracker Replay Routine
The most commonly found audio subsystem in Amiga binaries. Here's the full identification workflow:
- Find the CIA interrupt vector write: Search for `MOVE.L #xxx, $78.W`. This installs the Level 6 (CIA-B timer) interrupt handler that Protracker uses for tempo.
- Identify the CIA-B timer setup: `MOVE.B #$7F, $BFDD00` / `MOVE.B #$81, $BFDD00` configures CIA-B to generate Timer A interrupts. The timer value itself goes to `$BFD400`/`$BFD500`, and `$BFDE00` (CRA) starts the timer.
- Trace to the interrupt handler: The handler saves ALL registers (`MOVEM.L D0-D7/A0-A6, -(SP)`), calls the replayer tick function, then restores all and does `RTE`.
- Find the audio register writes: Look for writes to `$DFF0A0`–`$DFF0DA` (AUDxLCH/LCL/LEN/PER/VOL). The pattern `MOVE.L sample_ptr, $DFF0A0` / `MOVE.W period, $DFF0A6` / `MOVE.W vol, $DFF0A8` is the per-channel audio update.
- Identify effect command dispatch: A `MOVE.W effect_cmd, D0` / `ANDI.W #$0F, D0` / `ADD.W D0, D0` / `JMP (effect_table, D0.W)` pattern dispatches to the arpeggio, portamento, vibrato, and other effect handlers.
- Map the pattern data layout: The replayer reads pattern data via sequential `MOVE.B (A0)+`; map each channel's note cells from those reads. Standard format: 4 bytes per note. The sample number is split across the upper nibble of byte 0 and the upper nibble of byte 2. The period is 12 bits (lower nibble of byte 0 plus byte 1). The effect command is the lower nibble of byte 2, and byte 3 is the effect parameter.
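Decoding one cell by hand checks whether a suspected pattern buffer really holds ProTracker data. The sketch follows the 4-byte layout just described. The sample cell encodes sample `$12` at period 214 (C-3 in ProTracker's note table) with effect `C` (set volume), parameter `$20`. `decode_pt_note` is an illustrative helper.

```python
def decode_pt_note(note: bytes) -> dict:
    """Decode one 4-byte ProTracker pattern cell."""
    b0, b1, b2, b3 = note
    return {
        "sample": (b0 & 0xF0) | (b2 >> 4),   # high nibble in byte 0, low nibble in byte 2
        "period": ((b0 & 0x0F) << 8) | b1,   # 12-bit Amiga period
        "effect": b2 & 0x0F,                 # effect command
        "param": b3,                         # effect parameter
    }

print(decode_pt_note(bytes([0x10, 0xD6, 0x2C, 0x20])))
```

Valid periods fall inside ProTracker's note table (roughly 113 to 856 for the three standard octaves), so runs of plausible periods are a good sign you have found pattern data.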
IDA Python script fragment to auto-detect Protracker replayers:
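```python
# IDAPython (IDA 7.x/8.x API; find_binary is deprecated in newer versions)
import ida_bytes
import ida_idaapi
import ida_search

# MOVE.L #handler, $78.W assembles to 21FC xxxx xxxx 0078 (Level 6 autovector)
PATTERN = "21 FC ?? ?? ?? ?? 00 78"

ea = ida_search.find_binary(0, ida_idaapi.BADADDR, PATTERN, 16, ida_search.SEARCH_DOWN)
while ea != ida_idaapi.BADADDR:
    handler = ida_bytes.get_dword(ea + 2)   # immediate operand = handler address
    print(f"Level 6 vector install at ${ea:08X} -> handler ${handler:08X}")
    ea = ida_search.find_binary(ea + 2, ida_idaapi.BADADDR, PATTERN, 16, ida_search.SEARCH_DOWN)
```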
Pattern 6: Reversing a Bootblock Virus
Bootblock viruses are the ideal entry point for learning Amiga RE — they're small (1024 bytes), self-contained, and exercise key system mechanisms:
Lamer Exterminator (October 1989)
- Size: 1024 bytes (exactly 2 disk blocks)
- Residence: Installs itself in memory, hooks system vectors
- Infection vector: Writes itself to any write-enabled disk's bootblock during disk access
- Damage routine: After activation, overwrites victim bootblocks 84 times with the string "LAMER!", which trashes the disk
- CoolCapture: Uses the CoolCapture vector for post-reset survival; after a warm reset, the virus re-activates from the captured state
- Detection text: Sometimes leaves identifiable strings in the bootblock
SADDAM Bootblock Virus
- Size: 1024 bytes
- Residence: Copies itself to `$7F000` in memory (just below the 512 KB Chip RAM boundary at `$80000`)
- Interrupt hooking: Hooks Level 3 interrupt (Vertical Blank/Copper/Blitter) via the interrupt vector table
- Infection trigger: First "read Rootblock" command after a reset — this infects any disk accessed after boot
- Stealth: Writes the original bootblock back to disk when the rootblock is read (hiding its presence)
- System modification: Clears `CoolCapture`, `KickTagPtr`, and `KickCheckSum`, removing any other reset-resident code (such as competing viruses or resident virus checkers) that relied on those ExecBase vectors
- Anti-detection text: Contains the misleading string "A2000 MB Memory Controller V2" to disguise itself as a hardware ROM
- Damage trigger: After ~30,000 interrupt calls, crashes the system by showing an alert in a Level 3 interrupt context
Common Virus RE Workflow
- Extract the bootblock: The first 1024 bytes of an infected disk (blocks 0–1)
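- Determine the load address: Unlike a PC boot sector (always loaded to `$7C00`), an Amiga bootblock has no fixed address. The Kickstart boot code reads it into an allocated buffer and jumps to offset 12 with A6 = ExecBase and A1 = the trackdisk.device IORequest, so the code must be position-independent; rebase the disassembly at the buffer start and begin at offset 12.
- Identify the infection mechanism: Look for `DoIO()`/`SendIO()` calls to `trackdisk.device` that write back to disk
- Find the residency mechanism: `CoolCapture`/`KickTagPtr` manipulation, or a RAM copy to `$7F000` plus vector hooking
- Trace the trigger condition: What event activates the virus? A timer count, a disk access count, a specific command?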
- Document the payload: Does it corrupt data? Display a message? Overwrite bootblocks?
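A minimal Python sketch of the first few steps, assuming a raw 1024-byte bootblock dump whose code keeps ExecBase in A6 (as it is on entry). It recomputes the standard bootblock checksum (a 32-bit sum of all 256 longwords with the carry folded back in, then inverted, stored at offset 4) and flags `JSR d16(A6)` calls to a hand-picked subset of exec.library LVOs (DoIO/SendIO for infection, AllocAbs/SumKickData for residency), plus `CLR.L`/`MOVE.L` writes to the ExecBase reset vectors such as CoolCapture and KickTagPtr. The function names are hypothetical; code that reaches ExecBase through another register, or that decrypts itself first, slips past this scan, and data bytes can produce false positives.

```python
import struct

BOOTBLOCK_SIZE = 1024
CODE_START = 12  # code follows the DOS type, checksum, and rootblock longwords

# Hand-picked subset of exec.library LVOs that loaders and viruses tend to call.
EXEC_LVOS = {
    -96: "FindResident", -120: "Disable", -126: "Enable", -132: "Forbid",
    -138: "Permit", -162: "SetIntVector", -168: "AddIntServer",
    -198: "AllocMem", -204: "AllocAbs", -210: "FreeMem",
    -408: "OldOpenLibrary", -414: "CloseLibrary", -420: "SetFunction",
    -444: "OpenDevice", -450: "CloseDevice", -456: "DoIO", -462: "SendIO",
    -552: "OpenLibrary", -612: "SumKickData", -624: "CopyMem",
}

# ExecBase fields used for reset survival (byte offsets from ExecBase).
EXEC_VECTORS = {
    0x2A: "ColdCapture", 0x2E: "CoolCapture", 0x32: "WarmCapture",
    0x222: "KickMemPtr", 0x226: "KickTagPtr", 0x22A: "KickCheckSum",
}


def bootblock_checksum(block: bytes) -> int:
    """Value that belongs at offset 4: 32-bit sum with end-around carry, inverted."""
    if len(block) != BOOTBLOCK_SIZE:
        raise ValueError("a bootblock is exactly 1024 bytes")
    longs = list(struct.unpack(">256I", block))
    longs[1] = 0  # the checksum field itself counts as zero
    total = 0
    for value in longs:
        total += value
        if total > 0xFFFFFFFF:  # fold the carry back into bit 0
            total = (total & 0xFFFFFFFF) + 1
    return ~total & 0xFFFFFFFF


def checksum_ok(block: bytes) -> bool:
    return struct.unpack_from(">I", block, 4)[0] == bootblock_checksum(block)


def scan_bootblock(block: bytes):
    """Return (offset, description) for exec calls and ExecBase vector writes via A6."""
    hits = []
    for off in range(CODE_START, len(block) - 3, 2):
        opcode, word = struct.unpack_from(">HH", block, off)
        disp = word - 0x10000 if word & 0x8000 else word  # sign-extend d16
        if opcode == 0x4EAE and disp in EXEC_LVOS:  # JSR d16(A6)
            hits.append((off, f"JSR {disp}(A6) = {EXEC_LVOS[disp]}"))
        elif (opcode == 0x42AE or 0x2D40 <= opcode <= 0x2D4F) and word in EXEC_VECTORS:
            # CLR.L d16(A6), or MOVE.L Dn/An, d16(A6)
            hits.append((off, f"write {EXEC_VECTORS[word]}"))
        elif opcode == 0x2D7C and off + 8 <= len(block):  # MOVE.L #imm32, d16(A6)
            (dest,) = struct.unpack_from(">H", block, off + 6)
            if dest in EXEC_VECTORS:
                hits.append((off, f"write {EXEC_VECTORS[dest]}"))
    return hits
```

A bootblock whose stored checksum fails `checksum_ok()` will not boot on real hardware, so a mismatch usually means the dump is truncated or was modified after infection.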
Pattern 7: Finding the Decrunch Stub in a Packed Demo
The decrunch stub is the gateway to the real binary. Finding and understanding it is prerequisite to all further analysis:
Identification by signature:
| Cruncher | Magic/Pattern | Decrunch Stub Size | Notes |
|---|---|---|---|
| PowerPacker | $42 followed by LEA/MOVE pattern near entry | ~200–300 bytes | Uses powerpacker.library; ppDecrunchBuffer() is the library call |
| Imploder | Entry has MOVE.L D0, -(SP) / LEA xxx(PC), A0 pattern | ~300–400 bytes | ATN!Imploder; slower decompression, better ratio than early PP |
| Shrinkler | Entry starts with context-mixing setup code | ~2KB | Extremely high ratio; decrunch takes minutes on 7 MHz 68000 |
| ByteKiller | Short BRA.S over header data, then MOVEM.L pattern | ~100 bytes | Simple LZ variant; very common in 1988–1991 productions |
| CrunchMania | String "CR!" at or near entry | ~150 bytes | Fastest decruncher; popular for 4K intros |
Decrunch strategy:
- Identify the stub: The first code that executes after the entry point. It reads packed data and expands it to a destination address.
- Let the stub run in an emulator: Set a breakpoint after the decrunch loop completes (look for the `JMP` or `JSR` to the unpacked entry point).
- Dump the decrunched memory: The real binary is now in RAM. Save it for static analysis.
- Optionally: Write an unpacker script — for known formats, run the original cruncher's decruncher against the packed data in a standalone tool.
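One heuristic that the strategy above does not rely on, but that helps locate the packed payload the stub reads, is byte entropy: compressed data sits close to 8 bits per byte, while uncompressed 68000 code and tables usually score noticeably lower. The sketch below is a generic technique rather than anything cruncher-specific; the 1024-byte window and the 7.2 bits/byte threshold are assumed starting values to tune, and the helper names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Byte entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum(c / n * math.log2(c / n) for c in Counter(chunk).values())


def entropy_profile(image: bytes, window: int = 1024):
    """(offset, entropy) for consecutive non-overlapping windows."""
    return [(off, shannon_entropy(image[off:off + window]))
            for off in range(0, len(image), window)]


def first_packed_window(image: bytes, window: int = 1024, threshold: float = 7.2):
    """Offset of the first window that looks compressed, or None."""
    for off, bits in entropy_profile(image, window):
        if bits >= threshold:
            return off
    return None
```

In a packed executable, the low-entropy windows before the first hit are a good first guess for the decrunch stub itself; small windows overstate how ordered the data is, so avoid going much below 1 KB.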
Rob Northen Copylock / Trace Vector Decoder (TVD): A special case that appears like a cruncher but is actually a protection system:
- Encrypted code is executed one instruction at a time using the 68000 trace exception
- The trace handler (exception vector `$24`) decrypts the next instruction and returns with `RTE`; because the trace bit is still set in the stacked SR, the CPU executes that one instruction and traps back into the handler
- This prevents static disassembly: you only see the encrypted bytes and the trace handler, not the real code
- Detection: `MOVE #$8000, SR` or `ORI #$8000, SR` (setting the trace bit) in the entry code, plus a custom handler installed at vector `$24`
- Solution: Let it execute in FS-UAE with a trace logger, or single-step through and record each decrypted instruction
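The detection signature can be turned into a quick scan over a dumped loader. This sketch assumes the trace bit is set with the immediate forms `MOVE #imm, SR` (`46FC`) or `ORI #imm, SR` (`007C`), and that the handler is installed with `MOVE.L #imm32, $24.W`; protection code that builds these values at run time, or hides them inside encrypted bytes, will not show up, and stray matches inside data are possible. The function name is hypothetical.

```python
import struct


def find_trace_setup(image: bytes, base: int = 0):
    """Flag instructions that set the T bit or install a trace-vector handler."""
    hits = []
    for off in range(0, len(image) - 3, 2):
        opcode, imm = struct.unpack_from(">HH", image, off)
        if opcode == 0x46FC and imm & 0x8000:            # MOVE #imm, SR
            hits.append((base + off, f"MOVE #${imm:04X}, SR sets T"))
        elif opcode == 0x007C and imm & 0x8000:          # ORI #imm, SR
            hits.append((base + off, f"ORI #${imm:04X}, SR sets T"))
        elif opcode == 0x21FC and off + 8 <= len(image):  # MOVE.L #imm32, abs.W
            handler, vector = struct.unpack_from(">IH", image, off + 2)
            if vector == 0x0024:                          # trace exception vector
                hits.append((base + off, f"trace handler ${handler:08X}"))
    return hits
```

A hit on both kinds of instruction close together is the strongest signal; the reported handler address is where to start reading the decryption routine.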
Pattern 8: Identifying a Custom Memory Allocator
Pattern 9: Reconstructing a Blitter Queue
Pattern 10: Recovering a Sprite Multiplexer
Pattern 11: Extracting a Custom Module Replayer
Pattern 12: Tracing a Trackloader
Real-World Examples
Demo Productions — RE Challenge Highlights
| Production | Group | Year | Key RE Challenge |
|---|---|---|---|
| Arte | Sanity | 1993 | Dense blitter queue system; effects dispatched via jump table with per-effect copper list switching; multi-part architecture with custom module loader |
| Desert Dream | Kefrens | 1993 | Multi-part trackmo with per-part custom loaders; heavy copper wizardry (raster bars, palette splits, sprite multiplexing); custom Protracker variant replayer |
| Nexus 7 | Andromeda | 1994 | 3D vector engine with custom math routines (no FPU); object system with update/render phases; blitter-filled polygons |
| Enigma | Phenomena | 1991 | Modular effect system — each effect is a self-contained subroutine registered in a dispatch table; custom memory management across effect transitions |
| State of the Art | Spaceballs | 1992 | Morphing effects, rotate-zoomer, vector balls; heavy use of precomputed tables; custom blitter queue for compositing |
| Hardwired | Crionics & Silents | 1991 | Early 3D vector engine; spreadsheet-generated sine tables identified by their perfect mathematical precision; copper-chunky display mode simulation |
Games
| Title | Year | Key RE Challenge |
|---|---|---|
| Shadow of the Beast | 1989 | 13 layers of parallax scrolling using dual playfield + sprite overlays; custom blitter queues for sprite rendering; 512-color still images via palette-split copper lists |
| Turrican II | 1991 | Sprite multiplexer with 20+ sprites on screen; copper-driven status bar split; large state machine for enemy AI |
| Lotus Turbo Challenge 2 | 1991 | Software road rendering with copper sky gradient; blitter-driven car sprite compositing; 2-player split-screen via copper screen split |
| Cannon Fodder | 1993 | OS-friendly (uses graphics.library!) but still hits hardware for scrolling; custom memory allocator for soldier/bullet objects |
| Pinball Dreams | 1992 | Multi-ball physics engine; copper-driven score display; custom module replayer with sound effects mixing into music channels |
Bootblock Intros — The Art of 1024 Bytes
Bootblock intros compress entire demoscene effects into two disk sectors:
- Red Sector Inc. (RSI) bootblocks: Often include a simple scrolltext, starfield, and a logo — all in 1024 bytes of raw m68k
- Tristar & Red Sector Inc. (TRSI) bootblocks: More advanced effects (copper bars, vector objects)
- SADDAM virus: A case study in anti-RE techniques within a bootblock — misleading strings, interrupt hooking, stealth write-back
- Lamer Exterminator: One of the most notorious Amiga viruses, studied for its CoolCapture survival mechanism
Cross-Platform Comparison
| Platform | Assembly RE Challenge | Amiga Analog |
|---|---|---|
| C64 (6502) | Zero-page usage, self-modifying code, raster interrupts | Custom chip register banging, copper-synced code |
| Atari ST (68000) | Similar CPU but different hardware registers | Amiga custom chips vs ST's simpler shifter/blitter |
| DOS (x86) | Segment:offset addressing, BIOS/DOS interrupt vectors | Amiga library JMP tables, exec interrupt vectors |
| NES (6502) | Tight mapper constraints, PPU timing loops | Similar raster-sync challenges in demos |
| Arcade (68000) | Shared CPU family, custom hardware | Same CPU, different memory maps and custom chips |
| SNES (65816) | Hardware register banging, HDMA (like copper) | Copper list is the direct analog of SNES HDMA channels |
| Genesis/Mega Drive (68000) | Same CPU, VDP register interface, Z80 coprocessor | Closest analog — 68000 + custom video hardware, similar register-banging style |
| Game Boy (Z80-like) | Tight memory (8KB), scanline interrupts, OAM DMA | Similar to bootblock constraints — extreme optimization in tiny space |
Historical Context — Why Hand-Written Assembly Dominated
Before 1990, there were few practical alternatives to assembly for Amiga software that needed to be fast:
| Factor | Detail |
|---|---|
| Compiler quality | Early compilers (Lattice C, the predecessor of SAS/C, and Manx Aztec C) often generated code 5–20× slower than hand-tuned assembly for graphics/audio |
| Hardware gap | A 7 MHz 68000 with 512 KB Chip RAM had zero margin for inefficient code — games and demos needed every CPU cycle |
| OS overhead | The AmigaOS graphics.library added measurable overhead (layer locking, clipping rectangle checks). Games bypassed it entirely and wrote directly to $DFFxxx registers |
| Demoscene culture | Assembly was the "real" language of the demoscene. Using a compiler was considered lazy — the code itself was the art form |
| Size constraints | Bootblocks (1024 bytes), 4K intros, and single-disk demos imposed hard size limits. Assembly gave precise control over every byte |
| Custom chip intimacy | Copper lists, blitter queues, and audio DMA are fundamentally low-level. High-level languages abstracted away the very features that made Amiga programming distinctive |
The transition: By 1992–1994, faster CPUs (68020+), more RAM, and mature compilers such as SAS/C 6.x made C viable for commercial software (GCC-based toolchains such as GCC 2.95.x only arrived at the end of the decade). But the demoscene stayed with assembly into the late 1990s, and AGA productions on 68060 accelerators continue to use hand-written assembly today.
Modern Analogies
| Hand-Written Asm Concept | Modern Analogy | Where It Holds / Breaks |
|---|---|---|
| Cycle-counted raster effects | GPU fragment shader dispatch | Holds: per-pixel/per-scanline execution; breaks: asm is imperative timing, shaders are data-parallel |
| Custom blitter queue | GPU command buffer / DMA transfer list | Holds: structured descriptor-based hardware offload; breaks: blitter is in-order, GPUs reorder |
| Hardware register banging | MMIO device drivers in embedded systems | Holds: same concept — memory-mapped I/O; breaks: Amiga registers are video/audio, not peripherals |
| Self-modifying code | JIT compilation (V8, LuaJIT, WASM) | Holds: code generation at runtime; breaks: SMC patches existing code, JIT generates new code |
| Copper list | G-sync / FreeSync adaptive refresh + shader constants per scanline | Holds: timing-sensitive display updates; breaks: copper is a programmable coprocessor, not a protocol |
| Stack-based state machine | Coroutine dispatch / async/await | Holds: non-linear control flow; breaks: stack manipulation vs language-level async |
| Position-independent code | ASLR + PIE executables | Holds: same goal (run anywhere); breaks: asm PIC is manual, modern PIC is linker/loader assisted |
FAQ
Q1: How do I know if a function is an interrupt handler vs a regular subroutine?
Q2: What's the best way to detect self-modifying code?
Q3: How do I handle code that mixes data and instructions?
Q4: How do I tell code from data in a mixed section?
Q5: How do I handle encrypted or obfuscated code?
Q6: How do I deal with copper-synced code?
Q7: What about self-relocating code?
Q8: How do I identify custom chip register usage patterns?
Q9: Why do I see `MOVE.W D0, $DFF000`-style absolute long addressing to custom registers?
Q10: How do I trace blitter operations without hardware?
Q11: What's the difference between a software interrupt and a hardware interrupt in the code?
Q12: How do I identify which demo group or author wrote this?
Q13: How do I reverse engineer an audio driver / module replayer?
Q14: What do I do when IDA creates 500 phantom functions from copper data?
FPGA / Emulation Impact
References
- m68k_codegen_patterns.md — Compiler codegen fingerprint catalog
- compiler_fingerprints.md — Compiler identification at a glance
- string_xref_analysis.md — String cross-reference methodology
- hunk_reconstruction.md — HUNK binary reconstruction
- struct_recovery.md — Struct layout reconstruction
- api_call_identification.md — Library call recognition
- exe_crunchers.md — Decruncher identification and unpacking
- code_vs_data_disambiguation.md — distinguishing code bytes from data/variables
- copper_programming.md — Copper list format and programming
- blitter_programming.md — Blitter operation reference
- paula_audio.md — Audio hardware register reference
- custom_registers.md — Complete custom chip register map
- M68000 Family Programmer's Reference Manual — Instruction set and timing
- Amiga Hardware Reference Manual — Custom chip register map and DMA cycles
- Amiga Disk Drives Inside & Out (Abacus) — Trackloader and MFM encoding reference