amiga-bootcamp/05_reversing/static/asm68k_binaries.md


Hand-Written Assembly Reverse Engineering — Pure m68k Binaries

Overview

Unlike compiler-generated code with predictable prologues, frame-pointer conventions, and library-call idioms, hand-written 68000 assembly is unconstrained. The author may use any register for any purpose, invent ad-hoc calling conventions, self-modify code, or jump into the middle of instructions. This is the norm for Amiga demos, most pre-1990 games, trackmos, bootblock intros, and hardware-banging utilities — and it demands a fundamentally different reversing strategy than C/C++ binaries.

graph TB
    subgraph "Compiler Binary"
        CPROLOGUE["LINK A5, #-N<br/>MOVEM.L D2-D7,-(SP)"]
        CEXIT["UNLK A5<br/>RTS"]
        CLIB["JSR LVO(A6)<br/>predictable ABI"]
    end
    subgraph "Hand-Written Assembly"
        ACUSTOM["Custom calling convention<br/>any register = any purpose"]
        AJMP["JMP (A0) / JMP $1234.W<br/>opaque control flow"]
        ASMC["Self-modifying code<br/>move.w #imm, (next_insn+2)"]
        AHW["Hardware register banging<br/>MOVE.W D0, $DFF180"]
    end
    CPROLOGUE -.->|"absent"| ACUSTOM
    CEXIT -.->|"unpredictable"| AJMP
    CLIB -.->|"may skip OS"| AHW

Architecture

What Makes Hand-Written Assembly Different

| Trait | Compiler Output | Hand-Written Assembly |
|---|---|---|
| Function boundaries | LINK/UNLK or SUBQ/ADDQ pairs | No universal marker; code may flow into data |
| Calling convention | Standard ABI (A6=lib base, D0/D1=scratch, A0/A1=scratch) | Author-defined per routine; may repurpose any register |
| Strings | dc.b "text",0 with cross-reference chains | May be XOR-obfuscated, embedded mid-instruction, or stored as bitmaps |
| Library calls | JSR LVO(A6) with reloc entries | May call via absolute address, JMP table, or custom trap |
| Loop structures | DBcc Dn, label (counted) or TST/BEQ (conditional) | May unroll completely, use address-range compares, or rely on raster timing |
| Data embedding | Separate DATA hunk | Routinely mixed with code; data tables inside branch-not-taken paths |

Common Environments

  • Bootblock intros (1024 bytes, no OS): All registers free, hardware banging only
  • Trackmos / demos: Often take over the system entirely; disable multitasking; use custom copper lists and blitter queues
  • Games (pre-1992): Usually bypass graphics.library for speed; hit hardware registers directly
  • Hardware drivers: Heavy CIA/custom chip register manipulation; interrupt-driven
  • Virus / bootblock payloads: Deliberately obfuscated; anti-debugging tricks
  • Cracktros / trainer menus: Small (<4 KB), pre-launch patches to game code, often packed
  • Trackdisk loaders: Custom DMA-driven disk reading; Rob Northen (RNC) loaders, raw MFM decoders
  • Non-HUNK binaries: Raw absolute-load code at fixed addresses (e.g., $C00000, the trapdoor "Slow RAM" region)
  • ROM-resident code: Kickstart modules, expansion ROMs (DiagROM, SCSI controller firmware)
  • Self-relocating code: Code that copies and patches itself to run at any address

The Assembly Author's Toolkit — Common Patterns Across the Demoscene

These patterns recur across hundreds of hand-written Amiga productions. Recognizing them accelerates function identification and purpose deduction.

Hardware Base Pointer Convention

Most authors dedicate a register to $DFF000 for the entire program lifetime. The choice of register is often an author fingerprint:

| Register | Common Users | Notes |
|---|---|---|
| A4 | Majority of demoscene productions | LEA $DFF000, A4 at program start; all hardware writes use MOVE.W Dn, $offset(A4) |
| A5 | Some demos, trackmos | May conflict with SAS/C A5 frame pointer convention in mixed C+asm code |
| A6 | Rare (conflicts with exec library base) | Only used when the program never calls exec and A6 is freed |

; The classic demoscene init pattern:
; Save OS registers, take over the machine
MOVE.W  $DFF01C, old_intena    ; save INTENA state
MOVE.W  #$7FFF, $DFF09A        ; disable all interrupts
MOVE.W  #$7FFF, $DFF09C        ; clear all interrupt requests
LEA     $DFF000, A4             ; A4 = custom chip base for entire program
; Now all hardware writes are: MOVE.W D0, $XXX(A4)

Custom Register Offset Tables

Precomputed address tables indexed by effect number dispatch hardware writes without runtime calculation:

; Effect dispatcher via offset table:
effect_dispatch:
    MOVE.W  effect_num(PC), D0
    ADD.W   D0, D0               ; word index
    MOVE.W  effect_offsets(PC, D0.W), D0
    JMP     effect_offsets(PC, D0.W) ; offsets are relative to the table, so jump relative to it

effect_offsets:
    DC.W    fx_plasma - effect_offsets
    DC.W    fx_rotozoom - effect_offsets
    DC.W    fx_vector3d - effect_offsets
    DC.W    fx_tunnel - effect_offsets
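
When you meet a table like this in a disassembly, resolving it by hand is mechanical: each entry is a signed 16-bit offset added to the table's own address. The sketch below shows that arithmetic in Python; the function name `resolve_offset_table`, the sample image, and the entry count are illustrative assumptions, not part of any tool.

```python
import struct

def resolve_offset_table(image: bytes, table_off: int, count: int) -> list[int]:
    """Resolve `count` signed big-endian word offsets stored at table_off.

    Assumes every entry is relative to the start of the table itself,
    as in the `DC.W handler - table` idiom.
    """
    targets = []
    for i in range(count):
        (rel,) = struct.unpack_from(">h", image, table_off + 2 * i)
        targets.append(table_off + rel)
    return targets

# Hypothetical 64-byte image with a 4-entry table at offset 0x10.
image = bytearray(0x40)
struct.pack_into(">4h", image, 0x10, 0x08, 0x10, 0x18, -0x10)
print([hex(t) for t in resolve_offset_table(bytes(image), 0x10, 4)])
```

The entry count is not stored anywhere; in practice it has to be inferred from the range of index values the dispatcher can produce.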

Cycle-Counted Sequences

Instruction sequences timed to exact 68000 CPU cycles for per-scanline effects:

; Color change per scanline (68000 timings, ignoring DMA contention):
; MOVE.W Dn, (Am) = 8 cycles; MOVE.W (A0)+, $180(A4) = 16 cycles
; Plus: DBF D7, loop = 10 cycles (branch taken), 14 cycles (counter expired)
; A full scanline is ~227 color clocks / 454 CPU cycles on PAL
; Unrolled 8-cycle writes allow ~56 color changes per scanline at best;
; the 26-cycle loop below manages ~17
raster_colors:
    MOVE.W  (A0)+, $180(A4)      ; write next color to COLOR00 ($DFF180)
    DBF     D7, raster_colors    ; 10 cycles when taken
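
The per-scanline budget is plain integer division, which makes it easy to sanity-check a suspected raster routine. This is a rough sketch that assumes standard 68000 timings (MOVE.W Dn,(An) = 8 cycles, MOVE.W (An)+,d16(An) = 16 cycles, DBF taken = 10 cycles) and ignores DMA contention; the helper name is made up for illustration.

```python
# ~227 color clocks x 2 CPU cycles per PAL scanline (figure taken from the text).
CPU_CYCLES_PER_PAL_LINE = 454

def iterations_per_line(cycles_per_iteration: int) -> int:
    """How many times a sequence of the given cost fits into one scanline."""
    return CPU_CYCLES_PER_PAL_LINE // cycles_per_iteration

unrolled = iterations_per_line(8)       # back-to-back MOVE.W Dn,(An)
looped = iterations_per_line(16 + 10)   # MOVE.W (A0)+,$180(A4) plus DBF taken
print(unrolled, looped)
```

A routine whose instruction costs add up to roughly one scanline per iteration is a strong hint that it is beam-synchronized.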

MOVEM.L Bulk Save/Restore

60-byte register dumps to the stack. Interrupt handlers and complex effects often save every register except SP; ordinary routines save only the non-standard subset they clobber:

; Save D0-D7 and A0-A6 to stack (15 registers × 4 = 60 bytes):
    MOVEM.L D0-D7/A0-A6, -(SP)
    ; ... body of interrupt handler or complex effect ...
    MOVEM.L (SP)+, D0-D7/A0-A6
    RTE

Hand-Optimized Idioms That Confuse Disassemblers

| Idiom | What It Does | Disassembly Trap |
|---|---|---|
| ADD.W Dn, Dn | Same result as ASL.W #1, Dn (multiply by 2) | IDA shows ADD.W; the shift intent is invisible |
| SUB.W Dn, Dn | Clears the low word of Dn (MOVEQ #0, Dn clears all 32 bits) | Easy to misread as arithmetic; the choice reveals author style |
| OR.B Dn, Dn / Scc chain | Tests the low byte of Dn against zero, then sets a byte conditionally | Disassembler shows raw ops, not intent |
| MOVE SR, Dn | Saves the status register (including CCR) across branches | Used instead of recomputing flags; rare in compiler output, and privileged on 68010+ |
| SWAP Dn / MOVE.W Dn, ... | Access upper word of 32-bit register | Common in 16-bit coordinate manipulation |
| EXT.L Dn | Sign-extend word to long | Indicates 16-bit signed value widening to 32-bit |
| MOVEQ #0, Dn over CLR.L Dn | Both clear all 32 bits of Dn and both are 2 bytes; on a 68000 MOVEQ takes 4 cycles, CLR.L takes 6 | Functionally identical, so the choice is an author habit rather than a semantic clue |

Control Flow Archetypes

| Archetype | Signature Pattern | Typical In |
|---|---|---|
| State machine via jump table | MOVE.W state(PC), D0 / ADD.W D0, D0 / MOVE.W jt(PC, D0.W), D0 / JMP jt(PC, D0.W) | Game AI, effect sequencers, menu systems |
| VBlank-driven frame loop | MOVE.L $6C.W, old_vbl / MOVE.L #my_vbl, $6C.W / main loop waits on flag set by VBlank | Demos, games, any framed application |
| Copper-interrupt-driven | MOVE.L #copper_irq, $6C.W (Level 3 interrupt, shared with VBlank and blitter) / per-scanline effect changes | Raster bars, multiplexed sprites, palette splits |
| Blitter-continuation via interrupt | Sets INTREQ bit for blitter, interrupt handler chains to next blit in queue | Demos with complex blitter pipelines |
| Custom event loop (no exec) | Polling loop reading CIA / custom chip registers directly; no Wait() / WaitPort() | Games bypassing OS, bootblock intros |
| Audio-driver callback chain | Audio interrupt (Level 4) feeds next sample pair from custom module replayer | Protracker/Soundtracker replayers |

Protracker Replayer — Reference Architecture

The most commonly found audio subsystem in Amiga binaries. Understanding its internals saves hours of reverse engineering:

; Standard Protracker replayer entry points:
;
; mt_init   — initialize replayer with module data pointer
; mt_music  — call once per frame to advance pattern playback
; mt_end    — shutdown replayer, restore system state
;
; Registration pattern (CIA-based timing):

    ; Save old CIA interrupt vector
    MOVE.L  $78.W, old_level6      ; Level 6 autovector = CIA-B (EXTER) interrupt
    ; Install replayer interrupt
    MOVE.L  #mt_irq, $78.W
    ; Configure CIA-B Timer A for the desired tempo
    MOVE.B  #$7F, $BFDD00          ; CIA-B ICR: mask all interrupt sources
    MOVE.B  #$81, $BFDD00          ; CIA-B ICR: enable Timer A interrupt
    ; Set timer period (e.g., 125 bpm = 50 ticks/s → ~14187 E-clock ticks on PAL)
    MOVE.B  #$6B, $BFD400          ; CIA-B Timer A low byte
    MOVE.B  #$37, $BFD500          ; CIA-B Timer A high byte

; The interrupt handler (mt_irq):
mt_irq:
    MOVEM.L D0-D7/A0-A6, -(SP)     ; save all registers
    TST.B   $BFDD00                ; read ICR to clear the CIA-B interrupt flags
    BSR     mt_music               ; advance replayer state
    MOVEM.L (SP)+, D0-D7/A0-A6     ; restore all registers
    MOVE.W  #$2000, $DFF09C        ; acknowledge EXTER (CIA-B) in INTREQ
    RTE

Key identification markers:

  • Writes to $BFD400/$BFD500 (CIA-B Timer A) and $BFDD00/$BFDE00 (CIA-B ICR and control register A): CIA timer setup
  • MOVE.L #handler, $78.W: Level 6 interrupt vector installation
  • MOVEM.L D0-D7/A0-A6, -(SP) in the handler: all registers saved (standard for audio ISRs)
  • Audio register writes ($DFF0A0-$DFF0D0): AUDxLCH/LCL/LEN/PER/VOL
  • Signature mt_ or _mt_ function names in HUNK_SYMBOL if available
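
The timer value written during CIA setup can be turned back into a tempo, which helps confirm that a CIA-B write really belongs to a replayer. The sketch below assumes a PAL machine and the constant 1773447 that common ProTracker-style CIA replayers divide by the BPM; both the constant's applicability to a given binary and the function name are assumptions to verify against the code you are reading.

```python
# Assumption: PAL timing constant used by common ProTracker CIA replayers.
PAL_CIA_TEMPO_CONSTANT = 1773447

def cia_timer_for_bpm(bpm: int) -> tuple[int, int, int]:
    """Return (timer value, low byte, high byte) for a given module tempo."""
    if not 32 <= bpm <= 255:
        raise ValueError("CIA tempo is normally in the range 32..255")
    value = PAL_CIA_TEMPO_CONSTANT // bpm
    return value, value & 0xFF, value >> 8

value, lo, hi = cia_timer_for_bpm(125)
print(f"{value} -> low=${lo:02X} high=${hi:02X}")
```

Going the other way, a 16-bit value loaded into Timer A that lands near 1773447 / N for a plausible N is good evidence of a tempo-driven replayer.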

Identification: Detecting Hand-Written Assembly

Warning

Skip this section if you already know the binary is hand-written. The identification rules are covered in m68k_codegen_patterns.md and compiler_fingerprints.md.

Heuristics That Suggest Assembly

  • No LINK or SUBQ.L #N,SP in the entire binary
  • No JSR LVO(A6) patterns — library calls are JSR absolute_address or JMP (table, Dn.W)
  • Hardware register constants ($DFF000-$DFF200, $BFE000-$BFEF01) appear as immediates
  • MOVEM.L used aggressively for per-routine save/restore with non-standard register sets
  • RTE without preceding MOVE to SR — custom interrupt handling
  • ORI #$0700, SR / ANDI #$F8FF, SR — direct interrupt level manipulation
  • JMP (A0) or JSR (A0) with dynamically computed target — jump tables, state machines
  • LEA offset(PC), An used for data tables rather than MOVE.L #absolute_address, An — PC-relative addressing for position-independent data
  • STOP #$2xxx — wait for interrupt without OS involvement
  • MOVE USP, An / MOVE An, USP — user stack pointer manipulation, almost never generated by compilers
  • MOVEC (68010+) to/from VBR, SFC, DFC — supervisor-level register access
  • RESET instruction — rarely used outside hand-written hardware init code

Binary Scoring: Assembly Confidence
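
One way to turn the heuristics above into a rough number is to scan the code for a handful of opcode words and sum weights. The following is only a sketch: the signal list is a small subset of the heuristics, the weights are invented for illustration, and a word-aligned scan will also match data that happens to look like an opcode.

```python
import struct

# Hypothetical weights; tune them against binaries whose origin you already know.
SIGNALS = {
    "link_a5": (0x4E55, -3),  # LINK A5,#n: typical compiler prologue
    "rte":     (0x4E73, +2),  # RTE: custom interrupt handler
    "stop":    (0x4E72, +2),  # STOP #imm: waiting for an interrupt without the OS
    "reset":   (0x4E70, +3),  # RESET: hand-written hardware init
}

def assembly_score(code: bytes) -> int:
    """Positive scores suggest hand-written assembly, negative suggest a compiler."""
    n = len(code) // 2
    words = struct.unpack(f">{n}H", code[: n * 2])
    score = sum(weight for opcode, weight in SIGNALS.values() if opcode in words)
    # Absolute custom-chip addresses appear as the longword $00DFFxxx.
    for hi, lo in zip(words, words[1:]):
        if hi == 0x00DF and (lo & 0xF000) == 0xF000:
            score += 2
            break
    return score

compiler_like = bytes.fromhex("4E55FFF84E5D4E75")    # LINK A5 / UNLK A5 / RTS
asm_like = bytes.fromhex("33FC7FFF00DFF09A4E73")     # MOVE.W #$7FFF,$DFF09A / RTE
print(assembly_score(compiler_like), assembly_score(asm_like))
```

Treat the result as a triage hint only; a mixed C+asm binary can legitimately trigger signals in both directions.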


Decision Guide: Choosing Your Approach

graph TD
    START["Binary loaded in IDA/Ghidra"]
    HAS_SYMBOLS{"Has HUNK_SYMBOL<br/>debug info?"}
    HAS_OS_CALLS{"Uses OS library<br/>calls?"}
    HAS_STRINGS{"Has readable<br/>strings?"}
    HAS_CRUNCHER{"Packed / crunched<br/>(PowerPacker, Imploder)?"}
    PURE_ASM["Pure assembly methodology"]
    
    START --> HAS_CRUNCHER
    HAS_CRUNCHER -->|"Yes"| UNPACK["Unpack first<br/>→ see exe_crunchers.md"]
    HAS_CRUNCHER -->|"No"| HAS_SYMBOLS
    HAS_SYMBOLS -->|"Yes"| NAMED["Name functions from symbols<br/>then trace logic"]
    HAS_SYMBOLS -->|"No"| HAS_OS_CALLS
    HAS_OS_CALLS -->|"Yes"| OS_ANCHOR["Anchor on library calls<br/>→ identify callers by xref"]
    HAS_OS_CALLS -->|"No"| HAS_STRINGS
    HAS_STRINGS -->|"Yes"| STR_ANCHOR["Anchor on string xrefs<br/>→ trace outward"]
    HAS_STRINGS -->|"No"| PURE_ASM

When to Use Pure Assembly Methodology vs When to Fall Back

| Scenario | Recommended Approach |
|---|---|
| Binary has zero library calls, heavy custom registers | Pure assembly methodology (this article) |
| Binary has some library calls mixed with hardware banging | Hybrid: anchor on library xrefs first, then pure asm for hardware sections |
| Binary is packed/crunched | Unpack first, then re-evaluate |
| Binary has HUNK_SYMBOL debug info | Standard RE workflow with named functions |
| Binary is a ROM module (Kickstart) | ROM-specific workflow, using the entry points found by exec's resident tag scan |

Methodology

Phase 1: Triage

  1. Dump hunk structure: hunkinfo shows CODE/DATA/BSS layout and relocation entries. Raw binaries (no HUNK header) skip directly to step 7.
  2. Scan for hardware registers: grep for $DFF, $BFE, $BFD patterns. A binary that touches $DFF000-$DFF1FE directly is almost certainly hand-written or a game bypassing the OS.
  3. Find the entry point: a HUNK executable starts at the first byte of its first hunk; a ROM module is located through its resident tag (RT_MATCHWORD, $4AFC); a bootblock's code starts at offset $0C, after the 12-byte header (disk type, checksum, root block pointer), and runs from whatever buffer the ROM allocated for it rather than from a fixed address.
  4. Identify interrupt vectors: look for writes to absolute addresses $64-$7C, the Level 1 to Level 7 autovectors ($60 is the spurious interrupt vector). Hand-written binaries often overwrite them.
  5. Detect cruncher/packer: Scan for known decrunch stub signatures:
    | Cruncher | Signature Bytes (at or near start) | Notes |
    |---|---|---|
    | PowerPacker | $42 followed by MOVE.L/LEA pattern | Uses powerpacker.library; header contains original size |
    | Imploder | $49 (often) | ATN!Imploder by Animators Of Death; smaller header than PowerPacker |
    | Shrinkler | Context-mixing LZ; no fixed magic | Very high compression ratio; decrunch is slow (minutes on 7 MHz) |
    | ByteKiller | BRA.S over data, then MOVEM.L pattern | Simple LZ variant; common in 1988-1990 productions |
    | CrunchMania | CrM! / CrM2 text marker | One of the fastest decrunchers; popular for 4K intros |
    | TetraPack | Multi-part header | Compresses data+relocs separately |
  6. Check for overlay system: Look for HUNK_OVERLAY or custom overlay loader at entry. The overlay manager swaps code segments from disk — the binary on disk is larger than what's in memory at any moment.
  7. Identify non-HUNK binary type:
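    • Bootblock: Exactly 1024 bytes (2 disk blocks), loaded by the Kickstart ROM into an allocated buffer; the address is not fixed, so genuine bootblock code is position-independent until it copies itself elsewhere
    • Absolute-load blob: Loaded to a fixed address (often $C00000, the trapdoor "Slow RAM" region)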
    • ROM module: Has RT_MATCHWORD resident tag; part of Kickstart or expansion ROM
    • Trackmo loader: First sector contains a custom loader, not a bootblock — the loader then reads the rest of the demo from disk
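
The first triage decision (HUNK file, bootblock, or raw blob) can be automated from the first few bytes. The sketch below checks for the HUNK_HEADER longword ($000003F3) and for a bootblock whose 256 longwords sum, with end-around carry, to $FFFFFFFF; the function names and the three-way classification are illustrative assumptions.

```python
import struct

HUNK_HEADER = 0x000003F3

def bootblock_sum(block: bytes) -> int:
    """Sum the 256 big-endian longwords of a bootblock with end-around carry."""
    total = 0
    for (value,) in struct.iter_unpack(">I", block[:1024]):
        total += value
        if total > 0xFFFFFFFF:
            total = (total & 0xFFFFFFFF) + 1
    return total

def classify_image(data: bytes) -> str:
    """Return 'hunk', 'bootblock', or 'raw' for a file or disk image prefix."""
    if len(data) >= 4 and struct.unpack_from(">I", data)[0] == HUNK_HEADER:
        return "hunk"
    if len(data) >= 1024 and data[:3] == b"DOS" and bootblock_sum(data) == 0xFFFFFFFF:
        return "bootblock"
    return "raw"

# Hypothetical empty bootblock: 'DOS\0' header, checksum patched in at offset 4.
block = bytearray(1024)
block[:4] = b"DOS\x00"
struct.pack_into(">I", block, 4, 0xFFFFFFFF - bootblock_sum(bytes(block)))
print(classify_image(bytes(block)))
```

A block that starts with "DOS" but fails the checksum is not bootable by the ROM, which usually means it is data or a custom loader's sector rather than a bootblock.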

Phase 2: Map Control Flow

  • Chase JMP/JSR chains from entry point outward. Mark each reached address. When you stop finding new addresses, the unreachable remainder is potential data or SMC target.
  • Identify jump tables: JMP (A0, Dn.W), or MOVE.W table(PC, Dn.W), D0 followed by JMP table(PC, D0.W). Count table entries by looking at the range of Dn values. IDA needs manual jump table specification for these.
  • Cross-reference data tables: values loaded via LEA table(PC), An. These tables are often copper lists, sprite control words, or audio sample pointers.
  • Detect self-modifying code: Any MOVE/LEA targeting an address within the CODE hunk boundaries is an SMC candidate. Flag and verify with dynamic analysis.
  • Identify interrupt service routines: Trace from vector table addresses. ISRs end with RTE, not RTS. They typically save/restore many registers at entry/exit.
  • Map copper list interactions: COP1LC/COP2LC writes indicate copper list switches. A MOVE.L #new_list, $DFF080 (COP1LC write) takes effect at the next vertical blank, or immediately if the code then strobes COPJMP1 ($DFF088); COP2LC with COPJMP2 ($DFF08A) is how demos switch lists mid-frame.
  • Trace blitter wait loops: BTST #6, $DFF002 / BNE wait is the standard "wait for blitter" pattern. It polls bit 14 of DMACONR (DMAB_BLTDONE, the blitter-busy flag) through the register's high byte. It is often preceded by a dummy read such as TST.B $DFF002, a workaround for a busy-flag quirk on early Agnus revisions.
  • Flag unreachable code: Code between RTS/RTE/JMP that isn't directly branched to — potential data, SMC target, or second-stage code loaded later.
  • Identify Level 3 interrupt chains: Music replayers and blitter queues commonly hook into the vertical blank interrupt (Level 3). The handler dispatches to multiple subscribers — find the dispatch loop to understand the full interrupt architecture.

Phase 3: Reconstruct Calling Conventions

  • Map per-routine register usage: For each identified function, track:
    • Which registers are preserved (saved/restored via MOVEM.L or stack pushes). The MOVEM.L save mask encodes this explicitly.
    • Which registers are destroyed (modified without save). These are the function's scratch/output registers.
    • Which registers hold input parameters. Look for registers used without prior initialization.
    • Which registers hold return values. D0 is conventional even in hand-written code, but not guaranteed.
  • Identify custom ABIs: Some authors consistently use, for example, A2 = data segment base, A3 = copper list cursor, A4 = hardware base ($DFF000), D7 = scratch counter. These conventions are stable across a single author's body of work.
  • Build a register allocation map: Color-coded table of which registers carry which meaning across the program. This is the single most valuable artifact for understanding hand-written asm.
  • Detect authorial fingerprints: Consistent register conventions + coding idioms (e.g., always using MOVEQ #0, Dn over CLR.L Dn) suggest a single author or codebase reuse. This matters for provenance and for predicting conventions in unreversed sections.
  • Watch for USP manipulation: MOVE USP, An / MOVE An, USP is almost never generated by compilers. It indicates the author is using the User Stack Pointer for a second stack (common in context-switching code, coroutines, or task systems).

Phase 4: Reconstruct Data Structures

  • Copper list format: every instruction is a 2-word pair (IR1, IR2) encoding a MOVE, WAIT, or SKIP
  • Sprite control words: SPRxPOS/SPRxCTL word pairs, attached sprite mode detection
  • Blitter minterm lookup tables: Precomputed blitter operation descriptions
  • Audio sample tables: Period/waveform pointer/volume structures for music replayers
  • Custom module formats: Pattern data, sample lists, effect command tables for Protracker/Soundtracker variants
  • Bitmap/bitplane layouts: Interleaved vs linear, planar depth detection from blitter source/dest usage
  • Custom BSS-like allocations: Large zeroed regions used as frame buffers, audio buffers, or look-up tables

Phase 5: Hardware Interaction Mapping

For each custom chip register touched, document:

  • Which register (address)
  • From where (code location)
  • In what sequence (interaction with other register writes)
  • Purpose (deduced from context: blitter setup, copper list switch, audio start, sprite positioning)

Build a hardware register access matrix:

| Register | Writes From | Reads From | Deduced Purpose |
|---|---|---|---|
| $DFF040 (BLTCON0) | $01234, $05678 | | Blitter operation setup |
| $DFF096 (DMACON) / $DFF002 (DMACONR) | $00123 | $04567 | DMA channel enable/disable |
| ... | ... | ... | ... |
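
Such a matrix is easy to build mechanically once you have a list of accesses exported from your disassembler. The sketch below assumes a hypothetical input format of (code address, register address, 'r' or 'w') tuples and a small hand-made name table; neither comes from a specific tool.

```python
from collections import defaultdict

# Illustrative subset of custom chip register names.
REG_NAMES = {0xDFF002: "DMACONR", 0xDFF040: "BLTCON0",
             0xDFF058: "BLTSIZE", 0xDFF096: "DMACON"}

def build_access_matrix(accesses):
    """Group (code_addr, reg_addr, kind) tuples by register, kind is 'r' or 'w'."""
    matrix = defaultdict(lambda: {"w": [], "r": []})
    for code_addr, reg_addr, kind in accesses:
        matrix[reg_addr][kind].append(code_addr)
    return {reg: {k: sorted(v) for k, v in row.items()}
            for reg, row in sorted(matrix.items())}

def format_matrix(matrix) -> list[str]:
    """Render one text line per register: name, writers, readers."""
    lines = []
    for reg, row in matrix.items():
        name = REG_NAMES.get(reg, "?")
        writes = ", ".join(f"${a:05X}" for a in row["w"]) or "-"
        reads = ", ".join(f"${a:05X}" for a in row["r"]) or "-"
        lines.append(f"${reg:06X} ({name}) | W: {writes} | R: {reads}")
    return lines

# Hypothetical access list.
accesses = [(0x5678, 0xDFF040, "w"), (0x1234, 0xDFF040, "w"), (0x4567, 0xDFF002, "r")]
for line in format_matrix(build_access_matrix(accesses)):
    print(line)
```

The "Deduced Purpose" column stays manual: it is the conclusion you draw from the pattern of writers and readers.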

Phase 6: Annotate

  • Rename functions: Descriptive names based on deduced purpose (vbl_irq_handler, blitter_queue_submit, copper_list_build)
  • Add comments: Document register conventions at function entry, magic constants, hardware register purposes
  • Create struct types: For custom data structures discovered in Phase 4
  • Mark non-code regions: Force IDA/Ghidra to treat copper lists, sprite data, audio samples as data, not code
  • Cross-reference hardware registers: Create named constants for all $DFFxxx/$BFExxx addresses in the database
  • Build a call graph: Mermaid diagram of the full control flow for documentation

Phase 7: Dynamic Verification

  • Breakpoint on custom chip registers: Verify that register writes occur at expected times
  • Watchpoint on memory buffers: Confirm copper list format, audio sample layout
  • Trace mode: Follow execution through a single frame to verify control flow reconstruction
  • Modify-and-test: Patch the binary and run it — if it breaks, your understanding was incomplete
  • Compare static vs dynamic: Does the code path you predicted match what actually executes?

Tool-Specific Workflows

IDA Pro

Ghidra

FS-UAE Debugger

Command-Line Pre-Analysis Pipeline


Best Practices

  1. Never assume the ABI — document the actual calling convention before tracing callers
  2. Start from the entry point and work outward — don't try to understand everything at once
  3. Identify hardware register usage before control flow — knowing which chips are used narrows the purpose
  4. Treat every MOVE to an absolute address as a potential self-modifying code write — until proven otherwise
  5. Build a mermaid diagram of the control flow — it reveals dead code, missing connections, and loop structures
  6. Cross-reference relocation entries with code — relocs tell you which addresses matter
  7. Don't trust auto-analysis on mixed code/data sections — manually define code/data boundaries
  8. Run the binary in an emulator — some behaviors (self-modifying code paths, copper effects) are invisible in static analysis
  9. Look for known signatures first — Protracker replayers, decrunch stubs, common macro libraries leave distinctive patterns
  10. Document your register map as you work — it prevents costly re-analysis when you realize A3 was actually a struct pointer

Antipatterns

1. The Compiler Assumption

Wrong: Assuming A6 holds a library base, D0/D1 are scratch, and A0/A1 are pointer temps.

Why it fails: Hand-written code may use A6 as a general-purpose data register, D6 as a frame pointer, or any other non-standard assignment. The author may have declared their own calling convention documented nowhere.

2. The Prologue Scanner

Wrong: Scanning for LINK A5 or SUBQ.L #N,SP to find function boundaries.

Why it fails: Hand-written assembly may have no standard function entry/exit markers. A routine might start with MOVEM.L, a label, or just fall through from the previous block.

3. The String Hop

Wrong: Assuming LEA _string(PC), A0 means A0 points to a C string.

Why it fails: Hand-written code may use LEA to point to bytecode tables, sprite data, copper lists, or packed structures. The "string" might be a custom encoding.

4. The Register Reuse Confusion

Wrong: Assuming a register used in one context retains the same meaning throughout the program.

Why it fails: Hand-written asm aggressively reuses registers. The same D0 might be a loop counter in one block, an audio sample value in the next, and a scratch temporary in a third — all within 50 instructions. You must re-derive register meaning at each basic block.

5. The Disassembly Loop Trap

Wrong: Letting IDA's auto-analysis recursively disassemble from every possible entry point.

Why it fails: Mixed code/data sections cause IDA to decode data as instructions, creating phantom functions from copper lists or audio samples. This pollutes the symbol table with nonsense and obscures real control flow.

6. The Constant-as-Code Mistake

Wrong: Treating jump table offsets, copper list data, or sprite control words as instructions.

Why it fails: IDA/Ghidra cannot tell that the word $0180 is the first half of a copper MOVE to COLOR00 rather than the 68000 instruction BCLR D0, D0, which has the same encoding. Without manual intervention, hardware data tables get disassembled into garbage.

7. The One-Pass Delusion

Wrong: Attempting linear top-to-bottom analysis and expecting to understand everything on the first pass.

Why it fails: Hand-written asm often uses forward references, self-modifying code patched by an earlier init routine, or data tables that only make sense after you understand the code that consumes them. Reverse engineering is inherently iterative.

8. The MOVEM Black Box

Wrong: Treating MOVEM.L D0-D7/A0-A6, -(SP) / MOVEM.L (SP)+, D0-D7/A0-A6 as opaque blocks.

Why it fails: Understanding which registers are saved and restored tells you the function's register contract. A routine that saves D5-D7/A4-A5 preserves those across its call — they likely carry important state (frame counter, hardware base pointer, data cursor).
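
Reading the contract out of a MOVEM is a matter of decoding the 16-bit register mask that follows the opcode word. The one trap is that the bit order is reversed for the predecrement form. A minimal sketch, with an illustrative function name:

```python
REGS = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]

def decode_movem_mask(mask: int, predecrement: bool) -> list[str]:
    """Decode a MOVEM register mask into register names.

    For -(An) destinations bit 15 is D0 and bit 0 is A7;
    for every other addressing mode bit 0 is D0 and bit 15 is A7.
    """
    names = []
    for index, name in enumerate(REGS):
        bit = 15 - index if predecrement else index
        if mask & (1 << bit):
            names.append(name)
    return names

# MOVEM.L D0-D7/A0-A6,-(SP) assembles to 48E7 FFFE; the restore is 4CDF 7FFF.
print(decode_movem_mask(0xFFFE, predecrement=True))
print(decode_movem_mask(0x070C, predecrement=True))   # D5-D7/A4-A5
```

Comparing the save mask at entry with the restore mask at exit also catches routines that deliberately return a value in a register they "saved".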


Pitfalls

1. Assuming the OS Is Present

; This works on a running system:
MOVE.L  4.W, A6            ; SysBase
JSR     -552(A6)           ; OpenLibrary (LVO -552)
; But once a demo has taken over the machine, or in raw code started
; by a custom loader, $4.W cannot be trusted and libraries may be gone.
; The code might be:
MOVE.L  #$DFF000, A5       ; custom chip base, not SysBase
JSR     _custom_init(PC)   ; custom initialization

2. Misreading Jump Tables

Hand-written jump tables frequently use PC-relative indirect jumps with custom offsets that IDA doesn't auto-resolve.

3. Self-Modifying Code Deception

; The code you see is NOT what executes:
MOVE.W  #$0123, next_insn+2         ; overwrite the immediate operand of a later instruction
; ... other code ...
next_insn:
CMPI.W  #$0000, D0                  ; compares against $0123 at runtime

4. Copper List Misidentification

Copper instructions are 2-word pairs that a disassembler will happily decode as meaningless 68000 instructions:

; A copper list at $20000 decoded as instructions by IDA:
; DC.W $0180, $0000  → BCLR D0, D0, then an ORI.B that swallows the next word (garbage!)
; DC.W $0182, $0FFF  → consumed as operand data / invalid effective address
; DC.W $FFFF, $FFFE  → Line-F (unimplemented) opcodes
;
; Correct interpretation:
; $0180, $0000 = MOVE #$0000 to COLOR00 ($DFF180): background black
; $0182, $0FFF = MOVE #$0FFF to COLOR01 ($DFF182): white
; $FFFF, $FFFE = END of copper list (WAIT for a beam position that never occurs)

Detection methodology:

  1. COP1LC/COP2LC writes give you the copper list address — start your data definition there
  2. Copper instructions come in pairs of 16-bit words. IR1 (first word) encodes the operation or register address; IR2 (second word) is the data or WAIT position.
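  3. WAIT: IR1 bit 0 = 1 and IR2 bit 0 = 0 (IR2 bit 0 = 1 makes it a SKIP). IR1 bits 15-8 hold VP and bits 7-1 hold HP; IR2 bit 15 is the blitter-finished-disable flag, bits 14-8 the VP compare mask, and bits 7-1 the HP compare mask.
  4. MOVE: IR1 bit 0 = 0. IR1 bits 8-1 give the register offset from $DFF000, IR2 is the value to write.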
  5. A $FFFF, $FFFE pair terminates the list.
  6. Mark the entire copper list address range as data, not code. Create an array of 4-byte copper instruction structs in IDA/Ghidra.
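
The decoding rules above fit in a few lines of Python, which is handy for dumping a suspected copper list before committing to data definitions in the database. This is a minimal sketch: the function name and output format are assumptions, and compare masks are ignored for brevity.

```python
def decode_copper(words):
    """Decode a sequence of 16-bit words as copper instructions."""
    out = []
    for ir1, ir2 in zip(words[0::2], words[1::2]):
        if (ir1, ir2) == (0xFFFF, 0xFFFE):
            out.append("END")
            break
        if ir1 & 1 == 0:
            # MOVE: IR1 bits 8-1 are the register offset from $DFF000.
            out.append(f"MOVE #${ir2:04X}, ${0xDFF000 + (ir1 & 0x01FE):06X}")
        else:
            # WAIT or SKIP, distinguished by bit 0 of IR2.
            kind = "SKIP" if ir2 & 1 else "WAIT"
            out.append(f"{kind} VP=${ir1 >> 8:02X} HP=${ir1 & 0xFE:02X}")
    return out

for line in decode_copper([0x0180, 0x0000, 0x0182, 0x0FFF,
                           0x2C01, 0xFFFE, 0xFFFF, 0xFFFE]):
    print(line)
```

If the decoded output is a plausible run of MOVEs to color or bitplane registers punctuated by WAITs with increasing VP, the address range is almost certainly a copper list.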

5. CIA Timer Code Confusion

CIA register access ($BFE001-$BFEF01 for CIAA, $BFD000-$BFDF00 for CIAB) looks like any other memory access, but the TOD clock read sequence and timer control register patterns are distinctive:

; CIA-A Timer A setup (often used for timing in games/demos):
MOVE.B  #$7F, $BFED01      ; CIA-A ICR: mask (disable) all interrupt sources
MOVE.B  #$81, $BFED01      ; CIA-A ICR: enable Timer A interrupt
MOVE.B  #low_byte, $BFE401 ; CIA-A Timer A low byte
MOVE.B  #high_byte, $BFE501 ; CIA-A Timer A high byte

; CIA-B Timer A/B setup (used by Protracker replayers!):
MOVE.B  #$7F, $BFDD00      ; CIA-B ICR: mask all interrupt sources
MOVE.B  #$81, $BFDD00      ; CIA-B ICR: enable Timer A
MOVE.B  #lo, $BFD400       ; CIA-B Timer A low byte

; Common mistake:
; MOVE.B $BFEC01, D0 → reading the CIAA SDR (serial data register). Despite
; the name this is not the RS-232 port (that is Paula's SERDATR, $DFF018):
; on the Amiga the CIAA SDR carries keyboard data.
; The parallel port is CIAA PRB at $BFE101.

Key CIA registers for RE identification:

| Register | Address | Purpose |
|---|---|---|
| CIAA ICR | $BFED01 | Interrupt Control Register: enables/disables CIA-A interrupts |
| CIAA Timer A Lo | $BFE401 | Timer A low byte |
| CIAA Timer A Hi | $BFE501 | Timer A high byte |
| CIAB ICR | $BFDD00 | Interrupt Control Register: enables CIA-B interrupts (used by Protracker!) |
| CIAB Timer A Lo | $BFD400 | Timer A low byte (Protracker tempo control) |
| CIAB Timer A Hi | $BFD500 | Timer A high byte |

6. Blitter Queue Confusion

Blitter register writes (BLTCON0, BLTSIZE, etc.) look like ordinary memory stores to IDA. Without understanding that these are I/O registers, the disassembly shows meaningless MOVE.W D0, abs_addr sequences:

; This looks like garbage writes to random addresses:
MOVE.W  #$09F0, $DFF040     ; BLTCON0 = use A and D channels, minterm=$F0 (D = A)
MOVE.W  #$0000, $DFF042     ; BLTCON1 = no fill, no line mode
MOVE.W  #$FFFF, $DFF044     ; BLTAFWM = first word mask (all bits)
MOVE.W  #$FFFF, $DFF046     ; BLTALWM = last word mask (all bits)
MOVE.L  #src, $DFF050       ; BLTAPT = source A pointer
MOVE.L  #dst, $DFF054       ; BLTDPT = destination D pointer
MOVE.W  #0, $DFF064         ; BLTAMOD = source A modulo (0 = linear)
MOVE.W  #0, $DFF066         ; BLTDMOD = dest D modulo
MOVE.W  #(h<<6)|w, $DFF058  ; BLTSIZE = start blit! (writing this triggers DMA)

; But this is a standard blitter rectangle copy. The setup registers can be
; written in any order, but BLTSIZE is always LAST: writing it starts the blit.

How to identify a blitter operation:

  1. The sequence always ends with a write to $DFF058 (BLTSIZE) — this is the trigger
  2. BLTCON0 ($DFF040) encodes the A shift, the active channels, and the minterm (bits 15-12 = A shift, bit 11 = USEA, bit 10 = USEB, bit 9 = USEC, bit 8 = USED, bits 7-0 = minterm)
  3. Pointer registers ($DFF048-$DFF054) hold source/destination addresses; these are your key to understanding what data is being moved
  4. The blit size (h<<6)|w in BLTSIZE: height in upper 10 bits, width in lower 6 bits (width is in words, 0 = 64 words; a height of 0 means 1024 lines)
  5. Blitter wait: BTST #6, $DFF002 tests bit 14 of DMACONR (DMAB_BLTDONE, the blitter-busy flag) through the register's high byte and polls until the blitter has finished
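
Decoding the two most informative blitter registers by hand gets tedious, so a small helper pays off quickly. The sketch below follows the standard BLTCON0 and BLTSIZE bit layouts; the function names and the dictionary keys are illustrative.

```python
def decode_bltcon0(value: int) -> dict:
    """Split BLTCON0 into A shift, enabled channels, and minterm."""
    channel_bits = (("A", 11), ("B", 10), ("C", 9), ("D", 8))
    channels = "".join(name for name, bit in channel_bits if value & (1 << bit))
    return {"a_shift": value >> 12, "channels": channels, "minterm": value & 0xFF}

def decode_bltsize(value: int) -> tuple[int, int]:
    """Return (height in lines, width in words); a zero field means the maximum."""
    height = (value >> 6) & 0x3FF or 1024
    width = value & 0x3F or 64
    return height, width

print(decode_bltcon0(0x09F0))            # plain A-to-D copy
print(decode_bltsize((16 << 6) | 2))     # 16 lines, 2 words wide
```

A minterm of $F0 with channels A and D is a straight copy, and $CA with A, B, C, and D is the classic masked "cookie-cut" blit, so the decoded values often name the routine for you.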

7. MOVEM Register Tracking Across Long Spans

8. Code Embedded in Interrupt Vector Table

9. Dual-Playfield Register Set Confusion

10. Stack-Based State Machines

Some hand-written code uses the stack as a state machine — pushing return addresses that represent state transitions, using RTS as a computed goto:

; Instead of a switch statement, the author pushes state transition addresses:
    MOVE.L  #STATE_IDLE, -(SP)    ; push initial state
    ...
STATE_DISPATCH:
    RTS                           ; "return" to the state on top of stack

STATE_IDLE:
    ; ... handle idle ...
    MOVE.L  #STATE_PLAYING, -(SP) ; push next state
    BRA     STATE_DISPATCH

STATE_PLAYING:
    ; ... handle playing ...
    MOVE.L  #STATE_PAUSED, -(SP)  ; push next state
    BRA     STATE_DISPATCH

This pattern breaks all standard call/return analysis because RTS doesn't return to a caller — it jumps to the next state. IDA/Ghidra see RTS as a function exit and stop disassembling.

Detection: Look for MOVE.L #addr, -(SP) or PEA addr(PC) (push effective address) followed by RTS (or a branch to an RTS). These are state pushes, not function call setups.
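
A crude first pass at this detection can be done on raw bytes, before any disassembly. The sketch below looks only for the tightest form of the idiom, MOVE.L #imm32,-(SP) (opcode $2F3C) immediately followed by RTS ($4E75); it will miss pushes that reach the RTS through a branch, as in the example above, and the function name is an assumption.

```python
import struct

PUSH_IMM_LONG = 0x2F3C   # MOVE.L #imm32,-(SP)
RTS = 0x4E75

def find_push_rts(code: bytes) -> list[tuple[int, int]]:
    """Return (offset, pushed target) for each push-immediate directly followed by RTS."""
    hits = []
    for off in range(0, len(code) - 7, 2):   # instructions are word-aligned
        opcode, target, following = struct.unpack_from(">HIH", code, off)
        if opcode == PUSH_IMM_LONG and following == RTS:
            hits.append((off, target))
    return hits

# Hypothetical snippet: NOP / MOVE.L #$00001234,-(SP) / RTS
snippet = bytes.fromhex("4E71" "2F3C00001234" "4E75")
print(find_push_rts(snippet))
```

Each hit gives you a jump target that ordinary call/return analysis would not have discovered, so feed the targets back into the control-flow mapping phase.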

11. Absolute Address Dependencies

Code that assumes a fixed load address (common in non-HUNK binaries) will break if relocated. For HUNK binaries, relocation entries tell you which absolute addresses must be patched at load time. Non-HUNK binaries lack relocation metadata entirely.

; Absolute dependency example — works only at $C00000:
    LEA     $C01000, A0             ; data at fixed offset from load address
    JSR     $C00500                 ; subroutine at fixed address within binary

; For a HUNK binary, these would be:
    LEA     _data(PC), A0           ; PC-relative (no relocation needed)
    JSR     _subroutine(PC)         ; PC-relative

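Critical: A bootblock is loaded into whatever buffer the ROM allocated, so its initial code is position-independent; absolute JMP/JSR targets inside it normally refer to a copy the code has made of itself at a fixed address. Load the image at that address for analysis, or patch every absolute reference if you relocate it.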


Use-Case Cookbook

Pattern 1: Finding the Main Loop in a Demo

Pattern 2: Identifying a Custom Interrupt Handler

Pattern 3: Reconstructing a Jump Table

Pattern 4: Detecting Self-Modifying Code with IDAPython

Pattern 5: Identifying a Protracker Replay Routine

The most commonly found audio subsystem in Amiga binaries. Here's the full identification workflow:

  1. Find the CIA interrupt vector write: Search for MOVE.L #xxx, $78.W, which installs the Level 6 (CIA-B timer) interrupt handler used by Protracker for tempo.
  2. Identify the CIA-B timer setup: MOVE.B #$7F, $BFDD00 / MOVE.B #$81, $BFDD00 — this configures CIA-B to generate timer interrupts.
  3. Trace to the interrupt handler: The handler saves ALL registers (MOVEM.L D0-D7/A0-A6, -(SP)), calls the replayer tick function, then restores all and does RTE.
  4. Find the audio register writes: Look for writes to $DFF0A0-$DFF0D0 (AUDxLCH/LCL/LEN/PER/VOL). The pattern MOVE.L sample_ptr, $DFF0A0 / MOVE.W period, $DFF0A6 / MOVE.W vol, $DFF0A8 is the per-channel audio update.
  5. Identify effect command dispatch: A MOVE.W effect_cmd, D0 / ANDI.W #$0F, D0 / ADD.W D0, D0 / JMP (effect_table, D0.W) pattern dispatches to arpeggio, portamento, vibrato, etc. handlers.
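  6. Map the pattern data layout: The replayer reads pattern data via sequential MOVE.B (A0)+, so work out how rows and channels are laid out. Standard format: 4 bytes per note cell, holding a 12-bit period, an 8-bit sample number split across the top nibbles of bytes 0 and 2, a 4-bit effect command, and an 8-bit effect parameter.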
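
To check that a block of data really is pattern data, decode a few cells and see whether the values are plausible. The sketch below assumes the standard 4-byte ProTracker note cell; the function name and dictionary keys are illustrative.

```python
def decode_note(cell: bytes) -> dict:
    """Decode one 4-byte ProTracker pattern cell."""
    b0, b1, b2, b3 = cell[:4]
    return {
        "sample": (b0 & 0xF0) | (b2 >> 4),    # high nibble in byte 0, low nibble in byte 2
        "period": ((b0 & 0x0F) << 8) | b1,    # 12-bit Paula period
        "effect": b2 & 0x0F,                  # effect command
        "param": b3,                          # effect parameter
    }

# Hypothetical cell: sample $12, period 214, effect $C (set volume) with parameter $20.
print(decode_note(bytes([0x10, 0xD6, 0x2C, 0x20])))
```

Real pattern data decodes to sample numbers of 31 or less and periods drawn from a small fixed set; wildly varying values suggest the region is something else.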

IDA Python script fragment to auto-detect Protracker replayers:

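# Search for the Level 6 vector installation pattern:
# MOVE.L #handler, $78.W = 21FC xxxx xxxx 0078
# Assumes IDA 7.x IDAPython; newer releases may need a different search call.
import idc
import ida_search

ea = ida_search.find_binary(0, idc.BADADDR, "21 FC ?? ?? ?? ?? 00 78", 16, ida_search.SEARCH_DOWN)
if ea != idc.BADADDR:
    handler = idc.get_wide_dword(ea + 2)  # 32-bit immediate operand = handler address
    print(f"Found Level 6 vector install at ${ea:08X} → handler ${handler:08X}")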

Pattern 6: Reversing a Bootblock Virus

Bootblock viruses are the ideal entry point for learning Amiga RE — they're small (1024 bytes), self-contained, and exercise key system mechanisms:

Lamer Exterminator (October 1989)

  • Size: 1024 bytes (exactly 2 disk blocks)
  • Residence: Installs itself in memory, hooks system vectors
  • Infection vector: Writes itself to any write-enabled disk's bootblock during disk access
  • Damage routine: After activation, overwrites disk blocks with the string "LAMER!" repeated 84 times, which trashes the disk
  • CoolCapture: Uses the CoolCapture vector for post-reset survival — after a warm reset, the virus re-activates from the captured state
  • Detection text: Sometimes leaves identifiable strings in the bootblock

SADDAM Bootblock Virus

  • Size: 1024 bytes
  • Residence: Copies itself to $7F000 in memory (just below the 512KB Chip RAM boundary)
  • Interrupt hooking: Hooks Level 3 interrupt (Vertical Blank/Copper/Blitter) via the interrupt vector table
  • Infection trigger: First "read Rootblock" command after a reset — this infects any disk accessed after boot
  • Stealth: Writes the original bootblock back to disk when the rootblock is read (hiding its presence)
  • System modification: Clears CoolCapture, KickTagPtr, and KickCheckSum — disables the system's ability to detect bootblock changes
  • Anti-detection text: Contains the misleading string "A2000 MB Memory Controller V2" to disguise itself as a hardware ROM
  • Damage trigger: After ~30,000 interrupt calls, crashes the system by showing an alert in a Level 3 interrupt context
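The capture-vector clearing described above can be hunted for statically. The sketch below assumes the code reaches ExecBase through A6 and uses CLR.L d16(A6) (opcode $42AE), which is only one of several possible encodings; a MOVE.L or MOVEQ-based clear would need extra patterns. The field offsets are the ExecBase offsets from exec/execbase.h, and `find_capture_clears` is a hypothetical helper name.

```python
import struct

# ExecBase field offsets, used as d16(A6) displacements when A6 = ExecBase
EXEC_FIELDS = {
    0x002A: "ColdCapture",
    0x002E: "CoolCapture",
    0x0032: "WarmCapture",
    0x0222: "KickMemPtr",
    0x0226: "KickTagPtr",
    0x022A: "KickCheckSum",
}
CLR_L_D16_A6 = 0x42AE  # opcode word of CLR.L d16(A6)


def find_capture_clears(code: bytes):
    """Return (offset, field_name) for each CLR.L of a reset-residency field."""
    hits = []
    # 68000 instructions are word-aligned, so step two bytes at a time
    for off in range(0, len(code) - 3, 2):
        opcode, disp = struct.unpack_from(">HH", code, off)
        if opcode == CLR_L_D16_A6 and disp in EXEC_FIELDS:
            hits.append((off, EXEC_FIELDS[disp]))
    return hits


# NOP, three clears, RTS
sample = bytes.fromhex("4E71" "42AE002E" "42AE0226" "42AE022A" "4E75")
for off, name in find_capture_clears(sample):
    print(f"+${off:03X}: CLR.L {name}(A6)")
```

Because the scan ignores instruction boundaries, a data word that happens to equal $42AE can produce a false hit. Treat the results as leads to confirm in the disassembly.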

Common Virus RE Workflow

  1. Extract the bootblock: The first 1024 bytes of an infected disk (blocks 0–1)
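  2. Determine load address: The boot ROM reads the bootblock into an allocated Chip RAM buffer, so there is no fixed load address ($7C00 is the x86 PC convention); boot code starts at offset 12 and is entered with A6 = ExecBase and A1 = the trackdisk IORequest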
  3. Identify the infection mechanism: Look for DoIO() / SendIO() calls to trackdisk.device for writing back to disk
  4. Find the residency mechanism: CoolCapture, KickTagPtr manipulation, or RAM copy to $7F000 + vector hooking
  5. Trace the trigger condition: What event activates the virus? Timer count, disk access count, specific command?
  6. Document the payload: Does it corrupt data? Display a message? Overwrite bootblocks?
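Step 1 of the workflow can be sketched in a few lines of Python. The sketch assumes a raw ADF-style disk image and the standard bootblock layout: disk type at offset 0, checksum at offset 4, root block pointer at offset 8, code from offset 12. The function names are hypothetical.

```python
BOOTBLOCK_SIZE = 1024


def bootblock_checksum(bb: bytes) -> int:
    """Checksum over 256 big-endian longwords with end-around carry, then inverted.

    The stored checksum field (offset 4) is treated as zero while summing.
    """
    total = 0
    for off in range(0, BOOTBLOCK_SIZE, 4):
        if off == 4:
            continue
        total += int.from_bytes(bb[off:off + 4], "big")
        if total > 0xFFFFFFFF:               # carry out of 32 bits wraps around
            total = (total & 0xFFFFFFFF) + 1
    return ~total & 0xFFFFFFFF


def describe_bootblock(image: bytes) -> dict:
    """Summarise the first 1024 bytes of a disk image."""
    bb = image[:BOOTBLOCK_SIZE]
    if len(bb) < BOOTBLOCK_SIZE:
        raise ValueError("image is shorter than one bootblock")
    stored = int.from_bytes(bb[4:8], "big")
    return {
        "dos_type": bb[:4],                   # e.g. b"DOS\x00" for OFS
        "checksum_ok": stored == bootblock_checksum(bb),
        "has_code": any(bb[12:]),             # non-zero bytes after the header
    }
```

A typical call would be `describe_bootblock(open("infected.adf", "rb").read())`, where the filename is a placeholder. A bootblock with a valid checksum and non-standard code after offset 12 is the candidate for disassembly; a failed checksum means the ROM would not have executed it.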

Pattern 7: Finding the Decrunch Stub in a Packed Demo

The decrunch stub is the gateway to the real binary. Finding and understanding it is prerequisite to all further analysis:

Identification by signature:

| Cruncher | Magic/Pattern | Decrunch Stub Size | Notes |
|---|---|---|---|
| PowerPacker | $42 followed by LEA/MOVE pattern near entry | ~200–300 bytes | Uses powerpacker.library; ppDecrunchBuffer() is the library call |
| Imploder | Entry has MOVE.L D0, -(SP) / LEA xxx(PC), A0 pattern | ~300–400 bytes | ATN!Imploder; slower decompression, better ratio than early PP |
| Shrinkler | Entry starts with context-mixing setup code | ~2KB | Extremely high ratio; decrunch takes minutes on 7 MHz 68000 |
| ByteKiller | Short BRA.S over header data, then MOVEM.L pattern | ~100 bytes | Simple LZ variant; very common in 1988–1991 productions |
| CrunchMania | String "CrM!" or "CrM2" at or near entry | ~150 bytes | Fastest decruncher; popular for 4K intros |

Decrunch strategy:

  1. Identify the stub: The first code that executes after the entry point. It reads packed data and expands it to a destination address.
  2. Let the stub run in an emulator: Set a breakpoint after the decrunch loop completes (look for the JMP or JSR to the unpacked entry point).
  3. Dump the decrunched memory: The real binary is now in RAM. Save it for static analysis.
  4. Optionally: Write an unpacker script — for known formats, run the original cruncher's decruncher against the packed data in a standalone tool.
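When no signature matches, byte entropy is a common way to separate the stub from its payload: 68000 code is fairly repetitive, while packed data looks close to random. The sketch below is a generic heuristic, not a detector for any particular cruncher; the 512-byte window and the 7.0 bits/byte threshold are assumptions to tune per target.

```python
import math
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum(c / n * math.log2(c / n) for c in Counter(chunk).values())


def entropy_profile(data: bytes, window: int = 512):
    """Entropy of each consecutive window as (offset, bits_per_byte)."""
    return [(off, shannon_entropy(data[off:off + window]))
            for off in range(0, len(data), window)]


def likely_packed_start(data: bytes, window: int = 512, threshold: float = 7.0):
    """Offset of the first window at or above the threshold, or None."""
    for off, bits in entropy_profile(data, window):
        if bits >= threshold:
            return off
    return None


# Synthetic check: 512 bytes of NOPs followed by 512 maximally varied bytes
stub = bytes([0x4E, 0x71]) * 256
payload = bytes(range(256)) * 2
print(likely_packed_start(stub + payload))
```

On a real file, the low-entropy region before the reported offset is where the decrunch loop and its final JMP or JSR are most likely to be found, which narrows down where to place the breakpoint in step 2.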

Rob Northen Copylock / Trace Vector Decoder (TVD): A special case that appears like a cruncher but is actually a protection system:

  • Encrypted code is executed one instruction at a time using the 68000 trace exception
  • The trace handler (exception vector 9, stored at address $24) decrypts the next instruction, executes it, then sets the trace bit again
  • This prevents static disassembly — you only see the encrypted bytes and the trace handler, not the real code
  • Detection: look for MOVE #$8000, SR or ORI #$8000, SR (either sets the trace bit, bit 15 of SR) in the entry code, plus a custom handler installed at address $24
  • Solution: Let it execute in FS-UAE with a trace logger, or single-step through and record each decrypted instruction
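The detection rule above can be turned into a simple byte scan. The sketch assumes the protection sets the trace bit with an immediate write to SR (ORI #imm,SR = $007C, MOVE #imm,SR = $46FC) and installs its handler with MOVE.L An,$24.W ($21C8 to $21CF) or MOVE.L #imm,$24.W ($21FC); real Copylock variants may obscure both steps. `find_trace_setup` is a hypothetical helper name.

```python
import struct

TRACE_BIT = 0x8000     # T bit in the 68000 status register
TRACE_VECTOR = 0x0024  # address of exception vector 9 (trace)


def find_trace_setup(code: bytes):
    """Return (offset, kind) for trace-bit writes and trace-vector installs."""
    hits = []
    for off in range(0, len(code) - 3, 2):
        op, ext = struct.unpack_from(">HH", code, off)
        if op in (0x007C, 0x46FC) and ext & TRACE_BIT:
            hits.append((off, "set_trace"))           # ORI/MOVE #imm,SR with T set
        elif (op & 0xFFF8) == 0x21C8 and ext == TRACE_VECTOR:
            hits.append((off, "install_vector"))      # MOVE.L An,$24.W
        elif op == 0x21FC and off + 8 <= len(code):
            # MOVE.L #imm32,(xxx).W: the target address follows the immediate
            (target,) = struct.unpack_from(">H", code, off + 6)
            if target == TRACE_VECTOR:
                hits.append((off, "install_vector"))
    return hits


# LEA $10(PC),A0 / MOVE.L A0,$24.W / ORI #$8000,SR / RTS
entry = bytes.fromhex("41FA0010" "21C80024" "007C8000" "4E75")
print(find_trace_setup(entry))
```

Finding both kinds of hit close together near the entry point is a strong hint that the bytes that follow are encrypted and should be traced dynamically rather than disassembled.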

Pattern 8: Identifying a Custom Memory Allocator

Pattern 9: Reconstructing a Blitter Queue

Pattern 10: Recovering a Sprite Multiplexer

Pattern 11: Extracting a Custom Module Replayer

Pattern 12: Tracing a Trackloader


Real-World Examples

Demo Productions — RE Challenge Highlights

| Production | Group | Year | Key RE Challenge |
|---|---|---|---|
| Arte | Sanity | 1993 | Dense blitter queue system; effects dispatched via jump table with per-effect copper list switching; multi-part architecture with custom module loader |
| Desert Dream | Kefrens | 1993 | Multi-part trackmo with per-part custom loaders; heavy copper wizardry (raster bars, palette splits, sprite multiplexing); custom Protracker variant replayer |
| Nexus 7 | Andromeda | 1994 | 3D vector engine with custom math routines (no FPU); object system with update/render phases; blitter-filled polygons |
| Enigma | Phenomena | 1991 | Modular effect system: each effect is a self-contained subroutine registered in a dispatch table; custom memory management across effect transitions |
| State of the Art | Spaceballs | 1992 | Morphing effects, rotate-zoomer, vector balls; heavy use of precomputed tables; custom blitter queue for compositing |
| Hardwired | Crionics & Silents | 1991 | Early 3D vector engine; spreadsheet-generated sine tables identified by their perfect mathematical precision; copper-chunky display mode simulation |

Games

| Title | Year | Key RE Challenge |
|---|---|---|
| Shadow of the Beast | 1989 | 13-level parallax scrolling using dual playfield + sprite overlays; custom blitter queues for sprite rendering; 512-color still images via palette-split copper lists |
| Turrican II | 1991 | Sprite multiplexer with 20+ sprites on screen; copper-driven status bar split; large state machine for enemy AI |
| Lotus Turbo Challenge 2 | 1991 | Software road rendering with copper sky gradient; blitter-driven car sprite compositing; 2-player split-screen via copper screen split |
| Cannon Fodder | 1993 | OS-friendly (uses graphics.library!) but still hits hardware for scrolling; custom memory allocator for soldier/bullet objects |
| Pinball Dreams | 1992 | Multi-ball physics engine; copper-driven score display; custom module replayer with sound effects mixing into music channels |

Bootblock Intros — The Art of 1024 Bytes

Bootblock intros compress entire demoscene effects into two disk sectors:

  • Red Sector Inc. (RSI) bootblocks: Often include a simple scrolltext, starfield, and a logo — all in 1024 bytes of raw m68k
  • Tristar & Red Sector Inc. (TRSI) bootblocks: More advanced effects (copper bars, vector objects)
  • SADDAM virus: A case study in anti-RE techniques within a bootblock — misleading strings, interrupt hooking, stealth write-back
  • Lamer Exterminator: The most infamous Amiga virus, studied for its CoolCapture survival mechanism

Cross-Platform Comparison

| Platform | Assembly RE Challenge | Amiga Analog |
|---|---|---|
| C64 (6502) | Zero-page usage, self-modifying code, raster interrupts | Custom chip register banging, copper-synced code |
| Atari ST (68000) | Similar CPU but different hardware registers | Amiga custom chips vs ST's simpler shifter/blitter |
| DOS (x86) | Segment:offset addressing, BIOS/DOS interrupt vectors | Amiga library JMP tables, exec interrupt vectors |
| NES (6502) | Tight mapper constraints, PPU timing loops | Similar raster-sync challenges in demos |
| Arcade (68000) | Shared CPU family, custom hardware | Same CPU, different memory maps and custom chips |
| SNES (65816) | Hardware register banging, HDMA (like copper) | Copper list is the direct analog of SNES HDMA channels |
| Genesis/Mega Drive (68000) | Same CPU, VDP register interface, Z80 coprocessor | Closest analog: 68000 + custom video hardware, similar register-banging style |
| Game Boy (Z80-like) | Tight memory (8KB), scanline interrupts, OAM DMA | Similar to bootblock constraints: extreme optimization in tiny space |

Historical Context — Why Hand-Written Assembly Dominated

Before 1990, there were few practical alternatives to assembly for Amiga software that needed to be fast:

| Factor | Detail |
|---|---|
| Compiler quality | Pre-SAS/C 5.x compilers (Lattice C, Manx Aztec C, early SAS/C) generated code 5–20× slower than hand-tuned assembly for graphics/audio |
| Hardware gap | A 7 MHz 68000 with 512 KB Chip RAM had zero margin for inefficient code; games and demos needed every CPU cycle |
| OS overhead | The AmigaOS graphics.library added measurable overhead (layer locking, clipping rectangle checks). Games bypassed it entirely and wrote directly to $DFFxxx registers |
| Demoscene culture | Assembly was the "real" language of the demoscene. Using a compiler was considered lazy; the code itself was the art form |
| Size constraints | Bootblocks (1024 bytes), 4K intros, and single-disk demos imposed hard size limits. Assembly gave precise control over every byte |
| Custom chip intimacy | Copper lists, blitter queues, and audio DMA are fundamentally low-level. High-level languages abstracted away the very features that made Amiga programming distinctive |

The transition: By 1992–1994, faster CPUs (68020+), more RAM, and mature compilers (SAS/C 6.x, GCC 2.x) made C viable for commercial software. The demoscene nevertheless stayed with assembly into the late 1990s, and AGA productions on 68060 accelerators continue to use hand-written assembly today.


Modern Analogies

| Hand-Written Asm Concept | Modern Analogy | Where It Holds / Breaks |
|---|---|---|
| Cycle-counted raster effects | GPU fragment shader dispatch | Holds: per-pixel/per-scanline execution; breaks: asm is imperative timing, shaders are data-parallel |
| Custom blitter queue | GPU command buffer / DMA transfer list | Holds: structured descriptor-based hardware offload; breaks: blitter is in-order, GPUs reorder |
| Hardware register banging | MMIO device drivers in embedded systems | Holds: same concept (memory-mapped I/O); breaks: Amiga registers are video/audio, not peripherals |
| Self-modifying code | JIT compilation (V8, LuaJIT, WASM) | Holds: code generation at runtime; breaks: SMC patches existing code, JIT generates new code |
| Copper list | G-sync / FreeSync adaptive refresh + shader constants per scanline | Holds: timing-sensitive display updates; breaks: copper is a programmable coprocessor, not a protocol |
| Stack-based state machine | Coroutine dispatch / async/await | Holds: non-linear control flow; breaks: stack manipulation vs language-level async |
| Position-independent code | ASLR + PIE executables | Holds: same goal (run anywhere); breaks: asm PIC is manual, modern PIC is linker/loader assisted |

FAQ

Q1: How do I know if a function is an interrupt handler vs a regular subroutine?

Q2: What's the best way to detect self-modifying code?

Q3: How do I handle code that mixes data and instructions?

Q4: How do I tell code from data in a mixed section?

Q5: How do I handle encrypted or obfuscated code?

Q6: How do I deal with copper-synced code?

Q7: What about self-relocating code?

Q8: How do I identify custom chip register usage patterns?

Q9: Why do I see MOVE.W D0, $DFF000 — absolute short addressing to custom registers?

Q10: How do I trace blitter operations without hardware?

Q11: What's the difference between a software interrupt and a hardware interrupt in the code?

Q12: How do I identify which demo group or author wrote this?

Q13: How do I reverse engineer an audio driver / module replayer?

Q14: What do I do when IDA creates 500 phantom functions from copper data?


FPGA / Emulation Impact


References