rslr/amiga-bootcamp

mirror of https://github.com/alfishe/amiga-bootcamp.git synced 2026-06-12 16:16:28 +00:00

Ilia Sharin 0aafceb31e More information. Especially about RE and AREXX automation

2026-04-29 21:42:28 -04:00

Hand-Written Assembly Reverse Engineering — Pure m68k Binaries

Overview

Unlike compiler-generated code with predictable prologues, frame-pointer conventions, and library-call idioms, hand-written 68000 assembly is unconstrained. The author may use any register for any purpose, invent ad-hoc calling conventions, self-modify code, or jump into the middle of instructions. This is the norm for Amiga demos, most pre-1990 games, trackmos, bootblock intros, and hardware-banging utilities — and it demands a fundamentally different reversing strategy than C/C++ binaries.

graph TB
    subgraph "Compiler Binary"
        CPROLOGUE["LINK A5, #-N<br/>MOVEM.L D2-D7,-(SP)"]
        CEXIT["UNLK A5<br/>RTS"]
        CLIB["JSR LVO(A6)<br/>predictable ABI"]
    end
    subgraph "Hand-Written Assembly"
        ACUSTOM["Custom calling convention<br/>any register = any purpose"]
        AJMP["JMP (A0) / JMP $1234.W<br/>opaque control flow"]
        ASMC["Self-modifying code<br/>move.w #imm, (next_insn+2)"]
        AHW["Hardware register banging<br/>MOVE.W D0, $DFF180"]
    end
    CPROLOGUE -.->|"absent"| ACUSTOM
    CEXIT -.->|"unpredictable"| AJMP
    CLIB -.->|"may skip OS"| AHW

Architecture

What Makes Hand-Written Assembly Different

Trait	Compiler Output	Hand-Written Assembly
Function boundaries	`LINK`/`UNLK` or `SUBQ`/`ADDQ` pairs	No universal marker; code may flow into data
Calling convention	Standard ABI (A6=lib base, D0/D1=scratch, A0/A1=scratch)	Author-defined per routine; may repurpose any register
Strings	`dc.b "text",0` with cross-reference chains	May be XOR-obfuscated, embedded mid-instruction, or stored as bitmaps
Library calls	`JSR LVO(A6)` with reloc entries	May call via absolute address, JMP table, or custom trap
Loop structures	`DBcc Dn, label` (counted) or `TST/BEQ` (conditional)	May unroll completely, use address-range compares, or rely on raster timing
Data embedding	Separate `DATA` hunk	Routinely mixed with code; data tables inside branch-not-taken paths

Common Environments

Bootblock intros (1024 bytes, no OS): All registers free, hardware banging only
Trackmos / demos: Often take over the system entirely; disable multitasking; use custom copper lists and blitter queues
Games (pre-1992): Usually bypass graphics.library for speed; hit hardware registers directly
Hardware drivers: Heavy CIA/custom chip register manipulation; interrupt-driven
Virus / bootblock payloads: Deliberately obfuscated; anti-debugging tricks
Cracktros / trainer menus: Small (<4 KB), pre-launch patches to game code, often packed
Trackdisk loaders: Custom DMA-driven disk reading; Rob Northen (RNC) loaders, raw MFM decoders
Non-HUNK binaries: Raw absolute-load code at fixed addresses (e.g., $C00000 for trapdoor Slow RAM)
ROM-resident code: Kickstart modules, expansion ROMs (DiagROM, SCSI controller firmware)
Self-relocating code: Code that copies and patches itself to run at any address

The Assembly Author's Toolkit — Common Patterns Across the Demoscene

These patterns recur across hundreds of hand-written Amiga productions. Recognizing them accelerates function identification and purpose deduction.

Hardware Base Pointer Convention

Most authors dedicate a register to $DFF000 for the entire program lifetime. The choice of register is often an author fingerprint:

Register	Common Users	Notes
A4	Majority of demoscene productions	`LEA $DFF000, A4` at program start; all hardware writes use `MOVE.W Dn, $offset(A4)`
A5	Some demos, trackmos	May conflict with SAS/C A5 frame pointer convention in mixed C+asm code
A6	Rare — conflicts with exec library base	Only used when the program never calls exec and A6 is freed

; The classic demoscene init pattern:
; Save OS registers, take over the machine
MOVE.W  $DFF01C, old_intena    ; save INTENA state
MOVE.W  #$7FFF, $DFF09A        ; disable all interrupts
MOVE.W  #$7FFF, $DFF09C        ; clear all interrupt requests
LEA     $DFF000, A4             ; A4 = custom chip base for entire program
; Now all hardware writes are: MOVE.W D0, $XXX(A4)

Custom Register Offset Tables

Precomputed address tables indexed by effect number dispatch hardware writes without runtime calculation:

; Effect dispatcher via offset table:
effect_dispatch:
    MOVE.W  effect_num(PC), D0
    ADD.W   D0, D0               ; word index
    MOVE.W  effect_offsets(PC, D0.W), D0
    JMP     effect_offsets(PC, D0.W) ; offsets are table-relative, so the table is the base

effect_offsets:
    DC.W    fx_plasma - effect_offsets
    DC.W    fx_rotozoom - effect_offsets
    DC.W    fx_vector3d - effect_offsets
    DC.W    fx_tunnel - effect_offsets
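
When reversing such a dispatcher, the handler addresses can be recovered offline by decoding the table. The following is a minimal sketch, assuming the table stores signed 16-bit big-endian offsets relative to its own start (as in the listing above); `decode_offset_table`, the file offset, and the sample values are hypothetical.

```python
import struct

def decode_offset_table(blob: bytes, table_off: int, count: int, base_addr: int = 0):
    """Return handler addresses for a table of signed 16-bit big-endian
    offsets that are stored relative to the start of the table itself."""
    targets = []
    for i in range(count):
        (rel,) = struct.unpack_from(">h", blob, table_off + 2 * i)
        targets.append(base_addr + table_off + rel)
    return targets

# Hypothetical 4-entry table located at file offset 0x10
blob = bytes(0x10) + struct.pack(">4h", 0x0008, 0x0040, 0x0100, -0x0010)
print([hex(t) for t in decode_offset_table(blob, 0x10, 4)])
```

Pass the load address as `base_addr` to get absolute addresses that can be turned into code labels in the disassembler.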

Cycle-Counted Sequences

Instruction sequences timed to exact 68000 CPU cycles for per-scanline effects:

; Color change per scanline: 68000 cycle budget (ignoring DMA contention)
; MOVE.W (A0)+, $180(A4) = 16 cycles
; DBF D7, loop = 10 cycles (branch taken), 14 cycles (counter expired)
; A full scanline is ~227 color clocks / 454 CPU cycles on PAL
; This loop manages ~17 writes per scanline; unrolled MOVE.W Dn, (An)
; writes (8 cycles each) raise the ceiling to ~56
raster_colors:
    MOVE.W  (A0)+, $180(A4)      ; write next color to COLOR00 ($DFF180); A4 = $DFF000
    DBF     D7, raster_colors    ; 10 cycles when taken
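
The per-scanline budget is plain integer arithmetic. This sketch uses instruction timings from the 68000 timing tables and the 454-cycle PAL line quoted above; it ignores DMA contention, so real figures are lower. The function name is illustrative.

```python
PAL_LINE_CYCLES = 454  # ~227 color clocks x 2 CPU cycles per color clock

def writes_per_scanline(cycles_per_write: int, line_cycles: int = PAL_LINE_CYCLES) -> int:
    """How many register writes fit in one scanline at a given cost each."""
    return line_cycles // cycles_per_write

LOOPED = 16 + 10       # MOVE.W (A0)+,d16(An) plus DBF (branch taken)
UNROLLED_MEM = 12      # MOVE.W (A0)+,(An), no loop overhead
UNROLLED_REG = 8       # MOVE.W Dn,(An), colors preloaded in registers

for label, cost in (("looped", LOOPED), ("unrolled (An)+", UNROLLED_MEM), ("unrolled Dn", UNROLLED_REG)):
    print(f"{label:15s} {cost:2d} cycles -> {writes_per_scanline(cost)} writes/line")
```

A routine whose write count per line sits close to one of these ceilings is almost certainly cycle-counted, so inserting or removing an instruction when patching will shift the effect on screen.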

MOVEM.L Bulk Save/Restore

60-byte register dumps to the stack (all 15 of D0-D7/A0-A6) are the usual pattern in interrupt handlers and complex effects; routines that preserve only an unusual subset of registers use a narrower mask:

; Save D0-D7 and A0-A6 to stack (15 registers × 4 = 60 bytes):
    MOVEM.L D0-D7/A0-A6, -(SP)
    ; ... body of interrupt handler or complex effect ...
    MOVEM.L (SP)+, D0-D7/A0-A6
    RTE

Hand-Optimized Idioms That Confuse Disassemblers

Idiom	What It Does	Disassembly Trap
`ADD.W Dn, Dn`	`ASL.W #1, Dn` (multiply by 2)	IDA shows `ADD.W` — the shift intent is invisible
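`SUB.W Dn, Dn`	Clears the low word of Dn (same as `CLR.W Dn`); only `SUB.L Dn, Dn` matches `MOVEQ #0, Dn`	The upper word survives, which is easy to miss; the choice also reveals author style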
`OR.B Dn, Dn` / `Scc` chain	Compare Dn to zero, then set conditionally	Disassembler shows raw ops, not intent
`MOVE SR, Dn`	Save CCR across branches	Used instead of recomputing flags; rare in compiler output. Privileged on 68010+, so it traps in user mode on accelerated machines
`SWAP Dn` / `MOVE.W Dn, ...`	Access upper word of 32-bit register	Common in 16-bit coordinate manipulation
`EXT.L Dn`	Sign-extend word to long	Indicates 16-bit signed value widening to 32-bit
`MOVEQ #0, Dn` over `CLR.L Dn`	Both clear all 32 bits of Dn and both are 2 bytes, but MOVEQ takes 4 cycles versus 6 for CLR.L on a 68000	Same effect in the disassembly; a consistent preference is an author-style marker

Control Flow Archetypes

Archetype	Signature Pattern	Typical In
State machine via jump table	`MOVE.W state(PC), D0` / `ADD.W D0, D0` / `MOVE.W jt(PC, D0.W), D0` / `JMP jt(PC, D0.W)`	Game AI, effect sequencers, menu systems
VBlank-driven frame loop	`MOVE.L $6C.W, old_vbl` / `MOVE.L #my_vbl, $6C.W` / main loop waits on flag set by VBlank	Demos, games, any framed application
Copper-interrupt-driven	`MOVE.L #copper_irq, $6C.W` (Level 3 autovector, shared with VBlank and blitter) / per-scanline effect changes	Raster bars, multiplexed sprites, palette splits
Blitter-continuation via interrupt	Sets `INTREQ` bit for blitter, interrupt handler chains to next blit in queue	Demos with complex blitter pipelines
Custom event loop (no exec)	Polling loop reading CIA / custom chip registers directly; no `Wait()` / `WaitPort()`	Games bypassing OS, bootblock intros
Audio-driver callback chain	Audio interrupt (Level 4) feeds next sample pair from custom module replayer	Protracker/Soundtracker replayers

Protracker Replayer — Reference Architecture

The most commonly found audio subsystem in Amiga binaries. Understanding its internals saves hours of reverse engineering:

; Standard Protracker replayer entry points:
;
; mt_init   — initialize replayer with module data pointer
; mt_music  — call once per frame to advance pattern playback
; mt_end    — shutdown replayer, restore system state
;
; Registration pattern (CIA-based timing):

    ; Save old Level 6 autovector ($78 = CIA-B / EXTER)
    MOVE.L  $78.W, old_level6
    ; Install replayer interrupt
    MOVE.L  #mt_irq, $78.W
    ; Configure CIA-B Timer A for the desired tempo
    MOVE.B  #$7F, $BFDD00          ; CIA-B ICR: disable all CIA-B interrupt sources
    MOVE.B  #$81, $BFDD00          ; CIA-B ICR: enable Timer A interrupt
    ; Set timer period (e.g., 125 BPM on PAL → 1773447/125 = 14187 = $376B timer ticks)
    MOVE.B  #$6B, $BFD400          ; CIA-B Timer A low byte
    MOVE.B  #$37, $BFD500          ; CIA-B Timer A high byte
    MOVE.B  #$11, $BFDE00          ; CIA-B CRA: load latch, start timer (continuous)

; The interrupt handler (mt_irq):
mt_irq:
    MOVEM.L D0-D7/A0-A6, -(SP)     ; save all registers
    BSR     mt_music               ; advance replayer state
    MOVEM.L (SP)+, D0-D7/A0-A6     ; restore all registers
    TST.B   $BFDD00                ; reading CIA-B ICR clears the CIA's pending flags
    MOVE.W  #$2000, $DFF09C        ; acknowledge EXTER (CIA-B, Level 6) in INTREQ
    RTE

Key identification markers:

Writes to $BFDD00 (ICR), $BFD400/$BFD500 (Timer A latch), and $BFDE00 (CRA): CIA-B timer setup
MOVE.L #handler, $78.W — Level 6 interrupt vector installation
MOVEM.L D0-D7/A0-A6, -(SP) in the handler — all registers saved (standard for audio ISRs)
Audio register writes ($DFF0A0–$DFF0DA) — AUDxLCH/LCL/LEN/PER/VOL
Signature mt_ or _mt_ function names in HUNK_SYMBOL if available
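
The two bytes written to the CIA-B Timer A latch encode the tempo, so they can be checked against a BPM value. This sketch assumes the constant 1773447 that CIA-timed Protracker replayers commonly divide by the BPM on PAL machines, and the PAL CIA clock of 709379 Hz; the function name is illustrative.

```python
PAL_CIA_HZ = 709_379        # CIA timer clock on PAL machines
PT_PAL_CONST = 1_773_447    # assumed replayer constant: timer = const / BPM

def cia_timer_for_bpm(bpm: int, const: int = PT_PAL_CONST):
    """Return (timer_value, low_byte, high_byte) for a given tempo."""
    value = const // bpm
    return value, value & 0xFF, value >> 8

value, lo, hi = cia_timer_for_bpm(125)
print(f"125 BPM -> timer ${value:04X} (lo=${lo:02X}, hi=${hi:02X}), "
      f"{PAL_CIA_HZ / value:.2f} ticks/s")
```

Running it the other way (constant divided by the 16-bit latch value found in the binary) recovers the module's initial tempo.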

Identification: Detecting Hand-Written Assembly

Warning

Skip this section if you already know the binary is hand-written. The identification rules are covered in m68k_codegen_patterns.md and compiler_fingerprints.md.

Heuristics That Suggest Assembly

No LINK or SUBQ.L #N,SP in the entire binary
No JSR LVO(A6) patterns — library calls are JSR absolute_address or JMP (table, Dn.W)
Hardware register constants ($DFF000–$DFF1FE, $BFE001–$BFEF01, $BFD000–$BFDF00) appear as immediates
MOVEM.L used aggressively for per-routine save/restore with non-standard register sets
RTE without preceding MOVE to SR — custom interrupt handling
ORI #$0700, SR / ANDI #$F8FF, SR — direct interrupt level manipulation
JMP (A0) or JSR (A0) with dynamically computed target — jump tables, state machines
LEA offset(PC), An used for data tables rather than MOVE.L #absolute_address, An — PC-relative addressing for position-independent data
STOP #$2xxx — wait for interrupt without OS involvement
MOVE USP, An / MOVE An, USP — user stack pointer manipulation, almost never generated by compilers
MOVEC (68010+) to/from VBR, SFC, DFC — supervisor-level register access
RESET instruction — rarely used outside hand-written hardware init code

Binary Scoring: Assembly Confidence
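
One way to turn the heuristics above into a number is a weighted opcode count. The sketch below is an illustration only: the weights are invented, and the scan is a naive word-aligned pass that will also match operand and data words, so treat the score as a triage hint rather than a verdict.

```python
import struct

# Hypothetical weights: positive = suggests hand-written asm, negative = compiler
WEIGHTS = {"hw_addr": 2, "rte": 3, "stop": 3, "reset": 3, "usp": 3, "link": -3, "lib_call": -1}

def asm_confidence(code: bytes):
    """Count opcode signatures in big-endian 68000 code; return (score, hits)."""
    n = len(code) // 2
    words = struct.unpack(f">{n}H", code[: n * 2])
    hits = dict.fromkeys(WEIGHTS, 0)
    for i, w in enumerate(words):
        if 0x4E50 <= w <= 0x4E57:
            hits["link"] += 1            # LINK An,#d16
        elif w == 0x4EAE:
            hits["lib_call"] += 1        # JSR d16(A6)
        elif w == 0x4E73:
            hits["rte"] += 1             # RTE
        elif w == 0x4E72:
            hits["stop"] += 1            # STOP #imm
        elif w == 0x4E70:
            hits["reset"] += 1           # RESET
        elif 0x4E60 <= w <= 0x4E6F:
            hits["usp"] += 1             # MOVE An,USP / MOVE USP,An
        # absolute long address $00DFF000-$00DFF1FE (custom chip registers)
        if w == 0x00DF and i + 1 < n and 0xF000 <= words[i + 1] <= 0xF1FE:
            hits["hw_addr"] += 1
    score = sum(WEIGHTS[k] * v for k, v in hits.items())
    return score, hits

demo_like = bytes.fromhex("33FC7FFF00DFF09A" "4E73")       # MOVE.W #$7FFF,$DFF09A / RTE
c_like = bytes.fromhex("4E55FFF8" "4EAEFDD8" "4E5D4E75")   # LINK / JSR -552(A6) / UNLK / RTS
print(asm_confidence(demo_like)[0], asm_confidence(c_like)[0])
```

A strongly positive score points toward the pure assembly methodology; a negative one suggests starting with the compiler-oriented workflow.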

Decision Guide: Choosing Your Approach

graph TD
    START["Binary loaded in IDA/Ghidra"]
    HAS_SYMBOLS{"Has HUNK_SYMBOL<br/>debug info?"}
    HAS_OS_CALLS{"Uses OS library<br/>calls?"}
    HAS_STRINGS{"Has readable<br/>strings?"}
    HAS_CRUNCHER{"Packed / crunched<br/>(PowerPacker, Imploder)?"}
    PURE_ASM["Pure assembly methodology"]
    
    START --> HAS_CRUNCHER
    HAS_CRUNCHER -->|"Yes"| UNPACK["Unpack first<br/>→ see exe_crunchers.md"]
    HAS_CRUNCHER -->|"No"| HAS_SYMBOLS
    HAS_SYMBOLS -->|"Yes"| NAMED["Name functions from symbols<br/>then trace logic"]
    HAS_SYMBOLS -->|"No"| HAS_OS_CALLS
    HAS_OS_CALLS -->|"Yes"| OS_ANCHOR["Anchor on library calls<br/>→ identify callers by xref"]
    HAS_OS_CALLS -->|"No"| HAS_STRINGS
    HAS_STRINGS -->|"Yes"| STR_ANCHOR["Anchor on string xrefs<br/>→ trace outward"]
    HAS_STRINGS -->|"No"| PURE_ASM

When to Use Pure Assembly Methodology vs When to Fall Back

Scenario	Recommended Approach
Binary has zero library calls, heavy custom registers	Pure assembly methodology (this article)
Binary has some library calls mixed with hardware banging	Hybrid: anchor on library xrefs first, then pure asm for hardware sections
Binary is packed/crunched	Unpack first, then re-evaluate
Binary has HUNK_SYMBOL debug info	Standard RE workflow with named functions
Binary is a ROM module (Kickstart)	ROM-specific workflow (plus known entry points from exec's resident-tag scan)

Methodology

Phase 1: Triage

Dump hunk structure: hunkinfo shows CODE/DATA/BSS layout and relocation entries. Raw binaries (no HUNK header) skip directly to step 7.
Scan for hardware registers: grep for $DFF, $BFE, $BFD patterns. A binary that touches $DFF000–$DFF1FE directly is almost certainly hand-written or a game bypassing the OS.
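Find the entry point: For HUNK executables, execution starts at the first byte of the first hunk. ROM modules are located through their resident tag (RT_MATCHWORD, $4AFC). A bootblock is read into a Chip RAM buffer allocated by the boot code (the address is not fixed) and entered at offset $0C, with A6 = ExecBase and A1 = the trackdisk I/O request.
Identify interrupt vectors: Look for writes to absolute addresses $64–$7C (the Level 1–7 autovectors; $60 is the spurious-interrupt vector). These live in the CPU exception vector table at address 0 (or at VBR on 68010+), not inside a hunk. Hand-written binaries often overwrite them.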

Detect cruncher/packer: Scan for known decrunch stub signatures:

Cruncher	Signature Bytes (at or near start)	Notes
PowerPacker	`$42` followed by `MOVE.L`/`LEA` pattern	Uses powerpacker.library; header contains original size
Imploder	`$49` (often)	ATN!Imploder by Animators Of Death; smaller header than PowerPacker
Shrinkler	Context-mixing LZ; no fixed magic	Very high compression ratio; decrunch is slow (minutes on 7 MHz)
ByteKiller	`BRA.S` over data, then `MOVEM.L` pattern	Simple LZ variant; common in 1988–1990 productions
CrunchMania	`CR![version]` text marker	One of the fastest decrunchers; popular for 4K intros
TetraPack	Multi-part header	Compresses data+relocs separately

Check for overlay system: Look for HUNK_OVERLAY or custom overlay loader at entry. The overlay manager swaps code segments from disk — the binary on disk is larger than what's in memory at any moment.
Identify non-HUNK binary type:
- Bootblock: Exactly 1024 bytes (2 disk blocks), read by the Kickstart boot code into an allocated Chip RAM buffer and entered at offset $0C
- Absolute-load blob: Loaded to a fixed address (often $C00000 for trapdoor Slow RAM)
- ROM module: Has RT_MATCHWORD resident tag; part of Kickstart or expansion ROM
- Trackmo loader: First sector contains a custom loader, not a bootblock — the loader then reads the rest of the demo from disk
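
The first triage step (HUNK or raw?) can be scripted without any Amiga tooling. This is a minimal sketch that parses only the HUNK_HEADER block of a load file; the function name and the synthetic test header are illustrative, and it does not walk the hunks that follow.

```python
import struct

HUNK_HEADER = 0x3F3

def parse_hunk_header(data: bytes):
    """Return (first_hunk, last_hunk, sizes_in_bytes), or None for a non-HUNK file."""
    if len(data) < 4 or struct.unpack_from(">I", data, 0)[0] != HUNK_HEADER:
        return None
    pos = 4
    # Resident library names: longword-counted strings, ended by a zero count
    while True:
        (n,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if n == 0:
            break
        pos += 4 * n
    _table_size, first, last = struct.unpack_from(">3I", data, pos)
    pos += 12
    raw = struct.unpack_from(f">{last - first + 1}I", data, pos)
    # Sizes are in longwords; the top two bits carry memory flags (Chip/Fast)
    return first, last, [(v & 0x3FFFFFFF) * 4 for v in raw]

# Synthetic header: two hunks of $100 and $10 longwords, second flagged Chip
header = struct.pack(">7I", HUNK_HEADER, 0, 2, 0, 1, 0x100, 0x40000010)
print(parse_hunk_header(header))
print(parse_hunk_header(b"DOS\x00" + bytes(12)))   # bootblock-style data
```

A `None` result sends you to the non-HUNK branch of the triage; otherwise the size list gives the CODE/DATA/BSS footprint before any disassembly.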

Phase 2: Map Control Flow

Chase JMP/JSR chains from entry point outward. Mark each reached address. When you stop finding new addresses, the unreachable remainder is potential data or SMC target.
Identify jump tables: JMP (A0, Dn.W) or MOVE.W table(PC, Dn.W), D0 → JMP table(PC, D0.W), where the offsets are relative to the table base. Count table entries by looking at the range of Dn values. IDA needs manual jump table specification for these.
Cross-reference data tables: values loaded via LEA table(PC), An. These tables are often copper lists, sprite control words, or audio sample pointers.
Detect self-modifying code: Any MOVE/LEA targeting an address within the CODE hunk boundaries is an SMC candidate. Flag and verify with dynamic analysis.
Identify interrupt service routines: Trace from vector table addresses. ISRs end with RTE, not RTS. They typically save/restore many registers at entry/exit.
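Map copper list interactions: COP1LC/COP2LC writes indicate copper list switches. A MOVE.L #new_list, $DFF080 (COP1LC write) takes effect when the copper next reloads its program counter: automatically at the start of each vertical blank, or immediately if the code strobes COPJMP1 ($DFF088). This is how demos switch between effects from frame to frame.
Trace blitter wait loops: BTST #6, $DFF002 / BNE wait is the standard "wait for blitter" pattern. The byte-sized BTST tests bit 6 of the high byte of DMACONR, i.e. bit 14 (BBUSY, DMAB_BLTDONE). Many routines issue a dummy read of $DFF002 first, a workaround for early Agnus revisions where the flag is not yet valid right after BLTSIZE is written.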
Flag unreachable code: Code between RTS/RTE/JMP that isn't directly branched to — potential data, SMC target, or second-stage code loaded later.
Identify Level 3 interrupt chains: Music replayers and blitter queues commonly hook into the vertical blank interrupt (Level 3). The handler dispatches to multiple subscribers — find the dispatch loop to understand the full interrupt architecture.

Phase 3: Reconstruct Calling Conventions

Map per-routine register usage: For each identified function, track:
- Which registers are preserved (saved/restored via MOVEM.L or stack pushes). The MOVEM.L save mask encodes this explicitly.
- Which registers are destroyed (modified without save). These are the function's scratch/output registers.
- Which registers hold input parameters. Look for registers used without prior initialization.
- Which registers hold return values. D0 is conventional even in hand-written code, but not guaranteed.
Identify custom ABIs: Some authors consistently use e.g., A2=data pointer → data segment base, A3=copper list cursor, A4=hardware base ($DFF000), D7=scratch counter. These conventions are stable across a single author's body of work.
Build a register allocation map: Color-coded table of which registers carry which meaning across the program. This is the single most valuable artifact for understanding hand-written asm.
Detect authorial fingerprints: Consistent register conventions + coding idioms (e.g., always using MOVEQ #0, Dn over CLR.L Dn) suggest a single author or codebase reuse. This matters for provenance and for predicting conventions in unreversed sections.
Watch for USP manipulation: MOVE USP, An / MOVE An, USP is almost never generated by compilers. It indicates the author is using the User Stack Pointer for a second stack (common in context-switching code, coroutines, or task systems).

Phase 4: Reconstruct Data Structures

Copper list format: every instruction is 2 words (IR1, IR2), either a MOVE (register, value) or a WAIT/SKIP (beam position, compare mask)
Sprite control words: SPRxPOS/SPRxCTL word pairs, attached sprite mode detection
Blitter minterm lookup tables: Precomputed blitter operation descriptions
Audio sample tables: Period/waveform pointer/volume structures for music replayers
Custom module formats: Pattern data, sample lists, effect command tables for Protracker/Soundtracker variants
Bitmap/bitplane layouts: Interleaved vs linear, planar depth detection from blitter source/dest usage
Custom BSS-like allocations: Large zeroed regions used as frame buffers, audio buffers, or look-up tables

Phase 5: Hardware Interaction Mapping

For each custom chip register touched, document:

Which register (address)
From where (code location)
In what sequence (interaction with other register writes)
Purpose (deduced from context: blitter setup, copper list switch, audio start, sprite positioning)

Build a hardware register access matrix:

Register	Writes From	Reads From	Deduced Purpose
`$DFF040` (BLTCON0)	`$01234`, `$05678`	—	Blitter operation setup
`$DFF096` (DMACON; read back via DMACONR `$DFF002`)	`$00123`	`$04567`	DMA channel enable/disable
...	...	...	...
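
A matrix like this is easy to generate once register accesses have been collected from the disassembly. The sketch below assumes the accesses are already available as `(code_addr, reg_addr, kind)` tuples; the sample data and the short name table are made up for illustration.

```python
from collections import defaultdict

# Small excerpt of custom chip register names (extend as needed)
REG_NAMES = {0xDFF002: "DMACONR", 0xDFF040: "BLTCON0", 0xDFF058: "BLTSIZE", 0xDFF096: "DMACON"}

def build_access_matrix(accesses):
    """Group (code_addr, reg_addr, 'r' or 'w') tuples by hardware register."""
    matrix = defaultdict(lambda: {"writes": [], "reads": []})
    for code_addr, reg_addr, kind in accesses:
        matrix[reg_addr]["writes" if kind == "w" else "reads"].append(code_addr)
    return dict(matrix)

# Hypothetical accesses harvested from a disassembly
sample = [(0x1234, 0xDFF040, "w"), (0x5678, 0xDFF040, "w"),
          (0x0123, 0xDFF096, "w"), (0x4567, 0xDFF002, "r")]
matrix = build_access_matrix(sample)
for reg in sorted(matrix):
    row = matrix[reg]
    fmt = lambda addrs: ", ".join(f"${a:05X}" for a in addrs) or "-"
    print(f"${reg:06X} ({REG_NAMES.get(reg, '?')})\t{fmt(row['writes'])}\t{fmt(row['reads'])}")
```

The "Deduced Purpose" column stays manual: it is the part that requires reading the surrounding code.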

Phase 6: Annotate

Rename functions: Descriptive names based on deduced purpose (vbl_irq_handler, blitter_queue_submit, copper_list_build)
Add comments: Document register conventions at function entry, magic constants, hardware register purposes
Create struct types: For custom data structures discovered in Phase 4
Mark non-code regions: Force IDA/Ghidra to treat copper lists, sprite data, audio samples as data, not code
Cross-reference hardware registers: Create named constants for all $DFFxxx/$BFExxx addresses in the database
Build a call graph: Mermaid diagram of the full control flow for documentation

Phase 7: Dynamic Verification

Breakpoint on custom chip registers: Verify that register writes occur at expected times
Watchpoint on memory buffers: Confirm copper list format, audio sample layout
Trace mode: Follow execution through a single frame to verify control flow reconstruction
Modify-and-test: Patch the binary and run it — if it breaks, your understanding was incomplete
Compare static vs dynamic: Does the code path you predicted match what actually executes?

Tool-Specific Workflows

IDA Pro

Ghidra

FS-UAE Debugger

Command-Line Pre-Analysis Pipeline

Best Practices

Never assume the ABI — document the actual calling convention before tracing callers
Start from the entry point and work outward — don't try to understand everything at once
Identify hardware register usage before control flow — knowing which chips are used narrows the purpose
Treat every MOVE to an absolute address as a potential self-modifying code write — until proven otherwise
Build a mermaid diagram of the control flow — it reveals dead code, missing connections, and loop structures
Cross-reference relocation entries with code — relocs tell you which addresses matter
Don't trust auto-analysis on mixed code/data sections — manually define code/data boundaries
Run the binary in an emulator — some behaviors (self-modifying code paths, copper effects) are invisible in static analysis
Look for known signatures first — Protracker replayers, decrunch stubs, common macro libraries leave distinctive patterns
Document your register map as you work — it prevents costly re-analysis when you realize A3 was actually a struct pointer

Antipatterns

1. The Compiler Assumption

Wrong: Assuming A6 holds a library base, D0/D1 are scratch, and A0/A1 are pointer temps.

Why it fails: Hand-written code may use A6 as a general-purpose pointer, A3 as a frame pointer, or any other non-standard assignment. The author may have followed a private calling convention that is documented nowhere.

2. The Prologue Scanner

Wrong: Scanning for LINK A5 or SUBQ.L #N,SP to find function boundaries.

Why it fails: Hand-written assembly may have no standard function entry/exit markers. A routine might start with MOVEM.L, a label, or just fall through from the previous block.

3. The String Hop

Wrong: Assuming LEA _string(PC), A0 means A0 points to a C string.

Why it fails: Hand-written code may use LEA to point to bytecode tables, sprite data, copper lists, or packed structures. The "string" might be a custom encoding.

4. The Register Reuse Confusion

Wrong: Assuming a register used in one context retains the same meaning throughout the program.

Why it fails: Hand-written asm aggressively reuses registers. The same D0 might be a loop counter in one block, an audio sample value in the next, and a scratch temporary in a third — all within 50 instructions. You must re-derive register meaning at each basic block.

5. The Disassembly Loop Trap

Wrong: Letting IDA's auto-analysis recursively disassemble from every possible entry point.

Why it fails: Mixed code/data sections cause IDA to decode data as instructions, creating phantom functions from copper lists or audio samples. This pollutes the symbol table with nonsense and obscures real control flow.

6. The Constant-as-Code Mistake

Wrong: Treating jump table offsets, copper list data, or sprite control words as instructions.

Why it fails: IDA/Ghidra cannot tell whether the word $0180 is the first half of a copper MOVE to COLOR00 or the 68000 opcode BCLR D0, D0, which has the same encoding. Without manual intervention, hardware data tables get disassembled into garbage.

7. The One-Pass Delusion

Wrong: Attempting linear top-to-bottom analysis and expecting to understand everything on the first pass.

Why it fails: Hand-written asm often uses forward references, self-modifying code patched by an earlier init routine, or data tables that only make sense after you understand the code that consumes them. Reverse engineering is inherently iterative.

8. The MOVEM Black Box

Wrong: Treating MOVEM.L D0-D7/A0-A6, -(SP) / MOVEM.L (SP)+, D0-D7/A0-A6 as opaque blocks.

Why it fails: Understanding which registers are saved and restored tells you the function's register contract. A routine that saves D5-D7/A4-A5 preserves those across its call — they likely carry important state (frame counter, hardware base pointer, data cursor).
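
The register contract can be read straight from the 16-bit mask word that follows the MOVEM opcode. A small sketch of the decoding, with an illustrative function name; the only subtlety is that the predecrement form stores the mask bit-reversed.

```python
REGS = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(8)]

def decode_movem_mask(mask: int, predecrement: bool) -> list:
    """Decode a MOVEM register mask word.
    predecrement=True is the -(An) form: bit 15 = D0 ... bit 0 = A7.
    All other addressing modes use bit 0 = D0 ... bit 15 = A7."""
    regs = []
    for i, name in enumerate(REGS):
        bit = 15 - i if predecrement else i
        if mask & (1 << bit):
            regs.append(name)
    return regs

print(decode_movem_mask(0xFFFE, predecrement=True))    # 48E7 FFFE: MOVEM.L ...,-(SP)
print(decode_movem_mask(0x7FFF, predecrement=False))   # 4CDF 7FFF: MOVEM.L (SP)+,...
print(decode_movem_mask(0x070C, predecrement=True))    # a narrower save set
```

Comparing the decoded save set at entry with the restore set at exit also catches routines that deliberately restore fewer registers than they saved in order to return values.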

Pitfalls

1. Assuming the OS Is Present

; This works on a running system:
MOVE.L  4.W, A6            ; SysBase
JSR     -552(A6)           ; _LVOOpenLibrary

; But in an absolute-load blob, or after a demo has taken over the machine,
; $4.W may no longer hold a valid ExecBase and library bases may be gone.
; The code might be:
MOVE.L  #$DFF000, A5       ; custom chip base, not SysBase
JSR     _custom_init(PC)   ; custom initialization

2. Misreading Jump Tables

Hand-written jump tables frequently use PC-relative indirect jumps with custom offsets that IDA doesn't auto-resolve.

3. Self-Modifying Code Deception

; The code you see is NOT what executes:
    LEA     next_insn+2(PC), A0     ; address of the immediate operand below
    MOVE.W  D1, (A0)                ; patch it (PC-relative modes cannot be a destination)
next_insn:
    CMPI.W  #$0000, D0              ; at runtime compares against the patched value, not 0

4. Copper List Misidentification

Copper instructions are 2-word pairs that decode as meaningless 68000 instructions when disassembled:

; A copper list at $20000 decoded as instructions by IDA:
; DC.W $0180, $0000  → BCLR D0, D0 / ORI.B #imm, D0  (garbage!)
; DC.W $0182, $0FFF  → swallowed as the ORI immediate / illegal opcode
; DC.W $FFFF, $FFFE  → invalid opcode or data
;
; Correct interpretation:
; $0180, $0000 = MOVE $0000 to COLOR00 ($DFF180): background black
; $0182, $0FFF = MOVE $0FFF to COLOR01 ($DFF182): white
; $FFFF, $FFFE = END of copper list (WAIT for a beam position that never occurs)

Detection methodology:

COP1LC/COP2LC writes give you the copper list address — start your data definition there
Copper instructions come in pairs of 16-bit words. IR1 (first word) encodes the operation or register address; IR2 (second word) is the data or WAIT position.
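WAIT/SKIP: IR1 bit 0 = 1. IR1 holds the beam position: VP in bits 8–15, HP in bits 1–7. IR2 holds the compare-enable masks (bit 15 = blitter-finished disable, VE in bits 8–14, HE in bits 1–7); IR2 bit 0 = 0 for WAIT, 1 for SKIP.
MOVE: IR1 bit 0 = 0. IR1 bits 1–8 are the register offset from $DFF000; IR2 is the value to write.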
A $FFFF, $FFFE pair terminates the list.
Mark the entire copper list address range as data, not code. Create an array of 4-byte copper instruction structs in IDA/Ghidra.
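
The decoding rules above fit in a few lines of script, which is often quicker than defining structs by hand. A minimal sketch, assuming the list has already been extracted as raw bytes; the function name and tuple layout are illustrative, and the compare masks in IR2 are not decoded.

```python
import struct

def decode_copper_list(data: bytes, max_insns: int = 4096):
    """Decode copper instructions until the $FFFF,$FFFE terminator or end of data."""
    out = []
    for off in range(0, min(len(data), max_insns * 4) - 3, 4):
        ir1, ir2 = struct.unpack_from(">HH", data, off)
        if ir1 & 1 == 0:
            # MOVE: IR1 bits 1-8 = register offset, IR2 = value
            out.append(("MOVE", 0xDFF000 + (ir1 & 0x01FE), ir2))
        else:
            kind = "SKIP" if ir2 & 1 else "WAIT"
            out.append((kind, ir1 >> 8, ir1 & 0xFE))   # (kind, VP, HP)
            if (ir1, ir2) == (0xFFFF, 0xFFFE):
                break                                   # end-of-list marker
    return out

sample = struct.pack(">6H", 0x0180, 0x0000, 0x2C07, 0xFFFE, 0xFFFF, 0xFFFE)
for insn in decode_copper_list(sample):
    print(insn)
```

The number of instructions decoded before the terminator gives the array length to use when marking the range as data in IDA or Ghidra.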

5. CIA Timer Code Confusion

CIA register access ($BFE001–$BFEF01 for CIAA, $BFD000–$BFDF00 for CIAB) looks like any other memory access, but the TOD clock read sequence and timer control register patterns are distinctive:

; CIA-A Timer A setup (often used for timing in games/demos):
MOVE.B  #$7F, $BFED01      ; CIA-A ICR: disable all CIA-A interrupt sources
MOVE.B  #$81, $BFED01      ; CIA-A ICR: enable Timer A interrupt
MOVE.B  #low_byte, $BFE401 ; CIA-A Timer A low byte
MOVE.B  #high_byte, $BFE501 ; CIA-A Timer A high byte

; CIA-B Timer A setup (used by Protracker replayers!):
MOVE.B  #$7F, $BFDD00      ; CIA-B ICR: disable all sources
MOVE.B  #$81, $BFDD00      ; CIA-B ICR: enable Timer A
MOVE.B  #lo, $BFD400       ; CIA-B Timer A low byte (CIA-B base is $BFD000)

; Common mistake:
; MOVE.B $BFEC01, D0 reads the CIAA SDR (serial data register). Despite the
; name, this is not the RS-232 port (that is Paula's SERDATR, $DFF018):
; on the Amiga the CIAA SDR carries keyboard data.
; The parallel port is CIAA PRB ($BFE101).

Key CIA registers for RE identification:

Register	Address	Purpose
CIAA ICR	`$BFED01`	Interrupt Control Register — enables/disables CIA-A interrupts
CIAA Timer A Lo	`$BFE401`	Timer A low byte
CIAA Timer A Hi	`$BFE501`	Timer A high byte
CIAB ICR	`$BFDD00`	Interrupt Control Register — enables CIA-B interrupts (used by Protracker!)
CIAB Timer A Lo	`$BFD400`	Timer A low byte (Protracker tempo control)
CIAB Timer A Hi	`$BFD500`	Timer A high byte

6. Blitter Queue Confusion

Blitter register writes (BLTCON0, BLTSIZE, etc.) look like ordinary memory stores to IDA. Without understanding that these are I/O registers, the disassembly shows meaningless MOVE.W D0, abs_addr sequences:

; This looks like garbage writes to random addresses:
MOVE.W  #$09F0, $DFF040     ; BLTCON0 = use A and D channels, minterm=$F0 (D = A)
MOVE.W  #$0000, $DFF042     ; BLTCON1 = no fill, no line mode
MOVE.W  #$FFFF, $DFF044     ; BLTAFWM = first word mask (all bits)
MOVE.W  #$FFFF, $DFF046     ; BLTALWM = last word mask (all bits)
MOVE.L  #src, $DFF050       ; BLTAPT = source A pointer
MOVE.L  #dst, $DFF054       ; BLTDPT = destination D pointer
MOVE.W  #0, $DFF064         ; BLTAMOD = source A modulo (0 = linear)
MOVE.W  #0, $DFF066         ; BLTDMOD = dest D modulo
MOVE.W  #(h<<6)|w, $DFF058  ; BLTSIZE = start blit! (writing this triggers DMA)

; But this is a standard blitter rectangle copy. The setup registers can be
; written in any order; the only fixed rule is that BLTSIZE comes LAST,
; because writing it starts the blit.

How to identify a blitter operation:

The sequence always ends with a write to $DFF058 (BLTSIZE) — this is the trigger
BLTCON0 ($DFF040) encodes the minterm and active channels (bits 0–7 = minterm, bit 8 = USED, bit 9 = USEC, bit 10 = USEB, bit 11 = USEA, bits 12–15 = A-channel shift)
Pointer registers ($DFF048–$DFF054) hold source/destination addresses — these are your key to understanding what data is being moved
The blit size (h<<6)|w in BLTSIZE: height in upper 10 bits, width in lower 6 bits (width is in words, 0 = 64 words)
Blitter wait: BTST #6, $DFF002 (byte access: bit 6 of the high byte of DMACONR = bit 14, BBUSY / DMAB_BLTDONE) — polls until the blitter has finished
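
When annotating a blit, decoding the two packed control words by script avoids bit-counting mistakes. A short sketch with illustrative function names, following the standard BLTCON0 layout (minterm in bits 0–7, USED/USEC/USEB/USEA in bits 8–11, A shift in bits 12–15) and the BLTSIZE layout described above.

```python
def decode_bltcon0(value: int) -> dict:
    """Split BLTCON0 into A shift, enabled DMA channels, and minterm."""
    channels = "".join(ch for ch, bit in (("A", 11), ("B", 10), ("C", 9), ("D", 8))
                       if value & (1 << bit))
    return {"a_shift": value >> 12, "channels": channels, "minterm": value & 0xFF}

def decode_bltsize(value: int):
    """Return (height_in_lines, width_in_words); a zero field means the maximum."""
    height = ((value >> 6) & 0x3FF) or 1024
    width_words = (value & 0x3F) or 64
    return height, width_words

print(decode_bltcon0(0x09F0))           # the value from the listing above
print(decode_bltsize((16 << 6) | 2))    # 16 lines, 2 words (32 pixels) wide
```

Minterm `$F0` with only A and D enabled is the plain copy (D = A); other minterm values together with the B and C channels indicate masking or cookie-cut operations.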

7. MOVEM Register Tracking Across Long Spans

8. Code Embedded in Interrupt Vector Table

9. Dual-Playfield Register Set Confusion

10. Stack-Based State Machines

Some hand-written code uses the stack as a state machine — pushing return addresses that represent state transitions, using RTS as a computed goto:

; Instead of a switch statement, the author pushes state transition addresses:
    MOVE.L  #STATE_IDLE, -(SP)    ; push initial state
    ...
STATE_DISPATCH:
    RTS                           ; "return" to the state on top of stack

STATE_IDLE:
    ; ... handle idle ...
    MOVE.L  #STATE_PLAYING, -(SP) ; push next state
    BRA     STATE_DISPATCH

STATE_PLAYING:
    ; ... handle playing ...
    MOVE.L  #STATE_PAUSED, -(SP)  ; push next state
    BRA     STATE_DISPATCH

This pattern breaks all standard call/return analysis because RTS doesn't return to a caller — it jumps to the next state. IDA/Ghidra see RTS as a function exit and stop disassembling.

Detection: Look for MOVE.L #addr, -(SP) or PEA addr(PC) (push effective address) followed by RTS (or a branch to an RTS). These are state pushes, not function call setups.
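
The simplest form of this pattern (a push immediately followed by RTS) can be found with a byte scan. The sketch below looks only for `MOVE.L #imm32,-(SP)` (`$2F3C`) or `PEA d16(PC)` (`$487A`) directly followed by `RTS` (`$4E75`); pushes that reach the RTS through a branch, as in the listing above, still need control-flow analysis. The function name is illustrative, and because it scans every word it can report false positives inside data.

```python
import struct

def find_push_rts(code: bytes):
    """Return (offset, kind, target) for each push that is directly followed by RTS."""
    hits = []
    for off in range(0, len(code) - 1, 2):
        (op,) = struct.unpack_from(">H", code, off)
        if op == 0x2F3C and off + 8 <= len(code):        # MOVE.L #imm32,-(SP)
            target, nxt = struct.unpack_from(">IH", code, off + 2)
            if nxt == 0x4E75:
                hits.append((off, "MOVE.L", target))
        elif op == 0x487A and off + 6 <= len(code):      # PEA d16(PC)
            disp, nxt = struct.unpack_from(">hH", code, off + 2)
            if nxt == 0x4E75:
                # PC-relative base is the address of the extension word
                hits.append((off, "PEA", off + 2 + disp))
    return hits

sample = bytes.fromhex("2F3C000012344E75" "487A00104E75")
print(find_push_rts(sample))
```

Each reported target is a code address the disassembler will not have followed on its own, so it is a candidate for a manual "make code" at that location.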

11. Absolute Address Dependencies

Code that assumes a fixed load address (common in non-HUNK binaries) will break if relocated. For HUNK binaries, relocation entries tell you which absolute addresses must be patched at load time. Non-HUNK binaries lack relocation metadata entirely.

; Absolute dependency example — works only at $C00000:
    LEA     $C01000, A0             ; data at fixed offset from load address
    JSR     $C00500                 ; subroutine at fixed address within binary

; For a HUNK binary, these would be:
    LEA     _data(PC), A0           ; PC-relative (no relocation needed)
    JSR     _subroutine(PC)         ; PC-relative

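Critical: The Kickstart boot code loads a bootblock into whatever Chip RAM buffer it allocated, so bootblock code is normally position-independent (BSR/BRA, PC-relative LEA). Absolute JMP/JSR targets in a bootblock usually mean the code first copies itself to a fixed address (e.g., $7F000); rebase your disassembly to that address before following them.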

Use-Case Cookbook

Pattern 1: Finding the Main Loop in a Demo

Pattern 2: Identifying a Custom Interrupt Handler

Pattern 3: Reconstructing a Jump Table

Pattern 4: Detecting Self-Modifying Code with IDAPython

Pattern 5: Identifying a Protracker Replay Routine

The most commonly found audio subsystem in Amiga binaries. Here's the full identification workflow:

Find the Level 6 interrupt vector write: Search for MOVE.L #xxx, $78.W — this installs the Level 6 (CIA-B timer) interrupt handler used by Protracker for tempo.
Identify the CIA-B timer setup: MOVE.B #$7F, $BFDD00 / MOVE.B #$81, $BFDD00 — this configures CIA-B to generate timer interrupts.
Trace to the interrupt handler: The handler saves ALL registers (MOVEM.L D0-D7/A0-A6, -(SP)), calls the replayer tick function, then restores all and does RTE.
Find the audio register writes: Look for writes to $DFF0A0–$DFF0DA (AUDxLCH/LCL/LEN/PER/VOL). The pattern MOVE.L sample_ptr, $DFF0A0 / MOVE.W period, $DFF0A6 / MOVE.W vol, $DFF0A8 is the per-channel audio update.
Identify effect command dispatch: A MOVE.W effect_cmd, D0 / ANDI.W #$0F, D0 / ADD.W D0, D0 / JMP (effect_table, D0.W) pattern dispatches to arpeggio, portamento, vibrato, etc. handlers.
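Map the pattern data layout: The replayer reads pattern data via sequential MOVE.B (A0)+ — map the track/note mapping. Standard format: 4 bytes per note. The 12-bit period is the low nibble of byte 0 plus byte 1; the sample number is split across the high nibbles of bytes 0 and 2; the low nibble of byte 2 is the effect command and byte 3 its parameter.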
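
The note layout is easiest to verify by decoding a few cells from the module data the replayer points at. A minimal sketch of the standard 4-byte Protracker cell; the function name and the sample bytes are illustrative.

```python
def decode_pt_note(cell: bytes) -> dict:
    """Decode one 4-byte Protracker pattern cell."""
    b0, b1, b2, b3 = cell[:4]
    return {
        "sample": (b0 & 0xF0) | (b2 >> 4),     # high nibble of byte 0, high nibble of byte 2
        "period": ((b0 & 0x0F) << 8) | b1,     # 12-bit Paula period
        "effect": b2 & 0x0F,                   # effect command (0-F)
        "param": b3,                           # effect parameter
    }

# Sample 18, period 428, effect C (set volume) with parameter $40
print(decode_pt_note(bytes([0x11, 0xAC, 0x2C, 0x40])))
```

If decoded periods match entries in the replayer's period table and the effect nibble lines up with the dispatch table size, the pattern data has been located correctly.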

IDA Python script fragment to auto-detect Protracker replayers:

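import idc
import ida_idaapi
import ida_search

# Search for the Level 6 vector installation pattern:
# MOVE.L #handler, $78.W = 21FC xxxx xxxx 0078
# (IDA 7.x/8.x API; IDA 9 replaces find_binary with ida_bytes.find_bytes)
ea = ida_search.find_binary(0, ida_idaapi.BADADDR, "21 FC ?? ?? ?? ?? 00 78", 16, ida_search.SEARCH_DOWN)
if ea != ida_idaapi.BADADDR:
    handler = idc.get_wide_dword(ea + 2)   # 32-bit immediate = handler address
    print(f"Found Level 6 vector install at ${ea:08X} → handler ${handler:08X}")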

Pattern 6: Reversing a Bootblock Virus

Bootblock viruses are the ideal entry point for learning Amiga RE — they're small (1024 bytes), self-contained, and exercise key system mechanisms:

Lamer Exterminator (October 1989)

Size: 1024 bytes (exactly 2 disk blocks)
Residence: Installs itself in memory, hooks system vectors
Infection vector: Writes itself to any write-enabled disk's bootblock during disk access
Damage routine: After activation, overwrites disk blocks with the string "LAMER!" repeated 84 times, which destroys their contents and trashes the disk
CoolCapture: Uses the CoolCapture vector for post-reset survival — after a warm reset, the virus re-activates from the captured state
Detection text: Sometimes leaves identifiable strings in the bootblock

SADDAM Bootblock Virus

Size: 1024 bytes
Residence: Copies itself to $7F000 in memory (just below the 512KB Chip RAM boundary)
Interrupt hooking: Hooks Level 3 interrupt (Vertical Blank/Copper/Blitter) via the interrupt vector table
Infection trigger: First "read Rootblock" command after a reset — this infects any disk accessed after boot
Stealth: Writes the original bootblock back to disk when the rootblock is read (hiding its presence)
System modification: Clears CoolCapture, KickTagPtr, and KickCheckSum — disables the system's ability to detect bootblock changes
Anti-detection text: Contains the misleading string "A2000 MB Memory Controller V2" to disguise itself as a hardware ROM
Damage trigger: After ~30,000 interrupt calls, crashes the system by showing an alert in a Level 3 interrupt context

Common Virus RE Workflow

Extract the bootblock: The first 1024 bytes of an infected disk (blocks 0–1)
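Determine load address: Kickstart reads the bootblock into a dynamically allocated buffer and enters it at offset $0C, so there is no fixed load address. Note any fixed address the code copies itself to (for example $7F000) and analyze it there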
Identify the infection mechanism: Look for DoIO() / SendIO() calls to trackdisk.device for writing back to disk
Find the residency mechanism: CoolCapture, KickTagPtr manipulation, or RAM copy to $7F000 + vector hooking
Trace the trigger condition: What event activates the virus? Timer count, disk access count, specific command?
Document the payload: Does it corrupt data? Display a message? Overwrite bootblocks?
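The first step of this workflow can be sketched in code: parse the 12-byte bootblock header and verify its checksum. The checksum is a 32-bit sum over all 256 longwords with the carry added back in, stored inverted so that the total comes to $FFFFFFFF. The function names below are hypothetical.

```python
import struct

BOOTBLOCK_SIZE = 1024

def ones_complement_sum(block: bytes) -> int:
    """Sum big-endian longwords, adding each carry back into the total."""
    total = 0
    for (word,) in struct.iter_unpack(">I", block):
        total += word
        if total > 0xFFFFFFFF:
            total = (total & 0xFFFFFFFF) + 1
    return total

def bootblock_checksum(block: bytes) -> int:
    """Compute the value that belongs in the checksum field (offset 4)."""
    zeroed = block[:4] + b"\x00\x00\x00\x00" + block[8:]
    return ~ones_complement_sum(zeroed) & 0xFFFFFFFF

def parse_bootblock(block: bytes) -> dict:
    """Return header fields, checksum validity, and the boot code bytes."""
    if len(block) != BOOTBLOCK_SIZE:
        raise ValueError("expected the first 1024 bytes of the disk")
    disk_type, stored, rootblock = struct.unpack_from(">4sII", block, 0)
    return {
        "disk_type": disk_type,          # b"DOS" plus a flags byte
        "stored_checksum": stored,
        "checksum_ok": stored == bootblock_checksum(block),
        "rootblock": rootblock,
        "code": block[12:],              # execution starts at offset $0C
    }
```

A valid checksum only means Kickstart will execute the block. A virus recomputes it after infecting a disk, so it says nothing about whether the code is benign.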

Pattern 7: Finding the Decrunch Stub in a Packed Demo

The decrunch stub is the gateway to the real binary. Finding and understanding it is prerequisite to all further analysis:

Identification by signature:

Cruncher	Magic/Pattern	Decrunch Stub Size	Notes
PowerPacker	`$42` followed by LEA/MOVE pattern near entry	~200–300 bytes	Uses powerpacker.library; `ppDecrunchBuffer()` is the library call
Imploder	Entry has `MOVE.L D0, -(SP)` / `LEA xxx(PC), A0` pattern	~300–400 bytes	ATN!Imploder; slower decompression, better ratio than early PP
Shrinkler	Entry starts with context-mixing setup code	~2KB	Extremely high ratio; decrunch takes minutes on 7 MHz 68000
ByteKiller	Short BRA.S over header data, then MOVEM.L pattern	~100 bytes	Simple LZ variant; very common in 1988–1991 productions
CrunchMania	String `"CrM!"` or `"CrM2"` at or near entry	~150 bytes	Fastest decruncher; popular for 4K intros
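A first triage pass can simply search the file for known cruncher ID strings. The sketch below assumes the commonly documented data-file IDs ("PP20" for PowerPacker, "IMP!" for Imploder, "CrM!" and "CrM2" for CrunchMania). These are assumptions of this example rather than an exhaustive list, and executables with an embedded stub may not contain them at all. The names `CRUNCHER_IDS` and `scan_for_cruncher_ids` are hypothetical.

```python
# Assumed ID strings; extend this table as you confirm other formats.
CRUNCHER_IDS = {
    b"PP20": "PowerPacker",
    b"IMP!": "Imploder",
    b"CrM!": "CrunchMania",
    b"CrM2": "CrunchMania",
}

def scan_for_cruncher_ids(image: bytes):
    """Return sorted (offset, cruncher_name) pairs for every ID occurrence."""
    hits = []
    for magic, name in CRUNCHER_IDS.items():
        pos = image.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = image.find(magic, pos + 1)
    return sorted(hits)
```

A hit is only a hint: four-byte strings can occur by chance in large files, so confirm by checking that the surrounding bytes look like that format's header or stub.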

Decrunch strategy:

Identify the stub: The first code that executes after the entry point. It reads packed data and expands it to a destination address.
Let the stub run in an emulator: Set a breakpoint after the decrunch loop completes (look for the JMP or JSR to the unpacked entry point).
Dump the decrunched memory: The real binary is now in RAM. Save it for static analysis.
Optionally: Write an unpacker script — for known formats, run the original cruncher's decruncher against the packed data in a standalone tool.

Rob Northen Copylock / Trace Vector Decoder (TVD): A special case that appears like a cruncher but is actually a protection system:

Encrypted code is executed one instruction at a time using the 68000 trace exception
The trace handler (exception vector 9, at address $24) decrypts the next instruction, executes it, then sets the trace bit again
This prevents static disassembly: you only see the encrypted bytes and the trace handler, not the real code
Detection: MOVE #$8000, SR or ORI #$8000, SR in the entry code (either one sets the trace bit), plus a custom handler installed at address $24
Solution: Let it execute in FS-UAE with a trace logger, or single-step through and record each decrypted instruction
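The detection rule above can be sketched as a raw byte scan for the two trace-enabling instructions and for a direct write to the trace vector. This is a heuristic with hypothetical names (`find_trace_setup`). It only catches the immediate forms listed here, and a protection may set the trace bit or the vector in other ways.

```python
import struct

TRACE_BIT = 0x8000  # T bit in the 68000 status register

def find_trace_setup(code: bytes):
    """Return (offset, kind) for candidate trace-bit and trace-vector setups."""
    hits = []
    for off in range(0, len(code) - 3, 2):  # instructions are word-aligned
        opcode, imm = struct.unpack_from(">HH", code, off)
        if opcode == 0x46FC and imm & TRACE_BIT:
            hits.append((off, "move_sr"))   # MOVE #imm,SR with T set
        elif opcode == 0x007C and imm & TRACE_BIT:
            hits.append((off, "ori_sr"))    # ORI #imm,SR with T set
        elif opcode == 0x21FC and off + 8 <= len(code):
            # MOVE.L #handler,(xxx).W: the address word follows the immediate.
            (vector,) = struct.unpack_from(">H", code, off + 6)
            if vector == 0x0024:
                hits.append((off, "vector"))
    return hits
```

Finding a "vector" hit followed shortly by "move_sr" or "ori_sr" is a strong hint that the code after it is meant to be executed under the trace handler and should be logged dynamically rather than disassembled.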

Pattern 8: Identifying a Custom Memory Allocator

Pattern 9: Reconstructing a Blitter Queue

Pattern 10: Recovering a Sprite Multiplexer

Pattern 11: Extracting a Custom Module Replayer

Pattern 12: Tracing a Trackloader

Real-World Examples

Demo Productions — RE Challenge Highlights

Production	Group	Year	Key RE Challenge
Arte	Sanity	1993	Dense blitter queue system; effects dispatched via jump table with per-effect copper list switching; multi-part architecture with custom module loader
Desert Dream	Kefrens	1993	Multi-part trackmo with per-part custom loaders; heavy copper wizardry (raster bars, palette splits, sprite multiplexing); custom Protracker variant replayer
Nexus 7	Andromeda	1994	3D vector engine with custom math routines (no FPU); object system with update/render phases; blitter-filled polygons
Enigma	Phenomena	1991	Modular effect system — each effect is a self-contained subroutine registered in a dispatch table; custom memory management across effect transitions
State of the Art	Spaceballs	1992	Morphing effects, rotate-zoomer, vector balls; heavy use of precomputed tables; custom blitter queue for compositing
Hardwired	Crionics & Silents	1991	Early 3D vector engine; spreadsheet-generated sine tables identified by their perfect mathematical precision; copper-chunky display mode simulation

Games

Title	Year	Key RE Challenge
Shadow of the Beast	1989	13-level parallax scrolling using dual playfield + sprite overlays; custom blitter queues for sprite rendering; 512-color still images via palette-split copper lists
Turrican II	1991	Sprite multiplexer with 20+ sprites on screen; copper-driven status bar split; large state machine for enemy AI
Lotus Turbo Challenge 2	1991	Software road rendering with copper sky gradient; blitter-driven car sprite compositing; 2-player split-screen via copper screen split
Cannon Fodder	1993	OS-friendly (uses graphics.library!) but still hits hardware for scrolling; custom memory allocator for soldier/bullet objects
Pinball Dreams	1992	Multi-ball physics engine; copper-driven score display; custom module replayer with sound effects mixing into music channels

Bootblock Intros — The Art of 1024 Bytes

Bootblock intros compress entire demoscene effects into two disk sectors:

Red Sector Inc. (RSI) bootblocks: Often include a simple scrolltext, starfield, and a logo — all in 1024 bytes of raw m68k
Tristar & Red Sector Inc. (TRSI) bootblocks: More advanced effects (copper bars, vector objects)
SADDAM virus: A case study in anti-RE techniques within a bootblock — misleading strings, interrupt hooking, stealth write-back
Lamer Exterminator: The most infamous Amiga virus, studied for its CoolCapture survival mechanism

Cross-Platform Comparison

Platform	Assembly RE Challenge	Amiga Analog
C64 (6502)	Zero-page usage, self-modifying code, raster interrupts	Custom chip register banging, copper-synced code
Atari ST (68000)	Similar CPU but different hardware registers	Amiga custom chips vs ST's simpler shifter/blitter
DOS (x86)	Segment:offset addressing, BIOS/DOS interrupt vectors	Amiga library JMP tables, exec interrupt vectors
NES (6502)	Tight mapper constraints, PPU timing loops	Similar raster-sync challenges in demos
Arcade (68000)	Shared CPU family, custom hardware	Same CPU, different memory maps and custom chips
SNES (65816)	Hardware register banging, HDMA (like copper)	Copper list is the direct analog of SNES HDMA channels
Genesis/Mega Drive (68000)	Same CPU, VDP register interface, Z80 coprocessor	Closest analog — 68000 + custom video hardware, similar register-banging style
Game Boy (Z80-like)	Tight memory (8KB), scanline interrupts, OAM DMA	Similar to bootblock constraints — extreme optimization in tiny space

Historical Context — Why Hand-Written Assembly Dominated

Before 1990, there were few practical alternatives to assembly for Amiga software that needed to be fast:

Factor	Detail
Compiler quality	Pre-SAS/C 5.x compilers (Lattice C, Manx Aztec C, early SAS/C) generated code 5–20× slower than hand-tuned assembly for graphics/audio
Hardware gap	A 7 MHz 68000 with 512 KB Chip RAM had zero margin for inefficient code — games and demos needed every CPU cycle
OS overhead	The AmigaOS graphics.library added measurable overhead (layer locking, clipping rectangle checks). Games bypassed it entirely and wrote directly to `$DFFxxx` registers
Demoscene culture	Assembly was the "real" language of the demoscene. Using a compiler was considered lazy — the code itself was the art form
Size constraints	Bootblocks (1024 bytes), 4K intros, and single-disk demos imposed hard size limits. Assembly gave precise control over every byte
Custom chip intimacy	Copper lists, blitter queues, and audio DMA are fundamentally low-level. High-level languages abstracted away the very features that made Amiga programming distinctive

The transition: By 1992–1994, faster CPUs (68020+), more RAM, and mature compilers (SAS/C 6.x, GCC 2.x) made C viable for commercial software. The demoscene nevertheless stayed with assembly into the late 1990s, and AGA productions on 68060 accelerators continue to use hand-written assembly today.

Modern Analogies

Hand-Written Asm Concept	Modern Analogy	Where It Holds / Breaks
Cycle-counted raster effects	GPU fragment shader dispatch	Holds: per-pixel/per-scanline execution; breaks: asm is imperative timing, shaders are data-parallel
Custom blitter queue	GPU command buffer / DMA transfer list	Holds: structured descriptor-based hardware offload; breaks: blitter is in-order, GPUs reorder
Hardware register banging	MMIO device drivers in embedded systems	Holds: same concept — memory-mapped I/O; breaks: Amiga registers are video/audio, not peripherals
Self-modifying code	JIT compilation (V8, LuaJIT, WASM)	Holds: code generation at runtime; breaks: SMC patches existing code, JIT generates new code
Copper list	G-sync / FreeSync adaptive refresh + shader constants per scanline	Holds: timing-sensitive display updates; breaks: copper is a programmable coprocessor, not a protocol
Stack-based state machine	Coroutine dispatch / async/await	Holds: non-linear control flow; breaks: stack manipulation vs language-level async
Position-independent code	ASLR + PIE executables	Holds: same goal (run anywhere); breaks: asm PIC is manual, modern PIC is linker/loader assisted

FAQ

Q1: How do I know if a function is an interrupt handler vs a regular subroutine?

Q2: What's the best way to detect self-modifying code?

Q3: How do I handle code that mixes data and instructions?

Q4: How do I tell code from data in a mixed section?

Q5: How do I handle encrypted or obfuscated code?

Q6: How do I deal with copper-synced code?

Q7: What about self-relocating code?

Q8: How do I identify custom chip register usage patterns?

Q9: Why do I see `MOVE.W D0, $DFF000` — absolute short addressing to custom registers?

Q10: How do I trace blitter operations without hardware?

Q11: What's the difference between a software interrupt and a hardware interrupt in the code?

Q12: How do I identify which demo group or author wrote this?

Q13: How do I reverse engineer an audio driver / module replayer?

Q14: What do I do when IDA creates 500 phantom functions from copper data?

FPGA / Emulation Impact

References

m68k_codegen_patterns.md — Compiler codegen fingerprint catalog
compiler_fingerprints.md — Compiler identification at a glance
string_xref_analysis.md — String cross-reference methodology
hunk_reconstruction.md — HUNK binary reconstruction
struct_recovery.md — Struct layout reconstruction
api_call_identification.md — Library call recognition
exe_crunchers.md — Decruncher identification and unpacking
code_vs_data_disambiguation.md — distinguishing code bytes from data/variables
copper_programming.md — Copper list format and programming
blitter_programming.md — Blitter operation reference
paula_audio.md — Audio hardware register reference
custom_registers.md — Complete custom chip register map
M68000 Family Programmer's Reference Manual — Instruction set and timing
Amiga Hardware Reference Manual — Custom chip register map and DMA cycles
Amiga Disk Drives Inside & Out (Abt Electronics) — Trackloader and MFM encoding reference
