docs(amiga): add Tier 4 content — AHI, cross-compilation, RTG, demoscene section

- New: 11_libraries/ahi_programming.md — AHI retargetable audio API
- New: 13_toolchain/cross_compilation_guide.md — cross-compiling for Amiga
- New: 08_graphics/rtg_programming.md — RTG Picasso96/CyberGraphX programming
- New: 17_demoscene/ — full demoscene techniques section:
  - copper_effects.md (6 techniques, 10 Pouet screenshots, antipatterns)
  - sprite_techniques.md (5 techniques, antipatterns)
  - pixel_tricks.md (5 techniques, antipatterns)
  - 3d_rendering.md (fixed-point math, 4 techniques, antipatterns)
  - timing_optimization.md (7 techniques, instruction timing tables)
  - README.md (section index with Mermaid diagrams)
  - images/ (10 authentic Amiga screenshots from Pouet.net)
- New: 05_reversing/games/ (4 copper-analysis screenshots)
- Updated: README index, TODO status (30/30 complete)
- Added external references: Pouet/Demozoo links, Scoopex YouTube tutorial series, Amiga Graphics Archive, coppershade.org
2026-07-28 04:36:28 +00:00 · 2026-05-13 17:49:28 -04:00 · 2026-05-13 17:49:28 -04:00 · f8f8d1c834
commit f8f8d1c834
parent 7e327e6640
31 changed files with 5433 additions and 15 deletions
--- a/05_reversing/games/Agony_Level1_Horizon.png
+++ b/05_reversing/games/Agony_Level1_Horizon.png
--- a/05_reversing/games/Agony_Level2_Horizon.png
+++ b/05_reversing/games/Agony_Level2_Horizon.png
--- a/05_reversing/games/BioChallenge.png
+++ b/05_reversing/games/BioChallenge.png
--- a/05_reversing/games/Starray.png
+++ b/05_reversing/games/Starray.png
--- a/08_graphics/README.md
+++ b/08_graphics/README.md
@ -22,4 +22,4 @@ The Amiga graphics system is built on custom DMA-driven hardware (Agnus/Alice +
 | [text_fonts.md](text_fonts.md) | TextFont bitmap layout, baseline rendering, algorithmic styles, AvailFonts, ColorTextFont, Compugraphic outlines, font scaling, 3 cookbooks, 4 antipatterns, 6 FAQ |
 | [pixel_conversion.md](pixel_conversion.md) | Chunky ↔ Planar conversion deep dive: naive, merge/butterfly (Kalms), Copper Chunky, Akiko hardware, Blitter-assisted, RTG bypass, SoA/AoS theory, GPU swizzle modern parallels |
 | [animation.md](animation.md) | GEL system deep dive: BOBs, VSprites, AnimObs, hardware foundation (Blitter/Copper/Sprite interaction), collision detection, double buffering, performance tuning |
-| [rtg_programming.md](rtg_programming.md) | Retargetable Graphics (CyberGraphX/Picasso96): Planar vs Chunky, LockBitMapTags, Pixel Formats, Direct VRAM rendering |
+| [rtg_programming.md](rtg_programming.md) | Retargetable Graphics (CyberGraphX/Picasso96): LockBitMapTags VRAM access, pixel formats, double buffering, bus bandwidth, 16-card catalog, 5 antipatterns, dirty-rect optimization, modern analogies, 6 FAQ |
--- a/08_graphics/blitter_programming.md
+++ b/08_graphics/blitter_programming.md
@ -978,3 +978,4 @@ The Blitter is one of the most complex subsystems to reproduce accurately in an
 - See also: [rastport.md](rastport.md) — RastPort drawing context (uses Blitter for all draw operations)
 - See also: [display_modes.md](display_modes.md) — DMA slot budget (Blitter competes for bus bandwidth)
 - See also: [Akiko — CD32 C2P](../01_hardware/aga_a1200_a4000/akiko_cd32.md) — hardware Chunky-to-Planar conversion (CD32 alternative to CPU/Blitter C2P)
 - **Scoopex Amiga Hardware Programming** (Photon) — [YouTube: Blitter episodes](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Video walkthroughs of Blitter setup, cookie-cut masking, line draw, and fill mode. Companion articles: [coppershade.org](http://coppershade.org/articles/)
--- a/08_graphics/copper_programming.md
+++ b/08_graphics/copper_programming.md
@ -318,3 +318,4 @@ Reposition sprites mid-frame to display more than 8 sprites:
 - [copper.md](../01_hardware/ocs_a500/copper.md) — register-level reference
 - [copper.md](copper.md) — graphics.library UCopList API
 - [Video Signal & Timing](../01_hardware/common/video_timing.md) — beam counters, scanline anatomy, clock tree
 - **Scoopex Amiga Hardware Programming** (Photon) — [YouTube: Copper tutorials](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Video walkthroughs of copper list construction, copper bars, and raster effects. Companion articles: [coppershade.org](http://coppershade.org/articles/)
--- a/08_graphics/rtg_programming.md
+++ b/08_graphics/rtg_programming.md
@ -437,3 +437,209 @@ Some RTG cards cannot coexist with native chipset output:
 | **Radeon (PCI)** | Yes | No native output | Pure RTG; no Amiga video |
 If your application relies on dragging screens between native and RTG displays, test on hardware that supports it — or restrict usage to Picasso96/CyberGraphX cards with known switcher support.
 ---
 ## Named Antipatterns
 ### "The Infinite Lock" — Holding VRAM Lock Across Wait
 ```c
 /* BAD: Holding the bitmap lock while waiting for the next frame.
   The RTG driver cannot perform screen depth arrangement or
   VRAM defragmentation while the lock is held. Some drivers
   degrade multitasking severely or deadlock entirely. */
 lock = LockBitMapTags(bm, LBMI_BASEADDRESS, &vram, TAG_DONE);
 RenderFrame(vram);
 Wait(1L << win->UserPort->mp_SigBit);  /* DEADLOCK RISK */
 UnLockBitMap(lock);
 ```
 ```c
 /* CORRECT: Render, unlock, then wait */
 lock = LockBitMapTags(bm, LBMI_BASEADDRESS, &vram, TAG_DONE);
 RenderFrame(vram);
 UnLockBitMap(lock);              /* unlock FIRST */
 Wait(1L << win->UserPort->mp_SigBit);  /* safe to wait */
 ```
 ### "The Pitch Assumption" — Hardcoding Row Stride
 ```c
 /* BAD: Assuming pitch equals width * bytes_per_pixel.
   Graphics cards align scanlines to 16/32/64 byte boundaries.
   This produces diagonal tearing or garbage rendering. */
 UBYTE *row = vram + (y * screen->Width * 4);  /* WRONG */
 ```
 ```c
 /* CORRECT: Always use the pitch returned by LockBitMapTags */
 UBYTE *row = vram + (y * pitch);  /* pitch from LBMI_BYTESPERROW */
 ```
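To make the pitch rule concrete, here is a minimal portable sketch of a rectangle fill that steps from row to row by the pitch rather than by `width * 4`. It uses `<stdint.h>` types instead of the Amiga `UBYTE`/`ULONG` so it compiles anywhere. `fill_rect32` and its parameters are hypothetical names, and `pitch` stands in for the value a driver reports through `LBMI_BYTESPERROW`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a w x h rectangle of 32-bit pixels whose top-left corner is (x0, y0).
   base  : start of the locked bitmap (stand-in for the vram pointer)
   pitch : bytes per scanline as reported by the driver, NOT width * 4 */
static void fill_rect32(uint8_t *base, size_t pitch,
                        int x0, int y0, int w, int h, uint32_t pixel)
{
    /* The rectangle must fit inside one scanline's worth of bytes */
    assert(pitch >= (size_t)(x0 + w) * 4);

    for (int y = y0; y < y0 + h; y++) {
        /* Row start is derived from the pitch, so padding bytes are skipped */
        uint32_t *row = (uint32_t *)(base + (size_t)y * pitch);
        for (int x = x0; x < x0 + w; x++)
            row[x] = pixel;
    }
}
```

With a 5-pixel-wide surface and a 32-byte pitch, each row carries 12 bytes of padding; the fill leaves them untouched, which is exactly what the `width * 4` version gets wrong.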
 ### "The Zorro II Crawl" — Full-Screen 32-bit Over Slow Bus
 ```c
 /* BAD: Pushing a full 800×600 32-bit frame over Zorro II every frame.
   Zorro II bandwidth: ~3.5 MB/s.
   Frame size: 800×600×4 = 1.92 MB.
   Result: ~1.8 FPS — unusable. */
 for (y = 0; y < 600; y++)
    for (x = 0; x < 800; x++)
        ((ULONG*)(vram + y * pitch))[x] = pixel;
 ```
 ```c
 /* CORRECT: Use dirty rectangles — only update what changed.
   Or use 8-bit indexed mode for high-res on Zorro II. */
 for (int i = 0; i < numDirtyRects; i++)
 {
    struct Rect *r = &dirtyRects[i];
    for (y = r->y1; y < r->y2; y++)
        for (x = r->x1; x < r->x2; x++)
            ((ULONG*)(vram + y * pitch))[x] = pixel;
 }
 ```
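The arithmetic behind the frame-rate estimate can be sketched as two small helpers. This is an illustration only: the function names are hypothetical, and the bus figure is taken as 3,500,000 bytes per second (the ~3.5 MB/s quoted above, read as decimal megabytes).

```c
#include <stdint.h>

/* Bytes touched by a w x h region at `bpp` bytes per pixel */
static uint32_t rect_bytes(uint32_t w, uint32_t h, uint32_t bpp)
{
    return w * h * bpp;
}

/* Updates per second, scaled by 100 to stay in integer math, when
   `bytes_per_update` must cross a bus sustaining `bus_bytes_per_sec` */
static uint32_t updates_per_sec_x100(uint32_t bus_bytes_per_sec,
                                     uint32_t bytes_per_update)
{
    if (bytes_per_update == 0)
        return 0;
    return (uint32_t)(((uint64_t)bus_bytes_per_sec * 100u) / bytes_per_update);
}
```

Under these assumptions a full 800×600 32-bit frame gives 182 (about 1.82 FPS), while a single 64×64 dirty rectangle gives 21362 (about 213 updates per second), which is why dirty rectangles matter so much on this bus.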
 ### "The Endianness Surprise" — Assuming Native Byte Order
 ```c
 /* BAD: Assuming 32-bit pixels are always in ARGB byte order.
    Many PCI graphics cards (via Mediator) expose little-endian
    BGRA format, where the bytes in memory are B, G, R, A. The 68k
    stores 0x00FF0000 (red in ARGB) as the bytes 00 FF 00 00,
    which a BGRA card interprets as green, not red. */
 ULONG red = 0x00FF0000;  /* not red on a BGRA card! */
 ```
 ```c
 /* CORRECT: Query the pixel format and adapt */
 ULONG red;
 switch (format) {
    case PIXFMT_ARGB32: red = 0x00FF0000; break;  /* bytes A,R,G,B */
    case PIXFMT_BGRA32: red = 0x0000FF00; break;  /* bytes B,G,R,A */
    case PIXFMT_RGB24:  red = 0; /* 3 bytes per pixel: write R,G,B bytes individually */ break;
    default: red = 0; break;
 }
 ```
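One way to sidestep constant-juggling is to write pixels one byte at a time, so the result depends only on the card's byte order and not on the CPU. The sketch below is portable C with hypothetical names (`ByteOrder`, `store_pixel`, `as_big_endian_long`); the enum is a stand-in for the `PIXFMT_*` values, which name the order of bytes in memory.

```c
#include <stdint.h>

/* Byte order in memory; hypothetical stand-ins for PIXFMT_ARGB32 / PIXFMT_BGRA32 */
enum ByteOrder { ORDER_ARGB, ORDER_BGRA };

/* Store one pixel byte by byte, independent of CPU endianness */
static void store_pixel(uint8_t *dst, enum ByteOrder order,
                        uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    if (order == ORDER_ARGB) {
        dst[0] = a; dst[1] = r; dst[2] = g; dst[3] = b;
    } else {
        dst[0] = b; dst[1] = g; dst[2] = r; dst[3] = a;
    }
}

/* The 32-bit constant a big-endian CPU (such as the 68k) would have to
   write in one store to produce the same four bytes */
static uint32_t as_big_endian_long(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

Storing opaque-less red (`a = 0, r = 0xFF`) yields the long `0x00FF0000` for ARGB and `0x0000FF00` for BGRA, which is a quick way to derive the right constant for any format.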
 ### "The Silent AllocBitMap Fail" — VRAM Exhaustion
 ```c
 /* BAD: Not checking if AllocBitMap actually allocated in VRAM.
   When VRAM is exhausted, AllocBitMap silently falls back to
   Fast RAM. Your bitmap is now in system RAM — every pixel write
   crosses the Zorro bus TWICE (CPU write + DMA blit to card). */
 struct BitMap *bm = AllocBitMap(800, 600, 32, BMF_DISPLAYABLE, NULL);
 /* No check — might be in Fast RAM! */
 ```
 ```c
 /* CORRECT: Verify the bitmap is in VRAM */
 struct BitMap *bm = AllocBitMap(800, 600, 32, BMF_DISPLAYABLE, NULL);
 if (!bm) { /* total failure */ }
 /* Check if it's actually on the card (Picasso96API.library) */
 ULONG onBoard = p96GetBitMapAttr(bm, P96BMA_ISONBOARD);
 if (!onBoard)
 {
    /* Fallback to system RAM — warn user or reduce resolution */
    Printf("Warning: no VRAM — falling back to system RAM\n");
 }
 ```
 ---
 ## Historical Context & Modern Analogies
 ### RTG Evolution Timeline
 ```mermaid
 timeline
    title Amiga RTG Evolution
    1985 : OCS — Native chipset only\nMax 640×256, 32 colors
    1990 : ECS — SuperHires, Productivity\nStill planar, still chip RAM
    1992 : AGA — 256 colors, HAM8\nPlanar bottleneck remains
    1994 : Picasso II — First P96 card\nCirrus GD5426, Zorro II, 2 MB
    1995 : CyberVision 64 — S3 Trio64\nZorro III, hardware blitter
    1996 : Picasso IV — Cirrus GD5446\n4 MB VRAM, built-in flicker fixer
    1997 : Mediator PCI — PC cards on Amiga\nVoodoo 3, Radeon support
    1998 : CyberVision PPC — Permedia 2\nCPU local bus, 3D acceleration
    2000 : Warp3D — 3D API for Amiga\nUses RTG cards for 3D rendering
    2005 : Picasso96 maintained by Individual Computers\nModern P96 drivers for Radeon
    2010 : PiStorm + Radeon — ARM-accelerated\nFull 32-bit RTG on budget hardware
    2020 : Vampire + SAGA — FPGA RTG\nBuilt-in RTG core, DVI output
 ```
 ### Modern Analogies
 | Amiga RTG Concept | Modern Equivalent | Notes |
 |-------------------|-------------------|-------|
 | `LockBitMapTags()` | `SDL_LockSurface()` / `ID3D11Texture2D::Map()` | Direct VRAM access |
 | `UnLockBitMap()` | `SDL_UnlockSurface()` / `Unmap()` | Release VRAM lock |
 | `LBMI_BYTESPERROW` (pitch) | `SDL_Surface.pitch` / `D3D11_MAPPED_SUBRESOURCE.RowPitch` | Scanline stride |
 | `PIXFMT_ARGB32` | `DXGI_FORMAT_B8G8R8A8_UNORM` | Pixel format negotiation |
 | `AllocBitMap(BMF_DISPLAYABLE)` | `SDL_CreateTexture(streaming)` | GPU-accessible allocation |
 | `ChangeScreenBuffer()` (flip) | `SDL_RenderPresent()` / `IDXGISwapChain::Present()` | Double-buffer swap |
 | `BltBitMap()` in VRAM | GPU blit / `CopyResource()` | On-card data movement |
 | Zorro II bandwidth (3.5 MB/s) | PCIe x1 2.0 (500 MB/s) | Bus bottleneck |
 | `BMF_HARDWARE` flag | `D3D11_USAGE_DEFAULT` | VRAM-resident |
 | Fast RAM fallback | System RAM staging texture | Slow fallback path |
 | Picasso96 vs CyberGraphX | OpenGL vs DirectX | Competing API standards |
 ---
 ## Use Cases
 | Application | RTG Mode | Resolution | Notable Pattern |
 |-------------|----------|------------|-----------------|
 | **Workbench 3.5+** | 8-bit indexed or 16-bit hi-color | 800×600+ | Desktop on RTG card instead of custom chipset |
 | **Web browsers (IBrowse, AWeb)** | 24/32-bit true color | 1024×768 | Full-color image rendering requires chunky pixels |
 | **Image editors (Photogenics)** | 32-bit ARGB | 800×600+ | Per-pixel alpha blending in VRAM |
 | **Video players (AmigaAMP visuals)** | 16-bit RGB565 | 640×480 | Frame-by-frame VRAM blit with dirty rects |
 | **3D games (Wipeout 2097, Heretic II)** | 16/32-bit + Warp3D | 640×480 | GPU 3D rendering on Voodoo/Radeon |
 | **DTP (PageStream)** | 8-bit indexed | 1024×768 | High-res workspace with color preview |
 | **Terminal emulators** | 8-bit indexed | 1024×768 | Fast text blitting with hardware acceleration |
 | **Screen capture / recording** | 32-bit ARGB | Any | LockBitMapTags to read back VRAM |
 ---
 ## FAQ
 **Q: Should I target Picasso96 or CyberGraphX?**
 A: Target both — use `LockBitMapTags()` which is API-compatible between P96 and CGX. For mode enumeration and pixel format queries, detect which library is available at runtime and call the appropriate function. Most modern Amiga systems use P96.
 **Q: Can I use RTG and native chipset screens at the same time?**
 A: Yes — if your RTG card supports it (Picasso II/IV, CyberVision 64). The two outputs go to separate monitors, or the card's monitor switcher toggles between them. Some cards (CyberVision PPC, BlizzardVision) cannot display native chipset output at all.
 **Q: What's the minimum VRAM for a usable RTG display?**
 A: 2 MB is the practical minimum. An 800×600×16-bit display uses ~960 KB, leaving room for double-buffering and OS structures. With 1 MB, you're limited to 640×480×16-bit or 800×600×8-bit.
 **Q: Can I use LockBitMapTags on a native chipset bitmap?**
 A: No — `LockBitMapTags()` only works on RTG (chunky) bitmaps. For native chipset bitmaps, use standard `graphics.library` functions (`WritePixel()`, `BltBitMap()`) or direct planar manipulation. See [pixel_conversion.md](pixel_conversion.md) for planar↔chunky bridging.
 **Q: How do I detect if an RTG card is present?**
 A: Query the display database with `NextDisplayInfo()` and check for modes with the `DIPF_IS_RTG` flag. If any RTG modes exist, an RTG card is installed.
 **Q: Why does my RTG screen flicker when I drag it?**
 A: Some RTG cards don't support screen dragging (notably CPU-slot cards like CyberVision PPC). The Picasso IV and CyberVision 64 support it via hardware monitor switching. If dragging is not supported, open the screen with `SA_Draggable, FALSE` so it cannot be dragged; `SA_Quiet, TRUE` only stops Intuition from rendering the title bar.
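The VRAM figures in the minimum-VRAM answer above follow from simple arithmetic, sketched here with hypothetical helper names. The sketch assumes tightly packed scanlines and ignores driver overhead, so real cards will need somewhat more than it reports.

```c
#include <stdint.h>

/* Bytes needed for one w x h framebuffer at `bits` bits per pixel,
   assuming no scanline padding */
static uint32_t framebuffer_bytes(uint32_t w, uint32_t h, uint32_t bits)
{
    return w * h * (bits / 8);
}

/* Nonzero when `buffers` framebuffers of the given mode fit in `vram_bytes` */
static int fits_in_vram(uint32_t vram_bytes, uint32_t w, uint32_t h,
                        uint32_t bits, uint32_t buffers)
{
    return (uint64_t)framebuffer_bytes(w, h, bits) * buffers <= vram_bytes;
}
```

For example, two 800×600 16-bit buffers need 1,920,000 bytes, which fits in 2 MB (2,097,152 bytes) but not in 1 MB.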
 ---
 ## References
 ### SDK & Headers
 - `picasso96api.h` — Picasso96 API functions and constants
 - `cybergraphics.h` — CyberGraphX API functions and constants
 - `graphics/gfxbase.h` — GfxBase extensions for RTG support
 ### Related Knowledge Base Articles
 - [Pixel Conversion](pixel_conversion.md) — chunky↔planar conversion algorithms
 - [Display Modes](display_modes.md) — ModeID selection, display database
 - [Bitmap](bitmap.md) — BitMap structure, allocation, interleaved layout
 - [Views](views.md) — ViewPort/View display pipeline
 - [RTG Driver Development](../16_driver_development/rtg_driver.md) — writing RTG card drivers
 - [Screens](../09_intuition/screens.md) — screen opening with RTG DisplayIDs
 - [Bus Architecture](../01_hardware/common/bus_architecture.md) — Zorro II/III bandwidth limits
--- a/08_graphics/sprites.md
+++ b/08_graphics/sprites.md
@ -569,6 +569,7 @@ These must be set **every frame** — typically by the Copper list during vertic
 - NDK39: `graphics/sprite.h`, `hardware/custom.h`
 - ADCD 2.1: `GetSprite`, `MoveSprite`, `ChangeSprite`, `FreeSprite`, `AllocSpriteData`, `GetExtSprite`
 - Phaze101 — *Amiga 68000 Assembly Course* Chapter 7 (sprite programming tutorial)
 - **Scoopex Amiga Hardware Programming** (Photon) — [YouTube: Sprite episodes](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Video walkthroughs of sprite setup, multiplexing, and attached sprites. Companion site: [coppershade.org](http://coppershade.org/)
 - ChibiAkumas — *68000 Assembly Lessons* (sprite and blitter tutorials)
 - Codetapper — Game reverse engineering analyses (Jim Power, Lionheart)
--- a/11_libraries/README.md
+++ b/11_libraries/README.md
@ -23,3 +23,4 @@ Shared libraries beyond the core exec/dos/graphics/intuition subsystems. These p
 | [datatypes.md](datatypes.md) | DataTypes system: object-oriented file loading for images, sound, text, animation via BOOPSI classes |
 | [amigaguide.md](amigaguide.md) | AmigaGuide hypertext help system: database format, @commands, API, ARexx integration, cross-database linking |
 | [translator.md](translator.md) | translator.library: English-to-phonetic translation for speech synthesis, narrator.device integration, ARPABET phonemes |
 | [ahi_programming.md](ahi_programming.md) | AHI retargetable audio: 16-bit multi-channel playback, software mixing, sample loading, streaming hooks, SFX cookbooks, 4 antipatterns, FPGA/MiSTer impact, modern analogies, 7 FAQ |
--- a/11_libraries/ahi_programming.md
+++ b/11_libraries/ahi_programming.md
@ -0,0 +1,737 @@
 [← Home](../README.md) · [Libraries](README.md)
 # AHI — Audio Hardware Interface Programming
 ## What Is AHI?
 AHI (Audio Hardware Interface) is AmigaOS's retargetable audio system. Introduced in 1996 by Martin Blom, it solves a fundamental problem: Paula has only 4 hardware audio channels at 8-bit resolution, locked to Chip RAM. AHI provides a **hardware-agnostic API** for 16-bit (and higher) multi-channel audio with software mixing, supporting Paula, sound cards (Delfina, Prisma Megamix, Toccata), FPGA audio cores, and USB audio devices through a uniform interface.
 AHI is to audio what RTG (Picasso96/CyberGraphX) is to graphics — a retargetable abstraction layer that decouples applications from hardware specifics.
 ```mermaid
 graph TB
    subgraph "Application Layer"
        APP["Application"]
        MUI["MUI Sound Class"]
        DT["Datatypes Sound"]
    end
    subgraph "AHI System"
        AHIDEV["ahi.device<br/>OpenDevice / CMD_WRITE"]
        MIXER["AHI Software Mixer<br/>CallHookPkt MixerFunc"]
        CTRL["AHIAudioCtrl<br/>Channel management"]
    end
    subgraph "AHI Drivers"
        PAULA["paula.audio<br/>4× 8-bit DMA"]
        HIFI["paula.audio (HiFi)<br/>14-bit calibrated"]
        SOUNDCARD["Delfina / Prisma<br/>16-bit hardware"]
        FPGA["FPGA Audio Core<br/>TOSLink / I2S"]
        USB["USB Audio Class<br/>via Poseidon"]
    end
    APP --> AHIDEV
    MUI --> AHIDEV
    DT --> AHIDEV
    AHIDEV --> CTRL
    CTRL --> MIXER
    MIXER --> PAULA
    MIXER --> HIFI
    MIXER --> SOUNDCARD
    MIXER --> FPGA
    MIXER --> USB
    style AHIDEV fill:#e8f4fd,stroke:#2196f3,color:#333
    style MIXER fill:#fff3e0,stroke:#ff9800,color:#333
 ```
 ### Why AHI Matters
 | Problem | Without AHI | With AHI |
 |---------|------------|----------|
 | > 4 channels | Manual software mixing — every app reinvents it | Built-in N-channel mixer |
 | 16-bit audio | Paula can't do it natively | 16-bit playback on any hardware |
 | Hardware independence | Apps hard-coded to Paula registers | Apps use AHI API, driver handles hardware |
 | Recording | Only via Paula's limited ADC | Any recording-capable AHI device |
 | Multi-app audio | One app owns Paula — others get silence | AHI arbitrates between applications |
 ---
 ## Architecture
 ### Audio Pipeline
 ```mermaid
 sequenceDiagram
    participant APP as Application
    participant AHI as ahi.device
    participant CTRL as AudioCtrl
    participant MIX as Software Mixer
    participant DRV as AHI Driver
    participant HW as Audio Hardware
    APP->>AHI: OpenDevice("ahi.device", unit, ioreq, 0)
    APP->>AHI: AHI_AllocAudio(tags: channels, freq, etc.)
    AHI->>CTRL: Allocate channels, configure
    APP->>AHI: AHI_LoadSound(sampleID, type, sample info)
    AHI->>DRV: AHIsub_LoadSound (prepare for hardware)
    APP->>AHI: AHI_ControlAudio(AHIC_Play, TRUE)
    APP->>AHI: AHI_Play(freq, vol, pan)
    AHI->>CTRL: Schedule sound on channel
    loop Every mixing period
        AHI->>MIX: CallHookPkt(MixerFunc)
        MIX->>MIX: Mix all active channels
        MIX->>DRV: Feed PCM buffer
        DRV->>HW: DMA / I2S / USB transfer
    end
    APP->>AHI: AHI_ControlAudio(AHIC_Play, FALSE)
    APP->>AHI: AHI_FreeAudio()
    APP->>AHI: CloseDevice()
 ```
 ### Component Map
 | Component | Type | Description |
 |-----------|------|-------------|
 | `ahi.device` | Device | Exec device — opened with `OpenDevice()` |
 | `AHIAudioCtrl` | Structure | Controls channel allocation, mixing, format |
 | `AHI_Request` | IORequest | Standard exec I/O request for device commands |
 | Mixer Hook | Hook | AHI calls this to produce mixed PCM buffers |
 | Player Hook | Hook | AHI calls this to advance module position (for trackers) |
 | `*.audio` driver | Shared library | Hardware-specific driver in `DEVS:AHI/` |
 ---
 ## API Reference
 ### Device Opening and Control
 ```c
 #include <devices/ahi.h>
 #include <proto/ahi.h>
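 struct MsgPort *AHImp = NULL;
 struct AHIRequest *AHIio = NULL;
 struct AHIAudioCtrl *AudioCtrl = NULL;
 struct Library *AHIBase = NULL;  /* library base used by the AHI_xxx() calls */
 BYTE AHIDevice = -1;
 BOOL InitAHI(void)
 {
    AHImp = CreateMsgPort();
    if (!AHImp) return FALSE;
    AHIio = (struct AHIRequest *)CreateIORequest(
        AHImp, sizeof(struct AHIRequest));
    if (!AHIio) { DeleteMsgPort(AHImp); AHImp = NULL; return FALSE; }
    AHIio->ahir_Version = 4;  /* Request AHI v4+ */
    /* AHI_NO_UNIT: open for the function interface without claiming a unit */
    AHIDevice = OpenDevice(AHINAME, AHI_NO_UNIT, (struct IORequest *)AHIio, 0);
    if (AHIDevice)
    {
        /* AHI not available — fall back to audio.device */
        DeleteIORequest((struct IORequest *)AHIio);
        DeleteMsgPort(AHImp);
        AHIio = NULL;   /* so CleanupAHI() does not free them again */
        AHImp = NULL;
        return FALSE;
    }
    /* The device base doubles as the library base for AHI_AllocAudio() etc. */
    AHIBase = (struct Library *)AHIio->ahir_Std.io_Device;
    return TRUE;
 }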
 ```
 ### Audio Allocation — AHI_AllocAudio()
 Allocates and configures the audio session (mixing rate, channels, sound slots). The sample format is set per sound when it is loaded, and nothing is audible until `AHI_ControlAudio()` is called with `AHIC_Play, TRUE`:
 ```c
 /* Create audio control structure with desired parameters */
 AudioCtrl = AHI_AllocAudio(
    AHIA_AudioID,     AHI_DEFAULT_ID,
    AHIA_MixFreq,     44100,
    AHIA_Channels,    4,           /* 4 simultaneous sounds */
    AHIA_Sounds,      16,          /* up to 16 loaded samples */
    AHIA_PlayerFunc,  NULL,        /* no player hook (non-module) */
    TAG_DONE);
 if (!AudioCtrl)
 {
    /* Fallback: try fewer channels or lower sample rate */
    AudioCtrl = AHI_AllocAudio(
        AHIA_AudioID,    AHI_DEFAULT_ID,
        AHIA_MixFreq,    22050,
        AHIA_Channels,   2,
        AHIA_Sounds,     8,
        TAG_DONE);
 }
 /* Start the mixer; AHI_Play() is silent until this succeeds */
 if (AudioCtrl &&
     AHI_ControlAudio(AudioCtrl, AHIC_Play, TRUE, TAG_DONE) != AHIE_OK)
 {
    AHI_FreeAudio(AudioCtrl);
    AudioCtrl = NULL;
 }
 ```
 ### Sample Types
 | Constant | Value | Description |
 |----------|-------|-------------|
 | `AHIST_M8S` | 0 | 8-bit mono signed |
 | `AHIST_M16S` | 1 | 16-bit mono signed (most common) |
 | `AHIST_S8S` | 2 | 8-bit stereo signed |
 | `AHIST_S16S` | 3 | 16-bit stereo signed |
 | `AHIST_M32S` | 8 | 32-bit mono signed |
 | `AHIST_S32S` | 10 | 32-bit stereo signed |
 | `AHIST_M16S_SYS` | — | 16-bit mono, system byte order |
 | `AHIST_S16S_SYS` | — | 16-bit stereo, system byte order |
 ### Loading and Playing Sounds
 ```c
 /* Define sample IDs */
 enum {
    SND_JUMP = 0,
    SND_COIN,
    SND_EXPLODE,
    SND_MUSIC,
    SOUND_COUNT
 };
 /* Load a sample from memory */
 APTR sampleData = LoadIFFSample("SYS:Sounds/jump.8svx");
 ULONG sampleLength = GetSampleLength();   /* length in sample frames */
 struct AHISampleInfo jumpInfo = {
    AHIST_M16S,           /* ahisi_Type: sample format */
    sampleData,           /* ahisi_Address: pointer to PCM data */
    sampleLength          /* ahisi_Length: sample FRAMES (not bytes!) */
 };
 AHI_LoadSound(SND_JUMP, AHIST_SAMPLE, &jumpInfo, AudioCtrl);
 /* Play the sound on channel 0 */
 AHI_Play(AudioCtrl,
    AHIP_BeginChannel,  0,
    AHIP_Freq,          44100,      /* playback frequency */
    AHIP_Vol,           0x10000,    /* full volume (fixed-point 16.16) */
    AHIP_Pan,           0x8000,     /* center (0=left, 0x10000=right) */
    AHIP_Sound,         SND_JUMP,
    AHIP_EndChannel,    0,
    TAG_DONE);
 /* Stop sound on channel 0 */
 AHI_Play(AudioCtrl,
    AHIP_BeginChannel,  0,
    AHIP_Sound,         AHI_NOSOUND,
    AHIP_EndChannel,    0,
    TAG_DONE);
 ```
 ### Volume and Panning
 ```c
 /* Volume: fixed-point 16.16 format */
 /* 0x00000 = silence, 0x10000 = full volume, 0x20000 = +6dB boost */
 AHI_SetVol(0, 0x10000, 0x8000, AudioCtrl, AHISF_IMM);
 /* Pan: fixed-point 16.16 format */
 /* 0x00000 = hard left, 0x8000 = center, 0x10000 = hard right */
 AHI_SetVol(0, 0x8000, 0x00000, AudioCtrl, AHISF_IMM);  /* half volume, hard left */
 ```
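Because raw 16.16 constants are easy to mistype, a pair of conversion helpers can make call sites more readable. The sketch below is portable C with hypothetical names (`volume_from_percent`, `pan_from_balance`); it assumes the conventions stated above, namely `0x10000` for full volume and a pan range of `0x00000` (hard left) to `0x10000` (hard right).

```c
#include <stdint.h>

#define FIXED_ONE 0x10000u   /* 1.0 in 16.16 fixed point */

/* Convert a percentage (0..100) to a 16.16 volume, clamped to full scale */
static uint32_t volume_from_percent(uint32_t percent)
{
    if (percent > 100)
        percent = 100;
    return (percent * FIXED_ONE) / 100;
}

/* Convert a balance from -100 (hard left) to +100 (hard right)
   into a 16.16 pan position */
static uint32_t pan_from_balance(int balance)
{
    if (balance < -100) balance = -100;
    if (balance >  100) balance =  100;
    return ((uint32_t)(balance + 100) * FIXED_ONE) / 200;
}
```

The results can be passed directly as the volume and pan arguments of `AHI_SetVol()`; clamping in the helper also keeps values inside the 0 to `0x10000` range.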
 ### Setting Frequency
 ```c
 /* Change playback frequency of a running sound */
 AHI_SetFreq(0, 22050, AudioCtrl, AHISF_IMM);
 /* Play a sample at double speed (pitch shift) */
 AHI_SetFreq(0, 88200, AudioCtrl, AHISF_IMM);
 ```
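Changing the playback frequency changes both pitch and duration. As a rough sketch of that relationship (hypothetical helper names, integer math only), the frequency for a given speed ratio and the resulting play time can be computed like this:

```c
#include <stdint.h>

/* Playback frequency for a sample recorded at `recorded_hz`,
   played at a speed of num/den (2/1 = double speed, 1/2 = half speed) */
static uint32_t freq_for_speed(uint32_t recorded_hz, uint32_t num, uint32_t den)
{
    if (den == 0)
        return recorded_hz;   /* treat an invalid ratio as "unchanged" */
    return (uint32_t)(((uint64_t)recorded_hz * num) / den);
}

/* Play time in milliseconds for `frames` sample frames at `playback_hz` */
static uint32_t duration_ms(uint32_t frames, uint32_t playback_hz)
{
    if (playback_hz == 0)
        return 0;
    return (uint32_t)(((uint64_t)frames * 1000u) / playback_hz);
}
```

A one-second 44100 Hz sample played at 88200 Hz lasts 500 ms, which matters whenever a buffer is freed or reused based on an estimated play time.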
 ### Cleanup
 ```c
 void CleanupAHI(void)
 {
    if (AudioCtrl)
    {
        /* Stop all sounds first */
        for (int ch = 0; ch < 4; ch++)
            AHI_Play(AudioCtrl,
                AHIP_BeginChannel, ch,
                AHIP_Sound, AHI_NOSOUND,
                AHIP_EndChannel, ch,
                TAG_DONE);
        AHI_FreeAudio(AudioCtrl);
        AudioCtrl = NULL;
    }
    if (!AHIDevice)
    {
        CloseDevice((struct IORequest *)AHIio);
        AHIDevice = -1;
    }
    if (AHIio)
    {
        DeleteIORequest((struct IORequest *)AHIio);
        AHIio = NULL;
    }
    if (AHImp)
    {
        DeleteMsgPort(AHImp);
        AHImp = NULL;
    }
 }
 ```
 ---
 ## Audio Mode Selection
 ### Decision Guide
 ```mermaid
 graph TD
    START["Need audio playback?"] --> Q1{"Need > 4 channels\\nor 16-bit?"}
    Q1 -->|No| Q2{"Target is stock\\nA500/A1200?"}
    Q1 -->|Yes| AHI["Use AHI\\n16-bit, N channels"]
    Q2 -->|Yes| PAULA["Use audio.device\\nor direct Paula"]
    Q2 -->|No| Q3{"Sound card\\navailable?"}
    Q3 -->|Yes| AHI
    Q3 -->|No| Q4{"Need recording?"}
    Q4 -->|Yes| AHI
    Q4 -->|No| PAULA
    style AHI fill:#e8f5e9,stroke:#4caf50,color:#333
    style PAULA fill:#fff3e0,stroke:#ff9800,color:#333
 ```
 ### Audio Mode Comparison
 | Mode | Channels | Bit Depth | Mixing Freq | Latency | Notes |
 |------|----------|-----------|-------------|---------|-------|
 | **audio.device** | 4 (hardware) | 8-bit signed | 28–35 kHz max | ~1 DMA cycle | Native Paula — zero CPU for playback |
 | **AHI paula.audio** | 4–16 (software) | 8/16-bit | 8000–48000 Hz | ~1–20 ms | Software mixed, output via Paula |
 | **AHI HiFi 14-bit** | 2 (stereo) | 14-bit | 44100 Hz | ~10–20 ms | Uses all 4 Paula channels for stereo |
 | **AHI Delfina** | 16–64 | 16/24-bit | 44100–96000 | ~5 ms | Hardware DSP mixing |
 | **AHI Prisma Megamix** | 8–16 | 16/24-bit | 44100–192000 | ~5 ms | FPGA-based, I2S output |
 | **AHI USB Audio** | 2–8 | 16/24-bit | 44100–96000 | ~10 ms | Via Poseidon USB stack |
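The 14-bit row relies on pairing two of Paula's 8-bit channels per side: one channel at volume 64 carries the upper 8 bits of each sample, and a second at volume 1 carries the next 6 bits. The sketch below shows only the sample split, in portable C with hypothetical names (`Paula14`, `split14`, `combine14`); it is uncalibrated, whereas the calibrated mode mentioned in the table would add per-machine correction on top.

```c
#include <stdint.h>

/* One 14-bit output sample spread over two 8-bit Paula channels */
struct Paula14 {
    int8_t  hi;   /* upper 8 bits, signed, played at volume 64 */
    uint8_t lo;   /* next 6 bits (0..63), played at volume 1   */
};

/* Split a signed 16-bit sample; the lowest 2 bits are discarded */
static struct Paula14 split14(int16_t sample)
{
    struct Paula14 out;
    out.hi = (int8_t)(sample >> 8);
    out.lo = (uint8_t)((sample & 0xFF) >> 2);
    return out;
}

/* Model of the summed analogue output in 14-bit units: hi*64 + lo*1 */
static int32_t combine14(struct Paula14 p)
{
    return (int32_t)p.hi * 64 + (int32_t)p.lo;
}
```

Recombining reproduces the original sample with its two lowest bits dropped, so 14 of the 16 bits survive. Since each stereo side needs two channels, all four Paula channels are consumed.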
 ---
 ## Practical Cookbook
 ### Cookbook 1: Simple Sound Effects Player
 ```c
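 #include <proto/ahi.h>
 #include <proto/exec.h>
 #include <proto/dos.h>
 #include <devices/ahi.h>
 struct AHIAudioCtrl *ctrl;
 APTR sounds[16];
 ULONG soundLengths[16];
 BOOL InitSoundSystem(void)
 {
    ctrl = AHI_AllocAudio(
        AHIA_AudioID,    AHI_DEFAULT_ID,
        AHIA_MixFreq,    44100,
        AHIA_Channels,   4,
        AHIA_Sounds,     16,
        TAG_DONE);
    if (!ctrl) return FALSE;
    /* Start the mixer so AHI_Play() becomes audible */
    return (AHI_ControlAudio(ctrl, AHIC_Play, TRUE, TAG_DONE) == AHIE_OK);
 }
 BOOL LoadSound(ULONG id, const char *filename)
 {
    struct AHISampleInfo info;
    /* Load raw 16-bit signed PCM from file */
    BPTR fh = Open(filename, MODE_OLDFILE);
    if (!fh) return FALSE;
    /* Seek() returns the PREVIOUS position: seek to the end first,
       then read the size back while rewinding */
    Seek(fh, 0, OFFSET_END);
    LONG size = Seek(fh, 0, OFFSET_BEGINNING);
    if (size <= 0) { Close(fh); return FALSE; }
    APTR data = AllocMem(size, MEMF_ANY);
    if (!data) { Close(fh); return FALSE; }
    LONG got = Read(fh, data, size);
    Close(fh);
    if (got != size) { FreeMem(data, size); return FALSE; }
    sounds[id] = data;
    soundLengths[id] = size;
    info.ahisi_Type    = AHIST_M16S;
    info.ahisi_Address = data;
    info.ahisi_Length  = size / 2;   /* sample frames: 2 bytes each */
    return (AHI_LoadSound(id, AHIST_SAMPLE, &info, ctrl) == AHIE_OK);
 }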
 void PlaySFX(ULONG id, int channel, ULONG freq, ULONG volume)
 {
    AHI_Play(ctrl,
        AHIP_BeginChannel, channel,
        AHIP_Freq,         freq,
        AHIP_Vol,          volume,
        AHIP_Pan,          0x8000,     /* center */
        AHIP_Sound,        id,
        AHIP_EndChannel,   channel,
        TAG_DONE);
 }
 void StopSFX(int channel)
 {
    AHI_Play(ctrl,
        AHIP_BeginChannel, channel,
        AHIP_Sound,        AHI_NOSOUND,
        AHIP_EndChannel,   channel,
        TAG_DONE);
 }
 void CleanupSoundSystem(void)
 {
    if (ctrl)
    {
        AHI_FreeAudio(ctrl);
        ctrl = NULL;
    }
    for (int i = 0; i < 16; i++)
    {
        if (sounds[i])
        {
            FreeMem(sounds[i], soundLengths[i]);
            sounds[i] = NULL;
        }
    }
 }
 ```
 ### Cookbook 2: Streaming Playback (Music / Module Player)
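 For continuous audio streaming (playing modules, MP3, WAV), register two buffers as dynamic samples and requeue them from a sound hook (`AHIA_SoundFunc`), which AHI calls each time a sound starts playing. `AHI_LoadSound()` must not be called from inside a hook, so both buffers are loaded once up front:
 ```c
 #include <proto/ahi.h>
 #include <proto/exec.h>
 #include <devices/ahi.h>
 #define BUFFER_SAMPLES 4096   /* stereo sample frames per buffer */
 #define NUM_BUFFERS    2
 struct AHIAudioCtrl *ctrl;
 struct Hook soundHook;
 APTR buffers[NUM_BUFFERS];
 int currentBuffer = 0;        /* buffer that has just started when the hook runs */
 /* Sound hook: called when a sound starts playing.
    Refill the idle buffer and queue it to follow the current one. */
 ULONG ASM SoundFunc(REG(a0, struct Hook *hook),
                     REG(a2, struct AHIAudioCtrl *actrl),
                     REG(a1, struct AHISoundMessage *msg))
 {
    int next = 1 - currentBuffer;  /* ping-pong */
    /* Decode next BUFFER_SAMPLES of audio into the idle buffer.
       This is where you'd call your MP3 decoder, MOD player, etc.
       Hooks run in interrupt context: keep this short, or signal
       a task to do the decoding instead. */
    DecodeNextChunk(buffers[next], BUFFER_SAMPLES);
    /* Queue it; without AHISF_IMM it starts when the current sound ends */
    AHI_SetSound(msg->ahism_Channel, next, 0, 0, actrl, 0);
    currentBuffer = next;
    return 0;
 }
 void StartStreaming(void)
 {
    struct AHISampleInfo info;
    /* Allocate double-buffer: 4 bytes per 16-bit stereo frame */
    for (int i = 0; i < NUM_BUFFERS; i++)
        buffers[i] = AllocMem(BUFFER_SAMPLES * 4, MEMF_PUBLIC | MEMF_CLEAR);
    if (!buffers[0] || !buffers[1]) return;
    /* Set up sound hook */
    soundHook.h_Entry = (HOOKFUNC)SoundFunc;
    soundHook.h_Data  = NULL;
    ctrl = AHI_AllocAudio(
        AHIA_AudioID,     AHI_DEFAULT_ID,
        AHIA_MixFreq,     44100,
        AHIA_Channels,    2,
        AHIA_Sounds,      NUM_BUFFERS,
        AHIA_SoundFunc,   &soundHook,
        TAG_DONE);
    if (!ctrl) return;
    /* Register both buffers once, as dynamic samples AHI reads in place */
    info.ahisi_Type   = AHIST_S16S;
    info.ahisi_Length = BUFFER_SAMPLES;
    for (int i = 0; i < NUM_BUFFERS; i++)
    {
        info.ahisi_Address = buffers[i];
        AHI_LoadSound(i, AHIST_DYNAMICSAMPLE, &info, ctrl);
    }
    /* Prime the first buffer, start the mixer, then start playback */
    DecodeNextChunk(buffers[0], BUFFER_SAMPLES);
    AHI_ControlAudio(ctrl, AHIC_Play, TRUE, TAG_DONE);
    AHI_Play(ctrl,
        AHIP_BeginChannel, 0,
        AHIP_Freq,         44100,
        AHIP_Vol,          0x10000,
        AHIP_Pan,          0x8000,
        AHIP_Sound,        0,
        AHIP_EndChannel,   0,
        TAG_DONE);
 }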
 void StopStreaming(void)
 {
    if (ctrl) AHI_FreeAudio(ctrl);
    for (int i = 0; i < NUM_BUFFERS; i++)
        if (buffers[i]) FreeMem(buffers[i], BUFFER_SAMPLES * 4);
 }
 ```
 ### Cookbook 3: Enumerating Available Audio Modes
 ```c
 /* List all available AHI audio modes */
 void ListAudioModes(void)
 {
    ULONG modeID = AHI_INVALID_ID;
    char name[64], author[64];
    ULONG bits, chans;
    while ((modeID = AHI_NextAudioID(modeID)) != AHI_INVALID_ID)
    {
        name[0] = author[0] = '\0';
        bits = chans = 0;
        /* Results are written through the pointers passed as tag data;
           AHIDB_BufferLen gives the size of the string buffers */
        AHI_GetAudioAttrs(modeID, NULL,
            AHIDB_BufferLen,   sizeof(name),
            AHIDB_Name,        name,
            AHIDB_Author,      author,
            AHIDB_Bits,        &bits,
            AHIDB_MaxChannels, &chans,
            TAG_DONE);
        Printf("Mode: %s  Author: %s  Bits: %ld  Channels: %ld\n",
            name[0] ? name : "(unknown)",
            author[0] ? author : "(unknown)",
            bits, chans);
    }
 }
 ```
 ---
 ## Named Antipatterns
 ### "The Memory Leak" — Not Freeing Sounds
 ```c
 /* BAD: Loading sounds without freeing them.
   Each AHI_LoadSound allocates driver resources.
   If you load the same sample repeatedly without
   AHI_UnloadSound, you leak driver memory. */
 void PlayNewSFX(APTR data, ULONG frames)
 {
    static int id = 0;
    struct AHISampleInfo info = { AHIST_M16S, data, frames };
    AHI_LoadSound(id, AHIST_SAMPLE, &info, ctrl);
    AHI_Play(ctrl, AHIP_BeginChannel, 0,
             AHIP_Freq, 44100, AHIP_Vol, 0x10000,
             AHIP_Sound, id, AHIP_EndChannel, 0, TAG_DONE);
    id++;  /* keeps allocating new sound IDs — never freed! */
 }
 ```
 ```c
 /* CORRECT: Unload before reloading, or reuse sound IDs */
 void PlaySFX(ULONG id, APTR data, ULONG frames)
 {
    struct AHISampleInfo info = { AHIST_M16S, data, frames };
    AHI_UnloadSound(id, ctrl);     /* free previous */
    AHI_LoadSound(id, AHIST_SAMPLE, &info, ctrl);
    AHI_Play(ctrl, AHIP_BeginChannel, 0,
             AHIP_Freq, 44100, AHIP_Vol, 0x10000,
             AHIP_Sound, id, AHIP_EndChannel, 0, TAG_DONE);
 }
 ```
 ### "The Stale Pointer" — Using Sound Data After Free
 ```c
 /* BAD: AHI does NOT copy your sample data — it uses
   your pointer directly. If you free the data while
   the sound is playing, you feed garbage to the DAC. */
 void PlayAndFree(APTR data, ULONG len)   /* len in bytes */
 {
    struct AHISampleInfo info = { AHIST_M16S, data, len / 2 };
    AHI_LoadSound(0, AHIST_SAMPLE, &info, ctrl);
    AHI_Play(ctrl, AHIP_BeginChannel, 0,
             AHIP_Freq, 44100, AHIP_Vol, 0x10000,
             AHIP_Sound, 0, AHIP_EndChannel, 0, TAG_DONE);
    FreeMem(data, len);  /* AHI still reading from data! */
 }
 ```
 ```c
 /* CORRECT: Keep data alive until sound finishes.
    Use a notification hook or delay, then free. */
 void PlayAndWait(APTR data, ULONG len)   /* len in bytes */
 {
    struct AHISampleInfo info = { AHIST_M16S, data, len / 2 };
    AHI_LoadSound(0, AHIST_SAMPLE, &info, ctrl);
    AHI_Play(ctrl, AHIP_BeginChannel, 0,
             AHIP_Freq, 44100, AHIP_Vol, 0x10000,
             AHIP_Sound, 0, AHIP_EndChannel, 0, TAG_DONE);
    /* Wait for sound to finish (approximate): 2 bytes per frame at 44100 Hz */
    ULONG ms = (len / 2) * 1000 / 44100;
    Delay(ms / 20 + 1);  /* Delay() counts 1/50 s ticks; round up */
    AHI_UnloadSound(0, ctrl);
    FreeMem(data, len);
 }
 ```
 ### "The Unchecked AllocAudio" — Assuming AHI Is Available
 ```c
 /* BAD: AHI may not be installed, or may not support
   the requested mode. NULL AudioCtrl = crash. */
 ctrl = AHI_AllocAudio(AHIA_MixFreq, 48000, TAG_DONE);
 AHI_Play(ctrl, ...);  /* crash if ctrl is NULL */
 ```
 ```c
 /* CORRECT: Always check, always fall back */
 ctrl = AHI_AllocAudio(AHIA_MixFreq, 48000, TAG_DONE);
 if (!ctrl)
    ctrl = AHI_AllocAudio(AHIA_MixFreq, 22050, TAG_DONE);
 if (!ctrl)
    ctrl = AHI_AllocAudio(AHIA_MixFreq, 11025, TAG_DONE);
 if (!ctrl)
 {
    /* Fall back to audio.device (Paula) */
    UsePaulaDirectly();
 }
 ```
 ### "The Volume Overflow" — Exceeding 0x10000
 ```c
 /* BAD: AHI volume is fixed-point 16.16.
   0x10000 = 100% volume. Values above this can
   cause clipping distortion or driver-dependent
   behavior (some drivers clamp, some wrap). */
 AHI_SetVol(0, 0x20000, 0x8000, ctrl, AHISF_IMM);  /* 200% — distorted! */
 ```
 ```c
 /* CORRECT: Stay within 0x00000 to 0x10000 */
 AHI_SetVol(0, 0x10000, 0x8000, ctrl, AHISF_IMM);  /* 100% — clean */
 ```
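 The clamping rule above can be wrapped in a small helper. This is a minimal sketch in portable C: `percent_to_ahi_volume()` and `AHI_VOL_FULL` are hypothetical names, not part of the AHI API, and `uint32_t` stands in for the Amiga `ULONG`/`Fixed` types.
 ```c
 #include <stdint.h>
 #define AHI_VOL_FULL 0x10000UL   /* 1.0 in 16.16 fixed point = 100% */
 /* Hypothetical helper: convert 0..100 percent into a 16.16 volume,
    clamping anything above 100% to full scale. */
 static uint32_t percent_to_ahi_volume(uint32_t percent)
 {
    if (percent > 100)
        percent = 100;
    return (uint32_t)((percent * AHI_VOL_FULL) / 100);
 }
 ```
 The result can be passed as the volume argument of `AHI_SetVol()`; clamping in one place keeps every caller inside the 0x00000 to 0x10000 range.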
 ---
 ## Historical Context & Modern Analogies
 ### Evolution of Amiga Audio
 ```mermaid
 timeline
    title Amiga Audio Evolution
    1985 : OS 1.0 — Paula 4× 8-bit DMA\\naudio.device wraps Paula
    1987 : Soundtracker — 4-channel MOD format\\nDirect Paula register access
    1990 : OS 2.0 — No audio changes\\nTrackers dominate
    1992 : OctaMED — 8-channel mixing\\nSoftware mixing to Paula
    1995 : Play16 — 14-bit Paula hack\\nPaired channels at volume 64 + 1
    1996 : AHI 1.0 — Retargetable audio\\npaula.audio driver, software mixing
    1998 : AHI 2.x — Recording support\\nDelfina DSP sound card driver
    2000 : AHI 3.x — USB audio\\nPoseidon stack integration
    2005 : AHI 4.x — FPGA audio\\nPrisma Megamix, TOSLink
    2015 : AHI 6.x — 32-bit floating point\\nModern ARM/FPGA platforms
 ```
 ### Competitive Landscape
 | Platform | Audio System | Max Channels | Bit Depth | Software Mixing | Year |
 |----------|-------------|-------------|-----------|-----------------|------|
 | **Amiga Paula** | DMA hardware | 4 (hardware) | 8-bit | No — fixed 4 channels | 1985 |
 | **Amiga AHI** | Retargetable API | 2–64 | 8/16/24/32-bit | Yes (configurable) | 1996 |
 | **PC Sound Blaster** | DSP commands | 1 (stereo from SB Pro) | 8/16-bit | No | 1989 |
 | **PC DirectSound** | Windows API | Unlimited | 16-bit | Yes | 1995 |
 | **Mac OS Sound Manager** | OS API | 16 | 16-bit | Yes | 1991 |
 | **Atari ST Yamaha** | PSG chip | 3 square waves | N/A | No — synth only | 1985 |
 ### Modern Analogies
 | Amiga AHI Concept | Modern Equivalent | Notes |
 |-------------------|-------------------|-------|
 | `AHI_AllocAudio()` | `new AudioContext()` (Web Audio) / `alcOpenDevice()` (OpenAL) | Audio session creation |
 | `AHI_LoadSound()` | `AudioBuffer` (Web Audio) / `alBufferData()` (OpenAL) | Upload PCM to audio system |
 | `AHI_Play()` | `AudioBufferSourceNode.start()` / `alSourcePlay()` | Trigger playback |
 | `AHI_SetVol()` | `GainNode.gain` (Web Audio) / `alSourcef(AL_GAIN)` | Volume control |
 | `AHI_SetFreq()` | `playbackRate` (Web Audio) / `AL_PITCH` (OpenAL) | Pitch/speed control |
 | Player Hook | `ScriptProcessorNode` / `AudioWorklet` (Web Audio) | Callback-driven streaming |
 | Mixer Hook | Custom audio graph (Web Audio) / mixing callback | Software mixing pipeline |
 | `paula.audio` driver | Built-in audio driver (any OS) | Default hardware driver |
 | AHI sound card drivers | ASIO / CoreAudio / ALSA drivers | Hardware-specific backends |
 | `AHIA_MixFreq` | `sampleRate` (Web Audio) | Master mixing frequency |
 | `AHISF_IMM` flag | `audioContext.currentTime` | Immediate vs scheduled |
 ---
 ## Use Cases
 | Application | AHI Mode | Channels | Notable Pattern |
 |-------------|----------|----------|-----------------|
 | **AmigaAmp** | HiFi 14-bit Stereo++ | 2 (stereo) | MP3 decoding via mpega.library, streaming playback |
 | **DeliTracker** | paula.audio or HiFi | 4–8 | MOD/S3M/XM multi-format module player |
 | **HippoPlayer** | paula.audio | 4 | Lightweight module player |
 | **Timidity** | AHI 16-bit | 2 | MIDI to WAV rendering via GUS patches |
 | **Games (post-1996)** | AHI default | 4–8 | SFX playback with software mixing |
 | **Video editors** | AHI default | 2 | Audio scrubbing with accurate sync |
 | **Speech synthesis** | AHI 8-bit | 1 | narrator.device output via AHI |
 | **Software instruments** | Prisma/FPGA | 16+ | Low-latency real-time synthesis |
 ---
 ## FPGA / MiSTer Impact
 AHI is highly relevant to FPGA-based Amiga implementations:
 | Platform | AHI Driver | Audio Path | Notes |
 |----------|-----------|------------|-------|
 | **Minimig** | paula.audio | Paula DMA (emulated) | FPGA implements Paula registers — AHI works transparently |
 | **MiSTer Amiga** | paula.audio | FPGA Paula | Same as Minimig — standard Paula emulation |
 | **MiSTer + Prisma Megamix** | prisma.audio | FPGA I2S output | True 16/24-bit audio via FPGA core |
 | **Vampire (Apollo Core)** | paula.audio + custom | FPGA audio DAC | Higher-quality DAC than original Paula |
 | **PiStorm** | paula.audio | Paula (via FPGA bus) | PiStorm uses real Paula chip on A1200/A600 |
 > [!NOTE]
 > AHI's `paula.audio` driver works on all FPGA Amiga implementations because they faithfully emulate Paula's DMA registers. No special FPGA driver is needed for basic AHI support.
 ---
 ## FAQ
 **Q: Does AHI replace audio.device?**
 A: No — AHI is a higher-level system. `audio.device` still exists for direct Paula access. AHI's `paula.audio` driver uses `audio.device` internally (or direct Paula registers) as its output path. Applications can use either API.
 **Q: What's the minimum AmigaOS version for AHI?**
 A: AHI requires AmigaOS 2.04+ (V37). It works on any Amiga with enough RAM. The system is a third-party addition, not part of the ROM — it installs into `DEVS:` and `LIBS:`.
 **Q: Can I use AHI and audio.device at the same time?**
 A: It depends on the driver. The `paula.audio` driver claims Paula's audio channels, which means `audio.device` can't use them simultaneously. Other drivers (Delfina, Prisma, USB) use separate hardware, so Paula remains free for `audio.device`.
 **Q: How do I get the best audio quality on a stock Amiga?**
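 A: Use AHI with the "HiFi 14-bit Stereo++" mode of `paula.audio`. This is not PWM: it pairs Paula's four channels, two per side. One channel of each pair plays the upper 8 bits of every sample at volume 64 while the other plays the next 6 bits at volume 1, and a calibration table corrects for the hardware's non-linearity, giving ~14-bit resolution in stereo. The downside: no channels left for system sounds, and CPU usage is higher.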
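 As background, the usual description of 14-bit Paula output splits each 16-bit sample across two channels whose analogue outputs are summed. The sketch below models only that arithmetic in portable C; `Split14`, `split_sample_14bit()` and `combine_14bit()` are hypothetical names, and nothing here is taken from AHI's source or includes its calibration step.
 ```c
 #include <stdint.h>
 typedef struct {
    int8_t  hi;   /* upper 8 bits, played at Paula volume 64 */
    uint8_t lo;   /* next 6 bits (0..63), played at Paula volume 1 */
 } Split14;
 /* Split one signed 16-bit sample into the two per-channel values */
 static Split14 split_sample_14bit(int16_t s)
 {
    uint16_t u = (uint16_t)s;            /* work on the raw bit pattern */
    Split14 out;
    out.hi = (int8_t)(u >> 8);           /* sign lives in the high byte */
    out.lo = (uint8_t)((u & 0xFF) >> 2); /* drop the 2 lowest bits */
    return out;
 }
 /* Model of the analogue mix: hi x 64 + lo x 1 gives a 14-bit value */
 static int32_t combine_14bit(Split14 p)
 {
    return (int32_t)p.hi * 64 + (int32_t)p.lo;
 }
 ```
 The combined value equals the original sample with its two lowest bits discarded, which is where the "14-bit" figure comes from.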
 **Q: What's the latency of AHI software mixing?**
 A: Typically 1–20 ms depending on `AHIA_PlayerFreq` and the driver. At 50 Hz player frequency, the buffer is ~20 ms (44100/50 = 882 samples). Lower latency requires higher player frequency but increases CPU load.
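 The buffer arithmetic in this answer can be checked with a few lines of C. This is a sketch only: `samples_per_tick()` and `tick_latency_us()` are hypothetical helper names, and the model ignores any extra buffering a particular driver may add.
 ```c
 #include <stdint.h>
 /* Samples mixed per player tick: mixing frequency / player frequency */
 static uint32_t samples_per_tick(uint32_t mix_freq, uint32_t player_freq)
 {
    if (player_freq == 0)
        return 0;                    /* guard against division by zero */
    return mix_freq / player_freq;
 }
 /* Duration of one player tick in microseconds */
 static uint32_t tick_latency_us(uint32_t player_freq)
 {
    if (player_freq == 0)
        return 0;
    return (uint32_t)(1000000UL / player_freq);
 }
 ```
 With a 44100 Hz mixing frequency and a 50 Hz player frequency this gives 882 samples and 20000 µs per tick; doubling the player frequency halves both figures.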
 **Q: Can AHI play MP3/OGG/FLAC files directly?**
 A: No — AHI is a raw PCM output system. It plays sample buffers, not compressed formats. You need a decoder library (`mpega.library` for MP3, `ogg.player` for OGG) to decompress into PCM, then feed the PCM to AHI via streaming playback (Player Hook).
 **Q: How do I install AHI on my Amiga?**
 A: Download the AHI distribution from Aminet (`driver/audio/ahiusr.lha`). Extract to `SYS:`, run the installer. It places `ahi.device` in `DEVS:`, driver libraries in `DEVS:AHI/`, and preferences in `SYS:Prefs/`.
 ---
 ## References
 ### SDK & Documentation
 - **AHI Developer's Guide** — Martin Blom, distributed with AHI SDK
 - **AHI User Guide** — Aminet `driver/audio/ahiusr.lha`
 - **AHI SDK** — Aminet `dev/misc/ahidev.lha`
 ### NDK Headers
 - `devices/ahi.h` — AHI device commands, sample type constants
 - `libraries/ahi_sub.h` — AHI driver sub-functions (for driver authors)
 ### Related Knowledge Base Articles
 - [audio.device](../10_devices/audio.md) — native Paula audio, 4-channel DMA, MOD format
 - [Writing AHI Drivers](../16_driver_development/ahi_driver.md) — creating custom AHI drivers
 - [Datatypes](datatypes.md) — sound datatype can use AHI for playback
 - [translator.library](translator.md) — speech synthesis output via audio/AHI
--- a/13_toolchain/README.md
+++ b/13_toolchain/README.md
@ -17,4 +17,5 @@ Development tools for building Amiga software, from native compilers to modern c
 | [pragmas.md](pragmas.md) | Compiler pragmas and inline stubs: SAS/C pragmas, GCC inline asm, proto headers, fd2pragma |
 | [ndk.md](ndk.md) | NDK versions (3.1/3.9/3.2): contents, downloads, cross-compiler integration |
 | [makefiles.md](makefiles.md) | Makefile patterns for GCC cross-compilation, vasm/vlink assembly, mixed C+asm projects |
 | [cross_compilation_guide.md](cross_compilation_guide.md) | Modern cross-compilation: CMake toolchain files (GCC + VBCC), VS Code integration, GitHub Actions CI/CD, mixed C+asm, Docker builds |
 | [debugging.md](debugging.md) | Debugging tools: Enforcer/MuForce memory watchdog, SnoopDOS tracing, FS-UAE GDB remote, kprintf, debugging checklist |
--- a/13_toolchain/cross_compilation_guide.md
+++ b/13_toolchain/cross_compilation_guide.md
@ -0,0 +1,603 @@
 [← Home](../README.md) · [Toolchain](README.md)
 # Modern Cross-Compilation Guide — CMake, VS Code, CI/CD
 ## Overview
 Building Amiga software in 2026 means writing C/C++ on a modern host (Linux, macOS, or Windows) and producing 68000 hunk executables. This guide covers the **build system integration** layer: CMake project configuration, VS Code debugging setup, and CI/CD pipelines — connecting the compiler tools documented in [gcc_amiga.md](gcc_amiga.md), [vbcc.md](vbcc.md), and [vasm_vlink.md](vasm_vlink.md).
 ```mermaid
 graph TB
    subgraph "Host Environment"
        CMAKE["CMake Build System"]
        VS["VS Code / IDE"]
        CI["CI/CD Pipeline<br/>(GitHub Actions)"]
    end
    subgraph "Toolchain"
        GCC["m68k-amigaos-gcc<br/>(bebbo)"]
        VBCC["vc + vlink<br/>(VBCC)"]
        VASM["vasmm68k_mot<br/>(vasm)"]
    end
    subgraph "Output"
        HUNK["Amiga HUNK<br/>Executable"]
        ADF["ADF Disk Image"]
        HDF["HDF Hard File"]
        LHA["LHA Archive<br/>(for distribution)"]
    end
    CMAKE --> GCC
    CMAKE --> VBCC
    CMAKE --> VASM
    VS --> CMAKE
    CI --> CMAKE
    GCC --> HUNK
    VBCC --> HUNK
    VASM --> HUNK
    HUNK --> ADF
    HUNK --> HDF
    HUNK --> LHA
    style CMAKE fill:#e8f4fd,stroke:#2196f3,color:#333
    style HUNK fill:#fff3e0,stroke:#ff9800,color:#333
 ```
 ---
 ## Toolchain Selection Decision Guide
 ```mermaid
 graph TD
    START["Starting a new Amiga project?"] --> Q1{"Language?"}
    Q1 -->|"C / C++"| Q2{"Need C++ or<br/>modern optimization?"}
    Q1 -->|"Assembly only"| VASM["Use vasm + vlink<br/>(vasm_vlink.md)"]
    Q1 -->|"Mixed C + ASM"| Q3
    Q2 -->|Yes| GCC["Use GCC (bebbo)<br/>(gcc_amiga.md)"]
    Q2 -->|No| Q3{"Prefer smaller<br/>binaries?"}
    Q3 -->|Yes| VBCC["Use VBCC<br/>(vbcc.md)"]
    Q3 -->|No| GCC
    GCC --> CMAKE["Configure with CMake"]
    VBCC --> CMAKE
    VASM --> MAKEFILE["Use Makefile or CMake"]
    style GCC fill:#e8f5e9,stroke:#4caf50,color:#333
    style VBCC fill:#e8f4fd,stroke:#2196f3,color:#333
    style VASM fill:#fff3e0,stroke:#ff9800,color:#333
    style CMAKE fill:#f3e5f5,stroke:#9c27b0,color:#333
 ```
 | Toolchain | Best For | Binary Size | C++ Support | Optimization |
 |-----------|----------|-------------|-------------|-------------|
 | **GCC (bebbo)** | C/C++, modern standards, libraries | Larger | C++14 | Strong (GCC 6.5 optimizer) |
 | **VBCC** | C, size-critical code, demos | Smaller | No | Good (vbcc optimizer) |
 | **vasm only** | Pure assembly, bootblocks, demos | Smallest | N/A | Manual |
 ---
 ## CMake Configuration
 ### Toolchain File for GCC (bebbo)
 Create `toolchain/m68k-amigaos.cmake`:
 ```cmake
 # CMake toolchain file for m68k-amigaos-gcc (bebbo)
 set(CMAKE_SYSTEM_NAME Generic)
 set(CMAKE_SYSTEM_PROCESSOR m68k)
 # Toolchain prefix
 set(TOOLCHAIN_PREFIX m68k-amigaos)
 # Find tools
 find_program(CMAKE_C_COMPILER ${TOOLCHAIN_PREFIX}-gcc)
 find_program(CMAKE_CXX_COMPILER ${TOOLCHAIN_PREFIX}-g++)
 find_program(CMAKE_ASM_COMPILER ${TOOLCHAIN_PREFIX}-gcc)
 find_program(CMAKE_AR ${TOOLCHAIN_PREFIX}-ar)
 find_program(CMAKE_STRIP ${TOOLCHAIN_PREFIX}-strip)
 find_program(CMAKE_RANLIB ${TOOLCHAIN_PREFIX}-ranlib)
 # Amiga-specific flags
 set(CMAKE_C_FLAGS_INIT "-noixemul -m68000 -Os -fomit-frame-pointer -Wall")
 set(CMAKE_CXX_FLAGS_INIT "-noixemul -m68000 -Os -fomit-frame-pointer -Wall")
 set(CMAKE_EXE_LINKER_FLAGS_INIT "-noixemul -Wl,--gc-sections")
 # Don't build host executables for tests
 set(CMAKE_CROSSCOMPILING TRUE)
 set(CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY)
 # Output suffix for Amiga executables
 set(CMAKE_EXECUTABLE_SUFFIX "")
 ```
 ### Toolchain File for VBCC
 Create `toolchain/vbcc-m68k-amigaos.cmake`:
 ```cmake
 # CMake toolchain file for VBCC m68k-amigaos
 set(CMAKE_SYSTEM_NAME Generic)
 set(CMAKE_SYSTEM_PROCESSOR m68k)
 # VBCC uses 'vc' as compiler driver
 find_program(CMAKE_C_COMPILER vc)
 # VBCC flags
 set(CMAKE_C_FLAGS_INIT "-m68000 -O2 -c99 -DCPU_68000")
 set(CMAKE_EXE_LINKER_FLAGS_INIT "")
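 # VBCC ships no ar/ranlib of its own. This sketch assumes the
 # m68k-amigaos binutils from the GCC toolchain are installed;
 # TOOLCHAIN_PREFIX must be defined in this file before it is used.
 set(TOOLCHAIN_PREFIX m68k-amigaos)
 set(CMAKE_AR ${TOOLCHAIN_PREFIX}-ar)
 set(CMAKE_RANLIB ${TOOLCHAIN_PREFIX}-ranlib)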
 set(CMAKE_CROSSCOMPILING TRUE)
 set(CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY)
 set(CMAKE_EXECUTABLE_SUFFIX "")
 ```
 ### Project CMakeLists.txt
 ```cmake
 cmake_minimum_required(VERSION 3.20)
 project(MyAmigaApp C)
 # Source files
 set(SOURCES
    src/main.c
    src/gfx.c
    src/input.c
    src/audio.c
 )
 # Assembly files (vasm, Motorola syntax)
 set(ASSEMBLY
    asm/copper.s
    asm/blitter.s
 )
 # NDK includes
 include_directories(
    ${CMAKE_SOURCE_DIR}/include
    ${CMAKE_SOURCE_DIR}/ndk/include
 )
 # Build executable
 add_executable(${PROJECT_NAME} ${SOURCES} ${ASSEMBLY})
 # Link against Amiga libraries
 target_link_libraries(${PROJECT_NAME} amiga)
 # Custom command: generate ADF after build
 add_custom_command(TARGET ${PROJECT_NAME} POST_BUILD
    COMMAND ${CMAKE_SOURCE_DIR}/scripts/makeadf.sh
        $<TARGET_FILE:${PROJECT_NAME}>
        ${CMAKE_BINARY_DIR}/${PROJECT_NAME}.adf
    COMMENT "Creating ADF disk image"
 )
 ```
 ### Building
 ```bash
 # Configure (GCC toolchain)
 cmake -B build -DCMAKE_TOOLCHAIN_FILE=toolchain/m68k-amigaos.cmake .
 # Configure (VBCC toolchain)
 cmake -B build -DCMAKE_TOOLCHAIN_FILE=toolchain/vbcc-m68k-amigaos.cmake .
 # Build
 cmake --build build -j$(nproc)
 # Output: build/MyAmigaApp (Amiga hunk executable)
 ```
 ---
 ## VS Code Integration
 ### launch.json — Debug with FS-UAE
 ```json
 {
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Run in FS-UAE",
            "type": "cppdbg",
            "request": "launch",
            "program": "${workspaceFolder}/build/MyAmigaApp",
            "args": [],
            "stopAtEntry": false,
            "cwd": "${workspaceFolder}",
            "environment": [],
            "externalConsole": false,
            "preLaunchTask": "build",
            "postDebugTask": "launch-fsuae",
            "MIMode": "gdb",
            "miDebuggerPath": "/usr/bin/m68k-amigaos-gdb"
        }
    ]
 }
 ```
 ### tasks.json — Build + Launch
 ```json
 {
    "version": "2.0.0",
    "tasks": [
        {
            "label": "configure",
            "type": "shell",
            "command": "cmake",
            "args": ["-B", "build", "-DCMAKE_TOOLCHAIN_FILE=toolchain/m68k-amigaos.cmake", "."]
        },
        {
            "label": "build",
            "type": "shell",
            "command": "cmake",
            "args": ["--build", "build", "-j4"],
            "group": {
                "kind": "build",
                "isDefault": true
            },
            "dependsOn": "configure",
            "problemMatcher": "$gcc"
        },
        {
            "label": "launch-fsuae",
            "type": "shell",
            "command": "fs-uae",
            "args": [
                "--hard_drive_0=${workspaceFolder}/build/disk.hdf",
                "--amiga_model=A1200"
            ]
        }
    ]
 }
 ```
 ### settings.json — IntelliSense
 ```json
 {
    "C_Cpp.default.compilerPath": "m68k-amigaos-gcc",
    "C_Cpp.default.includePath": [
        "${workspaceFolder}/include",
        "${workspaceFolder}/ndk/include"
    ],
    "C_Cpp.default.defines": [
        "AMIGA",
        "CPU_68000",
        "__AMIGADATE__=20260101"
    ],
    "C_Cpp.default.compilerArgs": ["-noixemul"],
    "C_Cpp.default.cStandard": "c99"
 }
 ```
 ---
 ## CI/CD — GitHub Actions
 ### Complete Pipeline: Build, Test, Package
 ```yaml
 # .github/workflows/amiga-build.yml
 name: Amiga Build
 on:
    push:
        branches: [main]
    pull_request:
        branches: [main]
 jobs:
    build:
        runs-on: ubuntu-latest
        container:
            image: bebbo/amiga-gcc:latest
        steps:
            - uses: actions/checkout@v4
            - name: Configure CMake
              run: |
                cmake -B build \
                  -DCMAKE_TOOLCHAIN_FILE=toolchain/m68k-amigaos.cmake \
                  -DCMAKE_BUILD_TYPE=Release .
            - name: Build
              run: cmake --build build -j$(nproc)
            - name: Strip executable
              run: m68k-amigaos-strip build/MyAmigaApp
            - name: Create ADF disk image
              run: |
                pip install amitools
                xdftool build/disk.adf format "MyApp" ffs
                xdftool build/disk.adf makedir MyApp
                xdftool build/disk.adf write build/MyAmigaApp MyApp/MyAmigaApp
            - name: Create LHA distribution
              run: |
                lha a build/MyApp-$(git rev-parse --short HEAD).lha \
                  build/MyAmigaApp README.md
            - name: Upload artifacts
              uses: actions/upload-artifact@v4
              with:
                name: amiga-build
                path: |
                  build/MyAmigaApp
                  build/disk.adf
                  build/*.lha
 ```
 ### Building a Toolchain from Source (Alternative)
 ```yaml
 # For toolchains not available as Docker images
 jobs:
    build-vbcc:
        runs-on: ubuntu-latest
        steps:
            - uses: actions/checkout@v4
            - name: Install VBCC
              run: |
                wget http://www.sun.hasenbraten.de/~frank/vbcc/vbcc_0_9.tar.gz
                tar xzf vbcc_0_9.tar.gz
                cd vbcc
                make TARGET=m68k
                echo "$(pwd)/bin" >> $GITHUB_PATH
            - name: Build with VBCC
              run: |
                mkdir -p build
                vc -m68000 -O2 -c99 -o build/MyApp src/main.c
 ```
 ---
 ## Mixed C + Assembly Projects
 ### CMake Configuration for Mixed Sources
 ```cmake
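 # Enable assembly language.
 # ASM_M68K is not built into CMake: enable_language() succeeds only if the
 # project supplies its own ASM_M68K language modules on CMAKE_MODULE_PATH.
 # Tell CMake to use vasm for .s files before enabling the language.
 set(CMAKE_ASM_M68K_COMPILER vasmm68k_mot)
 set(CMAKE_ASM_M68K_FLAGS "-m68000 -Fhunk -phxass")
 enable_language(ASM_M68K)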
 # Assembly sources get compiled separately
 add_executable(${PROJECT_NAME}
    src/main.c
    src/gfx.c
    asm/copper.s      # vasm Motorola syntax
    asm/blitter.s
 )
 # Custom include for generated headers from assembly
 target_include_directories(${PROJECT_NAME} PRIVATE
    ${CMAKE_BINARY_DIR}/generated
 )
 ```
 ### Calling Convention: C ↔ Assembly
 ```c
 /* ---- C side (main.c) ---- */
 /* Function implemented in assembly */
 extern void SetupCopper(void *copperlist);
 extern void BlitCopy(const void *src, void *dst, WORD width, WORD height);
 /* Function called FROM assembly */
 void OnVBlank(void)
 {
    /* Called from copper interrupt — keep it short */
    UpdateDisplay();
 }
 ```
 ```asm
 ; ---- Assembly side (copper.s) ----
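 ; Amiga C calling convention (GCC and VBCC defaults, 32-bit int,
 ; no register arguments):
 ;   D0/D1/A0/A1 = scratch (return value in D0)
 ;   D2-D7/A2-A6 = saved by callee
 ;   Parameters: pushed on the stack right to left; first one at 4(sp) on entry
 ;   Register arguments need an explicit annotation on the C prototype
 ;   (e.g. VBCC __reg("a0")); plain prototypes like those on the C side use the stack
    section text,code
    xdef _SetupCopper
    xdef _BlitCopy
    xref _OnVBlank        ; defined in C
 ; void SetupCopper(void *copperlist);
 ; 4(sp) = copperlist pointer
 _SetupCopper:
    move.l  4(sp), a0          ; fetch the argument before sp moves
    movem.l d2-d7/a2-a6, -(sp)
    move.l  a0, a2             ; save copperlist pointer
    ; ... build copper list ...
    move.l  a2, $dff080        ; COP1LCH
    movem.l (sp)+, d2-d7/a2-a6
    rts
 ; void BlitCopy(const void *src, void *dst, WORD width, WORD height);
 ; 4(sp) = src, 8(sp) = dst, 12(sp) = width, 16(sp) = height
 ; (each WORD occupies a 32-bit slot; the value is in its low word)
 _BlitCopy:
    move.l  4(sp), a0          ; src
    move.l  8(sp), a1          ; dst
    move.w  14(sp), d0         ; width  (low word of the slot at 12)
    move.w  18(sp), d1         ; height (low word of the slot at 16)
    movem.l d2-d7/a2-a6, -(sp)
    ; ... blitter setup ...
    movem.l (sp)+, d2-d7/a2-a6
    rts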
 ```
 ---
 ## Named Antipatterns
 ### "The Host Header Leak" — Including Host Headers
 ```c
 /* BAD: Including POSIX headers that AmigaOS does not provide.
   If the build picks up host headers through a stray include
   path (e.g. -I/usr/include), this may even compile, but the
   result will not link or run: libnix offers only a limited
   <stdio.h>, and <sys/mman.h> and <pthread.h> do not exist. */
 #include <stdio.h>        /* exists in libnix but limited */
 #include <sys/mman.h>     /* does NOT exist on Amiga */
 #include <pthread.h>      /* does NOT exist on Amiga */
 ```
 ```c
 /* CORRECT: Use AmigaOS-native APIs */
 #include <proto/dos.h>    /* Open/Read/Write/Close */
 #include <proto/exec.h>   /* AllocMem/FreeMem */
 #include <exec/memory.h>  /* MEMF_CHIP, MEMF_CLEAR */
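 /* Amiga has no mmap: use AllocMem(). Request MEMF_ANY for ordinary
   buffers; MEMF_CHIP is only needed for data the custom chips reach
   by DMA (bitplanes, audio samples, copper lists).
   AllocMem() returns NULL on failure, so check before use. */
 APTR buf = AllocMem(size, MEMF_ANY | MEMF_CLEAR);
 FreeMem(buf, size);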
 ```
 ### "The ixemul Trap" — Mixing ixemul and noixemul
 ```makefile
 # BAD: Mixing ixemul and noixemul flags.
 # ixemul provides a POSIX layer (fopen, malloc, etc.)
 # noixemul means direct AmigaOS calls.
 # Linking both causes symbol conflicts and mysterious crashes.
 CFLAGS = -noixemul -m68000
 LDFLAGS = -lixemul    # CONFLICT: ixemul AND noixemul!
 ```
 ```makefile
 # CORRECT: Choose one and be consistent
 # Option A: noixemul (most Amiga-native, smaller binary)
 CFLAGS  = -noixemul -m68000
 LDFLAGS = -noixemul -lamiga
 # Option B: ixemul (easier porting from POSIX code)
 CFLAGS  = -m68000
 LDFLAGS = -lixemul
 ```
 ### "The Missing NDK Include Path"
 ```cmake
 # BAD: CMake can't find AmigaOS headers.
 # The NDK include path must be explicitly provided.
 # Without it, #include <exec/types.h> fails.
 target_compile_definitions(myapp PRIVATE AMIGA)
 # No include path! Compiler errors on every Amiga header.
 ```
 ```cmake
 # CORRECT: Always specify NDK include path
 target_include_directories(myapp PRIVATE
    ${CMAKE_SOURCE_DIR}/ndk/include   # NDK 3.9 headers
    ${CMAKE_SOURCE_DIR}/include       # project headers
 )
 ```
 ### "The Unstripped Binary" — Shipping Debug Symbols
 ```bash
 # BAD: Shipping Amiga executables with debug symbols wastes disk
 # space and load time. On an 880 KB floppy, every byte counts.
 # Unstripped binaries can be 2-5× larger.
 cmake --build build
 cp build/MyApp dist/MyApp   # 250 KB — includes symbols!
 ```
 ```bash
 # CORRECT: Always strip for release
 cmake --build build
 m68k-amigaos-strip build/MyApp -o dist/MyApp   # 80 KB
 # Or in CMakeLists.txt:
 #   set(CMAKE_C_FLAGS_RELEASE "-Os -DNDEBUG")
 #   add_custom_command(TARGET myapp POST_BUILD
 #       COMMAND m68k-amigaos-strip $<TARGET_FILE:myapp>)
 ```
 ---
 ## Historical Context & Modern Analogies
 ### Cross-Platform Amiga Development Timeline
 ```mermaid
 timeline
    title Amiga Cross-Compilation Evolution
    1985 : Native development only\\nAmiga compiles on Amiga
    1990 : SAS/C native IDE\\nStill Amiga-only
    1999 : GCC 2.95 cross-compiler\\nLinux→Amiga, unreliable
    2005 : Bebbo's GCC fork begins\\nModern cross-compilation
    2010 : vasm/vlink mature\\nReliable Amiga assembly
    2015 : VBCC amiga target\\nAlternative C compiler
    2018 : BlitterStudio Docker\\nOne-command setup
    2020 : GitHub Actions CI\\nAutomated Amiga builds
    2023 : CMake toolchain files\\nModern build system
    2026 : VS Code integration\\nFull IDE experience
 ```
 ### Modern Analogies
 | Amiga Concept | Modern Equivalent | Notes |
 |--------------|-------------------|-------|
 | `m68k-amigaos-gcc` | `aarch64-linux-gnu-gcc` (ARM cross-compiler) | Same GCC cross-compilation model |
 | `-noixemul` flag | `--specs=nano.specs` (embedded ARM) | Select a smaller C runtime (libnix) instead of the full POSIX layer (ixemul) |
 | vasm Motorola syntax | `arm-none-eabi-as` | Cross-assembler for target |
 | vlink hunk output | `objcopy -O binary` | Produce target binary format |
 | ADF disk image | `.img` (Raspberry Pi SD card) | Bootable media for target |
 | FS-UAE debugger | QEMU GDB stub | Emulator-based remote debugging |
 | CMake toolchain file | `arm-toolchain.cmake` for embedded | Identical CMake pattern |
 | GitHub Actions + Docker | Same for any cross-compilation target | Containerized builds |
 ---
 ## FAQ
 **Q: Should I use GCC or VBCC for a new project?**
 A: Use GCC if you need C++, modern C features, or extensive library support. Use VBCC if binary size is critical (demos, bootblocks) or you want a simpler, faster compilation. Both produce valid Amiga executables.
 **Q: Can I use C++ for Amiga development?**
 A: Yes — bebbo's GCC supports C++14 with some limitations: no RTTI (run-time type info) by default, exceptions are limited, and the C++ standard library (`libstdc++`) has reduced functionality. Many Amiga projects use C++ for class-based organization but avoid heavy STL usage.
 **Q: How do I test my cross-compiled binary?**
 A: Use an emulator (FS-UAE or WinUAE) or a real Amiga with a Gotek floppy emulator. For rapid iteration, FS-UAE with a hard-file (.hdf) is fastest — just copy the binary into the HDF and reboot.
 **Q: Can I debug cross-compiled code with GDB?**
 A: Limited. `m68k-amigaos-gdb` exists but requires an Amiga-side GDB stub, which is complex to set up. Most developers use WinUAE/FS-UAE's built-in debugger instead, or `printf` debugging via serial port.
 **Q: How do I package my app for distribution?**
 A: The standard Amiga distribution format is **LHA** archives. Include the executable, a `.info` icon file, and optionally an `Install` script. For floppy-disk distribution, create an ADF using `xdftool` (from `amitools` Python package).
 **Q: Can I use CMake with assembly-only projects?**
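 A: Yes, with some setup. `ASM_M68K` is not a language CMake ships with, so `enable_language(ASM_M68K)` works only if the project provides its own language modules (such as `CMakeDetermineASM_M68KCompiler.cmake`) on `CMAKE_MODULE_PATH` and sets `CMAKE_ASM_M68K_COMPILER` to `vasmm68k_mot`. Otherwise, invoke vasm from `add_custom_command()`, or use a simple Makefile for assembly projects (see [makefiles.md](makefiles.md)).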
 ---
 ## References
 ### Related Toolchain Articles
 - [gcc_amiga.md](gcc_amiga.md) — bebbo GCC cross-compiler: installation, flags, CPU targets, Docker
 - [vbcc.md](vbcc.md) — VBCC portable compiler: `__reg()`, cross-compilation, size optimization
 - [vasm_vlink.md](vasm_vlink.md) — vasm assembler & vlink linker: modular architecture, linker scripts
 - [makefiles.md](makefiles.md) — Makefile patterns for cross-compilation, mixed C+asm
 - [ndk.md](ndk.md) — NDK versions and header organization
 - [debugging.md](debugging.md) — debugging tools: Enforcer, SnoopDOS, FS-UAE GDB remote
 ### External Resources
 - **bebbo/amiga-gcc**: https://github.com/bebbo/amiga-gcc
 - **BlitterStudio/amiga-gcc**: https://github.com/BlitterStudio/amiga-gcc (with platform build docs)
 - **amitools**: https://github.com/cnvogelg/amitools (xdftool, vamos, hunktool)
 - **FS-UAE**: https://fs-uae.net
--- a/17_demoscene/3d_rendering.md
+++ b/17_demoscene/3d_rendering.md
@ -0,0 +1,875 @@
 [← Home](../README.md) · [Demoscene Techniques](README.md)
 # 3D Rendering — Fixed-Point Math, Blitter Polygons, Rotozoom, Dot Tunnels, and Voxel Space
 ## Overview
 In 1985, the Amiga had no 3D hardware. No matrix engine, no floating-point unit, no texture mapper — just a 7 MHz integer-only 68000, a Blitter that could copy rectangles, and a Copper that could change display registers. Yet by 1990, demoscene coders were rendering real-time filled 3D objects, and by 1994 they were flying through voxel landscapes at playable framerates. The entire 3D pipeline — projection, clipping, rasterization, fill — was built from scratch in hand-tuned 68000 assembly using fixed-point arithmetic.
 This article covers the demoscene 3D rendering techniques that made it possible: fixed-point math, Blitter-filled polygons, texture-mapped rotozoom, dot tunnels, and voxel space rendering. Each technique maps to a specific hardware capability — and the demoscene's creative abuse of it.
 ```mermaid
 graph TB
    subgraph "Math Foundation"
        FP["Fixed-Point<br/>16.16 arithmetic"]
        MATRIX["Matrix Ops<br/>Rotation × projection"]
        SINCOS["Sine/Cosine<br/>Table lookup"]
    end
    subgraph "Rendering"
        FILL["Filled Polygons<br/>Blitter line + fill"]
        TEXTURE["Rotozoom<br/>Affine texture map"]
        DOT["Dot Tunnel<br/>Z-ordered circles"]
        VOXEL["Voxel Space<br/>Raycast heightmap"]
    end
    subgraph "Optimization"
        C2P["Chunky-to-Planar<br/>Kalms/Blitter C2P"]
        DIV["Fast Division<br/>Reciprocal table"]
        CLIP["Screen-Space Clip<br/>Cohen-Sutherland"]
    end
    FP --> FILL
    FP --> TEXTURE
    MATRIX --> FILL
    SINCOS --> MATRIX
    FILL --> DOT
    TEXTURE --> VOXEL
    C2P --> TEXTURE
    DIV --> VOXEL
 ```
 ---
 ## Foundation: Fixed-Point Arithmetic
 The 68000 has no floating-point unit. All 3D math must use integers. The solution is **fixed-point**: encoding fractional values as integers with an implicit binary point.
 ### 16.16 Fixed-Point Format
 ```
 ┌──────────────────────────────────┬──────────────────────────────────┐
 │  Upper 16 bits: integer part     │  Lower 16 bits: fractional part  │
 │  (signed, two's complement)      │  (unsigned, represents 0 to ~1)  │
 └──────────────────────────────────┴──────────────────────────────────┘
 Example: 1.5 = $00018000
         0.25 = $00004000
        -1.0  = $FFFF0000
         π    = $0003243F  (3.14159...)
 ```
 ### Fixed-Point Operations
 ```c
 /* fixedpoint.h — 16.16 fixed-point arithmetic for 68000 */
 typedef LONG FIXED;   /* 32-bit signed: 16.16 format */
 #define INT_TO_FIXED(x)   ((FIXED)((x) << 16))
 #define FIXED_TO_INT(x)   ((WORD)((x) >> 16))     /* Truncate */
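 #define FIXED_TO_INT_R(x) ((WORD)(((x) + 0x8000) >> 16))  /* Round */
 #define FLOAT_TO_FIXED(f) ((FIXED)((f) * 65536.0))
 /* Multiply: result = (a × b) >> 16
   The 32×32→64 MULS.L exists only on the 68020 and later. The plain
   68000 has just the 16×16→32 MULS.W, so there the product must be
   built from partial products or the operands kept to 16 bits. */
 static inline FIXED fixed_mul(FIXED a, FIXED b) {
    /* 68020+ asm:
       move.l  a, d0
       muls.l  b, d1:d0     ; d1:d0 = 64-bit product (d1 high, d0 low)
       move.w  d1, d0        ; low word of the high half into d0
       swap    d0            ; d0 = bits 47..16 = product >> 16
    */
    /* 64-bit intermediate: a 32-bit product would overflow */
    return (FIXED)(((long long)a * (long long)b) >> 16);
 }
 /* Divide: result = (a × 65536) / b
   Scale the dividend first so the fraction bits survive the division */
 static inline FIXED fixed_div(FIXED a, FIXED b) {
    /* The scaled dividend needs up to 48 bits, hence the 64-bit
       intermediate. This is slow on a 68000, so use a reciprocal
       table for perspective division. The caller must ensure b != 0. */
    return (FIXED)(((long long)a * 65536) / (long long)b);
 }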
 ```
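 The 16.16 rules can be sanity-checked on the host before any code reaches the Amiga. The following is a minimal sketch in portable C99: `fixed_t`, `fx_from_int()`, `fx_mul()` and `fx_div()` are stand-in names for this example (with `int32_t` in place of the Amiga `LONG`), and it assumes the compiler shifts negative values arithmetically, as GCC does.
 ```c
 #include <stdint.h>
 typedef int32_t fixed_t;            /* stand-in for a 16.16 FIXED */
 #define FX_ONE 65536                /* 1.0 in 16.16 */
 /* Integer to 16.16 (multiplication avoids shifting a negative value) */
 static fixed_t fx_from_int(int x)
 {
    return (fixed_t)x * FX_ONE;
 }
 /* 16.16 multiply with a 64-bit intermediate product */
 static fixed_t fx_mul(fixed_t a, fixed_t b)
 {
    return (fixed_t)(((int64_t)a * (int64_t)b) >> 16);
 }
 /* 16.16 divide: scale the dividend up first; b must not be zero */
 static fixed_t fx_div(fixed_t a, fixed_t b)
 {
    return (fixed_t)(((int64_t)a * FX_ONE) / b);
 }
 ```
 For example, `fx_mul(0x00018000, 0x00018000)` (1.5 × 1.5) yields `0x00024000` (2.25), and `fx_div(fx_from_int(1), fx_from_int(4))` yields `0x00004000` (0.25), matching the table of example values above.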
 ### Sine/Cosine Tables
 Pre-calculated lookup tables are essential — computing `sin()` at runtime is too slow:
 ```c
 /* trig_tables.c — Pre-calculated 16.16 sine/cosine tables */
 /* 1024 entries covering 0-2π, index = angle × 1024 / (2π) */
 #define TRIG_TABLE_SIZE 1024
 #define ANGLE_2PI      1024    /* Full circle = 1024 units */
 /* 16.16 fixed-point: sin values range from -1.0 ($FFFF0000) to 1.0 ($00010000) */
 extern const FIXED sin_table[TRIG_TABLE_SIZE];  /* Defined in a source file generated at build time */
 /* Fast lookup with wrapping */
 static inline FIXED fast_sin(int angle) {
    return sin_table[angle & (TRIG_TABLE_SIZE - 1)];
 }
 static inline FIXED fast_cos(int angle) {
    return sin_table[(angle + 256) & (TRIG_TABLE_SIZE - 1)];  /* cos = sin(x+π/2) */
 }
 ```
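 One way to produce such a table is a small host-side generator. The sketch below is an assumption about how that build step might look, not the article's actual tool: it computes only the first quadrant and mirrors it into the other three, and it uses a short Taylor series (`quarter_sin()`, a hypothetical helper) instead of `sin()` so that it builds without linking a math library.
 ```c
 #include <stdint.h>
 #define TABLE_SIZE 1024            /* must stay a power of two */
 #define QUARTER    (TABLE_SIZE / 4)
 static int32_t gen_sin_table[TABLE_SIZE];
 /* Taylor series for sin(x), accurate for 0 <= x <= pi/2 */
 static double quarter_sin(double x)
 {
    double term = x, sum = x;
    for (int n = 1; n <= 8; n++) {
        term *= -(x * x) / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
 }
 /* Fill all 1024 entries from the first quadrant using symmetry */
 static void build_sin_table(void)
 {
    const double half_pi = 1.5707963267948966;
    for (int i = 0; i <= QUARTER; i++) {
        double s = quarter_sin(half_pi * i / QUARTER);
        int32_t v = (int32_t)(s * 65536.0 + 0.5);   /* round; s >= 0 here */
        gen_sin_table[i] = v;                                  /* 0 .. pi/2     */
        gen_sin_table[2 * QUARTER - i] = v;                    /* pi/2 .. pi    */
        gen_sin_table[(2 * QUARTER + i) & (TABLE_SIZE - 1)] = -v;  /* pi .. 3pi/2   */
        gen_sin_table[(TABLE_SIZE - i) & (TABLE_SIZE - 1)] = -v;   /* 3pi/2 .. 2pi  */
    }
 }
 /* Lookups with wrap-around, mirroring fast_sin()/fast_cos() */
 static int32_t gen_sin(int angle)
 {
    return gen_sin_table[angle & (TABLE_SIZE - 1)];
 }
 static int32_t gen_cos(int angle)
 {
    return gen_sin_table[(angle + QUARTER) & (TABLE_SIZE - 1)];
 }
 ```
 A real build script would then print the array as a C initializer or as `dc.l` lines for the assembler; the quarter-wave trick is also how some demos store only 257 entries and derive the rest at lookup time.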
 ---
 ## Foundation: Matrix Operations
 3D rotation uses 3×3 matrices multiplied with vertex coordinates. Each rotation (X, Y, Z axis) is a matrix multiply:
 ### Rotation Matrix Construction
 ```c
 /* matrix3d.c — 3D rotation matrices using fixed-point */
 typedef struct {
    FIXED m[3][3];  /* 3×3 rotation matrix */
 } Matrix3D;
 /* Build rotation matrix from Euler angles */
 void build_rotation_matrix(Matrix3D *mat, int ax, int ay, int az) {
    FIXED sx = fast_sin(ax), cx = fast_cos(ax);
    FIXED sy = fast_sin(ay), cy = fast_cos(ay);
    FIXED sz = fast_sin(az), cz = fast_cos(az);
    /* Combined Z×Y×X rotation (standard demoscene order) */
    mat->m[0][0] = fixed_mul(cy, cz);
    mat->m[0][1] = fixed_mul(fixed_mul(sx, sy), cz) - fixed_mul(cx, sz);
    mat->m[0][2] = fixed_mul(fixed_mul(cx, sy), cz) + fixed_mul(sx, sz);
    mat->m[1][0] = fixed_mul(cy, sz);
    mat->m[1][1] = fixed_mul(fixed_mul(sx, sy), sz) + fixed_mul(cx, cz);
    mat->m[1][2] = fixed_mul(fixed_mul(cx, sy), sz) - fixed_mul(sx, cz);
    mat->m[2][0] = -sy;
    mat->m[2][1] = fixed_mul(sx, cy);
    mat->m[2][2] = fixed_mul(cx, cy);
 }
 /* Transform a vertex: result = matrix × vertex */
 void transform_vertex(const Matrix3D *mat, FIXED vx, FIXED vy, FIXED vz,
                      FIXED *rx, FIXED *ry, FIXED *rz) {
    *rx = fixed_mul(mat->m[0][0], vx) +
          fixed_mul(mat->m[0][1], vy) +
          fixed_mul(mat->m[0][2], vz);
    *ry = fixed_mul(mat->m[1][0], vx) +
          fixed_mul(mat->m[1][1], vy) +
          fixed_mul(mat->m[1][2], vz);
    *rz = fixed_mul(mat->m[2][0], vx) +
          fixed_mul(mat->m[2][1], vy) +
          fixed_mul(mat->m[2][2], vz);
 }
 ```
 ### Perspective Projection
 ```c
 /* project.c — Perspective projection to screen coordinates */
 #define SCREEN_CX  160   /* Center X (320 wide) */
 #define SCREEN_CY  128   /* Center Y (256 tall) */
 #define FOCAL_LEN  256   /* Focal length in pixels (integer; converted with INT_TO_FIXED) */
 void project_vertex(FIXED x, FIXED y, FIXED z,
                    WORD *sx, WORD *sy) {
    /* Perspective divide: screen = world × focal / z */
    if (z > INT_TO_FIXED(1)) {  /* Avoid division by zero */
        FIXED scale = fixed_div(INT_TO_FIXED(FOCAL_LEN), z);
        *sx = SCREEN_CX + FIXED_TO_INT(fixed_mul(x, scale));
        *sy = SCREEN_CY - FIXED_TO_INT(fixed_mul(y, scale));  /* Y flipped */
    } else {
        *sx = SCREEN_CX;  /* Behind camera */
        *sy = SCREEN_CY;
    }
 }
 ```
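
 To make the perspective divide concrete, here is a small self-contained sketch with one worked value. The helpers `fx_mul`, `fx_div`, and `project_point` are illustrative stand-ins (assumptions, using 64-bit intermediates on the host), not the article's own functions. A point at (50, 25, 512) with a focal length of 256 gets a scale of 0.5 and lands at screen position (185, 116):

 ```c
 #include <stdint.h>

 typedef int32_t FIXED;                     /* assumed: 16.16 signed */
 #define INT_TO_FIXED(i) ((FIXED)((i) * 65536))
 #define FIXED_TO_INT(f) ((int)((f) >> 16)) /* assumes arithmetic right shift */

 #define SCREEN_CX 160
 #define SCREEN_CY 128
 #define FOCAL_LEN 256

 static FIXED fx_mul(FIXED a, FIXED b) {
     return (FIXED)(((int64_t)a * b) >> 16);
 }
 static FIXED fx_div(FIXED a, FIXED b) {
     return (FIXED)(((int64_t)a * 65536) / b);
 }

 /* Returns 1 and writes screen coordinates if the point is in front of
    the near limit (z > 1.0); returns 0 otherwise. */
 int project_point(FIXED x, FIXED y, FIXED z, int *sx, int *sy) {
     FIXED scale;
     if (z <= INT_TO_FIXED(1)) return 0;
     scale = fx_div(INT_TO_FIXED(FOCAL_LEN), z);        /* focal / z */
     *sx = SCREEN_CX + FIXED_TO_INT(fx_mul(x, scale));
     *sy = SCREEN_CY - FIXED_TO_INT(fx_mul(y, scale));  /* screen Y grows downward */
     return 1;
 }
 ```

 Returning a visibility flag instead of silently projecting to the screen center lets the caller skip edges that touch an unprojectable vertex.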
 ---
 ## Technique 1: Blitter-Filled Polygons
 The Blitter's **line-draw + fill mode** combination is the foundation of Amiga 3D rendering. The process:
 1. Draw polygon edges using Blitter line mode (sets pixels at boundaries)
 2. Run a fill-mode blit over the area (on each scanline, the Blitter fills between pairs of edge pixels)
 3. Result: a filled polygon with zero CPU pixel writing
 ```mermaid
 sequenceDiagram
    participant CPU as 68000 CPU
    participant Blitter as Blitter
    participant Bitmap as Bitplane Memory
    Note over CPU: Polygon: 4 vertices
    CPU->>Blitter: Configure line mode (BLTCON1 LINE bit, one dot per scanline)
    CPU->>Blitter: Draw edge V1→V2
    Blitter->>Bitmap: Write edge pixels
    CPU->>Blitter: Draw edge V2→V3
    Blitter->>Bitmap: Write edge pixels
    CPU->>Blitter: Draw edge V3→V4
    Blitter->>Bitmap: Write edge pixels
    CPU->>Blitter: Draw edge V4→V1
    Blitter->>Bitmap: Write edge pixels
    CPU->>Blitter: Configure fill mode (BLTCON1 fill enable, descending)
    CPU->>Blitter: Fill bitmap (even-odd rule)
    Blitter->>Bitmap: Fill interior pixels
    Note over Bitmap: Filled polygon ready
 ```
 ### Blitter Fill Mode Detail
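 The Blitter fills one scanline at a time, scanning **right to left** (fill mode only works in descending mode). A fill-carry bit starts at 0 and toggles at every set bit, and pixels are written as set while the carry is 1. In *inclusive* fill (`IFE`) both edge pixels of a pair are kept; in *exclusive* fill (`EFE`) the left edge pixel is dropped. Each scanline therefore needs an even number of edge pixels, which is why the edges are drawn with one dot per scanline: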
 ```asm
 ; blit_fill.asm — Blitter fill for a single bitplane
 ; Assumes edges are already drawn (one dot per edge per scanline) and that
 ; the fill area spans full bitmap lines (modulo 0). bitmap_last_word is an
 ; assumed label holding the address of the LAST word of the area, because
 ; fill mode runs in descending mode (bottom-right to top-left).
        lea     $DFF000,a0             ; Custom registers base
 .wait:
        btst    #6,DMACONR(a0)         ; BBUSY: wait for any previous blit
        bne.s   .wait
        ; Set up fill-mode blit
        move.w  #$09F0,BLTCON0(a0)     ; USEA+USED, minterm $F0 (D = A)
        move.w  #$000A,BLTCON1(a0)     ; IFE (inclusive fill) + DESC
        move.w  #$FFFF,BLTAFWM(a0)     ; First word mask = all bits
        move.w  #$FFFF,BLTALWM(a0)     ; Last word mask = all bits
        move.w  #0,BLTAMOD(a0)         ; No gap between lines
        move.w  #0,BLTDMOD(a0)
        ; Source A and destination D are the same bitmap (fill in place)
        move.l  bitmap_last_word,BLTAPTH(a0)
        move.l  bitmap_last_word,BLTDPTH(a0)
        ; Blit size: height in lines (bits 15-6), width in words (bits 5-0)
        move.w  #(HEIGHT<<6)|(WORDS_PER_LINE),BLTSIZE(a0)
        ; Writing BLTSIZE starts the Blitter
 ```
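
 The fill rule can be checked on a host machine with a small software model. The sketch below reflects one reading of the Hardware Reference Manual's description of fill mode, not a cycle-accurate emulation; `fill_word` is a hypothetical helper name. It processes one 16-bit word from bit 0 (rightmost pixel) upward and carries the fill state into the next word to the left:

 ```c
 #include <stdint.h>

 /* Software model of one 16-bit word of a Blitter fill.
    Bits are scanned right to left (bit 0 first), as in descending mode.
    *carry holds the fill-carry state and is updated so the call can be
    chained across the words of one scanline. */
 uint16_t fill_word(uint16_t src, int exclusive, int *carry) {
     uint16_t out = 0;
     int bit;
     for (bit = 0; bit < 16; bit++) {
         int edge = (src >> bit) & 1;
         int on = exclusive ? (*carry ^ edge)   /* exclusive: drop the closing edge */
                            : (*carry | edge);  /* inclusive: keep both edges */
         if (on) out |= (uint16_t)(1u << bit);
         *carry ^= edge;                        /* toggle at every edge pixel */
     }
     return out;
 }
 ```

 With the edge pattern `%00100100`, the model produces `%00111100` for inclusive fill and `%00011100` for exclusive fill. A word with a single edge bit leaves the carry set, which is exactly the case that floods the rest of the line when a scanline has an odd number of edge pixels.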
 ### Multiple-Face Sorting (Painter's Algorithm)
 For solid 3D objects, faces must be drawn back-to-front (painter's algorithm):
 ```c
 /* face_sort.c — Sort polygon faces by depth for painter's algorithm */
 #include <stdlib.h>  /* qsort */
 typedef struct {
    WORD   num_vertices;
    WORD   *vertices;    /* Index into vertex array */
    FIXED  avg_z;        /* Average Z depth (for sorting) */
    UWORD  color;        /* Face color */
 } Face;
 int compare_faces(const void *a, const void *b) {
    FIXED za = ((const Face *)a)->avg_z;
    FIXED zb = ((const Face *)b)->avg_z;
    /* Sort far-to-near (painter's algorithm) */
    if (za > zb) return -1;  /* a is farther, draw first */
    if (za < zb) return  1;
    return 0;
 }
 void render_object(Face *faces, int num_faces,
                   FIXED *transformed_z) {
    int i;
    /* Calculate average Z for each face */
    for (i = 0; i < num_faces; i++) {
        FIXED sum = 0;
        int j;
        for (j = 0; j < faces[i].num_vertices; j++) {
            sum += transformed_z[faces[i].vertices[j]];
        }
        faces[i].avg_z = sum / faces[i].num_vertices;
    }
    /* Sort back-to-front */
    qsort(faces, num_faces, sizeof(Face), compare_faces);
    /* Draw each face */
    for (i = 0; i < num_faces; i++) {
        draw_filled_polygon(&faces[i]);
    }
 }
 ```
 ---
 ## Technique 2: Rotozoom (Affine Texture Mapping)
 Rotozoom renders a texture-mapped rectangle that can rotate and scale in real-time. The name comes from **rotate + zoom**. It works by computing a texture coordinate (U,V) for each screen pixel using an affine transform.
 ### Algorithm
 For each screen pixel (x, y), compute texture coordinates:
 ```
 U = U_start + x × dU_dx + y × dU_dy
 V = V_start + x × dV_dx + y × dV_dy
 ```
 Where `dU_dx`, `dV_dx`, `dU_dy`, `dV_dy` are the rotation+scale matrix coefficients.
 ```c
 /* rotozoom.c: Affine texture mapping (rotozoom) */
 #define SCREEN_W 320
 #define SCREEN_H 256
 #define TEX_SIZE 256  /* Texture is 256×256 */
 extern UBYTE texture[TEX_SIZE][TEX_SIZE];  /* Chunky texture */
 extern UBYTE *chunky_buffer;               /* Output chunky buffer */
 void render_rotozoom(FIXED cx, FIXED cy,    /* Texture center offset */
                     int angle, FIXED zoom) { /* angle in table units (0-1023) */
    FIXED cos_a = fast_cos(angle);
    FIXED sin_a = fast_sin(angle);
    FIXED inv_zoom = fixed_div(INT_TO_FIXED(1), zoom);
    /* Rotation × inverse zoom matrix coefficients */
    FIXED du_dx = fixed_mul(cos_a, inv_zoom);
    FIXED dv_dx = fixed_mul(sin_a, inv_zoom);
    FIXED du_dy = fixed_mul(-sin_a, inv_zoom);
    FIXED dv_dy = fixed_mul(cos_a, inv_zoom);
    /* Start position: center of screen maps to texture center */
    FIXED u_start = cx - fixed_mul(INT_TO_FIXED(SCREEN_W/2), du_dx)
                          - fixed_mul(INT_TO_FIXED(SCREEN_H/2), du_dy);
    FIXED v_start = cy - fixed_mul(INT_TO_FIXED(SCREEN_W/2), dv_dx)
                          - fixed_mul(INT_TO_FIXED(SCREEN_H/2), dv_dy);
    UBYTE *dst = chunky_buffer;
    int y;
    for (y = 0; y < SCREEN_H; y++) {
        FIXED u = u_start;
        FIXED v = v_start;
        int x;
        for (x = 0; x < SCREEN_W; x++) {
            /* Texture lookup with wrapping: row = V, column = U */
            *dst++ = texture[FIXED_TO_INT(v) & 0xFF]
                            [FIXED_TO_INT(u) & 0xFF];
            u += du_dx;
            v += dv_dx;
        }
        u_start += du_dy;
        v_start += dv_dy;
    }
    /* Convert chunky buffer to planar bitplanes (C2P) */
    chunky_to_planar(chunky_buffer, bitplane_data, SCREEN_W, SCREEN_H);
 }
 ```
 ### Rotozoom in Assembly (Inner Loop)
 The 68000 assembly inner loop is highly optimized. The key insight: texture coordinates wrap at power-of-2 boundaries, so masking with `$FF` (256-wide texture) is free using byte-level addressing:
 ```asm
 ; rotozoom_inner.asm — Optimized inner loop
 ; a0 = destination (chunky buffer)
 ; a1 = texture base (256×256)
 ; d0 = U (16.16 fixed)
 ; d1 = V (16.16 fixed)
 ; d2 = dU/dx (16.16 fixed)
 ; d3 = dV/dx (16.16 fixed)
 ; d4 = loop counter (SCREEN_W - 1, because dbra runs count+1 times)
 .roto_inner:
        move.l  d1,d6           ; Copy V
        clr.w   d6              ; Drop the fraction
        swap    d6              ; d6 = $0000:int(V), upper word now zero
        lsl.w   #8,d6           ; d6.w = (V & $FF) × 256; high bits shift out (free wrap)
        move.l  d0,d5           ; Copy U
        swap    d5              ; d5.w = integer part of U
        ; Texture lookup: tex[v*256+u]
        move.b  d5,d6           ; Low byte = U & $FF (byte move wraps for free)
        move.b  (a1,d6.l),(a0)+ ; Read texel, write to chunky buffer
        add.l   d2,d0           ; U += dU/dx
        add.l   d3,d1           ; V += dV/dx
        dbra    d4,.roto_inner
 ```
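
 The same index packing can be written in C to check the wrap behaviour on a host machine. This is a sketch; `texel_offset` is a hypothetical helper, and the coordinates are treated as unsigned 16.16 bit patterns. Only the low 8 bits of each integer part survive, which is what wraps a 256×256 texture without an explicit modulo:

 ```c
 #include <stdint.h>

 /* Build the 16-bit texel offset v*256+u from two 16.16 coordinates.
    Keeping only 8 bits of each integer part wraps the texture for free. */
 uint16_t texel_offset(uint32_t u, uint32_t v) {
     uint16_t row = (uint16_t)((v >> 8) & 0xFF00u);   /* (int(v) & 0xFF) << 8 */
     uint16_t col = (uint16_t)((u >> 16) & 0x00FFu);  /* int(u) & 0xFF */
     return (uint16_t)(row | col);
 }
 ```

 For example, U = 3.5 and V = 2.25 give offset `$0203`, and U = 259.0 wraps to column 3.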
 ---
 ## Technique 3: Dot Tunnel
 The dot tunnel renders concentric rings that appear to fly toward the viewer, creating the illusion of traveling through a tunnel. Each ring is a circle rendered at a specific Z-depth.
 ### Algorithm
 ```c
 /* dot_tunnel.c — Z-ordered ring tunnel effect */
 #define NUM_RINGS   30
 #define MAX_Z       1024
 #define RING_POINTS 32
 typedef struct {
    FIXED z;         /* Depth (0=near, far=background) */
    FIXED radius;    /* Apparent radius (decreases with z) */
    WORD  cx, cy;    /* Center (can be animated) */
    UWORD color;     /* Ring color */
 } Ring;
 void render_dot_tunnel(Ring *rings, int num_rings, ULONG frame) {
    int i;
    /* Update ring positions (move toward viewer) */
    for (i = 0; i < num_rings; i++) {
        rings[i].z -= INT_TO_FIXED(4);  /* Speed toward viewer */
        /* If ring passed camera, reset to far end */
        if (rings[i].z < INT_TO_FIXED(1)) {
            rings[i].z = INT_TO_FIXED(MAX_Z);
        }
        /* Perspective projection of radius */
        rings[i].radius = fixed_div(
            INT_TO_FIXED(200),     /* Base radius */
            rings[i].z             /* Divide by depth */
        );
    }
    /* Sort rings far-to-near (painter's algorithm) */
    /* ... sort by rings[i].z descending ... */
    /* Draw each ring */
    for (i = 0; i < num_rings; i++) {
        int p;
        int radius = FIXED_TO_INT(rings[i].radius);
        int cx = rings[i].cx + FIXED_TO_INT(
            fixed_mul(fast_sin(frame * 3), INT_TO_FIXED(30)));
        int cy = rings[i].cy + FIXED_TO_INT(
            fixed_mul(fast_cos(frame * 5), INT_TO_FIXED(20)));
        for (p = 0; p < RING_POINTS; p++) {
            int angle = p * 360 / RING_POINTS;
            FIXED sa = fast_sin(angle * TRIG_TABLE_SIZE / 360);
            FIXED ca = fast_cos(angle * TRIG_TABLE_SIZE / 360);
            WORD px = cx + FIXED_TO_INT(fixed_mul(INT_TO_FIXED(radius), sa));
            WORD py = cy + FIXED_TO_INT(fixed_mul(INT_TO_FIXED(radius), ca));
            /* Plot pixel or draw Blitter circle at (px, py) */
            plot_dot(px, py, rings[i].color);
        }
    }
 }
 ```
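
 The sort step is left out of the listing above. One way to fill it in, offered as a sketch rather than the original author's method, is an insertion sort. The ring order changes very little from frame to frame (typically only the ring that wrapped to the far end moves), and insertion sort is close to linear on nearly sorted input. The `Ring` layout is repeated with standard integer types so the block compiles on its own:

 ```c
 #include <stdint.h>

 typedef int32_t FIXED;          /* assumed: 16.16 signed */

 typedef struct {
     FIXED    z;                 /* depth; larger = farther away */
     FIXED    radius;
     int16_t  cx, cy;
     uint16_t color;
 } Ring;

 /* Insertion sort, far-to-near (descending z) */
 void sort_rings_far_to_near(Ring *rings, int num_rings) {
     int i, j;
     for (i = 1; i < num_rings; i++) {
         Ring key = rings[i];
         for (j = i - 1; j >= 0 && rings[j].z < key.z; j--) {
             rings[j + 1] = rings[j];   /* shift nearer rings toward the end */
         }
         rings[j + 1] = key;
     }
 }
 ```

 Calling `sort_rings_far_to_near(rings, num_rings)` between the update loop and the draw loop gives the painter's-algorithm order the comment asks for.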
 ---
 ## Technique 4: Voxel Space
 Voxel space renders a 3D landscape from a 2D heightmap and colormap. The algorithm casts rays from the viewer, one per screen column, and draws vertical strips of pixels. The result is a fly-over landscape effect, as seen in the 1994 demo "Space Rangers" by Rebels.
 ### Algorithm (Column-Based Raycasting)
 ```mermaid
 sequenceDiagram
    participant CPU as 68000 CPU
    participant HM as Heightmap
    participant CM as Colormap
    participant Screen as Screen Buffer
    Note over CPU: For each screen column x (0-319):
    CPU->>CPU: Calculate ray direction for column x
    CPU->>HM: Sample height at (ray_x, ray_z)
    HM-->>CPU: height value h
    CPU->>CPU: Project h to screen Y: y = horizon - h/z
    CPU->>CM: Get color at (ray_x, ray_z)
    CM-->>CPU: color value
    CPU->>Screen: Draw vertical line from y to previous y in color
    Note over CPU: Advance ray (step outward)
    CPU->>HM: Sample next point...
 ```
 ### Voxel Space Implementation
 ```c
 /* voxelspace.c — Column-based voxel landscape rendering */
 #define SCREEN_W    320
 #define SCREEN_H    256
 #define MAP_SIZE    256
 #define HORIZON     100   /* Horizon line Y position */
 #define MAX_DEPTH   200   /* Maximum ray distance */
 #define FOV_UNITS   256   /* Field of view in trig-table units (256/1024 = 90°) */
 extern UBYTE heightmap[MAP_SIZE][MAP_SIZE];
 extern UBYTE colormap[MAP_SIZE][MAP_SIZE];
 extern UBYTE *chunky_buffer;
 void render_voxel_space(FIXED cam_x, FIXED cam_z,
                        int cam_angle, FIXED cam_height) {
    int x;
    for (x = 0; x < SCREEN_W; x++) {
        /* Ray angle in table units: camera angle + column offset */
        int ray_angle = cam_angle + (x - SCREEN_W/2) * FOV_UNITS / SCREEN_W;
        FIXED ray_dx = fast_cos(ray_angle);  /* Direction X (one map unit per step) */
        FIXED ray_dz = fast_sin(ray_angle);  /* Direction Z */
        FIXED ray_x = cam_x + ray_dx;        /* First sample, one step out */
        FIXED ray_z = cam_z + ray_dz;
        WORD prev_draw_y = SCREEN_H;  /* Bottom of column */
        int distance;
        /* Stop early once the column is filled to the top */
        for (distance = 1; distance < MAX_DEPTH && prev_draw_y > 0; distance++) {
            /* Current map position (wraps at the map edges) */
            int mx = FIXED_TO_INT(ray_x) & (MAP_SIZE - 1);
            int mz = FIXED_TO_INT(ray_z) & (MAP_SIZE - 1);
            /* Height at this point */
            FIXED terrain_h = INT_TO_FIXED(heightmap[mz][mx]);
            /* Project to screen Y */
            FIXED height_diff = terrain_h - cam_height;
            WORD draw_y = HORIZON -
                FIXED_TO_INT(fixed_div(height_diff,
                    INT_TO_FIXED(distance)));
            if (draw_y < 0) draw_y = 0;  /* Clamp to the top of the screen */
            /* Only draw if above previously drawn pixel */
            if (draw_y < prev_draw_y) {
                UBYTE color = colormap[mz][mx];
                int y;
                for (y = draw_y; y < prev_draw_y; y++) {
                    chunky_buffer[y * SCREEN_W + x] = color;
                }
                prev_draw_y = draw_y;
            }
            /* Step ray outward */
            ray_x += ray_dx;
            ray_z += ray_dz;
        }
    }
    /* C2P conversion for planar display */
    chunky_to_planar(chunky_buffer, bitplane_data, SCREEN_W, SCREEN_H);
 }
 ```
 ### Voxel Space Optimization
 The naive algorithm is too slow for 50 FPS on a 68000. Key optimizations:
 | Optimization | Speedup | How |
 |-------------|---------|-----|
 | **Reciprocal table** | 2× | Pre-compute 1/z values, avoid division |
 | **Step doubling** | 3-4× | Double step size beyond certain depth (less detail needed) |
 | **Height caching** | 1.5× | Cache last N height lookups |
 | **Reduced resolution** | 2-4× | Render at 160×128 and scale up (acceptable for landscape) |
 | **Fast C2P** | 30× | Use Kalms C2P instead of naive conversion |
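
 The reciprocal-table row can be sketched as follows. `RECIP_MAX`, `init_recip_table`, and `div_by_depth` are hypothetical names, and the 64-bit intermediate is a host-side convenience (on a 68000 the multiply would be split into 16-bit `MULS`/`MULU` steps). The idea is to pay for the divisions once at startup and replace every per-sample divide with a table read and a multiply:

 ```c
 #include <stdint.h>

 typedef int32_t FIXED;                 /* assumed: 16.16 signed */
 #define RECIP_MAX 1024                 /* hypothetical: largest integer depth */

 static FIXED recip_table[RECIP_MAX + 1];

 /* Build once at startup: recip_table[z] = 1/z in 16.16 */
 void init_recip_table(void) {
     int z;
     recip_table[0] = 0;                /* unused slot; avoids dividing by zero */
     for (z = 1; z <= RECIP_MAX; z++) {
         recip_table[z] = (FIXED)(65536 / z);
     }
 }

 /* value / z for an integer depth z in 1..RECIP_MAX: one multiply, no divide */
 FIXED div_by_depth(FIXED value, int z) {
     return (FIXED)(((int64_t)value * recip_table[z]) >> 16);
 }
 ```

 The table entries are truncated, so results for depths that are not powers of two come out slightly low; that is usually invisible in a landscape renderer.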
 ---
 ## Performance Budget
 ### 3D Rendering Costs on Stock A500 (7 MHz 68000)
 | Operation | Cycles (approx.) | Notes |
 |-----------|------------------|-------|
 | Fixed-point multiply | ~40-70 | `MULS.W` (16×16→32); a full 16.16 multiply needs several |
 | Fixed-point divide | ~140 | `DIVS.W` (32÷16→16), very expensive! |
 | Sine table lookup | ~12 | Table indexed by angle |
 | Vertex transform | ~400-650 | 9 multiplies + 6 adds (3 multiplies + 2 adds per axis) |
 | Perspective divide | ~280 | 2 divides per vertex |
 | Blitter line draw | ~200/edge | DMA time for edge |
 | Blitter fill | ~2000/polygon | Depends on polygon size |
 | Full C2P (Kalms) | ~35ms | 320×256 × 8bpp |
 | Voxel column | ~500/col | Heightmap lookup + draw |
 ### Frame Budget (PAL: 20ms per frame)
 | Effect | Vertices | Time | FPS |
 |--------|----------|------|-----|
 | Single flat-shaded cube | 8 | ~3ms | 50 |
 | 100-face object | 30+ | ~12ms | 30-50 |
 | Rotozoom 320×256 | 0 (per pixel) | ~40ms (with C2P) | 15-25 |
 | Dot tunnel 30 rings | 960 dots | ~8ms | 50 |
 | Voxel space 320×256 | 320 columns | ~80ms | 6-12 |
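
 The budget in these tables follows from simple arithmetic: a PAL A500 runs its CPU at about 7.09 MHz, so one 50 Hz frame is roughly 141,875 cycles. The helper below is an illustrative sketch with hypothetical names. It ignores DMA contention and loop overhead, so it gives an optimistic upper bound rather than a measured figure:

 ```c
 /* PAL A500: 7,093,790 Hz CPU clock, 50 frames per second */
 #define CPU_HZ_PAL   7093790L
 #define FRAMES_PER_S 50L

 /* CPU cycles available in one PAL frame */
 long cycles_per_frame(void) {
     return CPU_HZ_PAL / FRAMES_PER_S;
 }

 /* How many operations of a given cycle cost fit in one frame */
 long ops_per_frame(long cycles_per_op) {
     return cycles_per_frame() / cycles_per_op;
 }
 ```

 At ~140 cycles each, only about 1,000 divides fit in a frame even if the CPU does nothing else, which is why the per-pixel divide in the antipatterns below is fatal.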
 ---
 ## Antipatterns
 ### 1. The Floating-Point Temptation
 Using floating-point math on the 68000 is a trap. The 68881/68882 FPU is optional hardware that most Amigas lack, and software floating-point emulation is roughly **50-100× slower** than fixed-point.
 **Broken:**
 ```c
 /* Don't do this — requires FPU or slow software emulation */
 float x = sin(angle) * distance;
 float y = cos(angle) * distance;
 ```
 **Fixed:**
 ```c
 /* Use fixed-point with pre-calculated tables */
 FIXED x = fixed_mul(fast_sin(angle), distance);
 FIXED y = fixed_mul(fast_cos(angle), distance);
 ```
 ### 2. The Per-Pixel Divide
 Calling `fixed_div()` for every pixel in a rotozoom or voxel renderer. Division is the most expensive operation on the 68000 (~140 cycles for 16-bit, ~280 for 32-bit).
 **Broken:**
 ```c
 for (x = 0; x < 320; x++) {
    for (y = 0; y < 256; y++) {
        FIXED u = fixed_div(x, z);  /* DIVIDE PER PIXEL! */
        FIXED v = fixed_div(y, z);
    }
 }
 ```
 **Fixed:**
 ```c
 /* Pre-compute step values once per frame, then only add per pixel */
 FIXED inv_z = fixed_div(INT_TO_FIXED(1), z);  /* One divide per frame */
 FIXED du_dx = fixed_mul(scale, inv_z);
 FIXED dv_dy = fixed_mul(scale, inv_z);
 FIXED v = v_start;
 for (y = 0; y < 256; y++) {
    FIXED u = u_start;
    for (x = 0; x < 320; x++) {
        /* ... use (u, v) here ... */
        u += du_dx;  /* ADD, not multiply/divide */
    }
    v += dv_dy;
 }
 ```
 ### 3. The Backface Cull Miss
 Skipping backface culling for convex objects. Every polygon drawn behind other polygons is wasted Blitter time. A simple dot-product test rejects ~50% of faces.
 **Broken:**
 ```c
 /* Draw all faces — 50% are facing away! */
 for (i = 0; i < num_faces; i++) {
    draw_filled_polygon(&faces[i]);  /* Wastes time on hidden faces */
 }
 ```
 **Fixed:**
 ```c
 for (i = 0; i < num_faces; i++) {
    /* Backface cull: skip faces whose normal points away from the camera.
       The sign of the test depends on winding order and axis convention. */
    FIXED nz = compute_normal_z(&faces[i]);
    if (nz < 0) continue;  /* Facing away from camera */
    draw_filled_polygon(&faces[i]);
 }
 ```
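
 A common way to implement the test without computing a full normal is to look at the winding of the already-projected vertices: the sign of the 2D cross product of two edges says whether the face is seen from the front or the back. The sketch below assumes front faces are wound so that the signed area comes out positive; that convention, and the function names, are assumptions for illustration:

 ```c
 /* Twice the signed area of the projected triangle (x0,y0),(x1,y1),(x2,y2).
    The sign gives the winding order on screen. */
 long signed_area2(int x0, int y0, int x1, int y1, int x2, int y2) {
     return (long)(x1 - x0) * (y2 - y0) - (long)(x2 - x0) * (y1 - y0);
 }

 /* Assumption: front faces have positive area. With Y growing downward
    this corresponds to clockwise winding on screen. */
 int is_front_facing(int x0, int y0, int x1, int y1, int x2, int y2) {
     return signed_area2(x0, y0, x1, y1, x2, y2) > 0;
 }
 ```

 The test costs two multiplies per face and uses only the first three projected vertices, so it can run before any Blitter setup for that face.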
 ### 4. The Unsorted Z-Fight
 Drawing faces in random order without depth sorting. Overlapping polygons flicker as they overwrite each other unpredictably each frame.
 **Broken:**
 ```c
 /* Draw faces in arbitrary order → z-fighting */
 for (i = 0; i < num_faces; i++) {
    draw_filled_polygon(&faces[i]);
 }
 ```
 **Fixed:**
 ```c
 /* Sort by average Z depth (painter's algorithm) */
 qsort(faces, num_faces, sizeof(Face), compare_faces);
 for (i = 0; i < num_faces; i++) {
    draw_filled_polygon(&faces[i]);
 }
 ```
 ### 5. The Naive C2P
 Using a naive chunky-to-planar conversion for rotozoom/voxel output. The naive method processes each pixel individually with bit shifts, taking over 1 second per frame on a stock 68000.
 **Broken:**
 ```c
 /* Naive C2P: ~70,000 pixels/sec — 0.9 FPS for 320×256 */
 for (i = 0; i < 320*256; i++) {
    int pixel = chunky[i];
    for (bit = 0; bit < 8; bit++) {
        planes[bit][i/8] |= ((pixel >> bit) & 1) << (7 - (i & 7));
    }
 }
 ```
 **Fixed:**
 ```c
 /* Use Kalms C2P or Blitter-assisted C2P: ~30× faster */
 kalms_c2p(chunky_buffer, planar_data, 320, 256);
 /* See pixel_conversion.md for full implementation */
 ```
 ---
 ## Decision Guide
 ```mermaid
 flowchart TD
    START[Need 3D rendering] --> Q1{What are you rendering?}
    Q1 -->|Solid objects| Q2{Convex or arbitrary?}
    Q1 -->|Landscapes| VOXEL[Voxel Space]
    Q1 -->|Textures| ROTZ[Rotozoom]
    Q1 -->|Abstract/Tunnel| DOT[Dot Tunnel]
    Q2 -->|Convex| Q3{Number of faces?}
    Q2 -->|Arbitrary| BSP[BSP-tree or Z-buffer approach]
    Q3 -->|<100| SIMPLE[Simple painter's algorithm<br/>+ backface cull]
    Q3 -->|100-500| SORT[Sorted faces + Blitter fill]
    Q3 -->|>500| Q4{Stock A500 or accelerated?}
    Q4 -->|Stock A500| REDUCE[Reduce geometry or<br/>use wireframe only]
    Q4 -->|Accelerated| OPT[Optimized fill + C2P pipeline]
    VOXEL --> Q5{Resolution?}
    Q5 -->|Full 320×256| SLOW[~8 FPS on stock A500]
    Q5 -->|Half 160×128| OK[~15-25 FPS on A500]
    ROTZ --> Q6{Platform?}
    Q6 -->|OCS/ECS/AGA| C2P_PATH[C2P required<br/>~15-25 FPS]
    Q6 -->|RTG| CHUNKY[Direct chunky write<br/>~50 FPS possible]
 ```
 ---
 ## Historical Timeline
 ```mermaid
 timeline
    title 3D Rendering on Amiga
    1987 : First wireframe 3D demos (line drawing only)
    1988 : Hidden line removal algorithms
         : Flat-shaded 3D in demoscene
    1989 : Phenomena — filled polygon 3D objects
         : Fixed-point math becomes standard
    1990 : Complex demo — multiple filled objects
         : First rotozoom effects appear
    1991 : Texture-mapped rotozoom at acceptable framerates
         : 3D starfields with depth sorting
    1992 : Voxel space demos appear (low resolution)
         : Dot tunnels with Blitter-optimized rendering
    1993 : Spaceballs — state-of-the-art dot tunnel
         : Higher-resolution voxel landscapes
    1994 : Rebels — smooth voxel space fly-over
         : 68040-accelerated 3D at full frame rate
    1995 : Polka Brothers — CPU raytracer (proof of concept)
         : 68060-accelerated demos with lighting
    2000+ : Demoscene pushes 3D on stock A500
          : Group-optimized rotozoom achieves new speed records
          : MiSTer preserves exact Blitter timing for fill accuracy
 ```
 ---
 ## Modern Analogies
 | Amiga 3D Concept | Modern Equivalent | Why It Maps |
 |-----------------|-------------------|-------------|
 | Fixed-point 16.16 | Half-precision float (FP16) | Both trade precision for speed |
 | Sine lookup table | GPU SFU (Special Function Unit) | Both replace a slow transcendental computation with a fast approximation |
 | Blitter fill mode | GPU rasterizer | Both fill polygon interiors |
 | Painter's algorithm | Z-buffer / depth test | Both solve polygon visibility |
 | Backface culling | GPU backface culling | Both skip invisible faces |
 | Rotozoom | Affine texture sampling | Both use 2×2 matrix transform per pixel |
 | Voxel space raycasting | Heightfield terrain shader | Both cast rays through a heightmap |
 | C2P conversion | Texture swizzle/deswizzle | Both convert between memory layouts |
 | Reciprocal table | GPU reciprocal approximation | Both avoid expensive division |
 | Chunky buffer | Render-to-texture (FBO) | Both render to off-screen buffer |
 ---
 ## Use Cases
 | Use Case | Technique | Notable Examples |
 |----------|-----------|-----------------|
 | 3D game objects | Filled polygons | Flight simulators, Elite clones |
 | Rotating logo | Rotozoom | Every demo with a bitmap logo |
 | Tunnel fly-through | Dot tunnel | Spaceballs, numerous demos |
 | Landscape fly-over | Voxel space | Rebels, numerous demos |
 | 3D chess/board games | Filled polygons + sorting | Various Amiga games |
 | Virtual reality scenes | Combined techniques | Various demo compos |
 | Star field | Z-ordered point rendering | Standard demo effect |
 | Wavy floor/ceiling | Rotozoom variant | Doom-like perspective tricks |
 ---
 ## FPGA / Emulation Impact
 | Concern | Impact | Notes |
 |---------|--------|-------|
 | **Blitter fill timing** | Fill must follow the exact inclusive/exclusive fill-carry rule | Emulators must match Blitter fill behavior precisely |
 | **Line-draw accuracy** | Blitter Bresenham must match real hardware | Affects polygon edge positions |
 | **C2P pipeline** | Chunky→Planar timing affects frame rate | Must be accounted for in demo timing |
 | **Fixed-point overflow** | 68000 `MULS.W`/`DIVS.W` edge cases (`MULS.L`/`DIVS.L` exist only on 68020+) | Overflow behavior must match hardware |
 | **Blitter-CPU interleaving** | BLTPRI affects CPU stall duration | Must match real Blitter busy-wait timing |
 ---
 ## FAQ
 **Q: Why not use the FPU for 3D math?**
 A: The 68881/68882 FPU is optional hardware that most Amiga models don't have. Software floating-point emulation is 50-100× slower than fixed-point integer math. FPUs are typically found only on 68030/040/060 accelerated Amigas, and even there fixed-point often remains competitive for simple operations because integer adds and multiplies are cheap on those CPUs.
 **Q: What is the fastest C2P for rotozoom?**
 A: The Kalms C2P is the standard. Blitter-assisted variants hand part of the bit shuffling to the Blitter so it runs in parallel with the CPU, and the CD32's Akiko chip performs the conversion in hardware. For RTG cards, C2P is unnecessary: write directly to chunky VRAM. See [Pixel Conversion](../08_graphics/pixel_conversion.md) for benchmarks.
 **Q: How do I handle polygon clipping?**
 A: For simple 3D objects, screen-space clipping (Cohen-Sutherland or Sutherland-Hodgman) is sufficient. Clip against the four screen edges. For objects that can go behind the camera, you need near-plane clipping in 3D space — this is much more complex and most demos avoid it by keeping objects in front of the camera.
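
 As a starting point for screen-space clipping, the Cohen-Sutherland outcode test can be sketched like this. The constant and function names are illustrative, and the screen size matches the 320×256 display used elsewhere in this article. Each endpoint gets one bit per screen edge it lies beyond, and two cheap bit tests decide whether a line needs clipping at all:

 ```c
 #define CLIP_LEFT   1
 #define CLIP_RIGHT  2
 #define CLIP_TOP    4
 #define CLIP_BOTTOM 8

 #define SCREEN_W 320
 #define SCREEN_H 256

 /* Cohen-Sutherland outcode: one bit per screen edge the point lies beyond */
 int outcode(int x, int y) {
     int code = 0;
     if (x < 0)              code |= CLIP_LEFT;
     else if (x >= SCREEN_W) code |= CLIP_RIGHT;
     if (y < 0)              code |= CLIP_TOP;
     else if (y >= SCREEN_H) code |= CLIP_BOTTOM;
     return code;
 }

 /* Both endpoints inside: draw without clipping */
 int trivially_accept(int c0, int c1) { return (c0 | c1) == 0; }

 /* Both endpoints beyond the same edge: skip the line entirely */
 int trivially_reject(int c0, int c1) { return (c0 & c1) != 0; }
 ```

 Only lines that are neither accepted nor rejected need the actual intersection math, which keeps the common case down to a few compares.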
 **Q: Can I do texture-mapped polygons (not just rotozoom)?**
 A: Yes, but affine texture mapping (per-polygon UV) produces visible distortion on large polygons. Correct perspective texture mapping requires per-pixel division, which is too slow on a 68000. Most demos use subdivision (split large polygons into smaller ones) or simply use rotozoom for the entire screen.
 **Q: What is a dot matrix / voxel display?**
 A: A voxel (volume pixel) display renders 3D data as a grid of points. On the Amiga, this typically means rendering heightmap terrain as vertical columns (voxel space) or rendering 3D point clouds. The Blitter's line-draw mode can efficiently render individual dots.
 ---
 ## References
 ### Related Knowledge Base Articles
 - [Pixel Conversion](../08_graphics/pixel_conversion.md) — C2P algorithms (Kalms, Blitter, Akiko)
 - [Blitter Programming](../08_graphics/blitter_programming.md) — Fill mode, line draw, minterms
 - [Blitter](../08_graphics/blitter.md) — Blitter hardware architecture
 - [Bitmap](../08_graphics/bitmap.md) — Bitplane memory layout, interleaving
 - [Copper Effects](copper_effects.md) — Copper-driven display effects
 - [Timing Optimization](timing_optimization.md) — Cycle counting, Blitter-CPU interleaving
 - [FPU Architecture](../15_fpu_mmu_cache/fpu_architecture.md) — 68881/68882 floating-point
 ### External Resources
 - **Amiga Hardware Reference Manual** — Blitter fill mode, line-draw mode
 - **Scoopex Amiga Hardware Programming** (Photon) — [YouTube playlist](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Blitter fill mode and line-draw video walkthroughs; companion articles at [coppershade.org](http://coppershade.org/)
 - **Pouet.net** — https://www.pouet.net — 3D demo releases with source code
 - **Demozoo** — https://demozoo.org — Demoscene production encyclopedia
 - **Amiga Graphics Archive** — https://amiga.lychesis.net — Copper-enhanced 3D rendering analysis in commercial games
 - **Kalms C2P** — Standard chunky-to-planar implementation
 - **Comanche Voxel Engine** — Original voxel space algorithm reference (NovaLogic)
--- a/17_demoscene/README.md
+++ b/17_demoscene/README.md
@ -0,0 +1,140 @@
 [← Home](../README.md)
 # Demoscene Techniques — Pushing the Hardware Beyond Its Limits
 ## Overview
 The Amiga demoscene is a subculture of programmers, artists, and musicians who create real-time audiovisual presentations ("demos") that push the hardware far beyond what Commodore's engineers imagined possible. From 1986 to the present day, Amiga demos have showcased techniques that later became standard in game development and graphics programming: copper bars, raster effects, sprite multiplexing, chunky pixels, rotozoom, dot tunnels, and more.
 This section documents the **hardware techniques** that demoscene coders invented or perfected on the Amiga — techniques that are essential knowledge for anyone reverse-engineering games, writing emulators, or building FPGA implementations.
 > **Learning path:** For a guided, video-based introduction to these techniques, see the **[Scoopex Amiga Hardware Programming](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW)** series by Photon (ScoopexUs) on YouTube — a comprehensive walkthrough of Copper, Blitter, sprites, and hardware banging in 68k assembly, with companion articles at [coppershade.org](http://coppershade.org/).
 ```mermaid
 graph TB
    subgraph "Display Techniques"
        CB["Copper Bars<br/>Color register cycling"]
        RE["Raster Effects<br/>Mid-frame register changes"]
        PL["Parallax Scrolling<br/>Dual playfield / scroll registers"]
    end
    subgraph "Sprite Techniques"
        SM["Sprite Multiplexing<br/>Reuse sprites across scanlines"]
        SB["Sprite-Built Images<br/>Sprites as pixel data"]
    end
    subgraph "Pixel Techniques"
        CC["Copper Chunky<br/>Copper-driven pseudo-chunky"]
        C2P["Chunky to Planar<br/>Fast C2P conversion"]
        HAM["HAM Art<br/>HAM-6/8 for photorealism"]
    end
    subgraph "3D Techniques"
        FL["Filled Polygons<br/>Blitter fill mode"]
        RT["Rotozoom<br/>Affine texture transform"]
        DT["Dot Tunnel<br/>Z-ordered particle tunnel"]
        VO["Voxel Space<br/>Raycast heightmap"]
    end
    style CB fill:#e8f4fd,stroke:#2196f3,color:#333
    style SM fill:#fff3e0,stroke:#ff9800,color:#333
    style CC fill:#f3e5f5,stroke:#9c27b0,color:#333
    style RT fill:#e8f5e9,stroke:#4caf50,color:#333
 ```
 ## Section Index
 | File | Description |
 |------|-------------|
 | [copper_effects.md](copper_effects.md) | Copper bars, raster splits, mid-frame register changes, gradient shading, sine-based color cycling |
 | [sprite_techniques.md](sprite_techniques.md) | Sprite multiplexing, sprite-built images, 15-color attached sprites, sprite-BLT interaction |
 | [pixel_tricks.md](pixel_tricks.md) | Copper chunky, HAM art, scroll-register tricks, modulo-based wrapping, bobs-vs-sprites |
 | [3d_rendering.md](3d_rendering.md) | Fixed-point 3D math, Blitter-filled polygons, rotozoom, dot tunnels, voxel space, matrix operations |
 | [timing_optimization.md](timing_optimization.md) | Cycle counting, blitter-CPU interleaving, copper-wait placement, memory access patterns, self-modifying code |
 ---
 ## The Demoscene & Hardware — Symbiosis
 ```mermaid
 sequenceDiagram
    participant Copper as Copper Coprocessor
    participant Blitter as Blitter DMA
    participant Sprites as Sprite DMA
    participant Audio as Paula Audio DMA
    participant CPU as 68000 CPU
    Note over Copper,CPU: Single Frame (PAL: 312 scanlines, ~20ms)
    par Scanline 0–50 (Top Border)
        Copper->>Copper: WAIT y=0, set BPLCON0
        CPU->>CPU: Calculate 3D vertices
    and Scanline 50–250 (Display Area)
        Copper->>Copper: Change colors per scanline (copper bars)
        Blitter->>Blitter: Fill polygon (line-draw + fill mode)
        Sprites->>Sprites: Display multiplexed sprite data
        Audio->>Audio: Stream 4-channel MOD samples
    and Scanline 250–312 (Bottom Border + VBlank)
        Copper->>Copper: WAIT for VBlank
        CPU->>CPU: Update scroll registers, prepare next frame
    end
 ```
 The key insight of demoscene coding: **the Copper, Blitter, Sprites, and Audio all run via DMA alongside the CPU**. A single frame has the Copper changing display parameters, the Blitter filling polygons, sprites being displayed, audio streaming — all while the CPU computes the next frame's geometry. This parallelism is what made the Amiga unique and what demoscene coders exploited to the absolute limit.
 ### DMA Budget Per Scanline
 | Resource | DMA Slots per Scanline (LoRes) | Used For |
 |----------|-------------------------------|----------|
 | Bitplanes (4 planes, 320 pixels wide) | 80 of 226 | Display pixel data |
 | Sprites (8 sprites) | 16 of 226 | Sprite data fetch |
 | Copper | Varies (2 per `MOVE`) | Copper instruction execution |
 | Blitter | 0–226 of 226 (shared) | Copy/fill/line operations |
 | Audio (4 channels) | 4 of 226 | Sample data fetch |
 | CPU | Remaining slots | Computation |
 > [!NOTE]
 > The Blitter and CPU share the same bus cycles. When the Blitter is running, the CPU gets fewer cycles. The `BLTPRI` bit gives the Blitter priority over the CPU entirely — "Blitter Nasty" mode. Demoscene coders use this to time operations precisely.
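
 As a small illustration of the note above, `BLTPRI` is bit 10 of `DMACON` (`$DFF096`), and bit 15 of the written word selects whether the listed bits are set or cleared. The sketch below only computes the word to write, so it can be checked on any host; the function name is hypothetical, while the two constants follow the names used in the NDK's `hardware/dmabits.h`:

 ```c
 #include <stdint.h>

 #define DMAF_SETCLR  0x8000u   /* bit 15: 1 = set the listed bits, 0 = clear them */
 #define DMAF_BLITHOG 0x0400u   /* bit 10: BLTPRI, "Blitter nasty" */

 /* Value to write to DMACON ($DFF096) to switch Blitter priority on or off */
 uint16_t dmacon_blitter_nasty(int enable) {
     return (uint16_t)((enable ? DMAF_SETCLR : 0u) | DMAF_BLITHOG);
 }
 ```

 Writing `$8400` turns Blitter priority on and `$0400` turns it off again; other DMA enable bits are unaffected because only the bits named in the word change.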
 ---
 ## Famous Demo Effects & Hardware Techniques
 | Effect | Hardware Used | First Seen | How It Works |
 |--------|-------------|------------|-------------|
 | **Copper Bars** | Copper (color registers) | 1987 ([Scoopex](https://www.pouet.net/prod.php?which=5832)) | Copper writes `COLORxx` registers at different Y positions, creating horizontal color bands |
 | **Raster Bars** | Copper (BPLCON0) | 1988 ([Red Sector](https://www.pouet.net/prod.php?which=3119)) | Same as copper bars, but also changes bitplane depth/resolution mid-frame for split-screen |
 | **Scrolling Sinus** | Copper (scroll registers) | 1988 ([Red Sector](https://www.pouet.net/prod.php?which=3119)) | Per-scanline `BPLCON1` scroll changes (often combined with `BPLxMOD`) create a sinusoidal wave distortion |
 | **Copper Master** | Copper (all registers) | 1990 ([Angels](https://www.pouet.net/prod.php?which=3422)) | Ultimate copper showcase: bars, gradients, chunky, sine effects |
 | **Sprite Multiplexing** | Sprites + Copper | 1989 ([Kefrens](https://demozoo.org/groups/658/)) | Copper repositions sprites at different Y positions to display > 8 sprites on screen |
 | **Parallax Scrolling** | Dual playfield + scroll regs | 1989 | Two independent bitplane layers scroll at different rates |
 | **Copper Chunky** | Copper (color registers only) | 1993 ([Sanity Arte](https://www.pouet.net/prod.php?which=1477)) | No bitplanes at all: the Copper rewrites `COLOR00` at successive horizontal positions to create a chunky-pixel display |
 | **Filled Vectors** | Blitter (line + fill) | 1991 ([Phenomena Enigma](https://www.pouet.net/prod.php?which=394)) | Blitter draws polygon edges, then fill mode paints the interior |
 | **Rotozoom** | CPU math + Blitter copy | 1991 (Complex) | Affine texture transform rendered line-by-line, Blitter copies to bitplanes |
 | **Dot Tunnel** | CPU + Blitter | 1993 ([Spaceballs](https://www.pouet.net/prod.php?which=56651)) | Z-ordered circles rendered with Blitter circles, creating a fly-through tunnel |
 | **Voxel Space** | CPU + blitter | 1994 (Rebels) | Raycast heightmap with 1-pixel-per-column rendering |
 | **Raytracing** | CPU only | 1995 ([Polka Brothers](https://www.pouet.net/prod.php?which=702)) | Pure CPU raytracer at 0.5 FPS — proof of concept |
 ---
 ## References
 ### Related Knowledge Base Articles
 - [Copper](../08_graphics/copper.md) — Copper coprocessor hardware: instruction format, UCopList
 - [Copper Programming](../08_graphics/copper_programming.md) — Building copper lists, gradients, raster effects
 - [Blitter](../08_graphics/blitter.md) — Blitter DMA engine: channels, minterms
 - [Blitter Programming](../08_graphics/blitter_programming.md) — Cookie-cut masking, line draw, fill mode
 - [Sprites](../08_graphics/sprites.md) — Hardware sprites: DMA, attached sprites, multiplexing
 - [Pixel Conversion](../08_graphics/pixel_conversion.md) — Chunky↔Planar: Kalms, Copper Chunky, Akiko
 - [Video Timing](../01_hardware/common/video_timing.md) — Scanline anatomy, beam counters, per-frame budgets
 - [DMA Architecture](../01_hardware/common/dma_architecture.md) — DMA slot allocation, bus arbitration
 ### External Resources
 - **Pouet.net**: https://www.pouet.net — Demo release database
 - **Demozoo**: https://demozoo.org — Demoscene encyclopedia
 - **Amiga Demoscene Archive**: https://amigademoscene.com
 - **Amiga Graphics Archive**: https://amiga.lychesis.net — Forensic analysis of copper lists, palette tricks, and sprite usage in commercial games
 - **Scoopex Amiga Hardware Programming** (Photon): [YouTube playlist](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Comprehensive video tutorial series covering Copper, Blitter, sprites, and hardware banging in 68k assembly. Companion site: [coppershade.org](http://coppershade.org/)
 - **Copper Demon** (technik): Copper bar tutorial with source code
--- a/17_demoscene/copper_effects.md
+++ b/17_demoscene/copper_effects.md
@ -0,0 +1,851 @@
 [← Home](../README.md) · [Demoscene Techniques](README.md)
 # Copper Effects — Bars, Raster Splits, Gradients, and Sine Cycling
 ## Overview
 The Copper is the single most important tool in the demoscene coder's arsenal. With only three instructions — `WAIT`, `MOVE`, and `SKIP` — it can repaint the entire screen 50 times per second, changing color registers, bitplane pointers, scroll offsets, and sprite positions at exact scanline boundaries. Every iconic Amiga demo effect, from the rainbow copper bars in [Red Sector's **Megademo**](https://www.pouet.net/prod.php?which=3119) (1989) to the sinus-scrolling message waves in [**Desert Dream**](https://www.pouet.net/prod.php?which=1483) (1993, [Demozoo](https://demozoo.org/productions/142/)), traces back to someone figuring out how to make the Copper do something Commodore's engineers never intended.
 This article covers the specific techniques demoscene coders developed for the Copper: classic copper bars, raster splits for multi-resolution screens, gradient shading, sine-based color cycling, and advanced tricks like copper-generated chunky pixels and mid-frame copper list swaps. For the Copper's hardware architecture and basic programming model, see [Copper](../08_graphics/copper.md) and [Copper Programming](../08_graphics/copper_programming.md).
 ```mermaid
 graph TB
    subgraph "Basic"
        CB["Copper Bars<br/>Color register writes per scanline"]
        GS["Gradient Shading<br/>Smooth color ramps"]
    end
    subgraph "Intermediate"
        RS["Raster Splits<br/>Mid-frame register changes"]
        SIN["Sine Cycling<br/>Table-driven color waves"]
    end
    subgraph "Advanced"
        CC["Copper Chunky<br/>Color register = pixel"]
        DB["Double-Buffer Lists<br/>Mid-frame copper swap"]
        SM["Self-Modifying Lists<br/>Runtime copper patching"]
    end
    CB --> RS
    GS --> SIN
    RS --> CC
    SIN --> SM
    CC --> DB
 ```
 ## Demo Screenshots
 The following screenshots from [Pouet.net](https://www.pouet.net) show copper effects in landmark demoscene productions. Each captures a single frame of effects that are typically animated at 50 Hz.
 | Screenshot | Demo | Year | Copper Technique |
 |:---:|:---:|:---:|:---|
 | ![Scoopex Megademo — copper bars](images/scoopex_megademo.gif) | [Scoopex Megademo](https://www.pouet.net/prod.php?which=5832) | 1987 | First copper bars |
 | ![Red Sector Megademo — sine scroll and copper bars](images/redsector_megademo.gif) | [Red Sector Megademo](https://www.pouet.net/prod.php?which=3119) | 1989 | Sine-scrolling text, raster splits |
 | ![Budbrain Megademo — copper bars with vectorbobs](images/budbrain_megademo.gif) | [Budbrain Megademo](https://www.pouet.net/prod.php?which=1290) | 1990 | Copper bars + vectorbobs |
 | ![Angels Copper Master — the ultimate copper showcase](images/copper_master.jpg) | [Copper Master](https://www.pouet.net/prod.php?which=3422) | 1990 | Ultimate copper showcase |
 | ![Phenomena Enigma — copper-driven effects with filled vectors](images/phenomena_enigma.png) | [Enigma](https://www.pouet.net/prod.php?which=394) | 1991 | Copper + filled vectors |
 | ![The Silents Xpose — copper bars extravaganza](images/xpose.gif) | [Xpose](https://www.pouet.net/prod.php?which=4031) | 1992 | Copper bars extravaganza |
 | ![Melon Dezign Human Target — smooth copper gradients](images/human_target.jpg) | [Human Target](https://www.pouet.net/prod.php?which=3459) | 1992 | Smooth copper gradients |
 | ![Sanity Arte — copper chunky full-screen](images/sanity_arte.gif) | [Arte](https://www.pouet.net/prod.php?which=1477) | 1993 | Copper chunky (COLOR01 per pixel) |
 | ![Kefrens Desert Dream — copper parallax and sine scroll](images/desert_dream.gif) | [Desert Dream](https://www.pouet.net/prod.php?which=1483) | 1993 | Copper parallax + sine scroll |
 | ![Polka Brothers Friday at Eight — combined copper/Blitter effects](images/friday_at_eight.jpg) | [Friday at Eight](https://www.pouet.net/prod.php?which=702) | 1995 | Combined copper/Blitter |
 ---
 ## Hardware Foundation
 ### What the Copper Can Write To
 The Copper writes to the custom chip register space (`$DFF000–$DFF1FF`). For demo effects, the most important targets are:
 | Register | Address | Effect | Used For |
 |----------|---------|--------|----------|
 | `COLOR00` | `$DFF180` | Background color | Copper bars, gradients |
 | `COLOR01`–`COLOR31` | `$DFF182–$DFF1BE` | Palette colors | Gradient fills, chunky pixels |
 | `BPLCON0` | `$DFF100` | Bitplane depth/resolution | Raster splits, resolution mixing |
 | `BPLCON1` | `$DFF102` | Horizontal scroll offset | Wave distortion, parallax |
 | `BPL1MOD`/`BPL2MOD` | `$DFF108`/`$DFF10A` | Bitplane modulo | Sine-wave distortion |
 | `BPL1PTH`–`BPL6PTL` | `$DFF0E0–$DFF0F6` | Bitplane pointers (high/low word pairs) | Screen splitting, page flipping |
 | `SPR0PTH`–`SPR7PTL` | `$DFF120–$DFF13E` | Sprite data pointers | Sprite multiplexing |
 | `DIWSTRT`/`DIWSTOP` | `$DFF08E`/`$DFF090` | Display window | Overscan, split display |
 | `DDFSTRT`/`DDFSTOP` | `$DFF092`/`$DFF094` | Data fetch window | Resolution changes |
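
 The register offsets in this table are what a `MOVE` carries in its first instruction word. As a minimal host-side sketch (the helper names `cop_move` and `cop_wait` are illustrative and not part of any Amiga API), the two instruction words can be built as follows. It assumes the standard OCS encoding: a `MOVE` has bit 0 of its first word clear, while a `WAIT` has it set and normally uses `$FFFE` as its compare mask. Each helper returns the advanced write pointer so calls can be chained while building a list.

 ```c
 #include <stdint.h>
 /* Hypothetical helpers for building copper lists on a host machine. */
 /* MOVE: word 0 = register offset from $DFF000 (even, bit 0 clear), word 1 = data. */
 static uint16_t *cop_move(uint16_t *p, uint16_t reg_offset, uint16_t value) {
    *p++ = (uint16_t)(reg_offset & 0x01FE);
    *p++ = value;
    return p;
 }
 /* WAIT: word 0 = line in bits 15-8, horizontal position in bits 7-1, bit 0 set. */
 /* Word 1 = compare mask ($FFFE compares every position bit). */
 static uint16_t *cop_wait(uint16_t *p, unsigned line, unsigned hpos) {
    *p++ = (uint16_t)(((line & 0xFF) << 8) | (hpos & 0xFE) | 0x0001);
    *p++ = 0xFFFE;
    return p;
 }
 ```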
 ### Copper Instruction Timing
 | Instruction | Words | Copper Fetches | Notes |
 |-------------|-------|------------|-------|
 | `WAIT` | 2 | 2 | Stalls until beam reaches position; resuming afterwards costs a little extra time |
 | `MOVE` | 2 | 2 | Writes a value to a register |
 | `SKIP` | 2 | 2 | Conditional skip of next instruction (present on all chipsets, not AGA-specific) |
 | `WAIT + MOVE` | 4 | 4 | The basic unit of most effects |
 > [!IMPORTANT]
 > The Copper can use only every other memory cycle, so a `MOVE` (2 fetches) takes 4 color clocks, or 8 LoRes pixels. A scanline has about 226 usable color clocks, which puts the ceiling at roughly **56 color register writes per scanline** for back-to-back `MOVE`s. Deep displays (5-6 LoRes planes, 4 HiRes planes) take cycles away from the Copper and lower that figure.
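
 The per-scanline budget can be estimated with simple arithmetic. The sketch below is a rough model and not a cycle-exact one: it assumes about 226 usable color clocks per line and 4 color clocks per back-to-back `MOVE`, and it treats cycles lost to bitplane DMA as a single number supplied by the caller. The names are illustrative.

 ```c
 #include <stdint.h>
 enum {
    USABLE_CLOCKS_PER_LINE = 226,  /* assumption: usable color clocks per scanline */
    CLOCKS_PER_MOVE = 4,           /* 2 fetches, Copper gets every other cycle */
    LORES_PIXELS_PER_CLOCK = 2
 };
 /* Upper bound on back-to-back MOVEs in one scanline. */
 static int max_moves_per_line(int clocks_lost_to_dma) {
    int free_clocks = USABLE_CLOCKS_PER_LINE - clocks_lost_to_dma;
    return free_clocks > 0 ? free_clocks / CLOCKS_PER_MOVE : 0;
 }
 /* Horizontal distance, in LoRes pixels, between consecutive MOVEs. */
 static int lores_pixels_per_move(void) {
    return CLOCKS_PER_MOVE * LORES_PIXELS_PER_CLOCK;
 }
 ```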
 ---
 ## Technique 1: Copper Bars
 The classic Amiga demo effect. Copper bars are horizontal bands of color created by writing different values to `COLOR00` (or any color register) at each scanline. The result is a series of colored stripes across the screen.
 ### How It Works
 ```mermaid
 sequenceDiagram
    participant Beam as Video Beam
    participant Copper as Copper
    participant CR as COLOR00 $DFF180
    Note over Beam: Scanline 50
    Beam->>Copper: Beam reaches (50,0)
    Copper->>CR: MOVE #$0008 → dark blue
    Note over Beam: Scanline 51
    Beam->>Copper: Beam reaches (51,0)
    Copper->>CR: MOVE #$00FF → cyan
    Note over Beam: Scanline 52
    Beam->>Copper: Beam reaches (52,0)
    Copper->>CR: MOVE #$0FFF → white
    Note over Beam: Scanline 53
    Beam->>Copper: Beam reaches (53,0)
    Copper->>CR: MOVE #$00FF → cyan
    Note over Beam: Scanline 54
    Beam->>Copper: Beam reaches (54,0)
    Copper->>CR: MOVE #$0008 → dark blue
 ```
 ### Complete Copper Bar Example
 > ![Angels Copper Master — the ultimate copper bar showcase](images/copper_master.jpg)
 > 
 > *Angels' Copper Master (1990) — the definitive copper bar demonstration, showing dozens of simultaneous color bands.*
 ```asm
 ; copper_bars.asm: Classic copper bars (OCS/ECS)
 ; Assembles with: vasmm68k_mot -Fbin -o copper_bars.bin copper_bars.asm
 ; WAIT encoding: first word = (line<<8)|(hpos&$FE)|1, second word = $FFFE mask
 COPPER_START:
        ; ---- Wait for scanline 30 (top of display) ----
        dc.w    $1E01,$FFFE           ; WAIT y=30 ($1E), x=0
        ; ---- Bar 1: warm colors (lines 50-56) ----
        dc.w    $3201,$FFFE           ; WAIT line 50
        dc.w    $0180,$0200           ; MOVE #$0200 → COLOR00 (dark red)
        dc.w    $3301,$FFFE           ; WAIT line 51
        dc.w    $0180,$0644           ; MOVE #$0644 → COLOR00 (red)
        dc.w    $3401,$FFFE           ; WAIT line 52
        dc.w    $0180,$0F88           ; MOVE #$0F88 → COLOR00 (light red)
        dc.w    $3501,$FFFE           ; WAIT line 53
        dc.w    $0180,$0FFF           ; MOVE #$0FFF → COLOR00 (white)
        dc.w    $3601,$FFFE           ; WAIT line 54
        dc.w    $0180,$0F88           ; MOVE #$0F88 → COLOR00 (light red)
        dc.w    $3701,$FFFE           ; WAIT line 55
        dc.w    $0180,$0644           ; MOVE #$0644 → COLOR00 (red)
        dc.w    $3801,$FFFE           ; WAIT line 56
        dc.w    $0180,$0200           ; MOVE #$0200 → COLOR00 (dark red)
        ; ---- Bar 2: cool colors (lines 80-86) ----
        dc.w    $5001,$FFFE           ; WAIT line 80
        dc.w    $0180,$0002           ; MOVE #$0002 → COLOR00 (dark blue)
        dc.w    $5101,$FFFE           ; WAIT line 81
        dc.w    $0180,$0446           ; MOVE #$0446 → COLOR00 (slate blue)
        dc.w    $5201,$FFFE           ; WAIT line 82
        dc.w    $0180,$088F           ; MOVE #$088F → COLOR00 (light blue)
        dc.w    $5301,$FFFE           ; WAIT line 83
        dc.w    $0180,$0FFF           ; MOVE #$0FFF → COLOR00 (white)
        dc.w    $5401,$FFFE           ; WAIT line 84
        dc.w    $0180,$088F           ; MOVE #$088F → COLOR00 (light blue)
        dc.w    $5501,$FFFE           ; WAIT line 85
        dc.w    $0180,$0446           ; MOVE #$0446 → COLOR00 (slate blue)
        dc.w    $5601,$FFFE           ; WAIT line 86
        dc.w    $0180,$0002           ; MOVE #$0002 → COLOR00 (dark blue)
        ; ---- Clear background ----
        dc.w    $6001,$FFFE           ; WAIT line 96
        dc.w    $0180,$0000           ; MOVE #$0000 → COLOR00 (black)
        ; ---- End of copper list ----
        dc.w    $FFFF,$FFFE           ; WAIT that never matches (end marker)
 ```
 ### Sine-Animated Copper Bars
 > ![The Silents Xpose — animated copper bars extravaganza](images/xpose.gif)
 > 
 > *The Silents' Xpose (1992) — animated copper bars driven by sine tables with multiple phase offsets.*
 Static bars are boring. The demoscene animates them by updating the copper list's color values each frame from a pre-calculated sine table:
 ```c
 /* sine_copper.c — Animate copper bars with sine-wave colors */
 #include <exec/types.h>
 #include <graphics/gfxbase.h>
 /* OCS color format: 0RGB, 4 bits per component */
 #define RGB(r,g,b)  ((UWORD)(((r)<<8)|((g)<<4)|(b)))
 /* Pre-calculated sine table (256 entries, 0-255 range) */
 extern const UBYTE sine_table[256];
 /* Copper bar definitions — 3 bars, each 9 scanlines */
 #define NUM_BARS     3
 #define BAR_HEIGHT   9
 #define BAR_SPACING  30
 #define FIRST_LINE   50
 /* Color gradients for each bar (symmetric: dark→bright→dark) */
 static const UWORD bar_gradient[BAR_HEIGHT] = {
    RGB(1,0,0), RGB(2,1,0), RGB(4,2,1),
    RGB(8,4,2), RGB(15,8,4),  /* peak */
    RGB(8,4,2), RGB(4,2,1), RGB(2,1,0), RGB(1,0,0)
 };
 /* Base hue offsets for each bar (r,g,b component weights) */
 static const UWORD hue_r[NUM_BARS] = { 15, 0, 0 };
 static const UWORD hue_g[NUM_BARS] = { 4, 12, 4 };
 static const UWORD hue_b[NUM_BARS] = { 0, 0, 15 };
 void update_copper_bars(UWORD *copper_ptr, ULONG frame) {
    int bar, line;
    for (bar = 0; bar < NUM_BARS; bar++) {
        /* Phase offset per bar — creates wave motion */
        int phase = (frame * 3 + bar * 85) & 0xFF;
        int brightness = sine_table[phase]; /* 0-255 */
        for (line = 0; line < BAR_HEIGHT; line++) {
            UWORD *wait_ptr = copper_ptr;      /* WAIT instruction */
            UWORD *move_ptr = copper_ptr + 2;  /* MOVE instruction */
            UWORD grad = bar_gradient[line];
            /* Scale gradient by sine brightness */
            int r = ((grad >> 8) & 0xF) * hue_r[bar] * brightness / (15 * 255);
            int g = ((grad >> 4) & 0xF) * hue_g[bar] * brightness / (15 * 255);
            int b = (grad & 0xF) * hue_b[bar] * brightness / (15 * 255);
            /* Clamp to 0-15 */
            if (r > 15) r = 15;
            if (g > 15) g = 15;
            if (b > 15) b = 15;
            move_ptr[1] = RGB(r, g, b);  /* Patch the MOVE data word */
            copper_ptr += 4;              /* Advance past WAIT+MOVE pair */
        }
    }
 }
 ```
 ---
 ## Technique 2: Raster Splits
 A raster split changes display parameters mid-frame. The most common use is a **status bar** at the top of the screen (fixed resolution) with a scrolling game area below (different bitplane pointers, scroll offset, or even resolution).
 ### Split-Screen Architecture
 > ![Budbrain Megademo — raster split with copper bars](images/budbrain_megademo.gif)
 > 
 > *Budbrain Megademo (1990) — copper bars and raster splits used to create multiple display regions.*
 ```mermaid
 graph TB
    subgraph "Frame Layout"
        TOP["Lines 0-49:<br/>Title Bar (HiRes, 4 bitplanes)"]
        GAME["Lines 50-249:<br/>Game Area (LoRes, 5 bitplanes, scrolling)"]
        BOT["Lines 250-311:<br/>Bottom Border / VBlank"]
    end
    subgraph "Copper Actions"
        C1["Copper at line 0:<br/>Set BPLCON0=$C200 (HiRes 4bpp)"]
        C2["Copper at line 50:<br/>Set BPLCON0=$5200 (LoRes 5bpp)<br/>Set BPLCON1 scroll<br/>Swap bitplane pointers"]
        C3["Copper at line 250:<br/>Kill display (DIWSTOP)"]
    end
    C1 --> TOP
    C2 --> GAME
    C3 --> BOT
 ```
 ### Split-Screen Copper List
 ```asm
 ; raster_split.asm: Status bar + scrolling game area
 ; Top 44 lines: HiRes 4-bitplane title bar
 ; Bottom 200 lines: LoRes 5-bitplane scrolling game
 ; Pointer words use >>16 and &$FFFF, which needs absolute addresses at
 ; assembly time; in relocatable code, store 0 and patch them from the CPU.
 COPPER_SPLIT:
        ; ---- Top section: HiRes title bar ----
        dc.w    $0101,$FFFE           ; WAIT line 1
        dc.w    $0100,$C200           ; BPLCON0: HiRes, 4 bitplanes, color on
        ; Set bitplane pointers for title bar image
        dc.w    $00E0,title_bpl1>>16      ; BPL1PTH (high word)
        dc.w    $00E2,title_bpl1&$FFFF    ; BPL1PTL (low word)
        dc.w    $00E4,title_bpl2>>16
        dc.w    $00E6,title_bpl2&$FFFF
        dc.w    $00E8,title_bpl3>>16
        dc.w    $00EA,title_bpl3&$FFFF
        dc.w    $00EC,title_bpl4>>16
        dc.w    $00EE,title_bpl4&$FFFF
        dc.w    $0108,$0000           ; BPL1MOD = 0 (no modulo for HiRes)
        dc.w    $010A,$0000           ; BPL2MOD = 0
        ; ---- Split point: switch to game area ----
        dc.w    $2C01,$FFFE           ; WAIT line 44
        dc.w    $0100,$5200           ; BPLCON0: LoRes, 5 bitplanes, color on
        ; Swap bitplane pointers to game bitmap
        dc.w    $00E0,game_bpl1>>16
        dc.w    $00E2,game_bpl1&$FFFF
        dc.w    $00E4,game_bpl2>>16
        dc.w    $00E6,game_bpl2&$FFFF
        dc.w    $00E8,game_bpl3>>16
        dc.w    $00EA,game_bpl3&$FFFF
        dc.w    $00EC,game_bpl4>>16
        dc.w    $00EE,game_bpl4&$FFFF
        dc.w    $00F0,game_bpl5>>16
        dc.w    $00F2,game_bpl5&$FFFF
        dc.w    $0102,$0000           ; BPLCON1: scroll = 0 (updated per frame)
        dc.w    $0108,$0000           ; BPL1MOD (updated per frame)
        dc.w    $010A,$0000           ; BPL2MOD
        ; ---- End of display ----
        dc.w    $FFFF,$FFFE           ; WAIT that never matches (end marker)
 ```
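 The `BPLCON0` values and pointer word pairs in a split list follow directly from the register layout, so they can be computed and need not be hand-encoded. This is a minimal host-side sketch with hypothetical helper names (`bplcon0_value`, `emit_bpl_pointer`). It assumes the OCS/ECS `BPLCON0` layout: bit 15 selects HiRes, bits 14-12 hold the bitplane count, and bit 9 enables the color burst output.

 ```c
 #include <stdint.h>
 /* BPLCON0 (OCS/ECS): bit 15 = HIRES, bits 14-12 = plane count, bit 9 = COLOR. */
 static uint16_t bplcon0_value(int hires, int planes) {
    return (uint16_t)((hires ? 0x8000 : 0) | ((planes & 7) << 12) | 0x0200);
 }
 /* Emit the two MOVEs that load one bitplane pointer. */
 /* pth_offset is the high-word register, e.g. 0x0E0 for BPL1PTH. */
 static uint16_t *emit_bpl_pointer(uint16_t *p, uint16_t pth_offset,
                                   uint32_t chip_addr) {
    *p++ = pth_offset;
    *p++ = (uint16_t)(chip_addr >> 16);      /* high word */
    *p++ = (uint16_t)(pth_offset + 2);
    *p++ = (uint16_t)(chip_addr & 0xFFFF);   /* low word */
    return p;
 }
 ```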
 ---
 ## Technique 3: Gradient Shading
 Gradient shading creates smooth color transitions, the signature "Amiga sky" effect. The technique writes a slightly different value to `COLOR00` (or whichever color register paints the sky) at each scanline, creating a smooth ramp from one color to another.
 ### Linear Color Interpolation
 > ![Kefrens Desert Dream — copper-driven gradient sky](images/desert_dream.gif)
 >
 > *Kefrens' Desert Dream (1993) — smooth copper gradients creating the parallax sky effect.*
 OCS colors are 4-bit per component (0–15). To create a gradient from color A to color B over N scanlines, interpolate each component independently:
 ```c
 /* gradient.c — Compute copper list for a sky gradient */
 #include <exec/types.h>   /* UWORD */
 #define RGB(r,g,b)  ((UWORD)(((r)<<8)|((g)<<4)|(b)))
 void make_gradient_sky(UWORD *copper, int first_line, int num_lines,
                       UWORD color_top, UWORD color_bot) {
    int y;
    int r1 = (color_top >> 8) & 0xF, r2 = (color_bot >> 8) & 0xF;
    int g1 = (color_top >> 4) & 0xF, g2 = (color_bot >> 4) & 0xF;
    int b1 =  color_top       & 0xF, b2 =  color_bot       & 0xF;
    for (y = 0; y < num_lines; y++) {
        /* Linear interpolation with rounding */
        int r = r1 + ((r2 - r1) * y + (num_lines / 2)) / num_lines;
        int g = g1 + ((g2 - g1) * y + (num_lines / 2)) / num_lines;
        int b = b1 + ((b2 - b1) * y + (num_lines / 2)) / num_lines;
        /* WAIT for target line (valid for lines 0-255), then MOVE color */
        *copper++ = (UWORD)((((first_line + y) & 0xFF) << 8) | 0x0001);  /* WAIT: line in bits 15-8, bit 0 set */
        *copper++ = 0xFFFE;                              /* compare mask */
        *copper++ = 0x0180;                              /* MOVE → COLOR00 */
        *copper++ = RGB(r, g, b);
    }
 }
 ```
 ### Typical Gradient Ramps
 | Effect | Top Color | Bottom Color | Lines | Description |
 |--------|-----------|-------------|-------|-------------|
 | Sunset sky | `$0F44` (light red) | `$0202` (dark purple) | 200 | Warm→cool transition |
 | Deep ocean | `$0448` (slate blue) | `$0002` (navy) | 150 | Light→dark depth |
 | Metallic bar | `$0888` (gray) | `$0FFF` (white) | 10 | Specular highlight |
 | Dawn | `$0000` (black) | `$0F80` (orange) | 100 | Night→sunrise |
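
 With only 16 levels per component, a long ramp shows visible bands. A rough way to estimate band height is to divide the number of lines by the largest per-channel change. The sketch below does that for `$0RGB` colors; the function names are illustrative, and the result is an approximation because the three channels step at different lines.

 ```c
 #include <stdint.h>
 #include <stdlib.h>
 /* Absolute difference of one 4-bit channel between two $0RGB colors. */
 static int channel_delta(uint16_t a, uint16_t b, int shift) {
    return abs((int)((a >> shift) & 0xF) - (int)((b >> shift) & 0xF));
 }
 /* Largest per-channel change between two colors (0-15). */
 static int max_channel_delta(uint16_t top, uint16_t bot) {
    int d = channel_delta(top, bot, 8);
    int g = channel_delta(top, bot, 4);
    int b = channel_delta(top, bot, 0);
    if (g > d) d = g;
    if (b > d) d = b;
    return d;
 }
 /* Approximate band height in scanlines; 0 means the color never changes. */
 static int lines_per_step(uint16_t top, uint16_t bot, int num_lines) {
    int d = max_channel_delta(top, bot);
    return d ? num_lines / d : 0;
 }
 ```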
 ---
 ## Technique 4: Sine-Based Color Cycling
 The demoscene rarely uses static gradients. Instead, color values are driven by sine tables with different phase offsets, creating fluid wave effects. The key insight: **use multiple sine waves with different frequencies and phases**, then combine them.
 ### Sine Table Generation
 ```c
 /* sine_gen.c — Generate a 256-entry sine table (0-255 range) */
 /* In practice, this is pre-computed at build time */
 #include <exec/types.h>   /* UBYTE */
 #include <math.h>
 void generate_sine_table(UBYTE *table) {
    int i;
    for (i = 0; i < 256; i++) {
        double s = sin(2.0 * 3.14159265 * i / 256.0);
        table[i] = (UBYTE)((s + 1.0) * 127.5);  /* 0-255 range */
    }
 }
 ```
 ### Multi-Wave Color Cycling
 > ![Melon Dezign Human Target — smooth copper gradients](images/human_target.jpg)
 >
 > *Melon Dezign's Human Target (1992) — silky smooth copper gradients driven by sine waves.*
 The classic demoscene effect cycles three sine waves (one per RGB component) with different frequencies:
 ```c
 /* color_cycle.c — Animate copper bar colors with multi-wave sine */
 void animate_color_cycle(UWORD *copper_colors, int num_entries,
                         ULONG frame) {
    int i;
    for (i = 0; i < num_entries; i++) {
        /* Three sine waves at different rates: R slowest, B fastest */
        int phase_r = (frame * 2 + i * 3) & 0xFF;
        int phase_g = (frame * 3 + i * 5) & 0xFF;
        int phase_b = (frame * 5 + i * 7) & 0xFF;
        int r = sine_table[phase_r] >> 4;  /* 0-15 */
        int g = sine_table[phase_g] >> 4;
        int b = sine_table[phase_b] >> 4;
        copper_colors[i] = RGB(r, g, b);
    }
 }
 ```
 > ![Phenomena Enigma — copper-driven water/plasma effect](images/phenomena_enigma.png)
 >
 > *Phenomena's Enigma (1991) — multi-wave sine copper cycling creating a water/plasma effect, combined with filled-vector rendering.*
 ### Sine Scrolling (Scrolling Sinus)
 One of the most iconic effects: a text message that scrolls across the screen in a sine wave pattern. This is achieved by changing `BPLCON1` (scroll offset) or `BPLxMOD` (modulo) per scanline:
 ```asm
 ; sine_scroll.asm: Per-scanline BPLCON1 modulation
        ; ---- In VBlank interrupt handler ----
        ; Apply sine-wave scroll offsets to copper list
        ; copper_mod_points[] holds pointers to the BPLCON1 MOVE data words
 scroll_sine_text:
        move.l  scroll_phase,d0
        addq.l  #2,d0                  ; Advance phase
        move.l  d0,scroll_phase
        lea     sine_table,a0
        lea     copper_mod_points,a1   ; Array of ptrs to MOVE data words
        move.w  #NUM_SCROLL_LINES-1,d1
 .next_line:
        and.w   #$00FF,d0              ; Wrap index to the 256-entry table
        moveq   #0,d2                  ; Clear upper bits before the byte load
        move.b  (a0,d0.w),d2           ; Get sine value (0-255)
        lsr.w   #4,d2                  ; Scale to 0-15 scroll range
        move.w  d2,d3
        lsl.w   #4,d3
        or.w    d3,d2                  ; Same delay for odd and even planes
        move.l  (a1)+,a2               ; Fetch pointer to the MOVE data word
        move.w  d2,(a2)                ; Patch copper MOVE data word
        addq.w  #4,d0                  ; Advance phase per line
        dbra    d1,.next_line
        rts
 ```
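 The value written to `BPLCON1` packs two 4-bit delays: bits 3-0 shift the odd bitplanes (playfield 1) and bits 7-4 shift the even bitplanes (playfield 2). For a single-playfield scroller both nibbles normally carry the same delay. A small host-side sketch of that packing, with an illustrative function name and assuming a 0-255 sine table as used above:

 ```c
 #include <stdint.h>
 /* Map a 0-255 sine sample to a 0-15 scroll delay and pack it for BPLCON1. */
 /* Bits 3-0 delay the odd planes, bits 7-4 delay the even planes. */
 static uint16_t bplcon1_from_sine(uint8_t sample) {
    uint16_t delay = (uint16_t)(sample >> 4);
    return (uint16_t)((delay << 4) | delay);
 }
 ```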
 ---
 ## Technique 5: Double-Buffered Copper Lists
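 Advanced effects swap the active copper list mid-frame. The Copper restarts from the address held in `COP1LC` at every vertical blank, and a write to the `COPJMP1` strobe makes it jump there immediately. Because a list that overwrites `COP1LC` also changes where the next frame starts, the CPU has to restore it each frame (the second location register, `COP2LC` with `COPJMP2`, avoids this). By changing `COP1LC` during vertical blank, or even mid-frame, you can chain multiple copper lists together: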
 ```mermaid
 sequenceDiagram
    participant VBI as VBlank Interrupt
    participant CPU as 68000 CPU
    participant Copper as Copper
    participant COP1LC as COP1LC Register
    Note over VBI: Frame N starts
    VBI->>CPU: Level 3 interrupt
    CPU->>COP1LC: Set to copper_list_A
    CPU->>CPU: Animate list A colors
    Copper->>Copper: Execute list A (top half of screen)
    Note over Copper: List A ends with<br/>MOVE COP1LC → copper_list_B
    Copper->>COP1LC: Self-jump to list B
    CPU->>CPU: Animate list B colors
    Copper->>Copper: Execute list B (bottom half)
 ```
 ### Copper List Chaining
 ```asm
 ; double_copper.asm: Chain two copper lists in one frame
 ; LIST_B>>16 and LIST_B&$FFFF need an absolute address at assembly time
 LIST_A:
        ; First half of screen: copper bars
        dc.w    $3201,$FFFE           ; WAIT line 50
        dc.w    $0180,$0F00           ; COLOR00 = red
        ; ... more bars ...
        dc.w    $8001,$FFFE           ; WAIT line 128
        ; Chain to LIST_B: write COP1LCH and COP1LCL
        ; (the CPU must point COP1LC back at LIST_A before the next VBlank)
        dc.w    $0080,LIST_B>>16      ; COP1LCH = high word of LIST_B
        dc.w    $0082,LIST_B&$FFFF    ; COP1LCL = low word
        dc.w    $0088,0                ; COPJMP1: any write triggers the jump (strobe)
        dc.w    $FFFF,$FFFE           ; Safety WAIT
 LIST_B:
        ; Second half: different effects
        dc.w    $8001,$FFFE           ; WAIT line 128
        dc.w    $0180,$000F           ; COLOR00 = blue
        ; ... more effects ...
        dc.w    $FFFF,$FFFE           ; End
 ```
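 A variant that leaves `COP1LC` untouched is to jump through the second location register, so the VBlank restart still begins at the first list. The following host-side sketch emits the three `MOVE`s for such a jump. The register offsets (`COP2LCH` at `$084`, `COP2LCL` at `$086`, `COPJMP2` at `$08A`) are the standard ones; the function name and the idea of generating the list in C are assumptions made for illustration.

 ```c
 #include <stdint.h>
 enum { COP2LCH = 0x084, COP2LCL = 0x086, COPJMP2 = 0x08A };
 /* Append "jump to target" to a copper list: load COP2LC, then strobe COPJMP2. */
 static uint16_t *emit_copper_jump(uint16_t *p, uint32_t target_chip_addr) {
    *p++ = COP2LCH;
    *p++ = (uint16_t)(target_chip_addr >> 16);
    *p++ = COP2LCL;
    *p++ = (uint16_t)(target_chip_addr & 0xFFFF);
    *p++ = COPJMP2;
    *p++ = 0;          /* the data value is ignored; the write itself triggers the jump */
    return p;
 }
 ```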
 ---
 ## Technique 6: Self-Modifying Copper Lists
 Rather than pre-building the entire copper list, the CPU patches it in real-time during vertical blank. This is how most demoscene effects work — the copper list is a template with placeholder values that get overwritten each frame:
 ```c
 /* smc_copper.c — Self-modifying copper list for animated effects */
 /* Copper list with placeholders (marked 0xDEAD) */
 /* Must be placed in Chip RAM so the Copper can fetch it (how is compiler-dependent) */
 static UWORD copper_template[] = {
    /* Bar 1: WAIT + MOVE pairs */
    0x3201, 0xFFFE,  0x0180, 0xDEAD,   /* line 50, COLOR00 = ??? */
    0x3301, 0xFFFE,  0x0180, 0xDEAD,   /* line 51 */
    0x3401, 0xFFFE,  0x0180, 0xDEAD,   /* line 52 */
    0x3501, 0xFFFE,  0x0180, 0xDEAD,   /* line 53 */
    0x3601, 0xFFFE,  0x0180, 0xDEAD,   /* line 54 */
    /* Bar 2 */
    0x5001, 0xFFFE,  0x0180, 0xDEAD,   /* line 80 */
    0x5101, 0xFFFE,  0x0180, 0xDEAD,   /* line 81 */
    0x5201, 0xFFFE,  0x0180, 0xDEAD,   /* line 82 */
    0x5301, 0xFFFE,  0x0180, 0xDEAD,   /* line 83 */
    0x5401, 0xFFFE,  0x0180, 0xDEAD,   /* line 84 */
    0xFFFF, 0xFFFE                       /* End */
 };
 /* Indices of color data words (every 4th word starting at offset 3) */
 #define BAR1_START  3   /* Index of first color word for bar 1 */
 #define BAR2_START  23  /* Bar 2 begins after 5 entries: 5 * 4 + 3 */
 #define BAR_LEN     5   /* Entries per bar */
 void patch_copper_bars(ULONG frame) {
    int i;
    /* Animate bar 1 with sine wave */
    for (i = 0; i < BAR_LEN; i++) {
        int phase = (frame * 4 + i * 20) & 0xFF;
        int bright = sine_table[phase] >> 4;  /* 0-15 */
        copper_template[BAR1_START + i * 4] = RGB(bright, bright/2, 0);
    }
    /* Animate bar 2 with different phase */
    for (i = 0; i < BAR_LEN; i++) {
        int phase = (frame * 6 + i * 25 + 128) & 0xFF;
        int bright = sine_table[phase] >> 4;
        copper_template[BAR2_START + i * 4] = RGB(0, bright/2, bright);
    }
 }
 ```
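 Hard-coded indices into a template are easy to get wrong, since every WAIT + MOVE entry occupies four words and the color is always the last of them. A small helper makes the arithmetic explicit. This is a sketch with illustrative names; it assumes a template made only of WAIT + MOVE entries, as in the example above, where bar 2 begins at entry 5.

 ```c
 #include <stdint.h>
 enum { WORDS_PER_ENTRY = 4, COLOR_WORD = 3 };
 /* Index of the MOVE data word for a given line of a bar. */
 /* first_entry is the number of WAIT+MOVE entries that precede the bar. */
 static int color_word_index(int first_entry, int line_in_bar) {
    return (first_entry + line_in_bar) * WORDS_PER_ENTRY + COLOR_WORD;
 }
 ```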
 ---
 ## Antipatterns
 ### 1. The Copper Overflow
 Writing too many color registers per scanline. The Copper has limited DMA bandwidth: back-to-back `MOVE`s land about 8 LoRes pixels apart, so a long run of writes spreads across the line and does not all take effect at its left edge. On a scanline with heavy bitplane DMA (6 planes LoRes or 4 planes HiRes), the Copper loses further cycles and fits far fewer writes.
 **Broken:**
 ```asm
 ; 32 color register writes after one WAIT: each MOVE lands ~8 LoRes
 ; pixels after the previous one, later still with 5+ bitplanes active
 dc.w    $3201,$FFFE   ; WAIT line 50
 dc.w    $0180,$0F00   ; COLOR00
 dc.w    $0182,$0F00   ; COLOR01
 dc.w    $0184,$0F00   ; COLOR02
 ; ... 29 more COLOR writes ...
 ```
 **Fixed:**
 ```asm
 ; Spread color writes across 2-3 scanlines
 dc.w    $3201,$FFFE   ; WAIT line 50
 dc.w    $0180,$0F00   ; COLOR00
 dc.w    $0182,$0F00   ; COLOR01
 dc.w    $0184,$0F00   ; COLOR02
 ; ... up to ~10 more is safe with 4 planes LoRes ...
 dc.w    $3401,$FFFE   ; Two lines later (line 52)
 dc.w    $0186,$0F00   ; COLOR03
 dc.w    $0188,$0F00   ; COLOR04
 ; ... continue on next scanline ...
 ```
 ### 2. The Stale Copper List
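 Patching a copper list while the Copper is executing it. The Copper has no instruction cache; it fetches each instruction from Chip RAM as it runs. A write that lands after the Copper has passed that instruction has no effect until the next frame, and a batch of writes that straddles the beam shows a mix of old and new values for one frame.
 **Broken:**
 ```c
 /* Patch the live copper list at an arbitrary point in the frame */
 copper_list[offset] = new_color;
 /* The Copper may already be past this instruction: tearing or a one-frame delay */
 ```
 **Fixed:**
 ```c
 /* Patch during VBlank, before the Copper reaches the instruction */
 copper_list[offset] = new_color;
 /* With two lists: patch the inactive one, then point COP1LC at it */
 custom.cop1lc = (ULONG)copper_list;  /* takes effect at the next VBlank restart */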
 ```
 ### 3. The Over-Scanned WAIT
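 Waiting for a line the hardware cannot match. The `WAIT` instruction compares only 8 bits of vertical position, so lines 256 and up are reached by first waiting for `$FFDF,$FFFE` (end of line 255) and then waiting for `line - 256`. PAL has up to 313 scanlines (0–312) and NTSC up to 263 (0–262); a WAIT for a line the current video mode never reaches stalls until the Copper restarts at the next vertical blank.
 **Broken:**
 ```asm
 ; Intended as "line 312", which exists on PAL only. The line field is
 ; 8 bits wide, so this actually matches line $38 = 56
 dc.w    $3801,$FFFE   ; WAIT line 312 (wrong)
 dc.w    $0180,$0000   ; Clear color
 ```
 **Fixed:**
 ```asm
 ; End the list instead; this works on both PAL and NTSC
 dc.w    $FFFF,$FFFE   ; WAIT that never matches (end of list)
 ; Reset the color at VBlank via interrupt handler instead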
 ```
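 A list generator can hide the line-255 boundary behind one helper. The sketch below emits a `WAIT` for any line and inserts the `$FFDF,$FFFE` bridge once, the first time a line above 255 is requested. The function name and the `past_255` flag are illustrative assumptions; the caller is expected to request lines in increasing order and to check separately that the line exists in the current video mode.

 ```c
 #include <stdint.h>
 /* Emit a WAIT for the start of `line`. *past_255 must start at 0 for each list. */
 static uint16_t *emit_wait_line(uint16_t *p, unsigned line, int *past_255) {
    if (line > 255 && !*past_255) {
        *p++ = 0xFFDF;   /* wait for the end of line 255 ... */
        *p++ = 0xFFFE;
        *past_255 = 1;   /* ... and do it only once per list */
    }
    *p++ = (uint16_t)(((line & 0xFF) << 8) | 0x0001);
    *p++ = 0xFFFE;
    return p;
 }
 ```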
 ### 4. The Register Collision
 Writing to a register that the CPU or Blitter is also modifying in the same frame. The Copper runs asynchronously — it can clobber a value the CPU just set.
 **Broken:**
 ```c
 /* CPU sets COLOR01 for game object highlighting */
 custom.color[1] = 0x0FFF;
 /* But the copper list also writes COLOR01 at line 100 */
 /* → Copper overwrites the CPU's value */
 ```
 **Fixed:**
 ```c
 /* Reserve specific color registers for CPU and others for Copper */
 /* CPU uses COLOR00-COLOR03, Copper uses COLOR04-COLOR31 */
 custom.color[4] = copper_animated_color;  /* Copper-safe */
 ```
 ### 5. The AGA Fetch-Mode Bug
 On AGA, `FMODE` (`$DFF1FC`) selects wider bitplane and sprite fetches. If the OS or a previous program left it non-zero, a copper list written for OCS fetch timing can show a corrupted or misaligned display even though the list itself is unchanged. The Copper's own `WAIT` positions have the same horizontal granularity on AGA as on OCS/ECS, so they need no rescaling.
 **Broken:**
 ```asm
 ; OCS copper list used directly on AGA: FMODE is whatever was left behind
 dc.w    $3207,$FFFE   ; WAIT line 50, x=6 (same timing on OCS and AGA)
 ```
 **Fixed:**
 ```asm
 ; Set FMODE=0 for OCS-compatible fetch timing
 ; before the rest of the copper list runs
 dc.w    $01FC,$0000   ; FMODE = 0 (OCS compatibility)
 dc.w    $3207,$FFFE   ; WAIT line 50, x=6
 ```
 ---
 ## Decision Guide
 ```mermaid
 flowchart TD
    START[Need visual effect] --> Q1{Color only or<br/>structural change?}
    Q1 -->|Color only| Q2{Smooth gradient<br/>or discrete bars?}
    Q1 -->|Structural| SPLIT[Raster Split:<br/>Change BPLCON0/pointers]
    Q2 -->|Gradient| GRAD[Gradient Shading:<br/>Interpolated COLOR writes]
    Q2 -->|Bars| Q3{Animated or static?}
    Q3 -->|Static| STATIC[Static Copper Bars:<br/>Fixed WAIT+MOVE pairs]
    Q3 -->|Animated| Q4{Wave motion<br/>or pulse?}
    Q4 -->|Wave| SINE[Sine Cycling:<br/>Table-driven per-frame patch]
    Q4 -->|Pulse| PULSE[Brightness modulation:<br/>Single sine per bar]
    SPLIT --> Q5{Same resolution<br/>both halves?}
    Q5 -->|Yes| SIMPLE[Simple split:<br/>Swap BPL pointers only]
    Q5 -->|No| FULL[Full split:<br/>Change BPLCON0 + pointers + modulo]
    STATIC --> BUDGET{DMA budget<br/>check}
    GRAD --> BUDGET
    SINE --> BUDGET
    BUDGET -->|Pass| OK[Proceed]
    BUDGET -->|Fail| REDUCE[Reduce color writes<br/>per scanline or<br/>reduce bitplane depth]
 ```
 ---
 ## Performance Characteristics
 ### DMA Budget Per Scanline
 | Bitplane Depth | Bitplane DMA Slots (320/640-pixel fetch) | Effect on Copper | Max Color Writes |
 |---------------|--------------------|---------------------|-----------------|
 | 1 plane LoRes | 20 | None | ~56 |
 | 2 planes LoRes | 40 | None | ~56 |
 | 4 planes LoRes | 80 | None | ~56 |
 | 5 planes LoRes | 100 | Loses 1 slot in 4 during fetch | ~46 |
 | 6 planes LoRes | 120 | Loses 2 slots in 4 during fetch | ~36 |
 | 4 planes HiRes | 160 | Locked out during fetch | ~16 |
 | 6 planes HiRes | Not available on OCS/ECS | AGA only; depends on `FMODE` | Varies |
 > [!NOTE]
 > Slot counts are words fetched per line (20 per LoRes plane, 40 per HiRes plane). The "Max Color Writes" column is an estimate for back-to-back `MOVE`s after a single `WAIT` at the start of the line, at 2 Copper slots per `MOVE`. Issuing a `WAIT` before every `MOVE` roughly doubles the cost per write.
 ---
 ## Historical Timeline
 ```mermaid
 timeline
    title Copper Effects Evolution
    1987 : [Scoopex Megademo](https://www.pouet.net/prod.php?which=5832) — first copper bars demo
    1989 : [Red Sector Megademo](https://www.pouet.net/prod.php?which=3119) — sine-scrolling text wave
    1990 : [Budbrain Megademo](https://www.pouet.net/prod.php?which=1290) — copper bars + vectorbobs
         : [Angels Copper Master](https://www.pouet.net/prod.php?which=3422) — ultimate copper showcase
    1991 : [Phenomena Enigma](https://www.pouet.net/prod.php?which=394) — copper-driven effects + filled vectors
    1992 : [The Silents Xpose](https://www.pouet.net/prod.php?which=4031) — copper bars extravaganza
         : [Melon Dezign Human Target](https://www.pouet.net/prod.php?which=3459) — smooth copper gradients
    1993 : [Sanity Arte](https://www.pouet.net/prod.php?which=1477) — copper chunky (color register per pixel)
         : [Kefrens Desert Dream](https://www.pouet.net/prod.php?which=1483) — copper-driven parallax + sine scroll
    1995 : [Polka Brothers Friday at Eight](https://www.pouet.net/prod.php?which=702) — combined copper/Blitter effects
    2000+ : Demoscene continues — WHDLoad copper effects
          : MiSTer FPGA preserves exact copper timing
 ```
 ---
 ## Modern Analogies
 | Amiga Copper Concept | Modern Equivalent | Why It Maps |
 |---------------------|-------------------|-------------|
 | Copper bar (per-line color write) | Fragment shader per-scanline uniform | Both change output color based on Y position |
 | Raster split | Render pass boundary | Both change rendering state at a specific point |
 | Gradient shading | Vertex color gradient | Both interpolate colors across the screen |
 | Sine color cycling | Animated uniform / time-based shader | Both drive color from a function of time |
 | Self-modifying copper list | Dynamic command buffer generation | Both generate GPU commands per frame |
 | Copper list chaining | Multiple render passes | Both execute separate command sequences sequentially |
 | WAIT instruction | GPU pipeline barrier / fence | Both synchronize to display timing |
 | DMA slot budget | Memory bandwidth budget | Both limited by available bus cycles per scanline |
 ---
 ## Use Cases
 | Use Case | Technique | Examples |
 |----------|-----------|----------|
 | Demo background | Copper bars, sine cycling | [Copper Master](https://www.pouet.net/prod.php?which=3422), [Xpose](https://www.pouet.net/prod.php?which=4031) |
 | Game status bar | Raster split | Turrican, Lionheart, Risky Woods |
 | Sky/terrain gradient | Gradient shading | Shadow of the Beast, [Agony](https://amiga.lychesis.net/games/Agony.html) |
 | Scrolling sine text | BPLCON1/BPLMOD modulation | [Red Sector Megademo](https://www.pouet.net/prod.php?which=3119), [Desert Dream](https://www.pouet.net/prod.php?which=1483) |
 | Multi-resolution display | Raster split with BPLCON0 change | Many games (title HiRes, game LoRes) |
 | Water/plasma effect | Multi-wave sine cycling | [Enigma](https://www.pouet.net/prod.php?which=394), numerous cracktros |
 | Metallic logo shine | Fast gradient sweep across logo | [Human Target](https://www.pouet.net/prod.php?which=3459), [Arte](https://www.pouet.net/prod.php?which=1477) |
 | Full-screen copper effect | Copper chunky (see [pixel_tricks.md](pixel_tricks.md)) | [Sanity Arte](https://www.pouet.net/prod.php?which=1477) |
 ---
 ## FPGA / Emulation Impact
 Copper effects are among the most timing-sensitive code on the Amiga. Accurate emulation requires:
 | Concern | Impact | FPGA Notes |
 |---------|--------|------------|
 | **Cycle-accurate WAIT** | Copper must stall until exact beam position | Minimig/MiSTer implement beam counter compare at cycle granularity |
 | **DMA slot allocation** | Copper slots must be reserved correctly after bitplane/sprite DMA | Bus arbiter must interleave correctly |
 | **Register write latency** | Copper writes are visible next cycle | Write buffer must not add latency |
 | **COPJMP strobes** | List jumps must take effect at exact position | State machine must handle strobe timing |
 | **AGA 8-bit horizontal** | FMODE.BPC changes position interpretation | Must track FMODE state at copper fetch time |
 | **Self-modifying code** | CPU writes to copper list must be visible to Copper DMA | Requires cache coherency between CPU writes and DMA reads |
 > [!WARNING]
 > Many WHDLoad patches fix games that relied on specific Copper timing on real hardware. Emulators like WinUAE have "copper timing" settings (exact/default/fast) because some demos only work with specific timing models.
 ---
 ## FAQ
 **Q: How many color changes can I do per scanline?**
 A: On a stock A500 with 4 bitplanes LoRes, approximately 53 WAIT+MOVE pairs per scanline. With 6 bitplanes HiRes, it drops to ~49. Each additional consecutive MOVE (without WAIT) adds 2 slots instead of 4.
 **Q: Can the Copper read registers?**
 A: No. The Copper has no read capability. It can only WAIT for a beam position and MOVE (write) a value to a register. This is why self-modifying copper lists are done by the CPU writing to Chip RAM — the Copper itself cannot inspect register values.
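
 As a rough illustration of the CPU side of a self-modifying list, the sketch below scans a copper list for the first MOVE to a given register and overwrites its data word. It relies only on the instruction layout (two 16-bit words per instruction, bit 0 of the first word clear for MOVE and set for WAIT/SKIP). The function name is hypothetical, and on real hardware the list must live in Chip RAM and is normally patched during vertical blank or on a second buffer.

 ```c
 #include <stddef.h>
 #include <stdint.h>

 /*
  * Patch the value of the first MOVE that targets `reg` (offset from
  * $DFF000, e.g. 0x0182) in a copper list of n_words 16-bit words.
  * Returns 1 if an instruction was patched, 0 if none was found.
  */
 int patch_copper_move(uint16_t *list, size_t n_words,
                       uint16_t reg, uint16_t value)
 {
     size_t i;
     for (i = 0; i + 1 < n_words; i += 2) {
         if (list[i] == 0xFFFF && list[i + 1] == 0xFFFE)
             break;                          /* conventional end-of-list WAIT */
         if ((list[i] & 1) == 0 && (list[i] & 0x01FE) == reg) {
             list[i + 1] = value;            /* second word is the data */
             return 1;
         }
     }
     return 0;
 }
 ```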
 **Q: What is copper chunky and why is it impressive?**
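 A: Copper chunky uses the Copper to rewrite a single color register many times across each scanline, creating a chunky-pixel display without any bitplanes at all. With no bitplanes enabled every pixel shows the background color, so the register written is `COLOR00`. Each write covers a block several low-resolution pixels wide, so the horizontal resolution is low, and the timing must be extremely precise. The technique was most famously used in [Sanity's Arte](https://www.pouet.net/prod.php?which=1477) (1993):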
 > ![Sanity Arte — copper chunky full-screen rendering](images/sanity_arte.gif)
 >
 > *Sanity's Arte (1993), full-screen copper chunky: every pixel is a color register write, no bitplanes used.*
 See [pixel_tricks.md](pixel_tricks.md) for the full technique.
 **Q: Do copper effects work on AGA?**
 A: Yes, with caveats. AGA adds an 8-bit horizontal position (vs OCS 7-bit), controlled by the `BPC` bit in `FMODE`. AGA also has 256-color registers (`COLOR00`–`COLOR255`) instead of 32, allowing much more complex copper effects. However, the higher bandwidth of AGA bitplane DMA leaves fewer slots for the Copper.
 **Q: Can I use copper effects from AmigaOS?**
 A: Yes, via `UCopList` — the user copper list attached to a `ViewPort`. Intuition merges your copper instructions with its own. See [Copper Programming](../08_graphics/copper_programming.md) for the OS-friendly approach. For full copper control (demos), you take over the hardware directly.
 **Q: What happens if the Copper runs past the end of a scanline before finishing?**
 A: The Copper simply continues executing on the next scanline. There is no error or trap. The WAIT instruction's purpose is to synchronize — if you don't WAIT, the Copper runs as fast as DMA allows. Effects that don't need per-line synchronization can skip WAITs entirely.
 ---
 ## References
 ### Related Knowledge Base Articles
 - [Copper](../08_graphics/copper.md) — Copper coprocessor hardware: instruction format, UCopList
 - [Copper Programming](../08_graphics/copper_programming.md) — Building copper lists, gradients, raster effects
 - [Pixel Conversion](../08_graphics/pixel_conversion.md) — Copper chunky technique, C2P algorithms
 - [Sprites](../08_graphics/sprites.md) — Sprite multiplexing (Copper repositions sprites)
 - [Video Timing](../01_hardware/common/video_timing.md) — Scanline anatomy, beam counters
 - [DMA Architecture](../01_hardware/common/dma_architecture.md) — DMA slot allocation, bus arbitration
 - [Pixel Tricks](pixel_tricks.md) — Copper chunky deep dive
 ### External Resources
 - **Copper Demon** (technik) — Copper bar tutorial with source code
 - **Amiga Hardware Reference Manual** — Chapter 6: Copper coprocessor
 - **Amiga Graphics Archive** — https://amiga.lychesis.net/specials/Copper.html — Forensic analysis of copper usage in commercial games (Agony, Bio Challenge, Starray, Wings of Death)
 - **Scoopex Amiga Hardware Programming** (Photon) — [YouTube playlist](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Video walkthroughs of copper bars, raster splits, and sine effects in 68k assembly. Companion articles: [coppershade.org](http://coppershade.org/articles/)
 - **Pouet.net** — https://www.pouet.net — Demo database with source code links
 - **Demozoo** — https://demozoo.org — Demoscene production encyclopedia
 ### Notable Demos
 | Demo | Group | Year | Key Copper Technique | Link |
 |------|-------|------|---------------------|------|
 | Megademo | Scoopex | 1987 | First copper bars | [Pouet](https://www.pouet.net/prod.php?which=5832) |
 | Megademo | Red Sector Inc. | 1989 | Sine scroll, raster splits | [Pouet](https://www.pouet.net/prod.php?which=3119) |
 | Budbrain Megademo | Budbrain | 1990 | Copper bars + vectorbobs | [Pouet](https://www.pouet.net/prod.php?which=1290) |
 | Copper Master | Angels | 1990 | Ultimate copper showcase | [Pouet](https://www.pouet.net/prod.php?which=3422) |
 | Enigma | Phenomena | 1991 | Copper + filled vectors | [Pouet](https://www.pouet.net/prod.php?which=394) |
 | Xpose | The Silents | 1992 | Copper bars extravaganza | [Pouet](https://www.pouet.net/prod.php?which=4031) |
 | Human Target | Melon Dezign | 1992 | Smooth copper gradients | [Pouet](https://www.pouet.net/prod.php?which=3459) |
 | Arte | Sanity | 1993 | Copper chunky full-screen | [Pouet](https://www.pouet.net/prod.php?which=1477) |
 | Desert Dream | Kefrens | 1993 | Copper parallax + sine | [Pouet](https://www.pouet.net/prod.php?which=1483) \| [Demozoo](https://demozoo.org/productions/142/) |
 | Friday at Eight | Polka Brothers | 1995 | Combined copper/Blitter | [Pouet](https://www.pouet.net/prod.php?which=702) |
--- a/17_demoscene/images/budbrain_megademo.gif
+++ b/17_demoscene/images/budbrain_megademo.gif
--- a/17_demoscene/images/copper_master.jpg
+++ b/17_demoscene/images/copper_master.jpg
--- a/17_demoscene/images/desert_dream.gif
+++ b/17_demoscene/images/desert_dream.gif
--- a/17_demoscene/images/friday_at_eight.jpg
+++ b/17_demoscene/images/friday_at_eight.jpg
--- a/17_demoscene/images/human_target.jpg
+++ b/17_demoscene/images/human_target.jpg
--- a/17_demoscene/images/phenomena_enigma.png
+++ b/17_demoscene/images/phenomena_enigma.png
--- a/17_demoscene/images/redsector_megademo.gif
+++ b/17_demoscene/images/redsector_megademo.gif
--- a/17_demoscene/images/sanity_arte.gif
+++ b/17_demoscene/images/sanity_arte.gif
--- a/17_demoscene/images/scoopex_megademo.gif
+++ b/17_demoscene/images/scoopex_megademo.gif
--- a/17_demoscene/images/xpose.gif
+++ b/17_demoscene/images/xpose.gif
--- a/17_demoscene/pixel_tricks.md
+++ b/17_demoscene/pixel_tricks.md
@ -0,0 +1,668 @@
 [← Home](../README.md) · [Demoscene Techniques](README.md)
 # Pixel Tricks — Copper Chunky, HAM Art, Scroll Register Abuse, and Modulo Wrapping
 ## Overview
 The Amiga's display hardware was designed for planar bitplane graphics — a memory-efficient scheme that matched the DMA streaming pattern perfectly. But the demoscene never accepted "designed for" as a limitation. From 1989 onward, demoscene coders systematically abused every display register to create effects that Commodore's engineers never imagined: **copper chunky** (no bitplanes at all, just color register writes), **HAM art** (photorealistic images in 4096 colors), **scroll-register distortion** (sine-wave text), and **modulo wrapping** (infinite scrolling surfaces).
 These techniques share a common thread: they treat the display registers themselves as the primary rendering surface, not the bitplane memory. The Copper, scroll registers, and modulo values become the "pixels" — a fundamental inversion of the intended programming model.
 ```mermaid
 graph TB
    subgraph "Bitplane Alternatives"
        CC["Copper Chunky<br/>Color register = pixel"]
        HAM["HAM-6/HAM-8 Art<br/>Delta-encoded photorealism"]
    end
    subgraph "Register Abuse"
        SCROLL["Scroll Distortion<br/>BPLCON1 sine waves"]
        MOD["Modulo Wrapping<br/>BPLMOD virtual surfaces"]
        DDF["DDFSTRT/DDFSTOP<br/>Per-line resolution changes"]
    end
    subgraph "Hybrid"
        PLASMA["Plasma Effects<br/>Copper + precalc palette"]
        SHADER["Copper Shader<br/>Per-line pseudo-rendering"]
    end
    CC --> PLASMA
    HAM --> SHADER
    SCROLL --> MOD
    MOD --> DDF
    PLASMA --> SHADER
 ```
 ---
 ## Technique 1: Copper Chunky
 The most extreme display hack on the Amiga. Copper chunky creates a pixel display **without any bitplanes at all**. With zero bitplanes enabled, every pixel shows the background color, so the Copper rewrites that single register (`COLOR00`) again and again across each scanline. The result is a "chunky pixel" display where each block's color is set directly by the Copper: no bitplane memory, no Blitter, no CPU rendering into a bitmap.
 ### Why It Works
 ```mermaid
 sequenceDiagram
    participant Copper
    participant COLOR00 as COLOR00 $DFF180
    participant Denise as Denise DAC
    Note over Copper: Scanline Y=100, pixel positions:
    Copper->>COLOR00: MOVE #$0F00 (red) at x=0
    COLOR00->>Denise: Output red pixel
    Copper->>COLOR00: MOVE #$00F0 (green) at x=2
    COLOR00->>Denise: Output green pixel
    Copper->>COLOR00: MOVE #$0FFF (white) at x=4
    COLOR00->>Denise: Output white pixel
    Copper->>COLOR00: MOVE #$0F00 (red) at x=6
    COLOR00->>Denise: Output red pixel
    Note over Copper: ...continues across entire line
 ```
 ### Resolution and Bandwidth
 Copper chunky resolution is limited by how many MOVE instructions fit per scanline:
 | Mode | Pixels/Line | Colors | DMA Budget | Notes |
 |------|------------|--------|------------|-------|
 | LoRes (1×) | ~56 | Any of 4096 | WAIT + MOVE = 4 slots/pixel | Sanity "Arte" style |
 | LoRes (2×) | ~112 | Any of 4096 | MOVE only = 2 slots/pixel | No WAIT, consecutive moves |
 | HiRes (1×) | ~28 | Any of 4096 | More DMA used by bitplanes | Rarely used |
 ### Copper Chunky Template
 ```asm
 ; copper_chunky.asm — Minimal copper chunky (56 pixels wide, 1 line)
 ; No bitplanes enabled: every pixel shows the background color,
 ; so the display comes entirely from COLOR00 ($DFF180) writes
 COPPER_CHUNKY:
        ; Disable all bitplanes, enable color
        dc.w    $0100,$0200           ; BPLCON0: 0 bitplanes, color on
        ; Wait for display area start (WAIT word = VP<<8 | HP | 1)
        dc.w    $2C01,$FFFE           ; WAIT line 44 ($2C)
        ; Write COLOR00 at each pixel position
        ; Each MOVE takes 2 DMA slots; ~56 pixels per LoRes line
        dc.w    $0180,$0F00           ; COLOR00 = red (pixel 0)
        dc.w    $0180,$0FF0           ; COLOR00 = yellow (pixel 1)
        dc.w    $0180,$0FFF           ; COLOR00 = white (pixel 2)
        dc.w    $0180,$0F0F           ; COLOR00 = magenta (pixel 3)
        dc.w    $0180,$0FF0           ; COLOR00 = yellow (pixel 4)
        dc.w    $0180,$0F00           ; COLOR00 = red (pixel 5)
        ; ... repeat for ~56 pixels total ...
        ; End of line: black for rest of frame
        dc.w    $2D01,$FFFE           ; WAIT line 45 (next line)
        dc.w    $0180,$0000           ; COLOR00 = black
        dc.w    $FFFF,$FFFE           ; End
 ```
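
 Hand-assembling list words is error-prone, so it can help to build them with small helpers. The sketch below encodes the first word of a WAIT (vertical position in bits 15-8, horizontal position in bits 7-1, bit 0 set) and of a MOVE (register offset from `$DFF000`, bit 0 clear). The helper names are made up for illustration, and `uint16_t` stands in for the Amiga `UWORD` type so the code also compiles on a host machine.

 ```c
 #include <stdint.h>

 /* Second word of a WAIT that compares every position bit. */
 #define COP_WAIT_MASK 0xFFFEu

 /* First word of a WAIT: VP in bits 15-8, HP in bits 7-1, bit 0 set. */
 uint16_t cop_wait_word(unsigned vpos, unsigned hpos)
 {
     return (uint16_t)(((vpos & 0xFF) << 8) | (hpos & 0xFE) | 0x01);
 }

 /* First word of a MOVE: even register offset from $DFF000, bit 0 clear. */
 uint16_t cop_move_word(unsigned reg_offset)
 {
     return (uint16_t)(reg_offset & 0x01FE);
 }
 ```

 For example, `cop_wait_word(44, 0)` yields `$2C01`, and `cop_wait_word(255, 0xDE)` yields the `$FFDF` word commonly used before crossing line 255 on PAL displays.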
 ### Full Copper Chunky Frame
 For a full-screen effect, the CPU pre-calculates the copper list each frame, writing color values for every visible pixel:
 ```c
 /* copper_chunky.c — Generate full copper chunky frame */
 /* Resolution: ~56 pixels × 200 lines = 11,200 "pixels" */
 #define CHUNKY_WIDTH  56
 #define CHUNKY_HEIGHT 200
 #define FIRST_LINE    44
 /* The copper list is an array of UWORDs */
 /* Each pixel = 1 MOVE instruction = 2 UWORDs (register, value) */
 /* Each line = 1 WAIT (2 UWORDs) + 56 MOVEs (112 UWORDs) = 114 UWORDs */
 #define LINE_WORDS  (2 + CHUNKY_WIDTH * 2)
 #define TOTAL_WORDS (LINE_WORDS * CHUNKY_HEIGHT + 4)  /* +4 for end marker */
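 /* 12-bit color lookup for the chunky pixel values, defined elsewhere */
 extern const UWORD rgb_palette[256];
 /* NOTE: on real hardware the copper list must reside in Chip RAM */
 static UWORD copper_list[TOTAL_WORDS];
 void generate_chunky_frame(const UBYTE *pixel_data, ULONG frame) {
    int y, x;
    UWORD *cop = copper_list;
    (void)frame;  /* Not used in this sketch */
    for (y = 0; y < CHUNKY_HEIGHT; y++) {
        /* WAIT for this scanline: VP in bits 15-8, bit 0 set marks a WAIT */
        *cop++ = (UWORD)((((FIRST_LINE + y) & 0xFF) << 8) | 0x01);
        *cop++ = 0xFFFE;
        /* Write each pixel's color */
        for (x = 0; x < CHUNKY_WIDTH; x++) {
            *cop++ = 0x0180;  /* Register: COLOR00 (shown when no bitplanes are on) */
            *cop++ = rgb_palette[pixel_data[y * CHUNKY_WIDTH + x]];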
        }
    }
    /* End marker */
    *cop++ = 0xFFFF;
    *cop++ = 0xFFFE;
 }
 ```
 > [!WARNING]
 > Copper chunky requires **disabling all bitplane DMA** — you cannot display bitplane graphics alongside copper chunky pixels. The technique is mutually exclusive with normal display rendering.
 ---
 ## Technique 2: HAM Art
 Hold-And-Modify (HAM) mode gives the Amiga 4,096 on-screen colors from only 6 bitplanes. Instead of direct color lookup, HAM encodes most pixels as **deltas** — modifications to the previous pixel's color. This makes HAM ideal for photorealistic images but nearly useless for animation.
 ### HAM-6 Encoding (OCS/ECS)
 | Bit 5 | Bit 4 | Bits 3-0 | Meaning |
 |-------|-------|----------|---------|
 | 0 | 0 | color index | Use palette[color index] (16 palette entries) |
 | 0 | 1 | blue value | Set blue to the 4-bit value, keep R,G |
 | 1 | 0 | red value | Set red to the 4-bit value, keep G,B |
 | 1 | 1 | green value | Set green to the 4-bit value, keep R,B |
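
 The table can be read as a tiny state machine. The following sketch decodes one scanline of 6-bit HAM codes into 12-bit `$0RGB` values; it assumes the running color starts from palette entry 0 (the background color) at the left edge of each line. The function name is illustrative, and `uint8_t`/`uint16_t` stand in for `UBYTE`/`UWORD`.

 ```c
 #include <stddef.h>
 #include <stdint.h>

 /* Decode n HAM-6 codes (bits 5-4 = control, bits 3-0 = data) into $0RGB. */
 void ham6_decode_line(const uint8_t *codes, size_t n,
                       const uint16_t palette[16], uint16_t *rgb_out)
 {
     uint16_t cur = palette[0];          /* held color at start of line */
     size_t i;
     for (i = 0; i < n; i++) {
         unsigned ctrl = (codes[i] >> 4) & 0x3;
         unsigned data = codes[i] & 0xF;
         switch (ctrl) {
         case 0: cur = palette[data];                           break; /* SET   */
         case 1: cur = (uint16_t)((cur & 0xFF0) | data);        break; /* blue  */
         case 2: cur = (uint16_t)((cur & 0x0FF) | (data << 8)); break; /* red   */
         case 3: cur = (uint16_t)((cur & 0xF0F) | (data << 4)); break; /* green */
         }
         rgb_out[i] = cur;
     }
 }
 ```

 Because every output depends on the color held from the pixel to its left, a decoder like this cannot start in the middle of a line, which is the "no random access" property discussed later in this section.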
 ### HAM-8 Encoding (AGA)
 | Bits 7-6 | Bits 5-0 | Meaning |
 |-----------|----------|---------|
 | 00 | palette index | Use palette[index] (64 entries) |
 | 01 | blue value | Set blue to 6-bit value, keep R,G |
 | 10 | red value | Set red to 6-bit value, keep G,B |
 | 11 | green value | Set green to 6-bit value, keep R,B |
 ### HAM Art Technique
 HAM art for demos works by pre-rendering images in HAM mode, then displaying them as static or slowly-animated screens. The trick is managing the delta encoding to minimize color fringing:
 ```c
 /* ham_render.c — Render a pre-calculated image to HAM-6 bitmap */
 #include <stdlib.h>  /* abs() */
 /* OCS HAM: 6 bitplanes, 4 bits per component, 16 palette entries + 3 modify modes */
 /* Total on-screen colors: 4,096 (12-bit RGB space) */
 #define HAM_SET_COLOR   0  /* Use palette entry (bits 3-0 = index 0-15) */
 #define HAM_MOD_BLUE    1  /* Modify blue (bits 3-0 = new blue 0-15) */
 #define HAM_MOD_RED     2  /* Modify red (bits 3-0 = new red 0-15) */
 #define HAM_MOD_GREEN   3  /* Modify green (bits 3-0 = new green 0-15) */
 /* Convert a 12-bit RGB pixel to the best HAM encoding
   given the previous pixel's color */
 UWORD rgb_to_ham(UWORD target_rgb, UWORD prev_rgb) {
    int tr = (target_rgb >> 8) & 0xF;
    int tg = (target_rgb >> 4) & 0xF;
    int tb =  target_rgb       & 0xF;
    int pr = (prev_rgb >> 8) & 0xF;
    int pg = (prev_rgb >> 4) & 0xF;
    int pb =  prev_rgb       & 0xF;
    int err_r = abs(tr - pr);
    int err_g = abs(tg - pg);
    int err_b = abs(tb - pb);
    /* Target equals the previous color: re-issue one component with its
       current value, which holds the color unchanged. (A palette SET would
       need a matching palette entry; this simplified version does not
       search the palette.) */
    if (err_r == 0 && err_g == 0 && err_b == 0) {
        return (HAM_MOD_BLUE << 4) | tb;
    }
    /* Choose the component with largest error to modify */
    if (err_b >= err_r && err_b >= err_g) {
        return (HAM_MOD_BLUE << 4) | tb;
    } else if (err_r >= err_g) {
        return (HAM_MOD_RED << 4) | tr;
    } else {
        return (HAM_MOD_GREEN << 4) | tg;
    }
 }
 ```
 ### HAM Limitations
 | Limitation | Impact | Workaround |
 |-----------|--------|-----------|
 | **Fringing** | Color deltas only change 1 component per pixel | Pre-calculate optimal delta sequences |
 | **Slow rendering** | Each pixel depends on previous pixel state | Use Blitter for fast HAM blits |
 | **No random access** | Can't set arbitrary pixel without context | Pre-render entire scanlines |
 | **Limited animation** | Moving objects create fringing artifacts | Reserve palette entries for sprites/objects |
 | **Display artifacts** | Vertical color bleeding from delta chains | Reset color at start of each scanline |
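
 The fringing row can be made concrete. Each modify code replaces exactly one component, so when no palette entry matches the target, the number of pixels needed to reach it equals the number of R/G/B components that differ, and the pixels in between show mixed colors. A minimal sketch of that count, using a hypothetical helper name:

 ```c
 #include <stdint.h>

 /* Number of HAM-6 modify steps needed to go from one $0RGB color to another. */
 unsigned ham6_modify_steps(uint16_t from_rgb, uint16_t to_rgb)
 {
     uint16_t diff = (uint16_t)((from_rgb ^ to_rgb) & 0x0FFF);
     unsigned steps = 0;
     if (diff & 0xF00) steps++;   /* red differs   */
     if (diff & 0x0F0) steps++;   /* green differs */
     if (diff & 0x00F) steps++;   /* blue differs  */
     return steps;
 }
 ```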
 ---
 ## Technique 3: Scroll Register Distortion
 `BPLCON1` ($DFF102) controls horizontal scroll offset for the playfield. By changing it per scanline via the Copper, you create wave distortion effects — the classic "sine scrolling text" that defined the Amiga demo aesthetic.
 ### How Scroll Distortion Works
 ```mermaid
 graph LR
    subgraph "Normal Display"
        N["Line 0: offset=0<br/>Line 1: offset=0<br/>Line 2: offset=0<br/>Line 3: offset=0"]
    end
    subgraph "Sine Distorted"
        S["Line 0: offset=0<br/>Line 1: offset=3<br/>Line 2: offset=7<br/>Line 3: offset=4<br/>Line 4: offset=0"]
    end
    N -->|"Copper writes<br/>BPLCON1 per line"| S
 ```
 ### Sine Scroll Implementation
 ```c
 /* sine_scroll.c — Generate copper list for sine wave scroll */
 #define DISPLAY_LINES 200
 #define FIRST_LINE    44
 #define SCROLL_WIDTH  16  /* Max scroll offset (0-15 for LoRes) */
 void generate_sine_scroll(UWORD *copper, const UBYTE *sine_table,
                          ULONG phase) {
    int y;
    for (y = 0; y < DISPLAY_LINES; y++) {
        int offset;
        /* Get sine value for this line */
        int sine_idx = (phase + y * 4) & 0xFF;
        offset = sine_table[sine_idx] >> 4;  /* 0-15 */
        /* WAIT for this scanline: VP in bits 15-8, bit 0 set marks a WAIT */
        *copper++ = (UWORD)((((FIRST_LINE + y) & 0xFF) << 8) | 0x01);
        *copper++ = 0xFFFE;
        /* MOVE scroll offset to BPLCON1 */
        *copper++ = 0x0102;                /* BPLCON1 register */
        *copper++ = (offset << 4) | offset; /* Even/odd playfield same */
    }
    /* End marker */
    *copper++ = 0xFFFF;
    *copper++ = 0xFFFE;
 }
 ```
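
 The routine above expects a 256-entry `sine_table` with values 0-255, which is not defined in this article. One possible way to build it without floating point is the parabolic approximation sketched below; this is an assumption rather than the table any particular demo used, and the helper names are illustrative. The second helper packs a 0-15 offset into both scroll fields of `BPLCON1`, matching the `(offset << 4) | offset` expression in the routine.

 ```c
 #include <stdint.h>

 /* Fill a 256-entry table with a parabolic sine approximation.
  * Values range from 1 to 255 and are centered on 128. Integer-only. */
 void build_sine_table(uint8_t table[256])
 {
     int i;
     for (i = 0; i < 128; i++) {
         int s = (i * (128 - i) * 127) / 4096;   /* 0..127, peak at i = 64 */
         table[i]       = (uint8_t)(128 + s);    /* positive half-wave */
         table[i + 128] = (uint8_t)(128 - s);    /* negative half-wave */
     }
 }

 /* Pack one 0-15 scroll offset into both playfield fields of BPLCON1. */
 uint16_t bplcon1_value(unsigned offset)
 {
     offset &= 0xF;
     return (uint16_t)((offset << 4) | offset);
 }
 ```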
 ---
 ## Technique 4: Modulo Wrapping
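 `BPL1MOD` and `BPL2MOD` ($DFF108/$DFF10A) hold a signed byte offset that is added to the bitplane pointers at the end of each scanline (`BPL1MOD` for odd planes, `BPL2MOD` for even planes). Normally the modulo is 0 for a bitmap exactly as wide as the display fetch, or the number of bytes to skip for a wider or interleaved bitmap. By setting the modulo to unusual values, you create **wrapping effects**: the bitmap appears to fold, repeat, or scroll infinitely.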
 ### How Modulo Wrapping Works
 ```
 Normal display (320px wide, 40 bytes fetched per line, modulo = 0):
  Line 0: address 0
  Line 1: address 40    (pointer advanced by the 40 bytes fetched)
  Line 2: address 80
 With modulo = -40 (cancels the 40-byte fetch advance):
  Line 0: address 0     → displays data[0..39]
  Line 1: address 0     → same row again
  Line 2: address 0     → one row repeated down the screen
 With modulo = 20 (skips half a row after each line):
  Line 0: address 0     → displays data[0..39]
  Line 1: address 60    → 40 fetched + 20 skipped
  Line 2: address 120
 ```
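
 The address arithmetic reduces to one line: during a scanline the bitplane pointer advances by the bytes fetched, and the signed modulo is then added once at the end of the line. A small sketch, with a hypothetical function name and offsets measured from the start of the bitplane:

 ```c
 #include <stdint.h>

 /* Byte offset at which display line `line` starts fetching. */
 int32_t bitplane_line_offset(int line, int fetch_bytes, int16_t modulo)
 {
     return (int32_t)line * (fetch_bytes + modulo);
 }
 ```

 With `fetch_bytes = 40`, a modulo of 0 steps through consecutive rows, -40 repeats the same row on every line, and -80 walks backwards through memory, which draws the bitmap upside down.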
 ### Tunnel Effect via Modulo
 ```c
 /* modulo_tunnel.c — Create a tunnel effect using modulo wrapping */
 void setup_modulo_tunnel(ULONG frame) {
    int y;
    UWORD *cop = copper_list;
    (void)frame;  /* Not used yet; reserved for animating the effect */
    for (y = 0; y < 200; y++) {
        /* Distance from center creates tunnel perspective */
        int dist = 100 - abs(y - 100);  /* 0 at top/bottom, 100 at center */
        int modulo = dist * 2;          /* Tighter wrapping at center; always even */
        /* WAIT for line: VP in bits 15-8, bit 0 set marks a WAIT */
        *cop++ = (UWORD)((((FIRST_LINE + y) & 0xFF) << 8) | 0x01);
        *cop++ = 0xFFFE;
        /* Set modulo per line */
        *cop++ = 0x0108;                    /* BPL1MOD */
        *cop++ = (UWORD)(short)modulo;      /* Signed modulo value */
        *cop++ = 0x010A;                    /* BPL2MOD */
        *cop++ = (UWORD)(short)modulo;
    }
    /* End marker */
    *cop++ = 0xFFFF;
    *cop++ = 0xFFFE;
 }
 ```
 ### Scroll Register + Modulo Combined
 The most impressive effects combine scroll offset and modulo changes per scanline:
 ```asm
 ; combined_distort.asm — Per-line scroll + modulo for wave effect
        ; Line 100: baseline
        dc.w    $6401,$FFFE           ; WAIT line 100
        dc.w    $0102,$0000           ; BPLCON1 scroll = 0
        dc.w    $0108,$0028           ; BPL1MOD = 40 (baseline, e.g. 2-plane interleaved, 40 bytes/row)
        ; Line 101: slight wave
        dc.w    $6501,$FFFE           ; WAIT line 101
        dc.w    $0102,$0033           ; BPLCON1 scroll = 3 (both playfields)
        dc.w    $0108,$0026           ; BPL1MOD = 38 (slightly tighter)
        ; Line 102: more wave
        dc.w    $6601,$FFFE           ; WAIT line 102
        dc.w    $0102,$0077           ; BPLCON1 scroll = 7
        dc.w    $0108,$0020           ; BPL1MOD = 32 (tighter still)
        ; Line 103: peak wave
        dc.w    $6701,$FFFE           ; WAIT line 103
        dc.w    $0102,$00FF           ; BPLCON1 scroll = 15 (max)
        dc.w    $0108,$0018           ; BPL1MOD = 24
        ; Line 104: coming back
        dc.w    $6801,$FFFE           ; WAIT line 104
        dc.w    $0102,$00AA           ; BPLCON1 scroll = 10
        dc.w    $0108,$0020           ; BPL1MOD = 32
 ```
 ---
 ## Technique 5: Plasma Effects
 Plasma is a classic demoscene effect created by overlapping sine waves mapped to a color palette. On the Amiga, plasma is implemented by pre-calculating a color lookup table and using the Copper to write it per scanline (or per block).
 ### Plasma Generation
 ```c
 /* plasma.c — Generate plasma color values */
 #define PLASMA_WIDTH  40  /* One color per 8 pixels (320/8) */
 #define PLASMA_HEIGHT 25  /* One color per 8 scanlines (200/8) */
 #define NUM_COLORS    64  /* Plasma palette size */
 static UWORD plasma_palette[NUM_COLORS];
 /* Pre-calculate plasma palette (rainbow gradient) */
 void init_plasma_palette(void) {
    int i;
    for (i = 0; i < NUM_COLORS; i++) {
        int phase = i * 360 / NUM_COLORS;
        int r = (int)(8.0 + 7.0 * sin(phase * 3.14159 / 180.0));
        int g = (int)(8.0 + 7.0 * sin((phase + 120) * 3.14159 / 180.0));
        int b = (int)(8.0 + 7.0 * sin((phase + 240) * 3.14159 / 180.0));
        plasma_palette[i] = (r << 8) | (g << 4) | b;
    }
 }
 /* Calculate plasma value at (x,y) with time offset */
 UBYTE plasma_value(int x, int y, ULONG t) {
    int v = 0;
    v += sine_table[(x * 4 + t) & 0xFF];
    v += sine_table[(y * 6 + t * 2) & 0xFF];
    v += sine_table[((x + y) * 3 + t) & 0xFF];
    v += sine_table[((x * 2 - y + t * 3) & 0xFF)];
    return (v >> 2) & 0x3F;  /* 0-63 → palette index */
 }
 ```
 ---
 ## Antipatterns
 ### 1. The Copper Chunky Bandwidth Wall
 Attempting copper chunky at too high a resolution. With 5+ bitplanes enabled, there aren't enough DMA slots for per-pixel color writes. The display will flicker, show garbage, or skip pixels.
 **Broken:**
 ```asm
 ; Trying 112-pixel copper chunky with 4 bitplanes active
 ; → DMA starvation, half the pixels never get written
 dc.w    $0100,$4200    ; BPLCON0: 4 bitplanes enabled
 ; ... then try 112 MOVE instructions per line → FAIL
 ```
 **Fixed:**
 ```asm
 ; Disable ALL bitplanes for copper chunky
 dc.w    $0100,$0200    ; BPLCON0: 0 bitplanes, color on
 ; Now full DMA bandwidth available for Copper writes
 ```
 ### 2. The HAM Fringe
 Rendering sharp edges in HAM mode without resetting the color state. Adjacent pixels of very different colors create visible "fringing" as the delta encoding struggles to transition.
 **Broken:**
 ```c
 /* Drawing a red square on a blue background in HAM */
 /* Each pixel transition goes: blue→red→blue→red → massive fringing */
 for (y = 0; y < height; y++) {
    for (x = 0; x < width; x++) {
        /* Each transition needs 2-3 pixels to change color */
        set_ham_pixel(x, y, is_square ? RED : BLUE);  /* Fringes! */
    }
 }
 ```
 **Fixed:**
 ```c
 /* Use palette entries for the square colors, and place the
   square on scanline boundaries where HAM state resets */
 for (y = 0; y < height; y++) {
    /* At start of each line, set base color from palette */
    set_ham_pixel(0, y, is_square ? RED_PAL_ENTRY : BLUE_PAL_ENTRY);
    /* Now deltas work from a known state */
    for (x = 1; x < width; x++) {
        /* Only use SET mode for sharp transitions */
        if (is_border_pixel(x, y)) {
            set_ham_pixel(x, y, target_palette_entry);
        }
    }
 }
 ```
 ### 3. The Modulo Overflow
 Setting a modulo value that causes bitplane addresses to wrap into non-bitplane memory or cross allocation boundaries. The result is visible garbage, DMA conflicts, or even system crashes.
 **Broken:**
 ```c
 /* Modulo too large → bitplane addresses go past allocated memory */
 custom.bpl1mod = 0x7FFF;  /* Almost 32KB jump per line! */
 /* → Reads from random memory, shows garbage, may crash */
 ```
 **Fixed:**
 ```c
 /* Calculate modulo based on actual bitmap dimensions */
 int bytes_per_line = (width / 8) * num_bitplanes;  /* Interleaved */
 int modulo = bytes_per_line - (display_bytes_per_line);
 custom.bpl1mod = (UWORD)(short)modulo;  /* Sign-extend correctly */
 ```
 ### 4. The Scroll Reset Gap
 Forgetting that `BPLCON1` scroll applies from the Copper write position to the end of the frame. If you set scroll per-line but forget to reset it, the last value persists for all remaining lines.
 **Broken:**
 ```asm
 ; Sine scroll lines 50-150, but no reset after line 150
 dc.w    $6001,$FFFE           ; WAIT line 96
 dc.w    $0102,$0077           ; Scroll = 7
 ; ... more lines ...
 dc.w    $9601,$FFFE           ; WAIT line 150 (last sine line)
 dc.w    $0102,$0033           ; Scroll = 3
 ; BUG: Lines 151+ still have scroll=3!
 ```
 **Fixed:**
 ```asm
 ; After the sine section, reset scroll to 0
 dc.w    $9701,$FFFE           ; WAIT line 151
 dc.w    $0102,$0000           ; Reset scroll to 0
 ```
 ### 5. The Palette Stride
 Assuming every color register is reachable from HAM. A HAM-6 SET code carries only a 4-bit index, so it can select palette entries 0-15 and nothing else; the remaining color registers of the 32-register OCS/ECS palette cannot be referenced by HAM SET codes.
 **Broken:**
 ```c
 /* Setting HAM palette entry 32 — NOT accessible via HAM SET! */
 ham_palette[32] = 0x0FFF;
 /* HAM SET mode only uses bits 3-0 → entries 0-15 */
 ```
 **Fixed:**
 ```c
 /* Only entries 0-15 are usable with HAM SET (00xxxx) */
 /* Place your most important anchor colors in entries 0-15 */
 ham_palette[0] = 0x0000;  /* Black (background) */
 ham_palette[1] = 0x0FFF;  /* White (highlights) */
 /* ... entries 2-15 for key scene colors */
 ```
 ---
 ## Decision Guide
 ```mermaid
 flowchart TD
    START[Need pixel-level effect] --> Q1{Bitplanes available?}
    Q1 -->|No bitplanes needed| CC[Copper Chunky<br/>~56px wide, any colors]
    Q1 -->|Yes, need bitplanes| Q2{Photorealistic or generated?}
    Q2 -->|Photorealistic| HAM[HAM-6/HAM-8 Art<br/>4096/262,144 colors]
    Q2 -->|Generated/procedural| Q3{Wave distortion or wrapping?}
    Q3 -->|Wave distortion| Q4{Horizontal or vertical?}
    Q3 -->|Surface wrapping| MOD[Modulo Wrapping]
    Q4 -->|Horizontal| SCROLL[BPLCON1 Sine Scroll]
    Q4 -->|Vertical| Q5{Color cycling or structure?}
    Q5 -->|Color only| PLASMA[Plasma / Color Cycling]
    Q5 -->|Structural| SPLIT["Raster Split (see copper_effects.md)"]
    CC --> BUDGET{DMA budget<br/>for copper writes?}
    HAM --> BUDGET2{Accept fringing<br/>at sharp edges?}
    SCROLL --> BUDGET
    MOD --> BUDGET
    BUDGET -->|Sufficient| OK[Proceed]
    BUDGET -->|Insufficient| REDUCE[Reduce width or<br/>disable bitplanes]
    BUDGET2 -->|Yes| OK2[Proceed with HAM]
    BUDGET2 -->|No| ALT[Use 6-bitplane normal mode<br/>or copper chunky instead]
 ```
 ---
 ## Historical Timeline
 ```mermaid
 timeline
    title Pixel Tricks Evolution
    1985 : Amiga launch — HAM-6 mode (4096 colors)
         : Planar display standard
    1987 : First HAM art images appear in graphics competitions
    1988 : Red Sector — sine scroll distortion in demos
         : Scroll register abuse becomes standard demo technique
    1989 : First plasma effects using copper color cycling
    1990 : HAM art becomes competition category at parties
    1991 : Combined scroll + modulo distortion effects
         : Copper chunky reaches 56-pixel practical limit
    1992 : AGA HAM-8 mode — 262,144 colors
         : Photorealistic HAM-8 images in demos
    1993 : Sanity "Arte" — copper chunky full-screen effect
         : Advanced plasma with pre-calculated tables
         : Modulo tunnel effects
    1994 : Full copper chunky animations at acceptable framerates
    2000+ : Pixel tricks preserved in demo archives
          : FPGA ensures exact register timing
 ```
 ---
 ## Modern Analogies
 | Amiga Pixel Trick | Modern Equivalent | Why It Maps |
 |------------------|-------------------|-------------|
 | Copper chunky | Fragment shader per-pixel output | Both generate color per pixel programmatically |
 | HAM delta encoding | DXT/Etc2 texture compression | Both store deltas from neighbors, not absolute values |
 | Scroll register distortion | Vertex displacement shader | Both deform the output geometry |
 | Modulo wrapping | UV coordinate wrapping/repeating | Both tile the source data |
 | Plasma effect | GPU procedural texture | Both generate visuals from math functions |
 | Per-line register changes | Per-scanline render state | Both change parameters at Y boundaries |
 | Self-modifying copper list | Dynamic command buffer | Both generate GPU commands per frame |
 | Color register palette | CLUT (Color Lookup Table) | Both map indices to RGB values |
 ---
 ## Use Cases
 | Use Case | Technique | Notable Examples |
 |----------|-----------|-----------------|
 | Full-screen chunky graphics | Copper chunky | Sanity "Arte", ASD demos |
 | Photorealistic static images | HAM-6/HAM-8 | Amiga art competitions, IPF cover art |
 | Sine-wave text scroll | BPLCON1 distortion | Every Amiga demo with scrolling text |
 | Wave distortion effect | Scroll + modulo combined | Red Sector, Kefrens demos |
 | Background plasma | Copper color cycling | Numerous demos, cracktros |
 | Tunnel perspective | Modulo wrapping | "Tunnel" demo effect |
 | Virtual scrolling surface | Modulo repeat | Racing games, side-scrollers |
 | Color gradient animation | Per-line color writes | Copper bar variants |
 ---
 ## FPGA / Emulation Impact
 | Concern | Impact | Notes |
 |---------|--------|-------|
 | **Copper chunky timing** | Exact per-cycle color register writes | Denise must latch COLORxx at precise pixel positions |
 | **HAM delta decoding** | Per-pixel color state machine | Must track running R/G/B across entire scanline |
 | **Scroll register latency** | BPLCON1 changes take effect next line | Must match hardware pipeline delay |
 | **Modulo arithmetic** | Signed 16-bit addition to address | Address wrapping must handle 24-bit space correctly |
 | **BPLCON0 bitplane disable** | 0-bitplane mode must suppress all BPL DMA | Copper chunky depends on zero bitplane DMA |
 | **HAM-8 AGA extension** | 8-bitplane HAM mode with 6-bit deltas | AGA Lisa chip HAM decoder must be implemented |
 ---
 ## FAQ
 **Q: Can copper chunky animate at full frame rate?**
 A: Rarely. A 56×200 copper chunky frame requires generating ~11,200 copper instructions per frame. On a 7 MHz 68000, that takes most of the frame time. Most copper chunky demos animate at reduced resolution or partial updates.
 **Q: Why not always use HAM for more colors?**
 A: HAM's delta encoding makes it nearly impossible to render sharp-edged graphics or animations. Any pixel transition that requires changing more than one color component creates visible fringing. HAM is excellent for pre-rendered images but terrible for real-time rendering.
 **Q: What's the difference between plasma and copper bars?**
 A: Copper bars are horizontal bands of uniform color (one color per scanline or per group of scanlines). Plasma varies color both horizontally and vertically using sine-based lookup tables, creating a 2D color field. Plasma typically renders to bitplanes (pre-calculated pixel data) while copper bars use register writes only.
 **Q: Can I combine copper chunky with normal bitplane graphics?**
 A: Not simultaneously — copper chunky requires disabling all bitplane DMA. However, you can use copper chunky on some scanlines and bitplane graphics on others, switching `BPLCON0` mid-frame via the Copper. This is rarely done because the copper list size doubles.
 **Q: How does modulo wrapping differ from scroll wrapping?**
 A: Scroll (`BPLCON1`) shifts pixel data horizontally within a scanline by 0–15 pixels. Modulo (`BPL1MOD`) changes the byte offset between consecutive scanlines in memory — it affects which memory addresses are read for each line, not how the bits are shifted. They produce different visual effects: scroll creates smooth horizontal displacement, modulo creates discontinuous "jumps" in the displayed data.
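 The modulo half of that answer reduces to one addition per scanline. A minimal host-side sketch in C, assuming the bitplane pointer has already advanced by the bytes fetched for the line; `next_line_addr` and its parameter names are illustrative, not hardware or OS symbols:
 ```c
 #include <stdint.h>
 /* Model of the bitplane pointer between scanlines: after one line's
    worth of bytes has been fetched, the signed modulo is added.
    Illustrative helper, not an OS or hardware symbol. */
 uint32_t next_line_addr(uint32_t line_addr, uint16_t fetch_bytes, int16_t modulo)
 {
     return line_addr + (uint32_t)fetch_bytes + (uint32_t)(int32_t)modulo;
 }
 ```
 With a 40-byte LoRes line, a modulo of 0 walks memory contiguously, -40 repeats the same line on every scanline, +40 skips every other line, and -80 steps backwards through memory, flipping the image vertically.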
 **Q: What is EHB (Extra Half-Brite) and how does it relate?**
 A: EHB is a 6-bitplane mode in which palette entries 32–63 are generated automatically as half-brightness copies of entries 0–31, giving 64 colors from 32 color registers. It's not a "pixel trick" per se; it's a hardware feature of the original chipset (only the earliest A1000 Denise revisions lack it). It can be combined with copper effects for interesting results but is documented separately in [HAM & EHB Modes](../08_graphics/ham_ehb_modes.md).
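 How an EHB pixel resolves to a color can be sketched on the host side in C. This assumes 12-bit `$0RGB` palette values and that a set sixth bitplane bit halves each component of the base color; `ehb_half` and `ehb_color` are illustrative names:
 ```c
 #include <stdint.h>
 /* Half-bright version of a 12-bit $0RGB color: each 4-bit
    component is shifted right by one bit. */
 uint16_t ehb_half(uint16_t rgb12)
 {
     return (uint16_t)((rgb12 >> 1) & 0x0777u);
 }
 /* Resolve a 6-bit EHB pixel value against a 32-entry palette.
    Bit 5 of the pixel selects the half-bright copy. */
 uint16_t ehb_color(const uint16_t palette[32], unsigned pixel)
 {
     uint16_t base = palette[pixel & 31u];
     return (pixel & 32u) ? ehb_half(base) : base;
 }
 ```
 Pixel values 0-31 read the palette directly; values 32-63 return the dimmed copy of entry `pixel - 32`.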
 ---
 ## References
 ### Related Knowledge Base Articles
 - [Pixel Conversion](../08_graphics/pixel_conversion.md) — C2P algorithms, copper chunky theory
 - [HAM & EHB Modes](../08_graphics/ham_ehb_modes.md) — HAM-6/HAM-8 encoding, EHB mode
 - [Copper Effects](copper_effects.md) — Copper bars, gradients, raster splits
 - [Display Modes](../08_graphics/display_modes.md) — ModeID, BPLCON0 settings
 - [Video Timing](../01_hardware/common/video_timing.md) — Scanline structure, beam position
 - [Blitter Programming](../08_graphics/blitter_programming.md) — Blitter fill for HAM rendering
 ### External Resources
 - **Amiga Hardware Reference Manual** — Chapter 3: BPLCON0/1, HAM mode, scroll registers
 - **Amiga Graphics Archive** — https://amiga.lychesis.net/specials/Copper.html — Copper chunky and gradient analysis in commercial games
 - **Scoopex Amiga Hardware Programming** (Photon) — [YouTube playlist](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Bitplane and copper-based effects walkthroughs; companion site: [coppershade.org](http://coppershade.org/)
 - **Pouet.net** — https://www.pouet.net — Copper chunky and plasma demos
 - **Demozoo** — https://demozoo.org — Demoscene production encyclopedia
 - **HAM Encoder** — Various tools for converting images to HAM format
--- a/17_demoscene/sprite_techniques.md
+++ b/17_demoscene/sprite_techniques.md
@ -0,0 +1,581 @@
 [← Home](../README.md) · [Demoscene Techniques](README.md)
 # Sprite Techniques — Multiplexing, Sprite-Built Images, and Attached Sprites
 ## Overview
 The Amiga has 8 hardware sprite channels, each displaying a 16-pixel-wide, arbitrarily-tall image in 3 colors (plus transparent). That sounds limiting — and it is, for games that need dozens of moving objects. But the demoscene turned this constraint into a creative engine. By repositioning sprites at different scanlines (multiplexing), building full-screen images from sprite data, and combining pairs into 15-color attached sprites, demoscene coders extracted far more from 8 channels than Commodore intended.
 This article covers the specific demoscene sprite techniques that go beyond basic sprite usage. For the hardware architecture, DMA timing, and OS-level sprite API, see [Sprites](../08_graphics/sprites.md).
 ```mermaid
 graph TB
    subgraph "Multiplexing"
        SMX["Vertical Multiplex<br/>Reuse sprites at different Y"]
        HMX["Horizontal Reposition<br/>Move SPRxPOS per line"]
    end
    subgraph "Construction"
        SBI["Sprite-Built Images<br/>Full graphics from sprite data"]
        SBM["Sprite-Built Masks<br/>Sprites as Blitter stencil"]
    end
    subgraph "Enhancement"
        ATT["Attached Sprites<br/>15-color sprite pairs"]
        OVL["Sprite Overlay<br/>Sprites on top of playfield"]
    end
    SMX --> SBI
    HMX --> SBM
    SBI --> ATT
    SBM --> OVL
 ```
 ---
 ## Hardware Recap
 ### Sprite DMA Channels
 | Sprite | DMA Slots/Line | Position Reg | Data Regs | Colors | Notes |
 |--------|---------------|-------------|-----------|--------|-------|
 | SPR0 | 2 | `$DFF140` | `$DFF144`/`6` | COLOR17–COLOR19 | Usually mouse pointer |
 | SPR1 | 2 | `$DFF148` | `$DFF14C`/`E` | COLOR17–COLOR19 | Shares palette with SPR0 |
 | SPR2 | 2 | `$DFF150` | `$DFF154`/`6` | COLOR21–COLOR23 | |
 | SPR3 | 2 | `$DFF158` | `$DFF15C`/`E` | COLOR21–COLOR23 | Shares palette with SPR2 |
 | SPR4 | 2 | `$DFF160` | `$DFF164`/`6` | COLOR25–COLOR27 | |
 | SPR5 | 2 | `$DFF168` | `$DFF16C`/`E` | COLOR25–COLOR27 | Shares palette with SPR4 |
 | SPR6 | 2 | `$DFF170` | `$DFF174`/`6` | COLOR29–COLOR31 | |
 | SPR7 | 2 | `$DFF178` | `$DFF17C`/`E` | COLOR29–COLOR31 | Shares palette with SPR6 |
 ### Sprite Data Format
 Each sprite line is 2 words (32 bits). For a standard 16-pixel-wide, 3-color sprite, each word contains one bitplane of the image:
 ```
 ┌─────────────────────────────────────┐
 │  Word 1 (bitplane 0)  │  Word 2 (bitplane 1)  │
 │  bits: 16 pixels      │  bits: 16 pixels      │
 └─────────────────────────────────────┘
 Color mapping:
  00 = transparent (playfield shows through)
  01 = first color   (e.g., COLOR17 for SPR0/SPR1)
  10 = second color  (e.g., COLOR18 for SPR0/SPR1)
  11 = third color   (e.g., COLOR19 for SPR0/SPR1; sprites never use COLOR00)
 ```
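 The two-word line format can be decoded with a few shifts. A minimal host-side sketch in C, assuming the Hardware Reference Manual mapping in which `SPRxDATA` is the low-order plane, `SPRxDATB` the high-order plane, and each sprite pair owns a block of four color registers starting at COLOR16; the function names are illustrative:
 ```c
 #include <stdint.h>
 /* 2-bit value (0-3) of pixel x in one sprite line; x = 0 is the
    leftmost pixel, stored in bit 15 of each word. */
 unsigned sprite_pixel(uint16_t data, uint16_t datb, unsigned x)
 {
     unsigned shift = 15u - (x & 15u);
     return ((data >> shift) & 1u) | (((datb >> shift) & 1u) << 1);
 }
 /* Color register number for a non-transparent pixel value (1-3)
    of sprite 0-7. Sprites 0/1 share one block, 2/3 the next, etc. */
 unsigned sprite_color_reg(unsigned sprite, unsigned value)
 {
     return 16u + 4u * ((sprite & 7u) >> 1) + (value & 3u);
 }
 ```
 A pixel value of 0 is transparent and should not be passed to `sprite_color_reg`.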
 ### Sprite Control Registers
 | Register | Address | Description |
 |----------|---------|-------------|
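 | `SPRxPOS` | `$DFF140+8n` | Vertical start, low 8 bits (bits 15-8) + horizontal start, bits H8–H1 (bits 7-0) |
 | `SPRxCTL` | `$DFF142+8n` | Vertical stop, low 8 bits (bits 15-8) + attach (bit 7) + vertical start bit 8 (bit 2) + vertical stop bit 8 (bit 1) + horizontal start bit 0 (bit 0) |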
 | `SPRxDATA` | `$DFF144+8n` | Data bitplane 0 (write-only) |
 | `SPRxDATB` | `$DFF146+8n` | Data bitplane 1 (write-only) |
 > [!NOTE]
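 > Sprite position encoding: positions are 9-bit values in the same coordinate system as the display window (`DIWSTRT`). `SPRxPOS` holds the upper 8 bits of the horizontal start, so its low byte counts in 2-pixel LoRes steps; the remaining bit is bit 0 of `SPRxCTL`. With the standard window (`DIWSTRT = $2C81`), the left edge of the display is horizontal position `$81` (low byte `$40`); smaller values fall in the border or blanking. The vertical start and stop are scanline numbers (PAL: 0–311, NTSC: 0–261) whose ninth bits live in `SPRxCTL` bits 2 and 1. The stop value is the first line on which the sprite is *not* drawn.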
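
 Packing a position into the two control words is mostly bit shuffling. A minimal host-side sketch in C, assuming the register layout from the Hardware Reference Manual (9-bit start and stop values split across `SPRxPOS` and `SPRxCTL`); the `SpriteCtlWords` type and `sprite_ctl_words` function are hypothetical helpers:
 ```c
 #include <stdint.h>
 /* Hypothetical helper type holding the two control words */
 typedef struct { uint16_t pos, ctl; } SpriteCtlWords;
 /* hstart, vstart, vstop are 9-bit beam positions; vstop is the
    first line on which the sprite is no longer drawn. */
 SpriteCtlWords sprite_ctl_words(unsigned hstart, unsigned vstart,
                                 unsigned vstop, int attach)
 {
     SpriteCtlWords w;
     w.pos = (uint16_t)(((vstart & 0xFFu) << 8) | ((hstart >> 1) & 0xFFu));
     w.ctl = (uint16_t)(((vstop & 0xFFu) << 8)
           | (attach ? 0x80u : 0u)           /* bit 7: ATTACH        */
           | (((vstart >> 8) & 1u) << 2)     /* bit 2: VSTART bit 8  */
           | (((vstop  >> 8) & 1u) << 1)     /* bit 1: VSTOP bit 8   */
           | (hstart & 1u));                 /* bit 0: HSTART bit 0  */
     return w;
 }
 ```
 For example, a sprite at the top-left corner of a standard PAL display that runs to line 300 uses `hstart = 0x81`, `vstart = 0x2C`, `vstop = 0x12C`, which packs to `$2C40` and `$2C03`.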
 ---
 ## Technique 1: Sprite Multiplexing
 The most important demoscene sprite technique. The Amiga's 8 sprites each persist from their start scanline to their stop scanline. By repositioning a sprite mid-frame (via the Copper writing new `SPRxPOS`/`SPRxCTL` values), the same hardware channel displays different images at different vertical positions.
 ### How Multiplexing Works
 ```mermaid
 sequenceDiagram
    participant Copper as Copper
    participant SPR0 as Sprite 0 Channel
    participant Screen as Display
    Note over Screen: Scanline 50
    Copper->>SPR0: SPR0POS = $3240 (start line 50, left edge)
    Copper->>SPR0: SPR0CTL = $5C00 (stop line 92, attach=0)
    Copper->>SPR0: SPR0DATA = image_A plane 0
    Copper->>SPR0: SPR0DATB = image_A plane 1
    SPR0->>Screen: Render image_A (lines 50-91)
    Note over Screen: Scanline 93
    Copper->>SPR0: SPR0POS = $5D54 (start line 93, 40 px right)
    Copper->>SPR0: SPR0CTL = $8700 (stop line 135, attach=0)
    Copper->>SPR0: SPR0DATA = image_B plane 0
    Copper->>SPR0: SPR0DATB = image_B plane 1
    SPR0->>Screen: Render image_B (lines 93-134)
 ```
 ### Multiplexed Sprite Setup
 The Copper writes new position and data values for each sprite at each repositioning point:
 ```asm
 ; sprite_mux.asm — Multiplex sprite 0 at two vertical positions
 SPRITE_MUX_COPPER:
        ; ---- First instance: lines 50-77 (28 lines) ----
        dc.w    $3201,$FFFE           ; WAIT line 50 (VP=$32, bit 0 set = WAIT)
        dc.w    $0140,$3240           ; SPR0POS: VSTART=50, HSTART=$80 (left edge)
        dc.w    $0142,$4E00           ; SPR0CTL: VSTOP=78 (first line not drawn), no attach
        ; Image data is fetched by sprite DMA through SPR0PT ($DFF120/$DFF122),
        ; which must be set up elsewhere; sprites never use COLOR00
        dc.w    $0180,$0000           ; COLOR00 = black background
        ; ---- Reposition at line 80 ----
        dc.w    $5001,$FFFE           ; WAIT line 80
        dc.w    $0140,$5054           ; SPR0POS: VSTART=80, HSTART=$A8 (40 px right)
        dc.w    $0142,$6A00           ; SPR0CTL: VSTOP=106
        ; ---- Reposition again at line 110 ----
        dc.w    $6E01,$FFFE           ; WAIT line 110
        dc.w    $0140,$6E68           ; SPR0POS: VSTART=110, HSTART=$D0 (80 px right)
        dc.w    $0142,$8C00           ; SPR0CTL: VSTOP=140
        dc.w    $FFFF,$FFFE           ; End
 ```
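 Copper lists like this are usually generated at runtime rather than typed by hand. A minimal host-side sketch in C of such a generator, assuming scanlines 0-255 (later lines need an extra wait that is not shown) and the standard Copper encoding in which bit 0 of the first word marks a WAIT; `copper_reposition` and `copper_end` are hypothetical helper names:
 ```c
 #include <stdint.h>
 #include <stddef.h>
 #define REG_SPR0POS 0x0140u   /* custom-chip offset of SPR0POS */
 /* Append WAIT + MOVE SPRxPOS + MOVE SPRxCTL (6 words) to a copper
    list. Returns the new write index. */
 size_t copper_reposition(uint16_t *list, size_t n, unsigned sprite,
                          unsigned wait_line, uint16_t pos, uint16_t ctl)
 {
     unsigned reg = REG_SPR0POS + 8u * (sprite & 7u);
     list[n++] = (uint16_t)(((wait_line & 0xFFu) << 8) | 0x01u); /* WAIT, HP=0 */
     list[n++] = 0xFFFE;               /* compare all position bits */
     list[n++] = (uint16_t)reg;        /* MOVE to SPRxPOS */
     list[n++] = pos;
     list[n++] = (uint16_t)(reg + 2u); /* MOVE to SPRxCTL */
     list[n++] = ctl;
     return n;
 }
 /* Append the conventional end-of-list wait ($FFFF,$FFFE). */
 size_t copper_end(uint16_t *list, size_t n)
 {
     list[n++] = 0xFFFF;
     list[n++] = 0xFFFE;
     return n;
 }
 ```
 Each reposition costs six words, so a buffer for `k` repositions plus the terminator needs `6 * k + 2` words of Chip RAM.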
 ### Multiplexing Limits
 | Factor | Constraint | Practical Limit |
 |--------|-----------|-----------------|
 | **DMA bandwidth** | 2 slots per sprite per scanline (8 sprites = 16 slots) | All 8 sprites always consume 16 slots — fixed cost |
 | **Copper repositioning** | 2 WAIT+MOVE pairs per reposition (8 words) | ~7 repositions per scanline in LoRes 4-plane |
 | **Vertical spacing** | Must wait for sprite stop before restarting | Minimum ~1 scanline gap between instances |
 | **Data storage** | Each sprite instance needs its own data in Chip RAM | Limited by Chip RAM budget |
 | **Horizontal position** | 9-bit value split across `SPRxPOS` (bits H8–H1) and `SPRxCTL` bit 0 | 1 LoRes pixel resolution; OCS/ECS sprites stay at LoRes resolution even over a HiRes playfield |
 ### Typical Multiplexing Budgets
 | Configuration | Sprites | Multiplexes/Sprite | Total Objects | Colors |
 |--------------|---------|-------------------|---------------|--------|
 | 8 sprites × 1 mux | 8 | 1 | 8 | 3 each |
 | 8 sprites × 3 mux | 8 | 3 | 24 | 3 each |
 | 4 sprites × 5 mux | 4 | 5 | 20 | 3 each |
 | 8 attached × 2 mux | 4 pairs | 2 | 8 | 15 each |
 ---
 ## Technique 2: Sprite-Built Images
 Instead of using sprites for moving objects, demoscene coders use sprite data to construct static images — logos, borders, or even full-screen graphics. This frees bitplane memory for the main display and lets sprites act as an overlay layer.
 ### How It Works
 1. Arrange 8 sprites vertically (no multiplexing) to cover the full screen height
 2. Each sprite is 16 pixels wide × the full display height (~256 lines)
 3. Total sprite coverage: 8 × 16 = 128 pixels wide, full height
 4. Combine with bitplanes for the remaining horizontal space
 ### Logo Construction from Sprites
 ```mermaid
 graph LR
    subgraph "Screen Composition"
        PF["Bitplane Layer<br/>320×256, 4-bitplane background"]
        S0["SPR0-3<br/>64px × 256 lines<br/>Logo left half"]
        S4["SPR4-7<br/>64px × 256 lines<br/>Logo right half"]
    end
    PF --> MUX["Priority MUX<br/>(BPLCON2)"]
    S0 --> MUX
    S4 --> MUX
    MUX --> OUT["Final Display<br/>Background + sprite overlay"]
 ```
 ```c
 /* sprite_logo.c — Build a 128-pixel-wide logo from 8 sprites */
 #include <exec/types.h>
 #define LOGO_WIDTH  128   /* 8 sprites × 16 pixels */
 #define LOGO_HEIGHT 256
 #define LOGO_LEFT   0x81  /* HSTART of the standard display's left edge */
 /* Sprite DMA format: two control words, then one DATA/DATB word pair
   per line, then two zero words that end the list (Chip RAM only) */
 struct SpriteData {
    UWORD pos;                     /* SPRxPOS value */
    UWORD ctl;                     /* SPRxCTL value */
    UWORD lines[LOGO_HEIGHT][2];   /* [n][0] = plane 0, [n][1] = plane 1 */
    UWORD end[2];                  /* 0,0 terminator */
 };
 /* Initialize 8 sprites to display a 128-pixel logo */
 void init_sprite_logo(struct SpriteData sprites[8], int start_y) {
    int i;
    int stop_y = start_y + LOGO_HEIGHT;  /* VSTOP = first line NOT drawn */
    for (i = 0; i < 8; i++) {
        int x = LOGO_LEFT + i * 16;  /* Each sprite starts 16px after the previous */
        /* SPRxPOS: VSTART bits 7-0 | HSTART bits 8-1 */
        sprites[i].pos = (UWORD)(((start_y & 0xFF) << 8) | ((x >> 1) & 0xFF));
        /* SPRxCTL: VSTOP bits 7-0 | VSTART bit 8 | VSTOP bit 8 | HSTART bit 0 */
        sprites[i].ctl = (UWORD)(((stop_y & 0xFF) << 8) |
                                 (((start_y >> 8) & 1) << 2) |
                                 (((stop_y >> 8) & 1) << 1) | (x & 1));
        sprites[i].end[0] = 0;
        sprites[i].end[1] = 0;
        /* lines[][] would be filled from pre-converted image data */
    }
 }
 ```
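 The "pre-converted image data" step can be sketched as a small chunky-to-sprite conversion. This is a host-side illustration in C, assuming the source row holds one 2-bit pixel value (0-3) per byte; `row_to_sprite_words` is a hypothetical helper, not part of any Amiga API:
 ```c
 #include <stdint.h>
 /* Convert one 128-pixel row (one value 0-3 per byte) into eight
    DATA/DATB word pairs, one pair per sprite. The leftmost pixel of
    each 16-pixel slice ends up in bit 15. */
 void row_to_sprite_words(const uint8_t row[128], uint16_t out[8][2])
 {
     int s, x;
     for (s = 0; s < 8; s++) {
         uint16_t data = 0, datb = 0;
         for (x = 0; x < 16; x++) {
             unsigned p = row[s * 16 + x];
             data = (uint16_t)((data << 1) | (p & 1u));        /* plane 0 */
             datb = (uint16_t)((datb << 1) | ((p >> 1) & 1u)); /* plane 1 */
         }
         out[s][0] = data;
         out[s][1] = datb;
     }
 }
 ```
 Running this once per logo row yields the word pairs that each sprite's data list expects, in display order.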
 ---
 ## Technique 3: Attached Sprites (15-Color)
 Normal sprites have 3 colors. **Attached sprites** combine two sprite channels (even+odd, like SPR0+SPR1) into a single image with 15 colors. The even sprite provides bitplanes 0-1, the odd sprite provides bitplanes 2-3. Together, they form a 4-bitplane (16-color, one transparent) image.
 ### Attachment Encoding
 The `attach` bit (bit 7 of `SPRxCTL`) is set in the **odd** sprite of the pair and tells Denise to combine the two sprites:
 ```asm
 ; attached_sprites.asm — 15-color sprite pair (SPR0 + SPR1)
        ; ---- Set up SPR0 (even: bitplanes 0,1) ----
        dc.w    $0140,$3240           ; SPR0POS: VSTART=50, HSTART=$80 (left edge)
        dc.w    $0142,$5A00           ; SPR0CTL: VSTOP=90, ATTACH=0 (even)
        ; ---- Set up SPR1 (odd: bitplanes 2,3) ----
        dc.w    $0148,$3240           ; SPR1POS: SAME position as SPR0
        dc.w    $014A,$5A80           ; SPR1CTL: VSTOP=90, ATTACH=1 (bit 7, odd sprite)
        ; ---- 15-color palette for SPR0+SPR1 ----
        ; The 4-bit pixel value (SPR1 planes = bits 3-2, SPR0 planes = bits 1-0)
        ; indexes COLOR16-COLOR31 directly:
        ;   0000 = transparent
        ;   0001 = COLOR17  (SPR0 plane 0 only)
        ;   0010 = COLOR18  (SPR0 plane 1 only)
        ;   0011 = COLOR19  (both SPR0 planes)
        ;   0100 = COLOR20  (SPR1 plane 0 only)
        ;   ...
        ;   1111 = COLOR31  (all bits set)
 ```
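 The way an attached pair resolves to a color register can be modelled in a few lines. A minimal host-side sketch in C, assuming the Hardware Reference Manual mapping in which the odd sprite supplies the two high bits and the combined value indexes COLOR16-COLOR31; both function names are illustrative:
 ```c
 #include <stdint.h>
 /* 4-bit value of pixel x (0 = leftmost) from an attached pair:
    even sprite words give bits 0-1, odd sprite words give bits 2-3. */
 unsigned attached_pixel(uint16_t even_data, uint16_t even_datb,
                         uint16_t odd_data, uint16_t odd_datb, unsigned x)
 {
     unsigned s = 15u - (x & 15u);
     return  ((even_data >> s) & 1u)
          | (((even_datb >> s) & 1u) << 1)
          | (((odd_data  >> s) & 1u) << 2)
          | (((odd_datb  >> s) & 1u) << 3);
 }
 /* Color register number for a pixel value, or -1 for transparent. */
 int attached_color_reg(unsigned value)
 {
     return (value & 15u) ? (int)(16u + (value & 15u)) : -1;
 }
 ```
 Values 1-15 land on COLOR17-COLOR31, which is where the 15 usable colors come from.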
 ### Attached Sprite Color Table
 | Bits (3210) | Color Register | Notes |
 |-------------|---------------|-------|
 | 0000 | Transparent | Playfield visible |
 | 0001 | COLOR17 | Even sprite, plane 0 only |
 | 0010 | COLOR18 | Even sprite, plane 1 only |
 | 0011 | COLOR19 | Both planes of even sprite |
 | 0100 | COLOR20 | Odd sprite, plane 0 only |
 | 1000 | COLOR24 | Odd sprite, plane 1 only |
 | 1100 | COLOR28 | Both planes of odd sprite |
 | 1111 | COLOR31 | All four planes set |
 | *nnnn* | COLOR(16 + *nnnn*) | General rule for values 1–15 |
 > [!NOTE]
 > The 4-bit combination gives 16 values. Value 0 is transparent; values 1–15 select `COLOR17`–`COLOR31` directly, giving 15 distinct non-transparent colors. `COLOR00` is never used by sprites.
 ### Attached Pair Assignments
 | Even Sprite | Odd Sprite | Color Registers | Available |
 |-------------|-----------|-----------------|-----------|
 | SPR0 | SPR1 | COLOR17–COLOR31 | 15 colors |
 | SPR2 | SPR3 | COLOR17–COLOR31 (same registers on OCS/ECS) | 15 colors, shared palette |
 | SPR4 | SPR5 | COLOR17–COLOR31 (same registers on OCS/ECS) | 15 colors, shared palette |
 | SPR6 | SPR7 | COLOR17–COLOR31 (same registers on OCS/ECS) | 15 colors, shared palette |
 ---
 ## Technique 4: Sprite Overlay with Priority Control
 `BPLCON2` holds two 3-bit priority codes, `PF1P` (bits 2-0) and `PF2P` (bits 5-3), that place each playfield in front of or behind the four sprite pairs; bit 6 (`PF2PRI`) only orders the two playfields against each other. Demoscene effects exploit this to create overlay effects, with sprites appearing in front of or behind the playfield for visual layering:
 ```asm
 ; Set sprites behind playfield (for background effects)
 dc.w    $0104,$0000           ; BPLCON2: PF1P=PF2P=0, playfields in front of all sprites
 ; Set sprites in front of playfield (for overlays)
 dc.w    $0104,$0024           ; BPLCON2: PF1P=PF2P=4, all sprites in front
 ```
 This can be changed mid-frame by the Copper, enabling sprites to appear behind bitplanes in one area and in front in another — a common technique in multi-layer parallax effects.
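 Building the register value from its fields makes the intent easier to read than a bare hex constant. A minimal host-side sketch in C, assuming the `BPLCON2` layout from the Hardware Reference Manual (`PF1P` in bits 2-0, `PF2P` in bits 5-3, `PF2PRI` in bit 6); `bplcon2_value` is a hypothetical helper:
 ```c
 #include <stdint.h>
 /* pf1p / pf2p: number of sprite pairs drawn in FRONT of that
    playfield, 0 (playfield in front of all sprites) to 4 (playfield
    behind all four pairs). pf2_over_pf1 sets PF2PRI. */
 uint16_t bplcon2_value(unsigned pf1p, unsigned pf2p, int pf2_over_pf1)
 {
     return (uint16_t)((pf2_over_pf1 ? 0x40u : 0u)
                     | ((pf2p & 7u) << 3)
                     | (pf1p & 7u));
 }
 ```
 Intermediate codes split the pairs: a value of 2 puts SPR0-SPR3 in front of the playfield and SPR4-SPR7 behind it.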
 ---
 ## Technique 5: Sprite-Based Color Effects
 Each sprite's color registers can be changed by the Copper per scanline. This enables per-line color animation on sprite graphics:
 ```asm
 ; Animate sprite colors per scanline for rainbow effect
 dc.w    $3201,$FFFE           ; WAIT line 50
 dc.w    $01A2,$0F00           ; COLOR17 = red (SPR0 color 1, $0RGB)
 dc.w    $3301,$FFFE           ; WAIT line 51
 dc.w    $01A2,$0FF0           ; COLOR17 = yellow
 dc.w    $3401,$FFFE           ; WAIT line 52
 dc.w    $01A2,$0FFF           ; COLOR17 = white
 ; ... continues for full rainbow
 ```
 This technique is the basis for many demo effects where a sprite "absorbs" the current background color and appears to change color as it moves.
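 A per-line color ramp like the one above is tedious to write by hand, so it is normally generated. A minimal host-side sketch in C, assuming scanlines 0-255 and that `COLORnn` sits at offset `$180 + 2n`; `color_reg_offset` and `copper_color_ramp` are hypothetical helpers:
 ```c
 #include <stdint.h>
 #include <stddef.h>
 /* Custom-chip offset of COLORnn (COLOR00 = $180, two bytes each). */
 uint16_t color_reg_offset(unsigned n)
 {
     return (uint16_t)(0x0180u + 2u * (n & 31u));
 }
 /* Emit one WAIT + MOVE pair per scanline, writing colors[i] to the
    chosen color register on line first_line + i. Returns words written. */
 size_t copper_color_ramp(uint16_t *list, unsigned first_line,
                          unsigned color_reg, const uint16_t *colors,
                          size_t count)
 {
     size_t n = 0, i;
     for (i = 0; i < count; i++) {
         list[n++] = (uint16_t)((((first_line + i) & 0xFFu) << 8) | 0x01u);
         list[n++] = 0xFFFE;
         list[n++] = color_reg_offset(color_reg);
         list[n++] = colors[i];
     }
     return n;
 }
 ```
 Animating the effect then only requires rewriting the color words (every fourth word, starting at index 3) once per frame.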
 ---
 ## Antipatterns
 ### 1. The Invisible Sprite
 Forgetting to set sprite color registers. By default, all sprite colors are `$0000` (black), which is also the default background color — making sprites invisible against a black background.
 **Broken:**
 ```c
 /* Sprite is there but invisible — COLOR17/18 match background */
 custom.spr[0].pos = pos_value;
 custom.spr[0].ctl = ctl_value;
 /* Colors never set → invisible on black background */
 ```
 **Fixed:**
 ```c
 custom.spr[0].pos = pos_value;
 custom.spr[0].ctl = ctl_value;
 /* Set sprite colors to visible values */
 custom.color[17] = 0x0FFF;  /* Bright white */
 custom.color[18] = 0x0F00;  /* Red ($0RGB) */
 ```
 ### 2. The Overlapping Multiplex
 Repositioning a sprite before its previous instance has finished displaying. The sprite channel can only hold one position at a time — setting a new start position while the old one is still active causes visual artifacts.
 **Broken:**
 ```asm
 ; Sprite 0 starts at line 50, runs to line 100
 dc.w    $0140,$3200           ; SPR0POS: start=50
 dc.w    $0142,$6400           ; SPR0CTL: stop=100
 ; But reposition at line 70 — before line 100 stop!
 dc.w    $4601,$FFFE           ; WAIT line 70
 dc.w    $0140,$4628           ; SPR0POS: start=70 ← CONFLICT!
 ```
 **Fixed:**
 ```asm
 ; Let first instance finish (stop line 100), then reposition
 dc.w    $6401,$FFFE           ; WAIT line 100 (after stop)
 dc.w    $0140,$6428           ; SPR0POS: start=100, HSTART byte=$28
 dc.w    $0142,$8C00           ; SPR0CTL: stop=140 (low byte holds control bits, not x)
 ```
 ### 3. The Misaligned Attached Pair
 Attached sprites require both even and odd sprites to have identical positions. Even a 1-pixel offset breaks the 15-color illusion.
 **Broken:**
 ```asm
 dc.w    $0140,$3200           ; SPR0POS: line 50, x=0
 dc.w    $0148,$3201           ; SPR1POS: line 50, HSTART byte +1 (2 px) ← OFFSET!
 ```
 **Fixed:**
 ```asm
 dc.w    $0140,$3200           ; SPR0POS: line 50, x=0
 dc.w    $0148,$3200           ; SPR1POS: line 50, x=0 ← SAME!
 ```
 ### 4. The Sprite DMA Starvation
 Disabling sprite DMA (`DMACON` bit 5, `SPREN`) but still writing to sprite position registers. Without DMA, no sprite data is fetched: the registers are set but nothing appears.
 **Broken:**
 ```c
 custom.dmacon = 0x8300;  /* SET + DMAEN + BPLEN, but forgot SPREN (bit 5) */
 /* Sprites won't display even though positions are set */
 ```
 **Fixed:**
 ```c
 custom.dmacon = 0x8300 | 0x0020;  /* Same, plus sprite DMA (SPREN) */
 /* Or simply: */
 custom.dmacon = 0x83E0;  /* Enable bitplane, Copper, Blitter, and sprite DMA */
 ```
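 Named bit masks make this mistake harder to repeat. A minimal host-side sketch in C; the constants mirror the names in the NDK header `hardware/dmabits.h` but are redefined here so the fragment compiles on its own, and the two functions are hypothetical helpers:
 ```c
 #include <stdint.h>
 #define DMAF_SETCLR  0x8000u  /* 1 = set the listed bits, 0 = clear them */
 #define DMAF_MASTER  0x0200u  /* DMAEN: master enable */
 #define DMAF_RASTER  0x0100u  /* BPLEN: bitplane DMA */
 #define DMAF_COPPER  0x0080u
 #define DMAF_BLITTER 0x0040u
 #define DMAF_SPRITE  0x0020u  /* SPREN: sprite DMA (bit 5) */
 /* Value to write to DMACON to switch the given channels on. */
 uint16_t dmacon_enable(uint16_t channels)
 {
     return (uint16_t)(DMAF_SETCLR | DMAF_MASTER | channels);
 }
 /* Given a DMACONR readback, report whether sprite DMA is running. */
 int sprites_will_display(uint16_t dmaconr)
 {
     uint16_t need = (uint16_t)(DMAF_MASTER | DMAF_SPRITE);
     return (dmaconr & need) == need;
 }
 ```
 On real hardware the result of `dmacon_enable(...)` would be written to `custom.dmacon`, and `custom.dmaconr` read back for the check.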
 ### 5. The AGA Width Trap
 AGA allows 32-pixel and 64-pixel wide sprites via `FMODE` settings. But the sprite data format changes — wider sprites need more words per line. Using OCS-format sprite data with AGA wide-sprite settings produces garbage.
 **Broken:**
 ```c
 /* Set 32-pixel sprite mode but provide 16-pixel data */
 custom.fmode = 0x0004;  /* SPR32 (bit 2): 32-pixel sprite fetch */
 /* Sprite data still only 2 words/line (16px) → garbage */
 ```
 **Fixed:**
 ```c
 if (aga_detected) {
    custom.fmode = 0x0004;  /* SPR32 (bit 2): 32-pixel sprite fetch */
    /* Provide 4 words per line (32px × 2 bitplanes) */
 } else {
    custom.fmode = 0x0000;  /* OCS: 16-bit fetch */
    /* Provide 2 words per line (16px × 2 bitplanes) */
 }
 ```
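 The relationship between the `FMODE` sprite bits and the data layout can be captured in two small functions. A host-side sketch in C, assuming the AGA layout in which bits 3-2 (`SPAGEM`, `SPR32`) select the sprite fetch width; both function names are illustrative:
 ```c
 #include <stdint.h>
 /* Sprite width in pixels for a given FMODE value:
    bits 3-2 = 00 -> 16, 01 or 10 -> 32, 11 -> 64. */
 unsigned sprite_width_pixels(uint16_t fmode)
 {
     switch ((fmode >> 2) & 3u) {
     case 0:  return 16;
     case 3:  return 64;
     default: return 32;
     }
 }
 /* 16-bit words of image data per sprite line (two bitplanes). */
 unsigned sprite_words_per_line(uint16_t fmode)
 {
     return sprite_width_pixels(fmode) / 16u * 2u;
 }
 ```
 A converter can call `sprite_words_per_line` to size its output instead of hard-coding two words per line.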
 ---
 ## Decision Guide
 ```mermaid
 flowchart TD
    START[Need sprite-based effect] --> Q1{Moving objects or<br/>image construction?}
    Q1 -->|Moving objects| Q2{How many simultaneous<br/>on one scanline?}
    Q1 -->|Image construction| SBI[Sprite-Built Image]
    Q2 -->|≤8| Q3{Need 15 colors?}
    Q2 -->|>8| Q4{Can tolerate<br/>vertical gaps?}
    Q3 -->|Yes, ≤4 objects| ATT[Attached Sprites<br/>4 pairs × 15 colors]
    Q3 -->|No, 3 colors fine| BASIC[Basic 8 sprites<br/>3 colors each]
    Q4 -->|Yes| MUX[Sprite Multiplexing<br/>Reuse channels vertically]
    Q4 -->|No| BLIT[Use Blitter BOBs instead<br/>No sprite limit]
    SBI --> Q5{Full screen or overlay?}
    Q5 -->|Overlay| OVL[Sprite overlay on playfield]
    Q5 -->|Full width| Q6{Need 15 colors?}
    Q6 -->|Yes| ASBI[Attached sprite-built image<br/>64px wide × 15 colors]
    Q6 -->|No| BSBI[Basic sprite-built image<br/>128px wide × 3 colors]
 ```
 ---
 ## Performance Analysis
 ### Sprite DMA Cost (Fixed, Always Paid)
 | Configuration | DMA Slots/Scanline | Notes |
 |--------------|--------------------|-------|
 | All 8 sprites enabled | 16 | Fixed cost regardless of use |
 | Sprites disabled | 0 | Can reclaim 16 slots for other DMA |
 | 4 sprites (0-3) | 8 | Common for 2 attached pairs |
 ### Multiplexing Overhead
 | Operation | Copper Words | DMA Slots | Notes |
 |-----------|-------------|-----------|-------|
 | Reposition (POS+CTL) | 4 | 4 | MOVE × 2; the preceding WAIT adds 2 more words |
 | With new data pointers | 8 | 8 | POS + CTL + `SPRxPTH`/`SPRxPTL` |
 | Full reposition + colors | 12 | 12 | Above + 2 color register writes |
 > [!TIP]
 > Each sprite reposition costs 4–12 DMA slots per scanline at the reposition line. Plan multiplexing points carefully — avoid repositioning all 8 sprites on the same scanline.
 ---
 ## Historical Timeline
 ```mermaid
 timeline
    title Sprite Techniques Evolution
    1985 : Amiga launch — 8 sprites, 3 colors each
         : Games use sprites for player, enemies, bullets
    1987 : First sprite multiplexing demos appear
    1988 : Kefrens "MegaDemo" — advanced sprite multiplexing
         : Scoopex uses sprites for copper-bar overlays
    1989 : Sprite-built logos in demos
         : First attached sprite demos (15-color objects)
    1990 : Complex sprite/Blitter synchronization
         : Sprite multiplexing in commercial games
    1991 : Full-screen sprite effects in demos
         : Sprite-based "virtual BOBs" for performance
    1992 : AGA sprite enhancements: 32/64px wide
         : Melon Dezign uses wide sprites for design
    1993 : Advanced attached sprite animations
    1994 : Sprite techniques combined with chunky pixels
    2000+ : Demoscene continues optimizing sprite techniques
         : MiSTer ensures cycle-accurate sprite DMA
 ```
 ---
 ## Modern Analogies
 | Amiga Sprite Concept | Modern Equivalent | Why It Maps |
 |---------------------|-------------------|-------------|
 | Sprite multiplexing | Instance rendering / draw-call batching | Same object data reused at different positions |
 | Sprite-built image | GPU sprite atlas / billboard rendering | Multiple small textures composited into scene |
 | Attached sprites | Alpha channel / RGBA4444 textures | More color depth by combining data from two sources |
 | SPRxPOS reposition | Transform matrix update | Changing draw position per instance |
 | Sprite priority (BPLCON2) | Z-sort / depth buffer | Controls which layer appears on top |
 | Sprite DMA | GPU texture fetch | Autonomous hardware fetches pixel data |
 | 16px width limit | Texture dimension constraints | Hardware-imposed maximum per unit |
 | Color register per sprite | Per-object palette / uniform | Color lookup specific to each sprite |
 ---
 ## Use Cases
 | Use Case | Technique | Notable Examples |
 |----------|-----------|-----------------|
 | Game player/enemy objects | Basic sprites + multiplex | Turrican, Shadow of the Beast |
 | Mouse pointer | SPR0 (reserved by OS) | Workbench |
 | Demo logo overlay | Sprite-built image | Melon Dezign, Sanity |
 | Large colorful objects | Attached sprites (15-color) | Kefrens, Phenomena demos |
 | Parallax background layer | Sprite overlay behind playfield | Lionheart, Leander |
 | Status bar icons | Fixed sprites | Many games |
 | Sprite-sprite collision | CLXCON/CLXDAT hardware | Turrican (sprite collision detection) |
 | Color-cycling objects | Per-line sprite color changes | Numerous demos |
 ---
 ## FPGA / Emulation Impact
 | Concern | Impact | Notes |
 |---------|--------|-------|
 | **Sprite DMA timing** | Must fetch exactly 2 words per sprite per scanline | Minimig/MiSTer implement precise DMA slot scheduler |
 | **Attached sprite decoding** | Denise must combine even+odd data correctly | 4-bitplane lookup from 2 sprite channels |
 | **SPRxPOS/CTL latency** | Position changes take effect next line | Must match real hardware delay |
 | **CLXCON collision detection** | Hardware collision between sprites/bitplanes | Required for games like Turrican |
 | **AGA FMODE sprite width** | 32/64px sprites change data format | FMODE.SPR_FMODE must be tracked |
 | **Sprite data staging** | Denise latches data at end of scanline for next line | Pipeline behavior must be emulated |
 ---
 ## FAQ
 **Q: How many sprites can I display on screen at once?**
 A: 8 per scanline (hardware limit). With multiplexing, you can reuse each channel multiple times vertically — a common demo technique shows 30+ "sprites" using 8 channels multiplexed 4 times each.
 **Q: Can sprites be wider than 16 pixels?**
 A: On OCS/ECS, no — 16 pixels is the fixed hardware width. On AGA, `FMODE` bits can set 16, 32, or 64 pixel widths. For OCS, wider objects require Blitter BOBs or multiple sprites side by side.
 **Q: Why does SPR0 sometimes conflict with the mouse pointer?**
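 A: Intuition uses SPR0 for the mouse pointer, and SPR1 shares its color registers (`COLOR17`–`COLOR19`), so any pointer palette change recolors it too. If you take over the hardware (demos), you can use all 8 sprites. If running under the OS, allocate sprites with `GetSprite()` or the `ExtSprite` API (V39+), which cooperate with Intuition, rather than writing the registers directly.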
 **Q: What happens when sprites overlap?**
 A: Lower-numbered sprites have priority — SPR0 appears on top of SPR1, which appears on top of SPR2, etc. The sprite/playfield priority is controlled by `BPLCON2`.
 **Q: Can I use sprites and Blitter BOBs together?**
 A: Yes. Sprites are DMA-driven and cost zero CPU time. BOBs are software-driven (Blitter copies) and cost Blitter DMA time. Many games use sprites for small, frequently-moving objects and BOBs for larger or more colorful ones.
 **Q: What is CLXCON and how does collision detection work?**
 A: `CLXCON` (`$DFF098`) configures which sprite and bitplane bits are included in collision detection. `CLXDAT` (`$DFF00E`) reports detected collisions. The hardware compares sprite and bitplane data in real-time and sets bits when matching pixels overlap. See [Sprites](../08_graphics/sprites.md) for full details.
 ---
 ## References
 ### Related Knowledge Base Articles
 - [Sprites](../08_graphics/sprites.md) — Full hardware sprite architecture, DMA timing, collision detection
 - [AGA Sprites](../01_hardware/aga_a1200_a4000/aga_sprites.md) — AGA sprite enhancements: 32/64px, FMODE, color banks
 - [Copper Effects](copper_effects.md) — Copper-driven sprite repositioning
 - [Pixel Tricks](pixel_tricks.md) — Chunky pixel techniques using sprites
 - [Blitter Programming](../08_graphics/blitter_programming.md) — BOB rendering (sprite alternative)
 - [Animation](../08_graphics/animation.md) — GEL system: VSprites (software sprites)
 - [DMA Architecture](../01_hardware/common/dma_architecture.md) — DMA slot allocation
 ### External Resources
 - **Amiga Hardware Reference Manual** — Chapter 5: Sprites
 - **Amiga Graphics Archive** — https://amiga.lychesis.net — Sprite multiplexing and copper-driven sprite repositioning in commercial games
 - **Scoopex Amiga Hardware Programming** (Photon) — [YouTube playlist](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Sprite programming episodes covering hardware setup, multiplexing, and attached sprites
 - **Pouet.net** — https://www.pouet.net — Sprite-based demo releases
 - **Demozoo** — https://demozoo.org — Demoscene encyclopedia
 - **AMIGA Machine Code Tutorial** — Lexington — Sprite programming from scratch
--- a/17_demoscene/timing_optimization.md
+++ b/17_demoscene/timing_optimization.md
@ -0,0 +1,740 @@
 [← Home](../README.md) · [Demoscene Techniques](README.md)
 # Timing Optimization — Cycle Counting, Blitter-CPU Interleaving, and Self-Modifying Code
 ## Overview
 On a stock Amiga 500, the 68000 runs at 7.09 MHz and must share memory bandwidth with the Copper, Blitter, bitplane DMA, sprite DMA, and audio DMA, all running simultaneously. A PAL frame lasts roughly **142,000 CPU cycles** (20 ms, or about 71,000 DMA slots of two CPU cycles each). After DMA takes its share, a deep display mode combined with a busy Blitter can leave the CPU with well under half of them. Every instruction, every memory access, every bus arbitration event is a battle for scarce bandwidth.
 Demoscene coding is the art of extracting maximum performance from this constrained environment. This article covers the timing optimization techniques that demoscene coders developed: cycle-accurate instruction scheduling, Blitter-CPU interleaving to recover stolen cycles, copper-wait placement to minimize stalls, memory access pattern optimization, and self-modifying code for runtime specialization.
 ```mermaid
 graph TB
    subgraph "Analysis"
        CC["Cycle Counting<br/>Know every instruction cost"]
        PROF["Profiling<br/>Measure actual DMA budget"]
    end
    subgraph "Scheduling"
        BCI["Blitter-CPU Interleave<br/>Overlap compute with DMA"]
        CW["Copper Wait Placement<br/>Minimize bus contention"]
    end
    subgraph "Memory"
        SEQ["Sequential Access<br/>Exploit 68000 prefetch"]
        CHIP["Chip vs Fast<br/>Put code in Fast RAM"]
    end
    subgraph "Advanced"
        SMC["Self-Modifying Code<br/>Runtime specialization"]
        FLI["Line-F Trap<br/>Transparent FPU emulation"]
        ROL["Register Allocation<br/>Minimize memory traffic"]
    end
    CC --> BCI
    CC --> SEQ
    PROF --> CW
    BCI --> SMC
    SEQ --> ROL
    CHIP --> BCI
 ```
 ---
 ## Foundation: The DMA Budget
 ### Per-Frame Cycle Budget (PAL)
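 | Resource | CPU Cycles per Frame | Percentage | Notes |
 |----------|---------------------|------------|-------|
 | **Total frame** | ~141,648 | 100% | 312 scanlines × 227 DMA slots/line × 2 CPU cycles per slot |
 | Bitplane DMA | 10,240–81,920 | 7–58% | 20 slots per plane per line in LoRes (320 px), 40 in HiRes, over 256 display lines; low end is 1 LoRes plane, high end is 4 HiRes planes |
 | Sprite DMA | up to 9,984 | up to 7% | 16 slots per line (2 words × 8 sprites) when sprite DMA is enabled |
 | Audio DMA | up to 2,496 | up to 1.8% | 4 slots per line (1 word × 4 channels) |
 | Copper DMA | Variable | Variable | 2 slots per instruction; depends on copper list length |
 | Refresh DMA | 2,496 | 1.8% | 4 slots per line, DRAM refresh (fixed) |
 | **Available for CPU + Blitter** | Remainder | ~30–90% | Shared between CPU and Blitter; the 68000 can use at most every other slot |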
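Multiplying out the formula in the Total frame row (rounded to 227 slots per line) gives the frame budget directly. A minimal C sketch of that arithmetic, assuming a 256-line PAL display and 320- or 640-pixel-wide playfields; the names are illustrative and not part of any Amiga API:

```c
#include <stdint.h>

enum { PAL_LINES = 312, SLOTS_PER_LINE = 227, CPU_CYCLES_PER_SLOT = 2 };

/* CPU cycles in one PAL frame: scanlines x DMA slots x cycles per slot. */
static uint32_t pal_cycles_per_frame(void)
{
    return (uint32_t)PAL_LINES * SLOTS_PER_LINE * CPU_CYCLES_PER_SLOT;
}

/* CPU cycles per frame taken by bitplane DMA.
 * Each plane fetches 20 words per line in LoRes and 40 in HiRes
 * (assumption: standard 320/640-pixel-wide display, no overscan). */
static uint32_t bitplane_cycles_per_frame(unsigned planes, int hires,
                                          unsigned display_lines)
{
    unsigned words_per_plane = hires ? 40u : 20u;
    return (uint32_t)planes * words_per_plane * display_lines
           * CPU_CYCLES_PER_SLOT;
}
```

For example, `bitplane_cycles_per_frame(4, 0, 256)` puts a 16-color LoRes screen at 40,960 cycles of bus time per frame, a little under 30% of the total.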
 ### The Key Constraint: Bus Arbitration
 The 68000 and DMA controllers share the same memory bus. When DMA is active, the CPU **stalls** — it cannot fetch instructions or data. The `BLTPRI` bit (Blitter Nasty mode) gives the Blitter total bus priority, starving the CPU almost completely:
 | Mode | Blitter Priority | CPU Gets | Use When |
 |------|-----------------|----------|----------|
 | Normal (`BLTPRI=0`) | Yields one slot after the CPU has been locked out for three | Roughly 1 in 4 of the slots the Blitter would otherwise take | Normal operation |
 | Blitter Nasty (`BLTPRI=1`) | All cycles | ~0% (only slots the Blitter leaves idle) | Critical Blitter operations |
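`BLTPRI` is bit 10 of `DMACON`. Like every `DMACON` bit it is changed by writing a mask in which bit 15 (SET/CLR) selects whether the masked bits are set or cleared. A small sketch of building those masks; the macro and helper names are local to this example:

```c
#include <stdint.h>

#define DMA_SETCLR 0x8000u  /* bit 15: 1 = set the masked bits, 0 = clear them */
#define DMA_BLTPRI 0x0400u  /* bit 10: Blitter Nasty */

/* Value to write to DMACON to SET the given bits. */
static uint16_t dmacon_set_mask(uint16_t bits)
{
    return (uint16_t)(DMA_SETCLR | bits);
}

/* Value to write to DMACON to CLEAR the given bits. */
static uint16_t dmacon_clear_mask(uint16_t bits)
{
    return (uint16_t)(bits & ~DMA_SETCLR);
}
```

Writing `dmacon_set_mask(DMA_BLTPRI)` (`$8400`) enables Blitter Nasty; writing `dmacon_clear_mask(DMA_BLTPRI)` (`$0400`) turns it off again without disturbing the other DMA enables.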
 ---
 ## Technique 1: Cycle Counting
 Every 68000 instruction has a known cycle cost. Demoscene coders count cycles the way a financial analyst counts dollars — every single one matters.
 ### 68000 Instruction Cycle Costs (Most Common)
 | Instruction | Cycles | Notes |
 |-------------|--------|-------|
 | `MOVE.W Dn,Dn` | 4 | Register-to-register: fastest |
 | `MOVE.W (An),Dn` | 8 | Memory read: 2 bus accesses (opcode fetch + data read) |
 | `MOVE.W Dn,(An)` | 8 | Memory write |
 | `MOVE.L (An)+,(An)+` | 20 | Post-increment copy: 5 bus accesses (1 opcode fetch, 2 word reads, 2 word writes) |
 | `ADD.W Dn,Dn` | 4 | Register add |
 | `ADD.W #imm,Dn` | 8 | Immediate add (2 words to fetch) |
 | `MULS.W Dn,Dn` | 38–70 | Signed 16×16→32 multiply; data-dependent, 70 worst case |
 | `MULU.W Dn,Dn` | 38–70 | Unsigned multiply; 38 + 2 per set bit in the source operand |
 | `DIVS.W Dn,Dn` | up to 158 | Signed divide; `DIVU.W` takes up to 140 |
 | `MULS.L Dn,Dn` | ~28-44 | 68020+ only (not available on the 68000) |
 | `DBRA Dn,label` | 10 (taken) / 14 (exit) | Loop branch |
 | `BRA label` | 10 | Unconditional branch |
 | `BCC label` | 10 (taken) / 8 (not taken, `.S`) / 12 (not taken, `.W`) | Conditional branch |
 | `JSR (An)` | 16 | Subroutine call |
 | `RTS` | 16 | Return |
 | `LEA (An),An` | 4 | Address computation (no memory access) |
 | `SWAP Dn` | 4 | Swap 16-bit halves |
 | `LSL.W #n,Dn` | 6+2n | Shift left: 6 + 2 per bit position |
 | `ROL.W #n,Dn` | 6+2n | Rotate left |
 | `MOVE.W (An)+,Dn` | 8 | Read with auto-increment |
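Cycle counts add up mechanically, which makes loop budgets easy to estimate before writing any assembly. A sketch, assuming the standard 68000 `DBRA` timings of 10 cycles when the branch is taken and 14 on the final fall-through, and ignoring DMA stalls; `dbra_loop_cycles` is an illustrative helper:

```c
#include <stdint.h>

/* Total cycles for a loop whose body runs `iterations` times and is
 * closed by DBRA: every pass but the last pays the taken-branch cost,
 * the last pass pays the fall-through cost. */
static uint32_t dbra_loop_cycles(uint32_t body_cycles, uint32_t iterations)
{
    if (iterations == 0)
        return 0;
    return iterations * body_cycles + (iterations - 1u) * 10u + 14u;
}
```

A 320-iteration loop with a single `MOVE.W (An)+,Dn` (8 cycles) as its body comes to 5,764 cycles, of which more than half is loop overhead. That ratio is the usual argument for unrolling.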
 ### DMA Stall Impact
 When DMA is active, instruction cycles increase due to bus contention:
 ```c
 /*
 * Effective cycle cost = base_cycles + dma_stalls
 *
 * dma_stalls depends on:
 *   1. Number of DMA channels active on current scanline
 *   2. Whether the access is to Chip RAM or Fast RAM
 *   3. BLTPRI (Blitter Nasty) mode
 *
 * Rule of thumb on stock A500:
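 *   - Up to 4 bitplanes LoRes: bitplane DMA fits into bus slots the
 *     68000 cannot use anyway, so the CPU loses almost nothing
 *   - 5-6 bitplanes LoRes or 3-4 bitplanes HiRes: bitplane DMA also
 *     takes CPU slots (all of them during fetch with 4 HiRes planes)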
 *   - During Blitter operation (normal): CPU gets ~25% of cycles
 *   - During Blitter Nasty: CPU gets ~0-5% of cycles
 */
 ```
 ---
 ## Technique 2: Blitter-CPU Interleaving
 The most important optimization on the Amiga. While the Blitter is running (copying, filling, drawing lines), the CPU loses most of its Chip RAM bus slots and stalls on every access. **Interleaving** means finding useful work for the CPU to do during those periods: computation that needs few bus accesses, such as register-only arithmetic. Long instructions like `MULS` are ideal because they spend most of their cycles inside the CPU after a single opcode fetch.
 ### The Interleaving Principle
 ```mermaid
 gantt
    title Blitter-CPU Interleaving (Single Frame)
    dateFormat X
    axisFormat %s
    section CPU
    Compute 3D vertices    :crit, 0, 4
    (stalled by Blitter)   : 4, 7
    Compute more vertices  :crit, 7, 9
    (stalled by Blitter)   : 9, 12
    Prepare next blit      :crit, 12, 14
    section Blitter
    Fill polygon A         :active, 0, 7
    Fill polygon B         :active, 7, 12
    Fill polygon C         :active, 12, 16
 ```
 ### Implementation Pattern
 ```c
 /*
 * interleave.c — Blitter-CPU interleaving pattern
 *
 * The key: start a Blitter operation, then do CPU computation
 * that only uses registers (no memory access) while Blitter runs.
 */
 void render_frame(void) {
    /* Phase 1: Start Blitter fill for first polygon */
    start_blitter_fill(&polygons[0]);
    /* Phase 2: CPU computes next polygon's vertex positions
       while Blitter fills the first one.
       IMPORTANT: Prefer register-to-register operations here!
       Every Chip RAM access has to wait for a free bus slot. */
    {
        register FIXED rx asm("d0");
        register FIXED ry asm("d1");
        register FIXED rz asm("d2");
        /* Transform vertices for polygon 2 — register-only math */
        rx = fixed_mul(m00, vx1) + fixed_mul(m01, vy1) + fixed_mul(m02, vz1);
        ry = fixed_mul(m10, vx1) + fixed_mul(m11, vy1) + fixed_mul(m12, vz1);
        rz = fixed_mul(m20, vx1) + fixed_mul(m21, vy1) + fixed_mul(m22, vz1);
        /* Store results (will stall if Blitter still running) */
        screen_x1 = project_x(rx, rz);
        screen_y1 = project_y(ry, rz);
    }
    /* Phase 3: Wait for Blitter to finish, then start next blit */
    wait_blitter();
    start_blitter_fill(&polygons[1]);
    /* Phase 4: More CPU computation for polygon 3... */
    /* ... repeat ... */
 }
 ```
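The pattern above relies on `wait_blitter()`, which the listing leaves undefined. One possible sketch, assuming the caller passes a pointer to `DMACONR` (`$DFF002` on real hardware) so the same code can be exercised on a host against a fake register; the function and macro names are illustrative:

```c
#include <stdint.h>

#define DMACONR_BBUSY 0x4000u   /* bit 14 of DMACONR: Blitter busy */

/* Non-zero while the Blitter is still working. */
static int blitter_busy(const volatile uint16_t *dmaconr)
{
    return (*dmaconr & DMACONR_BBUSY) != 0;
}

/* Spin until the Blitter has finished. Call this as late as possible,
 * after all register-only work has been done. */
static void wait_blitter_at(const volatile uint16_t *dmaconr)
{
    while (blitter_busy(dmaconr)) {
        /* each poll is itself a bus access, so keep the loop empty */
    }
}
```

Taking the register address as a parameter is a testing convenience only; production code would normally hard-wire the address or use the OS `WaitBlit()` call.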
 ### Assembly Interleaving
 In 68000 assembly, the interleaving is explicit:
 ```asm
 ; interleave.asm — Start Blitter, do CPU work, wait for Blitter
        ; ---- Start Blitter fill for polygon A ----
        lea     $DFF000,a6
        move.w  #$09F0,BLTCON0(a6)     ; USEA+USED, minterm D=A
        move.w  #$000A,BLTCON1(a6)     ; Inclusive fill + descending mode
        move.l  poly_a_data,BLTAPTH(a6)
        move.l  poly_a_data,BLTDPTH(a6)
        move.w  #(HEIGHT<<6)|WIDTH_BLT,BLTSIZE(a6)  ; Start!
        ; ---- CPU work: compute polygon B vertices ----
        ; These are register-only operations, no data memory access
        ; (the data was pre-loaded into registers)
        move.l  d0,d4           ; 4 cycles
        muls.w  d1,d4           ; up to 70 cycles
        add.l   d4,d2           ; 8 cycles  (<= 82 total)
        swap    d2              ; 4 cycles  (<= 86 total)
        move.w  d2,d5           ; 4 cycles  (<= 90 total)
        muls.w  d3,d5           ; up to 70 cycles (<= 160 total)
        ; ... more register math ...
        ; ---- Now we need memory — check if Blitter is done ----
 .blit_wait:
        btst    #6,DMACONR(a6)  ; Read DMA status (1 memory access)
        bne.s   .blit_wait      ; Loop if Blitter busy
        ; ---- Start Blitter fill for polygon B ----
        move.w  #$09F0,BLTCON0(a6)     ; USEA+USED, minterm D=A
        move.w  #$000A,BLTCON1(a6)     ; Inclusive fill + descending mode
        move.l  poly_b_data,BLTAPTH(a6)
        move.l  poly_b_data,BLTDPTH(a6)
        move.w  #(HEIGHT2<<6)|WIDTH_BLT,BLTSIZE(a6)  ; Start!
        ; ---- CPU work: compute polygon C vertices ----
        ; ... register-only math again ...
 ```
 ---
 ## Technique 3: Copper-Wait Placement
 The Copper competes with the CPU for bus cycles. Poorly-placed copper lists steal cycles from the CPU during critical computation windows. The optimization: **move copper activity to scanlines where the CPU is idle** (during vertical blank or display border areas).
 ### Optimal Copper-CPU Scheduling
 ```mermaid
 graph LR
    subgraph "Poor Schedule"
        P1["CPU computing during display<br/>Copper also active → contention"]
    end
    subgraph "Good Schedule"
        G1["CPU computes during VBlank<br/>Copper active during display"]
        G2["CPU idle during display<br/>Copper runs free"]
    end
    P1 -->|"Restructure"| G1
 ```
 ### Practical Scheduling
 ```c
 /*
 * schedule.c — Optimal copper-CPU scheduling
 *
 * Principle: Move CPU-heavy computation to VBlank period
 * when the Copper is idle (already executed its list).
 * Let the Copper do its work during the display period
 * when the CPU has less to do.
 */
 void main_loop(void) {
    while (1) {
        /* Wait for VBlank (vertical blanking interval)
           During VBlank: no display DMA, minimal Copper activity */
        WaitTOF();  /* Wait for Top of Frame */
        /* ---- VBlank period: CPU-heavy computation ---- */
        /* This runs during lines 0-43 (vertical blank + top border)
           and lines 300-311 (bottom border) of a standard PAL display.
           Minimal DMA contention here! */
        update_3d_vertices();
        update_physics();
        update_audio_buffers();
        build_copper_list();
        /* ---- Display period: let Copper run ---- */
        /* During lines 44-299, the Copper is writing color
           registers and the CPU should do minimal work.
           Only Blitter operations (which have their own DMA)
           or register-only computation should happen here. */
        render_blitter_objects();
    }
 }
 ```
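To decide whether the beam is still in the border or already inside the display window, code can read the vertical position from `VPOSR` and `VHPOSR`. A sketch of the decoding, assuming the OCS register layout (bit 0 of `VPOSR` is V8, the high byte of `VHPOSR` is V7-V0) and a standard PAL display window covering lines 44-299; the helper names and line constants are assumptions for illustration:

```c
#include <stdint.h>

/* Combine the two raster position registers into a scanline number. */
static unsigned beam_line(uint16_t vposr, uint16_t vhposr)
{
    return ((unsigned)(vposr & 1u) << 8) | (unsigned)(vhposr >> 8);
}

/* True while bitplane DMA is active for a standard PAL window
 * (first displayed line 44, first border line after it 300). */
static int in_display_area(unsigned line)
{
    return line >= 44u && line < 300u;
}
```

A main loop can use `in_display_area()` to pick between bus-heavy work (border and vertical blank) and register-heavy work (display period).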
 ---
 ## Technique 4: Memory Access Optimization
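 The 68000 has no data cache, so a data access costs the same wherever it lands; the 2-word prefetch queue only covers instruction fetches. What differs is the cost of the addressing mode: `(An)` and `(An)+` add 4 cycles to a word access, `d16(An)` adds 8, and indexed `d8(An,Xn)` adds 10. Walking data sequentially with post-increment is therefore cheaper than indexed lookups, and it avoids the extra instructions needed to compute each index.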
 ### Memory Access Rules
 | Access Pattern | Effective Speed | Notes |
 |---------------|----------------|-------|
 | Sequential read (post-increment) | Fast | `(An)+` costs no more than plain `(An)` |
 | Sequential write (post-increment) | Fast | Blitter-style linear access |
 | Random read (indexed) | Slow | `d8(An,Xn)` costs 6 cycles more than `(An)+`, plus index computation |
 | Register-only operations | Fastest | No memory access at all |
 | Chip RAM access | Variable | DMA contention adds wait states |
 | Fast RAM access | Consistent | No DMA contention |
 ### Optimization Techniques
 ```asm
 ; memory_opt.asm — Optimize memory access patterns
        ; ---- BAD: Indexed access (d0-d2 hold byte offsets) ----
        ; The 68000 has no scaled index mode (*2 is 68020+ only)
        move.w  0(a0,d0.w),d1         ; Indexed: 14 cycles
        move.w  2(a0,d1.w),d2         ; Indexed: 14 cycles
        move.w  4(a0,d2.w),d3         ; Indexed: 14 cycles
        ; ---- GOOD: Sequential access with post-increment ----
        move.w  (a0)+,d1              ; Sequential: 8 cycles
        move.w  (a0)+,d2              ; Sequential: 8 cycles
        move.w  (a0)+,d3              ; Sequential: 8 cycles
        ; ---- GOOD: Process in register-sized blocks ----
        ; Unroll loops for sequential access
        movem.l (a0)+,d0-d7           ; Burst read: 8 registers
        ; ... process d0-d7 ...
        movem.l d0-d7,(a1)            ; Burst write: 8 registers
        lea     32(a1),a1             ; MOVEM to memory has no (An)+ mode
 ```
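The benefit of `MOVEM` can be estimated from the 68000 timing formulas: `MOVEM.L (An)+,<list>` costs 12 + 8n cycles for n registers, while n separate `MOVE.L (An)+,Dn` instructions cost 12 cycles each. A sketch of the comparison, ignoring DMA stalls; the helper names are illustrative:

```c
/* Cycles to load n longwords with one MOVEM.L (An)+,<register list>. */
static unsigned movem_l_load_cycles(unsigned n)
{
    return 12u + 8u * n;
}

/* Cycles to load n longwords with n separate MOVE.L (An)+,Dn. */
static unsigned move_l_load_cycles(unsigned n)
{
    return 12u * n;
}
```

For eight registers that is 76 against 96 cycles. For a single register `MOVEM` loses (20 against 12), so it only pays off from four registers upward.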
 ### Chip RAM vs Fast RAM Strategy
 | Data Type | Best Location | Why |
 |-----------|-------------|-----|
 | Copper lists | Chip RAM | Copper DMA can only read Chip RAM |
 | Bitplane data | Chip RAM | Display DMA can only read Chip RAM |
 | Sprite data | Chip RAM | Sprite DMA can only read Chip RAM |
 | Audio sample data | Chip RAM | Audio DMA can only read Chip RAM |
 | **Code (instructions)** | **Fast RAM** | No DMA contention, consistent speed |
 | **Vertex data** | **Fast RAM** | No DMA stalls during computation |
 | **Sine tables** | **Fast RAM** | No DMA stalls during lookup |
 | **Stack** | **Fast RAM** | No DMA stalls during JSR/RTS/PEA |
 | Lookup tables | Fast RAM | No DMA stalls during indexed access |
 > [!TIP]
 > On a stock A500 with only 512 KB Chip RAM, there is no Fast RAM: all code runs in Chip RAM and contends with DMA. The A501 trapdoor expansion adds 512 KB of "Slow RAM" (a.k.a. "Ranger" memory, at `$C00000`). The custom chips cannot DMA from it, but it sits on the same bus as Chip RAM, so the CPU suffers the same DMA contention there. Only true Fast RAM (for example on an accelerator or side expansion) escapes it.
 ---
 ## Technique 5: Self-Modifying Code
 Self-modifying code (SMC) changes its own instructions at runtime. On the Amiga, this is used for:
 1. **Loop unrolling with constants** — Patch immediate values in unrolled loops
 2. **Branch optimization** — Replace computed branches with direct branches
 3. **Copper list generation** — Write copper instructions directly into the instruction stream
 4. **Function specialization** — Remove condition checks for known states
 ### SMC for Copper List Patching
 The most common demoscene pattern of this kind: a copper list is embedded in the code segment (which must then be loaded into Chip RAM so the Copper can fetch it), and the CPU patches the color values each frame:
 ```asm
 ; smc_copper.asm — Self-modifying copper list
        ; The copper list lives in the code segment
        ; Color values are patched by the CPU each frame
 copper_list:
        dc.w    $8031,$FFFE           ; WAIT (first word must be odd, or it is a MOVE)
        dc.w    $0180,$DEAD           ; ← CPU patches $DEAD each frame
        dc.w    $8051,$FFFE           ; WAIT, later on the same line
        dc.w    $0180,$BEEF           ; ← CPU patches $BEEF each frame
        dc.w    $FFFF,$FFFE
 update_copper:
        ; Calculate new colors for this frame
        move.w  #some_color,d0
        ; Patch copper list directly (self-modifying!)
        move.w  d0,copper_list+3*2    ; Overwrite $DEAD
        move.w  #other_color,d0
        move.w  d0,copper_list+7*2    ; Overwrite $BEEF
        rts
 ```
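The same patching can be expressed in C by treating the copper list as an array of 16-bit words. A sketch, assuming the same five-instruction layout (WAIT, MOVE to `COLOR00`, WAIT, MOVE to `COLOR00`, end of list) with valid odd WAIT words; on real hardware the array would have to be placed in Chip RAM, and `set_bar_colors` is a hypothetical helper:

```c
#include <stdint.h>

static uint16_t copper_list[] = {
    0x8031, 0xFFFE,   /* WAIT for line $80 */
    0x0180, 0x0000,   /* MOVE #color,COLOR00 (value patched below) */
    0x8051, 0xFFFE,   /* WAIT, further along the same line */
    0x0180, 0x0000,   /* MOVE #color,COLOR00 (value patched below) */
    0xFFFF, 0xFFFE    /* end of list */
};

/* Word indices of the two patchable color values. */
enum { COLOR_A_INDEX = 3, COLOR_B_INDEX = 7 };

/* Overwrite both color words; OCS colors are 12-bit ($0RGB). */
static void set_bar_colors(uint16_t a, uint16_t b)
{
    copper_list[COLOR_A_INDEX] = (uint16_t)(a & 0x0FFFu);
    copper_list[COLOR_B_INDEX] = (uint16_t)(b & 0x0FFFu);
}
```

Naming the indices keeps the patch offsets next to the list definition, so inserting an instruction into the list does not silently shift what gets patched.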
 ### SMC for Loop Specialization
 ```asm
 ; smc_loop.asm — Self-modifying loop with patched constant
        ; The loop inner constant starts as a placeholder
 inner_count:
        dc.w    320-1                  ; ← Patched at runtime (DBRA count = pixels-1)
 render_line:
        move.w  inner_count(pc),d0     ; Load (possibly patched) count
 .next_pixel:
        ; ... pixel processing ...
        dbra    d0,.next_pixel
        rts
 ; Somewhere during setup:
 specialize_loop:
        ; If we know we only need 160 pixels, patch the count
        lea     inner_count(pc),a0     ; PC-relative is not a valid destination mode
        move.w  #160-1,(a0)
        rts
 ```
 > [!WARNING]
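 > Self-modifying code requires the modified instructions to be in **writable memory** (RAM, not ROM). The 68000 has no instruction cache, so a patch takes effect as soon as it is written, provided the patched word is not already in the 2-word prefetch queue (do not patch the instruction immediately following the write). On the 68020+ with instruction cache, you must flush the cache after modification. The portable way is exec's `CacheClearU()` (V37+). At the hardware level this means setting the CI bit (bit 3) of `CACR` on 68020/68030, or executing `CPUSHA BC` on 68040/68060; both require supervisor mode.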
 ---
 ## Technique 6: Fast Division via Reciprocal Table
 Division is the most expensive 68000 operation (up to 158 cycles for `DIVS`, 140 for `DIVU`). For 3D rendering where division by Z is needed for every vertex, demoscene coders pre-compute reciprocal tables:
 ```c
 /* reciprocal.c — Pre-computed 1/z table for fast division */
 #define RECIP_TABLE_SIZE 1024
 #define RECIP_SHIFT      16     /* 16.16 fixed-point */
 static FIXED recip_table[RECIP_TABLE_SIZE];
 void build_reciprocal_table(void) {
    int i;
    for (i = 1; i < RECIP_TABLE_SIZE; i++) {
        /* 1.0 / i in 16.16 fixed-point */
        recip_table[i] = ((FIXED)1 << RECIP_SHIFT) / i;
    }
    recip_table[0] = 0x7FFFFFFF;  /* "Infinity" */
 }
 /* Fast divide: x / z ≈ x × recip_table[z] */
 static inline FIXED fast_div(FIXED x, int z) {
    if (z <= 0 || z >= RECIP_TABLE_SIZE) return 0;
    return fixed_mul(x, recip_table[z]);
 }
 ```
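The listing above depends on a `FIXED` type and a `fixed_mul()` helper defined elsewhere. A self-contained sketch, assuming 16.16 fixed point and a 64-bit intermediate product (on a real 68000 the multiply would be assembled from 16-bit `MULU`/`MULS` steps); all names here are local to the example:

```c
#include <stdint.h>

typedef int32_t fixed_t;            /* 16.16 fixed point */
#define FX_SHIFT 16
#define RECIP_N  1024

static fixed_t recip[RECIP_N];

/* 16.16 multiply. Assumes non-negative operands in this sketch, because
 * right-shifting a negative value is implementation-defined in C. */
static fixed_t fx_mul(fixed_t a, fixed_t b)
{
    return (fixed_t)(((int64_t)a * b) >> FX_SHIFT);
}

/* Fill the table with 1/i in 16.16; entry 0 is never read. */
static void recip_init(void)
{
    recip[0] = 0;
    for (int i = 1; i < RECIP_N; i++)
        recip[i] = (fixed_t)(((int64_t)1 << FX_SHIFT) / i);
}

/* x / z approximated as x * (1/z); returns 0 for out-of-range z. */
static fixed_t fx_div_tab(fixed_t x, int z)
{
    if (z <= 0 || z >= RECIP_N)
        return 0;
    return fx_mul(x, recip[z]);
}
```

Because each table entry is truncated, results can be slightly low: 3.0 / 3 comes out as `0xFFFF` (just under 1.0) instead of `0x10000`. For screen-space projection that error is usually below one pixel, but it is worth knowing when comparing against an exact divide.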
 ---
 ## Technique 7: Line-F Trap (FPU Transparency)
 On 68040/060 systems, floating-point instructions that the FPU doesn't implement in hardware trigger a **Line-F exception** (trap vector $2C). The OS provides emulation routines, but demoscene coders can install custom traps that are faster than the OS defaults:
 ```asm
 ; linef_trap.asm — Custom Line-F trap handler for 68040/060
        ; Install custom Line-F handler
        move.l  $2C.w,old_linef_handler   ; Save old handler
        lea     my_linef_handler,a0
        move.l  a0,$2C.w                   ; Install new handler
        ; The handler decodes the trapped FPU instruction
        ; and executes an optimized software equivalent
 my_linef_handler:
        ; NOTE: a real handler must save and restore d0/a0
        move.l  2(sp),a0           ; Get faulting PC (SR sits at 0(sp))
        move.w  (a0),d0            ; Read the FPU opcode
        and.w   #$FE00,d0          ; Mask to Line-F family
        cmp.w   #$F200,d0          ; Is it an FPU instruction?
        beq.s   .handle_fpu
        ; ... chain to old handler if not ...
 .handle_fpu:
        ; Decode specific FPU instruction and emulate
        ; (e.g., FSIN → table lookup + interpolation)
        ; ... specific emulation code ...
        addq.l  #4,2(sp)           ; Skip opcode + command word (assumes the stacked PC
                                   ; points at the trapped instruction, no EA extension words)
        rte                        ; Return from exception
 ```
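The opcode test in the handler can be checked off-target. A sketch of the same mask-and-compare in C, assuming the standard F-line coprocessor encoding (bits 15-12 are `1111`, bits 11-9 hold the coprocessor ID, and ID 1 is the FPU); `is_fpu_opcode` is an illustrative name:

```c
#include <stdint.h>

/* True if the first opcode word addresses coprocessor ID 1 (the FPU).
 * Mirrors the AND #$FE00 / CMP #$F200 sequence in the handler. */
static int is_fpu_opcode(uint16_t opcode)
{
    return (opcode & 0xFE00u) == 0xF200u;
}
```

The mask matters because not every F-line opcode belongs to the FPU: the 68040 cache instructions in the `$F4xx` range also raise Line-F on older CPUs and must be chained to the previous handler.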
 ---
 ## Antipatterns
 ### 1. The Blitter Busy Loop
 Polling the Blitter's busy flag in a tight loop while the CPU could be doing useful work.
 **Broken:**
 ```c
 /* CPU does nothing while waiting for Blitter */
 start_blitter_fill(&poly);
 while (blitter_busy()) {
    /* Tight loop — wastes every cycle */
 }
 start_blitter_fill(&next_poly);
 ```
 **Fixed:**
 ```c
 start_blitter_fill(&poly);
 /* Do useful register-only work while Blitter runs */
 compute_next_vertices();  /* Register math only */
 prepare_next_blit_params();
 /* Now check if Blitter is done */
 while (blitter_busy()) {}  /* Minimal wait */
 start_blitter_fill(&next_poly);
 ```
 ### 2. The Chip RAM Code Trap
 Running performance-critical code from Chip RAM on a system with Fast RAM available. Chip RAM access is slowed by DMA contention.
 **Broken:**
 ```c
 /* Code runs in Chip RAM by default */
 void hot_function(void) {
    /* Every instruction fetch contends with DMA */
    for (i = 0; i < 1000; i++) { ... }
 }
 ```
 **Fixed:**
 ```c
 /* Copy hot function to Fast RAM at startup */
 extern UBYTE fast_ram_code[];
 extern const UBYTE hot_function_src[];
 extern const UBYTE hot_function_end[];
 void init(void) {
    ULONG size = hot_function_end - hot_function_src;
    memcpy(fast_ram_code, hot_function_src, size);
    /* Call fast_ram_code() instead of hot_function() */
 }
 ```
 ### 3. The Naive Division
 Using `DIVS.W` (or `DIVS.L` on 68020+) in inner loops. On the 68000, `DIVS.W` takes up to 158 cycles and `DIVU.W` up to 140, making division the single most expensive instruction.
 **Broken:**
 ```asm
 ; Division in inner loop: up to 158 cycles each!
 .loop:
        divs.w  d1,d0           ; d0 = d0 / d1 (SLOW!)
        ; ...
        dbra    d7,.loop
 ```
 **Fixed:**
 ```asm
 ; Pre-compute reciprocal, use multiply instead
        add.w   d1,d1                      ; Word index to byte offset (no scaled index on 68000)
        move.w  recip_table(pc,d1.w),d2    ; Load 1/divisor
 .loop:
        muls.w  d2,d0           ; d0 = d0 × (1/divisor), at most 70 cycles
        ; ...
        dbra    d7,.loop
 ```
 ### 4. The Indexed Array Trap
 Using register-indirect with index addressing (`d8(An,Xn)`) in tight loops. On the 68000 each indexed word access costs 6 cycles more than `(An)+`, before counting the instructions that compute the index.
 **Broken:**
 ```asm
 ; Indexed access: 14 cycles each, plus index setup (d0/d2 hold byte offsets)
        move.w  (a0,d0.w),d1    ; Indexed: 14 cycles
        move.w  (a0,d2.w),d3    ; Indexed: 14 cycles
 ```
 **Fixed:**
 ```asm
 ; Restructure data for sequential access
        lea     (a0,d0.w),a1    ; Compute base address once (d0 = byte offset)
        move.w  (a1)+,d1        ; Sequential: fast
        move.w  (a1)+,d3        ; Sequential: fast
 ```
 ### 5. The Cache-Coherency Miss
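 On CPUs with a data cache, modifying code or copper lists without flushing the cache. With the copyback cache of the 68040/68060 the new value may still sit in the cache while the Copper, which reads Chip RAM directly, fetches the old one; patched code can likewise be shadowed by stale instruction-cache lines.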
 **Broken:**
 ```c
 /* Modify copper list, but the write may still be sitting in the CPU cache */
 copper_list[offset] = new_color;
 /* With copyback caching, the Copper may fetch the old value from Chip RAM! */
 custom.cop1lc = (ULONG)copper_list;
 ```
 **Fixed:**
 ```c
 copper_list[offset] = new_color;
 /* Push and invalidate the CPU caches (exec.library V37+; harmless on cacheless CPUs) */
 CacheClearU();  /* Or CacheClearE() for a specific address range */
 custom.cop1lc = (ULONG)copper_list;
 ```
 ---
 ## Decision Guide
 ```mermaid
 flowchart TD
    START[Need to optimize] --> Q1{What is the bottleneck?}
    Q1 -->|CPU too slow| Q2{Memory or computation bound?}
    Q1 -->|Blitter waiting| BCI[Blitter-CPU Interleave]
    Q1 -->|Bus contention| Q3{During display or VBlank?}
    Q2 -->|Memory| CHIP[Move code/data to Fast RAM]
    Q2 -->|Computation| Q4{Division or multiply heavy?}
    Q4 -->|Division| RECIP[Use reciprocal table]
    Q4 -->|Multiply| Q5{Can precompute any values?}
    Q5 -->|Yes| PRECOMP[Pre-calculate tables]
    Q5 -->|No| SMC[Self-modifying code<br/>for specialization]
    Q3 -->|During display| CW[Move CPU work to VBlank]
    Q3 -->|During VBlank| DMA[Reduce DMA activity<br/>or disable unused channels]
    BCI --> Q6{Blitter operations<br/>overlap with CPU math?}
    Q6 -->|Yes| OK[Interleave: start Blit,<br/>then compute in registers]
    Q6 -->|No| SEQ[Reorder: batch Blits,<br/>then batch CPU work]
 ```
 ---
 ## Historical Timeline
 ```mermaid
 timeline
    title Timing Optimization Evolution
    1985 : Amiga launch — 7.09 MHz 68000, shared bus
         : Coders learn bus arbitration overhead
    1987 : First cycle-counted inner loops
         : Copper-CPU scheduling understood
    1988 : Blitter-CPU interleaving becomes standard
         : Self-modifying copper lists in demos
    1989 : Reciprocal tables replace division
         : Fast RAM awareness for A2000/A500+accelerator
    1990 : Cycle-accurate demo effects (Scoopex, Sanity)
         : MOVEM burst optimization for large copies
    1991 : 68020 acceleration — more registers, MULS.L
         : Cache coherency becomes a concern
    1992 : 68030/040 optimization — cache management
         : Line-F trap handlers for FPU emulation
    1993 : Instruction cache awareness in demo code
         : 68060 superscalar scheduling
    1994 : Demo coders master 68040/060 pipelines
         : Cache-line alignment for critical loops
    2000+ : Cycle-accurate emulators enable precise profiling
          : MiSTer FPGA provides real-hardware verification
 ```
 ---
 ## Modern Analogies
 | Amiga Optimization | Modern Equivalent | Why It Maps |
 |-------------------|-------------------|-------------|
 | Cycle counting | GPU occupancy profiling | Both count execution units per cycle |
 | Blitter-CPU interleave | Async compute (GPU) | Both overlap independent operations |
 | Chip vs Fast RAM | VRAM vs system RAM | Both have bandwidth vs capacity tradeoffs |
 | Self-modifying code | JIT compilation | Both generate code at runtime |
 | Reciprocal table | Fast inverse sqrt / RCP | Both approximate division with table/LUT |
 | Copper-wait scheduling | Pipeline barrier placement | Both minimize stalls from synchronization |
 | MOVEM burst read | SIMD load (NEON/SSE) | Both load multiple values in one operation |
 | Cache flush | Cache maintenance instructions | Both ensure data consistency |
 | Bus arbitration | Memory bandwidth allocation | Both divide bandwidth between agents |
 | Register allocation | Register allocation (compiler) | Both minimize memory traffic |
 ---
 ## Use Cases
 | Use Case | Technique | Impact |
 |----------|-----------|--------|
 | 3D vertex transform | Blitter-CPU interleave | ~2× throughput |
 | Inner loop rendering | Cycle counting + fast RAM | ~30% speedup |
 | Division-heavy code | Reciprocal table | ~2× vs `DIVS` on 68000 (≤70 vs ≤158 cycles) |
 | Dynamic effects | Self-modifying copper list | Eliminates copy overhead |
 | Blitter-heavy frame | Interleave + scheduling | ~50% more CPU time |
 | Fast data copy | MOVEM burst | ~4× vs MOVE.W loop |
 | 68040/060 code | Cache management | Prevents stale data bugs |
 | Interrupt handlers | Register-only computation | Minimal latency |
 | Audio mixing | Fast RAM + sequential access | Consistent 50 FPS |
 ---
 ## FPGA / Emulation Impact
 | Concern | Impact | Notes |
 |---------|--------|-------|
 | **Cycle-accurate timing** | Demos that rely on exact cycle counts break if timing is wrong | WinUAE "cycle-exact" mode required for many demos |
 | **Bus arbitration** | CPU/DMA cycle interleaving must match Agnus scheduler | Minimig implements 4-cycle DMA slots |
 | **68000 prefetch** | Instruction prefetch buffer must be modeled | Affects branch timing and instruction pairing |
 | **Blitter busy detection** | `DMACONR` bit 6 timing must be exact | Some demos poll at precise cycle counts |
 | **Cache behavior** | 68020+ instruction/data cache affects timing | Emulators must model cache size and replacement policy |
 | **Self-modifying code** | Instruction cache flush must work correctly | 68040+ demos depend on `CPUSHA` instruction |
 > [!NOTE]
 > The MiSTer Amiga core (based on Minimig) implements cycle-exact bus arbitration, which is why many timing-sensitive demos work on MiSTer but not on simpler FPGA implementations that approximate timing.
 ---
 ## FAQ
 **Q: How do I measure actual DMA contention on real hardware?**
 A: Use the E-Clock counter (`ReadEClock()`) or the CIA timers to measure execution time of specific code blocks. Compare timing with display DMA enabled vs disabled. The difference reveals the DMA overhead. Alternatively, use WinUAE's built-in profiler in cycle-exact mode.
 **Q: Is self-modifying code still useful on modern processors?**
 A: Rarely. On modern x86/ARM cores, writing to code forces instruction-cache invalidation and pipeline flushes, so SMC is usually counterproductive outside of JIT compilers. On the 68000, which has no cache and only a 2-word prefetch queue, SMC is nearly free and often beneficial.
 **Q: What is "Blitter Nasty" mode and when should I use it?**
 A: Setting `BLTPRI` (bit 10 in `DMACON`) gives the Blitter total bus priority, leaving almost no cycles for the CPU. Use it only when the Blitter operation is the critical path and you have no useful CPU work to do. In practice, most demos use normal Blitter mode with interleaving instead.
 **Q: How much faster is Fast RAM really?**
 A: On an A1200 with Fast RAM expansion, code runs ~2-3× faster when placed in Fast RAM vs Chip RAM (during display), because there is no DMA contention. During VBlank (no display DMA), the difference is much smaller. The improvement is most dramatic during the display period when bitplane DMA is active.
 **Q: Can I use all these techniques together?**
 A: Yes, and the best demos do. The optimal pattern is: schedule CPU computation during VBlank, interleave register-only computation during Blitter operations, use Fast RAM for code and data tables, and patch copper lists via self-modifying code. The techniques are complementary.
 **Q: What is the single most impactful optimization?**
 A: **Blitter-CPU interleaving**. On a stock A500, the Blitter and CPU share the bus. If you wait for the Blitter to finish before doing any CPU work, you waste 50-75% of available cycles. Starting the Blitter and then doing register-only computation nearly doubles effective throughput.
 ---
 ## References
 ### Related Knowledge Base Articles
 - [DMA Architecture](../01_hardware/common/dma_architecture.md) — DMA slot allocation, bus arbitration
 - [Bus Architecture](../01_hardware/common/bus_architecture.md) — CPU/DMA bus sharing, wait states
 - [Cache Management](../15_fpu_mmu_cache/cache_management.md) — CACR, CacheClearU, coherency
 - [68040/68060 Libraries](../15_fpu_mmu_cache/68040_68060_libraries.md) — Line-F trap handlers, FPU emulation
 - [Blitter Programming](../08_graphics/blitter_programming.md) — Fill mode, line draw, Blitter timing
 - [3D Rendering](3d_rendering.md) — Fixed-point math, C2P costs
 - [Copper Effects](copper_effects.md) — Copper list timing, DMA budgets
 ### External Resources
 - **68000 Instruction Timing** — Motorola M68000 User's Manual, instruction execution times section
 - **Amiga Hardware Reference Manual** — DMA timing, bus arbitration
 - **WinUAE** — Cycle-exact Amiga emulator with profiler
 - **Pouet.net** — https://www.pouet.net — Demo source code with optimization notes
 - **Demozoo** — https://demozoo.org — Demoscene encyclopedia
 - **Amiga Graphics Archive** — https://amiga.lychesis.net — Per-game copper list and DMA budget analysis
 - **Scoopex Amiga Hardware Programming** (Photon) — [YouTube playlist](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Cycle-exact programming techniques and DMA interleaving; companion site: [coppershade.org](http://coppershade.org/)
--- a/README.md
+++ b/README.md
@ -315,3 +315,13 @@ The Amiga's documentation was scattered across out-of-print manuals, Usenet post
 | [sana2_driver.md](16_driver_development/sana2_driver.md) | Writing SANA-II network drivers |
 | [rtg_driver.md](16_driver_development/rtg_driver.md) | Writing Picasso96/RTG display drivers |
 | [ahi_driver.md](16_driver_development/ahi_driver.md) | Writing AHI audio drivers |
 ### 17 — Demoscene Techniques
 | File | Topic |
 |---|---|
 | [README.md](17_demoscene/README.md) | Section overview: technique classification, DMA budget, famous effects table |
 | [copper_effects.md](17_demoscene/copper_effects.md) | Copper bars, raster splits, gradient shading, sine-based color cycling, double-buffered lists |
 | [sprite_techniques.md](17_demoscene/sprite_techniques.md) | Sprite multiplexing, sprite-built images, 15-color attached sprites, priority control |
 | [pixel_tricks.md](17_demoscene/pixel_tricks.md) | Copper chunky, HAM art, scroll-register distortion, modulo wrapping, plasma effects |
 | [3d_rendering.md](17_demoscene/3d_rendering.md) | Fixed-point 3D math, Blitter-filled polygons, rotozoom, dot tunnels, voxel space |
 | [timing_optimization.md](17_demoscene/timing_optimization.md) | Cycle counting, Blitter-CPU interleaving, memory access patterns, self-modifying code |
--- a/TODO.md
+++ b/TODO.md
@ -71,10 +71,10 @@ Articles were scored against [AGENTS.md](../amiga/AGENTS.md) "Deep" criteria:
 | # | New Article | Why Needed | Status |
 |---|---|---|---|
 | 26 | **Custom Trackloaders & DRM** | 80% of classic games bypassed DOS. Reversing them requires understanding raw MFM sync words, bootblocks, and copy protection (e.g. Rob Northen Copylock). | ✅ **DONE** — `05_reversing/custom_loaders_and_drm.md` (500 lines) |
-| 27 | **RTG (Retargetable Graphics)** | Modern Amigas use RTG (Picasso96/CyberGraphX) for 16/24-bit chunky graphics. Application-level rendering is undocumented in our `08_graphics` folder. | ❌ **Pending** |
+| 27 | **RTG (Retargetable Graphics)** | Modern Amigas use RTG (Picasso96/CyberGraphX) for 16/24-bit chunky graphics. Application-level rendering is undocumented in our `08_graphics` folder. | ✅ **DONE** — `08_graphics/rtg_programming.md` (645 lines) |
-| 28 | **AHI Audio Interface** | Hardware-agnostic 16-bit multi-channel audio mixing is standard for modern Amiga apps, decoupling audio from the 8-bit 4-channel Paula chip limits. | ❌ **Pending** |
+| 28 | **AHI Audio Interface** | Hardware-agnostic 16-bit multi-channel audio mixing is standard for modern Amiga apps, decoupling audio from the 8-bit 4-channel Paula chip limits. | ✅ **DONE** — `11_libraries/ahi_programming.md` (737 lines) |
-| 29 | **Demoscene Techniques** | Exploits like Sprite Multiplexing and Copper Chunks defined the platform's capabilities. Crucial for understanding high-performance hardware banging. | ❌ **Pending** |
+| 29 | **Demoscene Techniques** | Exploits like Sprite Multiplexing and Copper Chunks defined the platform's capabilities. Crucial for understanding high-performance hardware banging. | ✅ **DONE** — `17_demoscene/` section (5 sub-articles + README) |
-| 30 | **Modern Cross-Compilation** | Setting up `m68k-amigaos-gcc`, `vbcc`, and `vasm` via CMake on modern macOS/Linux to build native `.hunk` binaries. | ❌ **Pending** |
+| 30 | **Modern Cross-Compilation** | Setting up `m68k-amigaos-gcc`, `vbcc`, and `vasm` via CMake on modern macOS/Linux to build native `.hunk` binaries. | ✅ **DONE** — `13_toolchain/cross_compilation_guide.md` (603 lines) |
 ---
@ -385,14 +385,14 @@ Articles were scored against [AGENTS.md](../amiga/AGENTS.md) "Deep" criteria:
 | Status | Count |
 |---|---|
-| ✅ Deep | 67 |
+| ✅ Deep | 73 |
 | ✅ Adequate | 112 |
 | ⚠️ Thin | 0 |
-| ❌ Pending (Tier 4) | 4 |
+| ❌ Pending | 0 |
-| **Total** | **183** |
+| **Total** | **185** |
 > MUI framework adds 13 additional articles (4 Deep, 9 Adequate) tracked separately above.
-> **Progress**: Tiers 1, 2, 3 fully complete. 4 Tier 4 new articles remain.
+> **Progress**: ALL TIERS COMPLETE ✅ — 30/30 items done. 73 Deep + 112 Adequate = 185 articles.
 ---
@ -435,14 +435,16 @@ Articles were scored against [AGENTS.md](../amiga/AGENTS.md) "Deep" criteria:
 All 11 items (including bonus iffparse.md) upgraded to Deep quality.
-**Tier 4 — 4 pending advanced topics:**
+**Tier 4 — ALL COMPLETE ✅**
 All 5 items created: Custom Loaders & DRM, RTG Programming, AHI Audio, Demoscene Techniques (5 sub-articles), Cross-Compilation Guide.
 | # | File | Current |
 |---|---|---|
 | 26 | `05_reversing/custom_loaders_and_drm.md` | 500 lines — ✅ Deep |
-| 27 | `08_graphics/rtg_programming.md` | 0 lines |
+| 27 | `08_graphics/rtg_programming.md` | 645 lines — ✅ Deep |
-| 28 | `11_libraries/ahi_programming.md` | 0 lines |
+| 28 | `11_libraries/ahi_programming.md` | 737 lines — ✅ Deep |
-| 29 | `17_demoscene/README.md` | 0 lines |
+| 29 | `17_demoscene/` (section) | 6 files — README(137) + copper_effects(792) + sprite_techniques(580) + pixel_tricks(667) + 3d_rendering(874) + timing_optimization(739) | ✅ Deep |
-| 30 | `13_toolchain/cross_compilation_guide.md` | 0 lines |
+| 30 | `13_toolchain/cross_compilation_guide.md` | 603 lines | ✅ Deep |
-> **Progress**: 26 of 30 items complete (87%). Tier 1, 2, 3 fully cleared. 4 Tier 4 new articles remain.
+> **Progress**: 30 of 30 items complete (100%). ALL TIERS COMPLETE ✅