Add memory types article, expand blitter cookbook, update cross-references

New articles:
- 01_hardware/common/memory_types.md: comprehensive Chip/Fast/Slow RAM reference with DMA accessibility matrix, per-model configurations (A500-A4000), accelerator memory expansion (classic + modern), adaptive software behavior, pitfalls with impact analysis, FPGA/emulation notes
- AGENTS.md: documentation standards and methodology guidelines

Blitter programming (08_graphics/blitter_programming.md):
- Rewrote minterm truth table with narrative explanation and worked example
- Added 7 advanced use cases with assembly/C code: shifted BOB, hardware scroll, area fill polygon, interleaved bitplane BOBs, double-buffered game loop, GUI window drag, tile map renderer
- Added Good/Bad Patterns section with 5 named antipatterns
- Added Practical Limitations table (10 constraints with workarounds)
- Expanded Common Minterms table with Description and Real-World Use Case columns

Cross-reference updates:
- Root README: added memory types to coverage, quick start, section index
- 01_hardware/README: updated common/ folder description
- address_space.md: linked to memory_types.md and chip_ram_expansion.md
- memory_management.md (exec): linked to hardware memory types reference
- bitmap.md, sprites.md, animation.md, audio.md: linked to memory_types.md explaining DMA Chip RAM requirement for each subsystem
- chip_ram_expansion.md: linked to comprehensive memory types article
2026-07-28 04:36:28 +00:00 · 2026-04-25 14:12:46 -04:00 · 2026-04-25 14:12:46 -04:00 · c4522ef2ed
commit c4522ef2ed
parent 26dfed8b44
12 changed files with 1609 additions and 87 deletions
--- a/01_hardware/README.md
+++ b/01_hardware/README.md
@ -43,7 +43,7 @@ See [custom_registers_full.md](../references/custom_registers_full.md) for the c

 | Subfolder | Content |
 |---|---|
-| [common/](common/) | M68k CPU, address space layout, CIA chips, Zorro bus |
+| [common/](common/) | M68k CPU, address space layout, **memory types (Chip/Fast/Slow)**, CIA chips, Zorro bus |
 | [ocs_a500/](ocs_a500/) | OCS chipset — A500, A1000, A2000 |
 | [ecs_a600_a3000/](ecs_a600_a3000/) | ECS chipset — A600, A3000, A500+ |
 | [aga_a1200_a4000/](aga_a1200_a4000/) | AGA chipset — A1200, A4000, CD32 |
--- a/01_hardware/common/address_space.md
+++ b/01_hardware/common/address_space.md
@ -107,3 +107,5 @@ block-beta
 - NDK39: `exec/memory.h` — MEMF_ flag definitions
 - ADCD 2.1 Hardware Manual: memory map chapter
 - Commodore A1200/A4000 Technical Reference Manuals (local archive)
+- See also: [memory_types.md](memory_types.md) — Chip RAM vs Fast RAM vs Slow RAM, DMA accessibility, per-model configurations
+- See also: [chip_ram_expansion.md](../ecs_a600_a3000/chip_ram_expansion.md) — 2 MB Chip RAM with Super Agnus
--- a/01_hardware/common/memory_types.md
+++ b/01_hardware/common/memory_types.md
@ -0,0 +1,417 @@
+[← Home](../../README.md) · [Hardware](../README.md)
+
+# Memory Types — Chip RAM, Fast RAM, Slow RAM, and the DMA Bus
+
+## Overview
+
+The Amiga's memory architecture differs sharply from that of most home computers of its era. Rather than treating all RAM as equal, the system divides memory into **distinct classes** based on which hardware can access it. The division exists because the custom chipset (Agnus/Alice, Denise/Lisa, Paula) fetches its data through DMA controlled by Agnus/Alice on a dedicated Chip bus, and that bus only reaches certain RAM.
+
+Understanding this distinction is not optional. It determines where screen buffers live, why games run faster with expansion RAM, why the [Blitter](../../08_graphics/blitter_programming.md) can't touch Fast RAM, and why adding an accelerator card with a few megabytes of Fast RAM can make an old machine feel new.
+
+> [!WARNING]
+> The 68000 is **Big-Endian**. All multi-byte values in memory (pointers, word-sized registers, structure fields) are stored most-significant byte first. Modern developers working with Amiga memory dumps or binary formats will misread data if they assume little-endian layout.
+
+---
+
+## Memory Type Classification
+
+### The Three Types
+
+| Type | Full Name | Who Can Access It? | Speed (Stock A500) | DMA Visible? | Address Range |
+|---|---|---|---|---|---|
+| **Chip RAM** | Chipset-accessible RAM | CPU + all custom chips (Blitter, Copper, bitplane DMA, sprite DMA, audio DMA, disk DMA) | ~3.5 MHz effective (contended) | ✓ Yes | `$000000`–`$07FFFF` (512 KB) to `$1FFFFF` (2 MB) |
+| **Slow RAM** | Pseudo-Fast / Ranger / "Trapdoor" RAM | CPU only — but on the Chip RAM bus | ~3.5 MHz effective (still contended) | ✗ No (not used by DMA) | `$C00000`–`$C7FFFF` (512 KB) |
+| **Fast RAM** | True Fast / Zorro / Accelerator RAM | CPU only, on a separate bus | Full CPU speed (7–50+ MHz, no contention) | ✗ No | `$200000`–`$9FFFFF` (Zorro II) or above `$00FFFFFF` (32-bit motherboard, CPU-slot, and Zorro III RAM) |
+
+### Why "Slow RAM" Exists
+
+The A500's internal 512 KB trapdoor expansion (such as Commodore's A501) maps to `$C00000`–`$C7FFFF`. (RAM in the A500's side expansion slot is different: it is autoconfigured as true Fast RAM in the Zorro II range starting at `$200000`.) Trapdoor RAM sits **on the same bus as Chip RAM**, so it suffers the same DMA contention: the CPU waits whenever the custom chips are using the bus. However, the custom chips **do not** fetch DMA data from this range. It's the worst of both worlds: no faster than Chip RAM for CPU work, yet invisible to DMA (it can't hold screen buffers or audio).
+
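+Despite its speed, exec adds Slow RAM to the memory list as `MEMF_FAST` (it is not Chip RAM), at a lower priority than true Fast RAM. Some games and demos rely on memory being present at `$C00000`, because 512 KB Chip + 512 KB Slow was the most common A500 configuration. The behavior also depends on the Agnus revision: with the 1 MB Fat Agnus (8372A) fitted to later A500 and A2000 boards, a motherboard jumper can remap the trapdoor RAM to `$080000`–`$0FFFFF`, turning it into true Chip RAM.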
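
To make the three address ranges concrete, here is a minimal sketch that classifies a 24-bit (68000) address using the ranges from the table above. It assumes Chip RAM starts at `$000000`, trapdoor Slow RAM occupies `$C00000`–`$C7FFFF`, and Zorro II Fast RAM lives somewhere in `$200000`–`$9FFFFF`; the `MemClass` enum and `classify_address()` are hypothetical names. Real software should not hardcode ranges like this but ask exec with `TypeOfMem()`.

```c
#include <stdint.h>

/* Hypothetical labels for the memory classes described above. */
typedef enum { MEM_CHIP, MEM_SLOW, MEM_FAST_Z2, MEM_OTHER } MemClass;

/*
 * Classify a 68000 (24-bit) address, assuming:
 *   - Chip RAM from $000000 up to chip_size (512 KB, 1 MB or 2 MB)
 *   - trapdoor Slow RAM at $C00000-$C7FFFF
 *   - Zorro II autoconfig Fast RAM somewhere in $200000-$9FFFFF
 * Everything else (ROM, CIAs, custom registers, empty space) is MEM_OTHER.
 */
MemClass classify_address(uint32_t addr, uint32_t chip_size)
{
    addr &= 0x00FFFFFFu;              /* the 68000 drives only 24 address lines */

    if (addr < chip_size)
        return MEM_CHIP;              /* reachable by bitplane, Blitter, audio... DMA */
    if (addr >= 0x00C00000u && addr <= 0x00C7FFFFu)
        return MEM_SLOW;              /* on the Chip bus, but never fetched by DMA */
    if (addr >= 0x00200000u && addr <= 0x009FFFFFu)
        return MEM_FAST_Z2;           /* separate CPU bus, no DMA contention */
    return MEM_OTHER;
}
```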
+
+### Why Fast RAM Matters
+
+On a stock A500 with only Chip RAM, the CPU and custom chips compete for every bus cycle. Adding Fast RAM (via Zorro or an accelerator) gives the CPU its own private memory bus:
+
+```
+┌─────────────────────────────┐     ┌─────────────────────────────┐
+│        Chip RAM Bus         │     │       Fast RAM Bus          │
+│       (16-bit, 3.58 MHz)    │     │  (16/32-bit, CPU speed)     │
+├─────────────────────────────┤     ├─────────────────────────────┤
+│ Agnus (DMA master)          │     │ CPU only (no contention)    │
+│ ├── Bitplane DMA            │     │                             │
+│ ├── Sprite DMA              │     │ Code, stack, data structs   │
+│ ├── Copper DMA              │     │ Non-DMA allocations         │
+│ ├── Blitter DMA             │     │                             │
+│ ├── Audio DMA (Paula)       │     └─────────────────────────────┘
+│ ├── Disk DMA                │
+│ └── CPU (leftover slots)    │
+└─────────────────────────────┘
+```
+
+When the CPU executes code from Fast RAM, it runs at full speed while the custom chips simultaneously work Chip RAM. This is why an A500 with a 68020 accelerator + 8 MB Fast RAM feels dramatically faster even though the display hardware hasn't changed.
+
+---
+
+## DMA Accessibility Matrix
+
+This is the critical table. **If a hardware DMA channel needs data, that data must be in Chip RAM.**
+
+| Operation / Consumer | Chip RAM | Slow RAM | Fast RAM | ROM |
+|---|---|---|---|---|
+| **Bitplane display (Agnus)** | ✓ Required | ✗ | ✗ | ✗ |
+| **Sprite data (Agnus)** | ✓ Required | ✗ | ✗ | ✗ |
+| **Copper list (Agnus)** | ✓ Required | ✗ | ✗ | ✗ |
+| **Blitter source/dest (Agnus)** | ✓ Required | ✗ | ✗ | ✗ |
+| **Audio sample data (Paula)** | ✓ Required | ✗ | ✗ | ✗ |
+| **Disk DMA buffer (Agnus)** | ✓ Required | ✗ | ✗ | ✗ |
+| **CPU code execution** | ✓ Slow (contended) | ✓ Slow (contended) | ✓ Fast (uncontended) | ✓ |
+| **CPU data read/write** | ✓ Slow (contended) | ✓ Slow (contended) | ✓ Fast (uncontended) | Read only |
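+| **CPU data cache (68030+)** | ⚠ DMA can change it behind the cache's back: normally mapped write-through or non-cacheable, and flushed with `CachePreDMA()`/`CachePostDMA()` or `CacheClearU()` when shared | ✓ Cacheable | ✓ Cacheable | ✓ Cacheable |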
+
+> [!IMPORTANT]
+> There is **no hardware error** when the Blitter, Copper, or another DMA engine is pointed at a Fast RAM address. The DMA pointer registers only use the low 19–21 address bits (depending on the Agnus/Alice revision), so the address silently wraps into the Chip RAM address space and the wrong location is read or written. This is a very common source of "random corruption" bugs on accelerated Amigas.
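
The wrap-around can be modelled as a simple mask. This is a sketch, not a description of Agnus internals: it assumes the pointer width is 19, 20, or 21 bits for a 512 KB, 1 MB, or 2 MB Agnus/Alice, and that bit 0 is ignored because DMA transfers whole words. `dma_effective_address()` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical model of the address a custom-chip DMA engine actually uses
 * when given a 32-bit pointer. addr_bits is 19 (512 KB Agnus), 20 (1 MB)
 * or 21 (2 MB). The upper bits are not wired, so they are dropped silently.
 */
uint32_t dma_effective_address(uint32_t cpu_ptr, unsigned addr_bits)
{
    uint32_t mask = (1u << addr_bits) - 1u;   /* e.g. 21 bits -> $1FFFFF */
    return cpu_ptr & mask & ~1u;              /* word transfers: bit 0 ignored */
}
```

With a 2 MB Agnus, a buffer that `AllocMem()` placed in 32-bit Fast RAM at `$07001234` would be fetched from Chip RAM offset `$001234`, whatever happens to live there, which is why the symptom is corruption rather than a bus error.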
+
+---
+
+## AllocMem() Flags
+
+AmigaOS classifies memory through the `MEMF_` flags passed to `AllocMem()`. Exec keeps a linked list of `MemHeader` structures (`ExecBase->MemList`), one per contiguous memory region, each tagged with the `MEMF_` attributes of that region:
+
+```c
+/* exec/memory.h (NDK39); comments paraphrased */
+#define MEMF_ANY        0L        /* Any type of memory will do */
+#define MEMF_PUBLIC     (1L<<0)   /* Not moved/swapped; shareable between tasks and interrupts */
+#define MEMF_CHIP       (1L<<1)   /* Chip RAM: reachable by custom-chip DMA */
+#define MEMF_FAST       (1L<<2)   /* Non-Chip memory (includes Slow RAM at $C00000) */
+#define MEMF_LOCAL      (1L<<8)   /* V36: does not go away at RESET */
+#define MEMF_24BITDMA   (1L<<9)   /* V36: reachable by 24-bit (Zorro II) DMA devices */
+#define MEMF_KICK       (1L<<10)  /* V39: usable for KickTags/KickMem */
+#define MEMF_CLEAR      (1L<<16)  /* AllocMem: zero-fill before returning */
+#define MEMF_LARGEST    (1L<<17)  /* AvailMem: return largest free block */
+#define MEMF_REVERSE    (1L<<18)  /* AllocMem: allocate from top of pool */
+#define MEMF_TOTAL      (1L<<19)  /* AvailMem: return total size of this type */
+#define MEMF_NO_EXPUNGE (1L<<31)  /* V39 AllocMem: don't expunge libraries on failure */
+```
+
+### Allocation Priority
+
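+When the type bits are left at `MEMF_ANY` (the default), exec walks its memory list in **priority order** (the `ln_Pri` of each `MemHeader`) and allocates from the first region with a large enough free block. By convention faster memory gets a higher priority, so the usual order is:
+
+1. **Fast RAM** (32-bit CPU-local or accelerator RAM normally ranks above 16-bit Zorro II RAM)
+2. **Slow RAM** (tagged `MEMF_FAST`, but added at a lower priority than real Fast RAM)
+3. **Chip RAM** (lowest priority, so it is used last)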
+
+This is why adding Fast RAM instantly speeds up the system — Workbench, libraries, and application code automatically move out of Chip RAM, freeing it for display and audio.
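
The priority walk can be sketched on the host with a toy memory list. This is a simplified model, not exec's code: `Region` and `pick_region()` are hypothetical, the priorities in the usage are illustrative, and real `AllocMem()` searches the free-chunk list inside each `MemHeader` rather than checking a single free total. The `MEMF_` values are copied from `exec/memory.h` so the sketch compiles off-Amiga. It captures the rule that regions are tried in priority order and only those whose attributes contain every requested type bit qualify.

```c
#include <stddef.h>
#include <stdint.h>

#define MEMF_PUBLIC (1u << 0)
#define MEMF_CHIP   (1u << 1)
#define MEMF_FAST   (1u << 2)

/* Simplified stand-in for an exec MemHeader. */
typedef struct {
    const char *name;
    uint32_t    attributes;   /* MEMF_ type bits describing the region */
    int         pri;          /* higher priority is tried first */
    uint32_t    free_bytes;   /* simplification: one total instead of a chunk list */
} Region;

/*
 * Return the region a request would be served from, or NULL.
 * `regions` must already be sorted by descending pri, as exec keeps its list.
 * Only the type bits are matched; MEMF_CLEAR and friends are ignored here.
 */
const Region *pick_region(const Region *regions, size_t n,
                          uint32_t size, uint32_t flags)
{
    uint32_t want = flags & (MEMF_PUBLIC | MEMF_CHIP | MEMF_FAST);

    for (size_t i = 0; i < n; i++) {
        if ((regions[i].attributes & want) != want)
            continue;                         /* wrong type of memory */
        if (regions[i].free_bytes >= size)
            return &regions[i];               /* first fit in priority order */
    }
    return NULL;                              /* AllocMem() would return NULL */
}
```

Note how a `MEMF_CHIP` request that doesn't fit simply fails, even when the Fast RAM region above it has plenty of room.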
+
+> [!NOTE]
+> `MEMF_CHIP` restricts the allocation to Chip RAM; if no free Chip RAM block of the requested size exists, `AllocMem()` returns `NULL` instead of falling back to another type. Use it only when the data must be DMA-visible (screen buffers, audio samples, Copper lists, sprite data). Requesting `MEMF_CHIP` for code or non-DMA data wastes the most constrained resource in the system.
+
+---
+
+## Per-Model Stock Configurations
+
+| Model | Year | CPU | Stock Chip RAM | Max Chip RAM | Stock Fast/Slow RAM | Expansion Slots | Notes |
+|---|---|---|---|---|---|---|---|
+| **A1000** | 1985 | 68000 @ 7.09 MHz | 256 KB | 512 KB | None | Side expansion | WCS (Writable Control Store) for Kickstart |
+| **A500** | 1987 | 68000 @ 7.09 MHz | 512 KB | 1 MB (2 MB with mod) | None stock (A501 trapdoor adds 512 KB Slow) | Trapdoor + side slot | Most common model |
+| **A500+** | 1991 | 68000 @ 7.09 MHz | 1 MB | 2 MB | None stock | Trapdoor + side slot | ECS chipset, 2 MB Super Agnus (8375) |
+| **A2000** | 1987 | 68000 @ 7.09 MHz | 512 KB–1 MB | 2 MB (with Super Agnus) | None stock | 5× Zorro II + CPU slot | Big-box, expandable |
+| **A600** | 1992 | 68000 @ 7.09 MHz | 1 MB | 2 MB | None stock | PCMCIA Type II + trapdoor | Smallest desktop |
+| **A3000** | 1990 | 68030 @ 16 or 25 MHz | 1–2 MB | 2 MB | On-board Fast (Ramsey), up to 16 MB | 4× Zorro III + CPU slot | 32-bit bus, first Zorro III |
+| **A1200** | 1992 | 68EC020 @ 14 MHz | 2 MB | 2 MB | None stock | Trapdoor 150-pin + PCMCIA | AGA chipset |
+| **A4000** | 1992 | 68EC030 @ 25 MHz or 68040 @ 25 MHz | 2 MB | 2 MB | On-board Fast (Ramsey), up to 16 MB | 5× Zorro III + CPU slot | AGA, big-box |
+
+### Expansion Capabilities
+
+| Model | Chip RAM Expandable? | Fast RAM Options | Maximum Practical Fast RAM |
+|---|---|---|---|
+| **A1000** | To 512 KB (internal) | Side expansion only | ~2 MB (rare 3rd-party) |
+| **A500** | To 1 MB via trapdoor; 2 MB with Agnus swap | Zorro side expansion, accelerator | 8 MB (Zorro II side expansion); 32-bit accelerators go further (e.g. 64 MB on the TF536) |
+| **A500+** | To 2 MB via trapdoor | Side expansion, accelerator | 8 MB (accelerator) |
+| **A2000** | To 2 MB (Super Agnus + RAM) | Zorro II cards, CPU slot accelerator | 8 MB (Zorro II) + 128 MB (accelerator) |
+| **A600** | To 2 MB via trapdoor | PCMCIA (up to 4 MB), trapdoor accelerator | 64 MB (accelerator) |
+| **A3000** | To 2 MB (on-board) | On-board (Ramsey), Zorro III cards | 256 MB (Zorro III) |
+| **A1200** | Fixed 2 MB | Trapdoor accelerator, PCMCIA (4 MB) | 256 MB (accelerator) |
+| **A4000** | Fixed 2 MB | On-board (Ramsey), Zorro III, CPU slot | 256 MB+ (accelerator + Zorro III) |
+
+---
+
+## Third-Party Accelerators and Memory Expansion
+
+Accelerator cards are the primary way to add Fast RAM. They plug into the CPU slot (A2000/A3000/A4000) or the trapdoor connector (A500/A600/A1200) and provide a faster CPU plus private memory:
+
+### Classic (1990s) Accelerators
+
+| Card | For Model | CPU | Max Fast RAM | Bus Width | Other Features |
+|---|---|---|---|---|---|
+| **GVP A530** | A500 | 68030 @ 40 MHz | 8 MB | 32-bit | SCSI, IDE |
+| **Blizzard 1230 Mk IV** | A1200 | 68030 @ 50 MHz | 128 MB (SIMM) | 32-bit | SCSI option |
+| **Blizzard 1260** | A1200 | 68060 @ 50 MHz | 128 MB (SIMM) | 32-bit | SCSI option |
+| **Warp Engine** | A3000/A4000 | 68040 @ 40 MHz | 128 MB (4×SIMM) | 32-bit | SCSI-2 DMA |
+| **CyberStorm Mk III** | A3000/A4000 | 68060 @ 50 MHz | 128 MB (SIMM) | 32-bit | SCSI, Ethernet option |
+| **GVP G-Force 040** | A2000 | 68040 @ 33 MHz | 32 MB | 32-bit | SCSI |
+
+### Modern (2020s) Accelerators
+
+| Card | For Model | CPU | Max Fast RAM | Bus Width | Other Features |
+|---|---|---|---|---|---|
+| **TF536** | A500 | 68030 @ 50 MHz | 64 MB | 32-bit | IDE |
+| **TF1260** | A1200 | 68060 @ 50+ MHz | 128 MB | 32-bit | IDE, PCMCIA-friendly |
+| **PiStorm** | A500/A2000 | ARM (emulated 68k) | 256 MB+ | Virtual | RTG, network, SD card |
+| **PiStorm32** | A1200 | ARM (emulated 68k) | 256 MB+ | Virtual | RTG, network |
+| **Vampire V2** | A500/A600 | FPGA (68080) | 128 MB | 32-bit | RTG, HDMI, Ethernet |
+| **Vampire V4 SA** | Standalone | FPGA (68080) | 512 MB | 64-bit | Full system, AGA compatible |
+
+> [!NOTE]
+> PCMCIA memory cards (A600/A1200) provide up to 4 MB of Fast RAM but are limited to 16-bit access, roughly Zorro II speed. Because the PCMCIA window starts at `$600000`, an accelerator that maps more than 4 MB of Fast RAM into the 24-bit area above `$200000` conflicts with the port; check for "PCMCIA-friendly" jumper settings.
+
+---
+
+## When to Use Each Memory Type
+
+| Scenario | Use | Why |
+|---|---|---|
+| Screen bitmaps | `MEMF_CHIP` | Bitplane DMA can only read Chip RAM |
+| Audio sample buffers | `MEMF_CHIP` | Paula audio DMA can only read Chip RAM |
+| Copper lists | `MEMF_CHIP` | Copper DMA can only read Chip RAM |
+| Sprite data | `MEMF_CHIP` | Sprite DMA can only read Chip RAM |
+| Blitter source/destination | `MEMF_CHIP` | Blitter DMA can only access Chip RAM |
+| Application code | `MEMF_ANY` | Let OS pick the fastest available |
+| Data structures (non-DMA) | `MEMF_ANY` | No need for DMA visibility |
+| Stacks | `MEMF_ANY` | CPU-only |
+| Libraries (loaded by LoadSeg) | `MEMF_ANY` | Automatic — exec allocates from fastest |
+| Raw disk DMA buffers (direct hardware access, `TD_RAWREAD`/`TD_RAWWRITE`) | `MEMF_CHIP` | Disk DMA requires Chip RAM |
+
+---
+
+## Historical Context — Why This Design?
+
+### The 1985 Perspective
+
+When Jay Miner's team designed the Amiga custom chipset in the early 1980s, RAM was expensive and bus bandwidth was precious. The split architecture was a deliberate trade-off:
+
+**Pros:**
+- Custom chips got dedicated, guaranteed DMA bandwidth — no CPU could starve the display
+- The CPU could run from a separate bus (once Fast RAM was added), overlapping its work with display, audio, and Blitter DMA in a way most competing designs could not
+- 512 KB of Chip RAM was enough for color animation, four-channel sampled stereo audio, and multitasking, a combination competing machines of the time generally could not match
+
+**Cons:**
+- Programmers had to understand which memory type to use; a plain `malloc()` from C wasn't sufficient
+- Chip RAM was the most constrained resource: running out of it meant screens and sounds could not be allocated, even with megabytes of Fast RAM still free
+- The "Slow RAM" compromise ($C00000) confused many users and developers
+
+### Competitive Comparison (1985–1992)
+
+| Feature | Amiga | Atari ST | Mac 128K/Plus | IBM PC/AT |
+|---|---|---|---|---|
+| **Memory architecture** | Split: Chip + Fast | Unified | Unified | Unified |
+| **DMA coprocessors** | Agnus, Denise, Paula | Video and floppy/hard-disk DMA only; Blitter on Mega ST/STE, DMA sound on STE | None | 8237 DMA for I/O transfers; no graphics coprocessor |
+| **CPU/display parallelism** | Yes (with Fast RAM) | No — CPU shares bus | No — CPU shares bus | Partial (VGA has own buffer) |
+| **Memory ceiling (stock)** | 512 KB Chip + expandable | 512 KB–4 MB | 128 KB–4 MB | 256 KB–16 MB |
+| **Programmer burden** | High (must track MEMF_ types) | Low (all RAM is equal) | Low | Low |
+
+---
+
+## Modern Analogies
+
+| Amiga Concept | Modern Equivalent | Notes |
+|---|---|---|
+| Chip RAM | GPU VRAM (Vulkan `VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT`) | Memory visible to the graphics/DMA hardware; the CPU can access it but shares bandwidth |
+| Fast RAM | System RAM (Vulkan `VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT`) | CPU-local memory: fast and uncontended. Loose analogy: a modern GPU can still read host-visible memory over PCIe, whereas Amiga DMA can never reach Fast RAM |
+| `MEMF_CHIP` allocation | `VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT` allocation | Explicitly requesting DMA-visible memory |
+| `MEMF_ANY` allocation | `malloc()` / `new` | Let the runtime pick the best available memory |
+| Chip RAM bus contention | PCIe bandwidth sharing | CPU and GPU compete for bus bandwidth; dedicated VRAM avoids this |
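+| Slow RAM | CPU RAM behind a memory controller shared with an integrated GPU | Competes with the graphics hardware for bandwidth; unlike an integrated GPU, though, Amiga DMA never uses this memory, which is what makes Slow RAM the worst of both worlds |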
+| AllocMem priority order | NUMA-aware allocation | Modern OS allocates from the closest/fastest memory node first |
+
+---
+
+## Pitfalls & Common Mistakes
+
+### Pitfall 1: "The Silent Corruption" — DMA to Fast RAM
+
+```c
+/* ✗ BAD: Audio buffer in Fast RAM */
+BYTE *sample = AllocMem(44100, MEMF_ANY);  /* May return Fast RAM! */
+/* Paula DMA reads from Chip RAM bus — gets garbage */
+```
+
+**Why it's bad**: There is no error, no crash, no diagnostic. The DMA controller's address lines don't reach Fast RAM — they wrap into whatever Chip RAM happens to be at the aliased address. You hear random noise instead of your sample. On a stock A500 (Chip RAM only) the same code works perfectly, so the bug only appears on accelerated machines — the exact systems your advanced users have.
+
+```c
+/* ✓ GOOD: Audio buffer in Chip RAM */
+BYTE *sample = AllocMem(44100, MEMF_CHIP | MEMF_CLEAR);
+```
+
+**Why this fixes it**: `MEMF_CHIP` guarantees the allocation comes from DMA-visible memory, so the audio hardware can reach every byte. `MEMF_CLEAR` additionally means any part of the buffer not yet filled with sample data plays as silence rather than leftover memory contents.
+
+### Pitfall 2: "The Chip RAM Hog" — Wasting Chip RAM on Non-DMA Data
+
+```c
+/* ✗ BAD: Allocating code/data in Chip RAM unnecessarily */
+struct GameState *state = AllocMem(sizeof(*state), MEMF_CHIP);
+```
+
+**Why it's bad**: Chip RAM is the scarcest resource on every Amiga, typically 512 KB to 2 MB. Every byte you waste on non-DMA data is a byte unavailable for screen buffers, audio samples, and Copper lists. On a stock A500 with 512 KB, a game that hogs 100 KB of Chip RAM for its data structures may fail to open a 5-bitplane screen. Even on a 2 MB AGA machine, Chip RAM fills fast: one 320×256 bitplane is 10,240 bytes, so a 5-plane buffer is 51,200 bytes and a double-buffered display already consumes 2 × 51,200 = 102,400 bytes (100 KB).
+
+```c
+/* ✓ GOOD: Let OS pick fastest memory */
+struct GameState *state = AllocMem(sizeof(*state), MEMF_ANY | MEMF_CLEAR);
+```
+
+**Why this changes the game**: `MEMF_ANY` lets AmigaOS place the allocation in Fast RAM when available. On an accelerated machine, this frees Chip RAM for display hardware AND runs faster (no DMA contention). On a stock A500 with only Chip RAM, `MEMF_ANY` falls back to Chip RAM automatically — same behavior, zero code changes.
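
The Chip RAM arithmetic above generalizes to one formula: bytes = (width / 8) × height × planes × buffers. A minimal sketch, assuming the width is a multiple of 16 pixels (each bitplane row is a whole number of words) and ignoring any padding an allocator such as `AllocBitMap()` might add; `display_chip_bytes()` is a hypothetical helper.

```c
#include <stdint.h>

/* Chip RAM needed for planar display buffers.
 * width must be a multiple of 16: each bitplane row is a whole number of words. */
uint32_t display_chip_bytes(uint32_t width, uint32_t height,
                            uint32_t planes, uint32_t buffers)
{
    uint32_t bytes_per_row = width / 8;          /* 1 bit per pixel per plane */
    return bytes_per_row * height * planes * buffers;
}
```

For 320×256, one bitplane is 10,240 bytes, so dropping from 5 to 4 planes saves 10 KB per buffer, and single-buffering a 4-plane screen saves 40 KB.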
+
+### Pitfall 3: "The Slow RAM Trap" — Assuming $C00000 is Fast
+
+Slow RAM at `$C00000` is on the Chip RAM bus. Code executing from Slow RAM runs at the same contended speed as Chip RAM — it provides **no** performance benefit over Chip RAM for CPU-bound work.
+
+**Why it's bad**: Developers who add a trapdoor expansion to their A500 and see 1 MB of total memory reported by Workbench may assume they've doubled their performance. They haven't. The extra 512 KB at `$C00000` shares the same 16-bit bus with the custom chips: every DMA cycle from Agnus steals a cycle from the CPU, exactly as in Chip RAM. The only advantage is that Slow RAM isn't consumed by screen buffers, so it's available for code and data. But it's emphatically not "Fast"; that name belongs to memory on a separate bus, even though exec tags Slow RAM as `MEMF_FAST`. Moving performance-critical game loops to Slow RAM produces no speedup over Chip RAM.
+
+### Pitfall 4: "The PCMCIA Conflict" — A1200 Fast RAM in the PCMCIA Window
+
+On the A1200, the PCMCIA port maps to `$600000`–`$9FFFFF`. Accelerator cards that place Fast RAM in this range disable the PCMCIA port entirely. Many modern accelerators (TF1260) include a "PCMCIA-friendly" jumper that limits Fast RAM to 4 MB to avoid this.
+
+**Why it matters**: PCMCIA is the A1200's only practical way to add CompactFlash storage or networking without opening the case. Losing it for the sake of 4 extra MB of Fast RAM is a poor trade-off. Always check whether your accelerator has a PCMCIA compatibility mode before assuming full memory is available.
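
Whether a Fast RAM mapping collides with the PCMCIA window is a plain range-overlap test. A minimal sketch, assuming the `$600000`–`$9FFFFF` window quoted above and RAM autoconfigured upward from `$200000`; `overlaps_pcmcia()` and the two constants are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCMCIA_START 0x00600000u   /* A1200 PCMCIA window, inclusive bounds */
#define PCMCIA_END   0x009FFFFFu

/* True if the RAM block [base, base + size) overlaps the PCMCIA window. */
bool overlaps_pcmcia(uint32_t base, uint32_t size)
{
    if (size == 0)
        return false;
    uint32_t last = base + size - 1u;          /* inclusive end of the RAM block */
    return base <= PCMCIA_END && last >= PCMCIA_START;
}
```

This is why 4 MB is the usual ceiling in PCMCIA-friendly modes: `$200000` plus 4 MB ends exactly at `$5FFFFF`, while RAM mapped into 32-bit address space never touches the window.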
+
+---
+
+## Adaptive Software Behavior
+
+Well-written Amiga software doesn't assume a fixed memory configuration. It detects what's available at runtime and adjusts its behavior:
+
+### Detecting Available Memory
+
+```c
+#include <exec/memory.h>
+#include <proto/exec.h>                        /* AvailMem() prototype */
+
+ULONG chipFree  = AvailMem(MEMF_CHIP);          /* Free Chip RAM */
+ULONG fastFree  = AvailMem(MEMF_FAST);          /* Free non-Chip RAM (0 if none) */
+ULONG totalFree = AvailMem(MEMF_ANY);           /* Total free memory */
+BOOL  hasFastRAM = (fastFree > 0);              /* Caution: trapdoor Slow RAM is MEMF_FAST too */
+ULONG chipTotal = AvailMem(MEMF_CHIP | MEMF_TOTAL);  /* Total Chip RAM installed (V36+) */
+```
+
+### Strategy: Chip-Only Mode (Stock A500 / A600)
+
+When no Fast RAM is available, everything competes for the same 512 KB–2 MB:
+
+```c
+if (!hasFastRAM) {
+    /* All memory is Chip RAM — conserve aggressively */
+    numBitplanes = 4;               /* Use 16 colours instead of 32 */
+    useDoubleBuffer = FALSE;        /* Single buffer saves ~40 KB (4 × 10 KB planes at 320×256) */
+    maxBOBs = 8;                    /* Fewer BOBs = less Blitter work */
+    musicQuality = QUALITY_LOW;     /* Lower-rate 8-bit samples (e.g. 11 kHz) save Chip RAM */
+    preloadLevel = FALSE;           /* Stream level data from disk */
+}
+```
+
+**Why**: On a Chip-only system, the CPU, Blitter, display, and audio all share one bus. Reducing display complexity (fewer bitplanes) frees DMA slots for the Blitter and CPU, and frees Chip RAM for audio and game data. This is one reason many A500 games use 4 bitplanes (16 colours), while their AGA versions for the A1200 could afford 5 or even 8.
+
+### Strategy: Chip + Fast RAM Mode (Accelerated Systems)
+
+When Fast RAM is available, the architecture unlocks true parallelism:
+
+```c
+if (hasFastRAM) {
+    /* CPU runs from Fast RAM at full speed; Chip RAM for DMA only */
+    numBitplanes = 5;                /* 32 colours — display looks better */
+    useDoubleBuffer = TRUE;          /* Flicker-free, worth the Chip RAM */
+    maxBOBs = 24;                    /* More BOBs — CPU can compute while Blitter blits */
+    musicQuality = QUALITY_HIGH;     /* 8-bit 22 kHz — Chip RAM freed by code in Fast RAM */
+    preloadLevel = TRUE;             /* Load entire level into Fast RAM — no disk latency */
+
+    /* Allocate non-DMA data from Fast RAM explicitly: */
+    level = AllocMem(levelSize, MEMF_FAST | MEMF_CLEAR);
+    if (!level)
+        level = AllocMem(levelSize, MEMF_ANY | MEMF_CLEAR); /* Fallback */
+}
+```
+
+**Why this changes everything**: The CPU no longer waits for free Chip bus slots. It computes physics, AI, and input handling from its own private bus while the Blitter simultaneously processes screen memory. A game loop can therefore finish in a fraction of its stock-A500 time on a 68030 with Fast RAM, not only because the CPU is faster, but because the CPU and custom chips now run in **parallel** instead of fighting over one bus.
+
+### Graceful Degradation Pattern
+
+The best Amiga software uses a tiered allocation strategy:
+
+```c
+/* Try Fast RAM first, fall back to any available: */
+void *AllocBest(ULONG size, ULONG flags) {
+    void *mem;
+
+    /* For DMA-visible data, always use MEMF_CHIP: */
+    if (flags & MEMF_CHIP)
+        return AllocMem(size, flags);
+
+    /* Try Fast RAM first (fastest): */
+    mem = AllocMem(size, MEMF_FAST | (flags & ~MEMF_FAST));
+    if (mem) return mem;
+
+    /* Fall back to any available memory: */
+    return AllocMem(size, MEMF_ANY | (flags & ~MEMF_FAST));
+}
+```
+
+### Real-World Examples
+
+| Software | Chip-Only Behavior | With Fast RAM |
+|---|---|---|
+| **WHDLoad** | Patches games to run from Chip RAM sandbox | Preloads entire game into Fast RAM, eliminates disk access |
+| **Protracker** | Loads samples into Chip RAM, limits module size | Samples still go to Chip RAM because Paula plays them directly; code and other non-DMA data can live in Fast RAM |
+| **Workbench 3.1** | Libraries and windows in Chip RAM (slow) | Libraries auto-load into Fast RAM, only screen buffers use Chip RAM |
+| **Doom (Amiga port)** | Won't run — requires Fast RAM for framebuffer conversion | Renders chunky pixels in Fast RAM, c2p converts to Chip RAM bitplanes |
+| **ShapeShifter** | Won't run — Mac emulation needs contiguous Fast RAM | Maps Mac address space into Fast RAM, uses Chip RAM only for display output |
+
+
+---
+
+## Impact on FPGA/Emulation
+
+For MiSTer FPGA core developers, accurate memory type emulation is critical:
+
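+- **Address decoding**: Agnus/Alice only decode the low 19, 20, or 21 address bits (512 KB / 1 MB / 2 MB, depending on chip revision); DMA pointers beyond that range must wrap into Chip RAM exactly as on real hardware rather than being rejected
+- **Bus arbitration**: DMA must follow the real per-scanline slot allocation: memory refresh, disk, audio, and sprite DMA own fixed time slots; bitplane fetches take precedence over the Copper, Blitter, and CPU (and can displace sprite fetches in wide displays); the Copper comes next, then the Blitter, and the CPU gets the remaining cycles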
+- **Slow RAM behavior**: The $C00000 range must share the Chip RAM bus timing, not run at Fast RAM speed
+- **Fat Agnus detection**: Software reads the chip ID in VPOSR (`$DFF004`) bits 14–8 to identify the Agnus/Alice revision, and from it the maximum Chip RAM the chip can address; this must return the correct values
+- **MEMF_CHIP boundary**: The exec memory list must correctly reflect the Chip RAM size so `AvailMem(MEMF_CHIP)` returns the right value
+- **Fast RAM emulation**: When emulating accelerators, Fast RAM must be on a separate bus with zero-wait-state access to show the correct performance improvement
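
Decoding the identification field is plain bit manipulation. A sketch, assuming the VPOSR layout from the Hardware Reference Manual (bit 15 = LOF long-frame flag, bits 14–8 = chip ID, bit 0 = V8, the ninth bit of the vertical beam position); which ID value corresponds to which Agnus/Alice revision should come from the HRM and is deliberately not tabulated here. `VposrInfo` and `decode_vposr()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a VPOSR ($DFF004) read. */
typedef struct {
    bool    long_frame;   /* bit 15: LOF, long-frame flag (interlace) */
    uint8_t chip_id;      /* bits 14-8: Agnus/Alice identification */
    bool    v8;           /* bit 0: bit 8 of the vertical beam position */
} VposrInfo;

VposrInfo decode_vposr(uint16_t vposr)
{
    VposrInfo info;
    info.long_frame = (vposr >> 15) & 1u;
    info.chip_id    = (uint8_t)((vposr >> 8) & 0x7Fu);   /* 7-bit ID field */
    info.v8         = vposr & 1u;
    return info;
}
```

An FPGA core or emulator therefore has to place the ID of the chip it models in bits 14–8; returning the wrong value makes detection code misjudge how much Chip RAM the machine can have.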
+
+---
+
+## Best Practices
+
+1. **Always use `MEMF_CHIP` for DMA-visible data** — screen buffers, audio samples, Copper lists, sprite data, disk buffers
+2. **Never use `MEMF_CHIP` for code, stacks, or non-DMA data structures** — it wastes the most constrained memory type
+3. **Use `MEMF_ANY` for everything else** — let AmigaOS allocate from the fastest available pool
+4. **Check `AvailMem(MEMF_CHIP)` before allocating large Chip RAM blocks**: when Chip RAM runs out, `AllocMem()` returns `NULL`, and code that doesn't check the result will crash or fail to open its display
+5. **Test on systems with and without Fast RAM** — many bugs only appear when allocations land in unexpected memory types
+6. **Use `TypeOfMem()` to verify**: to confirm a pointer is in Chip RAM, test `TypeOfMem(ptr) & MEMF_CHIP`; `TypeOfMem()` returns the attributes of the memory region containing the address, or 0 if it lies in no exec memory region
+
+---
+
+## FAQ
+
+**Q: Can I make Slow RAM into Chip RAM?**
+A: Sometimes. With the 1 MB Fat Agnus (8372A), later A500 (rev 6A and up) and A2000 boards can remap the trapdoor Slow RAM to `$080000` with a motherboard jumper, giving 1 MB of Chip RAM. Boards fitted with the original 512 KB Agnus (8371) need an Agnus upgrade first.
+
+**Q: Why does my game crash on an A3000 but work on an A500?**
+A: Likely a `MEMF_ANY` allocation that returns Fast RAM. On the A500 (Chip RAM only), `MEMF_ANY` returns Chip RAM. On the A3000, it returns Fast RAM — and if you pass that pointer to the Blitter or display hardware, you get corruption.
+
+**Q: Does AGA (A1200/A4000) change anything about memory types?**
+A: No. AGA (Alice/Lisa) uses the same Chip RAM bus architecture as OCS/ECS. The maximum Chip RAM is still 2 MB. AGA adds wider DMA fetches (64-bit via FMODE) but doesn't change the Chip/Fast RAM split.
+
+**Q: How does WHDLoad handle memory type issues?**
+A: WHDLoad patches old games that assume all memory is Chip RAM. It redirects allocations, fixes hardcoded addresses, and provides a Chip RAM "sandbox" so games written for a stock A500 can run on accelerated systems.
+
+---
+
+## References
+
+- NDK 3.9: `exec/memory.h` — MEMF_ flag definitions
+- ADCD 2.1 Hardware Manual — memory map chapter
+- *Amiga Hardware Reference Manual* 3rd ed. — Chapter 1 (System Overview)
+- See also: [address_space.md](address_space.md) — full 24-bit/32-bit address map
+- See also: [chip_ram_expansion.md](../ecs_a600_a3000/chip_ram_expansion.md) — 2 MB Chip RAM with Super Agnus
+- See also: [zorro_bus.md](zorro_bus.md) — Zorro II/III expansion bus (Fast RAM cards)
+- See also: [blitter_programming.md](../../08_graphics/blitter_programming.md) — Blitter DMA (Chip RAM only)
+- See also: [memory_management.md](../../06_exec_os/memory_management.md) — AmigaOS memory management API
--- a/01_hardware/ecs_a600_a3000/chip_ram_expansion.md
+++ b/01_hardware/ecs_a600_a3000/chip_ram_expansion.md
@ -74,3 +74,4 @@ Software should not assume 2 MB Chip RAM — always use `AvailMem()` to determin
 - AmigaMail Vol. 2 — Chip RAM expansion articles
 - NDK39: `exec/memory.h` — MEMF flags
 - ADCD 2.1 Hardware Manual — memory map section
+- See also: [memory_types.md](../common/memory_types.md) — comprehensive Chip/Fast/Slow RAM comparison, per-model configurations
--- a/06_exec_os/memory_management.md
+++ b/06_exec_os/memory_management.md
@ -371,5 +371,6 @@ void MyTask(void)
 - NDK39: `exec/memory.h`, `exec/execbase.h`
 - ADCD 2.1: `AllocMem`, `FreeMem`, `AllocVec`, `FreeVec`, `CreatePool`, `AllocPooled`, `AvailMem`
 - [address_space.md](../01_hardware/common/address_space.md) — full address map
+- [memory_types.md](../01_hardware/common/memory_types.md) — hardware-level Chip/Fast/Slow RAM comparison, DMA accessibility matrix, per-model configurations
 - See also: [Multitasking](multitasking.md) — memory safety in multi-task environments
 - *Amiga ROM Kernel Reference Manual: Exec* — memory management chapter
--- a/08_graphics/animation.md
+++ b/08_graphics/animation.md
@ -800,4 +800,5 @@ myVS.PlaneOnOff = 0x00;      /* planes 1-3 get 0 (transparent) */
 - See also: [sprites.md](sprites.md) — Hardware sprite DMA, multiplexing, and priority
 - See also: [copper_programming.md](copper_programming.md) — Copper-driven sprite pointer management
 - See also: [rastport.md](rastport.md) — RastPort drawing context used by GELs
+- See also: [memory_types.md](../01_hardware/common/memory_types.md) — BOB/VSprite data must reside in Chip RAM

--- a/08_graphics/bitmap.md
+++ b/08_graphics/bitmap.md
@ -115,3 +115,4 @@ struct BitMap *bm = AllocBitMap(320, 256, 8, BMF_CLEAR | BMF_DISPLAYABLE, NULL);
 - NDK39: `graphics/gfx.h`
 - ADCD 2.1: `AllocBitMap`, `FreeBitMap`, `InitBitMap`
 - HRM: *Amiga Hardware Reference Manual* — bitplane DMA chapter
+- See also: [memory_types.md](../01_hardware/common/memory_types.md) — why bitmaps must be in Chip RAM (DMA accessibility)
--- a/08_graphics/blitter_programming.md
+++ b/08_graphics/blitter_programming.md
@ -4,11 +4,12 @@

 ## Overview

-The **Blitter** (Block Image Transfer) is a DMA engine that performs raster operations on rectangular blocks of memory. It operates on up to **4 channels** (A, B, C → D) using programmable **minterm logic** and can work independently of the CPU. The Blitter is the workhorse for screen clearing, scrolling, cookie-cut sprites, line drawing, and area fill.
+The **[Blitter](../01_hardware/ocs_a500/blitter.md)** (Block Image Transferrer) is a DMA coprocessor inside the Agnus chip that performs raster operations on rectangular memory blocks at bus speed — **without CPU involvement**. While the 68000 executes game logic, physics, or AI, the Blitter simultaneously clears screens, copies bitmap regions, composites masked sprites ("cookie-cut"), draws lines, and fills polygons. This parallelism is fundamental to why the Amiga could deliver arcade-quality 2D graphics on a 7 MHz processor with 512 KB of RAM.

---
+The Blitter operates on up to **4 DMA channels** (A, B, C → D) using a programmable **8-bit minterm** truth table that encodes any Boolean function of three inputs. Combined with barrel shifters on channels A and B, per-channel modulos, and first/last-word masks on channel A, this makes the Blitter a general-purpose 2D rasterization engine, not merely a memory copier.
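
The first/last-word masks are what let word-based blits start and stop on arbitrary pixels. A minimal sketch of how the mask values can be computed, assuming channel A is unshifted (ASH = 0), bit 15 of each word is the leftmost pixel, and `x0`/`x1` are the inclusive first and last pixel columns of the span; the helper names are hypothetical.

```c
#include <stdint.h>

/* BLTAFWM value: keep pixels from x0 onward inside the first word. */
uint16_t first_word_mask(unsigned x0)
{
    return (uint16_t)(0xFFFFu >> (x0 & 15u));
}

/* BLTALWM value: keep pixels up to and including x1 inside the last word. */
uint16_t last_word_mask(unsigned x1)
{
    return (uint16_t)(0xFFFFu << (15u - (x1 & 15u)));
}

/* Number of words the span covers, i.e. the width field of BLTSIZE. */
unsigned span_words(unsigned x0, unsigned x1)
{
    return (x1 >> 4) - (x0 >> 4) + 1u;
}
```

When the span fits in a single word, the hardware applies both masks to that word, so the effective mask is their AND.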

-## Channel Architecture
+> [!WARNING]
+> The Blitter can **only** access Chip RAM. Pointing any channel register at Fast RAM causes silent data corruption or system crashes. Always allocate blitter-visible memory with `AllocMem(size, MEMF_CHIP)`.

 ```
 Channel A ──→ ┐
@ -21,39 +22,132 @@ C = background / destination read-back
 D = output destination
 ```

-Each channel reads (or writes, for D) from a different memory pointer with independent modulo.
-
 ---

+## Architecture
+
+The Blitter sits inside **Agnus** (OCS/ECS) or **Alice** (AGA), sharing the DMA bus with the Copper, bitplane fetches, sprite DMA, disk, and audio. It accesses memory through 4 independent DMA channels, each with its own pointer and modulo register:
+
+```mermaid
+graph LR
+    subgraph "Agnus / Alice"
+        A["Channel A<br/>(mask/pattern)"] --> ML["Minterm Logic<br/>(8-bit truth table)"]
+        B["Channel B<br/>(source data)"] --> ML
+        C["Channel C<br/>(background read-back)"] --> ML
+        ML --> D["Channel D<br/>(output)"]
+    end
+
+    CRAM["Chip RAM"] --> A
+    CRAM --> B
+    CRAM --> C
+    D --> CRAM
+
+    style ML fill:#fff9c4,stroke:#f9a825
+    style CRAM fill:#e8f4fd,stroke:#2196f3
+```
+
+The **Minterm Logic** block is the Blitter's core innovation. It takes the current bit from channels A, B, and C (three Boolean inputs) and produces one output bit for channel D according to a programmable **8-bit truth table** stored in BLTCON0 bits 7–0. Since 3 inputs have 8 possible combinations (2³), the 8-bit minterm encodes **any** Boolean function of three variables — that's 256 possible logic operations in a single register write. This is what lets one piece of hardware do copies (`D=A`, minterm `$F0`), clears (`D=0`, minterm `$00`), cookie-cut compositing (`D=A·B+¬A·C`, minterm `$CA`), XOR highlighting (`D=A⊕C`, minterm `$5A`), and any other combination — all without changing hardware, just the 8-bit minterm value. See [Minterm Logic](#minterm-logic) below for the full truth table and common values.
+
+Each channel reads (or writes, for D) from a different memory pointer with independent modulo, allowing operations on sub-rectangles within larger bitmaps. **Writing to `BLTSIZE` ($DFF058) starts the blit immediately** — always configure all other registers first.
+
+### Channel Roles
+
+| Channel | DMA Direction | Typical Role | Has Shift? | Has Mask? |
+|---|---|---|---|---|
+| **A** | Read | Mask, cookie shape, font glyph, line texture | Yes (ASH, 0–15 px) | Yes (BLTAFWM/BLTALWM) |
+| **B** | Read | Source image data | Yes (BSH, 0–15 px) | No |
+| **C** | Read | Background / destination read-back | No | No |
+| **D** | Write | Output destination | No | No |
+
+> [!NOTE]
+> Any channel can be disabled per operation via BLTCON0 bits 11–8 (USEA/B/C/D). Disabling unused channels **saves DMA cycles**: a D-only clear takes 2 DMA cycles per word, half the 4 cycles per word of a full ABCD blit.
+
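+A rough way to see what the channel count costs is to multiply it out. The sketch below (hypothetical helper names) estimates the *ideal* duration of a blit, ignoring contention from display, Copper, and CPU. It assumes 2 DMA cycles per word for a D-only clear and 4 for a full ABCD blit, as we read the HRM's blitter timing table, and one DMA slot per PAL colour clock (3.546895 MHz, about 282 ns per slot).
+
+```c
+/* Ideal (uncontended) blit cost. cycles_per_word is an assumption taken
+ * from the HRM timing table: 2 for USED only, 4 for USEA|USEB|USEC|USED. */
+#define PAL_DMA_SLOTS_PER_SEC 3546895.0   /* one DMA slot per colour clock */
+
+static unsigned long blit_dma_cycles(unsigned width_words, unsigned height,
+                                     unsigned cycles_per_word)
+{
+    return (unsigned long)width_words * height * cycles_per_word;
+}
+
+/* Convert a DMA cycle count into microseconds at the PAL slot rate. */
+static double blit_time_us(unsigned long cycles)
+{
+    return (double)cycles * 1e6 / PAL_DMA_SLOTS_PER_SEC;
+}
+```
+
+For one 320×256 bitplane (20 words × 256 lines), `blit_dma_cycles(20, 256, 2)` gives 10240 cycles, roughly 2.9 ms, while the same area as an ABCD blit needs twice as many. Real timings are longer whenever display DMA takes slots.
+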
+### CPU / Blitter Bus Interaction
+
+The Blitter and the 68000 CPU share the **Chip RAM bus** — they cannot access it simultaneously. Agnus arbitrates access on a cycle-by-cycle basis:
+
+```
+┌────────────────────────────────────────────────────────────┐
+│                    Chip RAM Bus (16-bit)                   │
+├──────────┬──────────┬──────────┬──────────┬────────────────┤
+│ Bitplane │  Sprite  │  Copper  │ Blitter  │   CPU (left-   │
+│   DMA    │   DMA    │   DMA    │   DMA    │   over slots)  │
+├──────────┴──────────┴──────────┴──────────┴────────────────┤
+│               Fixed priority (high → low)                  │
+└────────────────────────────────────────────────────────────┘
+```
+
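+- **Without `BLTPRI`**: The Blitter takes free DMA slots as it needs them, but whenever the CPU has been held off the Chip RAM bus for three consecutive cycles, the Blitter yields the next slot to it. The CPU keeps running during a blit, just slowly whenever it touches Chip RAM.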
+- **With `BLTPRI` (nasty mode)**: The Blitter takes **all** free DMA slots. The CPU is completely frozen on any Chip RAM access until the blit completes. The CPU can still execute from Fast RAM or ROM — but any Chip RAM read/write stalls.
+- **Display DMA always wins**: Bitplane, sprite, and audio DMA have fixed priority above the Blitter. In high-resolution modes, display DMA alone consumes most of the bus, leaving few slots for blitter operations.
+
+### Chip RAM vs. Fast RAM
+
+The Blitter is physically wired to the Chip RAM bus inside Agnus. It has **no connection** to the Fast RAM (Zorro) bus:
+
+| Memory Type | Blitter Access? | CPU Access? | Notes |
+|---|---|---|---|
+| **Chip RAM** (first 512 KB–2 MB) | ✓ Yes | ✓ Yes (contended) | Screen buffers, audio, sprites, all DMA-visible data |
+| **Fast RAM** (Zorro II/III) | ✗ No | ✓ Yes (uncontended) | Code, variables, non-DMA data |
+| **ROM** ($F80000–$FFFFFF) | ✗ No | ✓ Yes | Kickstart, libraries |
+
+This creates the key optimization opportunity on Amigas with Fast RAM (A3000, A4000, or an A500/A1200 with a Fast RAM expansion): **the CPU can execute code and access Fast RAM at full speed while the Blitter simultaneously works on Chip RAM**. On a stock A500 with only Chip RAM, the CPU and Blitter always contend for the same bus.
+
+> [!IMPORTANT]
+> There is no hardware error when pointing blitter registers at Fast RAM addresses. Agnus only uses as many address bits as its Chip RAM range needs (19 for 512 KB, 20 for 1 MB, 21 for 2 MB), so the upper bits of a Fast RAM address are dropped and the blit silently lands at an unpredictable Chip RAM location.
+
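+The wrap-around can be modelled in a few lines. This is a simplified sketch under the assumptions in the note above (hypothetical function names): Agnus keeps only the low address bits that cover its Chip RAM range and ignores bit 0, because blitter DMA is word-wide.
+
+```c
+#include <stdint.h>
+
+/* Model of how Agnus sees a CPU pointer: only the low chip_addr_bits
+ * survive (19 for 512 KB, 20 for 1 MB, 21 for 2 MB Agnus), and bit 0 is
+ * ignored because DMA pointers address whole words. */
+static uint32_t agnus_effective_address(uint32_t cpu_addr, unsigned chip_addr_bits)
+{
+    uint32_t mask = (uint32_t)((1UL << chip_addr_bits) - 1);
+    return cpu_addr & mask & ~(uint32_t)1;
+}
+
+/* Chip RAM starts at $000000, so the Blitter touches the memory the CPU
+ * meant only if the pointer lies below the end of Chip RAM. */
+static int blitter_can_reach(uint32_t cpu_addr, uint32_t chip_ram_bytes)
+{
+    return cpu_addr < chip_ram_bytes;
+}
+```
+
+With a 2 MB Agnus (21 bits), a Fast RAM buffer at `$07C01234` would be blitted at Chip RAM address `$001234`, right on top of low Chip RAM. On a real system, allocate with `MEMF_CHIP`, or check an existing pointer with exec's `TypeOfMem()`, rather than relying on address arithmetic.
+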
 ## Minterm Logic

-The minterm is an **8-bit truth table** encoding the logical function of A, B, C:
+The minterm is an **8-bit value** stored in BLTCON0 (bits 7–0) that tells the Blitter what to do with each pixel. Think of it as a tiny program: for every pixel position, the Blitter reads the current bit from channels A, B, and C, looks up the answer in the minterm, and writes that answer to channel D (destination memory).

-```
-Bit 7: ABC = 111  →  bit value
-Bit 6: ABC = 110  →  bit value
-Bit 5: ABC = 101  →  bit value
-Bit 4: ABC = 100  →  bit value
-Bit 3: ABC = 011  →  bit value
-Bit 2: ABC = 010  →  bit value
-Bit 1: ABC = 001  →  bit value
-Bit 0: ABC = 000  →  bit value
-```
+Since there are 3 inputs (A, B, C), each either 0 or 1, there are exactly **8 possible input combinations**. The 8-bit minterm has one bit for each combination — that bit decides whether the output pixel is on (1) or off (0):
+
+| Minterm Bit | Input A (mask) | Input B (source) | Input C (background) | "If these inputs look like this…" |
+|---|---|---|---|---|
+| Bit 7 | 1 | 1 | 1 | …mask on, source on, background on |
+| Bit 6 | 1 | 1 | 0 | …mask on, source on, background off |
+| Bit 5 | 1 | 0 | 1 | …mask on, source off, background on |
+| Bit 4 | 1 | 0 | 0 | …mask on, source off, background off |
+| Bit 3 | 0 | 1 | 1 | …mask off, source on, background on |
+| Bit 2 | 0 | 1 | 0 | …mask off, source on, background off |
+| Bit 1 | 0 | 0 | 1 | …mask off, source off, background on |
+| Bit 0 | 0 | 0 | 0 | …mask off, source off, background off |
+
+Each bit is a simple yes/no: **"should the output pixel be on for this combination?"**
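+
+Because the row order is fixed, any minterm can be derived mechanically: evaluate the desired function for each of the 8 input combinations and set the matching bit. A small sketch (the helper and example function names are hypothetical):
+
+```c
+#include <stdint.h>
+
+/* Build the 8-bit minterm (BLTCON0 bits 7-0) for any Boolean function of
+ * A, B, C. Bit n of the result is f(a, b, c) with n = a*4 + b*2 + c, which
+ * is the row order of the table above (bit 7 = ABC 111, bit 0 = 000). */
+typedef int (*logic_fn)(int a, int b, int c);
+
+static uint8_t make_minterm(logic_fn f)
+{
+    uint8_t lf = 0;
+    for (int n = 0; n < 8; n++) {
+        int a = (n >> 2) & 1, b = (n >> 1) & 1, c = n & 1;
+        if (f(a, b, c))
+            lf |= (uint8_t)(1u << n);
+    }
+    return lf;
+}
+
+/* Example functions matching rows of the Common Minterms table. */
+static int f_cookie(int a, int b, int c)  { return (a && b) || (!a && c); }
+static int f_copy_a(int a, int b, int c)  { (void)b; (void)c; return a; }
+static int f_a_xor_c(int a, int b, int c) { (void)b; return a ^ c; }
+```
+
+A common hand-calculation shortcut gives the same answer: substitute A = `$F0`, B = `$CC`, C = `$AA` (the bit patterns of the A, B and C columns above) into the expression, e.g. `($F0 & $CC) | (~$F0 & $AA)` = `$C0 | $0A` = `$CA`.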
+
+### Worked Example: Cookie-Cut (`$CA`)
+
+The most important minterm is `$CA` — the cookie-cut blit used for sprite compositing. In binary, `$CA` = `11001010`. Let's read each bit:
+
+| Bit | A (mask) | B (source) | C (background) | `$CA` bit value | Output pixel | Why |
+|---|---|---|---|---|---|---|
+| 7 | on | on | on | **1** | **on** | Inside the shape, source pixel is on → show it |
+| 6 | on | on | off | **1** | **on** | Inside the shape, source pixel is on → show it |
+| 5 | on | off | on | **0** | **off** | Inside the shape, source pixel is off → output off (background hidden) |
+| 4 | on | off | off | **0** | **off** | Inside the shape, source pixel is off → output off |
+| 3 | off | on | on | **1** | **on** | Outside the shape → keep background (it's on) |
+| 2 | off | on | off | **0** | **off** | Outside the shape → keep background (it's off) |
+| 1 | off | off | on | **1** | **on** | Outside the shape → keep background (it's on) |
+| 0 | off | off | off | **0** | **off** | Outside the shape → keep background (it's off) |
+
+The pattern: **where the mask (A) is set → take the source pixel (B). Where the mask is clear → keep the background pixel (C).** That's a sprite draw with transparency — exactly what every Amiga game uses.
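+
+To see the minterm acting on whole words rather than single pixels, here is a small software model (hypothetical `apply_minterm`) that ORs together one product term for every set minterm bit, which is the sum-of-products form the truth table encodes:
+
+```c
+#include <stdint.h>
+
+/* Combine one 16-bit word from each of A, B, C into D, bit by bit.
+ * Minterm bit n selects the product term for input combination n
+ * (bit 7 = A·B·C, bit 0 = ¬A·¬B·¬C); the result is their OR. */
+static uint16_t apply_minterm(uint8_t lf, uint16_t a, uint16_t b, uint16_t c)
+{
+    uint16_t d = 0;
+    for (int n = 0; n < 8; n++) {
+        if (!(lf & (1u << n)))
+            continue;
+        uint16_t ta = (n & 4) ? a : (uint16_t)~a;
+        uint16_t tb = (n & 2) ? b : (uint16_t)~b;
+        uint16_t tc = (n & 1) ? c : (uint16_t)~c;
+        d |= (uint16_t)(ta & tb & tc);
+    }
+    return d;
+}
+```
+
+`apply_minterm(0xCA, 0x0FF0, 0xFFFF, 0xAAAA)` yields `0xAFFA`: the 8 pixels under the mask come from B, the rest keep the striped background from C.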

 ### Common Minterms

-| Minterm | Hex | Operation | Use Case |
-|---|---|---|---|
-| `D = A` | `$F0` | Copy A to D | Simple block copy |
-| `D = B` | `$CC` | Copy B to D | Simple block copy |
-| `D = C` | `$AA` | Copy C to D | Read-back |
-| `D = A·B + (¬A)·C` | `$CA` | Cookie-cut | Masked sprite blit (B through A mask onto C) |
-| `D = 0` | `$00` | Clear | Clear a memory region |
-| `D = $FFFF` | `$FF` | Set all | Fill with 1s |
-| `D = A XOR C` | `$5A` | XOR | Cursor blink, highlight |
-| `D = A OR C` | `$FA` | OR | Overlay |
-| `D = ¬A AND C` | `$0A` | Mask out | Erase through mask |
-| `D = A·B` | `$C0` | AND (A,B) | Masked pattern |
+| Minterm | Hex | Operation | Description | Real-World Use Case |
+|---|---|---|---|---|
+| `D = A` | `$F0` | Copy A | Output is a copy of channel A — every A-set pixel appears in D | **Block copy**: duplicate a screen region, copy a font glyph to the display |
+| `D = B` | `$CC` | Copy B | Output is a copy of channel B regardless of A and C | **Shifted copy**: B has a barrel shift, so this copies with pixel-level repositioning |
+| `D = C` | `$AA` | Copy C | Output is a copy of the destination read-back | **No-op / readback**: useful for fill mode where C→D with fill carry toggling |
+| `D = A·B + ¬A·C` | `$CA` | Cookie-cut | Where mask (A) is 1: show source (B). Where mask is 0: show background (C) | **Sprite compositing**: draw a player character with transparency onto the game world |
+| `D = 0` | `$00` | Clear | Output is always 0 regardless of inputs | **Screen clear**: zero out a bitplane, erase a region |
+| `D = $FFFF` | `$FF` | Set all | Output is always 1 | **Fill with 1s**: set all pixels in a region (useful for masks) |
+| `D = A XOR C` | `$5A` | XOR | Output is C inverted wherever A has a set bit, unchanged elsewhere | **Cursor blink**: XOR the cursor shape to toggle it on/off without saving background |
+| `D = A OR C` | `$FA` | OR | Output is set wherever either A or C has a set bit | **Overlay**: stamp a shape onto the background without erasing existing pixels |
+| `D = ¬A AND C` | `$0A` | Mask out | Output keeps C pixels only where A is clear — erases through the mask | **Erase shape**: cut a hole in the background matching the mask shape (first pass of two-pass sprite draw) |
+| `D = A AND B` | `$C0` | AND | Output is set only where both A and B agree | **Masked pattern**: apply a fill pattern (B) clipped to a shape (A) |
+| `D = A XOR B` | `$3C` | XOR (A,B) | Output is set wherever A and B differ | **Difference detection**: find which pixels changed between two frames |
+| `D = NOT A` | `$0F` | Invert | Output is the bitwise complement of A | **Mask inversion**: generate a negative mask from a positive one |

 ### Cookie-Cut Explained

@ -72,41 +166,85 @@ Minterm $CA:

 ## Register Reference

-| Reg | Offset | Description |
-|---|---|---|
-| `BLTCON0` | `$040` | Control: channels enabled (bits 11–8), ASH (bits 15–12), minterm (bits 7–0) |
-| `BLTCON1` | `$042` | Control: BSH (bits 15–12), line mode (bit 0), fill mode (bits 3–2) |
-| `BLTAFWM` | `$044` | First word mask for channel A |
-| `BLTALWM` | `$046` | Last word mask for channel A |
-| `BLTAPT` | `$050` | Channel A pointer (high+low) |
-| `BLTBPT` | `$04C` | Channel B pointer |
-| `BLTCPT` | `$048` | Channel C pointer |
-| `BLTDPT` | `$054` | Channel D pointer |
-| `BLTAMOD` | `$064` | Channel A modulo |
-| `BLTBMOD` | `$062` | Channel B modulo |
-| `BLTCMOD` | `$060` | Channel C modulo |
-| `BLTDMOD` | `$066` | Channel D modulo |
-| `BLTSIZE` | `$058` | Blit size + START (write triggers blit) |
+| Address | Name | R/W | Description |
+|---------|------|-----|-------------|
+| `$DFF040` | BLTCON0 | W | Control: ASH (bits 15–12), channel enables (bits 11–8), minterm (bits 7–0) |
+| `$DFF042` | BLTCON1 | W | Control: BSH (bits 15–12), fill/line mode (bits 4–0) |
+| `$DFF044` | BLTAFWM | W | First word mask for channel A |
+| `$DFF046` | BLTALWM | W | Last word mask for channel A |
+| `$DFF048` | BLTCPTH/L | W | Channel C pointer (32-bit) |
+| `$DFF04C` | BLTBPTH/L | W | Channel B pointer (32-bit) |
+| `$DFF050` | BLTAPTH/L | W | Channel A pointer (32-bit) |
+| `$DFF054` | BLTDPTH/L | W | Channel D pointer (32-bit) |
+| `$DFF058` | BLTSIZE | W | Blit dimensions + **START** (write triggers blit!) |
+| `$DFF05C` | BLTSIZV | W | Blit height, **ECS/AGA only** (15-bit, up to 32768 lines) |
+| `$DFF05E` | BLTSIZH | W | Blit width + START, **ECS/AGA only** (11-bit, up to 2048 words) |
+| `$DFF060` | BLTCMOD | W | Channel C modulo (bytes to skip per row) |
+| `$DFF062` | BLTBMOD | W | Channel B modulo |
+| `$DFF064` | BLTAMOD | W | Channel A modulo |
+| `$DFF066` | BLTDMOD | W | Channel D modulo |
+| `$DFF070` | BLTCDAT | W | Channel C data register (preload) |
+| `$DFF072` | BLTBDAT | W | Channel B data register (preload) |
+| `$DFF074` | BLTADAT | W | Channel A data register (preload / line texture) |
+| `$DFF002` | DMACONR | R | DMA status — bit 14 (BBUSY) = blitter busy |

 ### BLTCON0 Encoding

 ```
-Bits 15–12: ASH (A shift, 0–15 pixels)
-Bit  11:    USEA (enable channel A)
-Bit  10:    USEB (enable channel B)
-Bit   9:    USEC (enable channel C)
-Bit   8:    USED (enable channel D, almost always 1)
-Bits  7–0:  Minterm
+Bits 15–12: ASH  — A channel barrel shift (0–15 pixels right)
+Bit  11:    USEA — enable channel A DMA
+Bit  10:    USEB — enable channel B DMA
+Bit   9:    USEC — enable channel C DMA
+Bit   8:    USED — enable channel D DMA (almost always 1)
+Bits  7–0:  LF   — minterm (logic function truth table)
+```
+
+### BLTCON1 Encoding
+
+```
+Bits 15–12: BSH  — B channel barrel shift (0–15 pixels right)
+Bit   4:    EFE  — exclusive fill enable
+Bit   3:    IFE  — inclusive fill enable
+Bit   2:    FCI  — fill carry input (initial state)
+Bit   1:    DESC — descending mode (blit bottom-right → top-left)
+Bit   0:    LINE — line draw mode
 ```

 ### BLTSIZE Encoding (OCS/ECS)

 ```
-Bits 15–6: Height in lines (1–1024, 0 means 1024)
-Bits  5–0: Width in words (1–64, 0 means 64)
+Bits 15–6: Height in lines (1–1024, 0 = 1024)
+Bits  5–0: Width in words  (1–64,   0 = 64)
 ```

-**Writing BLTSIZE starts the blit!**
+> [!WARNING]
+> **Writing BLTSIZE starts the blit!** Always configure all other registers (pointers, modulos, control, masks) before writing BLTSIZE. For ECS/AGA big blits, write BLTSIZV first, then BLTSIZH (which triggers the blit).
+
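+Packing these fields by hand is a frequent source of bugs (a shift in the wrong nibble, or a size that overflows its field). The helpers below are a hypothetical C sketch of the BLTCON0 and OCS BLTSIZE layouts described above:
+
+```c
+#include <stdint.h>
+
+/* Channel-enable bits of BLTCON0 (bits 11-8). */
+enum { USEA = 0x0800, USEB = 0x0400, USEC = 0x0200, USED = 0x0100 };
+
+/* BLTCON0: ASH in bits 15-12, channel enables in 11-8, minterm in 7-0. */
+static uint16_t make_bltcon0(unsigned ash, unsigned channels, uint8_t minterm)
+{
+    return (uint16_t)(((ash & 15u) << 12) | (channels & 0x0F00u) | minterm);
+}
+
+/* BLTSIZE (OCS/ECS): height in bits 15-6, width in words in bits 5-0.
+ * 1024 lines and 64 words are encoded as 0. Sets *ok to 0 (and returns 0)
+ * for sizes the register cannot express. */
+static uint16_t make_bltsize(unsigned height, unsigned width_words, int *ok)
+{
+    *ok = height >= 1 && height <= 1024 && width_words >= 1 && width_words <= 64;
+    if (!*ok)
+        return 0;
+    return (uint16_t)(((height & 1023u) << 6) | (width_words & 63u));
+}
+```
+
+`make_bltcon0(5, USEA | USEB | USEC | USED, 0xCA)` produces `$5FCA`, and `make_bltsize(256, 20, &ok)` produces `(256<<6)|20`, the value used for a full 320×256 bitplane.
+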
+### Ascending vs. Descending Mode
+
+When source and destination overlap in memory, the blit direction determines whether data is corrupted:
+
+```
+Ascending (default, DESC=0):
+  Reads/writes top-left → bottom-right
+  Use when: dest address > source address
+
+Descending (DESC=1):
+  Reads/writes bottom-right → top-left
+  Use when: dest address < source address
+  Pointers must be set to the LAST word of the block
+  Modulos are subtracted instead of added
+```
+
+This is critical for **scrolling** — shifting the screen contents by a few pixels requires an overlapping copy, and using the wrong direction produces garbage.
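+
+The direction rule and the descending start address are easy to get wrong, so here is a hypothetical C sketch. It assumes source and destination use the same modulo (true for in-place scrolling), so the read and write pointers stay a constant distance apart and only their order matters:
+
+```c
+#include <stdint.h>
+
+/* Same-modulo copy: if the destination starts above the source in memory,
+ * ascending writes would overtake unread source words, so use DESC = 1. */
+static int needs_descending(uint32_t src, uint32_t dst)
+{
+    return dst > src;
+}
+
+/* Address of the LAST word of a rectangle, where a descending blit starts.
+ * Row stride = width_words * 2 + modulo_bytes. */
+static uint32_t last_word_address(uint32_t first_word, unsigned width_words,
+                                  unsigned height, int modulo_bytes)
+{
+    int32_t stride = (int32_t)width_words * 2 + modulo_bytes;
+    return first_word + (uint32_t)((int32_t)(height - 1) * stride)
+                      + (uint32_t)(width_words - 1) * 2u;
+}
+```
+
+For a left scroll (source `Screen+2`, destination `Screen`), `needs_descending` returns 0, so ascending is safe. For a 16-line, 2-word area on a 40-byte-wide bitplane (modulo 36), `last_word_address(base, 2, 16, 36)` is `base + 602`.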
+
+### Shift and Alignment
+
+The Blitter is a **word-aligned** (16-bit) processor. Moving objects to arbitrary pixel positions requires the barrel shifter:
+
+- **ASH** (channel A shift) and **BSH** (channel B shift) shift data 0–15 pixels to the right
+- A rectangle N pixels wide at a non-aligned X position spans `⌈(N + shift) / 16⌉` words, which is usually one word more than the aligned case
+- **BLTAFWM** (first word mask) and **BLTALWM** (last word mask) prevent the shifted data from corrupting pixels outside the target area
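+
+Putting these rules together, the per-object numbers can be computed once on the CPU before touching any register. A hypothetical sketch for one bitplane, following the `⌈(N + shift) / 16⌉` width rule above:
+
+```c
+#include <stdint.h>
+
+/* Blit parameters for a w-pixel-wide object at pixel (x, y) in one
+ * bitplane with bytes_per_row bytes per line. */
+struct bob_blit {
+    uint32_t byte_offset;   /* offset of the first destination word */
+    unsigned shift;         /* ASH/BSH value, 0-15                   */
+    unsigned width_words;   /* blit width including any shift word  */
+    int      dst_modulo;    /* BLTCMOD/BLTDMOD in bytes             */
+};
+
+static struct bob_blit place_bob(unsigned x, unsigned y, unsigned w_pixels,
+                                 unsigned bytes_per_row)
+{
+    struct bob_blit p;
+    p.shift       = x & 15u;                                  /* x mod 16 */
+    p.byte_offset = (uint32_t)y * bytes_per_row + (x >> 4) * 2u;
+    p.width_words = (w_pixels + p.shift + 15u) / 16u;         /* ceiling  */
+    p.dst_modulo  = (int)bytes_per_row - (int)p.width_words * 2;
+    return p;
+}
+```
+
+`place_bob(37, 10, 16, 40)` gives shift 5, byte offset 404, 2 words and modulo 36, the same values Use Case 1 derives in assembly. For an aligned x it returns 1 word; in that case `BLTALWM` must be `$FFFF`, since there is no overflow word to mask off.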

 ---

@ -209,49 +347,634 @@ Bits  5–0: Width in words (1–64, 0 means 64)

 ---

-## System-Friendly Blitter (via graphics.library)
+## Advanced Use Cases & Cookbook
+
+### Use Case 1: Shifted BOB (Sprite at Arbitrary X Position)
+
+The most common real-world blitter task: draw a 16×16 sprite at pixel position (x, y) on a 320-pixel-wide screen. Since x may not be word-aligned, the barrel shifter handles sub-word positioning:
+
+```asm
+    ; Draw 16×16 BOB at pixel (x, y) into one bitplane of a 320px wide screen
+    ; Inputs: d0.w = x position, d1.w = y position
+    ;         a0 = mask data, a1 = sprite data, a2 = screen base
+
+    lea     $DFF000,a5
+
+    ; Calculate screen byte offset:
+    move.w  d1,d2
+    mulu    #40,d2              ; y × 40 bytes/row (320 pixels / 8)
+    move.w  d0,d3
+    lsr.w   #3,d3               ; x / 8 = byte offset in row
+    and.w   #$FFFE,d3           ; word-align (drop bit 0)
+    add.w   d3,d2               ; total byte offset into screen
+    lea     (a2,d2.w),a3        ; a3 = screen pointer for this BOB
+
+    ; Calculate shift amount:
+    move.w  d0,d3
+    and.w   #$000F,d3           ; shift = x mod 16 (0–15 pixels)
+    ror.w   #4,d3               ; move to bits 15–12 for BLTCON0
+    or.w    #$0FCA,d3           ; channels A+B+C+D, minterm $CA
+
+.bwait:
+    btst    #14,$002(a5)
+    bne.s   .bwait
+
+    move.w  d3,$040(a5)         ; BLTCON0: shift + channels + minterm
+    clr.w   $042(a5)            ; BLTCON1: ascending, no fill
+    move.w  #$FFFF,$044(a5)     ; BLTAFWM: all bits in first word
+    move.w  #$0000,$046(a5)     ; BLTALWM: mask off last word (shift overflow)
+    move.l  a0,$050(a5)         ; BLTAPT = mask
+    move.l  a1,$04C(a5)         ; BLTBPT = sprite imagery
+    move.l  a3,$048(a5)         ; BLTCPT = background read-back
+    move.l  a3,$054(a5)         ; BLTDPT = write back to same position
+    clr.w   $064(a5)            ; BLTAMOD = 0 (mask is 1 word wide)
+    clr.w   $062(a5)            ; BLTBMOD = 0 (sprite is 1 word wide)
+    move.w  #36,$060(a5)        ; BLTCMOD = 40 - (2 words × 2) = 36 bytes
+    move.w  #36,$066(a5)        ; BLTDMOD = 36
+    move.w  #(16<<6)|2,$058(a5) ; BLTSIZE: 16 lines × 2 words (1 extra for shift) → GO!
+```
+
+**Key insight**: the blit is 2 words wide even though the sprite is only 16 pixels (1 word). The barrel shift pushes bits into the second word, so we need that extra word — and `BLTALWM=$0000` masks it so we don't corrupt adjacent pixels.
+
+### Use Case 2: Blitter Scroll (Left by N Pixels)
+
+Scrolling the screen left means the destination is at a lower address than the source, so the default **ascending mode** is safe: every source word is read before the blit overwrites it.
+
+```asm
+    ; Scroll 320×256 screen left by 16 pixels (1 word = fastest case)
+    ; Source: screen + 2 bytes (one word right)
+    ; Dest:   screen base
+    ; No shift needed for 16-pixel increments
+
+    lea     $DFF000,a5
+
+.bwait:
+    btst    #14,$002(a5)
+    bne.s   .bwait
+
+    move.l  #$09F00000,$040(a5) ; BLTCON0: A+D, minterm $F0 (copy)
+    clr.w   $042(a5)            ; BLTCON1: ascending (dest < source, so reads stay ahead of writes)
+    move.w  #$FFFF,$044(a5)     ; BLTAFWM
+    move.w  #$FFFF,$046(a5)     ; BLTALWM
+    move.l  #Screen+2,$050(a5)  ; BLTAPT: source is 1 word to the right
+    move.l  #Screen,$054(a5)    ; BLTDPT: destination is screen start
+    clr.w   $064(a5)            ; BLTAMOD = 0 (full-width rows)
+    clr.w   $066(a5)            ; BLTDMOD = 0
+    move.w  #(256<<6)|20,$058(a5) ; BLTSIZE: 256 lines × 20 words → GO!
+    ; After blit: draw new column at right edge (column 19)
+```
+
+For sub-word scrolling (1–15 pixels), add a barrel shift as well (it shifts right in ascending mode and left in descending mode) and redraw the exposed edge column from tile data.
+
+### Use Case 3: Area Fill (Filled Polygon)
+
+The blitter's fill mode is a two-step process: (1) draw the polygon outline with XOR lines, (2) fill the region. This is how games like *Carrier Command* and *Starglider 2* achieved real-time filled 3D:
+
+```asm
+    ; Step 1: Draw polygon edges using blitter line mode (XOR, single-bit)
+    ; (Repeat for each edge of the polygon)
+    ; Use minterm $4A (A XOR C) and BLTCON1 bit 0 = LINE, bit 1 = SING
+
+    ; Step 2: Fill the outlined region
+    ; Fill works RIGHT-TO-LEFT, BOTTOM-TO-TOP — requires descending mode
+    ; Pointers must point to the LAST word of the bitmap region
+
+    lea     $DFF000,a5
+
+.bwait:
+    btst    #14,$002(a5)
+    bne.s   .bwait
+
+    ; Set up inclusive fill (IFE):
+    move.l  #$09F00000,$040(a5)  ; BLTCON0: A+D, minterm $F0 (copy with fill)
+    move.w  #$000A,$042(a5)      ; BLTCON1: DESC=1 (bit 1), IFE=1 (bit 3)
+                                  ; IFE = inclusive fill enable
+    move.w  #$FFFF,$044(a5)      ; BLTAFWM
+    move.w  #$FFFF,$046(a5)      ; BLTALWM
+
+    ; Pointers to LAST word of the fill region (descending!):
+    move.l  #FillBufferEnd,$050(a5) ; BLTAPT: last word of source
+    move.l  #FillBufferEnd,$054(a5) ; BLTDPT: last word of dest (same buffer)
+    clr.w   $064(a5)               ; BLTAMOD = 0
+    clr.w   $066(a5)               ; BLTDMOD = 0
+    move.w  #(Height<<6)|Width,$058(a5)  ; BLTSIZE → GO!
+```
+
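+**How it works**: the Blitter walks each row right to left with a fill carry that starts at the value of `FCI` and toggles on every set pixel. Between two outline pixels on the same scanline the carry stays on, filling the interior. This is why the outline must be drawn in **single-bit mode** (SING=1): a normal line puts several pixels on one scanline where it runs shallow, and each extra pixel flips the carry and breaks the fill.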
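+
+The carry rule is easier to trust once you have stepped through it. This is a software model of inclusive fill (IFE) on one scanline, written as a hypothetical C function; it mirrors the description above (right-to-left walk, carry seeded from `FCI`, toggled by each set pixel) and is meant for experimenting, not as a replacement for the hardware:
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* Words are in display order (word 0 = leftmost, bit 15 = leftmost pixel).
+ * The row is walked right to left; the carry starts at FCI and toggles on
+ * every set input pixel. Inclusive fill outputs 1 for outline pixels and
+ * for every pixel while the carry is on. */
+static void fill_row_inclusive(const uint16_t *in, uint16_t *out,
+                               size_t nwords, int fci)
+{
+    int carry = fci & 1;
+    for (size_t i = nwords; i-- > 0; ) {          /* rightmost word first  */
+        uint16_t w = in[i], result = 0;
+        for (int bit = 0; bit < 16; bit++) {      /* rightmost pixel first */
+            int pixel = (w >> bit) & 1;
+            if (pixel || carry)
+                result |= (uint16_t)(1u << bit);
+            carry ^= pixel;
+        }
+        out[i] = result;
+    }
+}
+```
+
+Feeding it the two-pixel outline row `{0x0080, 0x0100}` produces `{0x00FF, 0xFF00}`: both edge pixels and everything between them are set.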
+
+### Use Case 4: Interleaved Bitplane BOBs
+
+Standard bitplane layout stores all of plane 0, then all of plane 1, etc. **Interleaved** layout stores one row of plane 0, then one row of plane 1, alternating. This allows a single blit to draw a BOB across all bitplanes at once:
+
+```asm
+    ; Interleaved screen layout:
+    ;   Row 0, Plane 0 (40 bytes)
+    ;   Row 0, Plane 1 (40 bytes)
+    ;   Row 0, Plane 2 (40 bytes)
+    ;   Row 0, Plane 3 (40 bytes)
+    ;   Row 0, Plane 4 (40 bytes)
+    ;   Row 1, Plane 0 (40 bytes)
+    ;   ...
+
+    ; Blit a 16×16 cookie-cut BOB across all 5 bitplanes in ONE operation:
+    ; Height = 16 lines × 5 planes = 80 rows
+    ; Modulo = 40 - 2 = 38 bytes per interleaved row (skip rest of scanline row)
+    ; BOB data is also stored interleaved
+
+    lea     $DFF000,a5
+
+.bwait:
+    btst    #14,$002(a5)
+    bne.s   .bwait
+
+    move.l  #$0FCA0000,$040(a5) ; BLTCON0: A+B+C+D, minterm $CA
+    clr.w   $042(a5)            ; BLTCON1
+    move.w  #$FFFF,$044(a5)     ; BLTAFWM
+    move.w  #$FFFF,$046(a5)     ; BLTALWM
+    move.l  #BOBMask,$050(a5)   ; BLTAPT (mask repeated for every plane row: 80 words)
+    move.l  #BOBData,$04C(a5)   ; BLTBPT (interleaved sprite data)
+    move.l  a3,$048(a5)         ; BLTCPT (screen position, computed by caller)
+    move.l  a3,$054(a5)         ; BLTDPT (same)
+    clr.w   $064(a5)            ; BLTAMOD = 0 (mask data is contiguous)
+    clr.w   $062(a5)            ; BLTBMOD = 0
+    move.w  #38,$060(a5)        ; BLTCMOD = 38 (skip to next interleaved row)
+    move.w  #38,$066(a5)        ; BLTDMOD = 38
+    move.w  #(80<<6)|1,$058(a5) ; BLTSIZE: 80 rows (16×5) × 1 word → GO!
+```
+
+**Why this matters**: without interleaving, drawing one BOB on a 5-plane screen requires **5 separate blits** (one per plane), each with its own WaitBlit + register setup overhead. Interleaving does it in **1 blit** — 5× less setup time, critical when drawing 15+ BOBs per frame.
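+
+The addressing behind interleaving is a single formula. A hypothetical sketch, assuming the layout shown in the comment block above (one row per plane, planes in order, `bytes_per_row` bytes per row):
+
+```c
+#include <stdint.h>
+
+/* Byte offset of word-aligned pixel column x on line y, bitplane `plane`,
+ * in an interleaved bitmap of `depth` planes. */
+static uint32_t interleaved_offset(unsigned x, unsigned y, unsigned plane,
+                                   unsigned depth, unsigned bytes_per_row)
+{
+    return ((uint32_t)y * depth + plane) * bytes_per_row + (x >> 4) * 2u;
+}
+
+/* Blit height when an h-line object covers all planes in one pass. */
+static unsigned interleaved_blit_rows(unsigned h, unsigned depth)
+{
+    return h * depth;
+}
+
+/* OCS BLTSIZE can express at most 1024 rows. */
+static int fits_ocs_bltsize(unsigned h, unsigned depth)
+{
+    return interleaved_blit_rows(h, depth) <= 1024;
+}
+```
+
+For the 5-plane, 40-byte layout, a BOB at line 10, column 32 starts at `interleaved_offset(32, 10, 0, 5, 40)` = 2004, and a 16-line BOB needs 80 blit rows. A full 256-line, 5-plane screen (1280 rows) does not fit one OCS `BLTSIZE`, so full-screen interleaved blits must be split.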
+
+### Use Case 5: Double-Buffered Game Loop
+
+The standard pattern for flicker-free game rendering:
+
+```asm
+MainLoop:
+    ; --- Wait for vertical blank ---
+    bsr     WaitVBL             ; Wait for beam to reach line 0
+
+    ; --- Swap display buffer ---
+    ; Copper list points to the currently visible buffer
+    ; We draw into the hidden back buffer
+    move.l  BackBuffer,a0
+    move.l  FrontBuffer,a1
+    move.l  a0,FrontBuffer      ; Back buffer becomes front (display)
+    move.l  a1,BackBuffer       ; Old front becomes new back (draw target)
+
+    ; Update Copper list bitplane pointers to show new front buffer:
+    bsr     UpdateCopperBPLPTRs
+
+    ; --- Clear back buffer ---
+    bsr     WaitBlit
+    move.l  #$01000000,$040(a5) ; D-only, minterm $00
+    clr.w   $042(a5)
+    move.l  a1,$054(a5)         ; BLTDPT = back buffer
+    clr.w   $066(a5)
+    move.w  #(256<<6)|20,$058(a5) ; Clear one 320×256 bitplane → GO!
+
+    ; --- Draw all BOBs ---
+    ; CPU can process game logic while the clear blit runs!
+    bsr     UpdateGameLogic     ; Physics, AI, input — runs on CPU
+    bsr     WaitBlit            ; Wait for clear to finish
+    bsr     DrawAllBOBs         ; Chain of cookie-cut blits
+
+    bra     MainLoop
+```
+
+**Key optimization**: `UpdateGameLogic` runs on the CPU *while* the screen clear runs on the Blitter. This is the core of the Amiga's parallelism: the whole duration of the D-only clear becomes CPU time instead of a busy-wait.
+
+### Use Case 6: GUI Window Drag (System-Friendly)
+
+Workbench and applications use `graphics.library` for window dragging, icon rendering, and menu drawing. The OS handles Blitter synchronization:

 ```c
-/* BltBitMap — the safe, OS-friendly way: */
-BltBitMap(srcBitmap, srcX, srcY,
-          dstBitmap, dstX, dstY,
-          width, height,
-          0xC0,     /* minterm: A AND B */
-          0xFF,     /* all planes */
-          NULL);    /* temp buffer */
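+#include <graphics/gfx.h>
+#include <graphics/rastport.h>
+#include <proto/graphics.h>   /* ScrollRaster, BltBitMap, SetAPen, RectFill */
+
+/* Assumes rp (struct RastPort *) and srcBM/dstBM (struct BitMap *) are valid. */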

-/* BltMaskBitMapRastPort — cookie-cut with mask: */
-BltMaskBitMapRastPort(srcBM, srcX, srcY,
-                       rp, dstX, dstY,
-                       width, height,
-                       (ABC | ABNC | ANBC),  /* minterm for cookie */
-                       maskPlane);
+/* Scroll a window's contents up by 8 pixels (text scroll): */
+ScrollRaster(rp,       /* RastPort */
+             0, 8,     /* dx=0, dy=8 (scroll up by 8 pixels) */
+             0, 0,     /* top-left corner of scroll area */
+             319, 199); /* bottom-right */
+/* The OS automatically uses an ascending/descending blit, sets modulos, */
+/* and clears the exposed bottom strip. */

-/* BltClear — fast memory clear: */
-BltClear(memory, byteCount, 0);
+/* Copy a rectangular region between two bitmaps: */
+BltBitMap(srcBM, 0, 0,       /* source bitmap, x, y */
+          dstBM, 100, 50,    /* dest bitmap, x, y */
+          64, 32,            /* width, height */
+          0xC0,              /* minterm $C0: plain copy of the source */
+          0xFF,              /* all bitplanes */
+          NULL);             /* no temp buffer needed */

-/* OwnBlitter / DisownBlitter — exclusive access: */
-OwnBlitter();    /* wait for and lock blitter */
-/* ... direct register programming ... */
-DisownBlitter(); /* release */
+/* Draw a filled rectangle (uses the Blitter internally): */
+SetAPen(rp, 3);              /* Set pen colour to index 3 */
+RectFill(rp, 10, 10, 100, 50); /* Filled rectangle */
+```
+
+### Use Case 7: Tile Map Renderer
+
+Games like *The Settlers*, *Cannon Fodder*, and most platformers render backgrounds from tile maps. Each tile is a 16×16 (or 32×32) block blitted to screen coordinates:
+
+```asm
+    ; Render a 20×16 tile map (320×256 screen, 16×16 tiles)
+    ; TileMap: array of 320 bytes (20×16), each byte = tile index
+    ; TileGfx: tile graphics, 16×16 pixels × 5 planes, interleaved
+
+    lea     $DFF000,a5          ; custom chip base
+    lea     TileMap,a0
+    lea     Screen,a2
+    moveq   #16-1,d7            ; 16 tile rows
+
+.tilerow:
+    moveq   #20-1,d6            ; 20 tiles per row
+
+.tilecol:
+    moveq   #0,d0
+    move.b  (a0)+,d0            ; Get tile index
+    mulu    #16*5*2,d0          ; Tile data offset (16 rows × 5 planes × 2 bytes)
+    lea     TileGfx,a1
+    add.l   d0,a1               ; a1 = tile graphics pointer
+
+    bsr     WaitBlit
+    move.l  #$09F00000,$040(a5) ; BLTCON0: A+D, minterm $F0 (straight copy)
+    clr.w   $042(a5)            ; BLTCON1
+    move.w  #$FFFF,$044(a5)     ; BLTAFWM
+    move.w  #$FFFF,$046(a5)     ; BLTALWM
+    move.l  a1,$050(a5)         ; BLTAPT = tile data (interleaved)
+    move.l  a2,$054(a5)         ; BLTDPT = screen position
+    clr.w   $064(a5)            ; BLTAMOD = 0 (tile data is contiguous)
+    move.w  #38,$066(a5)        ; BLTDMOD = 40 - 2 = 38 (interleaved screen)
+    move.w  #(80<<6)|1,$058(a5) ; BLTSIZE: 80 rows (16×5) × 1 word → GO!
+
+    addq.l  #2,a2               ; Next tile position (1 word right)
+    dbf     d6,.tilecol
+
+    ; Move to next tile row: advance screen pointer by 16 scanlines × 5 planes × 40 bytes
+    add.l   #16*5*40-40,a2      ; Subtract the 40 bytes already advanced by 20 tiles
+    dbf     d7,.tilerow
 ```

 ---

-## Performance Notes
+## Good and Bad Patterns

-| Operation | Speed |
-|---|---|
-| Word copy | 4 DMA cycles per word (1 µs at 3.58 MHz) |
-| Full 320×256 clear | ~1280 µs (~1.3 ms) |
-| Cookie-cut blit | 4 channels = 4 cycles/word (same as copy) |
-| CPU vs blitter | Blitter wins for moves > ~40 words |
-| Nasty mode | `BLTPRI` in DMACON: blitter gets priority, CPU stalls |
+### ✓ Pattern: "Blit and Compute" — Overlap CPU and Blitter Work
+
+```asm
+    ; Start a blit, then do CPU work while it runs:
+    bsr     SetupAndStartBlit   ; Triggers BLTSIZE write
+    bsr     UpdatePlayerPhysics ; CPU work — runs in parallel!
+    bsr     ProcessInput        ; More CPU work
+    bsr     WaitBlit            ; NOW wait for blit to finish
+    bsr     SetupNextBlit       ; Safe to touch registers
+```
+
+This is the **entire point** of having a Blitter. Any code that busy-waits immediately after starting a blit wastes the Amiga's key advantage.
+
+### ✗ Antipattern: "The Busy-Wait Hog"
+
+```asm
+    ; ✗ BAD: Wait immediately after every blit — wastes CPU cycles
+    bsr     StartBlit
+.wait1: btst #14,$002(a5)
+    bne.s   .wait1              ; CPU does NOTHING while blitter runs
+    bsr     StartNextBlit
+.wait2: btst #14,$002(a5)
+    bne.s   .wait2              ; More wasted time
+```
+
+### ✓ Pattern: "Batch Then Wait": Logic First, Then the Blit Chain
+
+```asm
+    ; Process all game logic FIRST:
+    bsr     RunAI
+    bsr     RunPhysics
+    bsr     AnimateFrames
+    ; THEN start the rendering blits in sequence:
+    bsr     WaitBlit
+    bsr     BlitBOB1
+    bsr     WaitBlit
+    bsr     BlitBOB2
+    bsr     WaitBlit
+    bsr     BlitBOB3
+    ; All game logic is finished before the first blit, so rendering sees a consistent game state
+```
+
+### ✗ Antipattern: "The Single-Plane-At-A-Time"
+
+```asm
+    ; ✗ BAD: Blit each bitplane separately (5× setup overhead)
+    lea     Plane0,a0
+    bsr     BlitBOBOnePlane
+    lea     Plane1,a0
+    bsr     BlitBOBOnePlane
+    lea     Plane2,a0
+    bsr     BlitBOBOnePlane
+    lea     Plane3,a0
+    bsr     BlitBOBOnePlane
+    lea     Plane4,a0
+    bsr     BlitBOBOnePlane     ; 5 blits, 5 WaitBlit calls, 5× register setup
+```
+
+```asm
+    ; ✓ GOOD: Use interleaved bitplanes — ONE blit for all planes
+    bsr     BlitBOBInterleaved  ; 1 blit, 1 WaitBlit, 1× register setup
+```
+
+### ✗ Antipattern: "System-Unfriendly Direct Access"
+
+```c
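+#include <hardware/custom.h>
+extern struct Custom custom;   /* custom chip base, provided by amiga.lib */
+
+/* ✗ BAD: Hit blitter registers directly from a Workbench app */
+custom.bltcon0 = 0x09F0;       /* bltcon0 is a 16-bit register */
+/* The OS may be using the blitter RIGHT NOW for window operations */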
+```
+
+```c
+#include <proto/graphics.h>   /* OwnBlitter, WaitBlit, DisownBlitter */
+
+/* ✓ GOOD: Use OwnBlitter/DisownBlitter for exclusive access */
+OwnBlitter();           /* Wait for and lock the blitter */
+WaitBlit();              /* Ensure previous blit is done */
+/* ... safe to program registers directly ... */
+DisownBlitter();         /* Release for OS use */
+```
+
+### ✗ Antipattern: "Hardcoded 320-Pixel Modulo"
+
+```asm
+    ; ✗ BAD: Assumes screen width is always 320 pixels (modulo = 40 - blit_width*2)
+    move.w  #36,$066(a5)        ; BLTDMOD = 36 (hardcoded for 320px)
+```
+
+Many Amiga programs run on PAL overscan (352 or 384 pixels) or productivity modes (640+ pixels). Always calculate modulo from the actual screen byte width:
+
+```asm
+    ; ✓ GOOD: Calculate modulo from actual bitmap width
+    move.w  ScreenBytesPerRow,d0
+    sub.w   BlitWidthBytes,d0
+    move.w  d0,$066(a5)         ; BLTDMOD = dynamic
+```
+
+### ✗ Antipattern: "Ignoring the DMA Budget"
+
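+The Blitter shares the DMA bus with display, audio, and disk. A PAL scanline has roughly 227 DMA slots, and high-bandwidth display modes leave far fewer of them free (rough per-line estimates; the Copper and CPU also compete for the remainder):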
+
+| Display Mode | DMA Slots Used by Display | Remaining for Blitter | Effect |
+|---|---|---|---|
+| Lores 320×256 × 5 planes | ~100 per line | ~126 per line | Full blitter speed |
+| Hires 640×256 × 4 planes | ~160 per line | ~66 per line | Blitter runs at ~50% speed |
+| Super Hires 1280 × 4 planes | ~200+ per line | ~26 per line | Blitter barely runs |
+| HAM8 (AGA) | ~200 per line | ~26 per line | Blitter barely runs |
+
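+The figures above are rough: they assume 16-bit bitplane fetch (OCS/ECS, or AGA with `FMODE` = 0) and ignore the fixed refresh, disk, audio, and sprite slots. AGA's 32- and 64-bit fetch modes move the same bitplane data in half or a quarter of the slots, so AGA modes such as HAM8 can leave the Blitter far more bandwidth than the table shows. On ECS, Super Hires supports only 2 bitplanes.
+
+**Rule of thumb**: if your game stutters in hires modes, suspect DMA contention before CPU speed.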
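+
+The slot counts in the table can be reproduced with a simple model: each 16 pixels of each bitplane costs one 16-bit fetch per line, subtracted from about 226 slots per line. This sketch adopts the table's own simplifications (16-bit fetch, fixed DMA ignored); the constant and function names are assumptions for illustration.
+
+```c
+#include <assert.h>
+
+/* Assumption: approximate DMA slots per scanline, as used by the table. */
+#define SLOTS_PER_LINE 226u
+
+/* Bitplane fetches per line at 16-bit (FMODE = 0) fetch width. */
+static unsigned bitplane_slots(unsigned width_px, unsigned planes)
+{
+    assert(planes <= 8u);
+    return (width_px / 16u) * planes;
+}
+
+/* Slots left for Blitter, Copper and CPU (refresh/disk/audio/sprites ignored). */
+static unsigned free_slots(unsigned width_px, unsigned planes)
+{
+    unsigned used = bitplane_slots(width_px, planes);
+    return used >= SLOTS_PER_LINE ? 0u : SLOTS_PER_LINE - used;
+}
+```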
+
+---
+
+## Practical Limitations
+
+| Limitation | Detail | Workaround |
+|---|---|---|
+| **Max blit size (OCS)** | 1024 lines × 64 words (1024×1024 pixels) via `BLTSIZE` | Split into multiple blits |
+| **Max blit size (ECS/AGA)** | 32768 lines × 2048 words via `BLTSIZV`/`BLTSIZH`, added with the ECS Agnus | Rarely a practical issue |
+| **Word alignment** | Blitter operates on 16-bit word boundaries only | Use barrel shift + masks for sub-word positioning; costs 1 extra word of width |
+| **No scaling** | Cannot scale or rotate — purely rectangular block ops | Use CPU for affine transforms, then blit the result |
+| **No clipping** | Blitter will happily write outside the screen bitmap | Implement clipping in software before setting up the blit |
+| **Single operation at a time** | Only one blit can run at a time — no queue | Pipeline setup: compute next blit's parameters on CPU while current blit runs |
+| **Chip RAM only** | All 4 channels must point to Chip RAM | Use `MEMF_CHIP` for all blitter-visible allocations; see [memory_types.md](../01_hardware/common/memory_types.md) |
+| **Fill carry direction** | Fill mode only works right-to-left (descending) | Always use DESC=1 with fill; set pointers to the end of the data |
+| **No transparency levels** | Boolean operations only — 1-bit masking, no alpha | Dithering or multiple passes for graduated transparency |
+| **Line mode limitations** | Lines drawn with SING=1 for fill prep get only one dot per row, so shallow (near-horizontal) lines show visible gaps | Use non-SING mode for visible lines, SING only for fill boundaries |
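+
+The OCS size limit comes from how `BLTSIZE` packs its two fields: height in bits 15 to 6 and width in words in bits 5 to 0, where a field value of 0 means the maximum (1024 rows or 64 words). The sketch below encodes that register value and counts how many blits a taller area needs; both function names are hypothetical.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Hypothetical helper: encode an OCS BLTSIZE value.
+   1024 rows and 64 words wrap to a field value of 0. */
+static uint16_t ocs_bltsize(unsigned rows, unsigned words)
+{
+    assert(rows >= 1u && rows <= 1024u);
+    assert(words >= 1u && words <= 64u);
+    return (uint16_t)(((rows & 1023u) << 6) | (words & 63u));
+}
+
+/* Hypothetical helper: number of OCS blits needed to cover `rows` lines. */
+static unsigned ocs_blit_count(unsigned rows)
+{
+    return (rows + 1023u) / 1024u;
+}
+```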
+
+---
+
+## Performance Analysis
+
+### DMA Cycle Costs
+
+The Blitter's DMA cost depends on which channels are active. The HRM's blitter timing table gives the following memory cycles per word for common channel combinations; the times below assume about 280 ns per cycle and no competing DMA:
+
+| Channels Active | Cycles/Word | Example Operation | Time for 320×256 (1 plane, 5,120 words) |
+|---|---|---|---|
+| D only | 2 cycles (every other cycle stays free for the CPU) | Screen clear | ~2.9 ms |
+| A + D | 2 cycles | Simple copy (A→D) | ~2.9 ms |
+| A + B + D | 3 cycles | Masked copy | ~4.3 ms |
+| A + B + C + D | 4 cycles | Cookie-cut blit | ~5.7 ms |
+
+> At the 3.58 MHz DMA clock, 1 cycle ≈ 280 ns. A full 320×256×5-plane screen clear (25,600 words, D only) therefore needs roughly 14 ms of uncontended bus time, most of a 20 ms PAL frame.
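+
+The arithmetic behind such estimates is easy to reproduce. This sketch converts a blit's size and cycles per word into milliseconds, assuming about 280 ns per DMA cycle and an otherwise idle bus; real blits run slower whenever bitplane, Copper, or other DMA takes slots. The function name is illustrative.
+
+```c
+#include <assert.h>
+
+#define DMA_CYCLE_NS 280.0  /* assumption: ~1 color clock at 3.58 MHz */
+
+/* Hypothetical helper: idealized blit time in ms for one plane (no contention). */
+static double blit_ms(unsigned width_px, unsigned rows, unsigned cycles_per_word)
+{
+    assert(cycles_per_word >= 1u);
+    unsigned words = ((width_px + 15u) / 16u) * rows;   /* words in the block */
+    return (double)words * cycles_per_word * DMA_CYCLE_NS / 1.0e6;
+}
+```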
+
+### CPU vs. Blitter Crossover
+
+The Blitter is not always faster than the 68000:
+
+| Operation Size | Winner | Why |
+|---|---|---|
+| < ~40 words | CPU (68000) | Blitter setup overhead (a dozen or so register writes plus a WaitBlit) exceeds the DMA savings |
+| 40–200 words | Tie | Depends on whether CPU needs the bus |
+| > 200 words | Blitter | DMA runs independently; CPU can compute in parallel |
+| Any size (A1200 with Fast RAM) | **Measure** | The 68020 can work in 32-bit Fast RAM while the Blitter uses the Chip RAM bus, so splitting the work between both is often fastest |
+
+### Nasty Mode (`BLTPRI`)
+
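+Setting bit 10 of DMACON (`BLTPRI`, named `DMAF_BLITHOG` in `hardware/dmabits.h`) gives the Blitter absolute priority over the CPU on the Chip bus. A CPU that needs Chip RAM or a custom register stalls until the blit completes; a CPU running from Fast RAM or its cache keeps executing until it touches the Chip bus. This maximizes blitter throughput but:
+- Delays interrupt servicing: on an unexpanded machine the exception vectors and handlers live in Chip RAM, so the CPU cannot respond until the blit ends
+- Breaks timing-sensitive code (audio, serial)
+- Most professional software avoids it; demos use it freely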
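+
+`DMACON` is a set/clear register: bit 15 (`SETCLR`) selects whether the other 1 bits are set or cleared, so enabling and disabling nasty mode are two different word values. The constants below mirror `DMAF_SETCLR` and `DMAF_BLITHOG` from `hardware/dmabits.h`; they are redefined locally so the sketch compiles on any host, and the helper name is hypothetical.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Local copies of the hardware/dmabits.h values. */
+#define DMAF_SETCLR  0x8000u   /* bit 15: 1 = set the listed bits, 0 = clear them */
+#define DMAF_BLITHOG 0x0400u   /* bit 10: BLTPRI ("blitter nasty") */
+
+static_assert(DMAF_BLITHOG == (1u << 10), "BLTPRI is DMACON bit 10");
+static_assert(DMAF_SETCLR == (1u << 15), "SETCLR is DMACON bit 15");
+
+/* Hypothetical helper: word to write to DMACON ($DFF096). */
+static uint16_t dmacon_nasty(int enable)
+{
+    return (uint16_t)(enable ? (DMAF_SETCLR | DMAF_BLITHOG) : DMAF_BLITHOG);
+}
+```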
+
+---
+
+## When to Use / When NOT to Use
+
+### When to Use the Blitter
+
+- **Screen clearing** — a D-only blit needs no source fetches and leaves the CPU free for other work
+- **BOB/sprite compositing** — cookie-cut blit is the standard technique for all Amiga game objects
+- **Scrolling** — overlapping copy with correct ascending/descending mode
+- **Polygon filling** — exclusive/inclusive fill after boundary line drawing
+- **Large Chip RAM copies** — any block > ~40 words benefits from DMA parallelism
+- **Line drawing** — hardware Bresenham is faster than any software implementation on 68000
+
+### When NOT to Use
+
+- **Small copies (< 40 words)** — 68000 `MOVEM` or `MOVE.L` loop is faster due to blitter setup overhead
+- **Fast RAM operations** — the Blitter cannot access Fast RAM at all; use CPU
+- **Pixel-level operations** — the Blitter works on word-aligned rectangles; per-pixel logic requires CPU
+- **A1200/A4000 with Fast RAM** — a 68020/030/040 running from 32-bit Fast RAM can often outperform the Blitter. AGA gives the CPU a 32-bit path to Chip RAM while the Blitter is still 16 bits wide, and CPU work in Fast RAM does not compete with display DMA for the Chip bus
+
+### Applicability Ranges
+
+- **BOBs**: Practical limit ~15–20 per frame at 320×256×5 planes before exhausting DMA bandwidth
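+- **Fill mode**: Operates row by row on whatever memory the D channel covers; with separate (non-interleaved) bitplanes, filling a 5-plane display requires 5 passes
+- **Line mode**: Maximum line length is limited by the BLTSIZE height field (1024 pixels on OCS); longer lines must be drawn in segments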
+
+---
+
+## Historical Context — The 1985 Competitive Landscape
+
+The Blitter was architecturally unprecedented in 1985. No competing home computer shipped with a comparable 2D rasterization coprocessor:
+
+| Feature | Amiga (1985) | Atari ST (1985) | PC EGA (1984) | Mac 128K (1984) | C64 (1982) |
+|---|---|---|---|---|---|
+| **Hardware blitter** | Yes — 4-channel DMA with minterm logic | No (added in Mega ST/STE, 1987 — 1 source only) | No | No | No |
+| **Channels** | 3 source + 1 dest | 1 source + 1 dest (STE) | — | — | — |
+| **Boolean ops** | 256 minterms (arbitrary 3-input logic) | 16 logic ops (STE) | — | — | — |
+| **Line drawing** | Hardware Bresenham | No | No | No | No |
+| **Area fill** | Hardware inclusive/exclusive fill | No | No | No | No |
+| **Shift/mask** | Per-channel barrel shift + first/last word masks | Shift + endmask (STE) | — | — | — |
+| **CPU relief** | Full DMA — CPU free during blit | Partial — CPU still involved (STE) | CPU does everything | CPU does everything | CPU does everything |
+
+### Pros (in 1985 context)
+
+- **Parallelism**: The 68000 could execute game logic while the Blitter handled all rendering — this was the Amiga's key advantage over every competitor
+- **Generality**: 256 minterm combinations meant any Boolean compositing operation was a single register write, not a software loop
+- **Integration**: Shared DMA bus with Copper and sprites meant the entire display pipeline was hardware-driven
+- **Line + fill in hardware**: Made real-time filled polygon rendering (used in games like Carrier Command, Starglider 2) far cheaper than on competing 1985 platforms, where the CPU had to fill every span in software
+
+### Cons (in 1985 context)
+
+- **Chip RAM only**: All blitter-visible data had to live in the first 512 KB (later 1–2 MB), competing with screen memory, audio, and disk buffers
+- **Word alignment**: Sub-pixel positioning required shift + extra word width + masking — complex setup for simple operations
+- **No scaling/rotation**: Purely rectangular block operations; affine transforms required CPU
+- **DMA contention**: Heavy blitter use starved the CPU of bus cycles even without nasty mode
+
+---
+
+## Modern Analogies
+
+| Amiga Blitter Concept | Modern Equivalent | Notes |
+|---|---|---|
+| 4-channel minterm blit | GPU logic ops (Vulkan `VkLogicOp`); Windows GDI `BitBlt` ternary raster operations (ROP3) | The minterm is a fixed-function 3-input Boolean function. Vulkan logic ops combine only 2 inputs; GDI's 256 ROP3 codes encode the same kind of 3-input truth table (source, pattern, destination) |
+| Cookie-cut (A·B + ¬A·C) | Alpha compositing / Porter-Duff `SrcOver` | The Amiga used 1-bit masks; modern systems use 8-bit alpha channels, but the compositing algebra is the same |
+| DMA-driven blit | `vkCmdCopyImage` / `MTLBlitCommandEncoder` | Modern GPUs have dedicated DMA/copy engines that run asynchronously, exactly like the Blitter ran independently of the 68000 |
+| OwnBlitter/DisownBlitter | Vulkan queue submission / Metal command buffer | Exclusive access to a shared hardware resource, then release — the synchronization pattern is identical |
+| BLTPRI (nasty mode) | GPU preemption priority | Giving the transfer engine absolute bus priority at the cost of starving other consumers |
+| Fill mode | GPU rasterizer fill | Hardware polygon fill is now done by the rasterizer stage; the Blitter's XOR-toggle fill was a clever 1985 approximation |
+| BLTSIZE triggers blit | Command buffer submission | Writing the final register starts execution — analogous to `vkQueueSubmit` or `[commandBuffer commit]` |
+| Barrel shift + word masks | Texture sampling with sub-texel offset | A loose analogy: the shifter places data at pixel precision inside 16-pixel words, whereas texture sampling interpolates at true sub-pixel offsets |
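+
+The analogies above rest on the minterm being a truth table. In the HRM's numbering, minterm bit n is the output for the input combination where A, B, and C form the 3-bit number n (A is the high bit), so any Boolean function of three inputs maps to one byte. This host-side sketch (hypothetical helper names) derives that byte from a C function and applies it to 16-bit words bit by bit, the way the Blitter's logic unit combines its channels.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+typedef int (*LogicFn)(int a, int b, int c);
+
+/* Hypothetical helper: bit n of the minterm is f(A,B,C) where n = A*4 + B*2 + C. */
+static uint8_t minterm_from(LogicFn f)
+{
+    uint8_t lf = 0;
+    for (unsigned n = 0; n < 8; n++)
+        if (f((int)((n >> 2) & 1u), (int)((n >> 1) & 1u), (int)(n & 1u)))
+            lf |= (uint8_t)(1u << n);
+    return lf;
+}
+
+/* Hypothetical helper: apply a minterm to one word from each source channel. */
+static uint16_t apply_minterm(uint8_t lf, uint16_t a, uint16_t b, uint16_t c)
+{
+    uint16_t d = 0;
+    for (unsigned bit = 0; bit < 16; bit++) {
+        unsigned n = (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1) | ((c >> bit) & 1u);
+        d |= (uint16_t)(((lf >> n) & 1u) << bit);
+    }
+    return d;
+}
+
+/* Cookie-cut: where the mask A is set take the BOB (B), else the background (C). */
+static int cookie_cut(int a, int b, int c) { return (a && b) || (!a && c); }
+
+/* Plain copy of channel A. */
+static int copy_a(int a, int b, int c) { (void)b; (void)c; return a; }
+```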
+
+---
+
+## Pitfalls & Common Mistakes
+
+### Pitfall 1: "The Silent Corruption" — Fast RAM Pointers
+
+```asm
+    ; ✗ BAD: Buffer allocated in Fast RAM
+    move.l  #FastRAMBuffer,$054(a5)  ; BLTDPT points to Fast RAM
+    move.w  #(256<<6)|20,$058(a5)   ; Blit runs... but writes garbage
+```
+
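+The Blitter's DMA engine is wired to the Chip RAM bus only. Agnus drives only the address lines that reach Chip RAM, so the upper bits of a Fast RAM address are ignored and the blit reads or writes whatever Chip RAM sits at the truncated address. **There is no error signal**: the blit completes "successfully" with corrupt output and may overwrite unrelated data in Chip RAM.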
+
+```asm
+    ; ✓ GOOD: Buffer in Chip RAM
+    move.l  #ChipRAMBuffer,$054(a5) ; Allocated with MEMF_CHIP
+```
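+
+A host-side model makes the failure mode concrete. Assuming, as a simplification, that a Blitter pointer simply wraps at the top of a power-of-two Chip RAM range, a Fast RAM address lands on an unrelated Chip RAM location. The helper names and the power-of-two size are assumptions of this sketch; on a real system, allocate with `MEMF_CHIP` and, if in doubt, check a pointer with exec's `TypeOfMem()`.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Model only (hypothetical): nonzero if addr lies inside Chip RAM of chip_bytes. */
+static int is_chip(uint32_t addr, uint32_t chip_bytes)
+{
+    return addr < chip_bytes;
+}
+
+/* Model only (hypothetical): the word address the Blitter would actually touch. */
+static uint32_t blitter_alias(uint32_t addr, uint32_t chip_bytes)
+{
+    assert(chip_bytes != 0u && (chip_bytes & (chip_bytes - 1u)) == 0u);
+    return addr & (chip_bytes - 1u) & ~1u;   /* drop high bits and bit 0 */
+}
+```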
+
+### Pitfall 2: "The Race Condition" — Missing WaitBlit
+
+```asm
+    ; ✗ BAD: Start a new blit without waiting for previous one
+    move.l  #$09F00000,$040(a5)  ; Overwrite BLTCON0 while previous blit runs!
+    move.l  #NewSource,$050(a5)  ; Corrupt the in-progress blit
+    move.w  #(64<<6)|4,$058(a5)  ; Start another blit — undefined behavior
+```
+
+Modifying blitter registers while a blit is in progress produces unpredictable results — partial data, corrupted pointers, or system crashes.
+
+```asm
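+    ; ✓ GOOD: Always wait
+    tst.b   $002(a5)             ; Dummy read of DMACONR for early (A1000) Agnus compatibility
+.bwait:
+    btst    #6,$002(a5)          ; BBUSY: bit 14 of DMACONR = bit 6 of its high byte
+    bne.s   .bwait
+    ; Now safe to set up the next blit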
+```
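+
+A bit number of 14 and a bit number of 6 both test BBUSY here because a 68000 `BTST` with a memory operand reads one byte and takes the bit number modulo 8, and the byte at `$DFF002` is the high byte of the big-endian `DMACONR` word. This small host-side model (hypothetical function names) checks that equivalence.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Model of 68000 BTST on memory: byte-sized access, bit number modulo 8. */
+static int btst_mem(const uint8_t *byte_addr, unsigned bit)
+{
+    return (*byte_addr >> (bit & 7u)) & 1;
+}
+
+/* Store a 16-bit register value big-endian and BTST its first byte ($DFF002). */
+static int btst_word_in_memory(uint16_t word, unsigned bit)
+{
+    uint8_t mem[2];
+    mem[0] = (uint8_t)(word >> 8);   /* high byte at the lower address */
+    mem[1] = (uint8_t)word;
+    return btst_mem(&mem[0], bit);
+}
+
+/* BBUSY is bit 14 of the 16-bit DMACONR value. */
+static int bbusy(uint16_t dmaconr)
+{
+    return (dmaconr >> 14) & 1;
+}
+```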
+
+### Pitfall 3: "The Wrong Direction" — Overlapping Copy Corruption
+
+```asm
+    ; ✗ BAD: Moving a block to a HIGHER address (dest > source, e.g. moving
+    ; image content right or down) in ascending mode: overlapping source words
+    ; are overwritten before they are read, producing smeared garbage
+```
+
+```asm
+    ; ✓ GOOD: Use descending mode when dest > source
+    move.w  #$0002,$042(a5)      ; BLTCON1: DESC=1 (shifts now run to the left)
+    ; Set pointers to LAST word of block, not first
+```
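+
+In descending mode every channel pointer must start at the last word of its block: the final word of the final row. This sketch computes that address from the block's first word, width, height, and modulo. The function name is hypothetical, and it assumes the modulo keeps its usual byte value (the Blitter subtracts it instead of adding it in descending mode).
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Hypothetical helper: address of the last word of a width_words x rows block
+   whose first word is at `first`; modulo is the per-row skip in bytes. */
+static uint32_t last_word_addr(uint32_t first, unsigned width_words,
+                               unsigned rows, int modulo)
+{
+    assert(width_words >= 1u && rows >= 1u);
+    int32_t stride = (int32_t)(width_words * 2u) + modulo;  /* bytes per row */
+    return first + (uint32_t)((int32_t)(rows - 1u) * stride)
+                 + (uint32_t)((width_words - 1u) * 2u);
+}
+```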
+
+### Pitfall 4: "The Off-By-One Word" — Forgetting Shift Width Expansion
+
+```asm
+    ; ✗ BAD: 32-pixel wide blit at non-aligned X — width still set to 2 words
+    ; Shifted data spills into adjacent word, corrupting neighboring pixels
+    move.w  #(16<<6)|2,$058(a5)  ; Only 2 words wide — but shift needs 3!
+```
+
+```asm
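+    ; ✓ GOOD: Add 1 word when shift > 0
+    move.w  #$0000,$046(a5)      ; BLTALWM = 0: blank the extra source word; the shifter supplies its pixels
+    move.w  #-2,$064(a5)         ; BLTAMOD = -2: the 3-word fetch overruns each 2-word source row by 1 word
+    move.w  #(16<<6)|3,$058(a5)  ; 3 words: 2 for data + 1 for shift overflow; BLTSIZE is written last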
+```
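+
+The width and shift values for a cookie-cut BOB at an arbitrary X position can be derived mechanically. This sketch (hypothetical names) assumes the BOB's source rows are stored in whole words and uses the standard cookie-cut setup (`USEA|USEB|USEC|USED`, minterm `$CA`); the A shift goes in `BLTCON0` bits 15 to 12 and the B shift in `BLTCON1` bits 15 to 12.
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Hypothetical parameter block for a shifted cookie-cut blit. */
+typedef struct {
+    uint16_t words;    /* BLTSIZE width in words */
+    uint16_t bltcon0;  /* ASH | USEA|USEB|USEC|USED | minterm $CA */
+    uint16_t bltcon1;  /* BSH: same shift for the image channel */
+} ShiftedBlit;
+
+static ShiftedBlit shifted_cookie_blit(unsigned x, unsigned width_px)
+{
+    ShiftedBlit s;
+    unsigned shift = x & 15u;            /* pixel offset inside the first word */
+    assert(width_px >= 1u);
+    s.words   = (uint16_t)((width_px + 15u) / 16u + (shift ? 1u : 0u));
+    s.bltcon0 = (uint16_t)((shift << 12) | 0x0F00u | 0xCAu);
+    s.bltcon1 = (uint16_t)(shift << 12);
+    return s;
+}
+```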
+
+### Pitfall 5: "The Stale Pointer" — Reusing Registers After a Blit
+
+After a blit completes, the pointer registers of every active channel have advanced to the **end** of the data. A second blit with the same pointers starts where the first one left off, not at the original position.
+
+```asm
+    ; ✓ GOOD: Always reload all pointers before each blit
+    move.l  #SourceAddr,$050(a5) ; Reload BLTAPT
+    move.l  #DestAddr,$054(a5)   ; Reload BLTDPT
+```
+
+---
+
+## Impact on FPGA/Emulation
+
+The Blitter is one of the most complex subsystems to reproduce accurately in an FPGA core:
+
+- **DMA slot timing**: The Blitter shares DMA slots with bitplane, sprite, Copper, disk, and audio DMA. Incorrect slot allocation produces visible glitches in demos that count cycles
+- **Barrel shifter pipeline**: The A and B channel shifts operate on a word pipeline — off-by-one in the shift register produces 1-pixel horizontal offset errors visible in scrolling
+- **Fill mode carry propagation**: The fill carry bit (`FCI`) must propagate correctly from right to left within each word and across word boundaries; errors produce "zebra stripe" artifacts
+- **Line mode octant handling**: The Bresenham algorithm implementation requires precise handling of 8 octants with correct sign and direction — many emulators get diagonal lines wrong in edge cases
+- **BLTSIZE write-trigger**: The blit must start with the same cycle latency after the BLTSIZE write as on a real Agnus, neither earlier nor later; demos that chain blits back-to-back depend on this timing
+- **Nasty mode interaction**: `BLTPRI` must correctly freeze the CPU *and* still allow DMA from other sources (Copper, bitplanes) — freezing everything breaks display output
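+
+A reference model helps when validating fill logic in an FPGA core or emulator. The sketch below fills one blitter row, walking words and bits from right to left as descending mode does, with the carry crossing word boundaries. It is a simplified software model with hypothetical names, not Agnus's actual circuit: the inclusive branch keeps both edge bits, and the exclusive branch follows the usual description in which one edge of each span is dropped. Check the exact edge behavior against the HRM's fill diagrams before relying on it.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* Model only: fill one row in place. row[0] is the leftmost word.
+   Returns the carry out of the leftmost pixel. */
+static int fill_row(uint16_t *row, size_t words, int fci, int exclusive)
+{
+    int carry = fci & 1;                        /* FCI: initial fill state */
+    for (size_t w = words; w-- > 0; ) {         /* rightmost word first */
+        uint16_t in = row[w], out = 0;
+        for (unsigned bit = 0; bit < 16; bit++) {   /* bit 0 = rightmost pixel */
+            int edge = (in >> bit) & 1;
+            int pixel;
+            if (exclusive) {
+                carry ^= edge;                  /* toggle first: edge takes the new state */
+                pixel = carry;
+            } else {
+                pixel = carry | edge;           /* inclusive: edge pixels stay set */
+                carry ^= edge;
+            }
+            out |= (uint16_t)(pixel << bit);
+        }
+        row[w] = out;
+    }
+    return carry;
+}
+
+/* Convenience wrapper: a two-word row packed as (left << 16) | right. */
+static uint32_t fill_pair(uint32_t packed, int fci, int exclusive)
+{
+    uint16_t row[2] = { (uint16_t)(packed >> 16), (uint16_t)packed };
+    fill_row(row, 2, fci, exclusive);
+    return ((uint32_t)row[0] << 16) | row[1];
+}
+```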
+
+---
+
+## Real-World Software Usage
+
+| Software | Blitter Usage | Notes |
+|---|---|---|
+| **Deluxe Paint** | Brush compositing, flood fill, line tools | Canonical use of BltBitMap + BltMaskBitMapRastPort through the OS |
+| **Shadow of the Beast** | Multi-layer parallax scrolling | Custom blitter routines for layer compositing, bypasses OS |
+| **Carrier Command** | Filled polygon 3D rendering | Blitter line draw + fill mode for real-time vector graphics |
+| **Lemmings** | Terrain destruction, character compositing | Cookie-cut blits for each lemming; XOR blits for terrain modification |
+| **Workbench** | Window dragging, icon rendering, menu drawing | All through graphics.library — system-friendly blitter usage |
+| **Demo scene** | Virtually everything | Chunky-to-planar conversion, texture mapping, copper+blitter co-programming |
+
+---
+
+## Best Practices
+
+1. **Always call `WaitBlit()` or poll BBUSY before touching any blitter register**
+2. **Write BLTSIZE last** — it triggers the blit; all other registers must be configured first
+3. **Use `OwnBlitter()`/`DisownBlitter()`** for system-friendly code — never assume you have exclusive access
+4. **Disable unused channels** — fewer channels = fewer DMA cycles = faster blit
+5. **Set BLTAFWM and BLTALWM to `$FFFF`** for word-aligned blits — forgetting this produces partial-word masking bugs
+6. **Account for shift width expansion** — non-aligned blits are 1 word wider than you expect
+7. **Choose ascending/descending correctly** for overlapping copies — test both scroll directions
+8. **Interleave CPU work with blitter operations** — the whole point of DMA is parallelism; don't busy-wait when you could be computing
+9. **Profile before choosing Blitter vs CPU** — on accelerated Amigas, the 68020+ with Fast RAM often wins

 ---

 ## References

- HRM: *Blitter* chapter — complete register descriptions
- [blitter.md](../01_hardware/ocs_a500/blitter.md) — hardware reference
- [blitter.md](blitter.md) — graphics.library BltBitMap API
+- HRM: *Amiga Hardware Reference Manual* — Blitter chapter (complete register descriptions and timing)
+- NDK 3.9: `hardware/blit.h`, `hardware/custom.h`, `graphics/gfx.h`
+- ADCD 2.1: Hardware Manual — [Blitter chapter](http://amigadev.elowar.com/read/ADCD_2.1/Hardware_Manual_guide/node006D.html)
+- See also: [blitter.md](../01_hardware/ocs_a500/blitter.md) — hardware register reference
+- See also: [animation.md](animation.md) — GEL system (BOBs use the Blitter internally)
+- See also: [copper.md](copper.md) — Copper coprocessor (often co-programmed with the Blitter)
+- See also: [rastport.md](rastport.md) — RastPort drawing context (uses Blitter for all draw operations)
+- See also: [display_modes.md](display_modes.md) — DMA slot budget (Blitter competes for bus bandwidth)
+
--- a/08_graphics/sprites.md
+++ b/08_graphics/sprites.md
@ -303,3 +303,4 @@ ClearPointer(window);
 - ADCD 2.1: `GetSprite`, `MoveSprite`, `ChangeSprite`, `FreeSprite`
 - See also: [copper_programming.md](copper_programming.md) — Copper-driven sprite multiplexing
 - See also: [rastport.md](rastport.md) — BOBs (software sprites via Blitter)
+- See also: [memory_types.md](../01_hardware/common/memory_types.md) — sprite data must reside in Chip RAM (sprite DMA)
--- a/10_devices/audio.md
+++ b/10_devices/audio.md
@ -276,3 +276,4 @@ Games and demos often bypass `audio.device` entirely:
 - HRM: *Amiga Hardware Reference Manual* — Audio DMA chapter
 - ADCD 2.1: `audio.device` autodocs
 - See also: [interrupts.md](../06_exec_os/interrupts.md) — interrupt server chain
+- See also: [memory_types.md](../01_hardware/common/memory_types.md) — audio sample buffers must reside in Chip RAM (Paula DMA)
--- a/AGENTS.md
+++ b/AGENTS.md
@ -0,0 +1,374 @@
+# AGENTS.md — Technical Article Quality Standards
+
+> **Audience**: AI coding assistants and human contributors.
+> Read this before writing or expanding any `.md` file in this repository.
+
+---
+
+## Language & Spelling
+
+- **American English only** — `color`, `behavior`, `initialize`, never `colour`, `behaviour`, `initialise`
+- Use clear, direct technical prose — no filler, no hedging, no marketing language
+- Prefer active voice: "The Blitter copies data" not "Data is copied by the Blitter"
+- **Hexadecimal**: Use Motorola syntax (`$DFF180`) for hardware registers and memory addresses, not C-style (`0xDFF180`), unless explicitly within a C code block.
+
+---
+
+## Pre-Flight: Knowledge Base Scan (MANDATORY)
+
+Before writing or expanding any article, you **must**:
+
+1. **Read the root [README.md](README.md)** — it contains the full Documentation Map with every article in the repository. Understand what already exists to avoid duplicating content and to identify cross-linking opportunities.
+2. **Scan the section's `README.md`** — know what sibling articles exist in the same folder. Your article should complement, not repeat, adjacent content.
+3. **Search for related content** — use grep or file listing to find existing mentions of your topic across the repository. If another article already covers a subtopic in depth, link to it rather than rewriting it.
+4. **Check for established patterns** — look at 2–3 exemplary articles in the repository (see "What Makes an Exemplary Article" below) to match style, depth, and structure.
+
+> [!IMPORTANT]
+> Every article must exist within the knowledge graph. Orphaned articles are unacceptable. Update the root README's Documentation Map when adding new articles.
+
+---
+
+## Article Structure
+
+Every article must follow this skeleton. Omit sections that genuinely don't apply, but the **Overview** and **Navigation** are mandatory.
+
+### 1. Navigation Breadcrumb (line 1)
+
+```markdown
+[← Home](../README.md) · [Section Name](README.md)
+```
+
+### 2. Title
+
+Use a single `#` heading. Include the subject and a subtitle with key subtopics:
+
+```markdown
+# Memory Management — AllocMem, FreeMem, MemHeader
+# Copper Programming — Deep Dive
+# Animation — GEL System: BOBs, VSprites, AnimObs
+```
+
+### 3. Overview
+
+The first section after the title. Must answer in **one dense paragraph**:
+- **What** is this subsystem/API/feature?
+- **Where** does it live in the system? (which library, which chip, which layer)
+- **Why** does it exist? What problem does it solve?
+- **Key constraints** the reader must internalize immediately
+
+Good example:
+> The Blitter (Block Image Transfer) is a DMA engine that performs raster operations on rectangular blocks of memory. It operates on up to 4 channels (A, B, C → D) using programmable minterm logic and can work independently of the CPU.
+
+Bad example:
+> This document describes the Blitter. The Blitter is used for graphics.
+
+### 4. Architecture / How It Works
+
+- Use **Mermaid diagrams** for system relationships, data flows, and state machines
+- Show where the component sits in the overall system (chip, library, OS layer)
+- Explain the **hardware backing** — which DMA engines, which custom chip registers, what bus interactions
+
+### 5. Data Structures & Register Tables
+
+- Show the **actual C struct** from NDK headers with inline comments
+- Follow with a **field description table** for non-obvious fields
+- Include the NDK source path: `/* exec/memory.h — NDK39 */`
+- Annotate critical constraints inline: `/* Chip RAM only! */`
+- **Hardware Registers**: Documentation must use the following table format, including the R/W (Read/Write/Strobe) capability:
+  ```markdown
+  | Address   | Name    | R/W | Description |
+  |-----------|---------|-----|-------------|
+  | `$DFF040` | BLTCON0 | W   | Blitter control register 0 (minterms, channels) |
+  ```
+
+### 6. API Reference
+
+- Show function prototypes with LVO offsets: `/* LVO -198 */`
+- Include practical usage snippets immediately after each prototype
+- Group related functions together
+
+### 7. Decision Guides & Comparison Tables
+
+When multiple approaches exist, provide a **decision matrix**:
+
+```markdown
+| Criterion | Option A | Option B |
+|---|---|---|
+| When to use | ... | ... |
+| Limitation | ... | ... |
+```
+
+### 8. Historical Context & Modern Analogies (MANDATORY for architectural topics)
+
+This is **not optional** for any article covering a fundamental or architectural concept.
+
+**Historical perspective:**
+- Include a **competitive landscape** table comparing to contemporary platforms (Atari ST, C64, NES, Mac, PC, arcade hardware)
+- Explain what made the Amiga's approach innovative (or not) relative to its era
+- Provide **pros/cons analysis** in the context of 1985–1994 hardware constraints
+
+**Modern analogies:**
+- Add a **comparison table** mapping Amiga concepts to modern equivalents (macOS Core Animation, Vulkan/Metal, Unity/Unreal, etc.)
+- Explain **why** the analogy holds and where it breaks down
+- Help modern developers build intuition by connecting unfamiliar retro concepts to things they already know
+
+### 9. Practical Examples
+
+- Every article must include at least one **complete, working code example**
+- Examples must compile — no pseudocode unless explicitly marked
+- Show the full lifecycle: init → use → cleanup
+- Annotate non-obvious lines with inline comments
+
+### 10. When to Use / When NOT to Use
+
+Every API or subsystem article must include explicit guidance on:
+- **When to use** — the ideal scenarios, applicability ranges, sweet spots
+- **When NOT to use** — situations where a different approach is better, with explanation of why
+- **Applicability ranges** — quantify limits (e.g., "works well up to ~20 BOBs; beyond that, custom blitter routines outperform")
+
+### 11. Best Practices & Antipatterns
+
+- **Best practices**: Numbered list of actionable recommendations. Each item should be one line.
+- **Antipatterns**: Common bad habits that compile but produce subtle bugs, poor performance, or system instability. Show the antipattern and the correct alternative side by side.
+
+### 12. Pitfalls & Common Mistakes
+
+- Use a dedicated **Pitfalls** section near the end
+- Each pitfall gets a numbered subsection with:
+  - A **bad code example** showing the bug
+  - An explanation of **why** it fails
+  - The **correct** version
+
+### 13. Use Cases
+
+Provide real-world use cases that demonstrate practical application:
+- What kind of software uses this feature?
+- Which well-known Amiga titles or applications relied on it?
+- What are the common integration patterns?
+
+### 14. FAQ (when topic resonates)
+
+For topics that commonly generate questions (memory management, blitter programming, display modes), include a short FAQ section addressing the most frequent developer questions.
+
+### 15. References
+
+- NDK header paths
+- ADCD 2.1 section references
+- Cross-links to related articles in this repository
+- External links where authoritative (Apple docs, Commodore manuals)
+
+---
+
+## Formatting Standards
+
+### Memory Maps
+
+- When illustrating memory layouts, stack frames, or hunk structures, use **monospace ASCII box-drawing** (`┌─┐`) rather than Mermaid flowcharts. Mermaid is for logic/state; ASCII boxes are for byte-precise memory layouts.
+
+### Reverse Engineering & Patching
+
+When documenting reverse engineering efforts (e.g., bypassing limitations, understanding undocumented behavior):
+- **Disassembly**: Use standard 68k/x86 disassembly blocks with file offsets and original hex bytes included.
+- **Unified Patch Tables**: Use the exact table structure below to show offset, byte delta, instruction, and rationale:
+  ```markdown
+  | File Offset | Original  | Patched   | Assembly         | Rationale |
+  |-------------|-----------|-----------|------------------|-----------|
+  | `$0001A4`   | `66 0A`   | `4E 71`   | `NOP`            | Defeats the check |
+  ```
+- **Call Graphs**: Use Mermaid diagrams for call graphs and tables for obfuscation routines/gate mechanisms.
+
+### Tables
+
+- Use tables for structured comparisons, flag lists, register maps, field descriptions
+- Always include a header row and separator
+- Keep cells concise — one concept per cell
+
+### Code Blocks
+
+- Use fenced code blocks with language tags: ` ```c `, ` ```asm `, ` ```markdown `
+- Include NDK source attribution in struct definitions
+- Use `/* comment */` for inline annotations in C code
+
+### Mermaid Diagrams
+
+- Use for architecture diagrams, data flow, state machines, and system relationships
+- Apply consistent styling: `fill:#e8f4fd,stroke:#2196f3` for DMA/hardware, `fill:#fff9c4,stroke:#f9a825` for coprocessors
+- Keep diagrams readable — no more than ~15 nodes per diagram
+
+### Alerts
+
+Use GitHub-style alerts sparingly for critical information:
+
+```markdown
+> [!NOTE]
+> Background context that aids understanding
+
+> [!WARNING]
+> Common mistake that causes data corruption or system crash
+```
+
+- **The Chip RAM Alert**: Any API, struct, or hardware register that requires DMA-accessible memory must be highlighted with a `> [!WARNING]` block explicitly stating **Requires Chip RAM**.
+
+### Horizontal Rules
+
+Use `---` to separate major sections. Don't use between subsections.
+
+---
+
+## Depth Expectations
+
+### Shallow (unacceptable)
+
+A struct dump with no context, no explanation of relationships, no examples, no pitfalls. This is a header file, not documentation.
+
+### Adequate
+
+Overview + struct descriptions + one example + references. Functional but not a learning resource.
+
+### Deep (target quality)
+
+Everything above, plus:
+- Architectural diagrams showing hardware relationships
+- Historical context and competitive landscape
+- Modern analogies for accessibility
+- Decision guides for choosing between approaches
+- Multiple examples covering common and edge cases
+- Comprehensive pitfalls with bad/good code pairs
+- Performance analysis with quantified costs
+- Cross-references to related articles
+
+**Every article in this repository should target "Deep" quality.**
+
+---
+
+## What Makes an Exemplary Article — Key Differentiators
+
+Analysis of the best articles in this repository ([exe_crunchers.md](03_loader_and_exec_format/exe_crunchers.md), [idcmp.md](09_intuition/idcmp.md), [animation.md](08_graphics/animation.md)) reveals consistent patterns that separate deep technical writing from shallow reference stubs:
+
+### 1. The "Why It Exists" Opening
+
+Every great article opens by answering *why* someone should care — not just what the thing is. Compare:
+
+- ✗ *"IDCMP is a messaging system in Intuition."*
+- ✓ *"Rather than polling for input, an Amiga application **sleeps** — consuming zero CPU — until Intuition sends a message. This is fundamental to why AmigaOS could multitask smoothly on a 7 MHz 68000 with 512 KB of RAM."*
+
+### 2. Multi-Phase Architecture Diagrams
+
+Great articles don't just describe a process — they break it into **numbered phases** with Mermaid diagrams at each stage. Example: `exe_crunchers.md` walks through 7 discrete steps of the decrunch stub, each with its own code block and explanation. This transforms opaque behavior into a debuggable mental model.
+
+### 3. Named Antipatterns with Bad/Good Pairs
+
+The best articles give antipatterns memorable names:
+- "The Kitchen Sink" (requesting all IDCMP flags)
+- "The Phantom Gadget" (dereferencing IAddress after ReplyMsg)
+- "The Signal Swallower" (checking only one signal source)
+
+Each antipattern shows the **broken code**, explains **why it breaks**, and provides the **corrected version**. This pattern is far more effective than generic warnings.
+
+### 4. Decision Flowcharts
+
+When multiple approaches exist, the best articles include a Mermaid decision flowchart that guides the reader to the correct choice. See IDCMP's "Use IDCMP or Exec MsgPort?" flowchart — it encodes the decision logic visually.
+
+### 5. Use-Case Cookbooks
+
+Beyond toy examples, exemplary articles include a **cookbook** of real-world patterns:
+- Double-click detection
+- Rubber-band selection
+- Multi-signal event loops (IDCMP + Timer + ARexx)
+- Menu multi-select chains
+
+These are copy-paste-ready patterns that solve actual developer problems.
+
+### 6. Quantified Performance Tables
+
+Great articles don't just say "this is slow" — they provide concrete numbers:
+- "Mouse movement at fast drag: ~500+ messages/sec — can starve other tasks"
+- "3 blitter ops per BOB per frame — expensive at scale"
+- "Decompression: ~2–5 seconds on a 7 MHz 68000"
+
+### 7. Cross-Platform Comparison Tables
+
+The best articles include a comparison table that maps Amiga concepts to their equivalents on other platforms (both contemporary and modern). This serves two purposes:
+- **Historical context** — shows what was unique about the Amiga
+- **Modern accessibility** — helps developers coming from Windows/macOS/Linux build intuition
+
+See IDCMP's comparison with Win32/X11/Cocoa/Qt and animation's comparison with Atari ST/C64/NES.
+
+### 8. Memory Safety Checklists
+
+For any API that involves allocation, messaging, or shared resources, exemplary articles include a **risk/cause/prevention** table that acts as a pre-flight checklist.
+
+### 9. The "Impact on FPGA/Emulation" Section
+
+Since this repository targets MiSTer FPGA developers, the best articles note implementation concerns for hardware reproduction: timing-sensitive code, self-modifying code, custom chip register access patterns, cache coherency requirements.
+
+### What Mediocre Articles Are Missing
+
+Compare the above with shallow articles that typically lack:
+- No "why" — just "what"
+- No architecture diagrams
+- No decision guides — reader doesn't know when to use the feature
+- No pitfalls — reader will hit every bug the hard way
+- No antipatterns — reader will write bad code that compiles
+- No performance data — reader has no budget intuition
+- No cross-platform context — reader can't connect to existing knowledge
+- No use-case cookbook — reader can describe the API but can't solve problems with it
+
+---
+
+## Research Methodology
+
+Before writing or expanding an article:
+
+1. **Web research** — Search for real-world usage, developer forum discussions, and existing technical analyses. Ground the article in how practitioners actually used the technology, not just what the API reference says.
+2. **Cross-reference NDK headers** — Verify struct layouts, flag values, and LVO offsets against the actual NDK 3.9 headers.
+3. **Study real software** — Reference well-known Amiga titles, demos, and applications that use the feature. Cite specific examples when possible.
+4. **Verify with hardware documentation** — Cross-check against the Amiga Hardware Reference Manual and custom chip datasheets.
+5. **Check modern parallels** — Research whether the concept has modern equivalents; this improves accessibility and reveals design insights.
+6. **Scan this repository first** — Follow the Pre-Flight Knowledge Base Scan procedure above before creating any new content.
+
+---
+
+## Content Principles
+
+1. **Hardware grounding** — Always explain which chip, which DMA channel, which register. The Amiga is a hardware platform; software docs without hardware context are incomplete.
+
+2. **No placeholders** — Every code example must be complete enough to compile. Every struct must show real fields from NDK headers. Every register must show the real address.
+
+3. **API Versioning** — Always specify the minimum OS version required for an API (e.g., 'Requires OS 2.0+'). If a struct changed between versions, document the NDK 3.9 version as the modern baseline, and use the `/* V39 */` inline comments exactly as they appear in the headers.
+
+4. **Big-endian warning** — Any article dealing with file formats (HUNK, IFF, ADF) or memory structures must explicitly state that the 68000 family is **big-endian** (most significant byte first). Developers used to little-endian x86 or ARM machines will easily decode the bytes `12 34` as `0x3412` instead of `0x1234` if not reminded.
+
+5. **Honest trade-offs** — When the OS provides an API that most professional software bypassed, say so. When a feature has scaling problems, quantify them. Don't oversell.
+
+6. **Cross-linking** — Every article should link to at least 2–3 related articles. The documentation is a graph, not a list.
+
+7. **Source attribution** — Cite NDK versions, ADCD sections, and ROM Kernel Reference Manual chapters. This is a technical reference, not folklore.
+
+8. **De-abbreviation** — When introducing abbreviations (GEL, BOB, DMA, OCS), always provide the full name on first use and include a full-name column in summary tables.
+
+9. **Real-world grounding** — Every feature must be contextualized with real use cases, applicability ranges, and honest guidance on when to use alternatives. Avoid documenting APIs in a vacuum.
+
+---
+
+## README Index Maintenance
+
+When creating or significantly expanding an article:
+- Update the section's `README.md` index table
+- Update the root `README.md` Documentation Map if the article is new
+- Index entries should be descriptive, not just the topic name:
+  - ✗ `AnimOb, BOB, VSprite, GEL system`
+  - ✓ `GEL system deep dive: BOBs, VSprites, AnimObs, hardware foundation, collision detection, double buffering, performance tuning`
+
+---
+
+## Commit Messages
+!!!DO NOT make any commits unless the user asks!!!
+
+Use conventional commit format:
+
+```
+docs(amiga): <concise description of what changed>
+```
+
+!!!DO NOT include `Co-authored-by:` trailers!!!
--- a/README.md
+++ b/README.md
@ -16,7 +16,7 @@ The Amiga's documentation was scattered across out-of-print manuals, Usenet post

 | Layer | Coverage |
 |---|---|
-| **⚙️ Hardware** | Custom chip registers (OCS/ECS/AGA), Copper & Blitter programming with worked examples, CIA, Zorro bus, CPU feature matrix |
+| **⚙️ Hardware** | Custom chip registers (OCS/ECS/AGA), Copper & Blitter programming with worked examples, memory architecture (Chip/Fast/Slow RAM), CIA, Zorro bus, CPU feature matrix |
 | **🔌 Boot & Init** | Cold boot sequence, ROM checksum, resident module scan, Kickstart init, startup-sequence |
 | **📦 Binary Format** | HUNK executable format (every record type), relocation, debug info, overlays |
 | **🔗 Linking & ABI** | .fd files, LVO tables, register calling conventions, compiler stubs, SetFunction |
@ -38,7 +38,7 @@ The Amiga's documentation was scattered across out-of-print manuals, Usenet post
 |---|---|
 | **New to Amiga** | [History & chipsets](00_overview/history.md) → [Boot sequence](02_boot_sequence/cold_boot.md) → [Exec kernel](06_exec_os/exec_base.md) |
 | **Writing code** | [Toolchain setup](13_toolchain/gcc_amiga.md) → [Calling conventions](04_linking_and_libraries/register_conventions.md) → [.fd files](04_linking_and_libraries/fd_files.md) |
-| **Doing hardware** | [Address space](01_hardware/common/address_space.md) → [Custom registers](01_hardware/ocs_a500/custom_registers.md) → [Copper programming](08_graphics/copper_programming.md) |
+| **Doing hardware** | [Address space](01_hardware/common/address_space.md) → [Memory types](01_hardware/common/memory_types.md) → [Custom registers](01_hardware/ocs_a500/custom_registers.md) → [Copper programming](08_graphics/copper_programming.md) |
 | **Reverse engineering** | [RE methodology](05_reversing/methodology.md) → [IDA/Ghidra setup](05_reversing/ida_setup.md) → [API call identification](05_reversing/static/api_call_identification.md) |
 | **Building an FPGA core** | [Hardware models](00_overview/hardware_models.md) → [AGA chipset](01_hardware/aga_a1200_a4000/chipset_aga.md) → [68040/060 libs](15_cpu_and_mmu/68040_68060_libraries.md) |

@ -74,7 +74,7 @@ The Amiga's documentation was scattered across out-of-print manuals, Usenet post
 ### 01 — Hardware (by chipset generation)
 | Folder | Coverage |
 |---|---|
-| [common/](01_hardware/common/) | M68k CPU, address space, CIA chips, Zorro bus |
+| [common/](01_hardware/common/) | M68k CPU, address space, **memory types (Chip/Fast/Slow RAM)**, CIA chips, Zorro bus |
 | [ocs_a500/](01_hardware/ocs_a500/) | OCS chipset: custom registers, copper, blitter, sprites, Paula |
 | [ecs_a600_a3000/](01_hardware/ecs_a600_a3000/) | ECS chipset: Super Agnus, productivity modes, Gary, Gayle |
 | [aga_a1200_a4000/](01_hardware/aga_a1200_a4000/) | AGA chipset: Alice, Lisa, copper, blitter (64-bit), palette, Gayle IDE |