[← Home](../README.md) · [Graphics](README.md)

# Blitter Programming — Deep Dive

## Overview

The **[Blitter](../../01_hardware/ocs_a500/blitter.md)** (Block Image Transferrer) is a DMA coprocessor inside the Agnus chip that performs raster operations on rectangular memory blocks at bus speed — **without CPU involvement**. While the 68000 executes game logic, physics, or AI, the Blitter simultaneously clears screens, copies bitmap regions, composites masked sprites ("cookie-cut"), draws lines, and fills polygons. This parallelism is fundamental to why the Amiga could deliver arcade-quality 2D graphics on a 7 MHz processor with 512 KB of RAM.

The Blitter operates on up to **4 DMA channels** (A, B, C → D) using a programmable **8-bit minterm** truth table that encodes any Boolean function of three inputs. Combined with per-channel shift, modulo, and first/last word masking, this makes the Blitter a general-purpose 2D rasterization engine — not merely a memory copier.

> [!WARNING]
> The Blitter can **only** access Chip RAM. Pointing any channel register at Fast RAM causes silent data corruption or system crashes. Always allocate blitter-visible memory with `AllocMem(size, MEMF_CHIP)`.

```
Channel A ──→ ┐
Channel B ──→ ├──→ Minterm Logic ──→ Channel D (output)
Channel C ──→ ┘

A = mask/pattern (e.g., cookie shape, font glyph)
B = source image data
C = background / destination read-back
D = output destination
```

---

## Architecture

The Blitter sits inside **Agnus** (OCS/ECS) or **Alice** (AGA), sharing the DMA bus with the Copper, bitplane fetches, sprite DMA, disk, and audio. It accesses memory through 4 independent DMA channels, each with its own pointer and modulo register:

```mermaid
graph LR
    subgraph "Agnus / Alice"
        A["Channel A<br/>(mask/pattern)"] --> ML["Minterm Logic<br/>(8-bit truth table)"]
        B["Channel B<br/>(source data)"] --> ML
        C["Channel C<br/>(background read-back)"] --> ML
        ML --> D["Channel D<br/>(output)"]
    end

    CRAM["Chip RAM"] --> A
    CRAM --> B
    CRAM --> C
    D --> CRAM

    style ML fill:#fff9c4,stroke:#f9a825
    style CRAM fill:#e8f4fd,stroke:#2196f3
```

The **Minterm Logic** block is the Blitter's core innovation. It takes the current bit from channels A, B, and C (three Boolean inputs) and produces one output bit for channel D according to a programmable **8-bit truth table** stored in BLTCON0 bits 7–0. Since 3 inputs have 8 possible combinations (2³), the 8-bit minterm encodes **any** Boolean function of three variables — that's 256 possible logic operations in a single register write. This is what lets one piece of hardware do copies (`D=A`, minterm `$F0`), clears (`D=0`, minterm `$00`), cookie-cut compositing (`D=A·B+¬A·C`, minterm `$CA`), XOR highlighting (`D=A⊕C`, minterm `$5A`), and any other combination — all without changing hardware, just the 8-bit minterm value. See [Minterm Logic](#minterm-logic) below for the full truth table and common values.

Each channel reads (or writes, for D) from a different memory pointer with independent modulo, allowing operations on sub-rectangles within larger bitmaps. **Writing to `BLTSIZE` ($DFF058) starts the blit immediately** — always configure all other registers first.

### Channel Roles

| Channel | DMA Direction | Typical Role | Has Shift? | Has Mask? |
|---|---|---|---|---|
| **A** | Read | Mask, cookie shape, font glyph, single-pixel seed in line mode | Yes (ASH, 0–15 px) | Yes (BLTAFWM/BLTALWM) |
| **B** | Read | Source image data, line texture | Yes (BSH, 0–15 px) | No |
| **C** | Read | Background / destination read-back | No | No |
| **D** | Write | Output destination | No | No |

> [!NOTE]
> Any channel can be disabled per operation via BLTCON0 bits 11–8 (USEA/B/C/D). Disabling unused channels **saves DMA cycles**: a D-only clear (1 channel) needs roughly half the bus cycles per word of a full ABCD blit.

### CPU / Blitter Bus Interaction

The Blitter and the 68000 CPU share the **Chip RAM bus** — they cannot access it simultaneously. Agnus arbitrates access on a cycle-by-cycle basis:

```
┌────────────────────────────────────────────────────────────┐
│                    Chip RAM Bus (16-bit)                   │
├──────────┬──────────┬──────────┬──────────┬────────────────┤
│ Bitplane │  Sprite  │  Copper  │ Blitter  │   CPU (left-   │
│   DMA    │   DMA    │   DMA    │   DMA    │   over slots)  │
├──────────┴──────────┴──────────┴──────────┴────────────────┤
│               Fixed priority (high → low)                  │
└────────────────────────────────────────────────────────────┘
```

- **Without `BLTPRI`**: The Blitter still outranks the CPU, but Agnus hands the CPU a slot once it has been held off for three consecutive bus cycles. The CPU keeps running, although its Chip RAM accesses slow down noticeably while a blit is active.
- **With `BLTPRI` (nasty mode)**: The Blitter takes **all** free DMA slots. The CPU is completely frozen on any Chip RAM access until the blit completes. The CPU can still execute from Fast RAM or ROM — but any Chip RAM read/write stalls.
- **Display DMA always wins**: Bitplane, sprite, and audio DMA have fixed priority above the Blitter. In high-resolution modes, display DMA alone consumes most of the bus, leaving few slots for blitter operations.

### Chip RAM vs. Fast RAM

The Blitter is physically wired to the Chip RAM bus inside Agnus. It has **no connection** to the Fast RAM (Zorro) bus:

| Memory Type | Blitter Access? | CPU Access? | Notes |
|---|---|---|---|
| **Chip RAM** (first 512 KB–2 MB) | ✓ Yes | ✓ Yes (contended) | Screen buffers, audio, sprites, all DMA-visible data |
| **Fast RAM** (Zorro II/III) | ✗ No | ✓ Yes (uncontended) | Code, variables, non-DMA data |
| **ROM** ($F80000–$FFFFFF) | ✗ No | ✓ Yes | Kickstart, libraries |

This creates the key optimization opportunity on Amigas fitted with Fast RAM (A3000, A4000, or an expanded A1200): **the CPU can execute code and access Fast RAM at full speed while the Blitter simultaneously works on Chip RAM**. On a stock A500 with only Chip RAM, the CPU and Blitter always contend for the same bus.

> [!IMPORTANT]
> There is no hardware error when pointing blitter registers at Fast RAM addresses. The Blitter only drives the address bits that reach Chip RAM, so the higher bits are dropped and the pointer wraps into Chip RAM space, producing silent data corruption at an unintended Chip RAM location.

## Minterm Logic

The minterm is an **8-bit value** stored in BLTCON0 (bits 7–0) that tells the Blitter what to do with each pixel. Think of it as a tiny program: for every pixel position, the Blitter reads the current bit from channels A, B, and C, looks up the answer in the minterm, and writes that answer to channel D (destination memory).

Since there are 3 inputs (A, B, C), each either 0 or 1, there are exactly **8 possible input combinations**. The 8-bit minterm has one bit for each combination — that bit decides whether the output pixel is on (1) or off (0):

| Minterm Bit | Input A (mask) | Input B (source) | Input C (background) | "If these inputs look like this…" |
|---|---|---|---|---|
| Bit 7 | 1 | 1 | 1 | …mask on, source on, background on |
| Bit 6 | 1 | 1 | 0 | …mask on, source on, background off |
| Bit 5 | 1 | 0 | 1 | …mask on, source off, background on |
| Bit 4 | 1 | 0 | 0 | …mask on, source off, background off |
| Bit 3 | 0 | 1 | 1 | …mask off, source on, background on |
| Bit 2 | 0 | 1 | 0 | …mask off, source on, background off |
| Bit 1 | 0 | 0 | 1 | …mask off, source off, background on |
| Bit 0 | 0 | 0 | 0 | …mask off, source off, background off |

Each bit is a simple yes/no: **"should the output pixel be on for this combination?"**
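
The lookup described above can be modelled in a few lines of C. This is a sketch for reasoning about minterms on a development machine, not hardware code: `blit_word` is a hypothetical helper name, and it handles one 16-bit word with no shift or masking applied.

```c
#include <stdint.h>

/* Apply an 8-bit minterm to three 16-bit input words, one bit at a time.
   For each bit position the inputs form an index (A*4 + B*2 + C) that
   selects one bit of the minterm, matching the table rows Bit 7..Bit 0. */
uint16_t blit_word(uint8_t minterm, uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t d = 0;
    for (int bit = 0; bit < 16; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        if ((minterm >> idx) & 1u)
            d |= (uint16_t)(1u << bit);   /* this combination outputs a 1 */
    }
    return d;
}
```

Calling `blit_word(0xF0, a, b, c)` returns `a` unchanged, which is the "copy A" case: only the four table rows with A = 1 have their minterm bit set.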

### Worked Example: Cookie-Cut (`$CA`)

The most important minterm is `$CA` — the cookie-cut blit used for sprite compositing. In binary, `$CA` = `11001010`. Let's read each bit:

| Bit | A (mask) | B (source) | C (background) | `$CA` bit value | Output pixel | Why |
|---|---|---|---|---|---|---|
| 7 | on | on | on | **1** | **on** | Inside the shape, source pixel is on → show it |
| 6 | on | on | off | **1** | **on** | Inside the shape, source pixel is on → show it |
| 5 | on | off | on | **0** | **off** | Inside the shape, source pixel is off → show it (it's dark) |
| 4 | on | off | off | **0** | **off** | Inside the shape, source pixel is off → show it |
| 3 | off | on | on | **1** | **on** | Outside the shape → keep background (it's on) |
| 2 | off | on | off | **0** | **off** | Outside the shape → keep background (it's off) |
| 1 | off | off | on | **1** | **on** | Outside the shape → keep background (it's on) |
| 0 | off | off | off | **0** | **off** | Outside the shape → keep background (it's off) |

The pattern: **where the mask (A) is set → take the source pixel (B). Where the mask is clear → keep the background pixel (C).** That's a sprite draw with transparency — exactly what every Amiga game uses.

### Common Minterms

| Minterm | Hex | Operation | Description | Real-World Use Case |
|---|---|---|---|---|
| `D = A` | `$F0` | Copy A | Output is a copy of channel A — every A-set pixel appears in D | **Block copy**: duplicate a screen region, copy a font glyph to the display |
| `D = B` | `$CC` | Copy B | Output is a copy of channel B regardless of A and C | **Shifted copy**: B has a barrel shift, so this copies with pixel-level repositioning |
| `D = C` | `$AA` | Copy C | Output is a copy of the destination read-back | **No-op / readback**: useful for fill mode where C→D with fill carry toggling |
| `D = A·B + ¬A·C` | `$CA` | Cookie-cut | Where mask (A) is 1: show source (B). Where mask is 0: show background (C) | **Sprite compositing**: draw a player character with transparency onto the game world |
| `D = 0` | `$00` | Clear | Output is always 0 regardless of inputs | **Screen clear**: zero out a bitplane, erase a region |
| `D = $FFFF` | `$FF` | Set all | Output is always 1 | **Fill with 1s**: set all pixels in a region (useful for masks) |
| `D = A XOR C` | `$5A` | XOR | Output toggles wherever A has a set bit | **Cursor blink**: XOR the cursor shape to toggle it on/off without saving background |
| `D = A OR C` | `$FA` | OR | Output is set wherever either A or C has a set bit | **Overlay**: stamp a shape onto the background without erasing existing pixels |
| `D = ¬A AND C` | `$0A` | Mask out | Output keeps C pixels only where A is clear — erases through the mask | **Erase shape**: cut a hole in the background matching the mask shape (first pass of two-pass sprite draw) |
| `D = A AND B` | `$C0` | AND | Output is set only where both A and B are set | **Masked pattern**: apply a fill pattern (B) clipped to a shape (A) |
| `D = A XOR B` | `$3C` | XOR (A,B) | Output is set wherever A and B differ | **Difference detection**: find which pixels changed between two frames |
| `D = NOT A` | `$0F` | Invert | Output is the bitwise complement of A | **Mask inversion**: generate a negative mask from a positive one |
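
A handy way to derive these values without memorising them: treat each input's truth-table column as a constant (A = `$F0`, B = `$CC`, C = `$AA`, read off the bit 7 to bit 0 rows) and write the Boolean expression directly on those constants. The sketch below assumes nothing beyond the table above; the `MT_*` names are made up for illustration.

```c
#include <stdint.h>

/* Truth-table columns of the three inputs, minterm bit 7 down to bit 0. */
#define MT_A 0xF0u
#define MT_B 0xCCu
#define MT_C 0xAAu

/* Each minterm is the same expression you would write for D,
   evaluated on the column constants and trimmed to 8 bits. */
#define MT_COPY_A      (MT_A)
#define MT_COOKIE_CUT  (((MT_A & MT_B) | (~MT_A & MT_C)) & 0xFFu)
#define MT_XOR_AC      ((MT_A ^ MT_C) & 0xFFu)
#define MT_OR_AC       ((MT_A | MT_C) & 0xFFu)
#define MT_ERASE       ((~MT_A & MT_C) & 0xFFu)
#define MT_AND_AB      ((MT_A & MT_B) & 0xFFu)
#define MT_NOT_A       ((~MT_A) & 0xFFu)
```

The same trick works for any of the 256 functions, so an unusual operation never needs a hand-built truth table.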

### Cookie-Cut Explained

```
A = mask (1 = sprite pixel, 0 = transparent)
B = sprite image data
C = background
D = result

Minterm $CA:
  Where A=1: D = B  (show sprite)
  Where A=0: D = C  (show background)
```
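
Seen a whole word at a time, the cookie-cut is an ordinary bitwise select. The function below is a CPU-side illustration only (the name `cookie_cut_word` is hypothetical); it is what the Blitter computes for each word when programmed with minterm `$CA`.

```c
#include <stdint.h>

/* Word-wide cookie-cut: take B where the mask A is set, C elsewhere. */
uint16_t cookie_cut_word(uint16_t a, uint16_t b, uint16_t c)
{
    return (uint16_t)((a & b) | (~a & c));
}
```

With an all-zero mask the background passes through untouched, and with an all-ones mask the sprite word replaces it completely.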

---

## Register Reference

| Address | Name | R/W | Description |
|---------|------|-----|-------------|
| `$DFF040` | BLTCON0 | W | Control: ASH (bits 15–12), channel enables (bits 11–8), minterm (bits 7–0) |
| `$DFF042` | BLTCON1 | W | Control: BSH (bits 15–12), fill/line mode (bits 4–0) |
| `$DFF044` | BLTAFWM | W | First word mask for channel A |
| `$DFF046` | BLTALWM | W | Last word mask for channel A |
| `$DFF048` | BLTCPTH/L | W | Channel C pointer (32-bit) |
| `$DFF04C` | BLTBPTH/L | W | Channel B pointer (32-bit) |
| `$DFF050` | BLTAPTH/L | W | Channel A pointer (32-bit) |
| `$DFF054` | BLTDPTH/L | W | Channel D pointer (32-bit) |
| `$DFF058` | BLTSIZE | W | Blit dimensions + **START** (write triggers blit!) |
| `$DFF05C` | BLTSIZV | W | Blit height, **ECS/AGA only** (15-bit, up to 32768 lines) |
| `$DFF05E` | BLTSIZH | W | Blit width + START, **ECS/AGA only** (11-bit, up to 2048 words) |
| `$DFF060` | BLTCMOD | W | Channel C modulo (bytes to skip per row) |
| `$DFF062` | BLTBMOD | W | Channel B modulo |
| `$DFF064` | BLTAMOD | W | Channel A modulo |
| `$DFF066` | BLTDMOD | W | Channel D modulo |
| `$DFF070` | BLTCDAT | W | Channel C data register (preload) |
| `$DFF072` | BLTBDAT | W | Channel B data register (preload / line texture) |
| `$DFF074` | BLTADAT | W | Channel A data register (preload; `$8000` in line mode) |
| `$DFF002` | DMACONR | R | DMA status — bit 14 (BBUSY) = blitter busy |

### BLTCON0 Encoding

```
Bits 15–12: ASH  — A channel barrel shift (0–15 pixels right)
Bit  11:    USEA — enable channel A DMA
Bit  10:    USEB — enable channel B DMA
Bit   9:    USEC — enable channel C DMA
Bit   8:    USED — enable channel D DMA (almost always 1)
Bits  7–0:  LF   — minterm (logic function truth table)
```
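
The field layout above packs naturally into a small helper. This is a sketch with hypothetical names (`make_bltcon0` and the `BC0_*` constants are not from any Amiga header); it only builds the 16-bit value and does not touch hardware.

```c
#include <stdint.h>

/* Channel-enable bits of BLTCON0 (bits 11..8). */
enum {
    BC0_USEA = 1 << 11,
    BC0_USEB = 1 << 10,
    BC0_USEC = 1 << 9,
    BC0_USED = 1 << 8
};

/* Combine A shift (0..15), channel enables, and minterm into BLTCON0. */
uint16_t make_bltcon0(unsigned ash, unsigned channels, uint8_t minterm)
{
    return (uint16_t)(((ash & 0xFu) << 12) | (channels & 0x0F00u) | minterm);
}
```

`make_bltcon0(0, BC0_USEA | BC0_USED, 0xF0)` yields `$09F0`, the value used for plain A→D copies throughout this document.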

### BLTCON1 Encoding

```
Bits 15–12: BSH  — B channel barrel shift (0–15 pixels right)
Bit   4:    EFE  — exclusive fill enable
Bit   3:    IFE  — inclusive fill enable
Bit   2:    FCI  — fill carry input (initial state)
Bit   1:    DESC — descending mode (blit bottom-right → top-left)
Bit   0:    LINE — line draw mode
```

### BLTSIZE Encoding (OCS/ECS)

```
Bits 15–6: Height in lines (1–1024, 0 = 1024)
Bits  5–0: Width in words  (1–64,   0 = 64)
```

> [!WARNING]
> **Writing BLTSIZE starts the blit!** Always configure all other registers (pointers, modulos, control, masks) before writing BLTSIZE. When using the larger ECS/AGA size registers, write BLTSIZV first, then BLTSIZH (which triggers the blit).
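
The BLTSIZE packing, including the "0 means maximum" wrap, can be sketched as follows. Both function names are hypothetical, and the code only computes the register value.

```c
#include <stdint.h>

/* 1 if the size fits the BLTSIZE fields (1..1024 lines, 1..64 words). */
int bltsize_fits(unsigned height, unsigned width_words)
{
    return height >= 1 && height <= 1024
        && width_words >= 1 && width_words <= 64;
}

/* Pack height and width. The maxima 1024 and 64 do not fit their
   10-bit and 6-bit fields, so masking turns them into 0. */
uint16_t make_bltsize(unsigned height, unsigned width_words)
{
    return (uint16_t)(((height & 0x3FFu) << 6) | (width_words & 0x3Fu));
}
```

A 320×256 single-plane clear is `make_bltsize(256, 20)`, which equals the `(256<<6)|20` constant used in the assembly examples.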

### Ascending vs. Descending Mode

When source and destination overlap in memory, the blit direction determines whether data is corrupted:

```
Ascending (default, DESC=0):
  Reads/writes top-left → bottom-right
  Use when: dest address < source address (or the blocks do not overlap)

Descending (DESC=1):
  Reads/writes bottom-right → top-left
  Use when: dest address > source address
  Pointers must be set to the LAST word of the block
  Modulos are subtracted instead of added
```

This is critical for **scrolling** — shifting the screen contents by a few pixels requires an overlapping copy, and using the wrong direction produces garbage.
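
The choice of direction follows the same rule as `memmove`: a descending copy is only required when the destination starts inside the source range at a higher address. A minimal sketch, with hypothetical helper names and addresses treated as plain integers:

```c
#include <stdint.h>
#include <stdbool.h>

/* True when an overlapping copy must run in descending mode (DESC=1):
   the destination lies above the source but inside its byte span. */
bool blit_needs_desc(uint32_t src, uint32_t dst, uint32_t span_bytes)
{
    return dst > src && dst < src + span_bytes;
}

/* Byte offset from a block's first word to its last word, which is where
   the channel pointers must start for a descending blit. */
uint32_t last_word_offset(unsigned height, unsigned width_words, int modulo_bytes)
{
    uint32_t row_stride = width_words * 2u + (uint32_t)modulo_bytes;
    return (height - 1u) * row_stride + (width_words - 1u) * 2u;
}
```

For a full 320×256 plane (20 words per row, modulo 0) the last word sits 10238 bytes past the first, so a descending blit starts its pointers at `base + 10238`.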

### Shift and Alignment

The Blitter is a **word-aligned** (16-bit) processor. Moving objects to arbitrary pixel positions requires the barrel shifter:

- **ASH** (channel A shift) and **BSH** (channel B shift) shift data 0–15 pixels to the right
- A rectangle N pixels wide at a non-aligned X position spans `⌈(N + shift) / 16⌉` words — one more than aligned
- **BLTAFWM** (first word mask) and **BLTALWM** (last word mask) prevent the shifted data from corrupting pixels outside the target area
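
The arithmetic in the list above is easy to get wrong by one, so here it is as code. All names are hypothetical. The two mask helpers assume an unshifted channel A (ASH = 0), for instance a constant `$FFFF` pattern used to fill a rectangle, because the first/last word masks are applied to A's data before the barrel shift.

```c
#include <stdint.h>

/* Barrel shift needed to place data at pixel column x. */
unsigned blit_shift(unsigned x) { return x & 15u; }

/* Words touched by width_px pixels starting at column x: ceil((N + shift) / 16). */
unsigned blit_width_words(unsigned x, unsigned width_px)
{
    return (blit_shift(x) + width_px + 15u) / 16u;
}

/* Left-edge mask: keep pixels from column x to the end of the first word. */
uint16_t first_word_mask(unsigned x)
{
    return (uint16_t)(0xFFFFu >> (x & 15u));
}

/* Right-edge mask: keep pixels up to and including the last covered column. */
uint16_t last_word_mask(unsigned x, unsigned width_px)
{
    unsigned last_bit = (x + width_px - 1u) & 15u;   /* 0 = leftmost pixel */
    return (uint16_t)(0xFFFFu << (15u - last_bit));
}
```

A 16-pixel object at x = 5 therefore needs 2 words, with `$07FF` covering the first word and `$F800` the second.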

---

## Complete Examples

### Example 1: Clear Screen (320×256, 1 bitplane)

```asm
    lea     $DFF000,a5

    ; Wait for blitter idle:
.bwait:
    btst    #14,$002(a5)    ; DMACONR bit 14 = BBUSY
    bne.s   .bwait

    ; D channel only, minterm $00 (clear):
    move.l  #$01000000,$040(a5)  ; BLTCON0: USED=1, minterm=$00
    clr.w   $042(a5)             ; BLTCON1: 0
    move.l  #ScreenMem,$054(a5)  ; BLTDPT
    clr.w   $066(a5)             ; BLTDMOD: 0 (contiguous)
    move.w  #(256<<6)|20,$058(a5) ; BLTSIZE: 256 lines × 20 words (320/16)
    ; Blit is now running!
```

### Example 2: Block Copy (No Shift)

```asm
    ; Copy 64×64 pixel block from source to dest (1 bitplane)
    ; Source and dest are in contiguous bitmap, 320 pixels wide

    ; Width = 64 pixels = 4 words
    ; Modulo = (320 - 64) / 16 = 16 words = 32 bytes

    lea     $DFF000,a5

.bwait:
    btst    #14,$002(a5)
    bne.s   .bwait

    move.l  #$09F00000,$040(a5)  ; BLTCON0: USEA+USED, minterm=$F0 (A→D)
    clr.w   $042(a5)             ; BLTCON1
    move.w  #$FFFF,$044(a5)      ; BLTAFWM = all bits
    move.w  #$FFFF,$046(a5)      ; BLTALWM = all bits
    move.l  #SourceAddr,$050(a5) ; BLTAPT
    move.l  #DestAddr,$054(a5)   ; BLTDPT
    move.w  #32,$064(a5)         ; BLTAMOD = 32 bytes
    move.w  #32,$066(a5)         ; BLTDMOD = 32 bytes
    move.w  #(64<<6)|4,$058(a5)  ; BLTSIZE: 64 lines × 4 words → GO!
```

### Example 3: Cookie-Cut Blit (Masked Sprite)

```asm
    ; Blit a 16×16 masked sprite onto background
    ; A = mask, B = sprite data, C = background, D = destination

    lea     $DFF000,a5

.bwait:
    btst    #14,$002(a5)
    bne.s   .bwait

    move.l  #$0FCA0000,$040(a5)  ; BLTCON0: A+B+C+D, minterm=$CA
    clr.w   $042(a5)             ; BLTCON1
    move.w  #$FFFF,$044(a5)      ; BLTAFWM
    move.w  #$FFFF,$046(a5)      ; BLTALWM
    move.l  #MaskData,$050(a5)   ; BLTAPT = mask
    move.l  #SpriteData,$04C(a5) ; BLTBPT = sprite imagery
    move.l  #ScreenPos,$048(a5)  ; BLTCPT = background (read-back)
    move.l  #ScreenPos,$054(a5)  ; BLTDPT = same as C (overwrite)
    clr.w   $064(a5)             ; BLTAMOD = 0 (mask is 16px = 1 word wide)
    clr.w   $062(a5)             ; BLTBMOD = 0
    move.w  #38,$060(a5)         ; BLTCMOD = (320-16)/8 = 38 bytes
    move.w  #38,$066(a5)         ; BLTDMOD = 38
    move.w  #(16<<6)|1,$058(a5)  ; BLTSIZE: 16 lines × 1 word → GO!
```

### Example 4: Line Drawing

```asm
    ; Draw a line from (x1,y1) to (x2,y2) using blitter line mode
    ; This is complex — blitter line mode uses a Bresenham-style algorithm
    ; implemented in hardware

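    ; BLTCON1 bit 0 = LINE mode
    ; Channel A = single pixel ($8000 in BLTADAT), positioned by ASH
    ; Channel B = line texture (BLTBDAT; $FFFF gives a solid line)
    ; Channel C/D = destination bitmap

    ; See HRM for the full algorithm (BLTAPT, BLTAMOD, BLTBMOD and the
    ; octant bits carry the Bresenham terms); here's the concept:
    move.l  #$0B4A0000,$040(a5)  ; BLTCON0: A+C+D, minterm=$4A (XOR); add ASH = x1 AND 15
    move.w  #$0001,$042(a5)      ; BLTCON1: LINE=1, octant bits set per slope
    move.w  #$8000,$074(a5)      ; BLTADAT: single pixel pattern
    move.w  #$FFFF,$072(a5)      ; BLTBDAT: solid line texture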
    move.w  #$FFFF,$044(a5)      ; BLTAFWM
    move.l  #StartPos,$048(a5)   ; BLTCPT: line start position in bitmap
    move.l  #StartPos,$054(a5)   ; BLTDPT: same
    move.w  #Modulo,$060(a5)     ; BLTCMOD
    move.w  #Modulo,$066(a5)     ; BLTDMOD
    move.w  #(len<<6)|2,$058(a5) ; BLTSIZE: length × 2 → GO!
```

---

## Advanced Use Cases & Cookbook

### Use Case 1: Shifted BOB (Sprite at Arbitrary X Position)

The most common real-world blitter task: draw a 16×16 sprite at pixel position (x, y) on a 320-pixel-wide screen. Since x may not be word-aligned, the barrel shifter handles sub-word positioning:

```asm
    ; Draw 16×16 BOB at pixel (x, y) on a 320px wide screen
    ; Inputs: d0.w = x position, d1.w = y position
    ;         a0 = mask data, a1 = sprite data, a2 = screen base

    lea     $DFF000,a5

    ; Calculate screen byte offset:
    move.w  d1,d2
    mulu    #40,d2              ; y × 40 bytes/row (320 pixels / 8)
    move.w  d0,d3
    lsr.w   #3,d3               ; x / 8 = byte offset in row
    and.w   #$FFFE,d3           ; word-align (drop bit 0)
    add.w   d3,d2               ; total byte offset into screen
    lea     (a2,d2.w),a3        ; a3 = screen pointer for this BOB

    ; Calculate shift amount:
    move.w  d0,d3
    and.w   #$000F,d3           ; shift = x mod 16 (0–15 pixels)
    ror.w   #4,d3               ; move to bits 15–12
    move.w  d3,d4               ; d4 = BSH for BLTCON1 (B must shift with A)
    or.w    #$0FCA,d3           ; d3 = ASH + channels A+B+C+D + minterm $CA

.bwait:
    btst    #14,$002(a5)
    bne.s   .bwait

    move.w  d3,$040(a5)         ; BLTCON0: shift + channels + minterm
    move.w  d4,$042(a5)         ; BLTCON1: BSH = same shift, ascending, no fill
    move.w  #$FFFF,$044(a5)     ; BLTAFWM: all bits in first word
    move.w  #$0000,$046(a5)     ; BLTALWM: mask off last word (shift overflow)
    move.l  a0,$050(a5)         ; BLTAPT = mask
    move.l  a1,$04C(a5)         ; BLTBPT = sprite imagery
    move.l  a3,$048(a5)         ; BLTCPT = background read-back
    move.l  a3,$054(a5)         ; BLTDPT = write back to same position
    move.w  #-2,$064(a5)        ; BLTAMOD = -2 (source rows are 1 word, blit is 2 words)
    move.w  #-2,$062(a5)        ; BLTBMOD = -2 (same reason)
    move.w  #36,$060(a5)        ; BLTCMOD = 40 - (2 words × 2) = 36 bytes
    move.w  #36,$066(a5)        ; BLTDMOD = 36
    move.w  #(16<<6)|2,$058(a5) ; BLTSIZE: 16 lines × 2 words (1 extra for shift) → GO!
```

**Key insight**: the blit is 2 words wide even though the sprite is only 16 pixels (1 word). The barrel shift pushes bits into the second word, so the blit needs that extra word, and `BLTALWM=$0000` masks it so adjacent pixels are not corrupted. Because the source rows are only 1 word wide, the A and B modulos are -2, which steps each source pointer back onto the start of the next row. B carries the same shift as A (BSH in BLTCON1) so that mask and image stay aligned.
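
The register values for a shifted 16-pixel-wide BOB can be worked out ahead of time on the CPU. The sketch below is one way to do it and makes three assumptions that should be stated plainly: source rows are exactly 1 word wide (hence a source modulo of -2 for the 2-word blit), channel B uses the same shift as channel A, and the struct and function names are invented for this example.

```c
#include <stdint.h>

typedef struct {
    uint32_t offset;   /* byte offset of the first destination word */
    uint16_t bltcon0;  /* ASH | USEA..USED | minterm $CA */
    uint16_t bltcon1;  /* BSH, same shift as A */
    int16_t  src_mod;  /* BLTAMOD / BLTBMOD */
    int16_t  dst_mod;  /* BLTCMOD / BLTDMOD */
    uint16_t bltsize;  /* height << 6 | 2 words */
} BobSetup;

BobSetup bob_setup(unsigned x, unsigned y, unsigned bytes_per_row, unsigned height)
{
    BobSetup s;
    unsigned shift = x & 15u;                       /* sub-word pixel offset */

    s.offset  = (uint32_t)y * bytes_per_row + (x >> 4) * 2u;
    s.bltcon0 = (uint16_t)((shift << 12) | 0x0FCAu);
    s.bltcon1 = (uint16_t)(shift << 12);
    s.src_mod = -2;                                 /* 1-word rows, 2-word blit */
    s.dst_mod = (int16_t)(bytes_per_row - 4u);      /* skip rest of the scanline */
    s.bltsize = (uint16_t)((height << 6) | 2u);
    return s;
}
```

For a BOB at (37, 10) on a 40-byte-wide plane this gives offset 404, a shift of 5 in both control words, and a destination modulo of 36.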

### Use Case 2: Hardware Scroll (Left by N Pixels)

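Scrolling the screen left means the destination is at a lower address than the source, so an ordinary **ascending** blit is safe because every source word is read before it is overwritten. (Scrolling right reverses the relationship and needs **descending mode**.) The left scroll looks like this: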

```asm
    ; Scroll 320×256 screen left by 16 pixels (1 word = fastest case)
    ; Source: screen + 2 bytes (one word right)
    ; Dest:   screen base
    ; No shift needed for 16-pixel increments

    lea     $DFF000,a5

.bwait:
    btst    #14,$002(a5)
    bne.s   .bwait

    move.l  #$09F00000,$040(a5) ; BLTCON0: A+D, minterm $F0 (copy)
    clr.w   $042(a5)            ; BLTCON1: ascending (dest < source, so this is safe)
    move.w  #$FFFF,$044(a5)     ; BLTAFWM
    move.w  #$FFFF,$046(a5)     ; BLTALWM
    move.l  #Screen+2,$050(a5)  ; BLTAPT: source is 1 word to the right
    move.l  #Screen,$054(a5)    ; BLTDPT: destination is screen start
    clr.w   $064(a5)            ; BLTAMOD = 0 (full-width rows)
    clr.w   $066(a5)            ; BLTDMOD = 0
    move.w  #(256<<6)|20,$058(a5) ; BLTSIZE: 256 lines × 20 words → GO!
    ; After blit: draw new column at right edge (column 19)
```

For sub-word scrolling (1–15 pixels), combine this with the barrel shifter and draw the new edge column from tile data.

### Use Case 3: Area Fill (Filled Polygon)

The blitter's fill mode is a two-step process: (1) draw the polygon outline with XOR lines, (2) fill the region. This is how games like *Carrier Command* and *Starglider 2* achieved real-time filled 3D:

```asm
    ; Step 1: Draw polygon edges using blitter line mode (XOR, single-bit)
    ; (Repeat for each edge of the polygon)
    ; Use minterm $4A (A XOR C) and BLTCON1 bit 0 = LINE, bit 1 = SING

    ; Step 2: Fill the outlined region
    ; Fill works RIGHT-TO-LEFT, BOTTOM-TO-TOP — requires descending mode
    ; Pointers must point to the LAST word of the bitmap region

    lea     $DFF000,a5

.bwait:
    btst    #14,$002(a5)
    bne.s   .bwait

    ; Set up inclusive fill (IFE):
    move.l  #$09F00000,$040(a5)  ; BLTCON0: A+D, minterm $F0 (copy with fill)
    move.w  #$000A,$042(a5)      ; BLTCON1: DESC=1 (bit 1), IFE=1 (bit 3)
                                  ; IFE = inclusive fill enable
    move.w  #$FFFF,$044(a5)      ; BLTAFWM
    move.w  #$FFFF,$046(a5)      ; BLTALWM

    ; Pointers to LAST word of the fill region (descending!):
    move.l  #FillBufferEnd,$050(a5) ; BLTAPT: last word of source
    move.l  #FillBufferEnd,$054(a5) ; BLTDPT: last word of dest (same buffer)
    clr.w   $064(a5)               ; BLTAMOD = 0
    clr.w   $066(a5)               ; BLTDMOD = 0
    move.w  #(Height<<6)|Width,$058(a5)  ; BLTSIZE → GO!
```

**How it works**: the fill carry, initialised from `FCI` at the start of each row, toggles on every set pixel as the row is scanned right to left. Between two outline pixels on the same scanline the carry stays on, filling the interior. This is why the outline must use **single-bit mode** (SING=1): with more than one outline pixel per edge on a scanline, the carry toggles an extra time and the fill leaks.
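
The carry behaviour can be reproduced in software to check an outline before handing it to the hardware. This is a model of inclusive fill only, written from the description above; the function names are hypothetical, and bit 0 of a word is taken as its rightmost pixel.

```c
#include <stdint.h>

/* Inclusive fill of one 16-bit word, scanned from bit 0 (rightmost pixel)
   to bit 15. carry_in is the fill-carry state entering the word. */
uint16_t fill_word_inclusive(uint16_t in, int carry_in)
{
    uint16_t out = 0;
    int carry = carry_in != 0;
    for (int bit = 0; bit < 16; bit++) {
        unsigned px = (in >> bit) & 1u;
        if (px || carry)
            out |= (uint16_t)(1u << bit);   /* edge pixel or interior */
        if (px)
            carry = !carry;                 /* every set pixel toggles the carry */
    }
    return out;
}

/* Carry leaving the word: toggled once per set pixel. */
int fill_carry_out(uint16_t in, int carry_in)
{
    int carry = carry_in != 0;
    for (int bit = 0; bit < 16; bit++)
        if ((in >> bit) & 1u)
            carry = !carry;
    return carry;
}

/* Fill one row in place, last word first, as a descending blit would. */
void fill_row_inclusive(uint16_t *row, unsigned words, int fci)
{
    int carry = fci != 0;
    for (unsigned w = words; w-- > 0; ) {
        uint16_t in = row[w];
        row[w] = fill_word_inclusive(in, carry);
        carry  = fill_carry_out(in, carry);
    }
}
```

An outline word of `$0024` (two edge pixels) fills to `$003C`. An odd number of set pixels on a row leaves the carry on at the left edge, which is the software equivalent of a fill that leaks across the scanline.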

### Use Case 4: Interleaved Bitplane BOBs

Standard bitplane layout stores all of plane 0, then all of plane 1, etc. **Interleaved** layout stores one row of plane 0, then one row of plane 1, alternating. This allows a single blit to draw a BOB across all bitplanes at once:

```asm
    ; Interleaved screen layout:
    ;   Row 0, Plane 0 (40 bytes)
    ;   Row 0, Plane 1 (40 bytes)
    ;   Row 0, Plane 2 (40 bytes)
    ;   Row 0, Plane 3 (40 bytes)
    ;   Row 0, Plane 4 (40 bytes)
    ;   Row 1, Plane 0 (40 bytes)
    ;   ...

    ; Blit a 16×16 cookie-cut BOB across all 5 bitplanes in ONE operation:
    ; Height = 16 lines × 5 planes = 80 rows
    ; Modulo = 40 - 2 = 38 bytes per interleaved row (skip rest of scanline row)
    ; BOB data is also stored interleaved

    lea     $DFF000,a5

.bwait:
    btst    #14,$002(a5)
    bne.s   .bwait

    move.l  #$0FCA0000,$040(a5) ; BLTCON0: A+B+C+D, minterm $CA
    clr.w   $042(a5)            ; BLTCON1
    move.w  #$FFFF,$044(a5)     ; BLTAFWM
    move.w  #$FFFF,$046(a5)     ; BLTALWM
    move.l  #BOBMask,$050(a5)   ; BLTAPT (mask stored interleaved: each mask row repeated once per plane)
    move.l  #BOBData,$04C(a5)   ; BLTBPT (interleaved sprite data)
    move.l  a3,$048(a5)         ; BLTCPT (screen position)
    move.l  a3,$054(a5)         ; BLTDPT (same)
    clr.w   $064(a5)            ; BLTAMOD = 0 (mask rows are contiguous in memory)
    clr.w   $062(a5)            ; BLTBMOD = 0
    move.w  #38,$060(a5)        ; BLTCMOD = 38 (skip to next interleaved row)
    move.w  #38,$066(a5)        ; BLTDMOD = 38
    move.w  #(80<<6)|1,$058(a5) ; BLTSIZE: 80 rows (16×5) × 1 word → GO!
```

**Why this matters**: without interleaving, drawing one BOB on a 5-plane screen requires **5 separate blits** (one per plane), each with its own WaitBlit + register setup overhead. Interleaving does it in **1 blit** — 5× less setup time, critical when drawing 15+ BOBs per frame.
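
The two layouts differ only in how a (row, plane) pair maps to a byte offset. The helpers below are an illustrative sketch with hypothetical names; they show why one interleaved blit of `height × depth` rows covers every plane.

```c
#include <stdint.h>

/* Planar layout: all rows of plane 0, then all rows of plane 1, ... */
uint32_t planar_offset(unsigned row, unsigned plane,
                       unsigned bytes_per_row, unsigned height)
{
    return ((uint32_t)plane * height + row) * bytes_per_row;
}

/* Interleaved layout: row 0 of every plane, then row 1 of every plane, ... */
uint32_t interleaved_offset(unsigned row, unsigned plane,
                            unsigned bytes_per_row, unsigned depth)
{
    return ((uint32_t)row * depth + plane) * bytes_per_row;
}

/* Rows to put in the blit height for a BOB drawn across all planes. */
unsigned interleaved_blit_height(unsigned bob_height, unsigned depth)
{
    return bob_height * depth;
}
```

Because consecutive planes of the same scanline are exactly `bytes_per_row` apart, the usual destination modulo steps from plane to plane and row to row without any special handling. The product `bob_height × depth` still has to fit the blit height limit.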

### Use Case 5: Double-Buffered Game Loop

The standard pattern for flicker-free game rendering:

```asm
MainLoop:
    ; --- Wait for vertical blank ---
    bsr     WaitVBL             ; Wait for beam to reach line 0

    ; --- Swap display buffer ---
    ; Copper list points to the currently visible buffer
    ; We draw into the hidden back buffer
    move.l  BackBuffer,a0
    move.l  FrontBuffer,a1
    move.l  a0,FrontBuffer      ; Back buffer becomes front (display)
    move.l  a1,BackBuffer       ; Old front becomes new back (draw target)

    ; Update Copper list bitplane pointers to show new front buffer:
    bsr     UpdateCopperBPLPTRs

    ; --- Clear back buffer ---
    bsr     WaitBlit
    move.l  #$01000000,$040(a5) ; D-only, minterm $00
    clr.w   $042(a5)
    move.l  a1,$054(a5)         ; BLTDPT = back buffer
    clr.w   $066(a5)
    move.w  #(256<<6)|20,$058(a5) ; Clear 320×256 → GO!

    ; --- Draw all BOBs ---
    ; CPU can process game logic while the clear blit runs!
    bsr     UpdateGameLogic     ; Physics, AI, input — runs on CPU
    bsr     WaitBlit            ; Wait for clear to finish
    bsr     DrawAllBOBs         ; Chain of cookie-cut blits

    bra     MainLoop
```

**Key optimization**: `UpdateGameLogic` runs on the CPU *while* the screen clear runs on the Blitter. This is the core of the Amiga's parallelism — ~1.5 ms of free CPU time per frame from a single D-only clear.

### Use Case 6: GUI Window Drag (System-Friendly)

Workbench and applications use `graphics.library` for window dragging, icon rendering, and menu drawing. The OS handles Blitter synchronization:

```c
#include <graphics/gfx.h>
#include <graphics/rastport.h>

/* Scroll a window's contents up by 8 pixels (text scroll): */
ScrollRaster(rp,       /* RastPort */
             0, 8,     /* dx=0, dy=8 (scroll up by 8 pixels) */
             0, 0,     /* top-left corner of scroll area */
             319, 199); /* bottom-right */
/* The OS automatically uses an ascending/descending blit, sets modulos, */
/* and clears the exposed bottom strip. */

/* Copy a rectangular region between two bitmaps: */
BltBitMap(srcBM, 0, 0,       /* source bitmap, x, y */
          dstBM, 100, 50,    /* dest bitmap, x, y */
          64, 32,            /* width, height */
          0xC0,              /* minterm: plain copy, source replaces destination */
          0xFF,              /* all bitplanes */
          NULL);             /* no temp buffer needed */

/* Draw a filled rectangle (uses the Blitter internally): */
SetAPen(rp, 3);              /* Set pen color to index 3 */
RectFill(rp, 10, 10, 100, 50); /* Filled rectangle */
```

### Use Case 7: Tile Map Renderer

Games like *The Settlers*, *Cannon Fodder*, and most platformers render backgrounds from tile maps. Each tile is a 16×16 (or 32×32) block blitted to screen coordinates:

```asm
    ; Render a 20×16 tile map (320×256 screen, 16×16 tiles)
    ; TileMap: array of 320 bytes (20×16), each byte = tile index
    ; TileGfx: tile graphics, 16×16 pixels × 5 planes, interleaved

    lea     $DFF000,a5          ; custom chip base (used by the register writes below)
    lea     TileMap,a0
    lea     Screen,a2
    moveq   #16-1,d7            ; 16 tile rows

.tilerow:
    moveq   #20-1,d6            ; 20 tiles per row

.tilecol:
    moveq   #0,d0
    move.b  (a0)+,d0            ; Get tile index
    mulu    #16*5*2,d0          ; Tile data offset (16 rows × 5 planes × 2 bytes)
    lea     TileGfx,a1
    add.l   d0,a1               ; a1 = tile graphics pointer

    bsr     WaitBlit
    move.l  #$09F00000,$040(a5) ; BLTCON0: A+D, minterm $F0 (straight copy)
    clr.w   $042(a5)            ; BLTCON1
    move.w  #$FFFF,$044(a5)     ; BLTAFWM
    move.w  #$FFFF,$046(a5)     ; BLTALWM
    move.l  a1,$050(a5)         ; BLTAPT = tile data (interleaved)
    move.l  a2,$054(a5)         ; BLTDPT = screen position
    clr.w   $064(a5)            ; BLTAMOD = 0 (tile data is contiguous)
    move.w  #38,$066(a5)        ; BLTDMOD = 40 - 2 = 38 (interleaved screen)
    move.w  #(80<<6)|1,$058(a5) ; BLTSIZE: 80 rows (16×5) × 1 word → GO!

    addq.l  #2,a2               ; Next tile position (1 word right)
    dbf     d6,.tilecol

    ; Move to next tile row: advance screen pointer by 16 scanlines × 5 planes × 40 bytes
    add.l   #16*5*40-40,a2      ; Subtract the 40 bytes already advanced by 20 tiles
    dbf     d7,.tilerow
```
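
The pointer stepping in the loop above corresponds to two simple offset formulas. They are sketched here in C with hypothetical names, assuming the same interleaved layout for both the screen and the tile graphics and tiles that are one word (16 pixels) wide.

```c
#include <stdint.h>

/* Byte offset of tile (tx, ty) on an interleaved screen. */
uint32_t tile_dest_offset(unsigned tx, unsigned ty, unsigned bytes_per_row,
                          unsigned depth, unsigned tile_h)
{
    return (uint32_t)ty * tile_h * depth * bytes_per_row + tx * 2u;
}

/* Byte offset of a tile's graphics: tile_h rows × depth planes × 2 bytes. */
uint32_t tile_gfx_offset(unsigned index, unsigned depth, unsigned tile_h)
{
    return (uint32_t)index * tile_h * depth * 2u;
}
```

With 40 bytes per row, 5 planes and 16-line tiles, moving down one tile row advances 3200 bytes, which matches the `16*5*40-40` adjustment plus the 40 bytes already stepped across the row.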

---

## Good and Bad Patterns

### ✓ Pattern: "Blit and Compute" — Overlap CPU and Blitter Work

```asm
    ; Start a blit, then do CPU work while it runs:
    bsr     SetupAndStartBlit   ; Triggers BLTSIZE write
    bsr     UpdatePlayerPhysics ; CPU work — runs in parallel!
    bsr     ProcessInput        ; More CPU work
    bsr     WaitBlit            ; NOW wait for blit to finish
    bsr     SetupNextBlit       ; Safe to touch registers
```

This is the **entire point** of having a Blitter. Any code that busy-waits immediately after starting a blit wastes the Amiga's key advantage.

### ✗ Antipattern: "The Busy-Wait Hog"

```asm
    ; ✗ BAD: Wait immediately after every blit — wastes CPU cycles
    bsr     StartBlit
.wait1: btst #14,$002(a5)
    bne.s   .wait1              ; CPU does NOTHING while blitter runs
    bsr     StartNextBlit
.wait2: btst #14,$002(a5)
    bne.s   .wait2              ; More wasted time
```

### ✓ Pattern: "Batch Then Wait" — Chain Setup, Single Sync Point

```asm
    ; Process all game logic FIRST:
    bsr     RunAI
    bsr     RunPhysics
    bsr     AnimateFrames
    ; THEN start the rendering blits in sequence:
    bsr     WaitBlit
    bsr     BlitBOB1
    bsr     WaitBlit
    bsr     BlitBOB2
    bsr     WaitBlit
    bsr     BlitBOB3
    ; The CPU-intensive work happened during the previous frame's display time
```

### ✗ Antipattern: "The Single-Plane-At-A-Time"

```asm
    ; ✗ BAD: Blit each bitplane separately (5× setup overhead)
    lea     Plane0,a0
    bsr     BlitBOBOnePlane
    lea     Plane1,a0
    bsr     BlitBOBOnePlane
    lea     Plane2,a0
    bsr     BlitBOBOnePlane
    lea     Plane3,a0
    bsr     BlitBOBOnePlane
    lea     Plane4,a0
    bsr     BlitBOBOnePlane     ; 5 blits, 5 WaitBlit calls, 5× register setup
```

```asm
    ; ✓ GOOD: Use interleaved bitplanes — ONE blit for all planes
    bsr     BlitBOBInterleaved  ; 1 blit, 1 WaitBlit, 1× register setup
```

### ✗ Antipattern: "System-Unfriendly Direct Access"

```c
/* ✗ BAD: Hit blitter registers directly from a Workbench app */
custom.bltcon0 = 0x09F0;
/* The OS may be using the blitter RIGHT NOW for window operations */
```

```c
/* ✓ GOOD: Use OwnBlitter/DisownBlitter for exclusive access */
OwnBlitter();           /* Wait for and lock the blitter */
WaitBlit();              /* Ensure previous blit is done */
/* ... safe to program registers directly ... */
DisownBlitter();         /* Release for OS use */
```

### ✗ Antipattern: "Hardcoded 320-Pixel Modulo"

```asm
    ; ✗ BAD: Assumes screen width is always 320 pixels (modulo = 40 - blit_width*2)
    move.w  #36,$066(a5)        ; BLTDMOD = 36 (hardcoded for 320px)
```

Many Amiga programs run on PAL overscan (352 or 384 pixels), productivity modes (640+), or RTG screens. Always calculate modulo from the actual screen byte width:

```asm
    ; ✓ GOOD: Calculate modulo from actual bitmap width
    move.w  ScreenBytesPerRow,d0
    sub.w   BlitWidthBytes,d0
    move.w  d0,$066(a5)         ; BLTDMOD = dynamic
```
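
The same calculation in C, as a minimal sketch. The helper name `blit_modulo` is hypothetical; in OS-friendly code the row width would typically come from the `BytesPerRow` field of `struct BitMap`, which is assumed here rather than shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: modulo (in bytes) for a blit `blit_words` wide into a
 * bitmap whose rows are `bytes_per_row` long. Both sizes are even because
 * the Blitter works in 16-bit words. */
static int16_t blit_modulo(unsigned bytes_per_row, unsigned blit_words)
{
    unsigned blit_bytes = blit_words * 2;
    assert((bytes_per_row & 1) == 0);
    assert(blit_bytes <= bytes_per_row);
    return (int16_t)(bytes_per_row - blit_bytes);
}
```

A 2-word blit gives 36 on a 320-pixel row (40 bytes), 40 on a 352-pixel overscan row (44 bytes), and 76 on a 640-pixel row (80 bytes), which is why the hardcoded 36 only works in the first case.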

### ✗ Antipattern: "Ignoring the DMA Budget"

The Blitter shares the DMA bus with display, audio, and disk. In high-bandwidth display modes, there are fewer free DMA slots:

| Display Mode | DMA Slots Used by Display | Remaining for Blitter | Effect |
|---|---|---|---|
| Lores 320×256 × 5 planes | ~100 per line | ~126 per line | Full blitter speed |
| Hires 640×256 × 4 planes | ~160 per line | ~66 per line | Blitter runs at ~50% speed |
| Super Hires 1280 × 4 planes | ~200+ per line | ~26 per line | Blitter barely runs |
| HAM8 (AGA) | ~200 per line | ~26 per line | Blitter barely runs |

**Rule of thumb**: if your game stutters in hires modes, it's probably DMA contention, not CPU speed.

---

## Practical Limitations

| Limitation | Detail | Workaround |
|---|---|---|
| **Max blit size (OCS)** | 1024 lines × 64 words (1024×1024 pixels) via BLTSIZE | Split into multiple blits |
| **Max blit size (ECS/AGA)** | 32768 lines × 2048 words (BLTSIZV/BLTSIZH) | Rarely a practical issue |
| **Word alignment** | Blitter operates on 16-bit word boundaries only | Use barrel shift + masks for sub-word positioning; costs 1 extra word of width |
| **No scaling** | Cannot scale or rotate — purely rectangular block ops | Use CPU for affine transforms, then blit the result |
| **No clipping** | Blitter will happily write outside the screen bitmap | Implement clipping in software before setting up the blit |
| **Single operation at a time** | Only one blit can run at a time — no queue | Pipeline setup: compute next blit's parameters on CPU while current blit runs |
| **Chip RAM only** | All 4 channels must point to Chip RAM | Use `MEMF_CHIP` for all blitter-visible allocations; see [memory_types.md](../../01_hardware/common/memory_types.md) |
| **Fill carry direction** | Fill mode only works right-to-left (descending) | Always use DESC=1 with fill; set pointers to the end of the data |
| **No transparency levels** | Boolean operations only — 1-bit masking, no alpha | Dithering or multiple passes for graduated transparency |
| **Line mode limitations** | Lines drawn with SING=1 for fill prep plot only one pixel per row, leaving visible gaps on shallow (near-horizontal) lines | Use non-SING mode for visible lines, SING only for fill boundaries |
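
The OCS size limit and the "split into multiple blits" workaround can be sketched as follows. The helper names are hypothetical; the code assumes the usual BLTSIZE convention that the maximum values do not fit their bit fields and are written as 0.

```c
#include <assert.h>
#include <stdint.h>

#define OCS_MAX_LINES 1024u
#define OCS_MAX_WORDS 64u

/* Encode height/width into an OCS BLTSIZE word. The maximum values wrap:
 * 1024 lines is written as 0 in bits 15-6, 64 words as 0 in bits 5-0. */
static uint16_t ocs_bltsize(unsigned lines, unsigned words)
{
    assert(lines >= 1 && lines <= OCS_MAX_LINES);
    assert(words >= 1 && words <= OCS_MAX_WORDS);
    return (uint16_t)(((lines & 0x3FFu) << 6) | (words & 0x3Fu));
}

/* Number of blits needed to cover `lines` rows when splitting vertically. */
static unsigned ocs_blits_needed(unsigned lines)
{
    return (lines + OCS_MAX_LINES - 1) / OCS_MAX_LINES;
}
```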

---

## Performance Analysis

### DMA Cycle Costs

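The Blitter's cost per word depends on which channels are active. The pipeline needs at least 2 DMA cycles per word, so a D-only clear runs no faster than an A→D copy; adding more channels raises that to three or four cycles (cycle sequences as tabulated in the HRM, assuming no contention from display DMA):

| Channels Active | Cycles/Word | Example Operation | Time for 320×256 (1 plane) |
|---|---|---|---|
| D only | 2 cycles (1 bus access + 1 idle) | Screen clear | ~2.9 ms |
| A + D | 2 cycles | Simple copy (A→D) | ~2.9 ms |
| A + B + D | 3 cycles | Masked copy | ~4.3 ms |
| A + B + C + D | 4 cycles | Cookie-cut blit | ~5.7 ms |

> At the 3.58 MHz DMA clock, 1 cycle ≈ 280 ns, and one 320×256 plane is 20 × 256 = 5,120 words. A full 320×256×5-plane screen clear therefore takes ~14 ms (5 × 2.9 ms), most of a 20 ms PAL frame.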
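
The arithmetic behind such estimates is simply words × cycles per word × cycle time. A minimal sketch, with a hypothetical helper name, the approximate 280 ns cycle time, and no allowance for setup overhead or display DMA contention:

```c
#include <assert.h>
#include <stdint.h>

#define DMA_CYCLE_NS 280u  /* approximate length of one 3.58 MHz DMA cycle */

/* Hypothetical estimator: best-case blit time in nanoseconds. */
static uint64_t blit_time_ns(uint32_t width_words, uint32_t lines,
                             uint32_t cycles_per_word)
{
    assert(cycles_per_word >= 1 && cycles_per_word <= 4);
    return (uint64_t)width_words * lines * cycles_per_word * DMA_CYCLE_NS;
}
```

One 320×256 plane is 20 words × 256 lines, so `blit_time_ns(20, 256, 4)` comes to 5,734,400 ns, or about 5.7 ms for a four-channel blit.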

### CPU vs. Blitter Crossover

The Blitter is not always faster than the 68000:

| Operation Size | Winner | Why |
|---|---|---|
| < ~40 words | CPU (68000) | Blitter setup overhead (roughly a dozen register writes plus the BBUSY check) exceeds the DMA savings |
| 40–200 words | Tie | Depends on whether CPU needs the bus |
| > 200 words | Blitter | DMA runs independently; CPU can compute in parallel |
| Any size (A1200) | **Measure** | 68020 can access 32-bit Fast RAM while Blitter uses Chip RAM bus — often faster to do both |

### Nasty Mode (`BLTPRI`)

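Setting bit 10 of DMACON (`BLTPRI`) lets the Blitter keep every Chip RAM cycle it can use instead of yielding to the CPU (without it, the Blitter gives up a cycle once the CPU has been held off for three in a row). Any CPU access to Chip RAM or the custom registers is **stalled** until the Blitter leaves a free cycle or the blit completes. This maximizes blitter throughput but:
- Delays interrupt servicing while the CPU is locked off the chip bus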
- Breaks timing-sensitive code (audio, serial)
- Most professional software avoids it; demos use it freely
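
DMACON is a set/clear register, so nasty mode is switched with bit 15 selecting the direction. A minimal sketch with locally defined constants (the NDK's `hardware/dmabits.h` offers equivalents named `DMAF_SETCLR` and `DMAF_BLITHOG`); the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define DMACON_SETCLR 0x8000u  /* bit 15: 1 = set the listed bits, 0 = clear them */
#define DMACON_BLTPRI 0x0400u  /* bit 10: blitter priority ("nasty" mode) */

/* Value to write to DMACON to switch nasty mode on (1) or off (0). */
static uint16_t dmacon_nasty(int enable)
{
    assert(enable == 0 || enable == 1);
    return (uint16_t)(enable ? (DMACON_SETCLR | DMACON_BLTPRI) : DMACON_BLTPRI);
}
```

Writing `$8400` turns nasty mode on and `$0400` turns it off again; other DMACON bits are left untouched in both cases.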

---

## When to Use / When NOT to Use

### When to Use the Blitter

- **Screen clearing** — a D-only blit needs no source fetches and runs while the CPU works on something else
- **BOB/sprite compositing** — cookie-cut blit is the standard technique for all Amiga game objects
- **Scrolling** — overlapping copy with correct ascending/descending mode
- **Polygon filling** — exclusive/inclusive fill after boundary line drawing
- **Large memory copies** — any block > ~40 words benefits from DMA parallelism
- **Line drawing** — hardware Bresenham is faster than any software implementation on 68000

### When NOT to Use

- **Small copies (< 40 words)** — 68000 `MOVEM` or `MOVE.L` loop is faster due to blitter setup overhead
- **Fast RAM operations** — the Blitter cannot access Fast RAM at all; use CPU
- **Pixel-level operations** — the Blitter works on word-aligned rectangles; per-pixel logic requires CPU
- **A1200/A4000 with Fast RAM** — the 68020/030 running from 32-bit Fast RAM can often outperform the Blitter on Chip RAM, especially if you can overlap CPU work with display DMA

### Applicability Ranges

- **BOBs**: Practical limit ~15–20 per frame at 320×256×5 planes before exhausting DMA bandwidth
- **Fill mode**: Works on single bitplanes only — filling a 5-plane display requires 5 passes
- **Line mode**: Maximum line length limited by the blit height field (1024 via BLTSIZE on OCS; 32768 via BLTSIZV on ECS/AGA)

---

## Historical Context — The 1985 Competitive Landscape

The Blitter was architecturally unprecedented in 1985. No competing home computer shipped with a comparable 2D rasterization coprocessor:

| Feature | Amiga (1985) | Atari ST (1985) | PC EGA (1984) | Mac 128K (1984) | C64 (1982) |
|---|---|---|---|---|---|
| **Hardware blitter** | Yes — 4-channel DMA with minterm logic | No (added in Mega ST/STE, 1987 — 1 source only) | No | No | No |
| **Channels** | 3 source + 1 dest | 1 source + 1 dest (STE) | — | — | — |
| **Boolean ops** | 256 minterms (arbitrary 3-input logic) | 16 logic ops (STE) | — | — | — |
| **Line drawing** | Hardware Bresenham | No | No | No | No |
| **Area fill** | Hardware inclusive/exclusive fill | No | No | No | No |
| **Shift/mask** | Per-channel barrel shift + first/last word masks | Shift + endmask (STE) | — | — | — |
| **CPU relief** | Full DMA — CPU free during blit | Partial — CPU still involved (STE) | CPU does everything | CPU does everything | CPU does everything |

### Pros (in 1985 context)

- **Parallelism**: The 68000 could execute game logic while the Blitter handled all rendering — this was the Amiga's key advantage over every competitor
- **Generality**: 256 minterm combinations meant any Boolean compositing operation was a single register write, not a software loop
- **Integration**: Shared DMA bus with Copper and sprites meant the entire display pipeline was hardware-driven
- **Line + fill in hardware**: Enabled real-time filled polygon rendering (used in games like Carrier Command, Starglider 2) without spending CPU time on rasterization, which competing platforms had to do in software

### Cons (in 1985 context)

- **Chip RAM only**: All blitter-visible data had to live in the first 512 KB (later 1–2 MB), competing with screen memory, audio, and disk buffers
- **Word alignment**: Sub-word (per-pixel) positioning required shift + extra word width + masking, a complex setup for simple operations
- **No scaling/rotation**: Purely rectangular block operations; affine transforms required CPU
- **DMA contention**: Heavy blitter use starved the CPU of bus cycles even without nasty mode

---

## Modern Analogies

| Amiga Blitter Concept | Modern Equivalent | Notes |
|---|---|---|
| 4-channel minterm blit | GPU framebuffer logic ops (Vulkan `VkLogicOp`) | The minterm is a fixed-function Boolean combine of three inputs; Vulkan's logic ops cover only the 16 two-input source/destination functions, but the idea of merging sources through a truth table is the same |
| Cookie-cut (A·B + ¬A·C) | Alpha compositing / Porter-Duff `SrcOver` | The Amiga used 1-bit masks; modern systems use 8-bit alpha channels, but the compositing algebra is the same |
| DMA-driven blit | `vkCmdCopyImage` / `MTLBlitCommandEncoder` | Modern GPUs have dedicated DMA/copy engines that run asynchronously, exactly like the Blitter ran independently of the 68000 |
| OwnBlitter/DisownBlitter | Vulkan queue submission / Metal command buffer | Exclusive access to a shared hardware resource, then release — the synchronization pattern is identical |
| BLTPRI (nasty mode) | GPU preemption priority | Giving the transfer engine absolute bus priority at the cost of starving other consumers |
| Fill mode | GPU rasterizer fill | Hardware polygon fill is now done by the rasterizer stage; the Blitter's XOR-toggle fill was a clever 1985 approximation |
| BLTSIZE triggers blit | Command buffer submission | Writing the final register starts execution — analogous to `vkQueueSubmit` or `[commandBuffer commit]` |
| Barrel shift + word masks | Texture sampling with sub-texel offset | Placing data at an arbitrary pixel offset in word-addressed memory through hardware shift and masking |
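
The cookie-cut expression in the table can be turned into its minterm byte mechanically by evaluating the function for all eight input combinations. A minimal sketch; `minterm_of`, `cookie_cut`, and `copy_a` are hypothetical names, and the bit order assumed is the standard one where bit 7 corresponds to A·B·C and bit 0 to ¬A·¬B·¬C.

```c
#include <assert.h>
#include <stdint.h>

typedef int (*bool3_fn)(int a, int b, int c);

/* Build the 8-bit minterm (low byte of BLTCON0) from a 3-input function.
 * Bit (a<<2 | b<<1 | c) of the result is the output for that combination. */
static uint8_t minterm_of(bool3_fn f)
{
    uint8_t lf = 0;
    assert(f != 0);
    for (int i = 0; i < 8; i++) {
        int a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        if (f(a, b, c))
            lf |= (uint8_t)(1u << i);
    }
    return lf;
}

/* Cookie-cut: where the mask A is set take B, elsewhere keep C. */
static int cookie_cut(int a, int b, int c) { return (a && b) || (!a && c); }

/* Straight copy: D = A. */
static int copy_a(int a, int b, int c) { (void)b; (void)c; return a; }
```

`minterm_of(cookie_cut)` evaluates to `$CA` and `minterm_of(copy_a)` to `$F0`, the minterm byte inside the `$09F0` BLTCON0 value used for A→D copies.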

---

## Pitfalls & Common Mistakes

### Pitfall 1: "The Silent Corruption" — Fast RAM Pointers

```asm
    ; ✗ BAD: Buffer allocated in Fast RAM
    move.l  #FastRAMBuffer,$054(a5)  ; BLTDPT points to Fast RAM
    move.w  #(256<<6)|20,$058(a5)   ; Blit runs... but writes garbage
```

The Blitter's DMA engine is wired to the Chip RAM bus only. Fast RAM addresses silently alias to Chip RAM addresses or produce random data. **There is no error signal** — the blit completes "successfully" with corrupt output.

```asm
    ; ✓ GOOD: Buffer in Chip RAM
    move.l  #ChipRAMBuffer,$054(a5) ; Allocated with MEMF_CHIP
```

### Pitfall 2: "The Race Condition" — Missing WaitBlit

```asm
    ; ✗ BAD: Start a new blit without waiting for previous one
    move.l  #$09F00000,$040(a5)  ; Overwrite BLTCON0 while previous blit runs!
    move.l  #NewSource,$050(a5)  ; Corrupt the in-progress blit
    move.w  #(64<<6)|4,$058(a5)  ; Start another blit — undefined behavior
```

Modifying blitter registers while a blit is in progress produces unpredictable results — partial data, corrupted pointers, or system crashes.

```asm
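    ; ✓ GOOD: Always wait
    tst.w   $002(a5)             ; Dummy DMACONR read: early Agnus can report a stale BBUSY
.bwait:
    btst    #14,$002(a5)         ; Test BBUSY in DMACONR (byte access: bit 14 mod 8 = bit 6 of the high byte)
    bne.s   .bwait
    ; Now safe to set up the next blit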
```

### Pitfall 3: "The Wrong Direction" — Overlapping Copy Corruption

```asm
    ; ✗ BAD: Moving data to a higher address (dest > source) with ascending mode
    ; Source at offset 0, dest at offset 2: ascending overwrites source words
    ; before they are read, producing smeared garbage
```

```asm
    ; ✓ GOOD: Use descending mode when dest > source and the regions overlap
    move.w  #$0002,$042(a5)      ; BLTCON1: DESC=1
    ; Set pointers to LAST word of block, not first
```
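
A sketch of the direction choice for overlapping copies, following the HRM rule that descending mode is needed when the destination overlaps the source at a higher address. The helper names are hypothetical, and the block is treated as one contiguous run of bytes (for a rectangular blit, `bytes` would have to span the rows including their modulo).

```c
#include <assert.h>
#include <stdint.h>

#define BLTCON1_DESC 0x0002u

/* Hypothetical helper: returns the BLTCON1 DESC bit if a copy of `bytes`
 * bytes from `src` to `dst` must run in descending mode, else 0.
 * Descending is needed only when the destination starts inside the source
 * region at a higher address; every other case is safe in ascending mode. */
static uint16_t copy_direction(uint32_t src, uint32_t dst, uint32_t bytes)
{
    assert(bytes >= 2 && (bytes & 1) == 0);
    return (uint16_t)((dst > src && dst < src + bytes) ? BLTCON1_DESC : 0);
}

/* In descending mode the pointers start at the LAST word of the block. */
static uint32_t start_pointer(uint32_t base, uint32_t bytes, uint16_t desc)
{
    return desc ? base + bytes - 2 : base;
}
```

Copying 40 bytes from `$1000` to `$1002` selects descending mode with both pointers starting 38 bytes into their blocks; copying from `$1002` down to `$1000` stays in ascending mode.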

### Pitfall 4: "The Off-By-One Word" — Forgetting Shift Width Expansion

```asm
    ; ✗ BAD: 32-pixel wide blit at non-aligned X — width still set to 2 words
    ; Shifted data spills into adjacent word, corrupting neighboring pixels
    move.w  #(16<<6)|2,$058(a5)  ; Only 2 words wide — but shift needs 3!
```

```asm
    ; ✓ GOOD: Add 1 word when shift > 0
    move.w  #$0000,$046(a5)      ; BLTALWM = 0: the extra word carries no A data of its own (typical for a BOB mask)
    move.w  #(16<<6)|3,$058(a5)  ; 3 words: 2 for data + 1 for shift overflow (BLTSIZE written last)
```
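
The width expansion can be derived from the X position. A minimal sketch, assuming an image whose width is a multiple of 16 pixels; the struct and helper names are hypothetical. Many BOB routines simply always blit the extra word so that one code path handles every X position; this sketch adds it only when a shift is actually needed.

```c
#include <assert.h>
#include <stdint.h>

struct blit_span {
    uint16_t shift;        /* 0-15, goes into BLTCON0 bits 15-12 (A shift) */
    uint16_t width_words;  /* width field for BLTSIZE */
    uint16_t byte_offset;  /* offset of the first destination word in the row */
};

/* Hypothetical helper: place a `width_px`-wide image at pixel column `x`. */
static struct blit_span span_for_x(unsigned x, unsigned width_px)
{
    struct blit_span s;
    assert(width_px > 0 && width_px % 16 == 0);
    s.shift = (uint16_t)(x & 15);
    s.width_words = (uint16_t)(width_px / 16 + (s.shift ? 1 : 0));
    s.byte_offset = (uint16_t)((x >> 4) * 2);
    return s;
}
```

A 32-pixel image at X=35 needs a shift of 3 and a 3-word blit starting 4 bytes into the row; at X=32 the shift is 0 and 2 words suffice.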

### Pitfall 5: "The Stale Pointer" — Reusing Registers After a Blit

After a blit completes, all pointer registers have advanced to the **end** of the data. A second blit with the same pointers starts where the first one left off — not at the original position.

```asm
    ; ✓ GOOD: Always reload all pointers before each blit
    move.l  #SourceAddr,$050(a5) ; Reload BLTAPT
    move.l  #DestAddr,$054(a5)   ; Reload BLTDPT
```
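
To see how far a pointer has moved, the end position of an ascending blit can be estimated as below. This is a sketch under the assumption that the modulo is added after every row, including the last one; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Where an ascending blit leaves a channel pointer, assuming the modulo is
 * applied after every row including the final one. */
static uint32_t pointer_after_blit(uint32_t start, unsigned width_words,
                                   unsigned lines, int16_t modulo)
{
    int32_t stride = (int32_t)(width_words * 2) + modulo; /* bytes per row */
    assert(width_words >= 1 && lines >= 1);
    return (uint32_t)((int64_t)start + (int64_t)lines * stride);
}
```

A 2-word, 16-line blit with a modulo of 36 that starts at `$1000` leaves its pointer at `$1280`, 16 rows further down the bitmap.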

---

## Impact on FPGA/Emulation

The Blitter is one of the most complex subsystems to reproduce accurately in an FPGA core:

- **DMA slot timing**: The Blitter shares DMA slots with bitplane, sprite, Copper, disk, and audio DMA. Incorrect slot allocation produces visible glitches in demos that count cycles
- **Barrel shifter pipeline**: The A and B channel shifts operate on a word pipeline — off-by-one in the shift register produces 1-pixel horizontal offset errors visible in scrolling
- **Fill mode carry propagation**: The fill carry bit (`FCI`) must propagate correctly from right to left within each word and across word boundaries; errors produce "zebra stripe" artifacts
- **Line mode octant handling**: The Bresenham algorithm implementation requires precise handling of 8 octants with correct sign and direction — many emulators get diagonal lines wrong in edge cases
- **BLTSIZE write-trigger**: The blit must start on the exact cycle that BLTSIZE is written, not one cycle later; demos that chain blits back-to-back depend on this timing
- **Nasty mode interaction**: `BLTPRI` must correctly freeze the CPU *and* still allow DMA from other sources (Copper, bitplanes) — freezing everything breaks display output
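
The fill-carry behaviour in the list above can be modelled per word as follows. This is a functional software sketch for checking a core's output, not a cycle-accurate model; the struct and function names are hypothetical. It processes bits right to left, toggles the carry on every set input bit, and produces `input OR carry` for inclusive fill and `input XOR carry` for exclusive fill.

```c
#include <assert.h>
#include <stdint.h>

struct fill_result {
    uint16_t out;  /* filled word */
    int carry;     /* carry to pass into the next word to the left */
};

/* Model of area fill on one 16-bit word, bit 0 (rightmost) first.
 * carry_in is FCI for the first word of a row, then the previous carry.
 * inclusive != 0 models inclusive fill, inclusive == 0 exclusive fill. */
static struct fill_result fill_word(uint16_t in, int carry_in, int inclusive)
{
    struct fill_result r = { 0, carry_in };
    assert(carry_in == 0 || carry_in == 1);
    for (int bit = 0; bit < 16; bit++) {
        int d = (in >> bit) & 1;
        int o = inclusive ? (d | r.carry) : (d ^ r.carry);
        r.out |= (uint16_t)(o << bit);
        r.carry ^= d;   /* every set input bit toggles the carry */
    }
    return r;
}
```

With boundary bits at positions 2 and 5 (`$0024`), inclusive fill gives `$003C` and exclusive fill gives `$001C`. A single boundary bit such as `$0100` fills to the left edge of the word and hands a carry of 1 to the next word.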

---

## Real-World Software Usage

| Software | Blitter Usage | Notes |
|---|---|---|
| **Deluxe Paint** | Brush compositing, flood fill, line tools | Canonical use of BltBitMap + BltMaskBitMapRastPort through the OS |
| **Shadow of the Beast** | Multi-layer parallax scrolling | Custom blitter routines for layer compositing, bypasses OS |
| **Carrier Command** | Filled polygon 3D rendering | Blitter line draw + fill mode for real-time vector graphics |
| **Lemmings** | Terrain destruction, character compositing | Cookie-cut blits for each lemming; XOR blits for terrain modification |
| **Workbench** | Window dragging, icon rendering, menu drawing | All through graphics.library — system-friendly blitter usage |
| **Demo scene** | Virtually everything | Chunky-to-planar conversion, texture mapping, copper+blitter co-programming |

---

## Best Practices

1. **Always call `WaitBlit()` or poll BBUSY before touching any blitter register**
2. **Write BLTSIZE last** — it triggers the blit; all other registers must be configured first
3. **Use `OwnBlitter()`/`DisownBlitter()`** for system-friendly code — never assume you have exclusive access
4. **Disable unused channels** — fewer channels = fewer DMA cycles = faster blit
5. **Set BLTAFWM and BLTALWM to `$FFFF`** for word-aligned blits — forgetting this produces partial-word masking bugs
6. **Account for shift width expansion** — non-aligned blits are 1 word wider than you expect
7. **Choose ascending/descending correctly** for overlapping copies — test both scroll directions
8. **Interleave CPU work with blitter operations** — the whole point of DMA is parallelism; don't busy-wait when you could be computing
9. **Profile before choosing Blitter vs CPU** — on accelerated Amigas, the 68020+ with Fast RAM often wins

---

## References

- HRM: *Amiga Hardware Reference Manual* — Blitter chapter (complete register descriptions and timing)
- NDK 3.9: `hardware/blit.h`, `hardware/custom.h`, `graphics/gfx.h`
- ADCD 2.1: Hardware Manual — [Blitter chapter](http://amigadev.elowar.com/read/ADCD_2.1/Hardware_Manual_guide/node006D.html)
- See also: [blitter.md](../../01_hardware/ocs_a500/blitter.md) — hardware register reference
- See also: [animation.md](../animation.md) — GEL system (BOBs use the Blitter internally)
- See also: [copper.md](../copper/copper.md) — Copper coprocessor (often co-programmed with the Blitter)
- See also: [rastport.md](../rastport.md) — RastPort drawing context (uses Blitter for all draw operations)
- See also: [display_modes.md](../display_modes.md) — DMA slot budget (Blitter competes for bus bandwidth)
- See also: [Akiko — CD32 C2P](../../01_hardware/aga_a1200_a4000/akiko_cd32.md) — hardware Chunky-to-Planar conversion (CD32 alternative to CPU/Blitter C2P)
- **Scoopex Amiga Hardware Programming** (Photon) — [YouTube: Blitter episodes](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Video walkthroughs of Blitter setup, cookie-cut masking, line draw, and fill mode. Companion articles: [coppershade.org](http://coppershade.org/articles/)