The **[Blitter](../01_hardware/ocs_a500/blitter.md)** (Block Image Transferrer) is a DMA coprocessor inside the Agnus chip that performs raster operations on rectangular memory blocks at bus speed — **without CPU involvement**. While the 68000 executes game logic, physics, or AI, the Blitter simultaneously clears screens, copies bitmap regions, composites masked sprites ("cookie-cut"), draws lines, and fills polygons. This parallelism is fundamental to why the Amiga could deliver arcade-quality 2D graphics on a 7 MHz processor with 512 KB of RAM.
The Blitter operates on up to **4 DMA channels** (A, B, C → D) using a programmable **8-bit minterm** truth table that encodes any Boolean function of three inputs. Combined with per-channel shift, modulo, and first/last word masking, this makes the Blitter a general-purpose 2D rasterization engine — not merely a memory copier.
> The Blitter can **only** access Chip RAM. Pointing any channel register at Fast RAM causes silent data corruption or system crashes. Always allocate blitter-visible memory with `AllocMem(size, MEMF_CHIP)`.
The Blitter sits inside **Agnus** (OCS/ECS) or **Alice** (AGA), sharing the DMA bus with the Copper, bitplane fetches, sprite DMA, disk, and audio. It accesses memory through 4 independent DMA channels, each with its own pointer and modulo register:
```mermaid
graph LR
    subgraph "Agnus / Alice"
        A["Channel A<br/>(mask/pattern)"] --> ML["Minterm Logic<br/>(8-bit truth table)"]
    end
```
The **Minterm Logic** block is the Blitter's core innovation. It takes the current bit from channels A, B, and C (three Boolean inputs) and produces one output bit for channel D according to a programmable **8-bit truth table** stored in BLTCON0 bits 7–0. Since 3 inputs have 8 possible combinations (2³), the 8-bit minterm encodes **any** Boolean function of three variables — that's 256 possible logic operations in a single register write. This is what lets one piece of hardware do copies (`D=A`, minterm `$F0`), clears (`D=0`, minterm `$00`), cookie-cut compositing (`D=A·B+¬A·C`, minterm `$CA`), XOR highlighting (`D=A⊕C`, minterm `$5A`), and any other combination — all without changing hardware, just the 8-bit minterm value. See [Minterm Logic](#minterm-logic) below for the full truth table and common values.
Each channel reads (or writes, for D) from a different memory pointer with independent modulo, allowing operations on sub-rectangles within larger bitmaps. **Writing to `BLTSIZE` ($DFF058) starts the blit immediately** — always configure all other registers first.
### Channel Roles
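| Channel | DMA Direction | Typical Role | Has Shift? | Has Mask? |
|---|---|---|---|---|
| **A** | Read | Mask / pattern | Yes (ASH, 0–15 px) | Yes (BLTAFWM / BLTALWM) |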
| **B** | Read | Source image data | Yes (BSH, 0–15 px) | No |
| **C** | Read | Background / destination read-back | No | No |
| **D** | Write | Output destination | No | No |
> [!NOTE]
> Any channel can be disabled per operation via BLTCON0 bits 11–8 (USEA/B/C/D). Disabling unused channels **saves DMA cycles** — a D-only clear (1 channel) runs 4× faster than a full ABCD blit.
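
The channel-enable bits combine with the A shift and the minterm into a single BLTCON0 word. The following is a minimal sketch of that packing; `make_bltcon0` and the `USEx` macro names are hypothetical helpers for illustration, not part of any Amiga header.

```c
#include <stdint.h>

/* BLTCON0 channel-enable bits (bits 11-8), as described in the note above. */
#define USEA 0x0800u
#define USEB 0x0400u
#define USEC 0x0200u
#define USED 0x0100u

/* Pack BLTCON0: A shift in bits 15-12, channel enables in bits 11-8,
 * minterm in bits 7-0. Hypothetical helper, for illustration only. */
uint16_t make_bltcon0(unsigned ash, uint16_t use_mask, uint8_t minterm)
{
    return (uint16_t)(((ash & 15u) << 12) | (use_mask & 0x0F00u) | minterm);
}
```

With these definitions, a D-only clear packs to `$0100` and an A→D copy to `$09F0`, the same constants that appear in the assembly listings in this document.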
### CPU / Blitter Bus Interaction
The Blitter and the 68000 CPU share the **Chip RAM bus** — they cannot access it simultaneously. Agnus arbitrates access on a cycle-by-cycle basis:
- **Without `BLTPRI`**: The Blitter gets every other free DMA slot. The CPU gets the remaining slots. Both run at roughly half speed on the Chip RAM bus.
- **With `BLTPRI` (nasty mode)**: The Blitter takes **all** free DMA slots. The CPU is completely frozen on any Chip RAM access until the blit completes. The CPU can still execute from Fast RAM or ROM — but any Chip RAM read/write stalls.
- **Display DMA always wins**: Bitplane, sprite, and audio DMA have fixed priority above the Blitter. In high-resolution modes, display DMA alone consumes most of the bus, leaving few slots for blitter operations.
This creates the key optimization opportunity on accelerated Amigas (A1200, A3000, A4000): **the CPU can execute code and access Fast RAM at full speed while the Blitter simultaneously works on Chip RAM**. On a stock A500 with only Chip RAM, the CPU and Blitter always contend for the same bus.
> [!IMPORTANT]
> There is no hardware error when pointing blitter registers at Fast RAM addresses. Agnus implements only as many pointer bits as it has Chip RAM address lines (512 KB to 2 MB, depending on the revision), so the upper bits are ignored and the address wraps into Chip RAM space, producing silent data corruption at an unintended Chip RAM location.
## Minterm Logic
The minterm is an **8-bit value** stored in BLTCON0 (bits 7–0) that tells the Blitter what to do with each pixel. Think of it as a tiny program: for every pixel position, the Blitter reads the current bit from channels A, B, and C, looks up the answer in the minterm, and writes that answer to channel D (destination memory).
Since there are 3 inputs (A, B, C), each either 0 or 1, there are exactly **8 possible input combinations**. The 8-bit minterm has one bit for each combination — that bit decides whether the output pixel is on (1) or off (0):
| Minterm Bit | Input A (mask) | Input B (source) | Input C (background) | "If these inputs look like this…" |
|---|---|---|---|---|
| Bit 7 | 1 | 1 | 1 | …mask on, source on, background on |
| Bit 6 | 1 | 1 | 0 | …mask on, source on, background off |
| Bit 5 | 1 | 0 | 1 | …mask on, source off, background on |
| Bit 4 | 1 | 0 | 0 | …mask on, source off, background off |
| Bit 3 | 0 | 1 | 1 | …mask off, source on, background on |
| Bit 2 | 0 | 1 | 0 | …mask off, source on, background off |
| Bit 1 | 0 | 0 | 1 | …mask off, source off, background on |
| Bit 0 | 0 | 0 | 0 | …mask off, source off, background off |
Each bit is a simple yes/no: **"should the output pixel be on for this combination?"**
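
The table lookup can be modelled in a few lines of C. This is a software sketch of the logic only (the real hardware processes a whole word in parallel); `blit_logic` is a hypothetical name.

```c
#include <stdint.h>

/* Evaluate an 8-bit minterm on three 16-bit words, one bit at a time.
 * For each bit position, the A/B/C bits form a 3-bit row index into
 * the truth table; the minterm bit at that index is the output. */
uint16_t blit_logic(uint8_t minterm, uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t d = 0;
    for (int bit = 0; bit < 16; bit++) {
        unsigned ia = (a >> bit) & 1u;
        unsigned ib = (b >> bit) & 1u;
        unsigned ic = (c >> bit) & 1u;
        unsigned index = (ia << 2) | (ib << 1) | ic;   /* table row 0-7 */
        if ((minterm >> index) & 1u)
            d |= (uint16_t)(1u << bit);
    }
    return d;
}
```

Feeding the patterns `a = 0xF0F0`, `b = 0xCCCC`, `c = 0xAAAA` exercises every row of the table, so the result is simply the minterm byte repeated in both halves of the word.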
### Worked Example: Cookie-Cut (`$CA`)
The most important minterm is `$CA` — the cookie-cut blit used for sprite compositing. In binary, `$CA` = `11001010`. Let's read each bit:
| Bit | A (mask) | B (source) | C (background) | `$CA` bit value | Output pixel | Why |
|---|---|---|---|---|---|---|
| 7 | on | on | on | **1** | **on** | Inside the shape, source pixel is on → show it |
| 6 | on | on | off | **1** | **on** | Inside the shape, source pixel is on → show it |
| 5 | on | off | on | **0** | **off** | Inside the shape, source pixel is off → show it (it's dark) |
| 4 | on | off | off | **0** | **off** | Inside the shape, source pixel is off → show it |
| 3 | off | on | on | **1** | **on** | Outside the shape → keep background (it's on) |
| 2 | off | on | off | **0** | **off** | Outside the shape → keep background (it's off) |
| 1 | off | off | on | **1** | **on** | Outside the shape → keep background (it's on) |
| 0 | off | off | off | **0** | **off** | Outside the shape → keep background (it's off) |
The pattern: **where the mask (A) is set → take the source pixel (B). Where the mask is clear → keep the background pixel (C).** That's a sprite draw with transparency — exactly what every Amiga game uses.
### Common Minterms
| Minterm | Hex | Operation | Description | Real-World Use Case |
|---|---|---|---|---|
| `D = A` | `$F0` | Copy A | Output is a copy of channel A — every A-set pixel appears in D | **Block copy**: duplicate a screen region, copy a font glyph to the display |
| `D = B` | `$CC` | Copy B | Output is a copy of channel B regardless of A and C | **Shifted copy**: B has a barrel shift, so this copies with pixel-level repositioning |
| `D = C` | `$AA` | Copy C | Output is a copy of the destination read-back | **No-op / readback**: useful for fill mode where C→D with fill carry toggling |
| `D = A·B + ¬A·C` | `$CA` | Cookie-cut | Where mask (A) is 1: show source (B). Where mask is 0: show background (C) | **Sprite compositing**: draw a player character with transparency onto the game world |
| `D = 0` | `$00` | Clear | Output is always 0 regardless of inputs | **Screen clear**: zero out a bitplane, erase a region |
| `D = $FFFF` | `$FF` | Set all | Output is always 1 | **Fill with 1s**: set all pixels in a region (useful for masks) |
| `D = A XOR C` | `$5A` | XOR | Output toggles wherever A has a set bit | **Cursor blink**: XOR the cursor shape to toggle it on/off without saving background |
| `D = A OR C` | `$FA` | OR | Output is set wherever either A or C has a set bit | **Overlay**: stamp a shape onto the background without erasing existing pixels |
| `D = ¬A AND C` | `$0A` | Mask out | Output keeps C pixels only where A is clear — erases through the mask | **Erase shape**: cut a hole in the background matching the mask shape (first pass of two-pass sprite draw) |
| `D = A AND B` | `$C0` | AND | Output is set only where both A and B agree | **Masked pattern**: apply a fill pattern (B) clipped to a shape (A) |
| `D = A XOR B` | `$3C` | XOR (A,B) | Output toggles between A and B differences | **Difference detection**: find which pixels changed between two frames |
| `D = NOT A` | `$0F` | Invert | Output is the bitwise complement of A | **Mask inversion**: generate a negative mask from a positive one |
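
Any entry in the table above can be derived mechanically by evaluating the desired Boolean function on all eight input combinations. A minimal sketch, with hypothetical helper names (`minterm_from`, `cookie_cut`, `a_xor_c`, `not_a`):

```c
#include <stdint.h>

/* A Boolean function of three 1-bit inputs (each argument is 0 or 1). */
typedef unsigned (*bool3_fn)(unsigned a, unsigned b, unsigned c);

/* Build the minterm byte: bit i is f(A,B,C) where i = A*4 + B*2 + C. */
uint8_t minterm_from(bool3_fn f)
{
    uint8_t m = 0;
    for (unsigned i = 0; i < 8; i++) {
        unsigned a = (i >> 2) & 1u, b = (i >> 1) & 1u, c = i & 1u;
        if (f(a, b, c) & 1u)
            m |= (uint8_t)(1u << i);
    }
    return m;
}

/* Example functions matching rows of the table. */
unsigned cookie_cut(unsigned a, unsigned b, unsigned c) { return a ? b : c; }
unsigned a_xor_c(unsigned a, unsigned b, unsigned c)    { (void)b; return a ^ c; }
unsigned not_a(unsigned a, unsigned b, unsigned c)      { (void)b; (void)c; return !a; }
```

Calling `minterm_from(cookie_cut)` yields `0xCA`, `minterm_from(a_xor_c)` yields `0x5A`, and `minterm_from(not_a)` yields `0x0F`, matching the hex column.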
> **Writing BLTSIZE starts the blit!** Always configure all other registers (pointers, modulos, control, masks) before writing BLTSIZE. On ECS and AGA, the larger BLTSIZV/BLTSIZH pair follows the same rule: write BLTSIZV first, then BLTSIZH (which triggers the blit).
### Ascending vs. Descending Mode
When source and destination overlap in memory, the blit direction determines whether data is corrupted:
```
Ascending (default, DESC=0):
  Reads/writes top-left → bottom-right
  Use when: dest address < source address (data moves toward lower addresses)

Descending (DESC=1):
  Reads/writes bottom-right → top-left
  Use when: dest address > source address (data moves toward higher addresses)
  Pointers must be set to the LAST word of the block
```
This is critical for **scrolling** — shifting the screen contents by a few pixels requires an overlapping copy, and using the wrong direction produces garbage.
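
The direction choice follows the same rule as `memmove`: copy backwards only when the destination starts inside the source block at a higher address. A minimal sketch, assuming both blocks are contiguous (zero modulo); `needs_descending` and `last_word` are hypothetical helper names.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when an overlapping copy must run in descending mode (DESC=1):
 * the destination begins above the source start but before its end,
 * so an ascending copy would overwrite source words not yet read. */
bool needs_descending(uint32_t src, uint32_t dst, uint32_t size_bytes)
{
    return dst > src && dst < src + size_bytes;
}

/* Address of the last word of a block, which is what the pointer
 * registers must hold for a descending blit. */
uint32_t last_word(uint32_t start, uint32_t size_bytes)
{
    return start + size_bytes - 2u;
}
```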
### Shift and Alignment
The Blitter is a **word-aligned** (16-bit) processor. Moving objects to arbitrary pixel positions requires the barrel shifter:
- **ASH** (channel A shift) and **BSH** (channel B shift) shift data 0–15 pixels to the right
- A rectangle N pixels wide at a non-aligned X position spans `⌈(N + shift) / 16⌉` words — one more than aligned
- **BLTAFWM** (first word mask) and **BLTALWM** (last word mask) prevent the shifted data from corrupting pixels outside the target area
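
The arithmetic in the list above can be sketched as three small helpers. The function names are hypothetical; the formulas are the ones stated in the text.

```c
#include <stdint.h>

/* Barrel-shift amount (0-15) for a destination X coordinate. */
unsigned blit_shift(unsigned x)
{
    return x & 15u;
}

/* Words per row for a rectangle width_px wide placed at pixel x:
 * ceil((N + shift) / 16). */
unsigned blit_width_words(unsigned x, unsigned width_px)
{
    return (width_px + blit_shift(x) + 15u) / 16u;
}

/* Byte offset, within one bitplane row, of the first word touched. */
unsigned blit_x_offset_bytes(unsigned x)
{
    return (x >> 4) * 2u;
}
```

For example, a 16-pixel-wide object at x = 37 has a shift of 5, starts 4 bytes into the row, and needs 2 words per row instead of 1.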
### Use Case 1: Shifted BOB (Sprite at Arbitrary X Position)
The most common real-world blitter task: draw a 16×16 sprite at pixel position (x, y) on a 320-pixel-wide screen. Since x may not be word-aligned, the barrel shifter handles sub-word positioning:
```asm
; Draw 16×16 BOB at pixel (x, y) on a 320px wide screen
```
**Key insight**: the blit is 2 words wide even though the sprite is only 16 pixels (1 word). The barrel shift pushes bits into the second word, so we need that extra word — and `BLTALWM=$0000` masks it so we don't corrupt adjacent pixels.
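
The register values for such a blit can be worked out ahead of time. The sketch below computes them for one bitplane of a 16×16 cookie-cut BOB. The `bob_blit` struct and `bob_setup` function are hypothetical, and the source modulo of -2 assumes the BOB image and mask are stored one word per row with no padding word.

```c
#include <stdint.h>

/* Values to load into the blitter registers (hypothetical container). */
struct bob_blit {
    uint16_t bltcon0;    /* ASH | USEA..USED | minterm $CA */
    uint16_t bltcon1;    /* BSH in bits 15-12 */
    uint16_t afwm, alwm; /* BLTAFWM / BLTALWM */
    int16_t  src_mod;    /* BLTAMOD / BLTBMOD */
    int16_t  dst_mod;    /* BLTCMOD / BLTDMOD */
    uint32_t dst_offset; /* byte offset into the destination bitplane */
    uint16_t bltsize;    /* 16 rows, 2 words */
};

struct bob_blit bob_setup(unsigned x, unsigned y, unsigned bytes_per_row)
{
    struct bob_blit r;
    unsigned shift = x & 15u;

    r.bltcon0    = (uint16_t)((shift << 12) | 0x0F00u | 0xCAu);
    r.bltcon1    = (uint16_t)(shift << 12);
    r.afwm       = 0xFFFF;          /* keep the whole first word */
    r.alwm       = 0x0000;          /* mask off the extra second word */
    r.src_mod    = -2;              /* assumption: 1-word rows, 2-word blit */
    r.dst_mod    = (int16_t)(bytes_per_row - 4u);
    r.dst_offset = y * bytes_per_row + (x >> 4) * 2u;
    r.bltsize    = (uint16_t)((16u << 6) | 2u);
    return r;
}
```

On a 40-byte-wide bitplane, `bob_setup(37, 10, 40)` gives a shift of 5 in both control words, a destination offset of 404 bytes, and a destination modulo of 36.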
### Use Case 2: Hardware Scroll (Left by N Pixels)
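Scrolling left by a whole word moves data toward lower addresses (the destination sits below the source), so an **ascending** blit is safe. **Descending mode** is needed when data moves toward higher addresses (scrolling right), and also for 1–15 pixel left scrolls, because the barrel shifter shifts left only in descending mode: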
```asm
; Scroll 320×256 screen left by 16 pixels (1 word = fastest case)
; After blit: draw new column at right edge (column 19)
```
For sub-word scrolling (1–15 pixels), combine this with the barrel shifter and draw the new edge column from tile data.
### Use Case 3: Area Fill (Filled Polygon)
The blitter's fill mode is a two-step process: (1) draw the polygon outline with XOR lines, (2) fill the region. This is how games like *Carrier Command* and *Starglider 2* achieved real-time filled 3D:
```asm
; Step 1: Draw polygon edges using blitter line mode (XOR, single-bit)
; (Repeat for each edge of the polygon)
; Use minterm $4A (the usual line-mode XOR; with an all-ones B pattern
; it reduces to A XOR C) and BLTCON1 bit 0 = LINE, bit 1 = SING
; Step 2: Fill the outlined region
; Fill works RIGHT-TO-LEFT, BOTTOM-TO-TOP, so it requires descending mode
; Pointers must point to the LAST word of the bitmap region
lea $DFF000,a5
.bwait:
btst #6,$002(a5) ; BBUSY: bit 14 of DMACONR = bit 6 of its high byte
bne.s .bwait
; Set up inclusive fill (IFE) in descending mode:
move.l #$09F0000A,$040(a5) ; BLTCON0: A+D, minterm $F0; BLTCON1: IFE (bit 3) + DESC (bit 1)
move.l #$FFFFFFFF,$044(a5) ; BLTAFWM/BLTALWM: pass every bit
; Pointers to LAST word of the fill region (descending!):
move.l #FillBufferEnd,$050(a5) ; BLTAPT: last word of source
move.l #FillBufferEnd,$054(a5) ; BLTDPT: last word of dest (same buffer)
clr.w $064(a5) ; BLTAMOD = 0
clr.w $066(a5) ; BLTDMOD = 0
move.w #(Height<<6)|Width,$058(a5) ; BLTSIZE (Width in words) → GO!
```
**How it works**: the fill carry starts each row at the value of `FCI` and toggles on every set pixel. Between two outline pixels on the same scanline the carry stays on, filling the interior. This is why the outline must use **single-bit mode** (SING=1): otherwise a shallow edge plots several pixels on one scanline and the carry toggles more than once per edge.
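
The carry toggle can be simulated on a single word. This is a software sketch, not the hardware implementation. It assumes inclusive fill keeps both boundary pixels while exclusive fill drops the left-hand one, and `fill_word` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fill one 16-bit word, scanning from the rightmost bit to the leftmost
 * and toggling the carry on every set input bit.
 * *carry is the fill carry on entry and exit, so consecutive words of a
 * row can be chained. exclusive selects EFE-style output; otherwise the
 * output is IFE-style (inclusive). */
uint16_t fill_word(uint16_t in, bool exclusive, bool *carry)
{
    uint16_t out = 0;
    for (int bit = 0; bit < 16; bit++) {
        bool pixel = ((in >> bit) & 1u) != 0;
        bool next  = (*carry != pixel);            /* toggle on set pixel */
        bool on    = exclusive ? next : (*carry || pixel);
        if (on)
            out |= (uint16_t)(1u << bit);
        *carry = next;
    }
    return out;
}
```

With the carry starting at 0, the input `0010 0100 0001 1000` becomes `0011 1100 0001 1000` under inclusive fill and `0001 1100 0000 1000` under exclusive fill. A word with a single set bit leaves the carry on, which is how a span continues into the next word to the left.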
### Use Case 4: Interleaved Bitplane BOBs
Standard bitplane layout stores all of plane 0, then all of plane 1, etc. **Interleaved** layout stores one row of plane 0, then one row of plane 1, alternating. This allows a single blit to draw a BOB across all bitplanes at once:
```asm
; Interleaved screen layout:
; Row 0, Plane 0 (40 bytes)
; Row 0, Plane 1 (40 bytes)
; Row 0, Plane 2 (40 bytes)
; Row 0, Plane 3 (40 bytes)
; Row 0, Plane 4 (40 bytes)
; Row 1, Plane 0 (40 bytes)
; ...
; Blit a 16×16 cookie-cut BOB across all 5 bitplanes in ONE operation:
; Height = 16 lines × 5 planes = 80 rows
; Modulo = 40 - 2 = 38 bytes per interleaved row (skip rest of scanline row)
```
**Why this matters**: without interleaving, drawing one BOB on a 5-plane screen requires **5 separate blits** (one per plane), each with its own WaitBlit + register setup overhead. Interleaving does it in **1 blit** — 5× less setup time, critical when drawing 15+ BOBs per frame.
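
The address and size arithmetic for the interleaved layout can be sketched as follows. The helper names are hypothetical, and the total row count (height × planes) must still fit the blit height limit of the chipset.

```c
#include <stdint.h>

/* Byte offset of the word holding pixel (x, y) in a given plane of an
 * interleaved bitmap: the rows of all planes are stored back to back. */
uint32_t interleaved_offset(unsigned x, unsigned y, unsigned plane,
                            unsigned planes, unsigned bytes_per_row)
{
    return (y * planes + plane) * bytes_per_row + (x >> 4) * 2u;
}

/* BLTSIZE value for a blit that covers every plane in one operation. */
uint16_t interleaved_bltsize(unsigned height, unsigned planes,
                             unsigned width_words)
{
    return (uint16_t)(((height * planes) << 6) | width_words);
}

/* Destination modulo: skip the remainder of one plane's row. */
unsigned interleaved_modulo(unsigned bytes_per_row, unsigned width_words)
{
    return bytes_per_row - width_words * 2u;
}
```

For the 5-plane, 40-bytes-per-row example, row 1 of plane 0 starts at byte 200, the blit height is 80 rows, and the modulo is 38.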
### Use Case 5: Double-Buffered Game Loop
The standard pattern for flicker-free game rendering:
```asm
MainLoop:
; --- Wait for vertical blank ---
bsr WaitVBL ; Wait for beam to reach line 0
; --- Swap display buffer ---
; Copper list points to the currently visible buffer
; We draw into the hidden back buffer
move.l BackBuffer,a0
move.l FrontBuffer,a1
move.l a0,FrontBuffer ; Back buffer becomes front (display)
move.l a1,BackBuffer ; Old front becomes new back (draw target)
; Update Copper list bitplane pointers to show new front buffer:
bsr UpdateCopperBPLPTRs
; --- Clear back buffer ---
bsr WaitBlit
move.l #$01000000,$040(a5) ; BLTCON0: D-only, minterm $00; BLTCON1 = 0 (a5 = $DFF000)
move.l a1,$054(a5) ; BLTDPT = back buffer
clr.w $066(a5) ; BLTDMOD = 0
move.w #(256<<6)|20,$058(a5) ; Clear 320×256 (one bitplane) → GO!
; --- Draw all BOBs ---
; CPU can process game logic while the clear blit runs!
bsr UpdateGameLogic ; Physics, AI, input — runs on CPU
bsr WaitBlit ; Wait for clear to finish
bsr DrawAllBOBs ; Chain of cookie-cut blits
bra MainLoop
```
**Key optimization**: `UpdateGameLogic` runs on the CPU *while* the screen clear runs on the Blitter. This is the core of the Amiga's parallelism — ~1.5 ms of free CPU time per frame from a single D-only clear.
### Use Case 6: GUI Window Drag (System-Friendly)
Workbench and applications use `graphics.library` for window dragging, icon rendering, and menu drawing. The OS handles Blitter synchronization:
```c
#include <graphics/gfx.h>
#include <graphics/rastport.h>
/* Scroll a window's contents up by 8 pixels (text scroll): */
ScrollRaster(rp, /* RastPort */
0, 8, /* dx=0, dy=8 (scroll up by 8 pixels) */
0, 0, /* top-left corner of scroll area */
319, 199); /* bottom-right */
/* The OS automatically uses an ascending/descending blit, sets modulos, */
/* and clears the exposed bottom strip. */
/* Copy a rectangular region between two bitmaps: */
BltBitMap(srcBM, 0, 0, /* source bitmap, x, y */
dstBM, 100, 50, /* dest bitmap, x, y */
64, 32, /* width, height */
0xC0, /* minterm 0xC0: plain copy (source replaces destination) */
0xFF, /* all bitplanes */
NULL); /* no temp buffer needed */
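/* Draw a filled rectangle (uses the Blitter internally): */
SetAPen(rp, 1);                 /* pen number is an illustrative choice */
RectFill(rp, 10, 10, 109, 59);  /* xMin, yMin, xMax, yMax (illustrative coordinates) */
```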
Games like *The Settlers*, *Cannon Fodder*, and most platformers render backgrounds from tile maps. Each tile is a 16×16 (or 32×32) block blitted to screen coordinates:
```asm
; Render a 20×16 tile map (320×256 screen, 16×16 tiles)
; TileMap: array of 320 bytes (20×16), each byte = tile index
move.w #36,$066(a5) ; BLTDMOD = 36 (hardcoded for 320px)
```
Many Amiga programs run on PAL overscan (352 or 384 pixels), productivity modes (640+), or RTG screens. Always calculate modulo from the actual screen byte width:
```asm
; ✓ GOOD: Calculate modulo from actual bitmap width
move.w ScreenBytesPerRow,d0
sub.w BlitWidthBytes,d0
move.w d0,$066(a5) ; BLTDMOD = dynamic
```
### ✗ Antipattern: "Ignoring the DMA Budget"
The Blitter shares the DMA bus with display, audio, and disk. In high-bandwidth display modes, there are fewer free DMA slots:
| Display Mode | DMA Slots Used by Display | Remaining for Blitter | Effect |
|---|---|---|---|
| Lores 320×256 × 5 planes | ~100 per line | ~126 per line | Full blitter speed |
| Hires 640×256 × 4 planes | ~160 per line | ~66 per line | Blitter runs at ~50% speed |
| Super Hires 1280 × 4 planes | ~200+ per line | ~26 per line | Blitter barely runs |
| HAM8 (AGA) | ~200 per line | ~26 per line | Blitter barely runs |
**Rule of thumb**: if your game stutters in hires modes, it's probably DMA contention, not CPU speed.
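
The table's figures follow from simple subtraction. The sketch below reproduces them under a deliberately rough model: 226 slots per scanline (the total implied by the table rows) minus one slot per bitplane word fetched. It ignores refresh, disk, audio, and sprite slots, and the names are hypothetical.

```c
/* Rough per-scanline DMA budget; 226 is the total implied by the table. */
enum { SLOTS_PER_LINE = 226 };

/* Slots consumed by bitplane fetches on one line. */
int bitplane_slots(int planes, int words_per_line)
{
    return planes * words_per_line;
}

/* Slots left over for the Blitter and CPU (never negative). */
int free_slots(int planes, int words_per_line)
{
    int left = SLOTS_PER_LINE - bitplane_slots(planes, words_per_line);
    return left > 0 ? left : 0;
}
```

Lores with 5 planes (20 words per line) leaves 126 slots; hires with 4 planes (40 words per line) leaves 66.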
---
## Practical Limitations
| Limitation | Detail | Workaround |
|---|---|---|
| **Max blit size (OCS)** | 1024 lines × 64 words (1024×1024 pixels) via BLTSIZE | Split into multiple blits |
| **Max blit size (ECS/AGA)** | 32768 lines × 2048 words (BLTSIZV/BLTSIZH) | Rarely a practical issue |
| **Word alignment** | Blitter operates on 16-bit word boundaries only | Use barrel shift + masks for sub-word positioning; costs 1 extra word of width |
| **No scaling** | Cannot scale or rotate — purely rectangular block ops | Use CPU for affine transforms, then blit the result |
| **No clipping** | Blitter will happily write outside the screen bitmap | Implement clipping in software before setting up the blit |
| **Single operation at a time** | Only one blit can run at a time — no queue | Pipeline setup: compute next blit's parameters on CPU while current blit runs |
| **Chip RAM only** | All 4 channels must point to Chip RAM | Use `MEMF_CHIP` for all blitter-visible allocations; see [memory_types.md](../01_hardware/common/memory_types.md) |
| **Fill carry direction** | Fill mode only works right-to-left (descending) | Always use DESC=1 with fill; set pointers to the end of the data |
| **No transparency levels** | Boolean operations only — 1-bit masking, no alpha | Dithering or multiple passes for graduated transparency |
| **Line mode limitations** | Lines drawn with SING=1 for fill prep plot only one dot per row, leaving visible gaps on shallow (near-horizontal) angles | Use non-SING mode for visible lines, SING only for fill boundaries |
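
The 1024 × 64 limit in the table comes from the BLTSIZE bit layout: 10 bits of height and 6 bits of width, where a field value of 0 means the maximum. A minimal encoder sketch (`encode_bltsize` is a hypothetical name):

```c
#include <stdint.h>

/* Encode BLTSIZE: height in bits 15-6, width in words in bits 5-0.
 * A field value of 0 selects the maximum (1024 lines or 64 words).
 * Returns 0 on success, -1 if the request does not fit in one blit. */
int encode_bltsize(unsigned height, unsigned width_words, uint16_t *out)
{
    if (height < 1 || height > 1024 || width_words < 1 || width_words > 64)
        return -1;
    *out = (uint16_t)(((height & 0x3FFu) << 6) | (width_words & 0x3Fu));
    return 0;
}
```

A return value of -1 is the cue to split the operation into several blits, as the workaround column suggests.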
---
## Performance Analysis
### DMA Cycle Costs
The Blitter consumes DMA cycles proportional to the number of **active channels**. Each active channel adds 1 DMA cycle per word per row:
| Channels Active | Cycles/Word | Example Operation | Time for 320×256 (1 plane) |
|---|---|---|---|
| D only | 1 cycle | Screen clear | ~1.4 ms |
| A + D | 2 cycles | Simple copy (A→D) | ~2.9 ms |
| A + B + D | 3 cycles | Masked copy | ~4.3 ms |
| A + B + C + D | 4 cycles | Cookie-cut blit | ~5.7 ms |
> At 3.58 MHz DMA clock, 1 cycle ≈ 280 ns, and a 320×256 bitplane is 20 × 256 = 5,120 words. These are best-case figures with no display-DMA or CPU contention; a full 320×256×5-plane screen clear takes ~7.2 ms (D-only × 5 planes).
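
The estimate can be reproduced with a short calculation. This sketch uses the simplified cost model from this section (one 280 ns DMA cycle per active channel per word, no contention), so it gives a lower bound rather than a measured time; `blit_time_ns` is a hypothetical name.

```c
#include <stdint.h>

/* Estimated blit time in nanoseconds under the simplified model:
 * words = ceil(width / 16) * height, each costing one 280 ns cycle
 * per active channel. Contention with display DMA is ignored. */
uint64_t blit_time_ns(unsigned width_px, unsigned height, unsigned channels)
{
    uint64_t words = (uint64_t)((width_px + 15u) / 16u) * height;
    return words * channels * 280u;
}
```

Dividing the result by 1,000,000 gives milliseconds, which makes it easy to budget a frame's worth of blits against the 20 ms PAL frame time.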
### CPU vs. Blitter Crossover
The Blitter is not always faster than the 68000:
| Operation Size | Winner | Why |
|---|---|---|
| < ~40 words | CPU (68000) | Blitter setup overhead (~20 cycles) exceeds the DMA savings |
| 40–200 words | Tie | Depends on whether CPU needs the bus |
| > 200 words | Blitter | DMA runs independently; CPU can compute in parallel |
| Any size (A1200) | **Measure** | 68020 can access 32-bit Fast RAM while Blitter uses Chip RAM bus — often faster to do both |
### Nasty Mode (`BLTPRI`)
Setting bit 10 of DMACON (`BLTPRI`) gives the Blitter priority over the CPU for every free Chip RAM slot: the CPU is **stalled** on any Chip RAM or custom-register access until the blit completes. This maximizes blitter throughput but:
- Delays interrupt servicing until the CPU can reach Chip RAM again
- Breaks timing-sensitive code (audio, serial)
- Most professional software avoids it; demos use it freely
---
## When to Use / When NOT to Use
### When to Use the Blitter
- **Screen clearing** — D-only blit at 1 cycle/word is unbeatable
- **BOB/sprite compositing** — cookie-cut blit is the standard technique for all Amiga game objects
- **Scrolling** — overlapping copy with correct ascending/descending mode
- **Polygon filling** — exclusive/inclusive fill after boundary line drawing
- **Large memory copies** — any block > ~40 words benefits from DMA parallelism
- **Line drawing** — hardware Bresenham is faster than any software implementation on 68000
### When NOT to Use the Blitter
- **Fast RAM operations** — the Blitter cannot access Fast RAM at all; use CPU
- **Pixel-level operations** — the Blitter works on word-aligned rectangles; per-pixel logic requires CPU
- **A1200/A4000 with Fast RAM** — the 68020/030 running from 32-bit Fast RAM can often outperform the Blitter on Chip RAM, especially if you can overlap CPU work with display DMA
### Applicability Ranges
- **BOBs**: Practical limit ~15–20 per frame at 320×256×5 planes before exhausting DMA bandwidth
- **Fill mode**: Works on single bitplanes only — filling a 5-plane display requires 5 passes
- **Line mode**: Maximum line length limited by the blit height field (1024 with BLTSIZE on OCS, 32768 with BLTSIZV on ECS/AGA)
---
## Historical Context — The 1985 Competitive Landscape
The Blitter was architecturally unprecedented in 1985. No competing home computer shipped with a comparable 2D rasterization coprocessor:
| Feature | Amiga (1985) | Atari ST (1985) | PC EGA (1984) | Mac 128K (1984) | C64 (1982) |
|---|---|---|---|---|---|
| **Hardware blitter** | Yes — 4-channel DMA with minterm logic | No (added in Mega ST/STE, 1987 — 1 source only) | No | No | No |
| **CPU relief** | Full DMA — CPU free during blit | Partial — CPU still involved (STE) | CPU does everything | CPU does everything | CPU does everything |
### Pros (in 1985 context)
- **Parallelism**: The 68000 could execute game logic while the Blitter handled all rendering — this was the Amiga's key advantage over every competitor
- **Generality**: 256 minterm combinations meant any Boolean compositing operation was a single register write, not a software loop
- **Integration**: Shared DMA bus with Copper and sprites meant the entire display pipeline was hardware-driven
- **Line + fill in hardware**: Enabled real-time filled polygon rendering (used in games like Carrier Command, Starglider 2) that was impossible on competing platforms
### Cons (in 1985 context)
- **Chip RAM only**: All blitter-visible data had to live in the first 512 KB (later 1–2 MB), competing with screen memory, audio, and disk buffers
- **Word alignment**: Sub-word positioning required shift + extra word width + masking, a complex setup for simple operations
- **DMA contention**: Heavy blitter use starved the CPU of bus cycles even without nasty mode
---
## Modern Analogies
| Amiga Blitter Concept | Modern Equivalent | Notes |
|---|---|---|
| 4-channel minterm blit | GPU logic ops and blend equations (Vulkan `VkLogicOp`, `VkBlendOp`) | The minterm is a fixed-function Boolean blend; modern GPUs use programmable shaders but the concept of combining sources through a logic function is identical |
| Cookie-cut (A·B + ¬A·C) | Alpha compositing / Porter-Duff `SrcOver` | The Amiga used 1-bit masks; modern systems use 8-bit alpha channels, but the compositing algebra is the same |
| DMA-driven blit | `vkCmdCopyImage` / `MTLBlitCommandEncoder` | Modern GPUs have dedicated DMA/copy engines that run asynchronously, exactly like the Blitter ran independently of the 68000 |
| OwnBlitter/DisownBlitter | Vulkan queue submission / Metal command buffer | Exclusive access to a shared hardware resource, then release — the synchronization pattern is identical |
| BLTPRI (nasty mode) | GPU preemption priority | Giving the transfer engine absolute bus priority at the cost of starving other consumers |
| Fill mode | GPU rasterizer fill | Hardware polygon fill is now done by the rasterizer stage; the Blitter's XOR-toggle fill was a clever 1985 approximation |
| BLTSIZE triggers blit | Command buffer submission | Writing the final register starts execution — analogous to `vkQueueSubmit` or `[commandBuffer commit]` |
| Barrel shift + word masks | Texture sampling with sub-texel offset | Achieving sub-word (pixel-level) positioning through hardware shift and masking |
---
## Pitfalls & Common Mistakes
### Pitfall 1: "The Silent Corruption" — Fast RAM Pointers
```asm
; ✗ BAD: Buffer allocated in Fast RAM
move.l #FastRAMBuffer,$054(a5) ; BLTDPT points to Fast RAM
```
The Blitter's DMA engine is wired to the Chip RAM bus only. Fast RAM addresses silently alias to Chip RAM addresses or produce random data. **There is no error signal** — the blit completes "successfully" with corrupt output.
```asm
; ✓ GOOD: Buffer in Chip RAM
move.l #ChipRAMBuffer,$054(a5) ; Allocated with MEMF_CHIP
```
### Pitfall 2: "The Race Condition" — Missing WaitBlit
```asm
; ✗ BAD: Start a new blit without waiting for previous one
move.l #$09F00000,$040(a5) ; Overwrite BLTCON0 while previous blit runs!
move.l #NewSource,$050(a5) ; Corrupt the in-progress blit
move.w #$FFF0,$046(a5) ; BLTALWM masks off the rightmost 4 pixels
```
### Pitfall 5: "The Stale Pointer" — Reusing Registers After a Blit
After a blit completes, all pointer registers have advanced to the **end** of the data. A second blit with the same pointers starts where the first one left off — not at the original position.
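
How far a pointer has moved can be predicted from the blit parameters. The sketch below assumes an ascending blit and that the modulo is added after every row, including the last; `pointer_after_blit` is a hypothetical name.

```c
#include <stdint.h>

/* Value left in a blitter pointer register after an ascending blit:
 * each row advances by the blit width in bytes plus the modulo. */
uint32_t pointer_after_blit(uint32_t start, uint32_t height,
                            uint32_t width_words, int32_t modulo_bytes)
{
    int32_t row_step = (int32_t)(width_words * 2u) + modulo_bytes;
    return start + (uint32_t)((int32_t)height * row_step);
}
```

A 16-row, 2-word blit with modulo 36 leaves the pointer 640 bytes past its starting value, which is why a second blit without a reload lands 16 rows further down the bitmap.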
```asm
; ✓ GOOD: Always reload all pointers before each blit
```
The Blitter is one of the most complex subsystems to reproduce accurately in an FPGA core:
- **DMA slot timing**: The Blitter shares DMA slots with bitplane, sprite, Copper, disk, and audio DMA. Incorrect slot allocation produces visible glitches in demos that count cycles
- **Barrel shifter pipeline**: The A and B channel shifts operate on a word pipeline — off-by-one in the shift register produces 1-pixel horizontal offset errors visible in scrolling
- **Fill mode carry propagation**: The fill carry bit (`FCI`) must propagate correctly from right to left within each word and across word boundaries; errors produce "zebra stripe" artifacts
- **Line mode octant handling**: The Bresenham algorithm implementation requires precise handling of 8 octants with correct sign and direction — many emulators get diagonal lines wrong in edge cases
- **BLTSIZE write-trigger**: The blit must start on the exact cycle that BLTSIZE is written, not one cycle later; demos that chain blits back-to-back depend on this timing
- **Nasty mode interaction**: `BLTPRI` must correctly freeze the CPU *and* still allow DMA from other sources (Copper, bitplanes) — freezing everything breaks display output
---
## Real-World Software Usage
| Software | Blitter Usage | Notes |
|---|---|---|
| **Deluxe Paint** | Brush compositing, flood fill, line tools | Canonical use of BltBitMap + BltMaskBitMapRastPort through the OS |
| **Shadow of the Beast** | Multi-layer parallax scrolling | Custom blitter routines for layer compositing, bypasses OS |
| **Carrier Command** | Filled polygon 3D rendering | Blitter line draw + fill mode for real-time vector graphics |
| **Lemmings** | Terrain destruction, character compositing | Cookie-cut blits for each lemming; XOR blits for terrain modification |
| **Workbench** | Window dragging, icon rendering, menu drawing | All through graphics.library — system-friendly blitter usage |