amiga-bootcamp/17_demoscene/3d_rendering.md
Ilia Sharin f8f8d1c834 docs(amiga): add Tier 4 content — AHI, cross-compilation, RTG, demoscene section
2026-05-13 17:49:28 -04:00


3D Rendering — Fixed-Point Math, Blitter Polygons, Rotozoom, Dot Tunnels, and Voxel Space

Overview

In 1985, the Amiga had no 3D hardware. No matrix engine, no floating-point unit, no texture mapper — just a 7 MHz integer-only 68000, a Blitter that could copy rectangles, and a Copper that could change display registers. Yet by 1990, demoscene coders were rendering real-time filled 3D objects, and by 1994 they were flying through voxel landscapes at playable framerates. The entire 3D pipeline — projection, clipping, rasterization, fill — was built from scratch in hand-tuned 68000 assembly using fixed-point arithmetic.

This article covers the demoscene 3D rendering techniques that made it possible: fixed-point math, Blitter-filled polygons, texture-mapped rotozoom, dot tunnels, and voxel space rendering. Each technique maps to a specific hardware capability — and the demoscene's creative abuse of it.

graph TB
    subgraph "Math Foundation"
        FP["Fixed-Point<br/>16.16 arithmetic"]
        MATRIX["Matrix Ops<br/>Rotation × projection"]
        SINCOS["Sine/Cosine<br/>Table lookup"]
    end
    subgraph "Rendering"
        FILL["Filled Polygons<br/>Blitter line + fill"]
        TEXTURE["Rotozoom<br/>Affine texture map"]
        DOT["Dot Tunnel<br/>Z-ordered circles"]
        VOXEL["Voxel Space<br/>Raycast heightmap"]
    end
    subgraph "Optimization"
        C2P["Chunky-to-Planar<br/>Kalms/Blitter C2P"]
        DIV["Fast Division<br/>Reciprocal table"]
        CLIP["Screen-Space Clip<br/>Cohen-Sutherland"]
    end

    FP --> FILL
    FP --> TEXTURE
    MATRIX --> FILL
    SINCOS --> MATRIX
    FILL --> DOT
    TEXTURE --> VOXEL
    C2P --> TEXTURE
    DIV --> VOXEL

Foundation: Fixed-Point Arithmetic

The 68000 has no floating-point unit, so all 3D math must use integers. The solution is fixed-point: encoding fractional values as integers with an implicit binary point.

16.16 Fixed-Point Format

┌──────────────────────────────────┬──────────────────────────────────┐
│  Upper 16 bits: integer part     │  Lower 16 bits: fractional part  │
│  (signed, two's complement)      │  (unsigned, represents 0 to ~1)  │
└──────────────────────────────────┴──────────────────────────────────┘

Example: 1.5 = $00018000
         0.25 = $00004000
        -1.0  = $FFFF0000
         π    = $0003243F  (3.14159...)

Fixed-Point Operations

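/* fixedpoint.h — 16.16 fixed-point arithmetic for 68000 */

typedef LONG FIXED;   /* 32-bit signed: 16.16 format */

#define INT_TO_FIXED(x)   ((FIXED)(x) * 65536L)          /* Widen first, then scale */
#define FIXED_TO_INT(x)   ((WORD)((x) >> 16))            /* Truncate (rounds down) */
#define FIXED_TO_INT_R(x) ((WORD)(((x) + 0x8000) >> 16)) /* Round to nearest */
#define FLOAT_TO_FIXED(f) ((FIXED)((f) * 65536.0))       /* Build-time constants only */

/* Multiply: result = a × b / 65536
   The full product needs 64 bits before the shift. A 32-bit intermediate
   overflows as soon as |a × b| reaches 2^31.
   (long long assumes a GCC-style compiler.) */
static inline FIXED fixed_mul(FIXED a, FIXED b) {
    /* 68020+ asm (MULS.L does not exist on the plain 68000):
       move.l  a,d0
       muls.l  b,d1:d0      ; d1:d0 = 64-bit product (high:low)
       move.w  d1,d0        ; d0 = [bits 31..16 of low : bits 15..0 of high]
       swap    d0           ; d0 = bits 47..16 of the product
       On a 68000, build the product from 16×16 MULS.W/MULU.W partial
       products, or keep operands within 16 bits and use a single MULS.W.
    */
    return (FIXED)(((long long)a * b) >> 16);
}

/* Divide: result = a × 65536 / b
   Scale the dividend up first so the quotient keeps its fraction bits.
   The caller must ensure b != 0. */
static inline FIXED fixed_div(FIXED a, FIXED b) {
    /* The scaled dividend needs up to 48 bits, so use a 64-bit intermediate.
       This is slow on a 68000: use a reciprocal table for perspective division */
    return (FIXED)(((long long)a * 65536) / b);
}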
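Because a plain 68000 only multiplies 16×16→32, a full 16.16 multiply has to be assembled from partial products. The following is a minimal sketch of that decomposition in portable C, not the article's own routine: `fixed_t` and `fixed_mul_16x16` are hypothetical names, and `<stdint.h>` types stand in for the Amiga `LONG` typedef so the block compiles on a host compiler. Each `uint32_t` product of two 16-bit halves corresponds to one `MULU.W`.

```c
#include <stdint.h>

typedef int32_t fixed_t;   /* 16.16, stands in for the article's FIXED */

/* 16.16 multiply built only from 16x16->32 unsigned products.
   Works on magnitudes and restores the sign at the end, so the result
   truncates toward zero. Overflow of the integer part wraps. */
fixed_t fixed_mul_16x16(fixed_t a, fixed_t b)
{
    int negative = (a < 0) != (b < 0);
    uint32_t ua = (a < 0) ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = (b < 0) ? 0u - (uint32_t)b : (uint32_t)b;

    uint32_t ah = ua >> 16, al = ua & 0xFFFFu;   /* integer / fraction halves */
    uint32_t bh = ub >> 16, bl = ub & 0xFFFFu;

    /* (ah*2^16 + al) * (bh*2^16 + bl) / 2^16
       = ah*bh*2^16 + ah*bl + al*bh + (al*bl)/2^16 */
    uint32_t r = ((ah * bh) << 16)
               + ah * bl
               + al * bh
               + ((al * bl) >> 16);

    return negative ? (fixed_t)(0u - r) : (fixed_t)r;
}
```

Many demo inner loops avoid this cost entirely by keeping coordinates and table values small enough that one 16×16 multiply is sufficient.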

Sine/Cosine Tables

Pre-calculated lookup tables are essential — computing sin() at runtime is too slow:

/* trig_tables.c — Pre-calculated 16.16 sine/cosine tables */
/* 1024 entries covering 0-2π, index = angle × 1024 / (2π) */

#define TRIG_TABLE_SIZE 1024
#define ANGLE_2PI      1024    /* Full circle = 1024 units */

/* 16.16 fixed-point: sin values range from -1.0 ($FFFF0000) to 1.0 ($00010000) */
extern const FIXED sin_table[TRIG_TABLE_SIZE];  /* Defined in a source file generated at build time */

/* Fast lookup with wrapping */
static inline FIXED fast_sin(int angle) {
    return sin_table[angle & (TRIG_TABLE_SIZE - 1)];
}

static inline FIXED fast_cos(int angle) {
    return sin_table[(angle + 256) & (TRIG_TABLE_SIZE - 1)];  /* cos = sin(x+π/2) */
}

Foundation: Matrix Operations

3D rotation uses 3×3 matrices multiplied with vertex coordinates. Each rotation (X, Y, Z axis) is a matrix multiply:

Rotation Matrix Construction

/* matrix3d.c — 3D rotation matrices using fixed-point */

typedef struct {
    FIXED m[3][3];  /* 3×3 rotation matrix */
} Matrix3D;

/* Build rotation matrix from Euler angles */
void build_rotation_matrix(Matrix3D *mat, int ax, int ay, int az) {
    FIXED sx = fast_sin(ax), cx = fast_cos(ax);
    FIXED sy = fast_sin(ay), cy = fast_cos(ay);
    FIXED sz = fast_sin(az), cz = fast_cos(az);

    /* Combined Z×Y×X rotation (standard demoscene order) */
    mat->m[0][0] = fixed_mul(cy, cz);
    mat->m[0][1] = fixed_mul(fixed_mul(sx, sy), cz) - fixed_mul(cx, sz);
    mat->m[0][2] = fixed_mul(fixed_mul(cx, sy), cz) + fixed_mul(sx, sz);
    mat->m[1][0] = fixed_mul(cy, sz);
    mat->m[1][1] = fixed_mul(fixed_mul(sx, sy), sz) + fixed_mul(cx, cz);
    mat->m[1][2] = fixed_mul(fixed_mul(cx, sy), sz) - fixed_mul(sx, cz);
    mat->m[2][0] = -sy;
    mat->m[2][1] = fixed_mul(sx, cy);
    mat->m[2][2] = fixed_mul(cx, cy);
}

/* Transform a vertex: result = matrix × vertex */
void transform_vertex(const Matrix3D *mat, FIXED vx, FIXED vy, FIXED vz,
                      FIXED *rx, FIXED *ry, FIXED *rz) {
    *rx = fixed_mul(mat->m[0][0], vx) +
          fixed_mul(mat->m[0][1], vy) +
          fixed_mul(mat->m[0][2], vz);
    *ry = fixed_mul(mat->m[1][0], vx) +
          fixed_mul(mat->m[1][1], vy) +
          fixed_mul(mat->m[1][2], vz);
    *rz = fixed_mul(mat->m[2][0], vx) +
          fixed_mul(mat->m[2][1], vy) +
          fixed_mul(mat->m[2][2], vz);
}

Perspective Projection

/* project.c — Perspective projection to screen coordinates */

#define SCREEN_CX  160   /* Center X (320 wide) */
#define SCREEN_CY  128   /* Center Y (256 tall) */
#define FOCAL_LEN  256   /* Focal length in pixels (plain integer; converted with INT_TO_FIXED) */

void project_vertex(FIXED x, FIXED y, FIXED z,
                    WORD *sx, WORD *sy) {
    /* Perspective divide: screen = world × focal / z */
    if (z > INT_TO_FIXED(1)) {  /* Avoid division by zero */
        FIXED scale = fixed_div(INT_TO_FIXED(FOCAL_LEN), z);
        *sx = SCREEN_CX + FIXED_TO_INT(fixed_mul(x, scale));
        *sy = SCREEN_CY - FIXED_TO_INT(fixed_mul(y, scale));  /* Y flipped */
    } else {
        *sx = SCREEN_CX;  /* Behind camera */
        *sy = SCREEN_CY;
    }
}
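The divide in `project_vertex` can be replaced by a lookup, as the comment in `fixed_div` suggests. Below is a minimal sketch of a reciprocal table, assuming integer world coordinates and a depth range of 1 to 1023; `recip_table`, `init_recip_table`, `project_with_table` and `MAX_Z` are hypothetical names, and `<stdint.h>` types replace the Amiga typedefs so it builds on a host compiler. The table is filled once at startup, so the per-vertex cost drops to one lookup and two multiplies.

```c
#include <stdint.h>

#define SCREEN_CX 160
#define SCREEN_CY 128
#define FOCAL_LEN 256
#define MAX_Z     1024      /* assumed depth range: 1 .. MAX_Z-1 */

/* recip_table[z] = FOCAL_LEN / z as a 16.16 value */
static int32_t recip_table[MAX_Z];

void init_recip_table(void)
{
    recip_table[0] = 0;     /* z = 0 is rejected before any lookup */
    for (int z = 1; z < MAX_Z; z++)
        recip_table[z] = (int32_t)(((int64_t)FOCAL_LEN << 16) / z);
}

/* Projects integer (x, y, z) to screen coordinates.
   Returns 1 on success, 0 if z is outside the table range.
   Relies on arithmetic right shift for negative products. */
int project_with_table(int x, int y, int z, int16_t *sx, int16_t *sy)
{
    if (z < 1 || z >= MAX_Z)
        return 0;

    int32_t scale = recip_table[z];
    *sx = (int16_t)(SCREEN_CX + (int32_t)(((int64_t)x * scale) >> 16));
    *sy = (int16_t)(SCREEN_CY - (int32_t)(((int64_t)y * scale) >> 16));  /* Y flipped */
    return 1;
}
```

At `z = 256` the scale is exactly 1.0, so a vertex at `x = 100` lands 100 pixels right of the screen center.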

Technique 1: Blitter-Filled Polygons

The Blitter's line-draw + fill mode combination is the foundation of Amiga 3D rendering. The process:

  1. Draw polygon edges using Blitter line mode, one pixel per scanline per edge (sets the boundary pixels)
  2. Run a Blitter fill-mode blit over the area (fills between pairs of set pixels on each row)
  3. Result: a filled polygon with zero CPU pixel writing

sequenceDiagram
    participant CPU as 68000 CPU
    participant Blitter as Blitter
    participant Bitmap as Bitplane Memory

    Note over CPU: Polygon: 4 vertices

    CPU->>Blitter: Configure line mode (BLTCON0=$0B7A)
    CPU->>Blitter: Draw edge V1→V2
    Blitter->>Bitmap: Write edge pixels
    CPU->>Blitter: Draw edge V2→V3
    Blitter->>Bitmap: Write edge pixels
    CPU->>Blitter: Draw edge V3→V4
    Blitter->>Bitmap: Write edge pixels
    CPU->>Blitter: Draw edge V4→V1
    Blitter->>Bitmap: Write edge pixels

    CPU->>Blitter: Configure fill mode (BLTCON0=$09F0, BLTCON1=$000A)
    CPU->>Blitter: Fill bitmap (even-odd rule)
    Blitter->>Bitmap: Fill interior pixels

    Note over Bitmap: Filled polygon ready

Blitter Fill Mode Detail

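The Blitter fill toggles a fill state each time it meets a set bit and writes that state to the pixels in between, so it fills between pairs of edge pixels. The hardware scans each row right to left, which means a fill blit must run in descending mode with its pointers starting at the last word of the area. Fill is enabled in BLTCON1 (IFE for inclusive fill, EFE for exclusive fill), not in BLTCON0:

; blit_fill.asm — Blitter fill for a single bitplane
; Assumes edges already drawn in the bitmap (one pixel per edge per scanline)
; DMACONR, BLTCON0 etc. are assumed to be defined as custom-register offsets

        lea     $DFF000,a0             ; Custom registers base

.wait_blit:
        btst    #6,DMACONR(a0)         ; BBUSY: wait for any previous blit
        bne.s   .wait_blit

        ; Set up fill-mode blit
        move.w  #$09F0,BLTCON0(a0)     ; USEA+USED, minterm $F0 (D = A)
        move.w  #$000A,BLTCON1(a0)     ; IFE (inclusive fill) + DESC (descending)
        move.w  #$FFFF,BLTAFWM(a0)     ; First word mask = all bits
        move.w  #$FFFF,BLTALWM(a0)     ; Last word mask = all bits
        move.w  #0,BLTAMOD(a0)         ; No modulo: fill the full bitmap width
        move.w  #0,BLTDMOD(a0)

        ; Descending mode: both pointers start at the LAST word of the area
        ; (bitmap_last_word is assumed to hold that address)
        move.l  bitmap_last_word,BLTAPTH(a0) ; Source A = edge pixels
        move.l  bitmap_last_word,BLTDPTH(a0) ; Dest = source (fill in place)

        ; Blit size: height × width
        ; height = number of scanlines, width = words per line
        move.w  #(HEIGHT<<6)|(WORDS_PER_LINE),BLTSIZE(a0)
        ; Writing BLTSIZE starts the Blitter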
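To make the fill rule concrete, here is a small software model of one row of an inclusive fill. It is a sketch for reasoning about edge data on a host machine, not a description of the hardware's internal implementation; `fill_row_inclusive` is a hypothetical name. Words are processed last to first and bits right to left, matching the descending scan, and the fill state starts clear.

```c
#include <stdint.h>

/* Software model of an inclusive fill over one row of 16-bit words.
   Bit 15 of row[0] is the leftmost pixel; bit 0 of the last word is the
   rightmost. Each set input bit toggles the fill state; a pixel is set in
   the output if it is an edge or the fill state is on. */
void fill_row_inclusive(uint16_t *row, int words)
{
    int carry = 0;                              /* fill state, initially off */

    for (int w = words - 1; w >= 0; w--) {      /* rightmost word first */
        uint16_t in = row[w];
        uint16_t out = 0;

        for (int bit = 0; bit < 16; bit++) {    /* rightmost pixel first */
            int edge = (in >> bit) & 1;
            carry ^= edge;                      /* toggle on every edge pixel */
            if (carry || edge)
                out |= (uint16_t)(1u << bit);
        }
        row[w] = out;
    }
}
```

Feeding it a row with an odd number of set bits leaves the fill state on and floods the row to its left edge, which is why edges are drawn with exactly one pixel per scanline before filling.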

Multiple-Face Sorting (Painter's Algorithm)

For solid 3D objects, faces must be drawn back-to-front (painter's algorithm):

/* face_sort.c — Sort polygon faces by depth for painter's algorithm */

#include <stdlib.h>   /* qsort */

typedef struct {
    WORD   num_vertices;
    WORD   *vertices;    /* Index into vertex array */
    FIXED  avg_z;        /* Average Z depth (for sorting) */
    UWORD  color;        /* Face color */
} Face;

int compare_faces(const void *a, const void *b) {
    FIXED za = ((const Face *)a)->avg_z;
    FIXED zb = ((const Face *)b)->avg_z;
    /* Sort far-to-near (painter's algorithm) */
    if (za > zb) return -1;  /* a is farther, draw first */
    if (za < zb) return  1;
    return 0;
}

void render_object(Face *faces, int num_faces,
                   FIXED *transformed_z) {
    int i;

    /* Calculate average Z for each face */
    for (i = 0; i < num_faces; i++) {
        FIXED sum = 0;
        int j;
        for (j = 0; j < faces[i].num_vertices; j++) {
            sum += transformed_z[faces[i].vertices[j]];
        }
        faces[i].avg_z = sum / faces[i].num_vertices;
    }

    /* Sort back-to-front */
    qsort(faces, num_faces, sizeof(Face), compare_faces);

    /* Draw each face */
    for (i = 0; i < num_faces; i++) {
        draw_filled_polygon(&faces[i]);
    }
}

Technique 2: Rotozoom (Affine Texture Mapping)

Rotozoom renders a texture-mapped rectangle that can rotate and scale in real-time. The name comes from rotate + zoom. It works by computing a texture coordinate (U,V) for each screen pixel using an affine transform.

Algorithm

For each screen pixel (x, y), compute texture coordinates:

U = U_start + x × dU_dx + y × dU_dy
V = V_start + x × dV_dx + y × dV_dy

Where dU_dx, dV_dx, dU_dy, dV_dy are the rotation+scale matrix coefficients.

/* rotozoom.c — Affine texture mapping (rotozoom) */

#define SCREEN_W 320
#define SCREEN_H 256
#define TEX_SIZE 256  /* Texture is 256×256 */

extern UBYTE texture[TEX_SIZE][TEX_SIZE];  /* Chunky texture */
extern UBYTE *chunky_buffer;               /* Output chunky buffer */

void render_rotozoom(FIXED cx, FIXED cy,    /* Texture center offset */
                     int angle, FIXED zoom) { /* angle in table units (1024 = full circle) */
    FIXED cos_a = fast_cos(angle);
    FIXED sin_a = fast_sin(angle);
    FIXED inv_zoom = fixed_div(INT_TO_FIXED(1), zoom);

    /* Rotation × inverse zoom matrix coefficients */
    FIXED du_dx = fixed_mul(cos_a, inv_zoom);
    FIXED dv_dx = fixed_mul(sin_a, inv_zoom);
    FIXED du_dy = fixed_mul(-sin_a, inv_zoom);
    FIXED dv_dy = fixed_mul(cos_a, inv_zoom);

    /* Start position: center of screen maps to texture center */
    FIXED u_start = cx - fixed_mul(INT_TO_FIXED(SCREEN_W/2), du_dx)
                          - fixed_mul(INT_TO_FIXED(SCREEN_H/2), du_dy);
    FIXED v_start = cy - fixed_mul(INT_TO_FIXED(SCREEN_W/2), dv_dx)
                          - fixed_mul(INT_TO_FIXED(SCREEN_H/2), dv_dy);

    UBYTE *dst = chunky_buffer;
    int y;

    for (y = 0; y < SCREEN_H; y++) {
        FIXED u = u_start;
        FIXED v = v_start;
        int x;

        for (x = 0; x < SCREEN_W; x++) {
            /* Texture lookup with wrapping: row = V, column = U */
            *dst++ = texture[FIXED_TO_INT(v) & 0xFF]
                            [FIXED_TO_INT(u) & 0xFF];

            u += du_dx;
            v += dv_dx;
        }

        u_start += du_dy;
        v_start += dv_dy;
    }

    /* Convert chunky buffer to planar bitplanes (C2P) */
    chunky_to_planar(chunky_buffer, bitplane_data, SCREEN_W, SCREEN_H);
}

Rotozoom in Assembly (Inner Loop)

The 68000 assembly inner loop is highly optimized. The key insight: texture coordinates wrap at power-of-2 boundaries, so for a 256×256 texture the wrap costs nothing. A byte move keeps only the low 8 bits of U, and an 8-bit word shift pushes the unwanted high bits of V out of the register:

; rotozoom_inner.asm — Optimized inner loop
; a0 = destination (chunky buffer)
; a1 = texture base (256×256, row-major: offset = v*256 + u)
; d0 = U (16.16 fixed)
; d1 = V (16.16 fixed)
; d2 = dU/dx (16.16 fixed)
; d3 = dV/dx (16.16 fixed)
; d4 = loop counter (SCREEN_W-1, because dbra runs count+1 times)
; d7 = texel index; its upper word must be zero on entry

.roto_inner:
        move.l  d1,d6           ; Copy V
        swap    d6              ; d6.w = integer part of V
        move.w  d6,d7           ; d7.w = V (upper word of d7 stays zero)
        lsl.w   #8,d7           ; d7.w = (V & $FF) × 256 (high bits shifted out)
        move.l  d0,d5           ; Copy U
        swap    d5              ; d5.w = integer part of U
        move.b  d5,d7           ; Low byte = U & $FF (byte move wraps for free)

        move.b  (a1,d7.l),(a0)+ ; Texel = tex[v*256+u], written to chunky buffer

        add.l   d2,d0           ; U += dU/dx
        add.l   d3,d1           ; V += dV/dx
        dbra    d4,.roto_inner
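The index arithmetic of a rotozoom inner loop is easy to get wrong, so a C model is useful for checking an assembly version against known output. The sketch below assumes a 256×256 row-major byte texture and 16.16 coordinates held in unsigned 32-bit values so that wrap-around is well defined; `roto_row` is a hypothetical name. It forms the texel offset as `(V & $FF) * 256 + (U & $FF)` with two shifts and two masks.

```c
#include <stdint.h>
#include <stddef.h>

/* C model of one rotozoom scanline.
   texture: 65536 bytes, row-major, offset = v * 256 + u
   u, v, du, dv: 16.16 fixed-point, stored unsigned so overflow wraps */
void roto_row(uint8_t *dst, const uint8_t *texture,
              uint32_t u, uint32_t v, uint32_t du, uint32_t dv,
              size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* Integer part of V lands in bits 8..15, integer part of U in bits 0..7 */
        uint32_t index = ((v >> 8) & 0xFF00u) | ((u >> 16) & 0xFFu);
        dst[i] = texture[index];
        u += du;    /* adds only: no multiply or divide per pixel */
        v += dv;
    }
}
```

A negative step is passed as its two's-complement bit pattern, for example `(uint32_t)-65536` to move one texel left per pixel.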

Technique 3: Dot Tunnel

The dot tunnel renders concentric rings that appear to fly toward the viewer, creating the illusion of traveling through a tunnel. Each ring is a circle rendered at a specific Z-depth.

Algorithm

/* dot_tunnel.c — Z-ordered ring tunnel effect */

#define NUM_RINGS   30
#define MAX_Z       1024
#define RING_POINTS 32

typedef struct {
    FIXED z;         /* Depth (0=near, far=background) */
    FIXED radius;    /* Apparent radius (decreases with z) */
    WORD  cx, cy;    /* Center (can be animated) */
    UWORD color;     /* Ring color */
} Ring;

void render_dot_tunnel(Ring *rings, int num_rings, ULONG frame) {
    int i;

    /* Update ring positions (move toward viewer) */
    for (i = 0; i < num_rings; i++) {
        rings[i].z -= INT_TO_FIXED(4);  /* Speed toward viewer */

        /* If ring passed camera, reset to far end */
        if (rings[i].z < INT_TO_FIXED(1)) {
            rings[i].z = INT_TO_FIXED(MAX_Z);
        }

        /* Perspective projection of radius */
        rings[i].radius = fixed_div(
            INT_TO_FIXED(200),     /* Base radius */
            rings[i].z             /* Divide by depth */
        );
    }

    /* Sort rings far-to-near (painter's algorithm) */
    /* ... sort by rings[i].z descending ... */

    /* Draw each ring */
    for (i = 0; i < num_rings; i++) {
        int p;
        int radius = FIXED_TO_INT(rings[i].radius);
        int cx = rings[i].cx + FIXED_TO_INT(
            fixed_mul(fast_sin(frame * 3), INT_TO_FIXED(30)));
        int cy = rings[i].cy + FIXED_TO_INT(
            fixed_mul(fast_cos(frame * 5), INT_TO_FIXED(20)));

        for (p = 0; p < RING_POINTS; p++) {
            /* Table index directly: RING_POINTS evenly spaced angles */
            int angle = p * TRIG_TABLE_SIZE / RING_POINTS;
            FIXED sa = fast_sin(angle);
            FIXED ca = fast_cos(angle);

            WORD px = cx + FIXED_TO_INT(fixed_mul(INT_TO_FIXED(radius), sa));
            WORD py = cy + FIXED_TO_INT(fixed_mul(INT_TO_FIXED(radius), ca));

            /* Plot pixel or draw Blitter circle at (px, py) */
            plot_dot(px, py, rings[i].color);
        }
    }
}

Technique 4: Voxel Space

Voxel space renders a 3D landscape from a 2D heightmap and colormap. The algorithm casts rays from the viewer, one per screen column, and draws vertical strips of pixels. The result is a fly-over landscape effect, as seen in the 1994 demo "Space Rangers" by Rebels.

Algorithm (Column-Based Raycasting)

sequenceDiagram
    participant CPU as 68000 CPU
    participant HM as Heightmap
    participant CM as Colormap
    participant Screen as Screen Buffer

    Note over CPU: For each screen column x (0-319):

    CPU->>CPU: Calculate ray direction for column x
    CPU->>HM: Sample height at (ray_x, ray_z)
    HM-->>CPU: height value h
    CPU->>CPU: Project h to screen Y: y = horizon - h/z
    CPU->>CM: Get color at (ray_x, ray_z)
    CM-->>CPU: color value
    CPU->>Screen: Draw vertical line from y to previous y in color

    Note over CPU: Advance ray (step outward)
    CPU->>HM: Sample next point...

Voxel Space Implementation

/* voxelspace.c — Column-based voxel landscape rendering */

#define SCREEN_W    320
#define SCREEN_H    256
#define MAP_SIZE    256
#define HORIZON     100   /* Horizon line Y position */
#define MAX_DEPTH   200   /* Maximum ray distance */
#define FOV_UNITS   170   /* Field of view in table units (~60°, assumed value) */

extern UBYTE heightmap[MAP_SIZE][MAP_SIZE];
extern UBYTE colormap[MAP_SIZE][MAP_SIZE];
extern UBYTE *chunky_buffer;

void render_voxel_space(FIXED cam_x, FIXED cam_z,
                        int cam_angle, FIXED cam_height) {
    int x;

    for (x = 0; x < SCREEN_W; x++) {
        /* Ray angle in table units: camera angle + column offset */
        int ray_angle = cam_angle +
                        (x - SCREEN_W/2) * FOV_UNITS / SCREEN_W;

        FIXED ray_dx = fast_cos(ray_angle);  /* Direction X (unit step) */
        FIXED ray_dz = fast_sin(ray_angle);  /* Direction Z (unit step) */

        FIXED ray_x = cam_x;
        FIXED ray_z = cam_z;

        WORD prev_draw_y = SCREEN_H;  /* Bottom of column */
        int distance;

        /* Stop early once the column is filled to the top of the screen */
        for (distance = 1; distance < MAX_DEPTH && prev_draw_y > 0; distance++) {
            int mx, mz, y;
            FIXED height_diff;
            WORD draw_y;

            /* Step ray outward by one map unit (adds only, no divide) */
            ray_x += ray_dx;
            ray_z += ray_dz;

            /* Current map position, wrapped to the map size */
            mx = FIXED_TO_INT(ray_x) & (MAP_SIZE - 1);
            mz = FIXED_TO_INT(ray_z) & (MAP_SIZE - 1);

            /* Project terrain height to screen Y */
            height_diff = INT_TO_FIXED(heightmap[mz][mx]) - cam_height;
            draw_y = HORIZON -
                FIXED_TO_INT(fixed_div(height_diff,
                    INT_TO_FIXED(distance)));
            if (draw_y < 0) draw_y = 0;  /* Clamp to top of screen */

            /* Only draw if above previously drawn pixel */
            if (draw_y < prev_draw_y) {
                UBYTE color = colormap[mz][mx];

                for (y = draw_y; y < prev_draw_y; y++) {
                    chunky_buffer[(LONG)y * SCREEN_W + x] = color;
                }
                prev_draw_y = draw_y;
            }
        }
    }

    /* C2P conversion for planar display */
    chunky_to_planar(chunky_buffer, bitplane_data, SCREEN_W, SCREEN_H);
}

Voxel Space Optimization

The naive algorithm is too slow for 50 FPS on a 68000. Key optimizations:

| Optimization | Speedup | How |
|---|---|---|
| Reciprocal table | 2× | Pre-compute 1/z values, avoid division |
| Step doubling | 3-4× | Double step size beyond certain depth (less detail needed) |
| Height caching | 1.5× | Cache last N height lookups |
| Reduced resolution | 2-4× | Render at 160×128 and scale up (acceptable for landscape) |
| Fast C2P | 30× | Use Kalms C2P instead of naive conversion |
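Step doubling can be sketched as a precomputed list of sample distances that the ray loop walks instead of counting up by one. This is one possible schedule under stated assumptions, not the scheme of any particular demo: the step stays at 1 up to `near_limit`, then doubles each time the distance itself doubles. `build_sample_distances` and its parameters are hypothetical names.

```c
/* Fills `out` with the distances at which a voxel ray is sampled.
   The step is 1 up to near_limit and doubles whenever the distance
   reaches the next threshold (near_limit, 2*near_limit, ...).
   Returns the number of samples written (at most `capacity`). */
int build_sample_distances(int max_depth, int near_limit,
                           int *out, int capacity)
{
    int count = 0;
    int step = 1;
    int next_double = near_limit;

    for (int d = 1; d < max_depth && count < capacity; d += step) {
        out[count++] = d;
        if (d >= next_double) {     /* far enough: coarser sampling from here */
            step *= 2;
            next_double *= 2;
        }
    }
    return count;
}
```

With `max_depth = 200` and `near_limit = 64` this schedule takes 113 samples per column instead of 199. The actual saving depends on where the thresholds sit relative to the visible depth range.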

Performance Budget

3D Rendering Costs on Stock A500 (7 MHz 68000)

| Operation | Cycles (approx.) | Notes |
|---|---|---|
| Fixed-point multiply | ~38-70 | MULS.W (16×16→32); time depends on the operand bit pattern |
| Fixed-point divide | ~140 | DIVS.W (32÷16→16), very expensive |
| Sine table lookup | ~12 | Table indexed by angle |
| Vertex transform | ~350-650 | 9 multiplies + 6 adds (3 multiplies + 2 adds per axis) |
| Perspective divide | ~280 | 2 divides per vertex |
| Blitter line draw | ~200/edge | DMA time for edge |
| Blitter fill | ~2000/polygon | Depends on polygon size |
| Full C2P (Kalms) | ~35 ms | 320×256 × 8bpp |
| Voxel column | ~500/col | Heightmap lookup + draw |

Frame Budget (PAL: 20ms per frame)

| Effect | Vertices | Time | FPS |
|---|---|---|---|
| Single flat-shaded cube | 8 | ~3ms | 50 |
| 100-face object | 30+ | ~12ms | 30-50 |
| Rotozoom 320×256 | 0 (per pixel) | ~40ms (with C2P) | 15-25 |
| Dot tunnel 30 rings | 960 dots | ~8ms | 50 |
| Voxel space 320×256 | 64K cols | ~80ms | 6-12 |
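The 20 ms PAL frame can also be expressed in CPU cycles, which makes it easier to judge whether a per-pixel effect can fit. The sketch below is plain arithmetic under one assumption: it uses the nominal PAL 68000 clock of 7,093,790 Hz and ignores cycles lost to DMA contention, so real budgets are lower. The function names are hypothetical.

```c
#define PAL_CPU_HZ 7093790L   /* nominal PAL A500 68000 clock */
#define PAL_FPS    50L

/* Raw CPU cycles available in one frame at the given frame rate */
long cycles_per_frame(long cpu_hz, long fps)
{
    return cpu_hz / fps;
}

/* Cycles available per pixel, scaled by 100 to keep two decimals
   without floating point (173 means 1.73 cycles per pixel) */
long cycles_per_pixel_x100(long cpu_hz, long fps, long width, long height)
{
    return cycles_per_frame(cpu_hz, fps) * 100L / (width * height);
}
```

At 50 FPS this gives about 141,875 cycles per frame, or roughly 1.7 cycles per pixel at 320×256. That is less than a single 68000 instruction per pixel, which is why full-screen per-pixel effects run at reduced resolution or reduced frame rate on stock hardware.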

Antipatterns

1. The Floating-Point Temptation

Using floating-point math on the 68000. The 68881 FPU is optional, and most Amigas don't have one. Software floating-point emulation is roughly 50-100× slower than fixed-point.

Broken:

/* Don't do this — requires FPU or slow software emulation */
float x = sin(angle) * distance;
float y = cos(angle) * distance;

Fixed:

/* Use fixed-point with pre-calculated tables */
FIXED x = fixed_mul(fast_sin(angle), distance);
FIXED y = fixed_mul(fast_cos(angle), distance);

2. The Per-Pixel Divide

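Calling fixed_div() for every pixel in a rotozoom or voxel renderer. Division is the most expensive operation on the 68000: DIVS.W takes ~140 cycles, and the 68000 has no 32-bit divide instruction at all, so a full 32-bit divide is a software routine that costs several times more.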

Broken:

for (x = 0; x < 320; x++) {
    for (y = 0; y < 256; y++) {
        FIXED u = fixed_div(x, z);  /* DIVIDE PER PIXEL! */
        FIXED v = fixed_div(y, z);
    }
}

Fixed:

/* Pre-compute step values (multiply instead of divide) */
FIXED inv_z = fixed_div(INT_TO_FIXED(1), z);  /* One divide per frame */
FIXED du_dx = fixed_mul(scale, inv_z);
FIXED dv_dy = fixed_mul(scale, inv_z);
FIXED v = v_start;

for (y = 0; y < 256; y++) {
    FIXED u = u_start;
    for (x = 0; x < 320; x++) {
        /* ... use (u, v) for this pixel ... */
        u += du_dx;  /* ADD, not multiply/divide */
    }
    v += dv_dy;      /* One add per scanline */
}

3. The Backface Cull Miss

Skipping backface culling for convex objects. Every polygon drawn behind other polygons is wasted Blitter time. A simple dot-product test rejects ~50% of faces.

Broken:

/* Draw all faces — 50% are facing away! */
for (i = 0; i < num_faces; i++) {
    draw_filled_polygon(&faces[i]);  /* Wastes time on hidden faces */
}

Fixed:

for (i = 0; i < num_faces; i++) {
    /* Backface cull: if face normal points away, skip it.
       The camera looks along +Z, so an outward normal with nz >= 0 faces away. */
    FIXED nz = compute_normal_z(&faces[i]);
    if (nz >= 0) continue;  /* Facing away from camera */

    draw_filled_polygon(&faces[i]);
}
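A common alternative to computing a 3D normal is to test the winding of the already-projected vertices, which needs only two 16×16 multiplies per face. The sketch below assumes front faces are wound clockwise on screen with Y increasing downward; that convention and the name `is_backface` are assumptions for illustration, not taken from the article's data format.

```c
#include <stdint.h>

/* Screen-space backface test using the first three projected vertices.
   Returns 1 if the face should be skipped, 0 if it should be drawn.
   Assumes front faces are clockwise on screen (Y grows downward), which
   makes the 2D cross product positive for visible faces. */
int is_backface(int16_t x0, int16_t y0,
                int16_t x1, int16_t y1,
                int16_t x2, int16_t y2)
{
    int32_t cross = (int32_t)(x1 - x0) * (y2 - y0)
                  - (int32_t)(x2 - x0) * (y1 - y0);
    return cross <= 0;   /* zero area (edge-on) faces are skipped too */
}
```

Because it works on projected coordinates, this test stays correct under perspective, where the sign of the normal's Z component alone can misclassify faces near the edge of the view.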

4. The Unsorted Z-Fight

Drawing faces in random order without depth sorting. Overlapping polygons flicker as they overwrite each other unpredictably each frame.

Broken:

/* Draw faces in arbitrary order → z-fighting */
for (i = 0; i < num_faces; i++) {
    draw_filled_polygon(&faces[i]);
}

Fixed:

/* Sort by average Z depth (painter's algorithm), far-to-near */
qsort(faces, num_faces, sizeof(Face), compare_faces);
for (i = 0; i < num_faces; i++) {
    draw_filled_polygon(&faces[i]);
}

5. The Naive C2P

Using a naive chunky-to-planar conversion for rotozoom/voxel output. The naive method processes each pixel individually with bit shifts, taking over 1 second per frame on a stock 68000.

Broken:

/* Naive C2P: ~70,000 pixels/sec — 0.9 FPS for 320×256 */
for (i = 0; i < 320*256; i++) {
    int pixel = chunky[i];
    for (bit = 0; bit < 8; bit++) {
        planes[bit][i/8] |= ((pixel >> bit) & 1) << (7 - (i & 7));
    }
}

Fixed:

/* Use Kalms C2P or Blitter-assisted C2P: ~30× faster */
kalms_c2p(chunky_buffer, planar_data, 320, 256);
/* See pixel_conversion.md for full implementation */

Decision Guide

flowchart TD
    START[Need 3D rendering] --> Q1{What are you rendering?}
    Q1 -->|Solid objects| Q2{Convex or arbitrary?}
    Q1 -->|Landscapes| VOXEL[Voxel Space]
    Q1 -->|Textures| ROTZ[Rotozoom]
    Q1 -->|Abstract/Tunnel| DOT[Dot Tunnel]

    Q2 -->|Convex| Q3{Number of faces?}
    Q2 -->|Arbitrary| BSP[BSP-tree or Z-buffer approach]

    Q3 -->|<100| SIMPLE[Simple painter's algorithm<br/>+ backface cull]
    Q3 -->|100-500| SORT[Sorted faces + Blitter fill]
    Q3 -->|>500| Q4{Stock A500 or accelerated?}

    Q4 -->|Stock A500| REDUCE[Reduce geometry or<br/>use wireframe only]
    Q4 -->|Accelerated| OPT[Optimized fill + C2P pipeline]

    VOXEL --> Q5{Resolution?}
    Q5 -->|Full 320×256| SLOW[~8 FPS on stock A500]
    Q5 -->|Half 160×128| OK[~15-25 FPS on A500]

    ROTZ --> Q6{Platform?}
    Q6 -->|OCS/ECS| C2P_PATH[C2P required<br/>~15-25 FPS]
    Q6 -->|RTG/AGA| CHUNKY[Direct chunky write<br/>~50 FPS possible]

Historical Timeline

timeline
    title 3D Rendering on Amiga
    1987 : First wireframe 3D demos (line drawing only)
    1988 : Hidden line removal algorithms
         : Flat-shaded 3D in demoscene
    1989 : Phenomena — filled polygon 3D objects
         : Fixed-point math becomes standard
    1990 : Complex demo — multiple filled objects
         : First rotozoom effects appear
    1991 : Texture-mapped rotozoom at acceptable framerates
         : 3D starfields with depth sorting
    1992 : Voxel space demos appear (low resolution)
         : Dot tunnels with Blitter-optimized rendering
    1993 : Spaceballs — state-of-the-art dot tunnel
         : Higher-resolution voxel landscapes
    1994 : Rebels — smooth voxel space fly-over
         : 68040-accelerated 3D at full frame rate
    1995 : Polka Brothers — CPU raytracer (proof of concept)
         : 68060-accelerated demos with lighting
    2000+ : Demoscene pushes 3D on stock A500
          : Group-optimized rotozoom achieves new speed records
          : MiSTer preserves exact Blitter timing for fill accuracy

Modern Analogies

| Amiga 3D Concept | Modern Equivalent | Why It Maps |
|---|---|---|
| Fixed-point 16.16 | Half-precision float (FP16) | Both trade precision for speed |
| Sine lookup table | GPU SFU (Special Function Unit) | Both replace a transcendental computation with a fast approximation |
| Blitter fill mode | GPU rasterizer | Both fill polygon interiors |
| Painter's algorithm | Z-buffer / depth test | Both solve polygon visibility |
| Backface culling | GPU backface culling | Both skip invisible faces |
| Rotozoom | Affine texture sampling | Both use 2×2 matrix transform per pixel |
| Voxel space raycasting | Heightfield terrain shader | Both cast rays through a heightmap |
| C2P conversion | Texture swizzle/deswizzle | Both convert between memory layouts |
| Reciprocal table | GPU reciprocal approximation | Both avoid expensive division |
| Chunky buffer | Render-to-texture (FBO) | Both render to off-screen buffer |

Use Cases

| Use Case | Technique | Notable Examples |
|---|---|---|
| 3D game objects | Filled polygons | Flight simulators, Elite clones |
| Rotating logo | Rotozoom | Every demo with a bitmap logo |
| Tunnel fly-through | Dot tunnel | Spaceballs, numerous demos |
| Landscape fly-over | Voxel space | Rebels, numerous demos |
| 3D chess/board games | Filled polygons + sorting | Various Amiga games |
| Virtual reality scenes | Combined techniques | Various demo compos |
| Star field | Z-ordered point rendering | Standard demo effect |
| Wavy floor/ceiling | Rotozoom variant | Doom-like perspective tricks |

FPGA / Emulation Impact

| Concern | Impact | Notes |
|---|---|---|
| Blitter fill timing | Fill must reproduce the exact inclusive and exclusive fill rules | Emulators must match Blitter fill behavior precisely |
| Line-draw accuracy | Blitter Bresenham must match real hardware | Affects polygon edge positions |
| C2P pipeline | Chunky→Planar timing affects frame rate | Must be accounted for in demo timing |
| Fixed-point overflow | MULS/DIVS edge cases (MULS.L/DIVS.L on 68020+) | Overflow flags and results must match hardware |
| Blitter-CPU interleaving | BLTPRI affects CPU stall duration | Must match real Blitter busy-wait timing |

FAQ

Q: Why not use the FPU for 3D math? A: The 68881/68882 FPU is optional hardware that most Amiga models don't have. Software FPU emulation is 50-100× slower than fixed-point integer math. Only accelerated Amigas (a 68030 paired with a 68881/68882, or a full 68040/68060) typically have an FPU, and even then fixed-point is often faster for the adds and multiplies that dominate inner loops.

Q: What is the fastest C2P for rotozoom? A: The Kalms C2P is the standard. A Blitter-assisted C2P can be faster when the CPU is the bottleneck, because the Blitter handles part of the conversion while the CPU keeps rendering; note that the Blitter itself remains a 16-bit device on AGA. For RTG cards, C2P is unnecessary: write directly to chunky VRAM. See Pixel Conversion for benchmarks.

Q: How do I handle polygon clipping? A: For simple 3D objects, screen-space clipping (Cohen-Sutherland or Sutherland-Hodgman) is sufficient. Clip against the four screen edges. For objects that can go behind the camera, you need near-plane clipping in 3D space — this is much more complex and most demos avoid it by keeping objects in front of the camera.
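As a starting point for the screen-space clipping mentioned above, the Cohen-Sutherland outcode test classifies an edge with a few comparisons before any intersection math is done. This is a minimal sketch for a 320×256 screen; the constant and function names are hypothetical, and the intersection step for edges that cross a boundary is left out.

```c
#define CLIP_XMIN 0
#define CLIP_XMAX 319
#define CLIP_YMIN 0
#define CLIP_YMAX 255

enum { OUT_LEFT = 1, OUT_RIGHT = 2, OUT_TOP = 4, OUT_BOTTOM = 8 };

/* One bit per screen edge the point lies beyond; 0 means on screen */
int outcode(int x, int y)
{
    int code = 0;
    if (x < CLIP_XMIN)      code |= OUT_LEFT;
    else if (x > CLIP_XMAX) code |= OUT_RIGHT;
    if (y < CLIP_YMIN)      code |= OUT_TOP;
    else if (y > CLIP_YMAX) code |= OUT_BOTTOM;
    return code;
}

/* Returns  1 if the edge is fully on screen (draw as is),
           -1 if both ends lie beyond the same screen edge (skip),
            0 if the edge must be clipped against one or more edges. */
int classify_edge(int x0, int y0, int x1, int y1)
{
    int c0 = outcode(x0, y0);
    int c1 = outcode(x1, y1);

    if ((c0 | c1) == 0) return 1;
    if ((c0 & c1) != 0) return -1;
    return 0;
}
```

For filled polygons, edges that leave the screen still need a closing pixel on the clipped side of each scanline, otherwise the Blitter fill has no boundary to stop at.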

Q: Can I do texture-mapped polygons (not just rotozoom)? A: Yes, but affine texture mapping (per-polygon UV) produces visible distortion on large polygons. Correct perspective texture mapping requires per-pixel division, which is too slow on a 68000. Most demos use subdivision (split large polygons into smaller ones) or simply use rotozoom for the entire screen.

Q: What is a dot matrix / voxel display? A: A voxel (volume pixel) display renders 3D data as a grid of points. On the Amiga, this typically means rendering heightmap terrain as vertical columns (voxel space) or rendering 3D point clouds. The Blitter's line-draw mode can efficiently render individual dots.


References

External Resources

  • Amiga Hardware Reference Manual — Blitter fill mode, line-draw mode
  • Scoopex Amiga Hardware Programming (Photon) — YouTube playlist — Blitter fill mode and line-draw video walkthroughs; companion articles at coppershade.org
  • Pouet.net (https://www.pouet.net) — 3D demo releases with source code
  • Demozoo (https://demozoo.org) — Demoscene production encyclopedia
  • Amiga Graphics Archive (https://amiga.lychesis.net) — Copper-enhanced 3D rendering analysis in commercial games
  • Kalms C2P — Standard chunky-to-planar implementation
  • Comanche Voxel Engine — Original voxel space algorithm reference (NovaLogic)