mirror of
https://github.com/alfishe/amiga-bootcamp.git
synced 2026-06-12 16:16:28 +00:00
875 lines
31 KiB
Markdown
875 lines
31 KiB
Markdown
[← Home](../README.md) · [Demoscene Techniques](README.md)
|
||
|
||
# 3D Rendering — Fixed-Point Math, Blitter Polygons, Rotozoom, Dot Tunnels, and Voxel Space
|
||
|
||
## Overview
|
||
|
||
In 1985, the Amiga had no 3D hardware. No matrix engine, no floating-point unit, no texture mapper — just a 7 MHz integer-only 68000, a Blitter that could copy rectangles, and a Copper that could change display registers. Yet by 1990, demoscene coders were rendering real-time filled 3D objects, and by 1994 they were flying through voxel landscapes at playable framerates. The entire 3D pipeline — projection, clipping, rasterization, fill — was built from scratch in hand-tuned 68000 assembly using fixed-point arithmetic.
|
||
|
||
This article covers the demoscene 3D rendering techniques that made it possible: fixed-point math, Blitter-filled polygons, texture-mapped rotozoom, dot tunnels, and voxel space rendering. Each technique maps to a specific hardware capability — and the demoscene's creative abuse of it.
|
||
|
||
```mermaid
|
||
graph TB
|
||
subgraph "Math Foundation"
|
||
FP["Fixed-Point<br/>16.16 arithmetic"]
|
||
MATRIX["Matrix Ops<br/>Rotation × projection"]
|
||
SINCOS["Sine/Cosine<br/>Table lookup"]
|
||
end
|
||
subgraph "Rendering"
|
||
FILL["Filled Polygons<br/>Blitter line + fill"]
|
||
TEXTURE["Rotozoom<br/>Affine texture map"]
|
||
DOT["Dot Tunnel<br/>Z-ordered circles"]
|
||
VOXEL["Voxel Space<br/>Raycast heightmap"]
|
||
end
|
||
subgraph "Optimization"
|
||
C2P["Chunky-to-Planar<br/>Kalms/Blitter C2P"]
|
||
DIV["Fast Division<br/>Reciprocal table"]
|
||
CLIP["Screen-Space Clip<br/>Cohen-Sutherland"]
|
||
end
|
||
|
||
FP --> FILL
|
||
FP --> TEXTURE
|
||
MATRIX --> FILL
|
||
SINCOS --> MATRIX
|
||
FILL --> DOT
|
||
TEXTURE --> VOXEL
|
||
C2P --> TEXTURE
|
||
DIV --> VOXEL
|
||
```
|
||
|
||
---
|
||
|
||
## Foundation: Fixed-Point Arithmetic
|
||
|
||
The 68000 has no floating-point unit. All 3D math must use integers. The solution is **fixed-point** — encoding fractional values as integers with an implicit decimal point.
|
||
|
||
### 16.16 Fixed-Point Format
|
||
|
||
```
|
||
┌──────────────────────────────────┬──────────────────────────────────┐
|
||
│ Upper 16 bits: integer part │ Lower 16 bits: fractional part │
|
||
│ (signed, two's complement) │ (unsigned, represents 0 to ~1) │
|
||
└──────────────────────────────────┴──────────────────────────────────┘
|
||
|
||
Example: 1.5 = $00018000
|
||
0.25 = $00004000
|
||
-1.0 = $FFFF0000
|
||
π = $0003243F (3.14159...)
|
||
```
|
||
|
||
### Fixed-Point Operations
|
||
|
||
```c
|
||
/* fixedpoint.h — 16.16 fixed-point arithmetic for 68000 */
|
||
|
||
typedef LONG FIXED; /* 32-bit signed: 16.16 format */
|
||
|
||
#define INT_TO_FIXED(x) ((FIXED)((x) << 16))
|
||
#define FIXED_TO_INT(x) ((WORD)((x) >> 16)) /* Truncate */
|
||
#define FIXED_TO_INT_R(x) (WORD)(((x) + 0x8000) >> 16) /* Round */
|
||
#define FLOAT_TO_FIXED(f) ((FIXED)((f) * 65536.0))
|
||
|
||
/* Multiply: result = a × b / 65536
|
||
On 68000, use 32×32→64 MULS.L then shift right 16 */
|
||
static inline FIXED fixed_mul(FIXED a, FIXED b) {
|
||
/* 68000 asm:
|
||
move.l a, d0
|
||
muls.l b, d0:d1 ; d0:d1 = 64-bit result
|
||
swap d0 ; d0 = upper 32 bits (already >> 16)
|
||
; d0 contains the result
|
||
*/
|
||
return (FIXED)(((LONG)a * (LONG)b) >> 16);
|
||
}
|
||
|
||
/* Divide: result = a × 65536 / b
|
||
Must multiply first to avoid losing precision */
|
||
static inline FIXED fixed_div(FIXED a, FIXED b) {
|
||
/* 68000: be careful with overflow!
|
||
Use reciprocal table for perspective division */
|
||
return (FIXED)(((LONG)a << 16) / (LONG)b);
|
||
}
|
||
```
|
||
|
||
### Sine/Cosine Tables
|
||
|
||
Pre-calculated lookup tables are essential — computing `sin()` at runtime is too slow:
|
||
|
||
```c
|
||
/* trig_tables.c — Pre-calculated 16.16 sine/cosine tables */
|
||
/* 1024 entries covering 0-2π, index = angle × 1024 / (2π) */
|
||
|
||
#define TRIG_TABLE_SIZE 1024
|
||
#define ANGLE_2PI 1024 /* Full circle = 1024 units */
|
||
|
||
/* 16.16 fixed-point: sin values range from -1.0 ($FFFF0000) to 1.0 ($00010000) */
|
||
static const FIXED sin_table[TRIG_TABLE_SIZE]; /* Generated at build time */
|
||
|
||
/* Fast lookup with wrapping */
|
||
static inline FIXED fast_sin(int angle) {
|
||
return sin_table[angle & (TRIG_TABLE_SIZE - 1)];
|
||
}
|
||
|
||
static inline FIXED fast_cos(int angle) {
|
||
return sin_table[(angle + 256) & (TRIG_TABLE_SIZE - 1)]; /* cos = sin(x+π/2) */
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Foundation: Matrix Operations
|
||
|
||
3D rotation uses 3×3 matrices multiplied with vertex coordinates. Each rotation (X, Y, Z axis) is a matrix multiply:
|
||
|
||
### Rotation Matrix Construction
|
||
|
||
```c
|
||
/* matrix3d.c — 3D rotation matrices using fixed-point */
|
||
|
||
typedef struct {
|
||
FIXED m[3][3]; /* 3×3 rotation matrix */
|
||
} Matrix3D;
|
||
|
||
/* Build rotation matrix from Euler angles */
|
||
void build_rotation_matrix(Matrix3D *mat, int ax, int ay, int az) {
|
||
FIXED sx = fast_sin(ax), cx = fast_cos(ax);
|
||
FIXED sy = fast_sin(ay), cy = fast_cos(ay);
|
||
FIXED sz = fast_sin(az), cz = fast_cos(az);
|
||
|
||
/* Combined Z×Y×X rotation (standard demoscene order) */
|
||
mat->m[0][0] = fixed_mul(cy, cz);
|
||
mat->m[0][1] = fixed_mul(fixed_mul(sx, sy), cz) - fixed_mul(cx, sz);
|
||
mat->m[0][2] = fixed_mul(fixed_mul(cx, sy), cz) + fixed_mul(sx, sz);
|
||
mat->m[1][0] = fixed_mul(cy, sz);
|
||
mat->m[1][1] = fixed_mul(fixed_mul(sx, sy), sz) + fixed_mul(cx, cz);
|
||
mat->m[1][2] = fixed_mul(fixed_mul(cx, sy), sz) - fixed_mul(sx, cz);
|
||
mat->m[2][0] = -sy;
|
||
mat->m[2][1] = fixed_mul(sx, cy);
|
||
mat->m[2][2] = fixed_mul(cx, cy);
|
||
}
|
||
|
||
/* Transform a vertex: result = matrix × vertex */
|
||
void transform_vertex(const Matrix3D *mat, FIXED vx, FIXED vy, FIXED vz,
|
||
FIXED *rx, FIXED *ry, FIXED *rz) {
|
||
*rx = fixed_mul(mat->m[0][0], vx) +
|
||
fixed_mul(mat->m[0][1], vy) +
|
||
fixed_mul(mat->m[0][2], vz);
|
||
*ry = fixed_mul(mat->m[1][0], vx) +
|
||
fixed_mul(mat->m[1][1], vy) +
|
||
fixed_mul(mat->m[1][2], vz);
|
||
*rz = fixed_mul(mat->m[2][0], vx) +
|
||
fixed_mul(mat->m[2][1], vy) +
|
||
fixed_mul(mat->m[2][2], vz);
|
||
}
|
||
```
|
||
|
||
### Perspective Projection
|
||
|
||
```c
|
||
/* project.c — Perspective projection to screen coordinates */
|
||
|
||
#define SCREEN_CX 160 /* Center X (320 wide) */
|
||
#define SCREEN_CY 128 /* Center Y (256 tall) */
|
||
#define FOCAL_LEN 256 /* Focal length in fixed-point */
|
||
|
||
void project_vertex(FIXED x, FIXED y, FIXED z,
|
||
WORD *sx, WORD *sy) {
|
||
/* Perspective divide: screen = world × focal / z */
|
||
if (z > INT_TO_FIXED(1)) { /* Avoid division by zero */
|
||
FIXED scale = fixed_div(INT_TO_FIXED(FOCAL_LEN), z);
|
||
*sx = SCREEN_CX + FIXED_TO_INT(fixed_mul(x, scale));
|
||
*sy = SCREEN_CY - FIXED_TO_INT(fixed_mul(y, scale)); /* Y flipped */
|
||
} else {
|
||
*sx = SCREEN_CX; /* Behind camera */
|
||
*sy = SCREEN_CY;
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Technique 1: Blitter-Filled Polygons
|
||
|
||
The Blitter's **line-draw + fill mode** combination is the foundation of Amiga 3D rendering. The process:
|
||
|
||
1. Draw polygon edges using Blitter line mode (sets pixels at boundaries)
|
||
2. Activate Blitter fill mode (fills between set pixels, even→odd fill rule)
|
||
3. Result: a filled polygon with zero CPU pixel writing
|
||
|
||
```mermaid
|
||
sequenceDiagram
|
||
participant CPU as 68000 CPU
|
||
participant Blitter as Blitter
|
||
participant Bitmap as Bitplane Memory
|
||
|
||
Note over CPU: Polygon: 4 vertices
|
||
|
||
CPU->>Blitter: Configure line mode (BLTCON0=$0B7A)
|
||
CPU->>Blitter: Draw edge V1→V2
|
||
Blitter->>Bitmap: Write edge pixels
|
||
CPU->>Blitter: Draw edge V2→V3
|
||
Blitter->>Bitmap: Write edge pixels
|
||
CPU->>Blitter: Draw edge V3→V4
|
||
Blitter->>Bitmap: Write edge pixels
|
||
CPU->>Blitter: Draw edge V4→V1
|
||
Blitter->>Bitmap: Write edge pixels
|
||
|
||
CPU->>Blitter: Configure fill mode (BLTCON0=$01F2)
|
||
CPU->>Blitter: Fill bitmap (even-odd rule)
|
||
Blitter->>Bitmap: Fill interior pixels
|
||
|
||
Note over Bitmap: Filled polygon ready
|
||
```
|
||
|
||
### Blitter Fill Mode Detail
|
||
|
||
The Blitter fill uses the **inclusive-odd** fill rule: scanning left to right, it inverts pixels each time it encounters a set bit. This means it fills between pairs of edge pixels:
|
||
|
||
```asm
|
||
; blit_fill.asm — Blitter fill for a single bitplane
|
||
; Assumes edges already drawn in the bitmap
|
||
|
||
; Fill from top to bottom of polygon
|
||
lea $DFF000,a0 ; Custom registers base
|
||
|
||
; Set up fill-mode blit
|
||
move.w #$01F2,BLTCON0(a0) ; Fill mode: A→D, fill enabled
|
||
move.w #$0000,BLTCON1(a0) ; No line mode, ascending
|
||
move.w #$FFFF,BLTAFWM(a0) ; First word mask = all bits
|
||
move.w #$FFFF,BLTALWM(a0) ; Last word mask = all bits
|
||
|
||
; Source A = the bitmap data (edge pixels)
|
||
move.l bitmap_start,BLTAPTH(a0) ; Source address
|
||
|
||
; Destination = same bitmap (fill in-place)
|
||
move.l bitmap_start,BLTDPTH(a0) ; Dest = source
|
||
|
||
; Blit size: height × width
|
||
; height = number of scanlines, width = words per line
|
||
move.w #(HEIGHT<<6)|(WORDS_PER_LINE),BLTSIZE(a0)
|
||
; Blitter starts immediately
|
||
```
|
||
|
||
### Multiple-Face Sorting (Painter's Algorithm)
|
||
|
||
For solid 3D objects, faces must be drawn back-to-front (painter's algorithm):
|
||
|
||
```c
|
||
/* face_sort.c — Sort polygon faces by depth for painter's algorithm */
|
||
|
||
typedef struct {
|
||
WORD num_vertices;
|
||
WORD *vertices; /* Index into vertex array */
|
||
FIXED avg_z; /* Average Z depth (for sorting) */
|
||
UWORD color; /* Face color */
|
||
} Face;
|
||
|
||
int compare_faces(const void *a, const void *b) {
|
||
FIXED za = ((const Face *)a)->avg_z;
|
||
FIXED zb = ((const Face *)b)->avg_z;
|
||
/* Sort far-to-near (painter's algorithm) */
|
||
if (za > zb) return -1; /* a is farther, draw first */
|
||
if (za < zb) return 1;
|
||
return 0;
|
||
}
|
||
|
||
void render_object(Face *faces, int num_faces,
|
||
FIXED *transformed_z) {
|
||
int i;
|
||
|
||
/* Calculate average Z for each face */
|
||
for (i = 0; i < num_faces; i++) {
|
||
FIXED sum = 0;
|
||
int j;
|
||
for (j = 0; j < faces[i].num_vertices; j++) {
|
||
sum += transformed_z[faces[i].vertices[j]];
|
||
}
|
||
faces[i].avg_z = sum / faces[i].num_vertices;
|
||
}
|
||
|
||
/* Sort back-to-front */
|
||
qsort(faces, num_faces, sizeof(Face), compare_faces);
|
||
|
||
/* Draw each face */
|
||
for (i = 0; i < num_faces; i++) {
|
||
draw_filled_polygon(&faces[i]);
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Technique 2: Rotozoom (Affine Texture Mapping)
|
||
|
||
Rotozoom renders a texture-mapped rectangle that can rotate and scale in real-time. The name comes from **rotate + zoom**. It works by computing a texture coordinate (U,V) for each screen pixel using an affine transform.
|
||
|
||
### Algorithm
|
||
|
||
For each screen pixel (x, y), compute texture coordinates:
|
||
|
||
```
|
||
U = U_start + x × dU_dx + y × dU_dy
|
||
V = V_start + x × dV_dx + y × dV_dy
|
||
```
|
||
|
||
Where `dU_dx`, `dV_dx`, `dU_dy`, `dV_dy` are the rotation+scale matrix coefficients.
|
||
|
||
```c
|
||
/* rotozoom.c — Affine texture mapping (rotzoom) */
|
||
|
||
#define SCREEN_W 320
|
||
#define SCREEN_H 256
|
||
#define TEX_SIZE 256 /* Texture is 256×256 */
|
||
|
||
extern UBYTE texture[TEX_SIZE][TEX_SIZE]; /* Chunky texture */
|
||
extern UBYTE *chunky_buffer; /* Output chunky buffer */
|
||
|
||
void render_rotozoom(FIXED cx, FIXED cy, /* Texture center offset */
|
||
FIXED angle, FIXED zoom) {
|
||
FIXED cos_a = fast_cos(angle);
|
||
FIXED sin_a = fast_sin(angle);
|
||
FIXED inv_zoom = fixed_div(INT_TO_FIXED(1), zoom);
|
||
|
||
/* Rotation × inverse zoom matrix coefficients */
|
||
FIXED du_dx = fixed_mul(cos_a, inv_zoom);
|
||
FIXED dv_dx = fixed_mul(sin_a, inv_zoom);
|
||
FIXED du_dy = fixed_mul(-sin_a, inv_zoom);
|
||
FIXED dv_dy = fixed_mul(cos_a, inv_zoom);
|
||
|
||
/* Start position: center of screen maps to texture center */
|
||
FIXED u_start = cx - fixed_mul(INT_TO_FIXED(SCREEN_W/2), du_dx)
|
||
- fixed_mul(INT_TO_FIXED(SCREEN_H/2), du_dy);
|
||
FIXED v_start = cy - fixed_mul(INT_TO_FIXED(SCREEN_W/2), dv_dx)
|
||
- fixed_mul(INT_TO_FIXED(SCREEN_H/2), dv_dy);
|
||
|
||
UBYTE *dst = chunky_buffer;
|
||
int y;
|
||
|
||
for (y = 0; y < SCREEN_H; y++) {
|
||
FIXED u = u_start;
|
||
FIXED v = v_start;
|
||
int x;
|
||
|
||
for (x = 0; x < SCREEN_W; x++) {
|
||
/* Texture lookup with wrapping */
|
||
*dst++ = texture[FIXED_TO_INT(u) & 0xFF]
|
||
[FIXED_TO_INT(v) & 0xFF];
|
||
|
||
u += du_dx;
|
||
v += dv_dx;
|
||
}
|
||
|
||
u_start += du_dy;
|
||
v_start += dv_dy;
|
||
}
|
||
|
||
/* Convert chunky buffer to planar bitplanes (C2P) */
|
||
chunky_to_planar(chunky_buffer, bitplane_data, SCREEN_W, SCREEN_H);
|
||
}
|
||
```
|
||
|
||
### Rotozoom in Assembly (Inner Loop)
|
||
|
||
The 68000 assembly inner loop is highly optimized. The key insight: texture coordinates wrap at power-of-2 boundaries, so masking with `$FF` (256-wide texture) is free using byte-level addressing:
|
||
|
||
```asm
|
||
; rotozoom_inner.asm — Optimized inner loop
|
||
; a0 = destination (chunky buffer)
|
||
; a1 = texture base (256×256)
|
||
; d0 = U (16.16 fixed)
|
||
; d1 = V (16.16 fixed)
|
||
; d2 = dU/dx (16.16 fixed)
|
||
; d3 = dV/dx (16.16 fixed)
|
||
; d4 = loop counter (SCREEN_W)
|
||
|
||
.roto_inner:
|
||
move.l d0,d5 ; Copy U
|
||
swap d5 ; d5.w = integer part of U
|
||
move.l d1,d6 ; Copy V
|
||
swap d6 ; d6.w = integer part of V
|
||
|
||
; Texture lookup: tex[v & 0xFF][u & 0xFF]
|
||
and.w #$FF,d5 ; U mask (free wrap)
|
||
and.w #$FF,d6 ; V mask (free wrap)
|
||
lsl.w #8,d6 ; V × 256 (row offset)
|
||
move.b (a1,d5.w),d7 ; Texel = tex[v*256+u]
|
||
move.b d7,(a0)+ ; Write to chunky buffer
|
||
|
||
add.l d2,d0 ; U += dU/dx
|
||
add.l d3,d1 ; V += dV/dx
|
||
dbra d4,.roto_inner
|
||
```
|
||
|
||
---
|
||
|
||
## Technique 3: Dot Tunnel
|
||
|
||
The dot tunnel renders concentric rings that appear to fly toward the viewer, creating the illusion of traveling through a tunnel. Each ring is a circle rendered at a specific Z-depth.
|
||
|
||
### Algorithm
|
||
|
||
```c
|
||
/* dot_tunnel.c — Z-ordered ring tunnel effect */
|
||
|
||
#define NUM_RINGS 30
|
||
#define MAX_Z 1024
|
||
#define RING_POINTS 32
|
||
|
||
typedef struct {
|
||
FIXED z; /* Depth (0=near, far=background) */
|
||
FIXED radius; /* Apparent radius (decreases with z) */
|
||
WORD cx, cy; /* Center (can be animated) */
|
||
UWORD color; /* Ring color */
|
||
} Ring;
|
||
|
||
void render_dot_tunnel(Ring *rings, int num_rings, ULONG frame) {
|
||
int i;
|
||
|
||
/* Update ring positions (move toward viewer) */
|
||
for (i = 0; i < num_rings; i++) {
|
||
rings[i].z -= INT_TO_FIXED(4); /* Speed toward viewer */
|
||
|
||
/* If ring passed camera, reset to far end */
|
||
if (rings[i].z < INT_TO_FIXED(1)) {
|
||
rings[i].z = INT_TO_FIXED(MAX_Z);
|
||
}
|
||
|
||
/* Perspective projection of radius */
|
||
rings[i].radius = fixed_div(
|
||
INT_TO_FIXED(200), /* Base radius */
|
||
rings[i].z /* Divide by depth */
|
||
);
|
||
}
|
||
|
||
/* Sort rings far-to-near (painter's algorithm) */
|
||
/* ... sort by rings[i].z descending ... */
|
||
|
||
/* Draw each ring */
|
||
for (i = 0; i < num_rings; i++) {
|
||
int p;
|
||
int radius = FIXED_TO_INT(rings[i].radius);
|
||
int cx = rings[i].cx + FIXED_TO_INT(
|
||
fixed_mul(fast_sin(frame * 3), INT_TO_FIXED(30)));
|
||
int cy = rings[i].cy + FIXED_TO_INT(
|
||
fixed_mul(fast_cos(frame * 5), INT_TO_FIXED(20)));
|
||
|
||
for (p = 0; p < RING_POINTS; p++) {
|
||
int angle = p * 360 / RING_POINTS;
|
||
FIXED sa = fast_sin(angle * TRIG_TABLE_SIZE / 360);
|
||
FIXED ca = fast_cos(angle * TRIG_TABLE_SIZE / 360);
|
||
|
||
WORD px = cx + FIXED_TO_INT(fixed_mul(INT_TO_FIXED(radius), sa));
|
||
WORD py = cy + FIXED_TO_INT(fixed_mul(INT_TO_FIXED(radius), ca));
|
||
|
||
/* Plot pixel or draw Blitter circle at (px, py) */
|
||
plot_dot(px, py, rings[i].color);
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Technique 4: Voxel Space
|
||
|
||
Voxel space renders a 3D landscape from a 2D heightmap and colormap. The algorithm casts rays from the viewer, one per screen column, and draws vertical strips of pixels. The result is a fly-over landscape effect, as seen in the 1994 demo "Space Rangers" by Rebels.
|
||
|
||
### Algorithm (Column-Based Raycasting)
|
||
|
||
```mermaid
|
||
sequenceDiagram
|
||
participant CPU as 68000 CPU
|
||
participant HM as Heightmap
|
||
participant CM as Colormap
|
||
participant Screen as Screen Buffer
|
||
|
||
Note over CPU: For each screen column x (0-319):
|
||
|
||
CPU->>CPU: Calculate ray direction for column x
|
||
CPU->>HM: Sample height at (ray_x, ray_z)
|
||
HM-->>CPU: height value h
|
||
CPU->>CPU: Project h to screen Y: y = horizon - h/z
|
||
CPU->>CM: Get color at (ray_x, ray_z)
|
||
CM-->>CPU: color value
|
||
CPU->>Screen: Draw vertical line from y to previous y in color
|
||
|
||
Note over CPU: Advance ray (step outward)
|
||
CPU->>HM: Sample next point...
|
||
```
|
||
|
||
### Voxel Space Implementation
|
||
|
||
```c
|
||
/* voxelspace.c — Column-based voxel landscape rendering */
|
||
|
||
#define SCREEN_W 320
|
||
#define SCREEN_H 256
|
||
#define MAP_SIZE 256
|
||
#define HORIZON 100 /* Horizon line Y position */
|
||
#define MAX_DEPTH 200 /* Maximum ray distance */
|
||
|
||
extern UBYTE heightmap[MAP_SIZE][MAP_SIZE];
|
||
extern UBYTE colormap[MAP_SIZE][MAP_SIZE];
|
||
extern UBYTE *chunky_buffer;
|
||
|
||
void render_voxel_space(FIXED cam_x, FIXED cam_z,
|
||
FIXED cam_angle, FIXED cam_height) {
|
||
int x;
|
||
|
||
for (x = 0; x < SCREEN_W; x++) {
|
||
/* Ray angle: camera angle + column offset */
|
||
FIXED column_offset = INT_TO_FIXED(x - SCREEN_W/2);
|
||
FIXED ray_angle = cam_angle + fixed_div(column_offset,
|
||
INT_TO_FIXED(FOCAL_LEN));
|
||
|
||
FIXED ray_dx = fast_cos(ray_angle); /* Direction X */
|
||
FIXED ray_dz = fast_sin(ray_angle); /* Direction Z */
|
||
|
||
FIXED ray_x = cam_x;
|
||
FIXED ray_z = cam_z;
|
||
|
||
WORD prev_draw_y = SCREEN_H; /* Bottom of column */
|
||
int distance;
|
||
|
||
for (distance = 1; distance < MAX_DEPTH; distance++) {
|
||
FIXED dz = fixed_div(INT_TO_FIXED(distance), ray_dz);
|
||
FIXED dx = fixed_div(INT_TO_FIXED(distance), ray_dx);
|
||
|
||
/* Current map position */
|
||
int mx = (FIXED_TO_INT(cam_x + dx)) & (MAP_SIZE - 1);
|
||
int mz = (FIXED_TO_INT(cam_z + dz)) & (MAP_SIZE - 1);
|
||
|
||
/* Height at this point */
|
||
FIXED terrain_h = INT_TO_FIXED(heightmap[mz][mx]);
|
||
|
||
/* Project to screen Y */
|
||
FIXED height_diff = terrain_h - cam_height;
|
||
WORD draw_y = HORIZON -
|
||
FIXED_TO_INT(fixed_div(height_diff,
|
||
INT_TO_FIXED(distance)));
|
||
|
||
/* Only draw if above previously drawn pixel */
|
||
if (draw_y < prev_draw_y) {
|
||
UBYTE color = colormap[mz][mx];
|
||
int y;
|
||
|
||
for (y = draw_y; y < prev_draw_y; y++) {
|
||
chunky_buffer[y * SCREEN_W + x] = color;
|
||
}
|
||
prev_draw_y = draw_y;
|
||
}
|
||
|
||
/* Step ray outward */
|
||
ray_x += ray_dx;
|
||
ray_z += ray_dz;
|
||
}
|
||
}
|
||
|
||
/* C2P conversion for planar display */
|
||
chunky_to_planar(chunky_buffer, bitplane_data, SCREEN_W, SCREEN_H);
|
||
}
|
||
```
|
||
|
||
### Voxel Space Optimization
|
||
|
||
The naive algorithm is too slow for 50 FPS on a 68000. Key optimizations:
|
||
|
||
| Optimization | Speedup | How |
|
||
|-------------|---------|-----|
|
||
| **Reciprocal table** | 2× | Pre-compute 1/z values, avoid division |
|
||
| **Step doubling** | 3-4× | Double step size beyond certain depth (less detail needed) |
|
||
| **Height caching** | 1.5× | Cache last N height lookups |
|
||
| **Reduced resolution** | 2-4× | Render at 160×128 and scale up (acceptable for landscape) |
|
||
| **Fast C2P** | 30× | Use Kalms C2P instead of naive conversion |
|
||
|
||
---
|
||
|
||
## Performance Budget
|
||
|
||
### 3D Rendering Costs on Stock A500 (7 MHz 68000)
|
||
|
||
| Operation | Cycles (approx.) | Notes |
|
||
|-----------|------------------|-------|
|
||
| Fixed-point multiply | ~28 | `MULS.W` (16×16→32) |
|
||
| Fixed-point divide | ~140 | `DIVS.W` — very expensive! |
|
||
| Sine table lookup | ~12 | Table indexed by angle |
|
||
| Vertex transform | ~300 | 3 multiplies + 3 adds per axis |
|
||
| Perspective divide | ~160 | 2 divides per vertex |
|
||
| Blitter line draw | ~200/edge | DMA time for edge |
|
||
| Blitter fill | ~2000/polygon | Depends on polygon size |
|
||
| Full C2P (Kalms) | ~35ms | 320×256 × 8bpp |
|
||
| Voxel column | ~500/col | Heightmap lookup + draw |
|
||
|
||
### Frame Budget (PAL: 20ms per frame)
|
||
|
||
| Effect | Vertices | Time | FPS |
|
||
|--------|----------|------|-----|
|
||
| Single flat-shaded cube | 8 | ~3ms | 50 |
|
||
| 100-face object | 30+ | ~12ms | 30-50 |
|
||
| Rotozoom 320×256 | 0 (per pixel) | ~40ms (with C2P) | 15-25 |
|
||
| Dot tunnel 30 rings | 960 dots | ~8ms | 50 |
|
||
| Voxel space 320×256 | 64K cols | ~80ms | 6-12 |
|
||
|
||
---
|
||
|
||
## Antipatterns
|
||
|
||
### 1. The Floating-Point Temptation
|
||
|
||
Using floating-point math on the 68000. The 68881 FPU is optional — most Amigas don't have one. Software floating-point emulation is **100× slower** than fixed-point.
|
||
|
||
**Broken:**
|
||
```c
|
||
/* Don't do this — requires FPU or slow software emulation */
|
||
float x = sin(angle) * distance;
|
||
float y = cos(angle) * distance;
|
||
```
|
||
|
||
**Fixed:**
|
||
```c
|
||
/* Use fixed-point with pre-calculated tables */
|
||
FIXED x = fixed_mul(fast_sin(angle), distance);
|
||
FIXED y = fixed_mul(fast_cos(angle), distance);
|
||
```
|
||
|
||
### 2. The Per-Pixel Divide
|
||
|
||
Calling `fixed_div()` for every pixel in a rotozoom or voxel renderer. Division is the most expensive operation on the 68000 (~140 cycles for 16-bit, ~280 for 32-bit).
|
||
|
||
**Broken:**
|
||
```c
|
||
for (x = 0; x < 320; x++) {
|
||
for (y = 0; y < 256; y++) {
|
||
FIXED u = fixed_div(x, z); /* DIVIDE PER PIXEL! */
|
||
FIXED v = fixed_div(y, z);
|
||
}
|
||
}
|
||
```
|
||
|
||
**Fixed:**
|
||
```c
|
||
/* Pre-compute step values (multiply instead of divide) */
|
||
FIXED du_dx = fixed_mul(scale, inv_z); /* One divide per frame */
|
||
FIXED dv_dy = fixed_mul(scale, inv_z);
|
||
|
||
for (y = 0; y < 256; y++) {
|
||
FIXED u = u_start;
|
||
for (x = 0; x < 320; x++) {
|
||
u += du_dx; /* ADD, not multiply/divide */
|
||
}
|
||
u_start += du_dy;
|
||
}
|
||
```
|
||
|
||
### 3. The Backface Cull Miss
|
||
|
||
Skipping backface culling for convex objects. Every polygon drawn behind other polygons is wasted Blitter time. A simple dot-product test rejects ~50% of faces.
|
||
|
||
**Broken:**
|
||
```c
|
||
/* Draw all faces — 50% are facing away! */
|
||
for (i = 0; i < num_faces; i++) {
|
||
draw_filled_polygon(&faces[i]); /* Wastes time on hidden faces */
|
||
}
|
||
```
|
||
|
||
**Fixed:**
|
||
```c
|
||
for (i = 0; i < num_faces; i++) {
|
||
/* Backface cull: if face normal points away, skip it */
|
||
FIXED nx = compute_normal_x(&faces[i]);
|
||
FIXED nz = compute_normal_z(&faces[i]);
|
||
if (nz < 0) continue; /* Facing away from camera */
|
||
|
||
draw_filled_polygon(&faces[i]);
|
||
}
|
||
```
|
||
|
||
### 4. The Unsorted Z-Fight
|
||
|
||
Drawing faces in random order without depth sorting. Overlapping polygons flicker as they overwrite each other unpredictably each frame.
|
||
|
||
**Broken:**
|
||
```c
|
||
/* Draw faces in arbitrary order → z-fighting */
|
||
for (i = 0; i < num_faces; i++) {
|
||
draw_filled_polygon(&faces[i]);
|
||
}
|
||
```
|
||
|
||
**Fixed:**
|
||
```c
|
||
/* Sort by average Z depth (painter's algorithm) */
|
||
qsort(faces, num_faces, sizeof(Face), compare_faces_back_to_front);
|
||
for (i = 0; i < num_faces; i++) {
|
||
draw_filled_polygon(&faces[i]);
|
||
}
|
||
```
|
||
|
||
### 5. The Naive C2P
|
||
|
||
Using a naive chunky-to-planar conversion for rotozoom/voxel output. The naive method processes each pixel individually with bit shifts, taking over 1 second per frame on a stock 68000.
|
||
|
||
**Broken:**
|
||
```c
|
||
/* Naive C2P: ~70,000 pixels/sec — 0.9 FPS for 320×256 */
|
||
for (i = 0; i < 320*256; i++) {
|
||
int pixel = chunky[i];
|
||
for (bit = 0; bit < 8; bit++) {
|
||
planes[bit][i/8] |= ((pixel >> bit) & 1) << (7 - (i & 7));
|
||
}
|
||
}
|
||
```
|
||
|
||
**Fixed:**
|
||
```c
|
||
/* Use Kalms C2P or Blitter-assisted C2P: ~30× faster */
|
||
kalms_c2p(chunky_buffer, planar_data, 320, 256);
|
||
/* See pixel_conversion.md for full implementation */
|
||
```
|
||
|
||
---
|
||
|
||
## Decision Guide
|
||
|
||
```mermaid
|
||
flowchart TD
|
||
START[Need 3D rendering] --> Q1{What are you rendering?}
|
||
Q1 -->|Solid objects| Q2{Convex or arbitrary?}
|
||
Q1 -->|Landscapes| VOXEL[Voxel Space]
|
||
Q1 -->|Textures| ROTZ[Rotozoom]
|
||
Q1 -->|Abstract/Tunnel| DOT[Dot Tunnel]
|
||
|
||
Q2 -->|Convex| Q3{Number of faces?}
|
||
Q2 -->|Arbitrary| BSP[BSP-tree or Z-buffer approach]
|
||
|
||
Q3 -->|<100| SIMPLE[Simple painter's algorithm<br/>+ backface cull]
|
||
Q3 -->|100-500| SORT[Sorted faces + Blitter fill]
|
||
Q3 -->|>500| Q4{Stock A500 or accelerated?}
|
||
|
||
Q4 -->|Stock A500| REDUCE[Reduce geometry or<br/>use wireframe only]
|
||
Q4 -->|Accelerated| OPT[Optimized fill + C2P pipeline]
|
||
|
||
VOXEL --> Q5{Resolution?}
|
||
Q5 -->|Full 320×256| SLOW[~8 FPS on stock A500]
|
||
Q5 -->|Half 160×128| OK[~15-25 FPS on A500]
|
||
|
||
ROTZ --> Q6{Platform?}
|
||
Q6 -->|OCS/ECS| C2P_PATH[C2P required<br/>~15-25 FPS]
|
||
Q6 -->|RTG/AGA| CHUNKY[Direct chunky write<br/>~50 FPS possible]
|
||
```
|
||
|
||
---
|
||
|
||
## Historical Timeline
|
||
|
||
```mermaid
|
||
timeline
|
||
title 3D Rendering on Amiga
|
||
1987 : First wireframe 3D demos (line drawing only)
|
||
1988 : Hidden line removal algorithms
|
||
: Flat-shaded 3D in demoscene
|
||
1989 : Phenomena — filled polygon 3D objects
|
||
: Fixed-point math becomes standard
|
||
1990 : Complex demo — multiple filled objects
|
||
: First rotozoom effects appear
|
||
1991 : Texture-mapped rotozoom at acceptable framerates
|
||
: 3D starfields with depth sorting
|
||
1992 : Voxel space demos appear (low resolution)
|
||
: Dot tunnels with Blitter-optimized rendering
|
||
1993 : Spaceballs — state-of-the-art dot tunnel
|
||
: Higher-resolution voxel landscapes
|
||
1994 : Rebels — smooth voxel space fly-over
|
||
: 68040-accelerated 3D at full frame rate
|
||
1995 : Polka Brothers — CPU raytracer (proof of concept)
|
||
: 68060-accelerated demos with lighting
|
||
2000+ : Demoscene pushes 3D on stock A500
|
||
: Group-optimized rotozoom achieves new speed records
|
||
: MiSTer preserves exact Blitter timing for fill accuracy
|
||
```
|
||
|
||
---
|
||
|
||
## Modern Analogies
|
||
|
||
| Amiga 3D Concept | Modern Equivalent | Why It Maps |
|
||
|-----------------|-------------------|-------------|
|
||
| Fixed-point 16.16 | Half-precision float (FP16) | Both trade precision for speed |
|
||
| Sine lookup table | GPU SFU (Special Function Unit) | Both use hardware-assisted transcendental |
|
||
| Blitter fill mode | GPU rasterizer | Both fill polygon interiors |
|
||
| Painter's algorithm | Z-buffer / depth test | Both solve polygon visibility |
|
||
| Backface culling | GPU backface culling | Both skip invisible faces |
|
||
| Rotozoom | Affine texture sampling | Both use 2×2 matrix transform per pixel |
|
||
| Voxel space raycasting | Heightfield terrain shader | Both cast rays through a heightmap |
|
||
| C2P conversion | Texture swizzle/deswizzle | Both convert between memory layouts |
|
||
| Reciprocal table | GPU reciprocal approximation | Both avoid expensive division |
|
||
| Chunky buffer | Render-to-texture (FBO) | Both render to off-screen buffer |
|
||
|
||
---
|
||
|
||
## Use Cases
|
||
|
||
| Use Case | Technique | Notable Examples |
|
||
|----------|-----------|-----------------|
|
||
| 3D game objects | Filled polygons | Flight simulators, Elite clones |
|
||
| Rotating logo | Rotozoom | Every demo with a bitmap logo |
|
||
| Tunnel fly-through | Dot tunnel | Spaceballs, numerous demos |
|
||
| Landscape fly-over | Voxel space | Rebels, numerous demos |
|
||
| 3D chess/board games | Filled polygons + sorting | Various Amiga games |
|
||
| Virtual reality scenes | Combined techniques | Various demo compos |
|
||
| Star field | Z-ordered point rendering | Standard demo effect |
|
||
| Wavy floor/ceiling | Rotozoom variant | Doom-like perspective tricks |
|
||
|
||
---
|
||
|
||
## FPGA / Emulation Impact
|
||
|
||
| Concern | Impact | Notes |
|
||
|---------|--------|-------|
|
||
| **Blitter fill timing** | Fill must use exact inclusive-odd rule | Emulators must match Blitter fill behavior precisely |
|
||
| **Line-draw accuracy** | Blitter Bresenham must match real hardware | Affects polygon edge positions |
|
||
| **C2P pipeline** | Chunky→Planar timing affects frame rate | Must be accounted for in demo timing |
|
||
| **Fixed-point overflow** | 68000 MULS.L/DIVS.L edge cases | 32-bit overflow behavior must match hardware |
|
||
| **Blitter-CPU interleaving** | BLTPRI affects CPU stall duration | Must match real Blitter busy-wait timing |
|
||
|
||
---
|
||
|
||
## FAQ
|
||
|
||
**Q: Why not use the FPU for 3D math?**
|
||
A: The 68881/68882 FPU is optional hardware that most Amiga models don't have. Software FPU emulation is 50-100× slower than fixed-point integer math. Only 68030/040/060 accelerated Amigas typically have an FPU, and even then, fixed-point is faster for many operations because the 68000's integer multiply is well-optimized.
|
||
|
||
**Q: What is the fastest C2P for rotozoom?**
|
||
A: The Kalms C2P is the standard. For AGA machines with 32-bit Blitter access, a Blitter-assisted C2P can be even faster. For RTG cards, C2P is unnecessary — write directly to chunky VRAM. See [Pixel Conversion](../08_graphics/pixel_conversion.md) for benchmarks.
|
||
|
||
**Q: How do I handle polygon clipping?**
|
||
A: For simple 3D objects, screen-space clipping (Cohen-Sutherland or Sutherland-Hodgman) is sufficient. Clip against the four screen edges. For objects that can go behind the camera, you need near-plane clipping in 3D space — this is much more complex and most demos avoid it by keeping objects in front of the camera.
|
||
|
||
**Q: Can I do texture-mapped polygons (not just rotozoom)?**
|
||
A: Yes, but affine texture mapping (per-polygon UV) produces visible distortion on large polygons. Correct perspective texture mapping requires per-pixel division, which is too slow on a 68000. Most demos use subdivision (split large polygons into smaller ones) or simply use rotozoom for the entire screen.
|
||
|
||
**Q: What is a dot matrix / voxel display?**
|
||
A: A voxel (volume pixel) display renders 3D data as a grid of points. On the Amiga, this typically means rendering heightmap terrain as vertical columns (voxel space) or rendering 3D point clouds. The Blitter's line-draw mode can efficiently render individual dots.
|
||
|
||
---
|
||
|
||
## References
|
||
|
||
### Related Knowledge Base Articles
|
||
|
||
- [Pixel Conversion](../08_graphics/pixel_conversion.md) — C2P algorithms (Kalms, Blitter, Akiko)
|
||
- [Blitter Programming](../08_graphics/blitter/blitter_programming.md) — Fill mode, line draw, minterms
|
||
- [Blitter](../08_graphics/blitter/blitter.md) — Blitter hardware architecture
|
||
- [Bitmap](../08_graphics/bitmap.md) — Bitplane memory layout, interleaving
|
||
- [Copper Effects](copper_effects.md) — Copper-driven display effects
|
||
- [Timing Optimization](timing_optimization.md) — Cycle counting, Blitter-CPU interleaving
|
||
- [FPU Architecture](../15_fpu_mmu_cache/fpu_architecture.md) — 68881/68882 floating-point
|
||
|
||
### External Resources
|
||
|
||
- **Amiga Hardware Reference Manual** — Blitter fill mode, line-draw mode
|
||
- **Scoopex Amiga Hardware Programming** (Photon) — [YouTube playlist](https://www.youtube.com/playlist?list=PLc3ltHgmiidpK-s0eP5hTKJnjdTHz0_bW) — Blitter fill mode and line-draw video walkthroughs; companion articles at [coppershade.org](http://coppershade.org/)
|
||
- **Pouet.net** — https://www.pouet.net — 3D demo releases with source code
|
||
- **Demozoo** — https://demozoo.org — Demoscene production encyclopedia
|
||
- **Amiga Graphics Archive** — https://amiga.lychesis.net — Copper-enhanced 3D rendering analysis in commercial games
|
||
- **Kalms C2P** — Standard chunky-to-planar implementation
|
||
- **Comanche Voxel Engine** — Original voxel space algorithm reference (NovaLogic)
|