amiga-bootcamp/01_hardware/aga_a1200_a4000/cpu_030_040.md
2026-04-27 12:42:42 -04:00


68030/040 on the Amiga (A3000/A4000)

Overview

The A3000 ships with a Motorola 68030 at 16 or 25 MHz. The A4000 ships with either a 68030 or 68040 at 25 MHz. Later accelerator cards bring the 68060. This document covers CPU-specific concerns for AmigaOS 3.1/3.2 on these platforms.

68030 (A3000)

On-Chip Caches

  • Instruction cache: 256 bytes, direct-mapped
  • Data cache: 256 bytes, direct-mapped
  • Both enabled by default on AmigaOS 3.1 (CacheControl() call)
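To make the direct-mapped organisation concrete, the sketch below computes which cache line an address lands in. It assumes the 68030 layout of 16 lines of 16 bytes each (256 bytes in total). The helper names `line_index_030` and `lines_conflict_030` are invented for illustration and are not part of any Amiga API.

```c
#include <stdint.h>

/* Assumed 68030 cache geometry: 16 lines x 16 bytes = 256 bytes. */
#define LINE_BYTES_030 16u
#define LINE_COUNT_030 16u

/* Hypothetical helper: which line of a direct-mapped cache an address uses. */
static unsigned line_index_030(uint32_t addr)
{
    return (addr / LINE_BYTES_030) % LINE_COUNT_030;
}

/* Two addresses compete for the same line when their indexes match
   but they lie in different 16-byte blocks. */
static int lines_conflict_030(uint32_t a, uint32_t b)
{
    return line_index_030(a) == line_index_030(b)
        && (a / LINE_BYTES_030) != (b / LINE_BYTES_030);
}
```

Addresses exactly 256 bytes apart, such as 0x1000 and 0x1100, map to the same line and evict each other, so a loop that alternates between two such locations gains little from the cache.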

On-Chip PMMU

  • Full 68030 PMMU: ATC (Address Translation Cache), TT registers
  • AmigaOS does not use the MMU by default on A3000
  • Third-party tools (e.g., VMM, mmu.library) use it for virtual memory

Cache Control (exec)

#include <proto/exec.h>

/* Enable instruction and data caches */
CacheControl(CACRF_EnableI | CACRF_EnableD, CACRF_EnableI | CACRF_EnableD);

/* Flush caches (required before self-modifying code) */
CacheClearU();          /* clear all caches */
CacheClearE(addr, len, CACRF_ClearI | CACRF_ClearD);  /* targeted flush */

Cache Coherency with DMA

The 68030 data cache is not snooped by the Amiga chip bus. If the CPU writes to a buffer that the DMA engine (Blitter, audio, custom chips) will read, the cache must be flushed first:

/* Before handing a buffer to DMA: */
CacheClearE(buf, size, CACRF_ClearD);

Similarly, after DMA writes to a buffer that the CPU will read:

CacheClearE(buf, size, CACRF_ClearD);

This is a common source of bugs in audio/video programming on 68030+ Amigas.
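Because caches operate on whole lines, a flush request effectively covers every line the buffer touches. The sketch below rounds a buffer out to line boundaries on the host side. The 16-byte line size is an assumption matching the 68030 and 68040, and `line_span` is a made-up helper, not an exec function.

```c
#include <stdint.h>

#define CACHE_LINE 16u   /* assumed line size (68030/68040) */

struct Span { uint32_t start; uint32_t length; };

/* Hypothetical helper: expand [addr, addr+len) to whole cache lines. */
static struct Span line_span(uint32_t addr, uint32_t len)
{
    struct Span s;
    uint32_t end = addr + len;                            /* one past the last byte */
    s.start = addr & ~(CACHE_LINE - 1u);                  /* round start down */
    end = (end + CACHE_LINE - 1u) & ~(CACHE_LINE - 1u);   /* round end up */
    s.length = end - s.start;
    return s;
}
```

A buffer that shares a line with unrelated data drags that data into the flush, so DMA buffers are easiest to reason about when they start on a line boundary and their length is a multiple of the line size.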


68040 (A4000 / Accelerators)

On-Chip FPU — Partial Implementation

The 68040 has an on-chip FPU but omits many instructions present in the 68881/68882:

Missing from 68040 FPU:

  • FSIN, FCOS, FTAN, FASIN, FACOS, FATAN
  • FETOX, FETOXM1, FLOGN, FLOG10
  • FSINH, FCOSH, FTANH
  • FATANH, FASINH, FACOSH
  • FSCALE, FREM, FMOD, FINTRZ

AmigaOS provides 68040.library which installs Line-F exception handlers to emulate the missing instructions in software.
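As a rough illustration of what a Line-F handler has to recognise first, the sketch below classifies an instruction word. It relies only on the 68K encoding convention that Line-F opcodes carry 1111 in bits 15-12 and a coprocessor ID in bits 11-9, with the FPU conventionally using ID 1. The function names are hypothetical, and the real 68040.library handler is considerably more involved.

```c
#include <stdint.h>

/* Line-F opcodes have 1111 in bits 15-12 of the first instruction word. */
static int is_line_f(uint16_t opword)
{
    return (opword & 0xF000u) == 0xF000u;
}

/* Bits 11-9 carry the coprocessor ID; the FPU conventionally uses ID 1. */
static unsigned coprocessor_id(uint16_t opword)
{
    return (opword >> 9) & 0x7u;
}

/* Hypothetical first check an emulation handler might make. */
static int is_fpu_opcode(uint16_t opword)
{
    return is_line_f(opword) && coprocessor_id(opword) == 1u;
}
```

For example, 0xF200 (the first word of most general FPU operations) passes the check, while 0x4E71 (NOP) does not.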

/* 68040.library is normally loaded by SetPatch early in the startup-sequence */
/* Software can check that it is available before relying on emulated FP instructions */
struct Library *MathLib = OpenLibrary("68040.library", 0);

Warning

If 68040.library is not installed and software uses a missing 68040 FPU instruction, the system will crash with a Line-F exception. Always ensure 68040.library is present on A4000.

4 KB Instruction + Data Caches

  • 68040 has 4 KB instruction cache and 4 KB data cache, both 4-way set-associative
  • Cache coherency is more complex: DMA writes may not be visible to the CPU without invalidation
  • CacheClearE() / CacheClearU() remain the correct API

68040 Memory Model

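CACR (Cache Control Register) accessed via MOVEC:
  68040 (only two bits defined):
    bit 31: DE    - enable data cache
    bit 15: IE    - enable instruction cache
  68060 (extends the register):
    bit 31: EDC   - enable data cache
    bit 30: NAD   - no allocate mode (data cache)
    bit 29: ESB   - enable store buffer
    bit 28: DPI   - disable CPUSH invalidation
    bit 15: EIC   - enable instruction cache
  Cache invalidation uses the CINV and CPUSH instructions, not a CACR bit.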

Bus Error Stack Frame (68040)

The 68040 generates a different bus error stack frame from the 68000:

  • 68000: 14-byte frame
  • 68040: 60-byte access error frame (format $7) with fault address and pending write-back state

This matters for exception handlers and debuggers targeting both platforms.
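A handler that has to cope with several CPUs can read the frame format from the format/vector word that 68010 and later processors push after the PC. The sketch below decodes it and maps a few formats to sizes. The sizes are taken from the Motorola user's manuals and should be verified against them before use; the function names are illustrative, and the plain 68000 has no format word at all.

```c
#include <stdint.h>

/* Frame format lives in the top 4 bits of the format/vector word
   (68010 and later; the 68000 pushes no such word). */
static unsigned frame_format(uint16_t format_vector_word)
{
    return format_vector_word >> 12;
}

/* Frame sizes in bytes for a few formats.
   Returns 0 for formats this sketch does not cover. */
static unsigned frame_size_bytes(unsigned format)
{
    switch (format) {
    case 0x0: return 8;    /* normal four-word frame */
    case 0x2: return 12;   /* six-word frame */
    case 0x7: return 60;   /* 68040 access error */
    case 0xA: return 32;   /* 68020/030 short bus fault */
    case 0xB: return 92;   /* 68020/030 long bus fault */
    default:  return 0;
    }
}
```

A debugger would typically call `frame_size_bytes(frame_format(word))` to find where the interrupted context ends on the supervisor stack.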


68060 (Accelerator Cards)

Available via Blizzard 060, CyberStorm 060, etc. Key differences:

  • Superscalar: two integer pipelines (in-order, no OOO)
  • Branch prediction: static and dynamic
  • No MOVE16 snooping on some Amiga implementations
  • On-chip FPU: missing same transcendentals as 68040 → needs 68060.library
  • 68060.library also emulates the integer instructions the 68060 drops relative to the 68040 (e.g. MOVEP, CAS2, 64-bit MULx/DIVx)
struct Library *Lib060 = OpenLibrary("68060.library", 0);

AmigaOS exec CacheControl() API

#include <exec/execbase.h>
#include <proto/exec.h>

/* CacheControl(0, 0) returns the current cache state without changing it */
ULONG flags = CacheControl(0, 0);

/* Flag bits (exec/execbase.h): */
#define CACRF_EnableI   (1L<<0)   /* instruction cache enable */
#define CACRF_FreezeI   (1L<<1)   /* instruction cache freeze */
#define CACRF_ClearI    (1L<<3)   /* clear instruction cache */
#define CACRF_EnableD   (1L<<8)   /* data cache enable */
#define CACRF_FreezeD   (1L<<9)   /* data cache freeze */
#define CACRF_ClearD    (1L<<11)  /* clear data cache */
#define CACRF_CopyBack  (1L<<31)  /* data cache copyback mode (68040+) */
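The two-argument form of CacheControl() is easiest to understand as a masked update: only the bits selected by the mask change, and the previous state is returned. The host-side model below sketches that behaviour. The `DEMO_*` bit values are placeholders chosen for the example, not the NDK definitions, and `cache_control_model` is a hypothetical stand-in for the real exec call.

```c
#include <stdint.h>

/* Illustrative bit values only; real code should use the CACRF_* names. */
#define DEMO_ENABLE_I (1u << 0)
#define DEMO_ENABLE_D (1u << 8)

/* Host-side model of CacheControl(bits, mask): only bits selected by
   mask change, and the previous state is returned. */
static uint32_t cache_control_model(uint32_t *state, uint32_t bits, uint32_t mask)
{
    uint32_t old = *state;
    *state = (old & ~mask) | (bits & mask);
    return old;
}
```

Under this model a call with `(0, 0)` is a pure query, `(0, DEMO_ENABLE_D)` turns one cache off without touching the other, and saving the returned value lets a program restore the earlier state afterwards.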

Real Accelerator Card Context

The A3000 and A4000 launched with Commodore's own CPU cards, but the accelerator ecosystem exploded with third-party offerings that pushed 68K performance well beyond what Commodore shipped.

Commodore Cards

| Card | Machine | CPU | Clock | FPU | Notes |
|---|---|---|---|---|---|
| A3630 | A4000 | 68030 | 25 MHz | 68882 socket | Standard A4000/030 CPU card; Fast RAM is on the motherboard, not the card |
| A3640 | A4000 | 68040 | 25 MHz | Integrated (partial) | Standard A4000/040 card; Fast RAM is on the motherboard, not the card |
| A3660 | A4000 | 68060 | 50 MHz | Integrated (partial) | Rare; Commodore prototype, production by third parties post-Commodore |

Third-Party Accelerators

| Card | Vendor | CPU | Clock | FPU | Memory | Key Feature |
|---|---|---|---|---|---|---|
| Blizzard 1230-IV | Phase5 | 68030 | 50 MHz | 68882 @ 50 MHz | Up to 256 MB Fast RAM (1 x 72-pin SIMM) | De facto standard A1200 accelerator; MMU for virtual memory, 50 MHz gives ~3.5x A1200 performance |
| Blizzard 1260 | Phase5 | 68060 | 50-66 MHz | Integrated (partial) | Up to 256 MB | End-game A1200 accelerator; the engine for high-end demos and Doom/Quake ports |
| Blizzard 2060 | Phase5 | 68060 | 50 MHz | Integrated | Up to 128 MB | A2000 version; SCSI-II controller on-card |
| CyberStorm MK-III | Phase5 | 68060 | 50-66 MHz | Integrated | Up to 128 MB | A3000/A4000 flagship; Ultra-wide SCSI on-card |
| CyberStorm PPC | Phase5 | 68060 + PowerPC 604e | 50/60 MHz + 200-233 MHz | Integrated | Up to 128 MB | Hybrid 68K+PowerPC; AmigaOS 4.0 target; 68K handles legacy, PPC for new code |
| WarpEngine 4040 | MacroSystem | 68040 | 40 MHz | Integrated | Up to 64 MB | A3000/A4000; 40 MHz 040 comfortably outperforms stock A3640-25 |
| GVP A530 | GVP | 68030 + 68882 | 40 MHz | Socketed 68882 | Up to 8 MB | A500 sidecar accelerator; brings a 32-bit CPU to the A500 via the side expansion port |
| Apollo 1240 | ACT | 68040 | 25-40 MHz | Integrated | Up to 64 MB | Budget A1200 accelerator; needs 68040.library |

Rule for all 040/060 cards: 68040.library or 68060.library must be present in LIBS: or the system will crash on the first missing FPU instruction. Install 68040.library/68060.library at LIBS: root, not in a subdirectory.


Performance Benchmarks

Real-world performance from SysInfo, AIBB, and actual workloads rather than misleading MIPS numbers:

| CPU | Clock | SysSpeed MIPS | Dhrystones/sec | Memory BW (MB/s) | Real-World Factor vs A500 |
|---|---|---|---|---|---|
| 68000 (A500) | 7.09/7.16 MHz (PAL/NTSC) | ~1.0 | ~1,100 | ~1.5 (Chip only) | 1.0x (baseline) |
| 68020 (A1200 stock) | 14.3 MHz | ~3.0 | ~4,500 | ~3.5 | ~2.5x |
| 68030/50 (Blizzard 1230) | 50 MHz | ~9.5 | ~17,000 | ~12.0 | ~8x |
| 68040/25 (A3640) | 25 MHz | ~15.0 | ~28,000 | ~18.0 | ~12x |
| 68040/40 (WarpEngine) | 40 MHz | ~24.0 | ~45,000 | ~25.0 | ~19x |
| 68060/50 (Blizzard 1260) | 50 MHz | ~53.0 | ~95,000 | ~38.0 | ~38x |
| 68060/66 | 66 MHz | ~70.0 | ~125,000 | ~40.0 | ~50x |
| 68080 (Vampire V4) | ~90 MHz equiv | ~150+ | ~300,000+ | ~200+ (Fast + RTG) | ~130x |

Important Caveats

  • Chip RAM bandwidth is fixed at ~3.5 MB/s regardless of CPU speed. The CPU accelerator only speeds up Fast RAM access.
  • AGA bandwidth (4x fetch) helps but only for Chip RAM reads by the display DMA, not CPU-to-Chip-memory traffic.
  • Bus contention: on A1200 accelerators, the trapdoor connector limits bandwidth between CPU and Amiga chip bus to ~4-5 MB/s maximum.
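The caveats above amount to Amdahl's law: the portion of run time spent waiting on Chip RAM does not shrink when the CPU gets faster. The sketch below applies that formula. The `chip_fraction` parameter (share of the original run time bound by Chip RAM) is a modelling assumption, not something any benchmark in the table measures directly.

```c
/* Amdahl-style estimate: chip_fraction of the original run time is spent
   waiting on Chip RAM and does not get faster; the rest scales with the
   CPU speed-up factor. */
static double effective_speedup(double cpu_factor, double chip_fraction)
{
    return 1.0 / (chip_fraction + (1.0 - chip_fraction) / cpu_factor);
}
```

With the table's ~38x figure for a 68060/50, a workload that spent half its time on Chip RAM access would end up a little under 2x faster overall under this model, which is why moving data into Fast RAM matters more than raw CPU speed.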

Named Antipatterns

1. "The Ghost Cache"

What fails — writing to a buffer for DMA but not flushing the data cache:

/* BROKEN — data cache holds dirty data */
PrepareCopperList(copperBuffer);   /* CPU writes to buffer */
/* copperBuffer is in CPU data cache, not yet in chip RAM */
WaitTOF();                          /* switch copper lists */
/* Copper reads stale MAIN MEMORY, not the CPU's dirty cache lines */

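Why it fails: On 68040 and 68060 CPUs running in copyback mode, data writes can stay in the data cache. The cache writes back to memory lazily: when the cache line is evicted, not when the write happens. DMA engines (Copper, Blitter, Paula audio and disk) read from main memory via the chip bus and see stale data because they bypass the CPU cache entirely. The 68030's data cache is write-through, so there the usual hazard runs the other way: the CPU reads a stale cached copy after DMA has written to memory.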

Correct:

PrepareCopperList(copperBuffer);
/* sizeof gives the buffer size only if copperBuffer is an array, not a pointer */
CacheClearE(copperBuffer, sizeof(copperBuffer), CACRF_ClearD);
WaitTOF();
/* Now Copper reads correct data */
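The failure can be reproduced off-target with a toy model: one memory word with a single copyback cache line in front of it. Everything in the sketch is invented for illustration (`ToyCache`, `cpu_write`, `dma_read`, `flush`); it mimics the behaviour described above rather than any real cache controller.

```c
#include <stdint.h>

/* Toy model: one memory word with one copyback cache line in front of it. */
struct ToyCache {
    uint16_t memory;   /* what DMA sees */
    uint16_t line;     /* what the CPU sees while the line is valid */
    int valid;
    int dirty;
};

static void cpu_write(struct ToyCache *c, uint16_t v)
{
    c->line = v;       /* copyback: memory is not touched yet */
    c->valid = 1;
    c->dirty = 1;
}

static uint16_t dma_read(const struct ToyCache *c)
{
    return c->memory;  /* DMA bypasses the cache */
}

/* Stand-in for the push-and-invalidate that a data cache flush performs. */
static void flush(struct ToyCache *c)
{
    if (c->valid && c->dirty) c->memory = c->line;
    c->valid = 0;
    c->dirty = 0;
}
```

In this model a `dma_read()` issued right after `cpu_write()` still returns the old memory contents; only after `flush()` does the DMA side observe the new value.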

2. "The Missing Library"

What fails — running 68040/68060 software without 68040.library / 68060.library:

/* BROKEN — assumes full FPU */
double x = sin(angle);  /* FSIN is not in 68040/68060 silicon */
/* Line-F exception -> system crash if 68040.library not installed */

Why it fails: The 68040 and 68060 omit transcendental FPU instructions (FSIN, FCOS, FTAN, FLOG10, etc.) that were present in the 68881/68882. When the CPU encounters these opcodes, it raises a Line-F exception. 68040.library and 68060.library install exception handlers that intercept Line-F exceptions and emulate the missing instructions in software — orders of magnitude slower than hardware but at least they complete. Without these libraries loaded, the exception has no handler and the system crashes immediately.

Correct:

/* At startup, verify the appropriate library is open. */
/* Test AFF_68060 first: a 68060 system also reports AFF_68040. */
if (SysBase->AttnFlags & AFF_68060)
{
    struct Library *Math060 = OpenLibrary("68060.library", 37);
    if (!Math060) { /* FATAL: install the library */ }
}
else if (SysBase->AttnFlags & AFF_68040)
{
    struct Library *Math040 = OpenLibrary("68040.library", 37);
    if (!Math040) { /* FATAL: install the library */ }
}

/* Then using FPU is safe: */
double x = sin(angle);
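The order of the AttnFlags tests matters because exec sets the CPU bits cumulatively, so a 68060 machine reports the 68040 bit as well. The host-side sketch below shows a selection routine that checks the newest CPU first. The `DEMO_AFF_*` constants repeat the bit positions from exec/execbase.h so the code compiles off-target, and `support_library_for` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* AttnFlags bit positions as defined in exec/execbase.h, repeated here so
   the sketch compiles off-target. */
#define DEMO_AFF_68040 (1u << 3)
#define DEMO_AFF_68060 (1u << 7)

/* Hypothetical helper: name of the support library a CPU needs, or NULL. */
static const char *support_library_for(uint16_t attn_flags)
{
    if (attn_flags & DEMO_AFF_68060) return "68060.library";  /* newest first */
    if (attn_flags & DEMO_AFF_68040) return "68040.library";
    return NULL;
}
```

Testing the 68040 bit first would send every 68060 system down the 68040 path, which is the mistake this ordering avoids.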

3. "The Calibrated Loop"

What fails — delay loops calibrated for 7 MHz 68000:

/* BROKEN — spin-loop delay assumes 68000 timing */
void DelayCycles(ULONG n)
{
    while (n--)   /* assumes ~4 cycles per iteration at 7 MHz */
    {
        asm("nop");  /* pretend delay */
    }
}
/* On 68060 at 50 MHz with superscalar, this is 30x faster and produces no delay at all */

Why it fails: The 68060 has dual integer pipelines, branch prediction, and a much higher clock. A "calibrated delay loop" that produces a 1-second delay on a 68000 may produce a delay of about 30 milliseconds on a 68060. The loop derives its timing from instruction execution speed, which changes with every CPU generation and clock speed.

Correct:

/* Use ReadEClock() from timer.device for CPU-independent timing. */
/* timer.device must be open and TimerBase initialised before this call. */
struct EClockVal start, end;
ULONG efreq = ReadEClock(&start);   /* returns E-clock ticks per second */

/* ... do work ... */

ReadEClock(&end);
/* ev_lo alone is enough for short intervals; unsigned subtraction handles one wrap */
ULONG elapsed_us = (ULONG)(((UQUAD)(end.ev_lo - start.ev_lo) * 1000000ULL) / efreq);
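For longer intervals it is safer to combine both halves of the E-clock reading into a 64-bit count before subtracting. The sketch below does that arithmetic on the host. `DemoEClock` merely mirrors the two 32-bit halves of timer.device's `struct EClockVal`, and `eclock_elapsed_us` is an illustrative name; the tick rate would come from the value ReadEClock() returns (about 709379 Hz on PAL machines).

```c
#include <stdint.h>

/* Mirrors the two 32-bit halves of timer.device's struct EClockVal. */
struct DemoEClock { uint32_t hi; uint32_t lo; };

/* Elapsed microseconds between two readings at the given tick rate.
   Uses the full 64-bit count, so a wrap of the low word is harmless. */
static uint64_t eclock_elapsed_us(struct DemoEClock start, struct DemoEClock end,
                                  uint32_t ticks_per_sec)
{
    uint64_t t0 = ((uint64_t)start.hi << 32) | start.lo;
    uint64_t t1 = ((uint64_t)end.hi << 32) | end.lo;
    return ((t1 - t0) * 1000000u) / ticks_per_sec;
}
```

The multiplication happens in 64 bits, so the intermediate product does not overflow for any interval a program is realistically going to measure.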

4. "Speeding on the Chip Bus"

What fails — 68040/68060 code writing to custom chip registers at full CPU speed:

/* BROKEN — 68060 writes to custom registers at 50 MHz */
custom->bltcon0 = 0x09F0;   /* set up blitter parameters */
custom->bltsize = 0x0040;   /* start the blit — too fast! */
/* The chip bus runs at 3.58 MHz and may miss the second write */

Why it fails: The Amiga custom chip bus runs at 3.58 MHz (color clock speed). The 68060 at 50 MHz can issue writes at a rate that overwhelms the chip bus's buffering. The writes go directly to custom registers whose cycle timing is far slower than the CPU's. Writing without sufficient delay between register accesses causes the second write to arrive before the first has settled, producing random chip states that are not replicable even with identical timing.

Correct:

/* Ensure sufficient delay between custom register writes */
custom->bltcon0 = 0x09F0;
/* Wait for at least one color clock (~280 ns) */
asm volatile ("nop; nop; nop");  /* duration depends on CPU model and clock; not a calibrated delay */
custom->bltsize = 0x0040;
/* Better approach: wait for the Blitter (WaitBlit() or the Blitter-done interrupt) instead of timing with NOPs */
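A quick calculation shows why a fixed handful of NOPs is not a portable delay. The sketch below counts how many CPU clock cycles fit into one color clock, using the 3.58 MHz figure quoted above (3579545 Hz on NTSC). The function name is made up for the example.

```c
#include <stdint.h>

/* CPU clock cycles that fit in one color clock, rounded up. */
static uint32_t cycles_per_color_clock(uint32_t cpu_hz, uint32_t color_clock_hz)
{
    return (cpu_hz + color_clock_hz - 1u) / color_clock_hz;
}
```

At 50 MHz the result is 14 cycles per color clock, against 2 for a 7.16 MHz 68000, so any delay expressed as an instruction count has to be re-derived for every CPU and clock speed.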

Pitfalls

1. Cache Coherency with CIA Chips

The 8520 CIA chips are on the custom chip bus. Reading CIA registers from cached memory returns stale values. Always access CIA registers through non-cached memory regions or use CacheClearU() before reading. This affects keyboard handshaking, serial port timing, and disk change detection.

2. Address Alignment with Hardware Registers

68020+ CPUs support unaligned (odd address) accesses, but Amiga custom chips require WORD-even and LONG-even aligned addresses. An unaligned write to custom chip registers causes a bus error on 68020+. Always align pointers to 16-bit or 32-bit boundaries when pointing to hardware registers.
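A debug build can guard register pointers with a simple alignment test. The sketch below is a generic power-of-two check; the addresses in the usage note assume the usual custom chip base of 0xDFF000 and are examples only.

```c
#include <stdint.h>

/* True when addr is a multiple of size; size must be a power of two. */
static int is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0u;
}
```

For instance, 0xDFF058 passes for both word and longword access, 0xDFF05A passes only for word access, and 0xDFF059 fails both.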

3. Chip RAM Speed Never Changes

Regardless of CPU speed, Chip RAM access is paced by the chip bus, which is timed from the 3.55 MHz (PAL) / 3.58 MHz (NTSC) color clock. A 68060 at 50 MHz reads Chip RAM at the same bandwidth as the 68000 at 7 MHz. The only way to increase Chip RAM bandwidth is through wider fetch modes on AGA (FMODE), and even that only helps display DMA, not CPU access to Chip RAM.


Best Practices

  1. Always call CacheClearE before handing a CPU-written buffer to DMA — this includes Copper lists, blitter sources, audio samples, disk buffers, and sprite data.
  2. Check SysBase->AttnFlags for AFF_68040 / AFF_68060 — load the corresponding math libraries at startup and abort if missing.
  3. Use ReadEClock() for timing, not instruction-counting loops — this works across all CPU generations.
  4. Do NOT write to custom chip registers in tight loops without NOP delays — give the chip bus hundreds of nanoseconds between successive writes.
  5. Align memory pointers to hardware registers — unaligned accesses cause bus errors on 68020+.
  6. Only touch CacheControl bits you understand: CACRF_CopyBack (bit 31) switches the data cache to copyback mode, which changes both performance and DMA coherency behavior dramatically. Do not experiment in production.
  7. Keep code and hot data in Fast RAM: 68020+ CPUs fetch from Fast RAM at full CPU speed, but anything that lives in Chip RAM is limited to ~3.5 MB/s regardless of CPU speed.

When to Use / When NOT to Use

| Scenario | Recommended CPU | Why |
|---|---|---|
| Safe baseline for any Amiga | 68000 | Every Amiga has at least this |
| Original A500/A600/A1200 (collection machines) | Their stock CPU | 68000/68020 respectively |
| Compatibility testing | 68020+ with cache off | Test both cached and uncached |
| Daily Amiga workstation | 68030/50 or 68040/40 | 8-19x performance, smooth multitasking |
| Doom / Quake / 3D gaming | 68060/50 minimum | High integer throughput needed; FPU for rendering |
| Vampire / FPGA | Apollo 68080 | Maximum performance on any Amiga; 130x real-world factor |

FPGA & MiSTer Impact

68020+ and 68040+ features are implemented differently on FPGA cores:

| Topic | Real Hardware | FPGA Concern |
|---|---|---|
| Data cache on TG68K | TG68K is a cycle-accurate 68000 behavior target; no data cache, no MMU | Cache advice in this article does not apply to TG68K-based cores |
| Minimig 68020 | Minimig supports 68020 with instruction cache only | No data cache; relevant for 68020 timing benchmarks |
| Apollo 68080 | Integrates caches, superscalar execution, and pipelining; correctly decodes all FPU instructions | Needs 68080.library or equivalent; behavior matches but is not cycle-accurate to any historical 68K CPU |
| CacheClearE/CacheClearU | Needed on real 68030+ | Must be a no-op on TG68K (no cache) but MUST work on Apollo; test both paths |

Historical Context & Modern Analogies

CPU Evolution 1985-1994

The Amiga's CPU story mirrors the broader 68K family evolution:

| Year | CPU | Amiga | Significance |
|---|---|---|---|
| 1979 | 68000 | A1000/A500/A2000 | 16/32-bit hybrid; 68,000 transistors |
| 1984 | 68020 | A1200 (1992) | Full 32-bit; 256-byte I-cache; 190,000 transistors |
| 1987 | 68030 | A3000, accelerators | Integrated MMU; 256-byte I+D caches; 273,000 transistors |
| 1990 | 68040 | A4000, accelerators | Integrated FPU (partial); 4 KB+4 KB caches; 1.2M transistors |
| 1994 | 68060 | Accelerators only | Superscalar; 8 KB+8 KB caches; 2.5M transistors |

Commodore's collapse in 1994 meant the 68060 never shipped in a stock Amiga. Every 68060 Amiga is an aftermarket upgrade.

Modern Analogies

| Amiga Concept | Modern Equivalent | Connection |
|---|---|---|
| 68030 data cache + DMA coherency problem | Non-cache-coherent DMA in embedded ARM SoCs | Same pattern: software must explicitly flush before DMA |
| 68040/68060 missing FPU instructions | x86 microcode for complex instructions (FSIN etc.) | Same approach: hardware traps to software emulation for rare operations |
| CacheControl(CACRF_CopyBack) | x86 MTRR / PAT (Memory Type Range Registers) | Same goal: mark memory regions with different caching policies |
| 68060 superscalar dual-pipeline | In-order superscalar ARM Cortex-A8 | Same architecture: two issue slots, in-order dispatch, no OOO |
| Trapdoor accelerator slot | PCIe add-in card (modern GPU) | Same pattern: aftermarket CPU upgrade via a bus connector |

FAQ

Q: When do I need 68040.library?

If SysBase->AttnFlags & AFF_68040 is set, the system has a 68040 CPU. 68040.library MUST be installed at LIBS:68040.library or the system crashes on first missing FPU instruction. The library is not optional. Similarly for 68060.library when AFF_68060 is set.

Q: Can I turn off caches for compatibility?

Yes — CacheControl(0, CACRF_EnableI | CACRF_EnableD) disables both caches. Some legacy OCS software written with 68000 timing assumptions behaves correctly only with caches off. Disabling caches reduces 68040 performance to roughly 68000 levels.

Q: Is a 68040/25 faster than a 68030/50?

In raw integer math, the 68040/25 is roughly 60% faster (15 MIPS vs 9.5 MIPS). But the 68030/50 has a socketed 68882 running at 50 MHz, giving it faster FPU performance for math-heavy workloads. For most Amiga software (integer-bound), the 68040/25 wins. For FPU-bound tasks (raytracing, 3D), the 68030/50+68882 combo is faster.

Q: Why does my 68060 system run some demos slower than a 68030?

The 68060 achieves its speed via caches, pipelining, and branch prediction. Demos that use self-modifying code, precise cycle-counted Copper/CPU sync, or tight write-read sequences to custom registers break these optimizations. The 68060 must frequently flush pipelines and invalidate caches, reducing effective throughput below 68030 levels for certain pathological code patterns.

Q: Can I upgrade an A3640 to 68060?

Not directly. The A3640 is a 68040 card with a soldered CPU. Third-party accelerators (CyberStorm, WarpEngine) replace the entire CPU card. Some prototypes (A3660) exist but are extremely rare. Most 68060 upgrades use Phase5 CyberStorm boards.


References

  • NDK39: exec/execbase.h, proto/exec.h — CacheControl, CacheClearE, AttnFlags
  • Motorola 68030 User's Manual (M68030UM/AD)
  • Motorola 68040 User's Manual (M68040UM/AD)
  • Motorola 68060 User's Manual (M68060UM/AD)
  • ADCD 2.1: Libraries_Manual_guide/ — exec CacheControl autodoc
  • Commodore A3000/A4000 Technical Reference Manuals
  • See also: 68040/68060 Libraries — detailed FPU trap handler operation
  • See also: Cache Management — cache control deep-dive
  • See also: MMU Management — MMU setup on 68030/040/060
  • See also: FPU Architecture — 68881/68882 vs 68040/060 FPU comparison
  • See also: AGA Chipset — Alice/Lisa co-processors, FMODE