68030/040 on the Amiga (A3000/A4000)
Overview
The A3000 ships with a Motorola 68030 at 16 or 25 MHz. The A4000 ships with either a 68030 or 68040 at 25 MHz. Later accelerator cards bring the 68060. This document covers CPU-specific concerns for AmigaOS 3.1/3.2 on these platforms.
68030 (A3000)
On-Chip Caches
- Instruction cache: 256 bytes, direct-mapped
- Data cache: 256 bytes, direct-mapped
- Both enabled by default on AmigaOS 3.1 (CacheControl() call)
On-Chip PMMU
- Full 68030 PMMU: ATC (Address Translation Cache), TT registers
- AmigaOS does not use the MMU by default on A3000
- Third-party tools (e.g., VMM, mmu.library) use it for virtual memory
Cache Control (exec)
#include <proto/exec.h>
/* Enable instruction and data caches */
CacheControl(CACRF_EnableI | CACRF_EnableD, CACRF_EnableI | CACRF_EnableD);
/* Flush caches (required before self-modifying code) */
CacheClearU(); /* clear all caches */
CacheClearE(addr, len, CACRF_ClearI | CACRF_ClearD); /* targeted flush */
Cache Coherency with DMA
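The Amiga chip bus does not snoop the CPU data cache. The 68030 data cache is write-through, so CPU writes do reach memory, but a 68040/68060 running in copyback mode can hold dirty lines that memory has not seen yet. Portable code therefore flushes the cache before handing a CPU-written buffer to a DMA engine (Blitter, audio, other custom chips):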
/* Before handing a buffer to DMA: */
CacheClearE(buf, size, CACRF_ClearD);
Similarly, after DMA writes to a buffer that the CPU will read:
CacheClearE(buf, size, CACRF_ClearD);
This is a common source of bugs in audio/video programming on 68030+ Amigas.
68040 (A4000 / Accelerators)
On-Chip FPU — Partial Implementation
The 68040 has an on-chip FPU but omits many instructions present in the 68881/68882:
Missing from 68040 FPU (partial list):
FSIN, FCOS, FTAN, FASIN, FACOS, FATAN
FETOX, FETOXM1, FLOGN, FLOG10
FSINH, FCOSH, FTANH, FATANH
FSCALE, FREM, FMOD, FINTRZ
AmigaOS provides 68040.library which installs Line-F exception handlers to emulate the missing instructions in software.
/* 68040.library is normally loaded by SetPatch early in the Startup-Sequence */
/* Software can verify that it is available before relying on emulated FP instructions */
struct Library *Lib040 = OpenLibrary("68040.library", 0);
Warning
If 68040.library is not installed and software uses a missing 68040 FPU instruction, the system will crash with a Line-F exception. Always ensure 68040.library is present on A4000.
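A Line-F opcode is any instruction word whose top four bits are all ones; FPU instructions additionally carry a coprocessor ID in bits 11-9, conventionally 1. The following is a minimal sketch of that classification in portable C, not the trap handler used by 68040.library. The helper names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if the first opcode word belongs to the F-line (bits 15-12 all set). */
int is_line_f(uint16_t opcode)
{
    return (opcode & 0xF000u) == 0xF000u;
}

/* Coprocessor ID field (bits 11-9); the FPU conventionally uses ID 1. */
unsigned coprocessor_id(uint16_t opcode)
{
    return (opcode >> 9) & 0x7u;
}

/* Hypothetical helper: does this opcode word look like an FPU instruction? */
int looks_like_fpu_opcode(uint16_t opcode)
{
    return is_line_f(opcode) && coprocessor_id(opcode) == 1u;
}
```

For example, 0xF200 (the first word of a register-to-register FPU operation such as FSIN) is classified as an FPU opcode, while 0x4E71 (NOP) is not.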
4 KB Instruction + Data Caches
- 68040 has 4 KB instruction cache and 4 KB data cache, both 4-way set-associative
- Cache coherency is more complex: DMA writes may not be visible to the CPU without invalidation
- CacheClearE() / CacheClearU() remain the correct API
68040 Memory Model
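CACR (Cache Control Register) accessed via MOVEC. The 68040 defines only two bits:
bit 31: DE, enable data cache
bit 15: IE, enable instruction cache
The bit names EDC, NAD, ESB, DPI, and EIC belong to the 68060 CACR:
bit 31: EDC, enable data cache
bit 30: NAD, no allocate mode for the data cache
bit 29: ESB, enable store buffer
bit 28: DPI, disable CPUSH invalidation
bit 15: EIC, enable instruction cache
Neither CPU has a cache-invalidate bit in the CACR; invalidation is done with the CINV and CPUSH instructions.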
Bus Error Stack Frame (68040)
The 68040 generates a different bus error stack frame from the 68000:
- 68000: 14-byte frame
- 68040: 60-byte access-error frame (format $7, 30 words) with fault and write-back state
This matters for exception handlers and debuggers targeting both platforms.
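On the 68010 and later, every exception frame carries a format/vector word at offset 6, after the status register and program counter; the 68000 has no such word, which is one reason a handler cannot treat the two frames alike. The sketch below shows how a handler or debugger might split that word. The function names are hypothetical, and frame sizes for each format should be taken from the relevant Motorola manual.

```c
#include <stdint.h>

/* Frame format code: the upper 4 bits of the format/vector word. */
unsigned frame_format(uint16_t format_word)
{
    return (format_word >> 12) & 0xFu;
}

/* Vector number: the low 12 bits hold the vector offset (vector number * 4). */
unsigned frame_vector(uint16_t format_word)
{
    return (format_word & 0x0FFFu) / 4u;
}
```

A word of 0x7008, for instance, decodes to format $7 and vector 2 (the bus/access error vector).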
68060 (Accelerator Cards)
Available via Blizzard 060, CyberStorm 060, etc. Key differences:
- Superscalar: two integer pipelines (in-order, no OOO)
- Branch prediction: static and dynamic
- No MOVE16 snooping on some Amiga implementations
- On-chip FPU: missing the same transcendentals as the 68040, so it needs 68060.library
- Separate 68060.library for the additional missing instructions vs 68040
struct Library *Lib060 = OpenLibrary("68060.library", 0);
AmigaOS exec CacheControl() API
#include <exec/execbase.h>
#include <proto/exec.h>
/* CacheControl() returns the cache state that was in effect before the call */
ULONG flags = CacheControl(0, 0); /* query without changing */
/* Flag bits (values as defined in exec/execbase.h): */
#define CACRF_EnableI (1L<<0) /* instruction cache enable */
#define CACRF_FreezeI (1L<<1) /* instruction cache freeze */
#define CACRF_ClearI (1L<<3) /* clear instruction cache */
#define CACRF_EnableD (1L<<8) /* data cache enable */
#define CACRF_FreezeD (1L<<9) /* data cache freeze */
#define CACRF_ClearD (1L<<11) /* clear data cache */
#define CACRF_CopyBack (1L<<31) /* data cache copyback (write-back) mode */
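CacheControl() takes a bits argument and a mask argument: only the bits selected by the mask are changed, and everything else keeps its current state. A small portable model of that contract follows. It is a sketch only; the DEMO_* flags are illustrative stand-ins with arbitrary bit positions, and real code should use the CACRF_* values from exec/execbase.h.

```c
#include <stdint.h>

/* Illustrative stand-ins with arbitrary positions; not the real CACRF_* values. */
enum { DEMO_ICACHE = 1 << 0, DEMO_DCACHE = 1 << 1 };

/* Models the bits/mask contract: only bits selected by mask can change. */
uint32_t apply_cache_bits(uint32_t current, uint32_t bits, uint32_t mask)
{
    return (current & ~mask) | (bits & mask);
}
```

With both demo caches on, apply_cache_bits(state, 0, DEMO_DCACHE) switches off only the data cache, and a mask of 0 leaves the state untouched, which is why CacheControl(0, 0) works as a pure query.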
Real Accelerator Card Context
The A3000 and A4000 launched with Commodore's own CPU cards, but the accelerator ecosystem exploded with third-party offerings that pushed 68K performance well beyond what Commodore shipped.
Commodore Cards
| Card | Machine | CPU | Clock | FPU | Notes |
|---|---|---|---|---|---|
| A3630 | A4000 | 68EC030 | 25 MHz | 68882 socket | Standard A4000/030 CPU card; no RAM on the card |
| A3640 | A4000 | 68040 | 25 MHz | Integrated (partial) | Standard A4000-040 card; soldered 68040, no socket. No RAM on the card; Fast RAM sits in the motherboard SIMM sockets |
| A3660 | A4000 | 68060 | 50 MHz | Integrated (partial) | Rare — Commodore prototype; production by third parties post-Commodore |
Third-Party Accelerators
| Card | Vendor | CPU | Clock | FPU | Memory | Key Feature |
|---|---|---|---|---|---|---|
| Blizzard 1230-IV | Phase5 | 68030 | 50 MHz | 68882 @ 50 MHz | Up to 256 MB Fast RAM (1 x 72-pin SIMM) | De facto standard A1200 accelerator; MMU for virtual memory, 50 MHz gives ~3.5x A1200 performance |
| Blizzard 1260 | Phase5 | 68060 | 50-66 MHz | Integrated (partial) | Up to 256 MB | End-game A1200 accelerator; the engine for high-end demos and Doom/Quake ports |
| Blizzard 2060 | Phase5 | 68060 | 50 MHz | Integrated | Up to 128 MB | A2000 version; SCSI-II controller on-card |
| CyberStorm MK-III | Phase5 | 68060 | 50-66 MHz | Integrated | Up to 128 MB | A3000/A4000 flagship; Ultra-wide SCSI on-card |
| CyberStorm PPC | Phase5 | 68060 + PowerPC 604e | 50/60 MHz + 200-233 MHz | Integrated | Up to 128 MB | Hybrid 68K+PowerPC — AmigaOS 4.0 target; 68K handles legacy, PPC for new code |
| WarpEngine 4040 | MacroSystem | 68040 | 40 MHz | Integrated | Up to 64 MB | A3000/A4000; 40 MHz 040 comfortably outperforms stock A3640-25 |
| GVP A530 | GVP | 68030 + 68882 | 40 MHz | Socketed 68882 | Up to 8 MB | A500 sidecar accelerator; brings a 32-bit CPU to the A500 through the side expansion port |
| Apollo 1240 | ACT | 68040 | 25-40 MHz | Integrated | Up to 64 MB | Budget A1200 accelerator; needs 68040.library |
Rule for all 040/060 cards: 68040.library or 68060.library must be present in LIBS: or the system will crash on the first missing FPU instruction. Install 68040.library/68060.library at the LIBS: root, not in a subdirectory.
Performance Benchmarks
Real-world performance from SysInfo, AIBB, and actual workloads rather than misleading MIPS numbers:
| CPU | Clock | SysSpeed MIPS | Dhrystones/sec | Memory BW (MB/s) | Real-World Factor vs A500 |
|---|---|---|---|---|---|
| 68000 (A500) | 7.14 MHz | ~1.0 | ~1,100 | ~1.5 (Chip only) | 1.0x (baseline) |
| 68020 (A1200 stock) | 14.3 MHz | ~3.0 | ~4,500 | ~3.5 | ~2.5x |
| 68030/50 (Blizzard 1230) | 50 MHz | ~9.5 | ~17,000 | ~12.0 | ~8x |
| 68040/25 (A3640) | 25 MHz | ~15.0 | ~28,000 | ~18.0 | ~12x |
| 68040/40 (WarpEngine) | 40 MHz | ~24.0 | ~45,000 | ~25.0 | ~19x |
| 68060/50 (Blizzard 1260) | 50 MHz | ~53.0 | ~95,000 | ~38.0 | ~38x |
| 68060/66 | 66 MHz | ~70.0 | ~125,000 | ~40.0 | ~50x |
| 68080 (Vampire V4) | ~90 MHz equiv | ~150+ | ~300,000+ | ~200+ (Fast + RTG) | ~130x |
Important Caveats
- Chip RAM bandwidth is fixed at ~3.5 MB/s regardless of CPU speed. The CPU accelerator only speeds up Fast RAM access.
- AGA bandwidth (4x fetch) helps but only for Chip RAM reads by the display DMA, not CPU-to-Chip-memory traffic.
- Bus contention: on A1200 accelerators, the trapdoor connector limits bandwidth between CPU and Amiga chip bus to ~4-5 MB/s maximum.
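The caveats above amount to an Amdahl's-law argument: the share of run time spent waiting on Chip RAM does not shrink when the CPU gets faster. A rough sketch of that arithmetic follows; the function and its parameters are hypothetical, and the inputs are illustrative rather than measured.

```c
/* Overall speedup when only the non-Chip-RAM share of run time is accelerated.
 * chip_fraction: share of baseline run time bound by Chip RAM (0.0 to 1.0)
 * cpu_speedup:   speedup factor applied to everything else (must be > 0) */
double effective_speedup(double chip_fraction, double cpu_speedup)
{
    return 1.0 / (chip_fraction + (1.0 - chip_fraction) / cpu_speedup);
}
```

If half of the baseline run time is bound by Chip RAM, a CPU that is 38x faster elsewhere yields just under 2x overall; with no Chip RAM dependence the full 38x is available.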
Named Antipatterns
1. "The Ghost Cache"
What fails — writing to a buffer for DMA but not flushing the data cache:
/* BROKEN — data cache holds dirty data */
PrepareCopperList(copperBuffer); /* CPU writes to buffer */
/* copperBuffer is in CPU data cache, not yet in chip RAM */
WaitTOF(); /* switch copper lists */
/* Copper reads stale MAIN MEMORY, not the CPU's dirty cache lines */
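Why it fails: On CPUs whose data cache runs in copyback mode (68040/68060), data writes stay in the data cache. The cache writes back to memory lazily: when the cache line is evicted, not when the write happens. The 68030 data cache is write-through and so less exposed, but the same flush keeps code portable across CPUs. DMA engines (Copper, Blitter, Paula audio and disk) read from main memory via the chip bus and see stale data because they bypass the CPU cache entirely.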
Correct:
PrepareCopperList(copperBuffer);
CacheClearE(copperBuffer, copperBufferSize, CACRF_ClearD); /* copperBufferSize: byte length of the list; sizeof on a pointer would give only the pointer size */
WaitTOF();
/* Now Copper reads correct data */
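The lazy write-back behaviour can be illustrated with a toy model: one memory word, one cache line, and a DMA reader that only ever sees memory. This is a deliberately simplified sketch in portable C, not a model of any real 680x0 cache; every name in it is hypothetical.

```c
#include <stdint.h>

/* Toy model: one memory word and one cache line that may shadow it. */
struct toy_system {
    uint16_t memory;      /* what DMA sees */
    uint16_t cache_data;  /* what the CPU sees while the line is valid */
    int      valid;       /* cache line holds a copy */
    int      dirty;       /* copy is newer than memory (copyback only) */
    int      copyback;    /* 0 = write-through, 1 = copyback */
};

void cpu_write(struct toy_system *s, uint16_t value)
{
    s->cache_data = value;
    s->valid = 1;
    if (s->copyback) {
        s->dirty = 1;        /* memory is updated later, on eviction or flush */
    } else {
        s->memory = value;   /* write-through: memory is updated immediately */
    }
}

/* Models the effect of pushing and invalidating the data cache. */
void cache_flush(struct toy_system *s)
{
    if (s->valid && s->dirty) {
        s->memory = s->cache_data;
    }
    s->valid = 0;
    s->dirty = 0;
}

/* DMA bypasses the cache and reads memory directly. */
uint16_t dma_read(const struct toy_system *s)
{
    return s->memory;
}
```

In copyback mode, dma_read() keeps returning the old value after cpu_write() until cache_flush() runs; in write-through mode the new value is visible at once.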
2. "The Missing Library"
What fails — running 68040/68060 software without 68040.library / 68060.library:
/* BROKEN — assumes full FPU */
double x = sin(angle); /* FSIN is not in 68040/68060 silicon */
/* Line-F exception -> system crash if 68040.library not installed */
Why it fails: The 68040 and 68060 omit transcendental FPU instructions (FSIN, FCOS, FTAN, FLOG10, etc.) that were present in the 68881/68882. When the CPU encounters these opcodes, it raises a Line-F exception. 68040.library and 68060.library install exception handlers that intercept Line-F exceptions and emulate the missing instructions in software — orders of magnitude slower than hardware but at least they complete. Without these libraries loaded, the exception has no handler and the system crashes immediately.
Correct:
/* At startup, verify the appropriate library is open. */
/* AttnFlags are cumulative (a 68060 also sets AFF_68040), so test the newest CPU first. */
if (SysBase->AttnFlags & AFF_68060)
{
struct Library *Math060 = OpenLibrary("68060.library", 37);
if (!Math060) { /* FATAL: install the library */ }
}
else if (SysBase->AttnFlags & AFF_68040)
{
struct Library *Math040 = OpenLibrary("68040.library", 37);
if (!Math040) { /* FATAL: install the library */ }
}
/* Then using FPU is safe: */
double x = sin(angle);
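Because exec's AttnFlags bits are cumulative (each newer CPU also sets the bits of its predecessors), detection has to walk from the newest CPU downwards. Below is a portable sketch of that decision logic. The DEMO_AFF_* constants are believed to mirror the AFF_* bit positions in exec/execbase.h, but verify them against your NDK; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror AFF_* in exec/execbase.h (AFF_68060 needs a recent NDK). */
#define DEMO_AFF_68010 (1u << 0)
#define DEMO_AFF_68020 (1u << 1)
#define DEMO_AFF_68030 (1u << 2)
#define DEMO_AFF_68040 (1u << 3)
#define DEMO_AFF_68060 (1u << 7)

/* Returns 68000, 68010, ... 68060 for the newest CPU bit that is set. */
unsigned long newest_cpu(uint16_t attn_flags)
{
    if (attn_flags & DEMO_AFF_68060) return 68060ul;
    if (attn_flags & DEMO_AFF_68040) return 68040ul;
    if (attn_flags & DEMO_AFF_68030) return 68030ul;
    if (attn_flags & DEMO_AFF_68020) return 68020ul;
    if (attn_flags & DEMO_AFF_68010) return 68010ul;
    return 68000ul;
}

/* Name of the FPU support library this CPU needs, or NULL if none. */
const char *fpu_support_library(uint16_t attn_flags)
{
    switch (newest_cpu(attn_flags)) {
    case 68060ul: return "68060.library";
    case 68040ul: return "68040.library";
    default:      return NULL;
    }
}
```

Testing the 68040 bit first would misidentify a 68060 system, since both bits are set there.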
3. "The Calibrated Loop"
What fails — delay loops calibrated for 7 MHz 68000:
/* BROKEN — spin-loop delay assumes 68000 timing */
void DelayCycles(ULONG n)
{
while (n--) /* assumes ~4 cycles per iteration at 7 MHz */
{
asm("nop"); /* pretend delay */
}
}
/* On 68060 at 50 MHz with superscalar, this is 30x faster and produces no delay at all */
Why it fails: 68060 has dual pipelining, branch prediction, and static + dynamic speculation. A "calibrated delay loop" that produces a 1-second delay on a 68000 may produce a 30 millisecond delay on a 68060. The function derives timing from loop geometry that changes with every CPU generation and clock speed.
Correct:
/* Use ReadEClock() from timer.device for CPU-independent timing. */
/* TimerBase must point at an opened timer.device (io_Device after OpenDevice()). */
/* ReadEClock() stores the 64-bit E-clock tick count and returns ticks per second. */
struct EClockVal start, end;
ULONG efreq = ReadEClock(&start);
/* ... do work ... */
ReadEClock(&end);
/* Low 32 bits only: valid while the interval is shorter than one ev_lo wrap */
ULONG elapsed_us = (ULONG)(((unsigned long long)(end.ev_lo - start.ev_lo) * 1000000ULL) / efreq);
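For intervals long enough that the low word may wrap more than once, both halves of the tick count need to be combined. The following sketch shows the arithmetic in portable C; struct demo_eclock is a local stand-in for struct EClockVal, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Local mirror of struct EClockVal from devices/timer.h (two 32-bit halves). */
struct demo_eclock {
    uint32_t ev_hi;
    uint32_t ev_lo;
};

/* Combine the two halves into one 64-bit tick count. */
uint64_t eclock_ticks(const struct demo_eclock *v)
{
    return ((uint64_t)v->ev_hi << 32) | v->ev_lo;
}

/* Elapsed microseconds between two samples; freq is ticks per second.
 * The multiplication stays within 64 bits for any realistic interval. */
uint64_t eclock_elapsed_us(const struct demo_eclock *start,
                           const struct demo_eclock *end,
                           uint32_t freq)
{
    uint64_t ticks = eclock_ticks(end) - eclock_ticks(start);
    return ticks * 1000000u / freq;
}
```

With the usual PAL E-clock rate of 709379 Hz, a difference of 709379 ticks comes out as exactly 1000000 microseconds, independent of which CPU is fitted.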
4. "Speeding on the Chip Bus"
What fails — 68040/68060 code writing to custom chip registers at full CPU speed:
/* BROKEN — 68060 writes to custom registers at 50 MHz */
custom->bltcon0 = 0x09F0; /* set up blitter parameters */
custom->bltsize = 0x0040; /* start the blit — too fast! */
/* The chip bus runs at 3.58 MHz and may miss the second write */
Why it fails: The Amiga custom chip bus runs at 3.58 MHz (color clock speed). The 68060 at 50 MHz can issue writes at a rate that overwhelms the chip bus's buffering. The writes go directly to custom registers whose cycle timing is far slower than the CPU's. Writing without sufficient delay between register accesses causes the second write to arrive before the first has settled, producing random chip states that are not replicable even with identical timing.
Correct:
/* Ensure sufficient delay between custom register writes */
custom->bltcon0 = 0x09F0;
/* Wait for at least one color clock (~280 ns) */
asm volatile ("nop; nop; nop"); /* not a calibrated delay: NOP timing varies by CPU; on 68040/68060 it also waits for pending writes to complete */
custom->bltsize = 0x0040;
/* Better approach: use Blitter-done interrupt for synchronization */
Pitfalls
1. Cache Coherency with CIA Chips
The 8520 CIA chips are on the custom chip bus. Reading CIA registers from cached memory returns stale values. Always access CIA registers through non-cached memory regions or use CacheClearU() before reading. This affects keyboard handshaking, serial port timing, and disk change detection.
2. Address Alignment with Hardware Registers
68020+ CPUs support unaligned (odd address) accesses, but Amiga custom chips require WORD-even and LONG-even aligned addresses. An unaligned write to custom chip registers causes a bus error on 68020+. Always align pointers to 16-bit or 32-bit boundaries when pointing to hardware registers.
3. Chip RAM Speed Never Changes
Regardless of CPU speed, Chip RAM is always accessed at the chipset's own bus rate, which is locked to the 7.09 MHz (PAL) / 7.16 MHz (NTSC) system clock. A 68060 at 50 MHz reads Chip RAM at the same bandwidth as the 68000 at 7 MHz. The only way to increase Chip RAM bandwidth is through wider fetch modes on AGA (FMODE), and even that only helps display DMA, not CPU access to Chip RAM.
Best Practices
- Always call CacheClearE before handing a CPU-written buffer to DMA: this includes Copper lists, blitter sources, audio samples, disk buffers, and sprite data.
- Check SysBase->AttnFlags for AFF_68040 / AFF_68060: load the corresponding math libraries at startup and abort if missing.
- Use ReadEClock() for timing, not instruction-counting loops: this works across all CPU generations.
- Do NOT write to custom chip registers in tight loops without NOP delays: give the chip bus hundreds of nanoseconds between successive writes.
- Align memory pointers to hardware registers — unaligned accesses cause bus errors on 68020+.
- Only touch CacheControl bits you understand — CACRF_CopyBack (bit 31) writes dirty data back, affecting performance dramatically. Do not experiment in production.
- Cache-pure code in Fast RAM — 68020+ CPUs fetch instructions from Fast RAM at full CPU speed, but data that lives in Chip RAM is access-penalized at ~3.5 MB/s regardless of CPU speed.
When to Use / When NOT to Use
| Scenario | Recommended CPU | Why |
|---|---|---|
| Safe baseline for any Amiga | 68000 | Every Amiga has at least this |
| Original (stock) A500/A600/A1200 | Their stock CPU | 68000/68020 respectively |
| Compatibility testing | 68020+ with cache off | Test both cached and uncached |
| Daily Amiga workstation | 68030/50 or 68040/40 | 8-19x performance, smooth multitasking |
| Doom / Quake / 3D gaming | 68060/50 minimum | Needs 68060-class throughput (~50 MIPS in the benchmark table); FPU for rendering |
| Vampire / FPGA | Apollo 68080 | Maximum performance on any Amiga; 130x real-world factor |
FPGA & MiSTer Impact
68020+ and 68040+ CPUs behave differently on FPGA cores:
| Topic | Real Hardware | FPGA Concern |
|---|---|---|
| Data cache on TG68K | TG68K is a cycle-accurate 68000 behavior target; no data cache, no MMU | Cache advice in this article does not apply to TG68K-based cores |
| Minimig 68020 | Minimig supports 68020 with instruction cache only | No data cache; relevant for 68020 timing benchmarks |
| Apollo 68080 | Integrates caches, superscalar execution, and pipelining; correctly decodes all FPU instructions | Needs 68080.library or equivalent; behavior matches but is not cycle-accurate to any historical 68K CPU |
| CacheClearE/CacheClearU | Needed on real 68030+ | Must be a no-op on TG68K (no cache) but must work on Apollo; test both paths |
Historical Context & Modern Analogies
CPU Evolution 1985-1994
The Amiga's CPU story mirrors the broader 68K family evolution:
| Year | CPU | Amiga | Significance |
|---|---|---|---|
| 1979 | 68000 | A1000/A500/A2000 | 16/32-bit hybrid; 68,000 transistors |
| 1984 | 68020 | A1200 (1992) | Full 32-bit; 256-byte I-cache; 190,000 transistors |
| 1987 | 68030 | A3000, accelerators | Integrated MMU; 256-byte I+D caches; 273,000 transistors |
| 1990 | 68040 | A4000, accelerators | Integrated FPU (partial); 4 KB+4 KB caches; 1.2M transistors |
| 1994 | 68060 | Accelerators only | Superscalar; 8 KB+8 KB caches; 2.5M transistors |
Commodore's collapse in 1994 meant the 68060 never shipped in a stock Amiga. Every 68060 Amiga is an aftermarket upgrade.
Modern Analogies
| Amiga Concept | Modern Equivalent | Connection |
|---|---|---|
| 68030 data cache + DMA coherency problem | Non-cache-coherent DMA in embedded ARM SoCs | Same pattern: software must explicitly flush before DMA |
| 68040/68060 missing FPU instructions | x86 microcode for complex instructions (FSIN etc.) | Same approach: hardware traps to software emulation for rare operations |
| CacheControl(CACRF_CopyBack) | x86 MTRR / PAT (Memory Type Range Registers) | Same goal: mark memory regions with different caching policies |
| 68060 superscalar dual-pipeline | In-order superscalar ARM Cortex-A8 | Same architecture: two issue slots, in-order dispatch, no OOO |
| Trapdoor accelerator slot | PCIe add-in card (modern GPU) | Same pattern: aftermarket CPU upgrade via a bus connector |
FAQ
Q: When do I need 68040.library?
If SysBase->AttnFlags & AFF_68040 is set, the system has a 68040 CPU. 68040.library MUST be installed at LIBS:68040.library or the system crashes on first missing FPU instruction. The library is not optional. Similarly for 68060.library when AFF_68060 is set.
Q: Can I turn off caches for compatibility?
Yes — CacheControl(0, CACRF_EnableI | CACRF_EnableD) disables both caches. Some legacy OCS software written with 68000 timing assumptions behaves correctly only with caches off. Disabling caches reduces 68040 performance to roughly 68000 levels.
Q: Is a 68040/25 faster than a 68030/50?
In raw integer math, the 68040/25 is roughly 50% faster (15 MIPS vs 9.5 MIPS). But the 68030/50 has a socketed 68882 running at 50 MHz, giving it faster FPU performance for math-heavy workloads. For most Amiga software (integer-bound), the 68040/25 wins. For FPU-bound tasks (raytracing, 3D), the 68030/50+68882 combo is faster.
Q: Why does my 68060 system run some demos slower than a 68030?
The 68060 achieves its speed via caches, pipelining, and branch prediction. Demos that use self-modifying code, precise cycle-counted Copper/CPU sync, or tight write-read sequences to custom registers break these optimizations. The 68060 must frequently flush pipelines and invalidate caches, reducing effective throughput below 68030 levels for certain pathological code patterns.
Q: Can I upgrade an A3640 to 68060?
Not directly. The A3640 is a 68040 card with a soldered CPU. Third-party adapters (CyberStorm, WarpEngine) replace the entire CPU card. Some prototypes (A3660) exist but are extremely rare. Most 68060 upgrades use Phase5 CyberStorm boards.
References
- NDK39: exec/execbase.h, proto/exec.h (CacheControl, CacheClearE, AttnFlags)
- Motorola 68030 User's Manual (M68030UM/AD)
- Motorola 68040 User's Manual (M68040UM/AD)
- Motorola 68060 User's Manual (M68060UM/AD)
- ADCD 2.1: Libraries_Manual_guide/ (exec CacheControl autodoc)
- Commodore A3000/A4000 Technical Reference Manuals
- See also: 68040/68060 Libraries — detailed FPU trap handler operation
- See also: Cache Management — cache control deep-dive
- See also: MMU Management — MMU setup on 68030/040/060
- See also: FPU Architecture — 68881/68882 vs 68040/060 FPU comparison
- See also: AGA Chipset — Alice/Lisa co-processors, FMODE