The A3000 ships with a **Motorola 68030** at 16 or 25 MHz. The A4000 ships with either a 68030 or **68040** at 25 MHz. Later accelerator cards bring the 68060. This document covers CPU-specific concerns for AmigaOS 3.1/3.2 on these platforms.
## 68030 (A3000)
### On-Chip Caches
- **Instruction cache**: 256 bytes, direct-mapped
- **Data cache**: 256 bytes, direct-mapped
- Both enabled by default on AmigaOS 3.1 (`CacheControl()` call)
### On-Chip PMMU
- Full 68030 PMMU: ATC (Address Translation Cache), TT registers
- AmigaOS does **not** use the MMU by default on A3000
- Third-party tools (e.g., VMM, mmu.library) use it for virtual memory
The 68030 data cache is **not snooped** by the Amiga chip bus, so custom-chip DMA (Blitter, Copper, Paula audio and disk) never sees what the cache holds. If the CPU writes to a buffer that a DMA engine will read, the cache must be flushed first:
```c
/* Before handing a buffer to DMA: */
CacheClearE(buf, size, CACRF_ClearD);
```
Similarly, after DMA writes to a buffer that the CPU will read:
```c
/* After DMA has filled a buffer, before the CPU reads it: */
CacheClearE(buf, size, CACRF_ClearD);
```
This is a common source of bugs in audio/video programming on 68030+ Amigas.
---
## 68040 (A4000 / Accelerators)
### On-Chip FPU — Partial Implementation
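The 68040 has an on-chip FPU but omits many instructions present in the 68881/68882, notably the transcendental functions (`FSIN`, `FCOS`, `FTAN`, `FLOG10`, and others). `68040.library` emulates them in software.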
> If `68040.library` is not installed and software uses a missing 68040 FPU instruction, the system will crash with a Line-F exception. Always ensure `68040.library` is present on A4000.
### 4 KB Instruction + Data Caches
- 68040 has **4 KB instruction cache** and **4 KB data cache**, both 4-way set-associative
- Cache coherency is more complex: DMA writes may not be visible to the CPU without invalidation
- `CacheClearE()` / `CacheClearU()` remain the correct API
### 68040 Memory Model
```
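CACR (Cache Control Register), accessed via MOVEC:
68040:
  bit 31: DE  (enable data cache)
  bit 15: IE  (enable instruction cache)
68060 (adds further control bits, including):
  bit 31: EDC (enable data cache)
  bit 30: NAD (no allocate mode, data cache)
  bit 29: ESB (enable store buffer)
  bit 28: DPI (disable CPUSH invalidation)
  bit 15: EIC (enable instruction cache)
Cache invalidation uses the CINV and CPUSH instructions, not a CACR bit.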
```
### Bus Error Stack Frame (68040)
The 68040 generates a **different bus error stack frame** from the 68000:
- 68000: 14-byte frame
- 68040: 60-byte access error frame (format $7) with write-back state
This matters for exception handlers and debuggers targeting both platforms.
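As a minimal sketch of how a handler might tell frames apart: on the 68010 and later, the word at offset 6 of an exception frame carries a 4-bit format code and a 12-bit vector offset, while the 68000 has no such word at all. The helper names below are illustrative, not part of any AmigaOS API, and the code only decodes a format word that the caller has already read from the stack.

```c
#include <stdint.h>

/* On 68010 and later, the 16-bit word at offset 6 of every exception
   frame holds a 4-bit format code and a 12-bit vector offset.
   The 68000 has no such word, so a handler must know the CPU type first. */
unsigned frame_format(uint16_t format_word)
{
    return format_word >> 12;
}

unsigned frame_vector_offset(uint16_t format_word)
{
    return format_word & 0x0FFFu;
}

/* Format $7 is the 68040 access error frame. */
int is_68040_access_error(uint16_t format_word)
{
    return frame_format(format_word) == 0x7;
}
```

A bus error uses vector 2 (offset `0x008`), so a 68040 access error frame carries the format word `0x7008`. A debugger targeting several CPUs would typically check `SysBase->AttnFlags` first and only then interpret this word.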
---
## 68060 (Accelerator Cards)
Available via Blizzard 060, CyberStorm 060, etc. Key differences:
- **Superscalar**: two integer pipelines (in-order, no OOO)
- **Branch prediction**: static and dynamic
- **No MOVE16 snooping** on some Amiga implementations
- On-chip FPU: missing same transcendentals as 68040 → needs `68060.library`
- The separate `68060.library` also emulates the integer instructions the 68060 drops relative to the 68040 (for example `MOVEP` and the 64-bit multiply/divide forms)
The A3000 and A4000 launched with Commodore's own CPU cards, but the accelerator ecosystem exploded with third-party offerings that pushed 68K performance well beyond what Commodore shipped.
### Commodore Cards
| Card | Machine | CPU | Clock | FPU | Notes |
|---|---|---|---|---|---|
| **A3630** | A3000 | 68030 | 25 MHz | 68882 socket | Standard A3000-25 CPU card, 2 SIMM slots for Fast RAM |
| **A3640** | A4000 | 68040 | 25 MHz | Integrated (partial) | Standard A4000-040 card; soldered 68040, no socket. Fast RAM on card via 4 SIMM slots |
| **A3660** | A4000 | 68060 | 50 MHz | Integrated (partial) | Rare — Commodore prototype; production by third parties post-Commodore |
> **Rule for all 040/060 cards**: `68040.library` or `68060.library` must be present in `LIBS:` or the system will crash on the first missing FPU instruction. Install 68040.library/68060.library at `LIBS:` root, not in a subdirectory.
---
## Performance Benchmarks
Real-world performance from SysInfo, AIBB, and actual workloads rather than misleading MIPS numbers:
| CPU | Clock | SysSpeed MIPS | Dhrystones/sec | Memory BW (MB/s) | Real-World Factor vs A500 |
- **Chip RAM bandwidth is fixed at ~3.5 MB/s** regardless of CPU speed. The CPU accelerator only speeds up Fast RAM access.
- **AGA bandwidth** (4x fetch) helps but only for Chip RAM reads by the display DMA, not CPU-to-Chip-memory traffic.
- **Bus contention**: on A1200 accelerators, the trapdoor connector limits bandwidth between CPU and Amiga chip bus to ~4-5 MB/s maximum.
---
## Named Antipatterns
### 1. "The Ghost Cache"
**What fails** — writing to a buffer for DMA but not flushing the data cache:
```c
/* BROKEN — data cache holds dirty data */
PrepareCopperList(copperBuffer); /* CPU writes to buffer */
/* copperBuffer is in CPU data cache, not yet in chip RAM */
WaitTOF(); /* switch copper lists */
/* Copper reads stale MAIN MEMORY, not the CPU's dirty cache lines */
```
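**Why it fails:** On a 68040 or 68060 running in copy-back mode, data writes stay in the data cache and reach memory only when the cache line is evicted, not when the write happens. DMA engines (Copper, Blitter, Paula) read main memory over the chip bus, bypass the CPU cache entirely, and therefore see stale data. The 68030's data cache is write-through, so on that CPU the hazard is mainly in the other direction: the CPU reading a stale cached copy after DMA has written the buffer.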
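The effect can be reproduced off-Amiga with a toy model. The sketch below is not how the hardware is implemented; it assumes a single copy-back cache line sitting in front of a small block of memory, and every `toy_` name is hypothetical. It only shows why a DMA read sees old data until the line is pushed out.

```c
#include <stdint.h>
#include <string.h>

/* Toy model only: one copy-back cache line in front of "chip RAM". */
enum { LINE_SIZE = 16 };

typedef struct {
    uint8_t memory[LINE_SIZE];  /* what the DMA engines see */
    uint8_t line[LINE_SIZE];    /* the CPU-side cached copy */
    int     dirty;              /* line holds data not yet in memory */
} ToyCache;

void toy_init(ToyCache *c)
{
    memset(c, 0, sizeof *c);
}

/* CPU store: lands in the cache line only (copy-back behaviour). */
void toy_cpu_write(ToyCache *c, unsigned off, uint8_t value)
{
    c->line[off % LINE_SIZE] = value;
    c->dirty = 1;
}

/* DMA read: bypasses the cache and reads memory directly. */
uint8_t toy_dma_read(const ToyCache *c, unsigned off)
{
    return c->memory[off % LINE_SIZE];
}

/* Flush: push the dirty line out to memory. */
void toy_flush(ToyCache *c)
{
    if (c->dirty) {
        memcpy(c->memory, c->line, LINE_SIZE);
        c->dirty = 0;
    }
}
```

In this model a `toy_dma_read()` issued between `toy_cpu_write()` and `toy_flush()` returns the old byte. On a real system the flush step corresponds to the `CacheClearE(buf, size, CACRF_ClearD)` call shown earlier.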
**What fails** — running 68040/68060 software without `68040.library` / `68060.library`:
```c
/* BROKEN — assumes full FPU */
double x = sin(angle); /* FSIN is not in 68040/68060 silicon */
/* Line-F exception -> system crash if 68040.library not installed */
```
**Why it fails:** The 68040 and 68060 omit transcendental FPU instructions (`FSIN`, `FCOS`, `FTAN`, `FLOG10`, etc.) that were present in the 68881/68882. When the CPU encounters these opcodes, it raises a Line-F exception. 68040.library and 68060.library install exception handlers that intercept Line-F exceptions and emulate the missing instructions in software — orders of magnitude slower than hardware but at least they complete. Without these libraries loaded, the exception has no handler and the system crashes immediately.
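**Correct:** at startup, verify that the appropriate library is open, and exit cleanly if it is not.

**What fails:** timing a delay with a calibrated instruction loop.
```c
/* BROKEN: delay derived from instruction timing */
void busy_delay(ULONG n)   /* illustrative name */
{
    while (n--)            /* assumes ~4 cycles per iteration at 7 MHz */
    {
        asm("nop");        /* pretend delay */
    }
}
/* On 68060 at 50 MHz with superscalar, this is 30x faster and produces almost no delay */
```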
**Why it fails:** The 68060 has dual integer pipelines and branch prediction, and it runs at a far higher clock. A "calibrated delay loop" that produces a 1-second delay on a 68000 may produce a 30-millisecond delay on a 68060. The loop derives its timing from instruction cycle counts, which change with every CPU generation and clock speed.
**Correct:**
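```c
/* Use ReadEClock() (timer.device) for CPU-independent timing. */
/* It stores the 64-bit E-clock tick count in *dest and returns the tick frequency in Hz. */
struct EClockVal start, now;
ULONG freq = ReadEClock(&start);                /* requires an opened timer.device (TimerBase) */
do {
    ReadEClock(&now);
} while (now.ev_lo - start.ev_lo < freq / 50);  /* busy-wait ~20 ms; ev_hi ignored for short waits */
```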
**What fails** — 68040/68060 code writing to custom chip registers at full CPU speed:
```c
/* BROKEN — 68060 writes to custom registers at 50 MHz */
custom->bltcon0 = 0x09F0; /* set up blitter parameters */
custom->bltsize = 0x0040; /* start the blit — too fast! */
/* The chip bus runs at 3.58 MHz and may miss the second write */
```
**Why it fails:** The Amiga custom chip bus runs at 3.58 MHz (color clock speed). The 68060 at 50 MHz can issue writes at a rate that overwhelms the chip bus's buffering. The writes go directly to custom registers whose cycle timing is far slower than the CPU's. Writing without sufficient delay between register accesses causes the second write to arrive before the first has settled, producing random chip states that are not replicable even with identical timing.
**Correct:**
```c
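/* Ensure sufficient delay between custom register writes. */
/* Better approach: synchronize on the blitter itself, with WaitBlit() */
/* (graphics.library) or the blitter-done interrupt. */
WaitBlit();   /* returns once the blitter is idle */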
```
---
## Pitfalls
### 1. Cache Coherency with CIA Chips
The 8520 CIA chips are on the custom chip bus. Reading CIA registers from cached memory returns stale values. Always access CIA registers through non-cached memory regions or use `CacheClearU()` before reading. This affects keyboard handshaking, serial port timing, and disk change detection.
### 2. Address Alignment with Hardware Registers
68020+ CPUs support unaligned (odd address) accesses, but Amiga custom chips require WORD-even and LONG-even aligned addresses. An unaligned write to custom chip registers causes a bus error on 68020+. Always align pointers to 16-bit or 32-bit boundaries when pointing to hardware registers.
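A cheap guard is to check alignment before dereferencing a computed register pointer. The helper below is a sketch with a hypothetical name; it assumes the access size is a power of two (2 for WORD, 4 for LONG).

```c
#include <stdint.h>

/* Nonzero when addr is a multiple of size.
   size must be a power of two: 2 for WORD access, 4 for LONG access. */
int is_aligned(uintptr_t addr, uintptr_t size)
{
    return (addr & (size - 1)) == 0;
}
```

For example, `0xDFF040` (the address of `bltcon0`) passes both the WORD and LONG checks, while `0xDFF041` fails the WORD check and `0xDFF042` fails the LONG check.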
### 3. Chip RAM Speed Never Changes
Regardless of CPU speed, **Chip RAM access is always paced by the chip bus**, which is clocked from the 3.55 MHz (PAL) / 3.58 MHz (NTSC) color clock. A 68060 at 50 MHz reads Chip RAM at the same bandwidth as the 68000 at 7 MHz. The only way to increase Chip RAM bandwidth is through wider fetch modes on AGA (FMODE), and even that only helps display DMA, not CPU access to Chip RAM.
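To put a number on this, the rough arithmetic below estimates how long a CPU copy into Chip RAM takes. It is only a sketch: it assumes the ~3.5 MB/s figure quoted earlier in this document and ignores DMA contention, and the function name is hypothetical.

```c
#include <stdint.h>

/* Microseconds needed to move `bytes` at `bytes_per_second`
   (integer math, rounds down). */
uint64_t transfer_time_us(uint64_t bytes, uint64_t bytes_per_second)
{
    return bytes * 1000000u / bytes_per_second;
}
```

A 320x256 screen at 8 bits per pixel is 81,920 bytes, so `transfer_time_us(81920, 3500000)` gives 23,405 µs. That is longer than one 20,000 µs PAL frame, whichever CPU is doing the copying.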
---
## Best Practices
1. **Always call `CacheClearE` before handing a CPU-written buffer to DMA**: this includes Copper lists, blitter sources, audio samples, disk buffers, and sprite data.
2. **Check `SysBase->AttnFlags` for AFF_68040 / AFF_68060**: verify at startup that the corresponding CPU library is available, and abort if it is missing.
3. **Use `ReadEClock()` for timing, not instruction-counting loops**: this works across all CPU generations.
4. **Do NOT write to custom chip registers in tight loops without NOP delays**: give the chip bus hundreds of nanoseconds between successive writes.
5. **Align memory pointers to hardware registers**: unaligned accesses cause bus errors on 68020+.
6. **Only touch CacheControl bits you understand**: `CACRF_CopyBack` (bit 31) switches the 68040/68060 data cache to copy-back mode, which changes both performance and DMA coherency behavior dramatically. Do not experiment in production.
7. **Keep code and frequently used data in Fast RAM**: 68020+ CPUs fetch from Fast RAM at full CPU speed, but anything that lives in Chip RAM is limited to ~3.5 MB/s regardless of CPU speed.
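For the timing practice above, two `ReadEClock()` readings have to be turned into elapsed time. The sketch below shows that conversion in portable C. `ToyEClockVal` is a local stand-in that mirrors the two 32-bit fields of `struct EClockVal`, and the function name is hypothetical. On a real system the frequency is the value `ReadEClock()` returns (709379 Hz on PAL machines).

```c
#include <stdint.h>

/* Local stand-in mirroring struct EClockVal from devices/timer.h,
   so the sketch builds without the Amiga headers. */
struct ToyEClockVal {
    uint32_t ev_hi;
    uint32_t ev_lo;
};

/* Elapsed microseconds between two E-clock readings (rounds down). */
uint64_t eclock_elapsed_us(struct ToyEClockVal start,
                           struct ToyEClockVal end,
                           uint32_t freq_hz)
{
    uint64_t t0 = ((uint64_t)start.ev_hi << 32) | start.ev_lo;
    uint64_t t1 = ((uint64_t)end.ev_hi << 32) | end.ev_lo;
    return (t1 - t0) * 1000000u / freq_hz;
}
```

Combining `ev_hi` and `ev_lo` into one 64-bit count keeps the result correct when the low word wraps between the two readings.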
---
## When to Use / When NOT to Use
| Scenario | Recommended CPU | Why |
|---|---|---|
| Safe baseline for any Amiga | 68000 | Every Amiga has at least this |
| Original A500/A600/A1200 | Their stock CPU | 68000/68020 respectively |
| Compatibility testing | 68020+ with cache off | Test both cached and uncached |
| Doom / Quake / 3D gaming | 68060/50 minimum | Triple-digit MIPS needed; FPU for rendering |
| Vampire / FPGA | Apollo 68080 | Maximum performance on any Amiga; 130x real-world factor |
---
## FPGA & MiSTer Impact
Cache and FPU behavior on FPGA cores differs from real 68020+ and 68040+ hardware:
| Topic | Real Hardware | FPGA Concern |
|---|---|---|
| **Data cache on TG68K** | TG68K is a cycle-accurate 68000 behavior target; no data cache, no MMU | Cache advice in this article does not apply to TG68K-based cores |
| **Minimig 68020** | Minimig supports 68020 with instruction cache only | No data cache, which matters for 68020 timing benchmarks |
| **Apollo 68080** | Integrates caches, superscalar execution, and pipelining; correctly decodes all FPU instructions | Needs 68080.library or equivalent; behavior matches but is not cycle-accurate to any historical 68K CPU |
| **CacheClearE/CacheClearU** | Needed on real 68030+ | Must NOP on TG68K (no cache) but MUST work on Apollo; test both paths |
---
## Historical Context & Modern Analogies
### CPU Evolution 1985-1994
The Amiga's CPU story mirrors the broader 68K family evolution.
Commodore's collapse in 1994 meant the 68060 never shipped in a stock Amiga. Every 68060 Amiga is an aftermarket upgrade.
### Modern Analogies
| Amiga Concept | Modern Equivalent | Connection |
|---|---|---|
| 68030 data cache + DMA coherency problem | Non-cache-coherent DMA in embedded ARM SoCs | Same pattern: software must explicitly flush before DMA |
| 68040/68060 missing FPU instructions | x86 microcode for complex instructions (FSIN etc.) | Same approach: hardware traps to software emulation for rare operations |
| CacheControl(CACRF_CopyBack) | x86 MTRR / PAT (Memory Type Range Registers) | Same goal: mark memory regions with different caching policies |
| 68060 superscalar dual-pipeline | In-order superscalar ARM Cortex-A8 | Same architecture: two issue slots, in-order dispatch, no OOO |
| Trapdoor accelerator slot | PCIe add-in card (modern GPU) | Same pattern: aftermarket CPU upgrade via a bus connector |
---
## FAQ
### Q: When do I need 68040.library?
If `SysBase->AttnFlags & AFF_68040` is set, the system has a 68040 CPU. 68040.library MUST be installed at `LIBS:68040.library` or the system crashes on first missing FPU instruction. The library is not optional. Similarly for 68060.library when AFF_68060 is set.
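A minimal sketch of that decision, written as a pure function so it can be exercised off-Amiga. The two constants mirror the `AFF_68040` and `AFF_68060` bit values from `exec/execbase.h` but are redefined locally under a `TOY_` prefix, and the function name is hypothetical. On a real system the argument would be `SysBase->AttnFlags`.

```c
#include <stddef.h>

/* Local copies of the AFF_68040 / AFF_68060 bit values, redefined here
   so the sketch builds without the Amiga headers. */
#define TOY_AFF_68040 (1u << 3)
#define TOY_AFF_68060 (1u << 7)

/* Returns the name of the CPU support library this machine needs,
   or NULL when none is required. */
const char *required_cpu_library(unsigned attn_flags)
{
    /* Test the 68060 bit first: a 68060 system may also report the 68040 bit. */
    if (attn_flags & TOY_AFF_68060)
        return "68060.library";
    if (attn_flags & TOY_AFF_68040)
        return "68040.library";
    return NULL;
}
```

A program could attempt `OpenLibrary()` on the returned name at startup and exit with a clear message if that fails, rather than waiting for the first unimplemented FPU instruction.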
### Q: Can I turn off caches for compatibility?
Yes — `CacheControl(0, CACRF_EnableI | CACRF_EnableD)` disables both caches. Some legacy OCS software written with 68000 timing assumptions behaves correctly only with caches off. Disabling caches reduces 68040 performance to roughly 68000 levels.
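The two-argument form works as a masked update: only the bits selected by the mask change, and each takes its value from the first argument. The sketch below models that rule in portable C. The constants mirror the `CACRF_` bit values from `exec/execbase.h` but are redefined locally under a `TOY_` prefix, and the function is a model of the semantics, not the exec call itself.

```c
#include <stdint.h>

/* Local copies of CACRF_EnableI / CACRF_EnableD / CACRF_CopyBack,
   redefined here so the sketch builds without the Amiga headers. */
#define TOY_CACRF_EnableI  (1u << 0)
#define TOY_CACRF_EnableD  (1u << 8)
#define TOY_CACRF_CopyBack (1u << 31)

/* Model of CacheControl(bits, mask): bits outside the mask are untouched. */
uint32_t cache_control_model(uint32_t current, uint32_t bits, uint32_t mask)
{
    return (current & ~mask) | (bits & mask);
}
```

With a mask of `EnableI | EnableD` and bits of 0, both caches are switched off while the copy-back setting is left alone. The real `CacheControl()` returns the previous state, so a program can save that value and pass it back with the same mask to restore the caches afterwards.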
### Q: Is a 68040/25 faster than a 68030/50?
In raw integer math, the 68040/25 is roughly 50% faster (15 MIPS vs 9.5 MIPS). But the 68030/50 has a socketed 68882 running at 50 MHz, giving it faster FPU performance for math-heavy workloads. For most Amiga software (integer-bound), the 68040/25 wins. For FPU-bound tasks (raytracing, 3D), the 68030/50+68882 combo is faster.
### Q: Why does my 68060 system run some demos slower than a 68030?
The 68060 achieves its speed via caches, pipelining, and branch prediction. Demos that use self-modifying code, precise cycle-counted Copper/CPU sync, or tight write-read sequences to custom registers break these optimizations. The 68060 must frequently flush pipelines and invalidate caches, reducing effective throughput below 68030 levels for certain pathological code patterns.
### Q: Can I upgrade an A3640 to 68060?
Not directly. The A3640 is a 68040 card with a soldered CPU. Third-party adapters (CyberStorm, WarpEngine) replace the entire CPU card. Some prototypes (A3660) exist but are extremely rare. Most 68060 upgrades use Phase5 CyberStorm boards.