03: new article — exe_crunchers.md (executable packers deep dive)

New comprehensive article on Amiga executable crunchers covering:
- Architecture: how crunched files remain valid HUNK executables
- Major crunchers: PowerPacker, Imploder, Shrinkler, ByteKiller,
  Titanics, CrunchMania, PackFire, XPK framework
- PP20 format: efficiency table, decrunch info, decrunch colours
- Shrinkler internals: 1536-context adaptive probability model,
  range coder, parity context flag, stack-based context table,
  actual 68000 decompressor source from GitHub
- LZ77/LZSS vs context-modelling+range-coding algorithms
- Relocation handling: 3 strategies (compressed relocs, delta
  table, merged single-hunk)
- Memory layout diagrams: before/during/after decompression
- Detection: magic signatures table, fake header warning,
  Python scanner script
- Tools: xfdmaster modular architecture, Ancient C++ library,
  debugger-based extraction (last resort)
- FPGA/emulation impact: timing, self-modifying code, cache

Based on web research: verified PP20 format spec, Shrinkler
source (askeksa/Shrinkler), Ancient library (temisu/ancient),
xfdmaster slave module architecture.

Updated indexes: 03/README.md, root README.md
This commit is contained in:
Ilia Sharin 2026-04-23 18:41:07 -04:00
parent 7df1f11f15
commit 4593ff135a
3 changed files with 491 additions and 0 deletions

View file

@ -22,6 +22,7 @@ This section covers the complete lifecycle of an AmigaOS executable:
| [exe_load_pipeline.md](exe_load_pipeline.md) | LoadSeg → AllocMem → relocation → segment chain → CreateProc → entry point |
| [object_file_format.md](object_file_format.md) | Compiler object files (HUNK_UNIT), multi-section layout, HUNK_LIB archives, linker operation |
| [overlay_system.md](overlay_system.md) | HUNK_OVERLAY: tree architecture, runtime overlay manager, worked binary example, modern alternatives |
| [**exe_crunchers.md**](exe_crunchers.md) | **Executable packers: PowerPacker/Imploder/Shrinkler, decrunch stubs, compression algorithms, detection** |
## Why HUNK?

View file

@ -0,0 +1,489 @@
[← Home](../README.md) · [Loader & HUNK Format](README.md)
# Executable Crunchers — Compression, Decrunch Stubs, and Internals
## Overview
**Executable crunchers** (packers) compress AmigaOS executables while keeping them directly runnable. The crunched file is a valid HUNK executable — when launched, a tiny **decrunch stub** runs first, decompresses the original program in memory, then jumps to its real entry point. The user sees a brief colour-cycling delay (the "decrunch colours"), then the program runs normally.
This was essential in the floppy era: a 200 KB program crunched to 120 KB loads significantly faster from a slow 880 KB floppy *and* frees disk space on capacity-constrained media.
---
## Architecture
```mermaid
graph LR
subgraph "Original Executable"
OH["HUNK_HEADER"] --> OC["HUNK_CODE<br/>Original code"]
OC --> OD["HUNK_DATA<br/>Original data"]
OD --> OB["HUNK_BSS"]
end
subgraph "Crunched Executable"
CH["HUNK_HEADER"] --> CS["HUNK_CODE<br/>Decrunch Stub<br/>(~200800 bytes)"]
CS --> CD["HUNK_DATA<br/>Compressed payload<br/>(original hunks)"]
CD --> CB["HUNK_BSS<br/>Decompression workspace"]
end
OH -.->|"Cruncher tool"| CH
style CS fill:#fff9c4,stroke:#f9a825,color:#333
style CD fill:#e8f4fd,stroke:#2196f3,color:#333
```
### Key Insight
A crunched executable is **itself a valid HUNK file**. The OS loader handles it normally — `LoadSeg()` allocates memory, loads hunks, applies relocations. The "magic" is that hunk 0 contains a decrunch stub instead of the original code, and the data hunk contains the compressed original program.
---
## Major Amiga Crunchers
| Cruncher | Era | Algorithm | Stub Size | Typical Ratio | Notes |
|---|---|---|---|---|---|
| **PowerPacker** (PP20) | 19891994 | LZ77 + configurable efficiency | ~280 bytes | 5060% | Most popular; `powerpacker.library` for data files |
| **Imploder** (IMP!) | 19901993 | LZSS variant | ~400 bytes | 4555% | Multiple modes: Normal, Pure, Library, Overlayed |
| **Turbo Imploder** | 19911993 | Enhanced LZSS | ~420 bytes | 4252% | Faster crunch, same decrunch |
| **ByteKiller** | 19881991 | LZ77 (simple) | ~160 bytes | 5565% | Early; position-independent stub; used for raw data too |
| **Titanics Cruncher** (ATN!) | 19911993 | LZ77 | ~350 bytes | 5565% | Fast decrunch |
| **CrunchMania** (CrM!) | 19921995 | LZ + range coding | ~500 bytes | 4050% | Many registered/customised versions — format variants |
| **Shrinkler** | 2014+ | Context-model + range coder | ~250 bytes | 3040% | Modern; best ratio; used in 4K/64K demo intros |
| **PackFire** | 2016+ | Shrinkler derivative | ~200 bytes | 3040% | Optimised for size-limited compos |
| **XPK** | 1992+ | Framework (multiple sub-packers) | varies | varies | Library-based; supports NUKE, SMPL, SQSH, etc. |
---
## Binary Structure of a Crunched Executable
### What the Cruncher Produces
The cruncher tool reads the original executable, compresses its contents, and wraps them in a new HUNK executable:
```
HUNK_HEADER ($3F3)
num_hunks = 2 or 3
hunk_sizes:
[0] = stub code size + compressed data (or split across hunks)
[1] = workspace BSS (decompression buffer)
HUNK_CODE ($3E9)
<decrunch_stub> ; 200800 bytes of 68000 code
<compressed_payload> ; the original executable, compressed
<metadata> ; original hunk count, sizes, memory types
HUNK_RELOC32 ($3EC) ; relocations for the stub itself (minimal)
HUNK_BSS ($3EB) ; workspace for decompression
<size> ; typically = original uncompressed size
HUNK_END ($3F2)
```
### Alternate Layout (Multi-Hunk)
Some crunchers split the stub and payload into separate hunks:
```
Hunk 0: HUNK_CODE — decrunch stub only (~300 bytes)
Hunk 1: HUNK_DATA — compressed payload + metadata
Hunk 2: HUNK_BSS — decompression workspace
```
---
## PowerPacker PP20 — Format Deep Dive
PowerPacker (by Nico François) is the most widely used Amiga cruncher. It exists in two forms: a **data file format** (for `powerpacker.library`) and an **executable wrapper** (for crunched .exe files).
### PP20 Data Format
```
Offset Size Field
────── ──── ─────────────────────────────
$00 4 Signature: "PP20" ($50503230)
$04 4 Efficiency table: 4 bytes controlling LZ bit-depth
e.g. $09090909 = "Fast", $0A0B0C0D = "Best"
$08 N Compressed bitstream data
$08+N 4 Decrunch info: 24-bit original size (big-endian) + checksum byte
Byte layout: [size_hi] [size_mid] [size_lo] [checksum]
```
### Efficiency Table
The 4-byte efficiency table controls how many bits are used for offset/length encoding in different compression modes:
| Mode | Efficiency Bytes | Description |
|---|---|---|
| Fast | `$09 09 09 09` | Smaller window, faster crunch |
| Mediocre | `$09 0A 0A 0A` | Balance |
| Good | `$09 0A 0B 0B` | Better ratio |
| Very Good | `$09 0A 0B 0C` | Near-best |
| Best | `$09 0A 0C 0D` | Maximum compression, slowest |
The decompressor reads these 4 bytes to initialize its internal offset/length bit-allocation tables before starting the main decompression loop.
### Decrunch Colours
The PowerPacker decrunch stub famously modifies custom chip colour registers during decompression to provide visual feedback — the background colour cycles through shades of grey or colour gradients, signalling that decrunching is in progress. This is the characteristic "decrunch effect" visible on real hardware:
```asm
; Visual feedback during decrunch:
MOVE.W D0, $DFF180 ; COLOR00 — background colour
; D0 increments with each decompressed block
```
---
## Shrinkler — Modern State-of-the-Art
Shrinkler (by Blueberry/Loonies) is the current gold standard for Amiga executable compression, achieving 3040% ratios. It's open-source and widely used in the demo scene.
### Algorithm: Context-Modelling + Range Coder
Unlike older LZ77-based crunchers, Shrinkler uses:
1. **Adaptive context model** — maintains 1536 probability contexts (`NUM_CONTEXTS = 1536`). Each context tracks the probability of the next bit being 0 or 1, updated after every decoded bit
2. **Range coder** — an arithmetic coding variant that encodes bits using interval subdivision based on the context probabilities
3. **LZ matching** — literal bytes and back-references are intermixed, with the context model predicting which type comes next
### Shrinkler Data Header
```
Offset Size Field
────── ──── ─────────────────────────────
$00 4 Signature: "Shri" ($53687269)
$04 1 Major version
$05 1 Minor version
$06 2 Header size (remaining bytes)
$08 4 Compressed data size
$0C 4 Uncompressed data size
$10 4 Safety margin (for in-place decompression)
$14 4 Flags: bit 0 = FLAG_PARITY_CONTEXT
```
The **parity context** flag (`FLAG_PARITY_CONTEXT`) enables a special mode that maintains separate probability models based on the byte position parity, exploiting statistical properties of 68000 machine code (even/odd byte patterns in opcode words).
### 68000 Decompressor Core (from Shrinkler source)
The actual decompressor fits in approximately 100 instructions:
```asm
; Register usage:
; D2 = Range value
; D3 = Interval size
; D4 = Input bit buffer (reads bytes from compressed stream)
; D6 = Context index
; D7 = Parity context flag (0 or 1)
; A4 = Compressed data source pointer
; A5 = Decompressed data destination pointer
INIT_ONE_PROB = $8000 ; Initial probability: 50/50
ADJUST_SHIFT = 4 ; Probability adaptation rate
NUM_CONTEXTS = 1536 ; Context table size
ShrinklerDecompress:
movem.l d2-d7/a4-a6,-(a7)
; Init range decoder state
moveq.l #0,d2 ; Range value = 0
moveq.l #1,d3 ; Interval size = 1
moveq.l #-$80,d4 ; Input buffer (triggers first byte read)
; Init all 1536 probabilities to 50% ($8000)
move.l #NUM_CONTEXTS,d6
.init:
move.w #INIT_ONE_PROB,-(a7) ; Push WORD onto stack
subq.w #1,d6
bne.b .init
; Context table is now on the stack (3072 bytes)
; Main decompression loop
.lit:
; Decode literal byte bit-by-bit using context model
addq.b #1,d6
.getlit:
bsr.b GetBit ; Get one bit from range coder
addx.b d6,d6 ; Shift bit into D6
bcc.b .getlit ; Loop until byte complete
move.b d6,(a5)+ ; Write decompressed byte
.switch:
bsr.b GetKind ; Is next item literal or reference?
bcc.b .lit ; Literal → decode another byte
; Reference: decode offset and length
; ... (LZ match copy loop)
```
### Stack-Based Context Table
A distinctive Shrinkler technique: the 1536-entry probability table (3072 bytes) is allocated **on the stack** — each entry is a WORD pushed during initialization. This avoids needing a separate AllocMem call and keeps the decompressor self-contained.
---
## Compression Algorithms
### LZ77 / LZSS (PowerPacker, Titanics, ByteKiller, Imploder)
The dominant algorithm family. The compressed stream is a sequence of control bits followed by either literal bytes or back-references:
```
[flag bit]
0 → literal byte follows (copy 1 byte verbatim)
1 → match reference: (offset, length)
offset = how far back in already-decompressed data to copy from
length = how many bytes to copy from that position
Decompression pseudo-code:
while (output_pos < original_size):
bit = read_bit()
if bit == 0:
output[output_pos++] = read_byte() # literal
else:
offset = read_bits(offset_bits) # back-reference
length = read_bits(length_bits) + min_len
copy(output, output_pos - offset, length) # copy from history
output_pos += length
```
The **efficiency setting** (PowerPacker) or **mode** (Imploder) controls how many bits are allocated to offset and length fields — more offset bits = larger search window = better compression but slower.
### Context Modelling + Range Coding (Shrinkler, PackFire)
Modern crunchers replace fixed-bit-width encoding with probability-based arithmetic coding:
1. For each bit position, the **context model** estimates: "probability that this bit is 1"
2. The **range coder** encodes the bit using that probability — high-probability bits use fewer output bits
3. After encoding/decoding, the context probability is **updated** based on the actual bit value
This achieves near-optimal compression but decompression is slower (~25 seconds on a 7 MHz 68000 for a typical executable).
---
## Relocation Handling
The original executable had HUNK_RELOC32 entries that patch absolute addresses. After decompression, these must be reapplied. Crunchers use three strategies:
### Method 1: Compress Everything Including Relocs
The entire original file (all hunks + relocation tables) is compressed as a blob. The decrunch stub acts as a mini-`LoadSeg`:
1. Decompress to a temp buffer
2. Parse the HUNK stream
3. Allocate individual hunks with correct memory types
4. Copy data and apply relocations
5. Free the temp buffer
### Method 2: Pre-Relocated + Delta Table
1. Cruncher pre-applies relocations assuming base address 0
2. Stores a compact **delta table** — sorted list of byte-offset deltas between relocation sites
3. After decompression, the stub walks the delta table and adds actual base addresses
```c
/* Delta table: each entry is the offset-delta to the next reloc site */
UWORD reloc_deltas[] = {
0x0006, /* first reloc at offset 6 */
0x0014, /* +0x14 → next at offset 0x1A */
0x0008, /* +0x08 → next at offset 0x22 */
0x0000 /* terminator */
};
/* More compact than storing absolute offsets */
```
### Method 3: Merge and Self-Relocate
All hunks merged into a single code hunk. Inter-hunk references resolved at crunch time. The result needs minimal or no relocation.
**Drawback**: Loses CHIP/FAST memory separation — all data ends up in the same memory type. Problematic for programs that need Chip RAM for bitmaps or audio.
---
## Memory Layout During Decompression
```
BEFORE (crunched exe loaded by OS):
┌──────────────────────┐ Hunk 0 (CODE)
│ Decrunch stub (300B) │
│ Compressed data (80K)│
│ Metadata │
└──────────────────────┘
┌──────────────────────┐ Hunk 1 (BSS)
│ Workspace (200K) │ ← decompression buffer
└──────────────────────┘
DURING (stub is executing):
┌──────────────────────┐ Hunk 0 — still alive
│ Stub + compressed ───│──→ reading from here
└──────────────────────┘
┌──────────────────────┐ AllocMem'd by stub
│ Original Hunk 0 CODE │──→ writing decompressed data
└──────────────────────┘
┌──────────────────────┐ AllocMem'd by stub
│ Original Hunk 1 DATA │
└──────────────────────┘
AFTER (stub jumps to original entry):
┌──────────────────────┐ (freed or abandoned)
│ [freed stub memory] │
└──────────────────────┘
┌──────────────────────┐ Original program running
│ Original Hunk 0 CODE │ ← PC here
└──────────────────────┘
┌──────────────────────┐
│ Original Hunk 1 DATA │
└──────────────────────┘
```
> **In-place decompression**: Some crunchers (including Shrinkler) support decompressing over the compressed data — the `safety_margin` field in the Shrinkler header reserves extra space so the decompressor's write pointer never overtakes the read pointer. Data is decompressed from end to start.
---
## Detection and Identification
### Magic Signatures
| Cruncher | Signature | Hex | Location |
|---|---|---|---|
| PowerPacker | `PP20` | $50503230 | Start of compressed data |
| Imploder | `IMP!` | $494D5021 | Start of compressed data |
| Turbo Imploder | `IMP!` | $494D5021 | Same — version in stub differs |
| Titanics | `ATN!` | $41544E21 | Start of compressed data |
| CrunchMania | `CrM!` / `CrM2` | $43724D21 / $43724D32 | Start of compressed data |
| Shrinkler | `Shri` | $53687269 | Data file header (exe uses stub pattern) |
| ByteKiller | (no magic) | — | Detected by stub pattern only |
| XPK Framework | `XPKF` | $58504B46 | File header |
> [!WARNING]
> **Fake headers** are extremely common in the Amiga cracking scene. A file claiming to be `IMP!` may have a spoofed header to frustrate analysis. If standard tools reject it, the header is likely fake — use a debugger to capture the decrunched memory image instead.
### Detecting Crunched Executables in RE
1. **Tiny code hunk + large data hunk** — unusual ratio signals packing
2. **AllocMem + decompression loop** at entry point — not the normal `c.o` startup pattern
3. **No `MOVE.L 4.W,A6` / `OpenLibrary` sequence** — stub goes straight to decompression
4. **Custom chip register writes** (`$DFF180` colour changes) — decrunch colour feedback
5. **Magic bytes** in the data hunk — scan for known signatures
6. **Self-modifying code** — stub may overwrite its own memory during in-place decompression
```python
# Quick detection script:
import struct
MAGICS = {
b'PP20': 'PowerPacker',
b'IMP!': 'Imploder',
b'ATN!': 'Titanics Cruncher',
b'CrM!': 'CrunchMania',
b'CrM2': 'CrunchMania 2',
b'Shri': 'Shrinkler (data)',
b'XPKF': 'XPK Framework',
}
def detect_cruncher(filename):
with open(filename, 'rb') as f:
data = f.read()
for magic, name in MAGICS.items():
if magic in data:
off = data.index(magic)
print(f" {name} detected at offset ${off:04X}")
return name
# Check for valid HUNK with suspicious layout
if data[:4] == b'\x00\x00\x03\xf3': # HUNK_HEADER
print(" Valid HUNK — check for stub pattern at entry point")
return None
```
---
## Decrunching Tools
### AmigaOS Native
| Tool | Description |
|---|---|
| `xfdmaster.library` | Universal decruncher — modular architecture with "slave" plugins in `LIBS:xfd/` |
| `xfdDecrunch` | CLI front-end: `xfdDecrunch packed.exe unpacked.exe` |
| `xfdScan` / `xfdList` | Identify cruncher type; list installed slave modules |
| `powerpacker.library` | PP20 data file decompression: `ppLoadData()` |
### Cross-Platform
| Tool | Description |
|---|---|
| **Ancient** (C++) | Modern portable library — supports ByteKiller, Imploder, CrunchMania, PP20, and many more. GitHub: `temisu/ancient` |
| `ppunpack` | PP20 only: `ppunpack packed.exe unpacked.exe` |
| Shrinkler `-d` | Shrinkler data files: `shrinkler -d packed unpacked` |
### xfdmaster — Modular Architecture
xfdmaster does not have a hardcoded format list. It loads **slave modules** from `LIBS:xfd/` at runtime, each handling one or more cruncher formats:
```
LIBS:xfd/
PowerPacker ; handles PP20
Imploder ; handles IMP!
CrunchMania ; handles CrM!, CrM2
ByteKiller ; stub-pattern detection
Titanics ; handles ATN!
... ; 100+ supported formats
```
```c
/* Using xfdmaster.library to decrunch any format: */
struct xfdBufferInfo *xbi = xfdAllocObject(XFDOBJ_BUFFERINFO);
xbi->xfdbi_SourceBufLen = filesize;
xbi->xfdbi_SourceBuffer = filebuf;
if (xfdRecogBuffer(xbi))
{
printf("Detected: %s\n", xbi->xfdbi_PackerName);
if (xfdDecrunchBuffer(xbi))
{
/* xbi->xfdbi_TargetBuffer = decrunched data */
/* xbi->xfdbi_TargetBufSaveLen = decrunched size */
}
}
xfdFreeObject(xbi);
```
### Debugger-Based Extraction (Last Resort)
For unknown or custom crunchers, the most reliable method is to load the executable in a hardware-level debugger (HRTMon, ASM-One, or an emulator's monitor), set a breakpoint at the end of the decrunch stub (typically the final `JMP` instruction), and capture the memory image once decompression is complete:
```
; In HRTMon:
> d $entry_point ; disassemble entry
; Find the final JMP at the end of the stub
> bpx $stub_end_jmp ; set breakpoint
> g ; run
; When breakpoint hits, the decrunched program is in memory
> sm $dest $dest+size "decrunched.bin" ; save memory
```
---
## Impact on FPGA / Emulation
| Concern | Detail |
|---|---|
| **Timing-sensitive stubs** | Imploder has tight loops that may fail on accelerated CPUs; some stubs poll `$DFF006` (VHPOSR) for timing |
| **Memory allocation** | Stub requires working `exec.library AllocMem` — must have a functional memory list |
| **Chip RAM specificity** | If original hunks need CHIP RAM, stub must request `MEMF_CHIP` — DMA-accessible memory required for graphics/audio |
| **Self-modifying code** | In-place decompression writes over instruction bytes — 68020+ instruction cache must be invalidated (`CacheClearU`) |
| **Custom chip access** | Decrunch colour writes to `$DFF180` require a working Denise/colour register |
| **Boot-block crunchers** | Trackloaders (game boot blocks) use custom crunchers without HUNK format — completely different mechanism, no OS involvement |
---
## References
- PowerPacker documentation (Nico François, 1989)
- Shrinkler source: https://github.com/askeksa/Shrinkler — `decrunchers/ShrinklerDecompress.S`
- Ancient decompression library: https://github.com/temisu/ancient — portable C++ decompressors
- xfdmaster.library — Aminet `util/pack/xfdmaster.lha` (Dirk Stöcker)
- See also: [HUNK Format](hunk_format.md) — the container format crunchers wrap
- See also: [Exe Load Pipeline](exe_load_pipeline.md) — how LoadSeg handles the crunched HUNK
- See also: [Overlay System](overlay_system.md) — another approach to large-program memory management

View file

@ -98,6 +98,7 @@ The Amiga's documentation was scattered across out-of-print manuals, Usenet post
| [exe_load_pipeline.md](03_loader_and_exec_format/exe_load_pipeline.md) | LoadSeg → AllocMem → relocation → segment chain → CreateProc → entry point |
| [object_file_format.md](03_loader_and_exec_format/object_file_format.md) | HUNK_UNIT object files, multi-section layout, HUNK_LIB archives, linker operation |
| [overlay_system.md](03_loader_and_exec_format/overlay_system.md) | HUNK_OVERLAY: tree architecture, runtime overlay manager, modern alternatives |
| [**exe_crunchers.md**](03_loader_and_exec_format/exe_crunchers.md) | **Executable packers: PP20/Imploder/Shrinkler, decrunch stubs, algorithms, detection** |
### 04 — Linking & Library Integration
| File | Topic |