amiga-bootcamp/03_loader_and_exec_format/exe_crunchers.md
Ilia Sharin 4593ff135a 03: new article — exe_crunchers.md (executable packers deep dive)
New comprehensive article on Amiga executable crunchers covering:
- Architecture: how crunched files remain valid HUNK executables
- Major crunchers: PowerPacker, Imploder, Shrinkler, ByteKiller,
  Titanics, CrunchMania, PackFire, XPK framework
- PP20 format: efficiency table, decrunch info, decrunch colours
- Shrinkler internals: 1536-context adaptive probability model,
  range coder, parity context flag, stack-based context table,
  actual 68000 decompressor source from GitHub
- LZ77/LZSS vs context-modelling+range-coding algorithms
- Relocation handling: 3 strategies (compressed relocs, delta
  table, merged single-hunk)
- Memory layout diagrams: before/during/after decompression
- Detection: magic signatures table, fake header warning,
  Python scanner script
- Tools: xfdmaster modular architecture, Ancient C++ library,
  debugger-based extraction (last resort)
- FPGA/emulation impact: timing, self-modifying code, cache

Based on web research: verified PP20 format spec, Shrinkler
source (askeksa/Shrinkler), Ancient library (temisu/ancient),
xfdmaster slave module architecture.

Updated indexes: 03/README.md, root README.md
2026-04-23 18:41:07 -04:00


Executable Crunchers — Compression, Decrunch Stubs, and Internals

Overview

Executable crunchers (packers) compress AmigaOS executables while keeping them directly runnable. The crunched file is a valid HUNK executable — when launched, a tiny decrunch stub runs first, decompresses the original program in memory, then jumps to its real entry point. The user sees a brief colour-cycling delay (the "decrunch colours"), then the program runs normally.

This was essential in the floppy era: a 200 KB program crunched to 120 KB loads significantly faster from a slow 880 KB floppy and frees disk space on capacity-constrained media.


Architecture

graph LR
    subgraph "Original Executable"
        OH["HUNK_HEADER"] --> OC["HUNK_CODE<br/>Original code"]
        OC --> OD["HUNK_DATA<br/>Original data"]
        OD --> OB["HUNK_BSS"]
    end

    subgraph "Crunched Executable"
        CH["HUNK_HEADER"] --> CS["HUNK_CODE<br/>Decrunch Stub<br/>(~200–800 bytes)"]
        CS --> CD["HUNK_DATA<br/>Compressed payload<br/>(original hunks)"]
        CD --> CB["HUNK_BSS<br/>Decompression workspace"]
    end

    OH -.->|"Cruncher tool"| CH

    style CS fill:#fff9c4,stroke:#f9a825,color:#333
    style CD fill:#e8f4fd,stroke:#2196f3,color:#333

Key Insight

A crunched executable is itself a valid HUNK file. The OS loader handles it normally — LoadSeg() allocates memory, loads hunks, applies relocations. The "magic" is that hunk 0 contains a decrunch stub instead of the original code, and the data hunk contains the compressed original program.


Major Amiga Crunchers

| Cruncher | Era | Algorithm | Stub Size | Typical Ratio | Notes |
|---|---|---|---|---|---|
| PowerPacker (PP20) | 1989–1994 | LZ77 + configurable efficiency | ~280 bytes | 50–60% | Most popular; powerpacker.library for data files |
| Imploder (IMP!) | 1990–1993 | LZSS variant | ~400 bytes | 45–55% | Multiple modes: Normal, Pure, Library, Overlayed |
| Turbo Imploder | 1991–1993 | Enhanced LZSS | ~420 bytes | 42–52% | Faster crunch, same decrunch |
| ByteKiller | 1988–1991 | LZ77 (simple) | ~160 bytes | 55–65% | Early; position-independent stub; used for raw data too |
| Titanics Cruncher (ATN!) | 1991–1993 | LZ77 | ~350 bytes | 55–65% | Fast decrunch |
| CrunchMania (CrM!) | 1992–1995 | LZ + range coding | ~500 bytes | 40–50% | Many registered/customised versions, hence format variants |
| Shrinkler | 2014+ | Context-model + range coder | ~250 bytes | 30–40% | Modern; best ratio; used in 4K/64K demo intros |
| PackFire | 2016+ | Shrinkler derivative | ~200 bytes | 30–40% | Optimised for size-limited compos |
| XPK | 1992+ | Framework (multiple sub-packers) | varies | varies | Library-based; supports NUKE, SMPL, SQSH, etc. |

Binary Structure of a Crunched Executable

What the Cruncher Produces

The cruncher tool reads the original executable, compresses its contents, and wraps them in a new HUNK executable:

HUNK_HEADER ($3F3)
  num_hunks = 2 or 3
  hunk_sizes:
    [0] = stub code size + compressed data (or split across hunks)
    [1] = workspace BSS (decompression buffer)

HUNK_CODE ($3E9)
  <decrunch_stub>           ; 200–800 bytes of 68000 code
  <compressed_payload>      ; the original executable, compressed
  <metadata>                ; original hunk count, sizes, memory types

HUNK_RELOC32 ($3EC)         ; relocations for the stub itself (minimal)

HUNK_BSS ($3EB)             ; workspace for decompression
  <size>                    ; typically = original uncompressed size

HUNK_END ($3F2)
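The layout above can be inspected with a few lines of Python. This is a minimal sketch, not a full HUNK parser: it reads only the HUNK_HEADER block and assumes the standard header layout (resident-library name list, table size, first/last hunk, then one size longword per hunk with the memory type in the top two bits). The function name `parse_hunk_header` and the sample header bytes are made up for illustration.

```python
import struct

HUNK_HEADER = 0x3F3
MEM_NAMES = {0: "ANY", 1: "CHIP", 2: "FAST"}

def parse_hunk_header(data: bytes):
    """Return a list of (size_in_bytes, memory_type) for each hunk in the header."""
    pos = 0

    def next_long():
        nonlocal pos
        (value,) = struct.unpack_from(">I", data, pos)
        pos += 4
        return value

    if next_long() != HUNK_HEADER:
        raise ValueError("not a HUNK executable")
    # Resident library name list: length-prefixed names, ended by a zero length.
    while (name_longs := next_long()) != 0:
        pos += name_longs * 4
    next_long()                      # table size (not needed here)
    first, last = next_long(), next_long()
    hunks = []
    for _ in range(last - first + 1):
        raw = next_long()
        mem = raw >> 30              # bit 30 = CHIP, bit 31 = FAST
        if mem == 3:                 # both bits set: an attribute longword follows
            next_long()
        hunks.append(((raw & 0x3FFFFFFF) * 4, MEM_NAMES.get(mem, "EXT")))
    return hunks

# Hypothetical two-hunk header: a 300-byte hunk and a 1024-byte Chip RAM hunk.
header = struct.pack(">7I", 0x3F3, 0, 2, 0, 1, 75, 0x40000000 | 256)
print(parse_hunk_header(header))
```

A very small first hunk next to a much larger second one is the kind of size pattern that suggests a stub-plus-payload layout.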

Alternate Layout (Multi-Hunk)

Some crunchers split the stub and payload into separate hunks:

Hunk 0: HUNK_CODE — decrunch stub only (~300 bytes)
Hunk 1: HUNK_DATA — compressed payload + metadata
Hunk 2: HUNK_BSS  — decompression workspace

PowerPacker PP20 — Format Deep Dive

PowerPacker (by Nico François) is the most widely used Amiga cruncher. It exists in two forms: a data file format (for powerpacker.library) and an executable wrapper (for crunched .exe files).

PP20 Data Format

Offset  Size  Field
──────  ────  ─────────────────────────────
$00     4     Signature: "PP20" ($50503230)
$04     4     Efficiency table: 4 bytes controlling LZ bit-depth
              e.g. $09090909 = "Fast", $090A0C0D = "Best"
$08     N     Compressed bitstream data
$08+N   4     Decrunch info: 24-bit original size (big-endian) + 8-bit count of
              bits to skip at the start of decoding
              Byte layout: [size_hi] [size_mid] [size_lo] [skip_bits]

Efficiency Table

The 4-byte efficiency table controls how many bits are used for offset/length encoding in different compression modes:

| Mode | Efficiency Bytes | Description |
|---|---|---|
| Fast | $09 09 09 09 | Smaller window, faster crunch |
| Mediocre | $09 0A 0A 0A | Balance |
| Good | $09 0A 0B 0B | Better ratio |
| Very Good | $09 0A 0C 0C | Near-best |
| Best | $09 0A 0C 0D | Maximum compression, slowest |

The decompressor reads these 4 bytes to initialize its internal offset/length bit-allocation tables before starting the main decompression loop.
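As a rough illustration of the layout described above, the following sketch pulls the fixed fields out of a PP20 data file without decompressing anything. The function name `parse_pp20` and the sample bytes are hypothetical. The final byte of the decrunch info is returned as-is under the neutral name `info_byte`.

```python
def parse_pp20(data: bytes):
    """Split a PP20 data file into its header, payload length and trailer fields."""
    if len(data) < 12 or data[:4] != b"PP20":
        raise ValueError("not a PP20 data file")
    return {
        "efficiency": tuple(data[4:8]),                       # the 4 efficiency bytes
        "payload_length": len(data) - 12,                     # bitstream between header and trailer
        "original_size": int.from_bytes(data[-4:-1], "big"),  # 24-bit big-endian size
        "info_byte": data[-1],                                # last byte of the decrunch info
    }

# Hypothetical file: "Best" efficiency, 4 dummy payload bytes, original size 200000.
sample = b"PP20" + bytes([0x09, 0x0A, 0x0C, 0x0D]) + bytes(4) + bytes([0x03, 0x0D, 0x40, 0x05])
info = parse_pp20(sample)
print(info["efficiency"], info["original_size"])
```

Reading the trailer first is what lets a loader allocate the output buffer before touching the bitstream.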

Decrunch Colours

The PowerPacker decrunch stub famously modifies custom chip colour registers during decompression to provide visual feedback — the background colour cycles through shades of grey or colour gradients, signalling that decrunching is in progress. This is the characteristic "decrunch effect" visible on real hardware:

; Visual feedback during decrunch:
    MOVE.W  D0, $DFF180        ; COLOR00 — background colour
    ; D0 increments with each decompressed block

Shrinkler — Modern State-of-the-Art

Shrinkler (by Blueberry/Loonies) is the current gold standard for Amiga executable compression, typically shrinking executables to 30–40% of their original size. It's open-source and widely used in the demo scene.

Algorithm: Context-Modelling + Range Coder

Unlike older LZ77-based crunchers, Shrinkler uses:

  1. Adaptive context model — maintains 1536 probability contexts (NUM_CONTEXTS = 1536). Each context tracks the probability of the next bit being 0 or 1, updated after every decoded bit
  2. Range coder — an arithmetic coding variant that encodes bits using interval subdivision based on the context probabilities
  3. LZ matching — literal bytes and back-references are intermixed, with the context model predicting which type comes next

Shrinkler Data Header

Offset  Size  Field
──────  ────  ─────────────────────────────
$00     4     Signature: "Shri" ($53687269)
$04     1     Major version
$05     1     Minor version
$06     2     Header size (remaining bytes)
$08     4     Compressed data size
$0C     4     Uncompressed data size
$10     4     Safety margin (for in-place decompression)
$14     4     Flags: bit 0 = FLAG_PARITY_CONTEXT

The parity context flag (FLAG_PARITY_CONTEXT) enables a special mode that maintains separate probability models based on the byte position parity, exploiting statistical properties of 68000 machine code (even/odd byte patterns in opcode words).
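The header table above maps directly onto a `struct` format string. The sketch below assumes big-endian fields exactly as listed (24 bytes in total). The names `ShrinklerHeader` and `parse_shrinkler_header` and the sample values are illustrative, not taken from the Shrinkler source.

```python
import struct
from collections import namedtuple

FLAG_PARITY_CONTEXT = 1 << 0   # bit 0 of the flags field

ShrinklerHeader = namedtuple(
    "ShrinklerHeader",
    "major minor header_size compressed_size uncompressed_size safety_margin flags",
)

def parse_shrinkler_header(data: bytes) -> ShrinklerHeader:
    """Decode the 24-byte data-file header laid out in the table above."""
    magic, *fields = struct.unpack_from(">4sBBHIIII", data, 0)
    if magic != b"Shri":
        raise ValueError("not a Shrinkler data file")
    return ShrinklerHeader(*fields)

# Hypothetical header values, for demonstration only.
sample = struct.pack(">4sBBHIIII", b"Shri", 4, 7, 16, 1000, 3000, 8, FLAG_PARITY_CONTEXT)
hdr = parse_shrinkler_header(sample)
print(hdr.uncompressed_size, bool(hdr.flags & FLAG_PARITY_CONTEXT))
```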

68000 Decompressor Core (from Shrinkler source)

The actual decompressor fits in approximately 100 instructions:

; Register usage:
;   D2 = Range value
;   D3 = Interval size
;   D4 = Input bit buffer (reads bytes from compressed stream)
;   D6 = Context index
;   D7 = Parity context flag (0 or 1)
;   A4 = Compressed data source pointer
;   A5 = Decompressed data destination pointer

INIT_ONE_PROB = $8000          ; Initial probability: 50/50
ADJUST_SHIFT  = 4              ; Probability adaptation rate
NUM_CONTEXTS  = 1536           ; Context table size

ShrinklerDecompress:
    movem.l d2-d7/a4-a6,-(a7)
    ; Init range decoder state
    moveq.l #0,d2              ; Range value = 0
    moveq.l #1,d3              ; Interval size = 1
    moveq.l #-$80,d4           ; Input buffer (triggers first byte read)

    ; Init all 1536 probabilities to 50% ($8000)
    move.l  #NUM_CONTEXTS,d6
.init:
    move.w  #INIT_ONE_PROB,-(a7)  ; Push WORD onto stack
    subq.w  #1,d6
    bne.b   .init
    ; Context table is now on the stack (3072 bytes)

    ; Main decompression loop
.lit:
    ; Decode literal byte bit-by-bit using context model
    addq.b  #1,d6
.getlit:
    bsr.b   GetBit             ; Get one bit from range coder
    addx.b  d6,d6              ; Shift bit into D6
    bcc.b   .getlit            ; Loop until byte complete
    move.b  d6,(a5)+           ; Write decompressed byte

.switch:
    bsr.b   GetKind            ; Is next item literal or reference?
    bcc.b   .lit               ; Literal → decode another byte

    ; Reference: decode offset and length
    ; ... (LZ match copy loop)

Stack-Based Context Table

A distinctive Shrinkler technique: the 1536-entry probability table (3072 bytes) is allocated on the stack — each entry is a WORD pushed during initialization. This avoids needing a separate AllocMem call and keeps the decompressor self-contained.


Compression Algorithms

LZ77 / LZSS (PowerPacker, Titanics, ByteKiller, Imploder)

The dominant algorithm family. The compressed stream is a sequence of items, each introduced by a control bit that selects between a literal byte and a back-reference:

[flag bit]
  0 → literal byte follows (copy 1 byte verbatim)
  1 → match reference: (offset, length)
       offset = how far back in already-decompressed data to copy from
       length = how many bytes to copy from that position

Decompression pseudo-code:
  while (output_pos < original_size):
      bit = read_bit()
      if bit == 0:
          output[output_pos++] = read_byte()       # literal
      else:
          offset = read_bits(offset_bits)           # back-reference
          length = read_bits(length_bits) + min_len
          copy(output, output_pos - offset, length) # copy from history
          output_pos += length

The efficiency setting (PowerPacker) or mode (Imploder) controls how many bits are allocated to offset and length fields — more offset bits = larger search window = better compression but slower.
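The pseudo-code above can be made runnable for a toy format. This sketch is not the PowerPacker or Imploder bitstream: it assumes MSB-first bits, 8-bit literals, and fixed 4-bit offset and length fields with a minimum match length of 2. The names `BitReader`, `lz_decompress` and `pack_bits` are invented for the example.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0                      # position in bits

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

def lz_decompress(data, original_size, offset_bits=4, length_bits=4, min_len=2):
    reader = BitReader(data)
    out = bytearray()
    while len(out) < original_size:
        if reader.read_bits(1) == 0:
            out.append(reader.read_bits(8))            # literal byte
        else:
            offset = reader.read_bits(offset_bits)     # distance back into the output
            length = reader.read_bits(length_bits) + min_len
            if offset == 0 or offset > len(out):
                raise ValueError("back-reference outside decoded data")
            for _ in range(length):                    # byte-wise so overlapping copies work
                out.append(out[-offset])
    return bytes(out[:original_size])

def pack_bits(bits: str) -> bytes:
    """Helper for the demo: turn a '0'/'1' string into bytes, zero-padded."""
    bits = bits.ljust(-(-len(bits) // 8) * 8, "0")
    return int(bits, 2).to_bytes(len(bits) // 8, "big")

# Three literals 'a', 'b', 'c', then a match with offset 3 and length 4 + 2 = 6.
stream = pack_bits("0" "01100001" "0" "01100010" "0" "01100011" "1" "0011" "0100")
print(lz_decompress(stream, 9))   # b'abcabcabc'
```

The match is longer than its offset, so the copy reads bytes it has just written. This is why the copy loop works one byte at a time.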

Context Modelling + Range Coding (Shrinkler, PackFire)

Modern crunchers replace fixed-bit-width encoding with probability-based arithmetic coding:

  1. For each bit position, the context model estimates: "probability that this bit is 1"
  2. The range coder encodes the bit using that probability — high-probability bits use fewer output bits
  3. After encoding/decoding, the context probability is updated based on the actual bit value

This achieves near-optimal compression, but decompression is slower (roughly 2–5 seconds on a 7 MHz 68000 for a typical executable).
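The three steps above can be illustrated with a single adaptive context. The sketch below uses a generic shift-based update in the spirit of `INIT_ONE_PROB` and `ADJUST_SHIFT`; the exact arithmetic in Shrinkler's decoder may differ, and the helper names are hypothetical. It does not implement the range coder itself. It only reports the ideal cost (in bits) that such a coder would approach.

```python
import math

INIT_ONE_PROB = 0x8000   # 16-bit fixed point: 0x8000 / 0x10000 = 50%
ADJUST_SHIFT = 4         # larger shift = slower adaptation

def update_prob(prob_one: int, bit: int) -> int:
    """Move the 'probability of a 1' a fraction of the way towards the observed bit."""
    if bit:
        return prob_one + ((0x10000 - prob_one) >> ADJUST_SHIFT)
    return prob_one - (prob_one >> ADJUST_SHIFT)

def cost_in_bits(prob_one: int, bit: int) -> float:
    """Ideal code length for this bit under the current estimate."""
    p = prob_one / 0x10000 if bit else 1 - prob_one / 0x10000
    return -math.log2(p)

def total_cost(bits) -> float:
    prob, cost = INIT_ONE_PROB, 0.0
    for bit in bits:
        cost += cost_in_bits(prob, bit)   # step 2: encode using the estimate
        prob = update_prob(prob, bit)     # step 3: adapt to the actual bit
    return cost

print(round(total_cost([1] * 64), 1), round(total_cost([0, 1] * 32), 1))
```

A predictable run of 64 bits costs far fewer than 64 output bits. An alternating pattern that this single context cannot predict costs roughly one bit per input bit.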


Relocation Handling

The original executable had HUNK_RELOC32 entries that patch absolute addresses. After decompression, these must be reapplied. Crunchers use three strategies:

Method 1: Compress Everything Including Relocs

The entire original file (all hunks + relocation tables) is compressed as a blob. The decrunch stub acts as a mini-LoadSeg:

  1. Decompress to a temp buffer
  2. Parse the HUNK stream
  3. Allocate individual hunks with correct memory types
  4. Copy data and apply relocations
  5. Free the temp buffer
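Step 4 is the part that differs most from a plain memory copy. The sketch below shows the relocation pass only, assuming the standard HUNK_RELOC32 body (repeated groups of count, target hunk number, then that many offsets, ended by a zero count) has already been read into a list of longwords. The function name `apply_reloc32` and the addresses are made up for illustration.

```python
def apply_reloc32(hunks, bases, hunk_index, reloc_longs):
    """Patch 32-bit big-endian values in hunks[hunk_index] in place.

    hunks:       list of bytearrays holding the loaded hunk contents
    bases:       load address chosen for each hunk
    reloc_longs: longwords that followed the HUNK_RELOC32 id
    """
    image = hunks[hunk_index]
    i = 0
    while reloc_longs[i] != 0:                       # zero count ends the block
        count, target = reloc_longs[i], reloc_longs[i + 1]
        for offset in reloc_longs[i + 2 : i + 2 + count]:
            value = int.from_bytes(image[offset:offset + 4], "big")
            value = (value + bases[target]) & 0xFFFFFFFF
            image[offset:offset + 4] = value.to_bytes(4, "big")
        i += 2 + count

# Hunk 0 holds a pointer to offset $10 inside hunk 1 at byte offset 4.
hunks = [bytearray(16), bytearray(32)]
hunks[0][4:8] = (0x10).to_bytes(4, "big")
bases = [0x20000, 0x30000]                           # hypothetical load addresses
apply_reloc32(hunks, bases, 0, [1, 1, 4, 0])
print(hex(int.from_bytes(hunks[0][4:8], "big")))     # 0x30010
```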

Method 2: Pre-Relocated + Delta Table

  1. Cruncher pre-applies relocations assuming base address 0
  2. Stores a compact delta table — sorted list of byte-offset deltas between relocation sites
  3. After decompression, the stub walks the delta table and adds actual base addresses

/* Delta table: each entry is the offset-delta to the next reloc site */
UWORD reloc_deltas[] = {
    0x0006,  /* first reloc at offset 6 */
    0x0014,  /* +0x14 → next at offset 0x1A */
    0x0008,  /* +0x08 → next at offset 0x22 */
    0x0000   /* terminator */
};
/* More compact than storing absolute offsets */
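A stub-side walk over such a table might look like the following sketch. It assumes a single merged image, 16-bit deltas with a zero terminator as in the table above, and pre-relocated 32-bit big-endian values. `apply_delta_relocs` and the base address are hypothetical.

```python
import struct

def apply_delta_relocs(image: bytearray, deltas, base: int) -> bytearray:
    """Add `base` to every longword whose offset is reached by summing the deltas."""
    pos = 0
    for delta in deltas:
        if delta == 0:                 # terminator
            break
        pos += delta
        (value,) = struct.unpack_from(">I", image, pos)
        struct.pack_into(">I", image, pos, (value + base) & 0xFFFFFFFF)
    return image

# Same table as above: reloc sites at offsets $06, $1A and $22.
reloc_deltas = [0x0006, 0x0014, 0x0008, 0x0000]
image = bytearray(0x30)
for site in (0x06, 0x1A, 0x22):
    struct.pack_into(">I", image, site, 0x100)   # values pre-relocated to base 0
apply_delta_relocs(image, reloc_deltas, 0x40000)
print(hex(struct.unpack_from(">I", image, 0x1A)[0]))   # 0x40100
```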

Method 3: Merge and Self-Relocate

All hunks are merged into a single code hunk, and inter-hunk references are resolved at crunch time, so the result needs minimal or no relocation.

Drawback: Loses CHIP/FAST memory separation — all data ends up in the same memory type. Problematic for programs that need Chip RAM for bitmaps or audio.


Memory Layout During Decompression

BEFORE (crunched exe loaded by OS):

  ┌──────────────────────┐  Hunk 0 (CODE)
  │ Decrunch stub (300B) │
  │ Compressed data (80K)│
  │ Metadata             │
  └──────────────────────┘
  ┌──────────────────────┐  Hunk 1 (BSS)
  │ Workspace (200K)     │  ← decompression buffer
  └──────────────────────┘

DURING (stub is executing):

  ┌──────────────────────┐  Hunk 0 — still alive
  │ Stub + compressed ───│──→ reading from here
  └──────────────────────┘
  ┌──────────────────────┐  AllocMem'd by stub
  │ Original Hunk 0 CODE │──→ writing decompressed data
  └──────────────────────┘
  ┌──────────────────────┐  AllocMem'd by stub
  │ Original Hunk 1 DATA │
  └──────────────────────┘

AFTER (stub jumps to original entry):

  ┌──────────────────────┐  (freed or abandoned)
  │ [freed stub memory]  │
  └──────────────────────┘
  ┌──────────────────────┐  Original program running
  │ Original Hunk 0 CODE │  ← PC here
  └──────────────────────┘
  ┌──────────────────────┐
  │ Original Hunk 1 DATA │
  └──────────────────────┘

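In-place decompression: Some crunchers (including Shrinkler) support decompressing over the compressed data. The safety_margin field in the Shrinkler header reserves extra space so the decompressor's write pointer never overtakes the read pointer. Shrinkler writes its output forwards (the core loop stores with move.b d6,(a5)+), so the compressed data sits at the high end of the buffer; PowerPacker, by contrast, decodes from the end of the data towards the start.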


Detection and Identification

Magic Signatures

| Cruncher | Signature | Hex | Location |
|---|---|---|---|
| PowerPacker | PP20 | $50503230 | Start of compressed data |
| Imploder | IMP! | $494D5021 | Start of compressed data |
| Turbo Imploder | IMP! | $494D5021 | Same; version in stub differs |
| Titanics | ATN! | $41544E21 | Start of compressed data |
| CrunchMania | CrM! / CrM2 | $43724D21 / $43724D32 | Start of compressed data |
| Shrinkler | Shri | $53687269 | Data file header (exe uses stub pattern) |
| ByteKiller | (no magic) | n/a | Detected by stub pattern only |
| XPK Framework | XPKF | $58504B46 | File header |

Warning

Fake headers are extremely common in the Amiga cracking scene. A file claiming to be IMP! may have a spoofed header to frustrate analysis. If standard tools reject it, the header is likely fake — use a debugger to capture the decrunched memory image instead.

Detecting Crunched Executables in RE

  1. Tiny code hunk + large data hunk — unusual ratio signals packing
  2. AllocMem + decompression loop at entry point — not the normal c.o startup pattern
  3. No MOVE.L 4.W,A6 / OpenLibrary sequence — stub goes straight to decompression
  4. Custom chip register writes ($DFF180 colour changes) — decrunch colour feedback
  5. Magic bytes in the data hunk — scan for known signatures
  6. Self-modifying code — stub may overwrite its own memory during in-place decompression
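
# Quick detection script: report the known signature that occurs earliest in the file.
import sys

MAGICS = {
    b'PP20': 'PowerPacker',
    b'IMP!': 'Imploder',
    b'ATN!': 'Titanics Cruncher',
    b'CrM!': 'CrunchMania',
    b'CrM2': 'CrunchMania 2',
    b'Shri': 'Shrinkler (data)',
    b'XPKF': 'XPK Framework',
}

HUNK_HEADER = b'\x00\x00\x03\xf3'

def detect_cruncher(filename):
    """Return the name of the earliest cruncher signature found, or None."""
    with open(filename, 'rb') as f:
        data = f.read()
    # Collect (offset, name) for every signature present and keep the earliest one,
    # so the result does not depend on the order of the MAGICS dictionary.
    hits = [(data.find(magic), name) for magic, name in MAGICS.items() if magic in data]
    if hits:
        off, name = min(hits)
        print(f"  {name} detected at offset ${off:04X}")
        return name
    # No signature: a valid HUNK file may still be packed (e.g. ByteKiller)
    if data[:4] == HUNK_HEADER:
        print("  Valid HUNK, no known signature: check for stub pattern at entry point")
    return None

if __name__ == '__main__':
    for path in sys.argv[1:]:
        print(path)
        detect_cruncher(path)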
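Heuristic 4 from the list above can also be approximated statically. The sketch below scans a code hunk for the absolute address $00DFF180 at word-aligned offsets, which is how an instruction such as `MOVE.W D0,$DFF180` stores its destination. This is an assumption-laden shortcut: stubs that reach the register through an address register (for example via a base of $DFF000) will not be found, and ordinary programs write COLOR00 too. `find_color00_writes` is a hypothetical helper name.

```python
COLOR00_ADDR = bytes.fromhex("00DFF180")   # absolute long address of COLOR00

def find_color00_writes(code: bytes):
    """Return word-aligned offsets where the COLOR00 address appears in the code."""
    hits = []
    pos = code.find(COLOR00_ADDR)
    while pos != -1:
        if pos % 2 == 0:                   # 68000 instruction words are word-aligned
            hits.append(pos)
        pos = code.find(COLOR00_ADDR, pos + 1)
    return hits

# NOP, MOVE.W D0,$DFF180, RTS
stub = bytes.fromhex("4E71" "33C000DFF180" "4E75")
print(find_color00_writes(stub))           # [4]
```

A hit inside a very small first hunk is a hint that deserves a closer look in a disassembler, not proof of a cruncher.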

Decrunching Tools

AmigaOS Native

| Tool | Description |
|---|---|
| xfdmaster.library | Universal decruncher; modular architecture with "slave" plugins in LIBS:xfd/ |
| xfdDecrunch | CLI front-end: xfdDecrunch packed.exe unpacked.exe |
| xfdScan / xfdList | Identify cruncher type; list installed slave modules |
| powerpacker.library | PP20 data file decompression: ppLoadData() |

Cross-Platform

| Tool | Description |
|---|---|
| Ancient (C++) | Modern portable library; supports ByteKiller, Imploder, CrunchMania, PP20, and many more. GitHub: temisu/ancient |
| ppunpack | PP20 only: ppunpack packed.exe unpacked.exe |
| Shrinkler -d | Shrinkler data files: shrinkler -d packed unpacked |

xfdmaster — Modular Architecture

xfdmaster does not have a hardcoded format list. It loads slave modules from LIBS:xfd/ at runtime, each handling one or more cruncher formats:

LIBS:xfd/
  PowerPacker         ; handles PP20
  Imploder            ; handles IMP!
  CrunchMania         ; handles CrM!, CrM2
  ByteKiller          ; stub-pattern detection
  Titanics            ; handles ATN!
  ...                 ; 100+ supported formats

/* Using xfdmaster.library to decrunch any format: */
struct xfdBufferInfo *xbi = xfdAllocObject(XFDOBJ_BUFFERINFO);
xbi->xfdbi_SourceBufLen = filesize;
xbi->xfdbi_SourceBuffer = filebuf;

if (xfdRecogBuffer(xbi))
{
    printf("Detected: %s\n", xbi->xfdbi_PackerName);
    if (xfdDecrunchBuffer(xbi))
    {
        /* xbi->xfdbi_TargetBuffer = decrunched data */
        /* xbi->xfdbi_TargetBufSaveLen = decrunched size */
    }
}
xfdFreeObject(xbi);

Debugger-Based Extraction (Last Resort)

For unknown or custom crunchers, the most reliable method is to load the executable in a hardware-level debugger (HRTMon, ASM-One, or an emulator's monitor), set a breakpoint at the end of the decrunch stub (typically the final JMP instruction), and capture the memory image once decompression is complete:

; In HRTMon:
> d $entry_point          ; disassemble entry
; Find the final JMP at the end of the stub
> bpx $stub_end_jmp       ; set breakpoint
> g                        ; run
; When breakpoint hits, the decrunched program is in memory
> sm $dest $dest+size "decrunched.bin"  ; save memory

Impact on FPGA / Emulation

| Concern | Detail |
|---|---|
| Timing-sensitive stubs | Imploder has tight loops that may fail on accelerated CPUs; some stubs poll $DFF006 (VHPOSR) for timing |
| Memory allocation | Stub requires a working exec.library AllocMem, so the system must have a functional memory list |
| Chip RAM specificity | If original hunks need CHIP RAM, the stub must request MEMF_CHIP; DMA-accessible memory is required for graphics/audio |
| Self-modifying code | In-place decompression writes over instruction bytes; the 68020+ instruction cache must be invalidated (CacheClearU) |
| Custom chip access | Decrunch colour writes to $DFF180 require a working Denise/colour register |
| Boot-block crunchers | Trackloaders (game boot blocks) use custom crunchers without HUNK format: a completely different mechanism with no OS involvement |

References