diff --git a/03_loader_and_exec_format/exe_crunchers.md b/03_loader_and_exec_format/exe_crunchers.md
index d7d7376..3e463e6 100644
--- a/03_loader_and_exec_format/exe_crunchers.md
+++ b/03_loader_and_exec_format/exe_crunchers.md
@@ -12,6 +12,13 @@ This was essential in the floppy era: a 200 KB program crunched to 120 KB loads
## Architecture
+### Two-Phase Execution Model
+
+A crunched executable goes through **two loading phases**:
+
+1. **Phase 1 — OS loads the wrapper** (`LoadSeg` handles the crunched HUNK file normally)
+2. **Phase 2 — Stub rebuilds the original program** (acting as a mini-`LoadSeg` inside the running process)
+
```mermaid
graph LR
subgraph "Original Executable"
@@ -32,9 +39,145 @@ graph LR
style CD fill:#e8f4fd,stroke:#2196f3,color:#333
```
-### Key Insight
+### Phase 1: What the OS Loader Sees
-A crunched executable is **itself a valid HUNK file**. The OS loader handles it normally — `LoadSeg()` allocates memory, loads hunks, applies relocations. The "magic" is that hunk 0 contains a decrunch stub instead of the original code, and the data hunk contains the compressed original program.
+The crunched file is a perfectly valid HUNK executable. `LoadSeg()` processes it like any other program:
+- Reads `HUNK_HEADER`, allocates 2–3 segments (stub code, compressed data, BSS workspace)
+- Applies the wrapper's own `HUNK_RELOC32` entries (minimal — just the stub's internal references)
+- Links the segments into a BPTR chain and returns the segment list
+- `CreateProc()` sets up a task and jumps to hunk 0 offset 0 — the decrunch stub
+
+At this point the OS is done. It thinks it loaded a normal program. The original executable's structure, memory types, relocations — all of that is **inside the compressed payload** and invisible to the OS.
+
+### Phase 2: What the Stub Must Reconstruct
+
+The decrunch stub must rebuild everything `LoadSeg` would have done for the original executable:
+
+```mermaid
+flowchart TD
+ A["Entry: stub begins executing"] --> B["1. Read metadata block"]
+ B --> C["2. AllocMem for each original hunk<br/>(CHIP/FAST as specified)"]
+ C --> D["3. Decompress payload into<br/>allocated hunk buffers"]
+ D --> E["4. Apply relocations:<br/>patch absolute addresses"]
+ E --> F["5. Build BPTR segment chain"]
+ F --> G["6. Free stub + compressed data"]
+ G --> H["7. JMP to original entry point"]
+
+ style B fill:#e8f4fd,stroke:#2196f3,color:#333
+ style C fill:#fff3e0,stroke:#ff9800,color:#333
+ style E fill:#fce4ec,stroke:#e91e63,color:#333
+ style F fill:#e8f5e9,stroke:#4caf50,color:#333
+```
+
+#### Step 1: Metadata — Preserving the Original Structure
+
+The compressed payload includes a **metadata block** that captures the original executable's structure. This is stored either at a fixed offset in the compressed data or appended after it:
+
+```c
+/* What the cruncher preserves in the metadata (illustrative layout): */
+struct CrunchHunkInfo {
+ ULONG size; /* size of the original hunk (bytes) */
+ ULONG memflags; /* MEMF_CHIP, MEMF_FAST, or MEMF_ANY */
+ ULONG type; /* HUNK_CODE, HUNK_DATA, or HUNK_BSS */
+};
+struct CrunchMetadata {
+ ULONG num_hunks; /* original hunk count */
+ struct CrunchHunkInfo hunks[]; /* one entry per original hunk */
+ /* Relocation data follows (format varies by cruncher) */
+};
+```
+
+Without this metadata, the stub cannot allocate memory correctly — a bitmap hunk that needs Chip RAM would end up in Fast RAM and be invisible to the custom chip DMA.
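+
+As a rough illustration of reading such a block, the sketch below parses one *assumed* layout: a big-endian hunk count followed by one `(size, memflags, type)` triple per hunk. The layout and the names `HunkInfo`, `read_be32`, and `parse_metadata` are hypothetical; real crunchers each define their own format.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* Hypothetical per-hunk record: size in bytes, MEMF_* flags, HUNK_* type. */
+typedef struct {
+    uint32_t size;
+    uint32_t memflags;
+    uint32_t type;
+} HunkInfo;
+
+/* Read a big-endian 32-bit value (the 68000 is big-endian). */
+uint32_t read_be32(const uint8_t *p)
+{
+    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
+           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
+}
+
+/* Parse the assumed layout: count, then (size, memflags, type) per hunk.
+ * Returns the number of hunks parsed, or -1 if the block is truncated
+ * or describes more hunks than `out` has room for. */
+int parse_metadata(const uint8_t *buf, size_t len,
+                   HunkInfo *out, size_t max_hunks)
+{
+    if (len < 4) return -1;
+    uint32_t n = read_be32(buf);
+    if (n > max_hunks) return -1;
+    if ((len - 4) / 12 < n) return -1;      /* not enough bytes for n records */
+    for (uint32_t i = 0; i < n; i++) {
+        const uint8_t *rec = buf + 4 + (size_t)i * 12;
+        out[i].size     = read_be32(rec);
+        out[i].memflags = read_be32(rec + 4);
+        out[i].type     = read_be32(rec + 8);
+    }
+    return (int)n;
+}
+```
+
+The length is checked before any record is read, so a truncated block is rejected instead of being read past its end.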
+
+#### Step 2: Memory Allocation — Chip vs Fast Separation
+
+This step is critical and easy to overlook. The original executable might have had:
+
+```
+Hunk 0: HUNK_CODE → MEMF_ANY (68000 code; any memory will do)
+Hunk 1: HUNK_DATA → MEMF_CHIP (bitmaps, audio samples — MUST be DMA-reachable)
+Hunk 2: HUNK_BSS → MEMF_ANY (zero-filled workspace)
+```
+
+The stub must call `AllocMem()` **individually** for each original hunk with the correct memory type flags:
+
+```asm
+; Stub allocates each original hunk separately (error checks omitted;
+; AllocMem returns 0 in D0 on failure):
+ MOVEA.L 4.W, A6 ; SysBase
+ ; Hunk 0: code — any memory is fine
+ MOVE.L code_size, D0
+ MOVE.L #MEMF_PUBLIC, D1
+ JSR -198(A6) ; AllocMem
+ MOVE.L D0, hunk_bases+0 ; save base address
+
+ ; Hunk 1: data — MUST be Chip RAM for DMA
+ MOVE.L data_size, D0
+ MOVE.L #MEMF_CHIP|MEMF_PUBLIC, D1
+ JSR -198(A6) ; AllocMem
+ MOVE.L D0, hunk_bases+4 ; save base address
+
+ ; Hunk 2: BSS — just clear memory
+ MOVE.L bss_size, D0
+ MOVE.L #MEMF_PUBLIC|MEMF_CLEAR, D1
+ JSR -198(A6) ; AllocMem
+ MOVE.L D0, hunk_bases+8
+```
+
+> [!IMPORTANT]
+> If a cruncher loses the CHIP/FAST distinction (for example by merging everything into one hunk), programs that keep bitmap or audio data in data hunks will **fail without any loader error**: the custom chips' DMA can only reach Chip RAM, so data that lands in Fast RAM is never fetched. Symptoms: garbled graphics, missing or corrupted audio, or a Guru Meditation.
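+
+One place a cruncher can pick up the CHIP/FAST requirement at crunch time is the original `HUNK_HEADER`, where the top two bits of each size longword carry the memory flags and the remaining bits give the size in longwords. A minimal sketch of that decoding follows. The constant values are copied from the AmigaOS headers so the code compiles on any host; the helper names and the `MEMFLAGS_EXTENDED` sentinel are hypothetical.
+
+```c
+#include <stdint.h>
+
+/* Values as defined in exec/memory.h and dos/doshunks.h. */
+#define MEMF_PUBLIC 0x00000001u
+#define MEMF_CHIP   0x00000002u
+#define MEMF_FAST   0x00000004u
+#define HUNKF_CHIP  0x40000000u   /* bit 30 of the size longword */
+#define HUNKF_FAST  0x80000000u   /* bit 31 of the size longword */
+
+/* Hypothetical sentinel: both bits set means the real flags are stored
+ * in an extra longword, which this sketch does not read. */
+#define MEMFLAGS_EXTENDED 0xFFFFFFFFu
+
+/* Size in bytes: strip the two flag bits, convert longwords to bytes. */
+uint32_t hunk_size_bytes(uint32_t size_longword)
+{
+    return (size_longword & 0x3FFFFFFFu) * 4u;
+}
+
+/* Translate the header's flag bits into AllocMem() requirement flags. */
+uint32_t hunk_memflags(uint32_t size_longword)
+{
+    uint32_t bits = size_longword & (HUNKF_CHIP | HUNKF_FAST);
+    if (bits == (HUNKF_CHIP | HUNKF_FAST)) return MEMFLAGS_EXTENDED;
+    if (bits == HUNKF_CHIP) return MEMF_PUBLIC | MEMF_CHIP;
+    if (bits == HUNKF_FAST) return MEMF_PUBLIC | MEMF_FAST;
+    return MEMF_PUBLIC;           /* no preference: any memory */
+}
+```
+
+Storing the decoded flags per hunk in the metadata is what lets the stub repeat the same `AllocMem()` requests later.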
+
+#### Step 3: Decompress — Fill the Allocated Hunks
+
+The decompressor reads from the compressed payload (in the wrapper's data hunk) and writes to the freshly allocated original hunks. For programs with multiple hunks, the decompressor either:
+- Decompresses into a flat temp buffer, then copies to individual hunks (Method 1)
+- Decompresses directly to each hunk in sequence, using stored boundaries (Method 2)
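+
+The final copy in Method 1 can be sketched as follows, assuming the decompressed stream is simply the original hunks' contents back to back and the per-hunk sizes come from the metadata. `scatter_to_hunks` is a hypothetical name, and a real stub would do this in 68000 assembly rather than C.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+#include <string.h>
+
+/* Copy consecutive slices of `flat` into the per-hunk buffers.
+ * sizes[i] is the byte size of hunk i; hunks[i] must have room for it.
+ * Returns 0 on success, -1 if the sizes do not add up to flat_len. */
+int scatter_to_hunks(const uint8_t *flat, size_t flat_len,
+                     uint8_t *const hunks[], const uint32_t sizes[],
+                     size_t num_hunks)
+{
+    size_t pos = 0;
+    for (size_t i = 0; i < num_hunks; i++) {
+        if (sizes[i] > flat_len - pos) return -1;  /* slice runs past the end */
+        memcpy(hunks[i], flat + pos, sizes[i]);
+        pos += sizes[i];
+    }
+    return pos == flat_len ? 0 : -1;               /* leftover bytes: mismatch */
+}
+```
+
+Method 2 avoids the temporary buffer by pointing the decompressor's output at each hunk in turn, at the cost of the decompressor having to stop exactly on every hunk boundary.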
+
+#### Step 4: Apply Relocations
+
+The original HUNK_RELOC32 tables are **embedded in the compressed data** — they were part of the original file. After decompression, the stub must patch all absolute addresses to reflect the actual allocation addresses. See the [Relocation Handling](#relocation-handling) section below for the three strategies.
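+
+The patching itself is simple: for every relocation offset, the stub adds the base address of the target hunk to the longword stored at that offset. A minimal sketch, assuming the offsets have already been extracted into an array. `get_be32`, `put_be32`, and `apply_reloc32` are hypothetical helpers, and the byte-wise accessors stand in for the 68000's native big-endian longword access.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* Big-endian longword accessors (host-independent). */
+uint32_t get_be32(const uint8_t *p)
+{
+    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
+           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
+}
+
+void put_be32(uint8_t *p, uint32_t v)
+{
+    p[0] = (uint8_t)(v >> 24);
+    p[1] = (uint8_t)(v >> 16);
+    p[2] = (uint8_t)(v >> 8);
+    p[3] = (uint8_t)v;
+}
+
+/* Add `target_base` (address of the target hunk's first data byte) to the
+ * longword at each listed offset in `hunk`.
+ * Returns 0 on success, -1 if an offset would run past the hunk. */
+int apply_reloc32(uint8_t *hunk, size_t hunk_len,
+                  const uint32_t *offsets, size_t count,
+                  uint32_t target_base)
+{
+    for (size_t i = 0; i < count; i++) {
+        if (hunk_len < 4 || offsets[i] > hunk_len - 4) return -1;
+        uint8_t *slot = hunk + offsets[i];
+        put_be32(slot, get_be32(slot) + target_base);
+    }
+    return 0;
+}
+```
+
+For example, a `JSR $00000010` whose operand sits at offset 2 becomes `JSR $00200010` once the target hunk is known to start at `$00200000`.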
+
+#### Step 5: Build the Segment Chain
+
+AmigaDOS tracks loaded programs as a BPTR-linked segment list. The stub must construct this chain so `UnLoadSeg()` can free the memory later:
+
+```c
+/* Relative to the address a segment BPTR refers to: */
+/* [alloc_size] at -4, [BPTR→next] at 0, [hunk data...] from +4 */
+
+/* Stub builds the chain: */
+for (int i = 0; i < num_hunks - 1; i++)
+{
+ BPTR *link = (BPTR *)(hunk_bases[i]); /* offset 0 = BPTR to next */
+ *link = MKBADDR(hunk_bases[i + 1]); /* point to next segment */
+}
+/* Last segment's link = 0 (NULL) — end of chain */
+*(BPTR *)(hunk_bases[num_hunks - 1]) = 0;
+```
+
+> **Why this matters**: If the stub doesn't build a valid segment chain, `UnLoadSeg()` (called when the program exits) will crash or leak memory. Some simple crunchers skip this step entirely — the program runs fine but its memory is never freed.
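+
+To sanity-check a chain like the one built above, it can be walked the way a BPTR list is followed: each link is a longword index, and 0 ends the list. The sketch below models memory as an array of longwords so it runs on any host; `BPTR32` and `count_segments` are hypothetical names, not part of the AmigaOS API.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+/* A BPTR is a byte address divided by 4; 0 terminates the list. */
+typedef uint32_t BPTR32;   /* hypothetical name, to avoid clashing with dos.h */
+
+/* `mem` models longword-addressed memory: mem[b] is the longword that
+ * BPTR b points at, i.e. the segment's link field.
+ * Returns the number of segments, or -1 if a link leaves the modelled
+ * memory or the chain loops back on itself. */
+int count_segments(const uint32_t *mem, size_t mem_longs, BPTR32 seglist)
+{
+    int n = 0;
+    while (seglist != 0) {
+        /* A valid chain cannot hold more segments than memory has longwords. */
+        if (seglist >= mem_longs || (size_t)n >= mem_longs) return -1;
+        seglist = mem[seglist];
+        n++;
+    }
+    return n;
+}
+```
+
+A count that differs from `num_hunks`, or a `-1`, points to a link that was written to the wrong address or never terminated.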
+
+#### Step 6: Free the Wrapper, Jump to Original
+
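+The stub frees the wrapper's own memory (compressed data, BSS workspace) and jumps to the original entry point: the first byte of hunk 0 after its BPTR link word. Registers are restored first, and the entry address goes through the stack so that `A0`/`D0` (the command-line pointer and length) reach the program intact:
+
+```asm
+ ; Free wrapper hunks
+ ; (some stubs skip this and accept the memory leak)
+
+ ; Restore registers to match what the OS originally passed
+ MOVEM.L (SP)+, D0-D7/A0-A6
+
+ ; Jump to original program entry without clobbering a register
+ MOVE.L hunk_bases+0, -(SP) ; push address of hunk 0's link field
+ ADDQ.L #4, (SP) ; skip BPTR link at offset 0
+ RTS ; "return" into the original program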
+```
+
+### The Segment List Handoff Problem
+
+There's a subtle issue: the OS gave the process a segment list pointing to the **wrapper** hunks. But the actual program now lives in **newly allocated** hunks. When the program exits, `UnLoadSeg()` will try to free the wrapper's segment list, which the stub may have already freed — double-free crash.
+
+Sophisticated crunchers solve this by:
+1. **Patching the process's `pr_SegList`** to point to the new segment chain
+2. **Or** keeping the wrapper allocated and linking it into the chain
+3. **Or** replacing the wrapper's content in-place (overwriting stub+compressed with decompressed code)
---