Replaced the brief 'key insight' paragraph with a comprehensive
two-phase execution model covering:
- Phase 1 (OS): what LoadSeg sees, how wrapper relocations are applied
- Phase 2 (stub): all 6 reconstruction steps with code examples:
  1. Metadata block preserving original hunk count/sizes/memflags
  2. Chip vs Fast RAM allocation (with IMPORTANT warning about
     DMA-invisible Fast RAM for bitmap/audio data)
  3. Decompression into separately allocated hunks
  4. Relocation regeneration from embedded tables
  5. BPTR segment chain construction (with C code)
  6. Wrapper cleanup and JMP to original entry
- The segment list handoff problem (double-free hazard) and the
three solutions: pr_SegList patching, chain linking, in-place