amiga-bootcamp/05_reversing/static/hunk_reconstruction.md
2026-04-27 18:34:07 -04:00


Static Analysis — HUNK Reconstruction

Overview

A hex dump. Four bytes: 00 00 03 F3. That's HUNK_HEADER — the start of an Amiga executable. Everything that follows — code segments, data, BSS, relocations, symbols — is a stream of tagged longword records. Reading this stream by hand is the first skill every Amiga reverse engineer should develop: it reveals the skeleton of the binary before any tool touches it, and it catches corrupted files, packing wrappers, and obfuscated headers that automated loaders may silently misinterpret.

This article walks through manual HUNK parsing from a raw hex dump, covering the header, code/data/BSS segments, HUNK_RELOC32 patching, HUNK_SYMBOL extraction, and HUNK_EXT import/export resolution — all with copy-paste Python scripts.

graph LR
    subgraph "HUNK Stream"
        HDR["HUNK_HEADER<br/>000003F3"]
        CODE["HUNK_CODE<br/>000003E9"]
        DATA["HUNK_DATA<br/>000003EA"]
        BSS["HUNK_BSS<br/>000003EB"]
        RELOC["HUNK_RELOC32<br/>000003EC"]
        SYM["HUNK_SYMBOL<br/>000003F0"]
        END["HUNK_END<br/>000003F2"]
    end
    HDR -->|"num hunks, sizes"| CODE
    CODE -->|"code bytes"| DATA
    DATA -->|"data bytes"| BSS
    BSS -->|"size only"| RELOC
    RELOC -->|"patch offsets"| SYM
    SYM -->|"debug names"| END


Step 1 — Identify Magic and Header

xxd -c 4 mybinary | head -8
00000000: 0000 03f3  ← HUNK_HEADER magic
00000004: 0000 0000  ← resident library name list (0 = empty, the usual case)
00000008: 0000 0003  ← num_hunks = 3
0000000c: 0000 0000  ← first_hunk = 0
00000010: 0000 0002  ← last_hunk = 2
00000014: 0000 0200  ← hunk 0: 0x200 longs = 0x800 bytes (code)
00000018: 0000 0020  ← hunk 1: 0x20 longs = 0x80 bytes (data)
0000001c: 0000 0010  ← hunk 2: 0x10 longs = 0x40 bytes (BSS)

Each size longword: bits 31-30 = memory type flags, bits 29-0 = size in longs.
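The header layout above can be decoded mechanically. The following is a minimal sketch, not a full loader: the function name `parse_hunk_header` and the `MEMF_NAMES` labels are illustrative choices, and it assumes the NDK convention that bit 30 requests chip memory, bit 31 requests fast memory, and both bits set means an extra longword of memory attributes follows the size.

```python
import struct

# Labels for the top two bits of a size longword (illustrative names).
MEMF_NAMES = {0: "any", 1: "chip", 2: "fast", 3: "extended"}

def parse_hunk_header(data):
    """Parse HUNK_HEADER; return ([(memory, size_in_bytes), ...], next_offset)."""
    magic, = struct.unpack_from(">I", data, 0)
    if magic != 0x3F3:
        raise ValueError(f"not a HUNK_HEADER: {magic:#x}")
    pos = 4
    # Resident library names: (length in longs, name) entries ended by a 0 longword.
    while True:
        nlen, = struct.unpack_from(">I", data, pos)
        pos += 4
        if nlen == 0:
            break
        pos += nlen * 4
    num_hunks, first, last = struct.unpack_from(">3I", data, pos)
    pos += 12
    hunks = []
    for _ in range(last - first + 1):
        raw, = struct.unpack_from(">I", data, pos)
        pos += 4
        flags = raw >> 30                 # bits 31-30
        if flags == 3:                    # explicit memory attributes follow
            pos += 4
        hunks.append((MEMF_NAMES[flags], (raw & 0x3FFFFFFF) * 4))
    return hunks, pos

# The eight longwords from the dump above.
sample = struct.pack(">8I", 0x3F3, 0, 3, 0, 2, 0x200, 0x20, 0x10)
for memory, nbytes in parse_hunk_header(sample)[0]:
    print(f"{memory:8s} {nbytes:#x} bytes")
```

The returned offset is where the first hunk record begins, which is the starting point for Step 2.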


Step 2 — Walk the Hunk Stream

After the header, scan longword-by-longword:

$000003E9  → HUNK_CODE: read next longword = size, then size*4 bytes
$000003EA  → HUNK_DATA: same
$000003EB  → HUNK_BSS: read size longword only (no data)
$000003EC  → HUNK_RELOC32: read (count, target hunk, count offsets) groups until count = 0
$000003F0  → HUNK_SYMBOL: read (name_len, name, value) until name_len=0
$000003F1  → HUNK_DEBUG: read size longword, skip size*4 bytes
$000003F2  → HUNK_END: advance to next hunk
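The rules above translate into a small record walker. This is a sketch under stated assumptions: `walk_hunks` and `HUNK_NAMES` are hypothetical names, only the seven tags listed above are handled, and the top two bits of each tag are masked off on the assumption that they may carry memory flags. Unlike a blind scan, it skips over payloads, so tag-like values inside code or data are not misread as records.

```python
import struct

HUNK_NAMES = {0x3E9: "HUNK_CODE", 0x3EA: "HUNK_DATA", 0x3EB: "HUNK_BSS",
              0x3EC: "HUNK_RELOC32", 0x3F0: "HUNK_SYMBOL",
              0x3F1: "HUNK_DEBUG", 0x3F2: "HUNK_END"}

def walk_hunks(data, pos):
    """Yield (offset, tag_name) for each record, starting at pos (after the header)."""
    def u32(at):
        return struct.unpack_from(">I", data, at)[0]

    while pos + 4 <= len(data):
        start = pos
        tag = u32(pos) & 0x3FFFFFFF            # drop possible memory-flag bits
        pos += 4
        if tag in (0x3E9, 0x3EA, 0x3F1):       # size longword, then size*4 bytes
            pos += 4 + (u32(pos) & 0x3FFFFFFF) * 4
        elif tag == 0x3EB:                     # size longword only
            pos += 4
        elif tag == 0x3EC:                     # (count, target hunk, offsets) groups
            while (count := u32(pos)) != 0:
                pos += 8 + count * 4
            pos += 4                           # terminating 0
        elif tag == 0x3F0:                     # (name_len, name, value) entries
            while (nlen := u32(pos)) != 0:
                pos += 8 + nlen * 4
            pos += 4                           # terminating 0
        elif tag != 0x3F2:
            raise ValueError(f"unknown tag {tag:#x} at offset {start:#x}")
        yield start, HUNK_NAMES[tag]
```

Raising on an unknown tag is deliberate: on a corrupted or wrapped file, the offset in the error message is where the manual hex inspection should begin.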

Grep for hunk boundaries

xxd -c 4 mybinary | grep -E "0000 03(e9|ea|eb|ec|f0|f1|f2|f3)"

Step 3 — Extract HUNK_SYMBOL Table

# Find every HUNK_SYMBOL ($3F0) block and print its (name, value) entries.
# Naive scan: a $000003F0 longword inside code or data is a false positive.
python3 - <<'EOF'
import struct

data = open("mybinary", "rb").read()
i = 0
while i + 4 <= len(data):
    tag = struct.unpack_from(">I", data, i)[0]
    i += 4
    if tag != 0x3F0:  # not HUNK_SYMBOL
        continue
    print(f"HUNK_SYMBOL at offset {i - 4:#x}")
    while i + 4 <= len(data):
        nlen = struct.unpack_from(">I", data, i)[0]  # name length in longwords
        i += 4
        if nlen == 0 or i + nlen*4 + 4 > len(data):
            break  # terminator, or a false-positive tag running off the file
        name = data[i : i + nlen*4].rstrip(b"\x00").decode("ascii", "replace")
        val  = struct.unpack_from(">I", data, i + nlen*4)[0]
        print(f"  {name} = {val:#x}")
        i += nlen*4 + 4
EOF

Step 4 — Resolve HUNK_EXT Imports/Exports

In object files (HUNK_UNIT), HUNK_EXT carries import/export tables:

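# Simplified HUNK_EXT parser (naive tag scan; pass the object file as argv[1])
import struct, sys

data = open(sys.argv[1], "rb").read()
i = 0
while i + 4 <= len(data):
    tag = struct.unpack_from(">I", data, i)[0]; i += 4
    if tag != 0x3EF:  # not HUNK_EXT
        continue
    while True:
        word = struct.unpack_from(">I", data, i)[0]; i += 4
        if word == 0: break                # end of this HUNK_EXT block
        ext_type = (word >> 24) & 0xFF
        nlen     = word & 0x00FFFFFF       # name length in longwords
        name = data[i : i+nlen*4].rstrip(b"\x00").decode("ascii","replace")
        i += nlen * 4
        if ext_type < 0x80:                # definitions carry one value longword
            val = struct.unpack_from(">I", data, i)[0]; i += 4
            kind = "EXPORT" if ext_type in (1, 2) else "SYMBOL"  # EXT_DEF / EXT_ABS
            print(f"  {kind} {name} = {val:#x}")
        else:                              # references: EXT_REF32 ($81) and friends
            if ext_type == 0x82: i += 4    # EXT_COMMON: skip the common-block size
            nrefs = struct.unpack_from(">I", data, i)[0]; i += 4
            refs  = struct.unpack_from(f">{nrefs}I", data, i); i += nrefs*4
            print(f"  IMPORT {name} @ {[hex(r) for r in refs]}")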

Step 5 — Annotating Reloc Patches in IDA

After loading the HUNK file in IDA:

  1. View → Open Subviews → Fixups — lists all HUNK_RELOC32 patch sites
  2. Jump to a relocated longword in the disassembly view to see the computed address
  3. Use Edit → Operand type → Offset (data segment) to annotate as a pointer

IDA's Amiga loader applies relocations automatically, so all cross-hunk pointers show their final resolved addresses.


Decision Guide — HUNK Analysis Scenarios

| Scenario | Tool | Why |
|---|---|---|
| Quick symbol dump | hunkinfo or hex grep for $3F0 | Instant, no scripting needed |
| Unknown / corrupted file | Manual hex walk (Steps 1-2) | Identifies problems automated tools hide |
| Full symbol + reloc extraction | Python script (Steps 3-4) | Exports everything for external analysis |
| Standard RE in IDA | IDA Amiga HUNK loader | Automatic; no manual steps needed |
| Obfuscated / packed binary | Manual hex walk first | Detect non-standard headers before IDA silently fails |

Named Antipatterns

1. "The Missing Relocation"

What it looks like: seeing MOVE.L #$00000000, D1 in a HUNK_CODE section and assuming the value is zero:

MOVE.L  #$00000000, D1    ; looks like D1 = 0
; But HUNK_RELOC32 at this offset changes it at load time!

Why it fails: HUNK_RELOC32 replaces placeholder longwords in CODE/DATA with actual addresses at load time. A $00000000 may become $00123456 after relocation. Without checking the relocation table, you're reading pre-patch values — completely wrong.

Correct: Always cross-reference every longword in CODE/DATA against the HUNK_RELOC32 table before interpreting it as a value.
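The cross-reference can be automated. The sketch below is illustrative only: `parse_reloc32` and `relocated_value` are hypothetical helper names, and `hunk_bases` stands in for whatever load addresses the hunks would receive (the loader picks these at run time, so any value used here is made up). It shows how the loader adds the target hunk's base address to the placeholder longword.

```python
import struct

def parse_reloc32(data, pos):
    """Parse a HUNK_RELOC32 body at pos (just past the $3EC tag).
    Returns ({patch_offset: target_hunk}, offset past the terminating 0)."""
    patches = {}
    while True:
        count, = struct.unpack_from(">I", data, pos)
        pos += 4
        if count == 0:
            return patches, pos
        target, = struct.unpack_from(">I", data, pos)
        offsets = struct.unpack_from(f">{count}I", data, pos + 4)
        pos += 4 + count * 4
        for off in offsets:
            patches[off] = target

def relocated_value(segment, off, patches, hunk_bases):
    """Value of the longword at `off` after loading (raw value if not relocated)."""
    raw, = struct.unpack_from(">I", segment, off)
    if off in patches:
        return (raw + hunk_bases[patches[off]]) & 0xFFFFFFFF
    return raw

# MOVE.L #$00000000,D1 followed by NOP; the immediate sits at offset 2.
code = bytes.fromhex("223c000000004e71")
reloc_body = struct.pack(">4I", 1, 1, 2, 0)   # 1 offset, target hunk 1, offset 2, end
patches, _ = parse_reloc32(reloc_body, 0)
# Hypothetical load address for hunk 1.
print(hex(relocated_value(code, 2, patches, {1: 0x00123456})))
```

Keeping the patches as an offset-to-hunk dictionary makes the check a single lookup while reading a disassembly listing.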

2. "The End-of-Hunk Confusion"

What it looks like — finding 000003F2 (HUNK_END) and assuming that's the end of the file:

000003F2  ← HUNK_END of hunk 0
000003E9  ← HUNK_CODE of hunk 1 — file continues!

Why it fails: HUNK_END marks the end of a single hunk (one code, data, or BSS segment), not the end of the file. Multi-hunk executables contain one HUNK_END per hunk, so stopping at the first one loses all remaining hunks.

Correct: Continue parsing after HUNK_END until you reach either EOF or the end of the header-declared hunk count.


Use-Case Cookbook

Detect a Packed Binary (Cruncher Wrapper)

Packed executables often have unusual hunk structures:

xxd mybinary | head -4
# Normal: 0000 03f3 0000 0000 <num_hunks> ... (HUNK_HEADER, empty resident list, several hunks)
# Packed: 0000 03f3 0000 0000 0000 0001 ... (num_hunks = 1, huge size)
#   → single hunks with massive CODE segments = likely decruncher stub
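The same check can be scripted as a rough first-pass filter. This is a heuristic sketch, not a cruncher detector: the name `looks_packed` and the 64 KiB default threshold are arbitrary assumptions, and plenty of packers and ordinary single-hunk programs will not fit the pattern, so a positive result only means the file deserves a manual hex walk.

```python
import struct

def looks_packed(data, min_bytes=64 * 1024):
    """Heuristic only: True for a single-hunk load file whose hunk is large."""
    if len(data) < 24:
        return False
    magic, reslist, num_hunks, first, last = struct.unpack_from(">5I", data, 0)
    if magic != 0x3F3 or reslist != 0:
        return False                      # not a plain load-file header
    raw_size, = struct.unpack_from(">I", data, 20)
    size_bytes = (raw_size & 0x3FFFFFFF) * 4
    return num_hunks == 1 and size_bytes >= min_bytes
```

Lowering `min_bytes` trades fewer misses for more false alarms; either way, confirm by inspecting the start of the hunk for a decruncher loop.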

Extract Strings from a HUNK Binary Without Loading

import re, struct, sys

data = open(sys.argv[1], 'rb').read()
# Naive scan on longword boundaries; tag values inside code/data give false hits
for i in range(0, len(data) - 7, 4):
    tag = struct.unpack_from('>I', data, i)[0]
    if tag in (0x3E9, 0x3EA):  # CODE or DATA
        size = struct.unpack_from('>I', data, i+4)[0] & 0x3FFFFFFF
        segment = data[i+8 : i+8+size*4]
        # Extract printable ASCII runs of 4+ characters; print their file offsets
        for m in re.finditer(rb'[\x20-\x7E]{4,}', segment):
            print(f'{i+8+m.start():08X}: {m.group().decode("ascii")}')

Cross-Platform Comparison

| Amiga Concept | Win32 PE Equivalent | Linux ELF Equivalent | Notes |
|---|---|---|---|
| HUNK_HEADER | PE MZ + PE\0\0 signature | ELF \x7FELF magic | Same: file type identifier at offset 0 |
| Hunk sizes in longs | PE section SizeOfRawData | ELF p_filesz | Amiga uses 32-bit longword units; PE/ELF use bytes |
| HUNK_RELOC32 | PE .reloc section | ELF .rela.dyn | Same purpose: load-time address patching |
| HUNK_SYMBOL | PDB debug symbols (external) | ELF .symtab (embedded) | Amiga debug symbols in-line; PE keeps them separate |
| HUNK_EXT import/export | PE Import/Export Directory | ELF .dynsym | Same concept: cross-module symbol resolution |

FAQ

Why are hunk sizes in longs, not bytes?

The Amiga's 68000 CPU is byte-addressable, but it requires word and longword accesses to fall on even addresses. Storing every size as a longword count keeps each record in the stream longword-aligned and simplifies the loader. Multiply by 4 to get byte sizes.

What's the difference between HUNK_UNIT and HUNK_HEADER?

HUNK_UNIT ($3E7) marks an object file (.o), intended for linking. HUNK_HEADER ($3F3) marks a linked executable. Object files contain HUNK_EXT symbols for unresolved references; executables have all references resolved.
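A quick way to tell the two apart is to read the first longword. A minimal sketch, with `hunk_file_kind` as a hypothetical helper name; it only looks at the two tags discussed here and reports anything else as unknown.

```python
import struct

def hunk_file_kind(data):
    """Classify a file by its first big-endian longword."""
    if len(data) < 4:
        return "too short"
    tag, = struct.unpack_from(">I", data, 0)
    kinds = {0x3E7: "object file (HUNK_UNIT)",
             0x3F3: "executable (HUNK_HEADER)"}
    return kinds.get(tag, "unknown")

print(hunk_file_kind(bytes.fromhex("000003f300000000")))
```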


References

  • NDK39: dos/doshunks.h
  • hunk_format.md — hunk type code reference
  • hunk_relocation.md — HUNK_RELOC32 mechanics
  • vlink documentation (HUNK appendix): http://sun.hasenbraten.de/vlink/