[← Home](../../README.md) · [Reverse Engineering](../README.md)

# String Cross-Reference Analysis

## Overview

A binary is a sea of bytes. Most of it is unintelligible machine code. But floating in that sea are islands of ASCII: library names, error messages, format strings, screen titles. Each string is a **label on a code path**, and it is the first thing a reverse engineer should look for, because strings are often the only human-readable content in the binary.

String cross-reference analysis is the fastest entry point into an unknown Amiga binary. Find the `.library` strings → find `OpenLibrary` calls → identify every OS API the program uses. Find error messages → find the error-handling code paths. Find format strings → find printf/logging sites → understand program flow. This article covers the complete string-driven RE methodology.

```mermaid
graph TB
    subgraph "String Types"
        LIB[".library strings<br/>→ OpenLibrary calls"]
        ERR["Error messages<br/>→ failure code paths"]
        FMT["Format strings<br/>→ printf/logging"]
        TITLE["Screen/window titles<br/>→ product identity"]
        PATH["File path strings<br/>→ file I/O targets"]
    end
    subgraph "What They Reveal"
        API["API usage map"]
        FLOW["Program flow"]
        ID["Product name/version"]
        FILES["File access patterns"]
    end
    LIB --> API
    ERR --> FLOW
    FMT --> FLOW
    TITLE --> ID
    PATH --> FILES
```

---

## Finding Library Name Strings

Every `OpenLibrary` call is preceded by a string reference. Search for `".library"`:

```bash
# Host: grep for library name strings in binary
strings mybinary | grep -i library
# → "dos.library", "graphics.library", "intuition.library", ...
```
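Where the `strings` utility is not available on the host, the same search can be sketched in a few lines of Python. This is a minimal sketch rather than a replacement for IDA's string detection; the helper names `ascii_strings` and `library_names` and the minimum run length of 4 are assumptions chosen for illustration.

```python
import re

def ascii_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

def library_names(data: bytes):
    """Return (offset, text) pairs for strings that end in '.library'."""
    return [(off, s) for off, s in ascii_strings(data) if s.endswith(".library")]

# Usage: read the file in binary mode and list the library names it contains.
# with open("mybinary", "rb") as fh:
#     for off, name in library_names(fh.read()):
#         print(f"{off:#08x}  {name}")
```

The file offsets it reports are offsets into the raw HUNK file, not load addresses, so they help locate a string in a hex editor but do not map directly onto IDA addresses.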

In IDA:
1. `View → Open Subviews → Strings` (Shift+F12)
2. Search for `.library`
3. Press `X` on any result to see all cross-references
4. Each xref leads to a `LEA str(PC), A1` or `MOVE.L #str, A1` before a `JSR -552(A6)` (OpenLibrary)

---

## Tracing OpenLibrary Calls to Their Targets

```asm
; Pattern to find:
LEA     (_str_dos).L, A1      ; "dos.library"
MOVEQ   #36, D0               ; min version
MOVEA.L 4.W, A6               ; A6 = ExecBase (pointer stored at address 4)
JSR     (-552,A6)             ; OpenLibrary → D0 = DOSBase (0 on failure)
MOVE.L  D0, (_DOSBase).L      ; store for later use
```

Xref `_str_dos` → find this block → identify the stored library base variable → label it `_DOSBase`.
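The call itself can also be located without a disassembler. `JSR (d16,A6)` encodes as the word `$4EAE` followed by the signed 16-bit displacement, so `JSR (-552,A6)` is the byte sequence `4E AE FD D8`. The sketch below scans raw code bytes for that pattern. The function name `find_lvo_calls` is hypothetical, and a raw scan cannot tell whether A6 actually holds ExecBase at that point or whether the bytes are data, so treat each hit as a candidate to confirm in IDA.

```python
import struct

JSR_D16_A6 = 0x4EAE        # opcode word for JSR (d16,A6)
LVO_OPENLIBRARY = -552     # OpenLibrary offset, as used in the pattern above

def find_lvo_calls(code: bytes, lvo: int = LVO_OPENLIBRARY):
    """Return the offsets of every JSR (lvo,A6) found in a block of 68k code bytes."""
    hits = []
    # 68k instructions are word-aligned, so step two bytes at a time.
    for off in range(0, len(code) - 3, 2):
        opcode, disp = struct.unpack_from(">Hh", code, off)  # big-endian word + signed word
        if opcode == JSR_D16_A6 and disp == lvo:
            hits.append(off)
    return hits
```

Passing a different `lvo` value (for example `-48` for `Write`) reuses the same scan for other library calls, with the same caveat that the library base in A6 has to be confirmed separately.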

---

## Using HUNK_SYMBOL Names as Seed Labels

If `HUNK_SYMBOL` is present (debug build), IDA auto-applies names. These seed labels help bootstrap analysis:

1. `View → Open Subviews → Names` → look for any `_` prefixed symbols
2. Named functions often call unnamed helpers nearby — work outward
3. String xrefs from named functions propagate names further
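To see which seed labels a binary carries before loading it into IDA, the symbol block can be decoded by hand. The sketch below assumes the standard `HUNK_SYMBOL` layout: after the `$3F0` hunk ID, each entry is a longword giving the name length in longwords, the zero-padded name, and a longword value (the symbol's offset in the hunk), with a zero longword ending the block. The function name `parse_hunk_symbol` is hypothetical, and the caller is assumed to have already located the block body.

```python
import struct

HUNK_SYMBOL = 0x3F0  # hunk ID that precedes the block body parsed below

def parse_hunk_symbol(block: bytes):
    """Parse the body of one HUNK_SYMBOL block (the bytes after the hunk ID)."""
    symbols, pos = [], 0
    while True:
        (nlongs,) = struct.unpack_from(">I", block, pos)
        pos += 4
        if nlongs == 0:                      # a zero longword terminates the block
            break
        raw_name = block[pos:pos + nlongs * 4]
        pos += nlongs * 4
        (value,) = struct.unpack_from(">I", block, pos)
        pos += 4
        # Names are padded with zero bytes up to a longword boundary.
        symbols.append((raw_name.rstrip(b"\x00").decode("latin-1"), value))
    return symbols
```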

---

## Error Message Strings

Error/diagnostic strings reveal program flow:

```asm
; Common pattern:
MOVEA.L _DOSBase, A6
JSR     (-60,A6)               ; Output() → D0 = stdout file handle
MOVE.L  D0, D1                 ; D1 = file handle
LEA     _err_nolib(PC), A2     ; "Can't open intuition.library"
MOVE.L  A2, D2                 ; D2 = buffer
MOVEQ   #_err_nolib_end - _err_nolib, D3   ; D3 = length
JSR     (-48,A6)               ; Write(stdout, msg, len)
```

The error string tells you exactly what this code path handles.

---

## Format String Xref Analysis (printf)

`Printf`-style calls typically end up in `dos.library` `VPrintf`, which takes the format string in D1 and a pointer to the argument array in D2:

```asm
MOVEA.L  _DOSBase, A6
LEA      _fmt_str(PC), A0      ; "Error: %ld\n"
MOVE.L   A0, D1
MOVE.L   A1, D2                ; varargs array
JSR      (-954,A6)             ; VPrintf()
```

Format strings like `"Error: %ld\n"` or `"Processing: %s"` reveal parameter types and function purpose.
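Reading the argument list off a format string can be automated. The sketch below assumes `RawDoFmt`-style conventions, where a conversion without the `l` modifier consumes a 16-bit word and one with `l` consumes a 32-bit longword. It handles only a subset of conversions (`d`, `u`, `x`, `X`, `s`), and the function name `describe_args` is hypothetical.

```python
import re

# %[-][0][width][.limit][l]<conversion>; "%%" is a literal percent sign.
SPEC = re.compile(r"%-?0?\d*(?:\.\d+)?(l?)([duxXs%])")

def describe_args(fmt: str):
    """Guess the arguments a format string expects, in order."""
    args = []
    for is_long, conv in SPEC.findall(fmt):
        if conv == "%":
            continue                       # "%%" consumes no argument
        if conv == "s":
            args.append("pointer to string")
        else:
            args.append("32-bit integer" if is_long else "16-bit integer")
    return args
```

Comparing this list with the values the caller stores into the argument array is a quick way to name and type the variables that feed a logging call.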

---

## Workbench Title Strings

```asm
; Typical NewScreen/OpenScreen call sequence:
LEA     _screen_title(PC), A0  ; "MyApp v1.0"
MOVE.L  A0, (NewScreen+ns_DefaultTitle)
```

Screen/window title strings appear in `intuition.library` `OpenScreen` / `OpenWindow` calls and give the product name.

---

## Automated String Map

Build a complete string inventory:

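```python
# IDAPython (run inside IDA): list every string together with the code that references it
import idautils
import idc

for s in idautils.Strings():
    raw = idc.get_strlit_contents(s.ea, s.length, s.strtype)  # bytes, or None on failure
    if raw is None:
        continue
    text = raw.decode("latin-1")  # Amiga text is ISO 8859-1
    for ref in idautils.XrefsTo(s.ea):
        func = idc.get_func_name(ref.frm)  # empty when the xref is outside any function
        print(f"{s.ea:#x} [{text!r:40s}] ← {func or 'unknown'} @ {ref.frm:#x}")
```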

---

## Decision Guide — String-Driven Entry Points

| String Type | What to Do First | What It Tells You |
|---|---|---|
| `".library"` | Xref → find OpenLibrary | Every OS API the program uses |
| `"Error:"` / `"Can't"` / `"Failed"` | Xref → error handler | Failure code paths, rare branches |
| `"%d"` / `"%s"` / `"%ld"` | Xref → VPrintf/printf | Logging sites, parameter types |
| File paths (`"SYS:"`, `"LIBS:"`, `"PROGDIR:"`) | Xref → Open/Lock/LoadSeg | File I/O targets |
| Screen/window titles | Xref → OpenScreen/OpenWindow | Application identity, version |
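The table can be turned into a rough triage pass over a string dump. This is a sketch under loose assumptions: the category names and regular expressions below are illustrative guesses rather than a fixed taxonomy, rules are checked in order so an error message that also contains a format specifier is filed under `error`, and titles cannot be recognised by pattern, so they fall through to `other`.

```python
import re

# (category, pattern) pairs, checked in order; the patterns are illustrative guesses.
RULES = [
    ("library", re.compile(r"\.library$")),
    ("error",   re.compile(r"error|can't|cannot|failed", re.IGNORECASE)),
    ("format",  re.compile(r"%-?\d*l?[dsx]")),
    ("path",    re.compile(r"^[A-Za-z0-9_]+:(\S|$)")),   # volume or assign prefix
]

def classify(text: str) -> str:
    """Return the first matching category for a string, or 'other'."""
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return "other"
```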

---

## Named Antipatterns

### 1. "The Dead String"

**What it looks like** — finding an error string with no cross-references and assuming the code path is unreachable:

```asm
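_err_fatal: DC.B "FATAL: disk error",0   ; IDA lists no xrefs to this label...
LEA     _msg_base(PC), A0                ; ...because it is reached via base + offset
ADDA.W  _msg_offsets(PC,D0.W), A0        ; D0 = error number * 2, computed at runtime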
```

**Why it fails:** Some programs build string addresses dynamically (e.g., through a string table indexed at runtime). IDA won't detect these as xrefs. The string IS used — just not through a static reference.

**Correct:** For strings without xrefs, check if they're part of a larger string table (consecutive string data). If so, a function loading a base address + computed offset may reference them dynamically.
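Spotting candidate string tables can be partly automated by looking for several null-terminated strings stored back to back. The sketch below is a heuristic under stated assumptions: the function name `string_runs` and the thresholds are hypothetical, only printable ASCII is considered, and alignment padding between strings (a second null byte) will split a run.

```python
def string_runs(data: bytes, min_strings: int = 3, min_len: int = 2):
    """Return (offset, [strings]) for runs of back-to-back null-terminated ASCII strings."""
    runs = []
    pos = 0
    while pos < len(data):
        start, items = pos, []
        while True:
            end = data.find(b"\x00", pos)
            chunk = data[pos:end] if end != -1 else b""
            # Stop the run at the first chunk that is too short or not printable.
            if len(chunk) < min_len or not all(0x20 <= b <= 0x7E for b in chunk):
                break
            items.append(chunk.decode("ascii"))
            pos = end + 1
        if len(items) >= min_strings:
            runs.append((start, items))
        if not items:
            pos += 1      # nothing matched here; try the next byte
    return runs
```

A run whose members mostly lack xrefs is a good place to look for a nearby base-plus-offset load like the one in the antipattern above.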

### 2. "The Null Bait"

**What it looks like:** IDA showing a string with the wrong boundaries because the data is length-prefixed rather than a plain C string:

```asm
; Length-prefixed (Pascal/BCPL-style) string: a count byte precedes the text
DC.B    $0D, "Hello, World!", 0   ; length byte = 13, then data, then null
; IDA sees only "Hello, World!" and misses the length byte
```

**Why it fails:** C compilers such as SAS/C emit null-terminated string literals, but some Amiga data is length-prefixed, for example the BCPL-style strings (BSTRs) that AmigaDOS uses internally. IDA's C-style string detection starts at the first printable character, so the length byte is either left out or, when its value happens to be printable, absorbed into the string.

**Correct:** Check the byte before the string. If it equals the string length, the data is probably a length-prefixed string that starts at that byte, not after it.
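The byte-before check is simple enough to script. The sketch below assumes the string is followed by a null byte, as in the example above; a length-prefixed string without a terminator would need its end found another way. The function name `is_length_prefixed` is hypothetical, and because a matching byte can occur by coincidence, a positive result is a hint rather than proof.

```python
def is_length_prefixed(data: bytes, str_off: int) -> bool:
    """True if the byte before str_off equals the length of the text starting at str_off."""
    if str_off == 0:
        return False                      # nothing precedes the string
    end = data.find(b"\x00", str_off)
    if end == -1:
        end = len(data)                   # unterminated: measure to the end of the data
    return data[str_off - 1] == end - str_off
```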

---

## Use-Case Cookbook

### Map Every OS API Call from Strings Alone

```python
# IDAPython (run inside IDA): .library strings → OpenLibrary call → library base variable
import idautils, idc

LIBRARIES = {}
for s in idautils.Strings():
    text = str(s)
    if not text.endswith('.library'):
        continue
    for xref in idautils.XrefsTo(s.ea):
        # Walk forward from the xref to find JSR (-552,A6), i.e. OpenLibrary.
        # If IDA shows the offset symbolically, match on that name instead.
        ea = xref.frm
        for _ in range(20):
            mnem = idc.print_insn_mnem(ea).lower()
            if mnem == 'jsr' and '-552' in idc.print_operand(ea, 0):
                # The instruction after the call usually stores D0 (the result)
                next_ea = idc.next_head(ea)
                if idc.print_insn_mnem(next_ea).lower().startswith('move'):
                    dest = idc.print_operand(next_ea, 1)  # operand 1 is the destination
                    LIBRARIES[text] = dest
                    print(f"{text} → stored at {dest}")
                break  # stop at the first OpenLibrary call after this xref
            ea = idc.next_head(ea)
```

### Find All Version Strings

Version strings often follow the pattern `"$VER: name version (date)"`:

```bash
strings mybinary | grep -i '\$VER:'
# Output: $VER: MyApp 1.23 (12.04.1993)
```
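When the fields are needed rather than just the raw line, the tag can be parsed directly from the file bytes. This is a minimal sketch assuming the common `$VER: name version.revision (dd.mm.yyyy)` layout shown above; the function name `find_versions` is hypothetical, and names containing spaces or strings with a non-standard date are not handled.

```python
import re

VER_RE = re.compile(
    rb"\$VER:\s*(\S+)\s+(\d+)\.(\d+)"                    # name, version, revision
    rb"(?:\s+\((\d{1,2})\.(\d{1,2})\.(\d{2,4})\))?"      # optional (day.month.year)
)

def find_versions(data: bytes):
    """Return (name, version, revision, date) tuples; date is (d, m, y) or None."""
    found = []
    for m in VER_RE.finditer(data):
        date = None
        if m.group(4):
            date = tuple(int(m.group(i)) for i in (4, 5, 6))
        found.append((m.group(1).decode("latin-1"), int(m.group(2)), int(m.group(3)), date))
    return found
```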

---

## Cross-Platform Comparison

| Amiga Concept | Win32 Equivalent | Linux Equivalent | Notes |
|---|---|---|---|
| `.library` strings → OpenLibrary | `.dll` strings → LoadLibrary | `.so` strings → dlopen | Same pattern: string identifies dynamically loaded module |
| String xref analysis | `strings.exe` + IDA cross-reference | `strings` + radare2/Ghidra xref | Universal RE technique: strings are the first foothold |
| Length-prefixed (Pascal/BCPL-style) strings | Delphi/BCB short strings | N/A (C-dominated ecosystem) | Length-prefixed strings are rare in C-compiled code |
| `$VER:` version string convention | `VS_VERSION_INFO` resource | `.comment` ELF section | Amiga's convention is informal but widely followed |

---

## FAQ

### Why do some strings have no xrefs in IDA?

Possible causes: (1) the string is referenced via a computed address (base+index), (2) the string is in a data table accessed by offset, (3) the string is dead data belonging to a linked-in library routine that is never called, (4) IDA's string detection split a long string incorrectly.

### How do I handle non-ASCII strings (German umlauts, etc.)?

Amiga uses ISO 8859-1 (Latin-1) encoding. Characters above `$7F` are valid Latin-1 but may display incorrectly in IDA's default ASCII view. Set IDA's string encoding to Latin-1 or use `idc.get_strlit_contents(ea, -1, STRTYPE_C_16)` for wide strings.
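When post-processing string bytes outside IDA, decoding them as Latin-1 is enough to recover the accented characters. The helper name `decode_amiga` below is hypothetical; the point of the sketch is that every byte value maps to exactly one Latin-1 character, so the decode cannot fail, whereas a strict ASCII decode would reject anything above `$7F`.

```python
def decode_amiga(raw: bytes) -> str:
    """Decode Amiga text bytes as ISO 8859-1 (Latin-1)."""
    return raw.decode("latin-1")

# Bytes $FC and $DF are the Latin-1 codes for u-umlaut and sharp s.
greeting = decode_amiga(b"Gr\xfc\xdfe")
```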

---


## References

- IDA Pro: Strings subview (Shift+F12), Xrefs (X key)
- `static/api_call_identification.md` — resolving library base from string xrefs
- NDK39: `dos/dos.h` — `VPrintf`, `FPrintf`, error code strings