
VBCC — Reverse Engineering Field Manual

Overview

VBCC (Volker Barthelmann's C Compiler) is a portable, retargetable ISO C89 compiler that typically produces very compact Amiga binaries. Its key RE characteristics are: no frame pointer (SP-relative access only), per-function register saves (only what's actually used), PC-relative string addressing, and a distinctive __reg() calling convention for AmigaOS library calls. VBCC generates clean, tight code that can look deceptively like hand-optimized assembly.

Key constraints:

  • No LINK instruction — VBCC never uses LINK A5 or LINK A6. Locals are accessed via $offset(SP). Function boundaries are defined by MOVEM.L ... -(SP) at entry and RTS at exit.
  • Minimal register saves — Unlike SAS/C (9 registers always) or GCC (per-function but often substantial), VBCC saves only the exact registers used. A leaf function with no locals has no prologue at all.
  • Tail-call optimization — VBCC uses BRA.S to common epilogue blocks and BRA to tail-call other functions more aggressively than any other Amiga compiler.
  • __MERGED hunks: When the small data model is active, the DATA and BSS sections are merged into a single hunk named __MERGED.
  • Hunk names: CODE, DATA, BSS (+ optional __MERGED for small-data)
graph TB
    subgraph "Source (.c)"
        SRC["C source (C89)"]
    end
    subgraph "VBCC Compiler Pipeline"
        VC["vc (driver)"]
        VBCCM68K["vbccm68k (code generator)"]
        VASM["vasm (assembler)"]
        VLINK["vlink (linker)"]
    end
    subgraph "Binary Output"
        HUNK["Amiga HUNK executable"]
        CODE["CODE hunk"]
        DATA["DATA hunk"]
        MERGED["__MERGED (optional, small-data)"]
    end

    SRC --> VC
    VC --> VBCCM68K --> VASM --> VLINK
    VLINK --> HUNK
    HUNK --> CODE & DATA
    HUNK --> MERGED

Binary Identification — The VBCC Signature

Function Prologue — Nothing or Minimal

; VBCC leaf function (no locals, no calls):
_simple_func:
    ; NO prologue at all
    ; ... function body ...
    RTS

; VBCC function with locals:
_moderate_func:
    MOVEM.L D2-D3/A2, -(SP)       ; saves ONLY the 3 registers used
    ; ... function body ...
    MOVEM.L (SP)+, D2-D3/A2
    RTS

; VBCC large function:
_large_func:
    MOVEM.L D2-D5/A2-A3, -(SP)    ; per-function exact save
    LEA     -$80(SP), SP           ; allocate stack frame
    ; ... function body ...
    LEA     $80(SP), SP
    MOVEM.L (SP)+, D2-D5/A2-A3
    RTS

Key differentiator from GCC: Both VBCC and GCC use per-function register saves, but VBCC's code is consistently tighter. VBCC uses BRA.S label to share common epilogue/cleanup code, where GCC duplicates it. VBCC uses MOVEQ and ADDQ aggressively for small constants.
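
To see exactly which registers a prologue saves, the register mask that follows the MOVEM.L opcode can be decoded by hand. The sketch below assumes the standard 68000 encoding (opcode word $48E7 for MOVEM.L reglist,-(SP), with the mask bit order reversed in predecrement mode); the helper names are hypothetical and not part of any existing tool.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Opcode word for MOVEM.L <reglist>,-(SP) on the 68000. */
#define MOVEM_L_PUSH_SP 0x48E7u

/* Count the registers named in a MOVEM mask (one bit per register). */
static int movem_reg_count(uint16_t mask)
{
    int n = 0;
    while (mask) {
        n += mask & 1u;
        mask >>= 1;
    }
    return n;
}

/* Write names such as "D2 D3 A2" for a predecrement-mode mask.
   In -(An) mode the mask is reversed: bit 15 = D0 ... bit 8 = D7,
   bit 7 = A0 ... bit 0 = A7. Returns the number of characters written. */
static size_t movem_push_names(uint16_t mask, char *out, size_t outsz)
{
    size_t used = 0;
    int i;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    for (i = 0; i < 16; i++) {
        if (mask & (0x8000u >> i)) {
            int n = snprintf(out + used, outsz - used, "%s%c%d",
                             used ? " " : "", i < 8 ? 'D' : 'A', i & 7);
            if (n < 0 || (size_t)n >= outsz - used) {
                out[used] = '\0';   /* drop a partially written name */
                break;
            }
            used += (size_t)n;
        }
    }
    return used;
}
```

For the _moderate_func prologue above, the words $48E7 $3020 decode to three registers (12 bytes of save area). A mask that covers most of D2-D7/A2-A6 in every function would point away from VBCC's per-function saves.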

String Addressing

Like GCC, VBCC uses PC-relative string addressing:

    LEA     .str_hello(PC), A0
    JSR     _Printf

.str_hello: DC.B "Hello", $0A, 00
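
When resolving such references by hand, the string address follows from the LEA encoding. This is a minimal sketch assuming the standard 68000 rule that a d16(PC) displacement is relative to the address of the extension word (instruction address + 2); lea_pc_target is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Returns 1 and stores the referenced address if opword/ext encode
   LEA d16(PC),An; returns 0 for any other instruction. */
static int lea_pc_target(uint32_t insn_addr, uint16_t opword, uint16_t ext,
                         uint32_t *target)
{
    int32_t disp;

    assert(target != NULL);
    /* LEA <ea>,An is 0100 rrr1 11ee eeee; EA mode 111, reg 010 is d16(PC).
       Masking with $F1FF ignores the destination register field. */
    if ((opword & 0xF1FFu) != 0x41FAu)
        return 0;
    /* Sign-extend the 16-bit displacement. */
    disp = (ext & 0x8000u) ? (int32_t)ext - 0x10000 : (int32_t)ext;
    /* The displacement is relative to the extension word's address. */
    *target = insn_addr + 2u + (uint32_t)disp;
    return 1;
}
```

For example, the words $41FA $0010 at address $100 resolve to $112, which is where the string bytes would be expected.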

The __reg() Calling Convention — Unique VBCC Fingerprint

VBCC's __reg() keyword places C variables in named CPU registers without inline assembly:

/* VBCC source: */
BPTR __reg("d0") MyOpen(__reg("d1") CONST_STRPTR name,
                        __reg("d2") LONG accessMode);

; Generated code for Open("foo", MODE_OLDFILE):
    MOVEA.L _DOSBase, A6
    LEA     .str_foo(PC), A0
    MOVE.L  A0, D1                 ; name → D1
    MOVE.L  #1005, D2              ; MODE_OLDFILE (1005) → D2; too large for MOVEQ
    JSR     -$1E(A6)               ; Open() LVO

For OS library calls the register-to-argument mapping is fixed by the library ABI, so the call site alone looks much the same under any compiler that calls the library directly. The stronger VBCC hint lies in the program's own functions: __reg() assignments are visible only through the register usage pattern, as functions that take args in specific registers (D1, D2, D3, etc.) without stack access.
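
On the source side, a program's own functions can use the same keyword. The sketch below is illustrative only: REG and clamp_add are hypothetical names, and it assumes the __VBCC__ predefined macro so that the same file still builds with other compilers, where arguments fall back to the normal calling convention.

```c
#include <assert.h>

/* Expand to VBCC's __reg("...") only when compiling with VBCC. */
#ifdef __VBCC__
#define REG(r) __reg(r)
#else
#define REG(r)
#endif

/* Hypothetical helper: under VBCC the three arguments arrive in
   D0, D1 and D2, so the compiled body contains no SP-relative loads. */
static long clamp_add(REG("d0") long value,
                      REG("d1") long delta,
                      REG("d2") long limit)
{
    long sum;

    assert(limit >= 0);
    sum = value + delta;
    return sum > limit ? limit : sum;
}
```

In a disassembly, a function like this starts using D0-D2 immediately, which is the "arbitrary register" look described in this section.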


Library Call Patterns

VBCC library calls are compact and direct:

; VBCC library call — minimal code:
    MOVEA.L (_DOSBase).L, A6       ; load library base (absolute with relocation)
    MOVE.L  fh(SP), D1             ; arg from stack
    MOVE.L  buf(SP), D2
    MOVE.L  len(SP), D3
    JSR     -$2A(A6)               ; Read()

; Return value check:
    TST.L   D0
    BMI.S   .error

VBCC differs from SAS/C here: SAS/C would load args through A5-relative offsets ($08(A5)). VBCC uses SP-relative offsets. Since SP may change within the function (pushing args), VBCC carefully maintains SP offsets.
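
Mapping a JSR displacement back to a function name is mechanical once the library base in A6 is known. This is a rough sketch assuming A6 holds DOSBase and covering only the first four dos.library vectors (Open -30, Close -36, Read -42, Write -48); dos_call_name and the table are hypothetical names, and a real tool would load the full offsets from the library's FD or SFD files.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Opcode word for JSR d16(A6). */
#define JSR_D16_A6 0x4EAEu

struct lvo_name {
    int lvo;            /* negative library vector offset */
    const char *name;
};

/* First few dos.library vectors; deliberately incomplete. */
static const struct lvo_name dos_lvos[] = {
    { -30, "Open" }, { -36, "Close" }, { -42, "Read" }, { -48, "Write" }
};

/* Decode JSR d16(A6). Stores the signed offset in *out_lvo and returns
   the function name, or NULL if the words are not a known dos call. */
static const char *dos_call_name(uint16_t opword, uint16_t ext, int *out_lvo)
{
    size_t i;

    assert(out_lvo != NULL);
    if (opword != JSR_D16_A6)
        return NULL;
    /* Sign-extend the 16-bit displacement to get the LVO. */
    *out_lvo = (ext & 0x8000u) ? (int)ext - 0x10000 : (int)ext;
    for (i = 0; i < sizeof dos_lvos / sizeof dos_lvos[0]; i++) {
        if (dos_lvos[i].lvo == *out_lvo)
            return dos_lvos[i].name;
    }
    return NULL;
}
```

The words $4EAE $FFD6 decode to offset -42, matching the Read() call in the listing above.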

#pragma amicall — VBCC Library Call Pragmas

#pragma amicall(DOSBase, 0x1E, Open(d1, d2))
// VBCC pragma format is simpler than SAS/C:
// - Library base name (identifier, not a string)
// - LVO in hex
// - Function name with argument register list

In the binary, these pragmas produce the same JSR -$XXX(A6) patterns as any other compiler — the pragma just controls argument register assignment.


Optimization Patterns

VBCC prioritizes code density over raw speed. Its signatures:

| Pattern | VBCC Style | SAS/C Equivalent |
|---|---|---|
| Shared epilogue | BRA.S .epilogue from multiple exit points | Duplicated epilogue at each return |
| Tail calls | BRA _other_func (discard own frame first) | JSR _other_func / RTS |
| Small constant loading | MOVEQ #N, Dn whenever possible | MOVE.L #N, Dn for some small values |
| Stack frame | LEA -$N(SP), SP (frame size must fit the 16-bit displacement) | LINK A5, #-N |
| Loop termination | DBRA Dn, loop (when counter fits in 16 bits) | SUBQ.L #1, Dn / BNE loop |
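
The "whenever possible" in the MOVEQ row has a precise meaning: MOVEQ carries an 8-bit sign-extended immediate, so only values from -128 to 127 qualify. A small sketch of that check and of the encoding (fits_moveq and encode_moveq are hypothetical helper names):

```c
#include <assert.h>
#include <stdint.h>

/* MOVEQ holds an 8-bit immediate that is sign-extended to 32 bits. */
static int fits_moveq(int32_t v)
{
    return v >= -128 && v <= 127;
}

/* Encode MOVEQ #v,Dn as 0111 rrr0 dddd dddd. */
static uint16_t encode_moveq(int32_t v, unsigned dn)
{
    assert(fits_moveq(v) && dn < 8);
    return (uint16_t)(0x7000u | (dn << 9) | ((uint32_t)v & 0xFFu));
}
```

A constant such as 1005 fails the check and has to be loaded with a 6-byte MOVE.L #imm, Dn instead, so a 2-byte $7xxx word in a listing always implies a small constant.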

Cross-Module Optimization

VBCC supports cross-module optimization: when several source files are compiled together, the compiler can optimize and inline across translation units before vlink lays out the final hunks. In the binary, this means function layout may NOT match source file order, and small functions may have been inlined into callers that live in other modules.


Same C Function — VBCC Output

; CountWords() — VBCC, -O -speed:
; C prototype: ULONG CountWords(CONST_STRPTR str)

_CountWords:
    MOVEM.L D2-D3, -(SP)          ; only D2, D3 needed
    
    MOVEQ   #0, D2                 ; D2 = count
    MOVEQ   #0, D3                 ; D3 = in_word
    
    MOVEA.L $0C(SP), A0            ; A0 = str (arg at SP + 12)
    
    BRA.S   .loop_test

.loop_body:
    CMPI.B  #' ', (A0)             ; *str == ' '?
    BEQ.S   .not_word
    CMPI.B  #'\t', (A0)
    BEQ.S   .not_word
    CMPI.B  #'\n', (A0)
    BEQ.S   .not_word
    
    TST.B   D3
    BNE.S   .next_char
    
    ADDQ.L  #1, D2                 ; count++
    MOVEQ   #1, D3                 ; in_word = TRUE
    BRA.S   .next_char

.not_word:
    MOVEQ   #0, D3                 ; in_word = FALSE

.next_char:
    ADDQ.L  #1, A0                 ; str++

.loop_test:
    TST.B   (A0)
    BNE.S   .loop_body

    MOVE.L  D2, D0                 ; return count
    MOVEM.L (SP)+, D2-D3
    RTS

VBCC-specific observations:

  1. MOVEM.L D2-D3, -(SP): only 2 registers saved. Minimal.
  2. BRA.S .loop_test: unconditional jump from the function entry to the loop condition, which sits at the bottom of the loop.
  3. BRA.S .next_char: shared increment code reached from two paths.
  4. Nearly identical to GCC in this function, because it is simple enough that optimization differences barely show. For more complex functions (with multiple return paths, struct access, switch statements), VBCC's shared-epilogue and tail-call patterns emerge.

Cross-Compiler Comparison (CountWords, approximate bytes of code):

| Compiler | Approx. size | Notes |
|---|---|---|
| SAS/C -O2 | ~52 bytes | LINK A5 + 9-reg save + epilogue overhead |
| GCC -O2 | ~48 bytes | no LINK, minimal save, CMPI.B |
| VBCC -speed | ~46 bytes | no LINK, minimal save, aggressive BRA sharing |
| DICE C | ~48 bytes | similar to VBCC |
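
Size figures like these can be sanity-checked by summing the encoded length of each instruction in a listing. The sketch below does this for the CountWords listing as printed above, assuming standard 68000 encodings (2 bytes per opcode word, plus 2 for a MOVEM mask, a 16-bit displacement, or a byte immediate); the array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Encoded size in bytes of each instruction in the CountWords listing. */
static const unsigned char countwords_sizes[] = {
    4, /* MOVEM.L D2-D3,-(SP)   opcode + mask         */
    2, /* MOVEQ   #0,D2                               */
    2, /* MOVEQ   #0,D3                               */
    4, /* MOVEA.L $0C(SP),A0    opcode + displacement */
    2, /* BRA.S   .loop_test                          */
    4, /* CMPI.B  #' ',(A0)     opcode + immediate    */
    2, /* BEQ.S   .not_word                           */
    4, /* CMPI.B  #'\t',(A0)                          */
    2, /* BEQ.S   .not_word                           */
    4, /* CMPI.B  #'\n',(A0)                          */
    2, /* BEQ.S   .not_word                           */
    2, /* TST.B   D3                                  */
    2, /* BNE.S   .next_char                          */
    2, /* ADDQ.L  #1,D2                               */
    2, /* MOVEQ   #1,D3                               */
    2, /* BRA.S   .next_char                          */
    2, /* MOVEQ   #0,D3                               */
    2, /* ADDQ.L  #1,A0                               */
    2, /* TST.B   (A0)                                */
    2, /* BNE.S   .loop_body                          */
    2, /* MOVE.L  D2,D0                               */
    4, /* MOVEM.L (SP)+,D2-D3                         */
    2  /* RTS                                         */
};

/* Sum a table of instruction sizes. */
static size_t total_bytes(const unsigned char *sizes, size_t n)
{
    size_t sum = 0, i;

    assert(sizes != NULL);
    for (i = 0; i < n; i++)
        sum += sizes[i];
    return sum;
}
```

Counted this way, the listing as printed comes to 58 bytes, so the approximate figures in the comparison are better read as a relative ranking than as exact sizes.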

Named Antipatterns

; VBCC function boundaries are RTS-delimited, not LINK-delimited.
; If your IDA script searches for LINK to find functions, you'll miss ALL VBCC functions.

; VBCC function entry could be any of:
;   1. MOVEM.L ..., -(SP)  (most common)
;   2. LEA -$XX(SP), SP     (large frame)
;   3. First instruction after previous RTS (leaf functions)
;   4. TST.L D0 / BEQ ...   (function that doesn't save any regs)
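
A function finder that does not depend on LINK could start from the entry shapes listed above. This is a deliberately naive sketch: it scans 16-bit words linearly, so extension words or data that happen to equal $4E75 or $48E7 produce false candidates, and an early RTS inside a function does too. The names are hypothetical, and the results should be treated as hints for a real disassembler pass.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

#define OP_RTS           0x4E75u  /* RTS */
#define OP_MOVEM_L_PUSH  0x48E7u  /* MOVEM.L <reglist>,-(SP) */

/* Collect byte offsets that look like function entries: the start of the
   buffer, any word following an RTS, and any MOVEM.L push to the stack.
   Returns the number of candidates stored in out[]. */
static size_t find_entry_candidates(const uint16_t *code, size_t nwords,
                                    size_t *out, size_t max_out)
{
    size_t i, found = 0;

    assert(code != NULL && out != NULL);
    for (i = 0; i < nwords && found < max_out; i++) {
        int after_rts = (i == 0) || (code[i - 1] == OP_RTS);
        int is_push = (code[i] == OP_MOVEM_L_PUSH);

        if (after_rts || is_push)
            out[found++] = i * 2;   /* word index to byte offset */
    }
    return found;
}
```

Candidates found only by the "after RTS" rule are the leaf functions with no prologue, which a LINK-based search would miss entirely.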

"The Register Ghost" — __reg() Without Symbols

Without source-level __reg() declarations, VBCC function arguments appear to use arbitrary register assignments. This can look like a custom ABI. The pattern is actually the VBCC __reg() convention encoded via <proto/*.h> headers during compilation.


Pitfalls & Common Mistakes

1. Confusing VBCC and GCC Output

Both omit frame pointers and use per-function saves. Disambiguate by:

  • Hunk names: VBCC uses CODE/DATA; GCC uses .text/.data (usually)
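  • __MERGED hunk: indicates the small data model. VBCC emits it and GCC typically does not, but SAS/C small-data binaries use the same hunk name, so combine this with the other checks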
  • Function naming: VBCC emits names like _funcname; GCC emits .Lxxx internal labels
  • BRA density: VBCC has more BRA.S instructions (shared epilogues); GCC tends to duplicate code
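
The hunk-name check is easy to automate once the names have been read from the executable. The sketch below only classifies a name string; the enum and function names are hypothetical, and it assumes names are present at all, which stripped binaries may not guarantee.

```c
#include <assert.h>
#include <string.h>

enum hunk_hint {
    HINT_UNKNOWN,      /* no conclusion from this name */
    HINT_GCC_STYLE,    /* .text / .data / .bss style names */
    HINT_AMIGA_STYLE,  /* CODE / DATA / BSS style names */
    HINT_SMALL_DATA    /* __MERGED small-data hunk */
};

/* Classify a single hunk name according to the heuristics above. */
static enum hunk_hint classify_hunk_name(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "__MERGED") == 0)
        return HINT_SMALL_DATA;
    if (name[0] == '.')
        return HINT_GCC_STYLE;
    if (strcmp(name, "CODE") == 0 || strcmp(name, "DATA") == 0 ||
        strcmp(name, "BSS") == 0)
        return HINT_AMIGA_STYLE;
    return HINT_UNKNOWN;
}
```

A single name is weak evidence on its own; the result is most useful when combined with the prologue and BRA-density checks in the same list.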

2. Misreading SP-Relative Offsets

; At function entry (after MOVEM.L D2-D3, -(SP)):
; SP points 8 bytes below entry SP (D2 and D3 pushed)
; Arg1 is at $0C(SP)  (8 bytes regs + 4 bytes return addr)
; But after LEA -$10(SP), SP:
; Arg1 is now at $1C(SP)  (8 regs + 4 ret + 16 locals)
; The offset CHANGES when SP is modified — unlike A5-relative offsets

Track every instruction that modifies SP: each LEA +/-$N(SP), SP, and each push or pop such as MOVE.L ..., -(SP) or PEA, shifts ALL subsequent SP-relative offsets.
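
The arithmetic in the comments above can be written as one formula. This sketch assumes every stack argument occupies 4 bytes and that the caller has pushed nothing else in between; arg_sp_offset is a hypothetical helper name.

```c
#include <assert.h>

/* SP-relative offset of stack argument `index` (0-based) at a given point
   in a function, given what currently lies between SP and the arguments. */
static long arg_sp_offset(int saved_regs, long frame_bytes,
                          long pushed_bytes, int index)
{
    assert(saved_regs >= 0 && frame_bytes >= 0);
    assert(pushed_bytes >= 0 && index >= 0);
    return 4L * saved_regs   /* MOVEM.L save area, 4 bytes per register */
         + frame_bytes       /* locals reserved by LEA -$N(SP),SP       */
         + pushed_bytes      /* outgoing arguments pushed so far        */
         + 4L                /* return address                          */
         + 4L * index;       /* earlier arguments                       */
}
```

With two saved registers this gives $0C for the first argument, and $1C once a 16-byte frame has been allocated, matching the offsets in the comments above.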


Use Cases

Software Known to Be VBCC-Compiled

| Application | Notes |
|---|---|
| ScummVM (some ports) | Large C codebase; VBCC's strict C89 catches portability issues |
| Modern Amiga utilities | Many 2000s+ CLI tools use VBCC for small binary size |
| AROS system components | VBCC is a supported AROS build compiler |
| MUI 5 custom classes | Tight BOOPSI dispatch benefits from VBCC's register allocation |
| AmigaOS 4 system libraries | Hyperion's SDK supports VBCC for OS4 development |

Historical Context

VBCC was created by Volker Barthelmann in the mid-1990s as a lightweight alternative to GCC's growing complexity. While GCC was the "heavy" compiler with C++ support, VBCC targeted developers who wanted a fast, standards-compliant C89 compiler that produced small binaries.

Unlike SAS/C (commercial, dead since 1996) and GCC (open source but complex), VBCC occupies a unique niche: actively maintained, free for personal use, with a clean codebase. Its companion tools, the vlink linker and the vasm assembler, complete a toolchain that has become the de facto standard for modern Amiga development alongside bebbo's GCC fork.


Modern Analogies

| VBCC Concept | Modern Equivalent |
|---|---|
| __reg() | register ... asm("d0") in GCC/Clang (GNU C extension) |
| Per-function register save | Clang's -O2 with aggressive register allocation |
| Cross-module optimization | LTO (Link-Time Optimization) in modern compilers |
| vlink with vasm | LLVM's integrated lld linker with clang |
| Config-driven target system | LLVM's TargetRegistry and target description files |

FPGA / Emulation Impact

  • No LINK/UNLK: VBCC binaries don't use these instructions, reducing test coverage needs for frame pointer ops on FPGA cores.
  • Aggressive LEA for stack frames: LEA -$N(SP), SP must correctly update SP in a single instruction — verify your FPGA core handles LEA with SP destination correctly.
  • Cross-module optimization: No runtime impact; all inlining and merging happens at build time.

FAQ

Q: How do I distinguish VBCC from GCC output?
A: Check hunk names: VBCC uses CODE/DATA, GCC typically uses .text/.data. Check for a __MERGED hunk (a small-data indicator that also appears in SAS/C binaries but is not typical of GCC). Check internal labels: VBCC uses _name format; GCC uses .Lxxx. Check BRA density: VBCC shares epilogues more aggressively.

Q: Does VBCC support C++?
A: No. If you find C++ constructs (vtables, new/delete, name mangling), it's NOT VBCC.

Q: Can VBCC and GCC object files be mixed?
A: No. They use different calling conventions for internal runtime functions. Link the entire project with one compiler. Assembly (vasm) can be mixed with VBCC C code using vlink.


References