amiga-bootcamp/11_libraries/translator.md

2026-04-26 14:46:18 -04:00
[← Home](../README.md) · [Libraries](README.md)
# translator.library — English-to-Phonetic Translation for Speech Synthesis
## Overview
`translator.library` is the front half of the Amiga's built-in text-to-speech pipeline: a single-function library that converts unrestricted English text into **phonetic strings** — the expanded ARPABET phoneme codes used by `narrator.device` to generate human-like speech through the Amiga's audio hardware. Introduced with AmigaOS 1.2 and distributed as a disk-based library in `LIBS:`, it encapsulates over 450 context-sensitive pronunciation rules, an exception dictionary for irregular words (through, though, cough), abbreviation expansion (Dr., Prof., lb.), and automatic content-word accentuation — all in a single call: `Translate()`. The output is a string of space-delimited phoneme codes with stress markers that can be passed directly to `narrator.device` via `CMD_WRITE`, stored for later playback, or analyzed for phonetic research. While hand-coded phonetics always produce higher-quality speech, `Translate()` is the only practical option when the input is arbitrary user text at runtime.
---
## Architecture
### The Amiga Speech Pipeline
```mermaid
flowchart LR
subgraph INPUT["Input Layer"]
ENG["English Text<br/>(ASCII)"]
PHON["Hand-Coded<br/>Phonetic String"]
end
subgraph TRANSLATOR["translator.library"]
TR["Translate()<br/>English → Phonetic"]
RULES["450+ Context Rules<br/>Exception Dictionary<br/>Abbreviation Expansion"]
end
subgraph NARRATOR["narrator.device"]
SYNTH["Speech Synthesizer<br/>Formant Model"]
MOUTH["Mouth Shape<br/>Generator"]
end
subgraph OUTPUT["Output Layer"]
AUDIO["audio.device<br/>DMA Audio Channels"]
MOUTHDATA["mouth_rb<br/>Width/Height"]
end
ENG --> TR
TR --> RULES
RULES --> TR
TR -->|"Phonetic String"| SYNTH
PHON --> SYNTH
SYNTH --> AUDIO
SYNTH --> MOUTH
MOUTH --> MOUTHDATA
style TR fill:#e8f4fd,stroke:#2196f3,color:#333
style SYNTH fill:#fff3e0,stroke:#ff9800,color:#333
```
### Library Base
| Name | Type | Description |
|---|---|---|
| `TranslatorBase` | `struct Library *` | Library base pointer returned by `OpenLibrary()` |
| `ITranslator` | Interface pointer (OS 4.x+) | Interface-based access for AmigaOS 4+ |
`translator.library` is a **disk-based** library: it lives in `LIBS:translator.library`, not in ROM. This means `OpenLibrary()` can fail if the file is missing, and the library can be expunged from memory under low-memory conditions once no program has it open.
### Key Design Decisions
| Decision | Rationale |
|---|---|
| **Single-function API** | Translation is inherently stateless — input text, output phonetics. No session, no configuration |
| **Disk-based, not ROM** | Phonetic dictionary is large (~20+ KB of rules); keeping it out of ROM saves Kickstart space |
| **Negative return codes for overflow** | Allows progressive translation of long texts without pre-allocating a huge buffer |
| **Rule-based, not neural** | 1985 technology couldn't run a neural TTS; the 450 context-sensitive rules were state-of-the-art for the era |
---
## API Reference
### Opening and Closing
```c
/* Classic AmigaOS (1.x–3.x); Translate() sits at LVO -30 */
struct Library *TranslatorBase;
TranslatorBase = OpenLibrary("translator.library", 0);
if (!TranslatorBase) { /* LIBS:translator.library not found */ }
/* ... use Translate() ... */
CloseLibrary(TranslatorBase);
```
```c
/* AmigaOS 4.x: interface-based */
struct Library *TranslatorBase;
struct TranslatorIFace *ITranslator;

TranslatorBase = IExec->OpenLibrary("translator.library", 0);
if (TranslatorBase)
{
    ITranslator = (struct TranslatorIFace *)
        IExec->GetInterface(TranslatorBase, "main", 1, NULL);
    if (ITranslator)
    {
        /* ... use ITranslator->Translate() ... */
        IExec->DropInterface((struct Interface *)ITranslator);
    }
    IExec->CloseLibrary(TranslatorBase);
}
```
### Translate()
```c
/* LVO -30: converts English text to a phonetic string */
LONG Translate(STRPTR input,      /* a0: English input string */
               LONG   inputLen,   /* d0: length of input */
               STRPTR output,     /* a1: output buffer for phonetics */
               LONG   outputSize  /* d1: size of output buffer */);
```
| Parameter | Description |
|---|---|
| `input` | Null-terminated or length-delimited English ASCII string. Case-insensitive; punctuation is preserved where it affects pronunciation |
| `inputLen` | Number of characters to translate from `input`. Use `strlen(input)` for the full string |
| `output` | Pre-allocated buffer to receive the phonetic string. **Must be large enough**: phonetics are typically 2–4× the input length |
| `outputSize` | Size of the output buffer in bytes |
**Return value:**
| Return | Meaning |
|---|---|
| `0` | Full translation succeeded; output buffer was large enough |
| **Negative** value | Buffer overflow — translation stopped at a word boundary. `-(rtnCode)` is the character offset in the input string where translation ended. Resume by calling `Translate(input + offset, inputLen - offset, output, outputSize)` |
| Other non-zero | Translation error (unlikely — the library tries to translate literally if rules fail) |
> [!NOTE]
> The negative return value always stops at a **word boundary** (space or punctuation), not mid-word. This prevents split phonemes and makes resumption seamless.
### Output Format
The output is a space-delimited string of **ARPABET phoneme codes** with **stress markers** appended to vowels:
```
Input:  "This is Amiga speaking."
Output: "DH IH1 S IH1 Z AE1 M IH0 G AH0 S P IY1 K IH0 NG ."
         └─This─┘ └is─┘ └─── Amiga ───┘ └── speaking ──┘
```
| Marker | Meaning | Example |
|---|---|---|
| `0` | No stress (unstressed vowel) | `IH0` = unstressed "i" (as in "rabbit") |
| `1` | Primary stress | `IY1` = stressed "ee" (as in "speak") |
| `2` | Secondary stress | `OW2` = secondary "oh" (as in "overflow") |
| `3` | Emphatic stress (rare) | Used for contrastive emphasis |
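
The stress digits can be picked out mechanically. The following is a minimal sketch in portable C, assuming the space-delimited token format shown above; `count_stress()` is a hypothetical helper for illustration and is not part of `translator.library`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts tokens in a space-delimited phonetic string whose last character
 * is the given stress digit, e.g. "IY1" when stress is '1'. */
size_t count_stress(const char *phon, char stress)
{
    size_t count = 0;
    const char *p = phon;

    assert(phon != NULL);
    while (*p != '\0') {
        const char *start;

        while (*p == ' ') p++;                 /* skip delimiters */
        start = p;
        while (*p != '\0' && *p != ' ') p++;   /* scan one token  */

        /* A stressed vowel token is at least one letter plus the digit. */
        if (p - start >= 2 && p[-1] == stress &&
            isalpha((unsigned char)start[0])) {
            count++;
        }
    }
    return count;
}
```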
---
## Phonetic Output Examples
| English Input | Phonetic Output (approx.) |
|---|---|
| `Hello world.` | `HH EH0 L OW1 W ER1 L D .` |
| `The quick brown fox.` | `DH AH0 K W IH1 K B R AW1 N F AA1 K S .` |
| `Amiga` | `AE1 M IH0 G AH0` or `AH0 M IY1 G AH0` (both valid) |
| `Commodore` | `K AA1 M AH0 D AO1 R` |
| `Guru Meditation` | `G UH1 R UW0 M EH2 D IH0 T EY1 SH AH0 N` |
> [!WARNING]
> The translator library was designed for **American English** pronunciation. British spellings (colour, centre) and non-English words will be translated using American phonetic rules and may sound odd.
---
## Integration with narrator.device
The standard workflow:
```c
#include <string.h>
#include <exec/types.h>
#include <exec/io.h>
#include <devices/narrator.h>
#include <clib/exec_protos.h>
#include <clib/alib_protos.h>        /* CreatePort(), CreateExtIO() */
#include <clib/translator_protos.h>

#define PHONBUF_SIZE 2048

struct Library *TranslatorBase;

void SpeakWelcome(void)
{
    static UBYTE audioChan[] = { 3, 5, 10, 12 };  /* left+right channel pairs */
    STRPTR english = "Welcome to the Amiga speech system.";
    UBYTE phonBuffer[PHONBUF_SIZE];
    struct MsgPort *mp;
    struct narrator_rb *voiceIO;

    /* 1. Open translator */
    TranslatorBase = OpenLibrary("translator.library", 0);
    if (!TranslatorBase) return;

    /* 2. Open narrator device */
    mp = CreatePort(NULL, 0);
    voiceIO = mp ? (struct narrator_rb *)CreateExtIO(mp, sizeof(struct narrator_rb)) : NULL;
    if (voiceIO && OpenDevice("narrator.device", 0, (struct IORequest *)voiceIO, 0) == 0)
    {
        /* 3. Translate English → phonetic */
        if (Translate(english, strlen(english), (STRPTR)phonBuffer, PHONBUF_SIZE) == 0)
        {
            /* 4. Configure voice parameters */
            voiceIO->rate = 150;        /* words per minute */
            voiceIO->pitch = 110;       /* Hz baseline */
            voiceIO->sex = 0;           /* 0=male, 1=female */
            voiceIO->volume = 64;       /* 0–64 */
            voiceIO->sampfreq = 22200;  /* Hz (Amiga native rate) */
            voiceIO->ch_masks = audioChan;
            voiceIO->nm_masks = sizeof(audioChan);

            /* 5. Send to narrator */
            voiceIO->message.io_Command = CMD_WRITE;
            voiceIO->message.io_Data = phonBuffer;
            voiceIO->message.io_Length = strlen((STRPTR)phonBuffer);
            DoIO((struct IORequest *)voiceIO);
        }
        CloseDevice((struct IORequest *)voiceIO);
    }

    /* 6. Cleanup */
    if (voiceIO) DeleteExtIO((struct IORequest *)voiceIO);
    if (mp) DeletePort(mp);
    CloseLibrary(TranslatorBase);
}
```
---
## When to Use / When NOT to Use
| Scenario | Use `Translate()`? | Rationale |
|---|---|---|
| **Unrestricted user input** (text editor, terminal, chat) | ✅ Yes | Only practical option — you can't pre-code phonetics for arbitrary text |
| **Fixed application strings** (game dialog, error messages) | ❌ No | Hand-code phonetics once; ship the phonetic strings. Much better quality |
| **Accessibility screen reader** | ✅ Yes | Essential — must speak whatever is on screen |
| **Demo/game with iconic lines** | ❌ No | Hand-tune phonetics, stress, and timing for maximum impact |
| **Multi-language support** | ❌ No | translator.library is English-only; use a third-party TTS or pre-recorded samples |
| **Phonetic research/analysis** | ⚠️ Maybe | Output is useful for analysis but not linguistically rigorous — use as a starting point |
| **Speaking numbers/dates** | ⚠️ Maybe | Library handles some abbreviations but not all; pre-process complex formats into spelled-out words |
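
For the numbers row, one simple pre-processing step is to spell out digits before calling `Translate()`. The sketch below is portable C and reads each digit individually ("42" becomes "four two"), which is a deliberate simplification; `spell_digits()` is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies `in` to `out`, replacing each ASCII digit with its English name
 * followed by a space. Returns 0 on success, -1 if `out` is too small. */
int spell_digits(const char *in, char *out, size_t outSize)
{
    static const char *names[10] = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };
    size_t used = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        if (*in >= '0' && *in <= '9') {
            const char *word = names[*in - '0'];
            size_t len = strlen(word);
            /* Separate the word from preceding text unless a space is there. */
            size_t lead = (used > 0 && out[used - 1] != ' ') ? 1 : 0;

            if (used + lead + len + 1 >= outSize) return -1;
            if (lead) out[used++] = ' ';
            memcpy(out + used, word, len);
            used += len;
            out[used++] = ' ';
        } else {
            if (used + 1 >= outSize) return -1;
            out[used++] = *in;
        }
    }
    if (used >= outSize) return -1;
    out[used] = '\0';
    return 0;
}
```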
---
## Pitfalls & Common Mistakes
### 1. Underestimating Phonetic Buffer Size
The phonetic representation is **always longer** than the input English. A 100-character sentence typically produces 300–500 bytes of phonetics:
```c
/* BAD: Buffer barely larger than the input, fills up partway through */
STRPTR english = "The extraordinarily complicated implementation...";
UBYTE  smallBuf[64];
LONG   result = Translate(english, strlen(english), (STRPTR)smallBuf, sizeof(smallBuf));
/* result will likely be negative: phonetics for "extraordinarily" alone run to ~40 chars */

/* CORRECT: Allocate 4× input length plus a 512-byte margin */
#define PHONBUF_SIZE(maxInput) (((maxInput) * 4) + 512)
ULONG  phonSize = PHONBUF_SIZE(strlen(english));
UBYTE *phonBuf  = AllocMem(phonSize, MEMF_ANY);
if (phonBuf)
{
    result = Translate(english, strlen(english), (STRPTR)phonBuf, phonSize);
    /* ... use phonBuf ... */
    FreeMem(phonBuf, phonSize);
}
```
### 2. Ignoring Negative Return Code
A negative return from `Translate()` is a **resumption offset**, not a fatal error:
```c
/* BAD: Treats partial translation as failure */
LONG rtn = Translate(text, len, buf, BUF_SIZE);
if (rtn != 0) { /* panic, but text was partially translated! */ }

/* CORRECT: Resume from offset on negative return */
LONG offset = 0;
while (offset < len)
{
    LONG rtn = Translate(text + offset, len - offset, buf, BUF_SIZE);
    if (rtn > 0) break;      /* unexpected error */
    /* buf holds this chunk's phonetics: speak or store it before the next call */
    if (rtn == 0) break;     /* done */
    offset += (-rtn);        /* resume from word boundary */
}
```
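
The resume logic can be exercised off-target by putting a stand-in behind a function pointer. The sketch below is portable C under that assumption: `translate_all()`, `TranslateFn`, `ChunkFn`, `mock_translate()` and `append_chunk()` are hypothetical names, and the mock only copies words; it does not imitate the real pronunciation rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as Translate(), so a real wrapper could be passed instead. */
typedef long (*TranslateFn)(const char *in, long inLen, char *out, long outSize);
/* Receives each NUL-terminated chunk of output as it is produced. */
typedef void (*ChunkFn)(const char *phon, void *userData);

/* Drives `translate` across the whole input. Returns 0 on success, the
 * positive code on a translation error, or -1 on an out-of-range offset. */
long translate_all(TranslateFn translate, const char *text, long len,
                   char *buf, long bufSize, ChunkFn emit, void *userData)
{
    long offset = 0;

    assert(translate && text && buf && emit && len >= 0 && bufSize > 0);
    while (offset < len) {
        long rtn = translate(text + offset, len - offset, buf, bufSize);

        if (rtn > 0) return rtn;               /* unexpected error        */
        emit(buf, userData);                   /* consume before reuse    */
        if (rtn == 0) break;                   /* everything translated   */
        if (-rtn > len - offset) return -1;    /* offset past end of text */
        offset += -rtn;                        /* resume at word boundary */
    }
    return 0;
}

/* Stand-in for Translate(): copies whole words until the next one would
 * not fit, then reports the stopping offset as a negative number. */
long mock_translate(const char *in, long inLen, char *out, long outSize)
{
    long i = 0, used = 0;

    while (i < inLen) {
        long end = i;

        while (end < inLen && in[end] != ' ') end++;   /* end of word      */
        while (end < inLen && in[end] == ' ') end++;   /* trailing spaces  */
        if (used + (end - i) + 1 > outSize) {
            out[used] = '\0';
            return (i == 0) ? 1 : -i;   /* 1: even the first word is too big */
        }
        memcpy(out + used, in + i, (size_t)(end - i));
        used += end - i;
        i = end;
    }
    out[used] = '\0';
    return 0;
}

/* Example ChunkFn: appends each chunk to a caller-supplied string. */
void append_chunk(const char *phon, void *userData)
{
    strcat((char *)userData, phon);
}
```

Emitting the chunk before advancing the offset matters because the next call overwrites the same buffer.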
### 3. Passing Non-Null-Terminated Input with Wrong Length
If `inputLen` doesn't match the actual string, `Translate()` reads garbage or stops early:
```c
/* BAD: strlen() on a buffer that may not be null-terminated */
UBYTE buf[256];
Read(fh, buf, 256); /* may fill entire buffer — no terminator */
Translate((STRPTR)buf, strlen((STRPTR)buf), out, 1024);
/* strlen() may read past the buffer! */
/* CORRECT: Use the explicit read count */
LONG actual = Read(fh, buf, 256);
if (actual > 0)   /* 0 = end of file, -1 = read error */
    Translate((STRPTR)buf, actual, out, 1024);
```
### 4. Not Checking for Missing Disk-Based Library
Unlike ROM libraries, `translator.library` may not be present:
```c
/* BAD: Assumes library is always available */
struct Library *TranslatorBase = OpenLibrary("translator.library", 0);
Translate("Hello", 5, buf, 512); /* crash if TranslatorBase == NULL! */
/* CORRECT: Always check the return */
struct Library *TranslatorBase = OpenLibrary("translator.library", 0);
if (TranslatorBase)
{
    Translate("Hello", 5, buf, 512);
    CloseLibrary(TranslatorBase);
}
else
{
    Printf("Speech not available — translator.library missing\n");
}
```
---
## Named Antipatterns
### "The Mumbler" — Unrealistic Rate/Pitch
Setting `rate` extremely high makes speech unintelligible, but the translator itself has nothing to do with it — the problem is feeding valid phonetics to a misconfigured narrator:
```c
/* BAD: Chipmunk speech */
voiceIO->rate = 400; /* 400 words/min — unintelligible */
voiceIO->pitch = 255; /* extremely high pitch */
/* Sensible defaults: */
voiceIO->rate = 150; /* natural conversational speed */
voiceIO->pitch = 110; /* male baseline (85–110 for male, 160–220 for female) */
voiceIO->sex = 0; /* 0=male, 1=female */
```
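
One way to avoid out-of-range settings is to clamp user-supplied values before storing them in the request. The sketch below is portable C; the limits are the ones I recall from `devices/narrator.h` (`MINRATE`/`MAXRATE`, `MINPITCH`/`MAXPITCH`, `MINVOL`/`MAXVOL`) and should be checked against your NDK. The enum names and `clamp_param()` are hypothetical.

```c
#include <assert.h>

/* Assumed limits; verify against devices/narrator.h in your NDK. */
enum {
    RATE_MIN  = 40, RATE_MAX  = 400,   /* words per minute */
    PITCH_MIN = 65, PITCH_MAX = 320,   /* Hz               */
    VOL_MIN   = 0,  VOL_MAX   = 64
};

/* Returns `value` limited to the inclusive range [lo, hi]. */
long clamp_param(long value, long lo, long hi)
{
    assert(lo <= hi);
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}
```

Staying inside the legal range does not guarantee intelligible speech: a rate of 400 is legal yet, as noted above, unintelligible.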
### "The Silent Speaker" — Mismatched Audio Allocation
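The narrator device must allocate audio channels, and it only tries the channel combinations listed in `ch_masks`. The Amiga has four channels (mask bits 0–3), so masks that reference higher bits can never be satisfied. If no listed combination is free, `OpenDevice("narrator.device", ...)` still succeeds but the later `CMD_WRITE` fails and nothing is heard:
```c
/* BAD: No channel masks, no error check */
OpenDevice("narrator.device", 0, (struct IORequest *)voiceIO, 0);
/* CMD_WRITE has no channels to allocate, so nothing is heard */

/* CORRECT: Offer the four left+right channel pairs and check io_Error */
static UBYTE chanMasks[] = { 3, 5, 10, 12 };  /* channels 0+1, 0+2, 1+3, 2+3 */
voiceIO->ch_masks = chanMasks;
voiceIO->nm_masks = sizeof(chanMasks);
/* after DoIO(): a non-zero io_Error means the speech was not produced */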
```
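
A channel mask is a bit set: bit n selects hardware channel n, and only bits 0 to 3 exist. Channels 0 and 3 feed the left output while 1 and 2 feed the right, which is why the narrator's customary masks 3, 5, 10 and 12 each combine one channel from each side. A minimal portable-C check of that property, with `is_stereo_pair()` as a hypothetical helper:

```c
#include <assert.h>

#define LEFT_CHANNELS  0x09u   /* bits 0 and 3 */
#define RIGHT_CHANNELS 0x06u   /* bits 1 and 2 */

/* Returns 1 if the mask selects exactly one left and one right channel. */
int is_stereo_pair(unsigned mask)
{
    unsigned left  = mask & LEFT_CHANNELS;
    unsigned right = mask & RIGHT_CHANNELS;

    if (mask & ~0x0Fu) return 0;   /* only four channels exist */

    /* x & (x - 1) is zero when x has at most one bit set */
    return left != 0 && right != 0 &&
           (left & (left - 1)) == 0 && (right & (right - 1)) == 0;
}
```

Running every candidate mask through a check like this before opening the device catches typos such as `0x30`, which names channels the hardware does not have.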
### "The Echo" — Forgetting io_Data Lifetime
When you send a `CMD_WRITE` to the narrator device, the `io_Data` pointer must remain valid until the I/O completes. Using a stack buffer with `DoIO()` is fine (blocking); using `SendIO()` (asynchronous) with a stack buffer is not:
```c
/* BAD: Stack buffer with async I/O */
void SpeakAsync(STRPTR text)
{
    UBYTE phonBuf[512];  /* stack: disappears on return! */
    Translate(text, strlen(text), (STRPTR)phonBuf, 512);
    voiceIO->message.io_Data = phonBuf;
    SendIO((struct IORequest *)voiceIO);  /* async: phonBuf is gone once SpeakAsync() returns */
}

/* CORRECT: Allocate or use static buffer for async */
UBYTE phonBuf[2048];  /* static: stays valid */
void SpeakAsync(STRPTR text)
{
    Translate(text, strlen(text), (STRPTR)phonBuf, sizeof(phonBuf));
    voiceIO->message.io_Command = CMD_WRITE;
    voiceIO->message.io_Data = phonBuf;
    voiceIO->message.io_Length = strlen((STRPTR)phonBuf);
    SendIO((struct IORequest *)voiceIO);
    /* Do not touch phonBuf again until WaitIO() or CheckIO() reports completion */
}
```
---
## FAQ
**Q: Can I use translator.library without narrator.device?**
Yes. The phonetic output is a plain ASCII string — you can save it, analyze it, send it over a network, or use it as input to a custom speech synthesizer. The translator and narrator are independent.
**Q: Why does the same word sometimes translate differently?**
The translator uses **context-sensitive** rules. The pronunciation of "read" depends on surrounding tense markers; "record" as a noun vs. verb gets different stress. The same word in different sentences may produce different phonetics — this is correct behavior.
**Q: How do I make the narrator sound female?**
Set `voiceIO->sex = 1` (female). This adjusts formant frequencies and baseline pitch. For manual fine-tuning, adjust `voiceIO->pitch` (160–220 Hz for female) and `voiceIO->F1adj` through `F3adj` (formant shifts).
**Q: Can translator.library handle multiple languages?**
No. The rule set and exception dictionary are English-only. German, French, or other languages will be treated as misspelled English and produce garbled phonetics. Use locale-specific TTS solutions for non-English speech.
**Q: How big does the output buffer really need to be?**
Empirically, 4× the input length plus a 512-byte safety margin is enough. Even a very long word such as "supercalifragilisticexpialidocious" (34 characters) produces only about 80 characters of phonetics. A typical sentence expands 2.5–3×.
**Q: Does Translate() handle punctuation?**
Yes. Punctuation marks (`.`, `,`, `?`, `!`, `;`, `:`) are passed through to the phonetic output. The narrator device interprets them as prosody cues: `.` = falling intonation, `?` = rising intonation.
---
## Use-Case Cookbook
### 1. Simple One-Shot Speech
The blocking pattern — suitable for alert messages, game notifications, short announcements:
```c
void Say(STRPTR english)
{
    static UBYTE audioChan[] = { 3, 5, 10, 12 };  /* left+right channel pairs */
    struct Library *TranslatorBase;
    struct MsgPort *mp;
    struct narrator_rb *vio;
    UBYTE phonBuf[2048];

    TranslatorBase = OpenLibrary("translator.library", 0);
    if (!TranslatorBase) return;

    if (Translate(english, strlen(english), (STRPTR)phonBuf, sizeof(phonBuf)) == 0
        && (mp = CreatePort(NULL, 0)) != NULL)
    {
        vio = (struct narrator_rb *)CreateExtIO(mp, sizeof(struct narrator_rb));
        if (vio && OpenDevice("narrator.device", 0, (struct IORequest *)vio, 0) == 0)
        {
            vio->rate = 150;
            vio->pitch = 110;
            vio->volume = 64;
            vio->sampfreq = 22200;
            vio->ch_masks = audioChan;
            vio->nm_masks = sizeof(audioChan);
            vio->message.io_Command = CMD_WRITE;
            vio->message.io_Data = phonBuf;
            vio->message.io_Length = strlen((STRPTR)phonBuf);
            DoIO((struct IORequest *)vio);
            CloseDevice((struct IORequest *)vio);
        }
        if (vio) DeleteExtIO((struct IORequest *)vio);
        DeletePort(mp);
    }
    CloseLibrary(TranslatorBase);
}

/* Usage: */
Say("Game over. Insert coin to continue.");
```
### 2. Animated Talking Head (with Mouth Shapes)
The narrator can generate mouth width/height data while speaking:
```c
/* Two I/O requests: one for speech, one for mouth data */
struct narrator_rb *voiceIO = /* ... opened narrator request ... */;
struct mouth_rb *mouthIO = (struct mouth_rb *)
    CreateExtIO(mp, sizeof(struct mouth_rb));

/* The read request must address the same device and unit as the write */
mouthIO->voice.message.io_Device = voiceIO->message.io_Device;
mouthIO->voice.message.io_Unit = voiceIO->message.io_Unit;
mouthIO->voice.message.io_Command = CMD_READ;
mouthIO->voice.message.io_Error = 0;

/* Enable mouth shape generation */
voiceIO->mouths = 1;  /* non-zero = generate mouth data */

/* Send speech command */
voiceIO->message.io_Command = CMD_WRITE;
voiceIO->message.io_Data = phonBuf;
voiceIO->message.io_Length = strlen((STRPTR)phonBuf);
SendIO((struct IORequest *)voiceIO);

/* Each read returns on a mouth-shape change; a non-zero result (ND_NoWrite) means speech has ended */
while (DoIO((struct IORequest *)mouthIO) == 0)
{
    /* mouthIO->width and mouthIO->height describe the current mouth shape */
    AnimateMouth(mouthIO->width, mouthIO->height);
}
WaitIO((struct IORequest *)voiceIO);  /* collect the finished write request */
```
### 3. Progressive Translation of Long Text
For documents or long-form text where a single 2 KB buffer won't suffice:
```c
LONG TranslateLongText(STRPTR text, LONG totalLen, BPTR outputFH)
{
    UBYTE phonBuf[2048];
    LONG offset = 0;

    while (offset < totalLen)
    {
        LONG bytesAvail = totalLen - offset;
        LONG rtn = Translate(text + offset, bytesAvail,
                             (STRPTR)phonBuf, sizeof(phonBuf));
        if (rtn == 0)
        {
            /* Final chunk — write and done */
            LONG phonLen = strlen((STRPTR)phonBuf);
            Write(outputFH, phonBuf, phonLen);
            break;
        }
        else if (rtn < 0)
        {
            /* Write completed portion, resume at word boundary */
            LONG phonLen = strlen((STRPTR)phonBuf);
            Write(outputFH, phonBuf, phonLen);
            offset += (-rtn);
        }
        else
        {
            /* unexpected error */
            return rtn;
        }
    }
    return 0;
}
```
---
## Modern Analogies
| Amiga Concept | Modern Equivalent | Why It Maps | Where It Diverges |
|---|---|---|---|
| **translator.library** | macOS `NSSpeechSynthesizer` / Windows SAPI Text-to-Speech | Both accept English text and produce speech. The API philosophy — text in, audio out — is identical | Modern APIs bundle translation and synthesis; Amiga splits them into library (translate) and device (speak) |
| **ARPABET phonemes** | IPA (International Phonetic Alphabet) | Both encode pronunciation as discrete symbols. ARPABET is a machine-readable subset of IPA | ARPABET is English-only; IPA is universal. ARPABET uses ASCII, IPA uses Unicode |
| **450 context-sensitive rules** | Modern TTS neural networks (Tacotron, FastSpeech) | Both map spelling to pronunciation; the rules are a 1985 hand-crafted "model" rather than a learned one | Neural TTS requires gigabytes of training data; rule-based works with zero training |
| **narrator.device formant synthesis** | Vocaloid / singing synthesis | Both use formant models (F0, F1, F2...) to generate vocal sounds | Narrator.device is a 1985-era 8-bit formant synth; Vocaloid uses concatenative sampling + ML |
| **`Say` command / `speak:` handler** | `say` command on macOS / `espeak` on Linux | Both provide command-line text-to-speech | Amiga `Say` feeds translator.library → narrator.device; macOS `say` uses a system-wide speech server |
---
## References
- ADCD 2.1: *ROM Kernel Reference Manual: Libraries* — Chapter 36: Translator Library
- ADCD 2.1: *ROM Kernel Reference Manual: Devices* — Chapter 8: Narrator Device
- NDK 3.9: `devices/narrator.h`: `narrator_rb` and `mouth_rb` structures
- NDK 3.9: `clib/translator_protos.h`: `Translate()` prototype
- AmigaOS Documentation Wiki: [Narrator Device](https://wiki.amigaos.net/wiki/Narrator_Device) — complete phoneme table and phonetic writing guide
- AmigaOS Documentation Wiki: [Translator Library](https://wiki.amigaos.net/wiki/Translator_Library) — OS 4.x interface reference
- See also: [audio.md](../10_devices/audio.md) — audio.device DMA channel allocation used by narrator
- See also: [iffparse.md](iffparse.md) — IFF FTXT parsing (the AmigaGuide format sometimes wraps speech metadata in IFF chunks)