amiga-bootcamp/11_libraries/translator.md
2026-04-26 14:46:18 -04:00

# translator.library — English-to-Phonetic Translation for Speech Synthesis
## Overview
`translator.library` is the front half of the Amiga's built-in text-to-speech pipeline: a single-function library that converts unrestricted English text into **phonetic strings** — the expanded ARPABET phoneme codes used by `narrator.device` to generate human-like speech through the Amiga's audio hardware. Introduced with AmigaOS 1.2 and distributed as a disk-based library in `LIBS:`, it encapsulates over 450 context-sensitive pronunciation rules, an exception dictionary for irregular words (through, though, cough), abbreviation expansion (Dr., Prof., lb.), and automatic content-word accentuation — all in a single call: `Translate()`. The output is a string of space-delimited phoneme codes with stress markers that can be passed directly to `narrator.device` via `CMD_WRITE`, stored for later playback, or analyzed for phonetic research. While hand-coded phonetics always produce higher-quality speech, `Translate()` is the only practical option when the input is arbitrary user text at runtime.
---
## Architecture
### The Amiga Speech Pipeline
```mermaid
flowchart LR
subgraph INPUT["Input Layer"]
ENG["English Text<br/>(ASCII)"]
PHON["Hand-Coded<br/>Phonetic String"]
end
subgraph TRANSLATOR["translator.library"]
TR["Translate()<br/>English → Phonetic"]
RULES["450+ Context Rules<br/>Exception Dictionary<br/>Abbreviation Expansion"]
end
subgraph NARRATOR["narrator.device"]
SYNTH["Speech Synthesizer<br/>Formant Model"]
MOUTH["Mouth Shape<br/>Generator"]
end
subgraph OUTPUT["Output Layer"]
AUDIO["audio.device<br/>DMA Audio Channels"]
MOUTHDATA["mouth_rb<br/>Width/Height"]
end
ENG --> TR
TR --> RULES
RULES --> TR
TR -->|"Phonetic String"| SYNTH
PHON --> SYNTH
SYNTH --> AUDIO
SYNTH --> MOUTH
MOUTH --> MOUTHDATA
style TR fill:#e8f4fd,stroke:#2196f3,color:#333
style SYNTH fill:#fff3e0,stroke:#ff9800,color:#333
```
### Library Base
| Name | Type | Description |
|---|---|---|
| `TranslatorBase` | `struct Library *` | Library base pointer returned by `OpenLibrary()` |
| `ITranslator` | Interface pointer (OS 4.x+) | Interface-based access for AmigaOS 4+ |
`translator.library` is a **disk-based** library: it lives in `LIBS:translator.library`, not in ROM. This means `OpenLibrary()` can fail if the file is missing, and once no program has it open the library can be expunged from memory under low-memory conditions.
### Key Design Decisions
| Decision | Rationale |
|---|---|
| **Single-function API** | Translation is inherently stateless — input text, output phonetics. No session, no configuration |
| **Disk-based, not ROM** | Phonetic dictionary is large (~20+ KB of rules); keeping it out of ROM saves Kickstart space |
| **Negative return codes for overflow** | Allows progressive translation of long texts without pre-allocating a huge buffer |
| **Rule-based, not neural** | 1985 technology couldn't run a neural TTS; the 450 context-sensitive rules were state-of-the-art for the era |
---
## API Reference
### Opening and Closing
```c
/* Classic AmigaOS (1.x–3.x) */
struct Library *TranslatorBase;

TranslatorBase = OpenLibrary("translator.library", 0);
if (TranslatorBase)
{
    /* ... use Translate() ... */
    CloseLibrary(TranslatorBase);
}
else
{
    /* LIBS:translator.library not found */
}
```
```c
/* AmigaOS 4.x: interface-based */
struct Library *TranslatorBase;
struct TranslatorIFace *ITranslator;

TranslatorBase = IExec->OpenLibrary("translator.library", 0);
if (TranslatorBase)
{
    ITranslator = (struct TranslatorIFace *)
        IExec->GetInterface(TranslatorBase, "main", 1, NULL);
    if (ITranslator)
    {
        /* ... use ITranslator->Translate() ... */
        IExec->DropInterface((struct Interface *)ITranslator);
    }
    IExec->CloseLibrary(TranslatorBase);
}
```
### Translate()
```c
/* LVO -30: converts English text to a phonetic string */
LONG Translate(STRPTR input,      /* a0: English input string */
               LONG   inputLen,   /* d0: length of input */
               STRPTR output,     /* a1: output buffer for phonetics */
               LONG   outputSize  /* d1: size of output buffer */);
```
| Parameter | Description |
|---|---|
| `input` | Null-terminated or length-delimited English ASCII string. Case-insensitive; punctuation is preserved where it affects pronunciation |
| `inputLen` | Number of characters to translate from `input`. Use `strlen(input)` for the full string |
| `output` | Pre-allocated buffer to receive the phonetic string. **Must be large enough**: phonetics are typically 2–4× the input length |
| `outputSize` | Size of the output buffer in bytes |
**Return value:**
| Return | Meaning |
|---|---|
| `0` | Full translation succeeded; output buffer was large enough |
| **Negative** value | Buffer overflow — translation stopped at a word boundary. `-(rtnCode)` is the character offset in the input string where translation ended. Resume by calling `Translate(input + offset, inputLen - offset, output, outputSize)` |
| Other non-zero | Translation error (unlikely — the library tries to translate literally if rules fail) |
> [!NOTE]
> On overflow, translation always stops at a **word boundary** (space or punctuation), not mid-word, so the negative return value never points into the middle of a word. This prevents split phonemes and makes resumption seamless.
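The resume convention can be exercised away from an Amiga with a stand-in. The sketch below is only an illustration: `fake_translate()` and `translate_all()` are hypothetical names, and the stand-in simply upper-cases whole words instead of producing phonetics. It mimics nothing but the return convention described above (0 when everything fit, a negated input offset at a word boundary on overflow).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Translate(): copies whole words, upper-cased,
 * into out. Returns 0 if everything fit, the NEGATED input offset of the
 * first word that did not fit, or 1 if a single word exceeds the buffer. */
static long fake_translate(const char *in, long in_len, char *out, long out_size)
{
    long i = 0, o = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    while (i < in_len) {
        long w = i;
        /* find the end of the current word, including trailing spaces */
        while (w < in_len && in[w] != ' ') w++;
        while (w < in_len && in[w] == ' ') w++;
        if (o + (w - i) >= out_size) {      /* no room for word + NUL */
            out[o] = '\0';
            return (i > 0) ? -i : 1;
        }
        for (; i < w; i++)
            out[o++] = (char)toupper((unsigned char)in[i]);
    }
    out[o] = '\0';
    return 0;
}

/* Drives the resume loop and appends every chunk to dst.
 * Returns 0 on success, non-zero on error or if dst is too small. */
static long translate_all(const char *text, char *dst, size_t dst_size,
                          long chunk_size)
{
    char chunk[64];
    long len = (long)strlen(text), offset = 0;

    assert(dst != NULL && dst_size > 0);
    assert(chunk_size > 0 && chunk_size <= (long)sizeof(chunk));
    dst[0] = '\0';
    for (;;) {
        long rtn = fake_translate(text + offset, len - offset, chunk, chunk_size);
        if (rtn > 0) return rtn;                    /* unexpected error */
        if (strlen(dst) + strlen(chunk) + 1 > dst_size) return 1;
        strcat(dst, chunk);                         /* consume this chunk */
        if (rtn == 0) return 0;                     /* finished */
        offset += -rtn;                             /* resume at word boundary */
    }
}
```

Note that the offset returned by each call is relative to the pointer passed to that call, which is why the loop accumulates it rather than assigning it.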
### Output Format
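The output is a space-delimited string of **ARPABET phoneme codes** with **stress markers** appended to vowels. The strings in this document are illustrative; check exact spellings against the phoneme table in the narrator.device documentation:
```
Input:  "This is Amiga speaking."
Output: "DH IH1 S IH1 Z AE1 M IH0 G AH0 S P IY1 K IH0 NG ."

This     -> DH IH1 S
is       -> IH1 Z
Amiga    -> AE1 M IH0 G AH0
speaking -> S P IY1 K IH0 NG
```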
| Marker | Meaning | Example |
|---|---|---|
| `0` | No stress (unstressed vowel) | `IH0` = unstressed "i" (as in "rabbit") |
| `1` | Primary stress | `IY1` = stressed "ee" (as in "speak") |
| `2` | Secondary stress | `OW2` = secondary "oh" (as in "overflow") |
| `3` | Emphatic stress (rare) | Used for contrastive emphasis |
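For offline analysis it can help to tally stress markers in a translated string. The helper below is a minimal sketch that assumes the token layout shown in this section (space-separated codes, stress digit as the last character of a vowel token); `count_stress()` is a hypothetical name, not part of any Amiga API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Counts the tokens in a space-delimited phoneme string whose last
 * character is the given stress digit ('0'..'9'). */
static int count_stress(const char *phon, char digit)
{
    int count = 0;

    assert(phon != NULL && isdigit((unsigned char)digit));
    while (*phon != '\0') {
        const char *end;

        while (*phon == ' ') phon++;              /* skip separators */
        end = phon;
        while (*end != '\0' && *end != ' ') end++; /* find end of token */
        if (end > phon && end[-1] == digit) count++;
        phon = end;
    }
    return count;
}
```

A call such as `count_stress(phonBuf, '1')` then gives the number of primary-stressed vowels in a buffer filled by `Translate()`.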
---
## Phonetic Output Examples
| English Input | Phonetic Output (approx.) |
|---|---|
| `Hello world.` | `HH EH0 L OW1 W ER1 L D .` |
| `The quick brown fox.` | `DH AH0 K W IH1 K B R AW1 N F AA1 K S .` |
| `Amiga` | `AE1 M IH0 G AH0` or `AH0 M IY1 G AH0` (both valid) |
| `Commodore` | `K AA1 M AH0 D AO1 R` |
| `Guru Meditation` | `G UH1 R UW0 M EH2 D IH0 T EY1 SH AH0 N` |
> [!WARNING]
> The translator library was designed for **American English** pronunciation. British spellings (colour, centre) and non-English words will be translated using American phonetic rules and may sound odd.
---
## Integration with narrator.device
The standard workflow:
```c
#include <string.h>
#include <exec/types.h>
#include <exec/io.h>
#include <devices/narrator.h>
#include <clib/exec_protos.h>
#include <clib/alib_protos.h>
#include <clib/translator_protos.h>

#define PHONBUF_SIZE 2048

struct Library *TranslatorBase;

/* Channel masks to try, in order; each pairs a left and a right channel */
static UBYTE audioChans[] = { 3, 5, 10, 12 };
static UBYTE phonBuffer[PHONBUF_SIZE];

void SpeakWelcome(void)
{
    STRPTR english = "Welcome to the Amiga speech system.";
    struct MsgPort *mp;
    struct narrator_rb *voiceIO;

    /* 1. Open translator */
    TranslatorBase = OpenLibrary("translator.library", 0);
    if (!TranslatorBase) return;

    /* 2. Open narrator device */
    mp = CreatePort(NULL, 0);
    voiceIO = mp ? (struct narrator_rb *)
                   CreateExtIO(mp, sizeof(struct narrator_rb)) : NULL;
    if (voiceIO &&
        OpenDevice("narrator.device", 0, (struct IORequest *)voiceIO, 0) == 0)
    {
        /* 3. Translate English → phonetic */
        if (Translate(english, strlen(english),
                      (STRPTR)phonBuffer, PHONBUF_SIZE) == 0)
        {
            /* 4. Configure voice parameters */
            voiceIO->rate     = 150;    /* words per minute */
            voiceIO->pitch    = 110;    /* Hz baseline */
            voiceIO->sex      = 0;      /* 0=male, 1=female */
            voiceIO->volume   = 64;     /* 0–64 */
            voiceIO->sampfreq = 22200;  /* Hz (Amiga native rate) */
            voiceIO->ch_masks = audioChans;
            voiceIO->nm_masks = sizeof(audioChans);

            /* 5. Send to narrator */
            voiceIO->message.io_Command = CMD_WRITE;
            voiceIO->message.io_Data    = phonBuffer;
            voiceIO->message.io_Length  = strlen((STRPTR)phonBuffer);
            DoIO((struct IORequest *)voiceIO);
        }
        CloseDevice((struct IORequest *)voiceIO);
    }

    /* 6. Cleanup */
    if (voiceIO) DeleteExtIO((struct IORequest *)voiceIO);
    if (mp) DeletePort(mp);
    CloseLibrary(TranslatorBase);
}
```
---
## When to Use / When NOT to Use
| Scenario | Use `Translate()`? | Rationale |
|---|---|---|
| **Unrestricted user input** (text editor, terminal, chat) | ✅ Yes | Only practical option — you can't pre-code phonetics for arbitrary text |
| **Fixed application strings** (game dialog, error messages) | ❌ No | Hand-code phonetics once; ship the phonetic strings. Much better quality |
| **Accessibility screen reader** | ✅ Yes | Essential — must speak whatever is on screen |
| **Demo/game with iconic lines** | ❌ No | Hand-tune phonetics, stress, and timing for maximum impact |
| **Multi-language support** | ❌ No | translator.library is English-only; use a third-party TTS or pre-recorded samples |
| **Phonetic research/analysis** | ⚠️ Maybe | Output is useful for analysis but not linguistically rigorous — use as a starting point |
| **Speaking numbers/dates** | ⚠️ Maybe | Library handles some abbreviations but not all; pre-process complex formats into spelled-out words |
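The last row suggests pre-processing numbers into words before calling `Translate()`. A minimal sketch of such a pre-processor follows, limited to 0–999 and using plain spaces between words; `spell_number()` and `append()` are hypothetical helpers, not library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char *const ONES[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
};
static const char *const TENS[] = {
    "", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety"
};

/* Appends src to dst if it fits in cap bytes; 0 on success, -1 otherwise */
static int append(char *dst, size_t cap, const char *src)
{
    if (strlen(dst) + strlen(src) + 1 > cap) return -1;
    strcat(dst, src);
    return 0;
}

/* Spells n (0..999) into out as space-separated English words.
 * Returns 0 on success, -1 if n is out of range or out is too small. */
static int spell_number(unsigned n, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    if (n > 999) return -1;
    if (n >= 100) {
        if (append(out, cap, ONES[n / 100]) || append(out, cap, " hundred"))
            return -1;
        n %= 100;
        if (n == 0) return 0;
        if (append(out, cap, " ")) return -1;
    }
    if (n < 20) return append(out, cap, ONES[n]);
    if (append(out, cap, TENS[n / 10])) return -1;
    if (n % 10 == 0) return 0;
    if (append(out, cap, " ")) return -1;
    return append(out, cap, ONES[n % 10]);
}
```

The spelled-out text can then be spliced into the sentence in place of the digits, so the translator only ever sees ordinary words.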
---
## Pitfalls & Common Mistakes
### 1. Underestimating Phonetic Buffer Size
The phonetic representation is **always longer** than the input English. A 100-character sentence typically produces 300–500 bytes of phonetics:
```c
/* BAD: Buffer sized for the text, not for its phonetics */
STRPTR english = "The extraordinarily complicated implementation...";
UBYTE phonBuf[64];
LONG result = Translate(english, strlen(english), (STRPTR)phonBuf, sizeof(phonBuf));
/* result will be negative: phonetics for "extraordinarily" alone are ~40 chars */

/* CORRECT: Allocate 4× the input length plus a 512-byte margin */
#define PHONBUF_SIZE(maxInput) (((maxInput) * 4) + 512)
ULONG bufSize = PHONBUF_SIZE(strlen(english));
UBYTE *bigBuf = AllocMem(bufSize, MEMF_ANY);
if (bigBuf)
{
    result = Translate(english, strlen(english), (STRPTR)bigBuf, bufSize);
    FreeMem(bigBuf, bufSize);
}
```
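If the input length comes from an untrusted source, the sizing rule can be wrapped in a function that also guards against arithmetic overflow. This is a small sketch of that idea; `phon_buf_size()` is a hypothetical name and the 4× plus 512 rule is the empirical one from this section, not a documented guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 4 * input_len + 512, or 0 if that would overflow size_t */
static size_t phon_buf_size(size_t input_len)
{
    size_t size;

    if (input_len > (SIZE_MAX - 512) / 4) return 0;
    size = input_len * 4 + 512;
    assert(size > input_len);   /* sanity check on the arithmetic */
    return size;
}
```

A return of 0 should be treated as "refuse the request" rather than passed on to an allocator.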
### 2. Ignoring Negative Return Code
A negative return from `Translate()` is a **resumption offset**, not a fatal error:
```c
/* BAD: Treats partial translation as failure */
LONG rtn = Translate(text, len, buf, size);
if (rtn != 0) { /* panic, although buf holds a valid partial translation */ }

/* CORRECT: Consume each chunk, then resume from the returned offset */
LONG offset = 0;
while (offset < len)
{
    LONG rtn = Translate(text + offset, len - offset, buf, BUF_SIZE);
    if (rtn > 0) break;      /* unexpected error */
    /* ... speak or store buf here; it holds this chunk's phonetics ... */
    if (rtn == 0) break;     /* done */
    offset += (-rtn);        /* resume from word boundary */
}
```
### 3. Passing Non-Null-Terminated Input with Wrong Length
If `inputLen` doesn't match the actual string, `Translate()` reads garbage or stops early:
```c
/* BAD: strlen() on a buffer that may not be null-terminated */
UBYTE buf[256];
Read(fh, buf, 256);   /* may fill entire buffer: no terminator */
Translate((STRPTR)buf, strlen((STRPTR)buf), out, 1024);
/* strlen() may read past the buffer! */

/* CORRECT: Use the explicit read count */
LONG actual = Read(fh, buf, 256);   /* -1 on error, 0 at end of file */
if (actual > 0)
    Translate((STRPTR)buf, actual, out, 1024);
```
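The read count can be tidied before it is handed to `Translate()`. The helper below is one possible sketch: it rejects failed or empty reads, stops at an embedded NUL, and drops trailing line endings. `usable_len()` is a hypothetical name introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many leading bytes of buf[0..actual) are worth translating:
 * 0 for a failed or empty read, otherwise the count up to (not including)
 * the first NUL, with trailing CR/LF characters removed. */
static long usable_len(const char *buf, long actual)
{
    long n = 0;

    assert(buf != NULL);
    if (actual <= 0) return 0;                    /* read error or EOF */
    while (n < actual && buf[n] != '\0') n++;     /* never scan past actual */
    while (n > 0 && (buf[n - 1] == '\n' || buf[n - 1] == '\r')) n--;
    return n;
}
```

Because the scan is bounded by `actual`, it is safe on a buffer that `Read()` filled completely without a terminator.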
### 4. Not Checking for Missing Disk-Based Library
Unlike ROM libraries, `translator.library` may not be present:
```c
/* BAD: Assumes library is always available */
struct Library *TranslatorBase = OpenLibrary("translator.library", 0);
Translate("Hello", 5, buf, 512);   /* crash if TranslatorBase == NULL! */

/* CORRECT: Always check the return */
struct Library *TranslatorBase = OpenLibrary("translator.library", 0);
if (TranslatorBase)
{
    Translate("Hello", 5, buf, 512);
    CloseLibrary(TranslatorBase);
}
else
{
    Printf("Speech not available: translator.library missing\n");
}
```
---
## Named Antipatterns
### "The Mumbler" — Unrealistic Rate/Pitch
Setting `rate` extremely high makes speech unintelligible, but the translator itself has nothing to do with it — the problem is feeding valid phonetics to a misconfigured narrator:
```c
/* BAD: Chipmunk speech */
voiceIO->rate = 400; /* 400 words/min — unintelligible */
voiceIO->pitch = 255; /* extremely high pitch */
/* Sensible defaults: */
voiceIO->rate = 150; /* natural conversational speed */
voiceIO->pitch = 110; /* male baseline (85–110 for male, 160–220 for female) */
voiceIO->sex = 0; /* 0=male, 1=female */
```
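User-supplied settings can be clamped before they reach the request block. The sketch below assumes limits of 40–400 words per minute and 65–320 Hz, which are believed to mirror `MINRATE`/`MAXRATE` and `MINPITCH`/`MAXPITCH` in `devices/narrator.h`; verify them against your NDK headers. All names here are hypothetical.

```c
#include <assert.h>

/* Assumed limits; believed to match devices/narrator.h, verify locally */
enum { RATE_MIN = 40, RATE_MAX = 400, PITCH_MIN = 65, PITCH_MAX = 320 };

/* Restricts v to the closed range [lo, hi] */
static long clamp_long(long v, long lo, long hi)
{
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

static long safe_rate(long wpm) { return clamp_long(wpm, RATE_MIN, RATE_MAX); }
static long safe_pitch(long hz) { return clamp_long(hz, PITCH_MIN, PITCH_MAX); }
```

Clamping only keeps values inside what the device is expected to accept; a rate near the upper limit is still unintelligible, so an application may want a narrower range of its own.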
### "The Silent Speaker" — Mismatched Audio Allocation
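The narrator device must allocate audio channels, and the caller must tell it which channel combinations to try. If another application holds all four channels, `OpenDevice("narrator.device", ...)` still succeeds, but the later `CMD_WRITE` cannot allocate a channel and returns an error in `io_Error` instead of speaking: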
```c
/* BAD: No channel masks and no error check */
OpenDevice("narrator.device", 0, (struct IORequest *)voiceIO, 0);
/* Speech may be silent if audio channels are all in use */

/* CORRECT: Offer several left+right channel pairs and check the result */
UBYTE chanMasks[] = { 3, 5, 10, 12 };   /* channels 0+1, 0+2, 1+3, 2+3 */
voiceIO->ch_masks = chanMasks;
voiceIO->nm_masks = sizeof(chanMasks);
if (DoIO((struct IORequest *)voiceIO) != 0)
{
    /* voiceIO->message.io_Error holds the reason, e.g. channels busy */
}
```
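The Amiga has four audio channels, so a channel mask only uses its low four bits; channels 0 and 3 feed the left output and channels 1 and 2 the right. A candidate mask can be sanity-checked with a few bit tests. This is a sketch with hypothetical names, not part of the narrator API.

```c
#include <assert.h>

#define LEFT_CHANNELS  0x09u   /* bits for channels 0 and 3 */
#define RIGHT_CHANNELS 0x06u   /* bits for channels 1 and 2 */

/* A mask is usable if it names at least one channel and only bits 0..3 */
static int mask_is_valid(unsigned mask)
{
    return mask != 0 && (mask & ~0x0Fu) == 0;
}

/* A mask is stereo if it includes both a left and a right channel */
static int mask_is_stereo(unsigned mask)
{
    assert(mask <= 0xFFu);   /* masks are stored as UBYTE */
    return mask_is_valid(mask)
        && (mask & LEFT_CHANNELS) != 0
        && (mask & RIGHT_CHANNELS) != 0;
}
```

Under this check the masks 3, 5, 10 and 12 each pair one left with one right channel, while values such as 0x30 or 0xC0 refer to channels that do not exist.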
### "The Echo" — Forgetting io_Data Lifetime
When you send a `CMD_WRITE` to the narrator device, the `io_Data` pointer must remain valid until the I/O completes. Using a stack buffer with `DoIO()` is fine (blocking); using `SendIO()` (asynchronous) with a stack buffer is not:
```c
/* BAD: Stack buffer with async I/O */
void SpeakAsync(STRPTR text)
{
    UBYTE phonBuf[512];   /* stack: invalid once SpeakAsync() returns */

    Translate(text, strlen(text), (STRPTR)phonBuf, 512);
    voiceIO->message.io_Command = CMD_WRITE;
    voiceIO->message.io_Data    = phonBuf;
    voiceIO->message.io_Length  = strlen((STRPTR)phonBuf);
    SendIO((struct IORequest *)voiceIO);   /* returns at once; narrator is still reading phonBuf */
}

/* CORRECT: Static (or allocated) buffer that outlives the request */
static UBYTE phonBuf[2048];

void SpeakAsync(STRPTR text)
{
    Translate(text, strlen(text), (STRPTR)phonBuf, sizeof(phonBuf));
    voiceIO->message.io_Command = CMD_WRITE;
    voiceIO->message.io_Data    = phonBuf;
    voiceIO->message.io_Length  = strlen((STRPTR)phonBuf);
    SendIO((struct IORequest *)voiceIO);
    /* Do not reuse phonBuf until WaitIO() reports completion,
       or the request has been aborted with AbortIO() + WaitIO() */
}
```
---
## FAQ
**Q: Can I use translator.library without narrator.device?**
Yes. The phonetic output is a plain ASCII string — you can save it, analyze it, send it over a network, or use it as input to a custom speech synthesizer. The translator and narrator are independent.
**Q: Why does the same word sometimes translate differently?**
The translator uses **context-sensitive** rules. The pronunciation of "read" depends on surrounding tense markers; "record" as a noun vs. verb gets different stress. The same word in different sentences may produce different phonetics — this is correct behavior.
**Q: How do I make the narrator sound female?**
Set `voiceIO->sex = 1` (female). This adjusts formant frequencies and baseline pitch. For manual fine-tuning, adjust `voiceIO->pitch` (160–220 Hz for female) and `voiceIO->F1adj` through `F3adj` (formant shifts).
**Q: Can translator.library handle multiple languages?**
No. The rule set and exception dictionary are English-only. German, French, or other languages will be treated as misspelled English and produce garbled phonetics. Use locale-specific TTS solutions for non-English speech.
**Q: How big does the output buffer really need to be?**
Empirically, 4× the input length plus a 512-byte safety margin. Even a very long word such as "supercalifragilisticexpialidocious" (34 input characters) yields roughly 80 characters of phonetics. A typical sentence expands 2.5–3×.
**Q: Does Translate() handle punctuation?**
Yes. Punctuation marks (`.`, `,`, `?`, `!`, `;`, `:`) are passed through to the phonetic output. The narrator device interprets them as prosody cues: `.` = falling intonation, `?` = rising intonation.
---
## Use-Case Cookbook
### 1. Simple One-Shot Speech
The blocking pattern — suitable for alert messages, game notifications, short announcements:
```c
struct Library *TranslatorBase;   /* global: the Translate() stubs look it up by name */

void Say(STRPTR english)
{
    static UBYTE chans[] = { 3, 5, 10, 12 };   /* left+right channel pairs */
    static UBYTE phonBuf[2048];                /* static: too big for a small stack; not reentrant */
    struct MsgPort *mp;
    struct narrator_rb *vio;

    TranslatorBase = OpenLibrary("translator.library", 0);
    if (!TranslatorBase) return;

    if (Translate(english, strlen(english),
                  (STRPTR)phonBuf, sizeof(phonBuf)) == 0
        && (mp = CreatePort(NULL, 0)) != NULL)
    {
        vio = (struct narrator_rb *)CreateExtIO(mp, sizeof(struct narrator_rb));
        if (vio)
        {
            if (OpenDevice("narrator.device", 0, (struct IORequest *)vio, 0) == 0)
            {
                vio->rate     = 150;
                vio->pitch    = 110;
                vio->volume   = 64;
                vio->sampfreq = 22200;
                vio->ch_masks = chans;
                vio->nm_masks = sizeof(chans);
                vio->message.io_Command = CMD_WRITE;
                vio->message.io_Data    = phonBuf;
                vio->message.io_Length  = strlen((STRPTR)phonBuf);
                DoIO((struct IORequest *)vio);
                CloseDevice((struct IORequest *)vio);
            }
            DeleteExtIO((struct IORequest *)vio);
        }
        DeletePort(mp);
    }
    CloseLibrary(TranslatorBase);
}

/* Usage: */
Say("Game over. Insert coin to continue.");
```
### 2. Animated Talking Head (with Mouth Shapes)
The narrator can generate mouth width/height data while speaking:
```c
/* Two I/O requests: one writes speech, one reads mouth shapes */
struct narrator_rb *voiceIO = /* ... */;
struct mouth_rb *mouthIO = (struct mouth_rb *)
    CreateExtIO(mp, sizeof(struct mouth_rb));

/* Enable mouth shape generation */
voiceIO->mouths = 1;   /* non-zero = generate mouth data */

/* The read request must target the same device and unit as the write */
mouthIO->voice.message.io_Device  = voiceIO->message.io_Device;
mouthIO->voice.message.io_Unit    = voiceIO->message.io_Unit;
mouthIO->voice.message.io_Command = CMD_READ;
mouthIO->voice.message.io_Error   = 0;
mouthIO->width  = 0;
mouthIO->height = 0;

/* Start speaking asynchronously */
voiceIO->message.io_Command = CMD_WRITE;
voiceIO->message.io_Data    = phonBuf;
voiceIO->message.io_Length  = strlen((STRPTR)phonBuf);
SendIO((struct IORequest *)voiceIO);

/* Each CMD_READ returns when the mouth shape changes; it fails with
   ND_NoWrite once the write has finished */
for (DoIO((struct IORequest *)mouthIO);
     mouthIO->voice.message.io_Error != ND_NoWrite;
     DoIO((struct IORequest *)mouthIO))
{
    /* mouthIO->width and mouthIO->height describe the current shape */
    AnimateMouth(mouthIO->width, mouthIO->height);   /* application-defined */
}
WaitIO((struct IORequest *)voiceIO);   /* collect the finished write */
```
### 3. Progressive Translation of Long Text
For documents or long-form text where a single 2 KB buffer won't suffice:
```c
LONG TranslateLongText(STRPTR text, LONG totalLen, BPTR outputFH)
{
    UBYTE phonBuf[2048];
    LONG offset = 0;

    while (offset < totalLen)
    {
        LONG bytesAvail = totalLen - offset;
        LONG rtn = Translate(text + offset, bytesAvail,
                             (STRPTR)phonBuf, sizeof(phonBuf));
        if (rtn == 0)
        {
            /* Final chunk: write and finish */
            LONG phonLen = strlen((STRPTR)phonBuf);
            Write(outputFH, phonBuf, phonLen);
            break;
        }
        else if (rtn < 0)
        {
            /* Write completed portion, resume at word boundary */
            LONG phonLen = strlen((STRPTR)phonBuf);
            Write(outputFH, phonBuf, phonLen);
            offset += (-rtn);
        }
        else
        {
            /* unexpected error */
            return rtn;
        }
    }
    return 0;
}
```
---
## Modern Analogies
| Amiga Concept | Modern Equivalent | Why It Maps | Where It Diverges |
|---|---|---|---|
| **translator.library** | macOS `NSSpeechSynthesizer` / Windows SAPI Text-to-Speech | Both accept English text and produce speech. The API philosophy — text in, audio out — is identical | Modern APIs bundle translation and synthesis; Amiga splits them into library (translate) and device (speak) |
| **ARPABET phonemes** | IPA (International Phonetic Alphabet) | Both encode pronunciation as discrete symbols. ARPABET is an ASCII transcription scheme covering the phonemes of American English | ARPABET is English-only; IPA is universal. ARPABET uses ASCII, IPA uses Unicode |
| **450 context-sensitive rules** | Modern TTS neural networks (Tacotron, FastSpeech) | Both map spelling to pronunciation; the rules are a 1985 hand-crafted "model" | Neural TTS requires gigabytes of training data; rule-based works with zero training |
| **narrator.device formant synthesis** | Vocaloid / singing synthesis | Both use formant models (F0, F1, F2...) to generate vocal sounds | Narrator.device is a 1985-era 8-bit formant synth; Vocaloid uses concatenative sampling + ML |
| **`Say` command / `speak:` handler** | `say` command on macOS / `espeak` on Linux | Both provide command-line text-to-speech | Amiga `Say` feeds translator.library → narrator.device; macOS `say` uses a system-wide speech server |
---
## References
- ADCD 2.1: *ROM Kernel Reference Manual: Libraries* — Chapter 36: Translator Library
- ADCD 2.1: *ROM Kernel Reference Manual: Devices* — Chapter 8: Narrator Device
- NDK 3.9: `devices/narrator.h` (`narrator_rb` and `mouth_rb` structures)
- NDK 3.9: `clib/translator_protos.h` (`Translate()` prototype)
- AmigaOS Documentation Wiki: [Narrator Device](https://wiki.amigaos.net/wiki/Narrator_Device) — complete phoneme table and phonetic writing guide
- AmigaOS Documentation Wiki: [Translator Library](https://wiki.amigaos.net/wiki/Translator_Library) — OS 4.x interface reference
- See also: [audio.md](../10_devices/audio.md) — audio.device DMA channel allocation used by narrator
- See also: [iffparse.md](iffparse.md) — IFF FTXT parsing (the AmigaGuide format sometimes wraps speech metadata in IFF chunks)