[← Home](../README.md) · [Libraries](README.md)

# translator.library — English-to-Phonetic Translation for Speech Synthesis

## Overview

`translator.library` is the front half of the Amiga's built-in text-to-speech pipeline: a single-function library that converts unrestricted English text into **phonetic strings**, the expanded ARPABET phoneme codes that `narrator.device` turns into speech through the Amiga's audio hardware. Introduced with AmigaOS 1.2 and distributed as a disk-based library in `LIBS:`, it encapsulates over 450 context-sensitive pronunciation rules, an exception dictionary for irregular words (through, though, cough), abbreviation expansion (Dr., Prof., lb.), and automatic content-word accentuation, all behind a single call: `Translate()`.

The output is a string of space-delimited phoneme codes with stress markers that can be passed directly to `narrator.device` via `CMD_WRITE`, stored for later playback, or analyzed for phonetic research. Hand-coded phonetics generally produce higher-quality speech, but `Translate()` is the only practical option when the input is arbitrary user text at runtime.

---

## Architecture

### The Amiga Speech Pipeline

```mermaid
flowchart LR
    subgraph INPUT["Input Layer"]
        ENG["English Text<br/>(ASCII)"]
        PHON["Hand-Coded<br/>Phonetic String"]
    end

    subgraph TRANSLATOR["translator.library"]
        TR["Translate()<br/>English → Phonetic"]
        RULES["450+ Context Rules<br/>Exception Dictionary<br/>Abbreviation Expansion"]
    end

    subgraph NARRATOR["narrator.device"]
        SYNTH["Speech Synthesizer<br/>Formant Model"]
        MOUTH["Mouth Shape<br/>Generator"]
    end

    subgraph OUTPUT["Output Layer"]
        AUDIO["audio.device<br/>DMA Audio Channels"]
        MOUTHDATA["mouth_rb<br/>Width/Height"]
    end

    ENG --> TR
    TR --> RULES
    RULES --> TR
    TR -->|"Phonetic String"| SYNTH
    PHON --> SYNTH
    SYNTH --> AUDIO
    SYNTH --> MOUTH
    MOUTH --> MOUTHDATA

    style TR fill:#e8f4fd,stroke:#2196f3,color:#333
    style SYNTH fill:#fff3e0,stroke:#ff9800,color:#333
```

### Library Base

| Name | Type | Description |
|---|---|---|
| `TranslatorBase` | `struct Library *` | Library base pointer returned by `OpenLibrary()` |
| `ITranslator` | `struct TranslatorIFace *` (OS 4.x+) | Interface-based access for AmigaOS 4+ |

`translator.library` is a **disk-based** library: it lives in `LIBS:translator.library`, not in ROM. `OpenLibrary()` can therefore fail if the file is missing, and once no task holds the library open it can be expunged from memory under low-memory conditions.

### Key Design Decisions

| Decision | Rationale |
|---|---|
| **Single-function API** | Translation is inherently stateless — input text, output phonetics. No session, no configuration |
| **Disk-based, not ROM** | Phonetic dictionary is large (~20+ KB of rules); keeping it out of ROM saves Kickstart space |
| **Negative return codes for overflow** | Allows progressive translation of long texts without pre-allocating a huge buffer |
| **Rule-based, not neural** | 1985 technology couldn't run a neural TTS; the 450 context-sensitive rules were state-of-the-art for the era |

---

## API Reference

### Opening and Closing

```c
/* Classic AmigaOS (1.x–3.x) */
struct Library *TranslatorBase;

TranslatorBase = OpenLibrary("translator.library", 0);
if (TranslatorBase)
{
    /* ... use Translate() ... */

    CloseLibrary(TranslatorBase);
}
else
{
    /* LIBS:translator.library not found */
}
```

```c
/* AmigaOS 4.x — Interface-based */
struct Library *TranslatorBase;
struct TranslatorIFace *ITranslator;

TranslatorBase = IExec->OpenLibrary("translator.library", 0);
if (TranslatorBase)
{
    ITranslator = (struct TranslatorIFace *)
        IExec->GetInterface(TranslatorBase, "main", 1, NULL);
    if (ITranslator)
    {
        /* ... use ITranslator->Translate() ... */

        IExec->DropInterface((struct Interface *)ITranslator);
    }
    IExec->CloseLibrary(TranslatorBase);
}
```

### Translate()

```c
/* LVO -30 — Converts English text to phonetic string */
LONG Translate(STRPTR input,       /* a0: English input string */
               LONG   inputLen,    /* d0: length of input */
               STRPTR output,      /* a1: output buffer for phonetics */
               LONG   outputSize   /* d1: size of output buffer */);
```

| Parameter | Description |
|---|---|
| `input` | English ASCII string. Only the first `inputLen` characters are read, so a null terminator is not required. Case-insensitive; punctuation is preserved where it affects pronunciation |
| `inputLen` | Number of characters to translate from `input`. Use `strlen(input)` for a full null-terminated string |
| `output` | Pre-allocated buffer to receive the phonetic string. **Must be large enough** — phonetics are typically 2–4× the input length |
| `outputSize` | Size of the output buffer in bytes |

**Return value:**

| Return | Meaning |
|---|---|
| `0` | Full translation succeeded; output buffer was large enough |
| **Negative** value | Buffer overflow — translation stopped at a word boundary. `-(rtnCode)` is the character offset in the input string where translation ended. Consume the contents of `output` first, then resume by calling `Translate(input + offset, inputLen - offset, output, outputSize)` |
| Other non-zero | Translation error (unlikely — the library tries to translate literally if rules fail) |

> [!NOTE]
> The negative return value always stops at a **word boundary** (space or punctuation), not mid-word. This prevents split phonemes and makes resumption seamless.

### Output Format

The output is a space-delimited string of **ARPABET phoneme codes** with **stress markers** appended to vowels:

```
Input:  "This is Amiga speaking."
Output: "DH IH1 Z IH1 Z AE1 M IH0 G AH0 S P IY1 K IH0 NG ."

"This"     → DH IH1 Z
"is"       → IH1 Z
"Amiga"    → AE1 M IH0 G AH0
"speaking" → S P IY1 K IH0 NG
```

| Marker | Meaning | Example |
|---|---|---|
| `0` | No stress (unstressed vowel) | `IH0` = unstressed "i" (as in "rabbit") |
| `1` | Primary stress | `IY1` = stressed "ee" (as in "speak") |
| `2` | Secondary stress | `OW2` = secondary "oh" (as in "overflow") |
| `3` | Emphatic stress (rare) | Used for contrastive emphasis |

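For code that post-processes the phonetic string (analysis, custom synthesis), each space-delimited token can be split into a phoneme code and an optional stress digit. The following is a minimal sketch in portable C, assuming the token layout shown in the table above (letters followed by at most one trailing digit). `split_phoneme()` is a hypothetical helper, not part of `translator.library`; a caller would tokenize the output on spaces and pass each token in turn, and tokens without a digit (consonants, punctuation) report a stress of `-1`.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: splits one token such as "IY1" into its phoneme
 * code ("IY") and stress digit (1). Tokens without a trailing digit
 * report stress -1. Returns 0 on success, -1 if the token is empty
 * (or is only a digit) or the code does not fit in code_size bytes. */
int split_phoneme(const char *token, char *code, size_t code_size, int *stress)
{
    size_t len = strlen(token);

    *stress = -1;
    if (len > 0 && isdigit((unsigned char)token[len - 1])) {
        *stress = token[len - 1] - '0';
        len--;                          /* drop the digit from the code */
    }
    if (len == 0 || len + 1 > code_size)
        return -1;
    memcpy(code, token, len);
    code[len] = '\0';
    return 0;
}
```
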

---

## Phonetic Output Examples

| English Input | Phonetic Output (approx.) |
|---|---|
| `Hello world.` | `HH EH0 L OW1 W ER1 L D .` |
| `The quick brown fox.` | `DH AH0 K W IH1 K B R AW1 N F AA1 K S .` |
| `Amiga` | `AE1 M IH0 G AH0` or `AH0 M IY1 G AH0` (both valid) |
| `Commodore` | `K AA1 M AH0 D AO1 R` |
| `Guru Meditation` | `G UH1 R UW0 M EH2 D IH0 T EY1 SH AH0 N` |

> [!WARNING]
> The translator library was designed for **American English** pronunciation. British spellings (colour, centre) and non-English words will be translated using American phonetic rules and may sound odd.

---

## Integration with narrator.device

The standard workflow:

```c
#include <string.h>
#include <devices/narrator.h>
#include <clib/alib_protos.h>        /* CreatePort(), CreateExtIO() */
#include <clib/exec_protos.h>
#include <clib/translator_protos.h>

#define PHONBUF_SIZE 2048

/* Audio allocation map: each mask pairs one left and one right channel */
static UBYTE audioChans[] = { 3, 5, 10, 12 };

/* 1. Open translator */
struct Library *TranslatorBase = OpenLibrary("translator.library", 0);

/* 2. Open narrator device (CreateExtIO() returns NULL if the port is NULL) */
struct MsgPort *mp = CreatePort(NULL, 0);
struct narrator_rb *voiceIO = (struct narrator_rb *)
    CreateExtIO(mp, sizeof(struct narrator_rb));
BOOL devOpen = voiceIO &&
    OpenDevice("narrator.device", 0, (struct IORequest *)voiceIO, 0) == 0;

if (TranslatorBase && devOpen)
{
    /* 3. Translate English → phonetic */
    STRPTR english = "Welcome to the Amiga speech system.";
    UBYTE phonBuffer[PHONBUF_SIZE];
    LONG result = Translate(english, strlen(english),
                            (STRPTR)phonBuffer, PHONBUF_SIZE);

    if (result == 0)
    {
        /* 4. Configure voice parameters */
        voiceIO->rate     = 150;      /* words per minute */
        voiceIO->pitch    = 110;      /* Hz baseline */
        voiceIO->sex      = 0;        /* 0=male, 1=female */
        voiceIO->volume   = 64;       /* 0–64 */
        voiceIO->sampfreq = 22200;    /* Hz (narrator default) */
        voiceIO->ch_masks = audioChans;
        voiceIO->nm_masks = sizeof(audioChans);

        /* 5. Send to narrator */
        voiceIO->message.io_Command = CMD_WRITE;
        voiceIO->message.io_Data    = phonBuffer;
        voiceIO->message.io_Length  = strlen((STRPTR)phonBuffer);
        DoIO((struct IORequest *)voiceIO);
    }
}

/* 6. Cleanup: release only what was actually acquired */
if (devOpen)        CloseDevice((struct IORequest *)voiceIO);
if (voiceIO)        DeleteExtIO((struct IORequest *)voiceIO);
if (mp)             DeletePort(mp);
if (TranslatorBase) CloseLibrary(TranslatorBase);
```

---

## When to Use / When NOT to Use

| Scenario | Use `Translate()`? | Rationale |
|---|---|---|
| **Unrestricted user input** (text editor, terminal, chat) | ✅ Yes | Only practical option — you can't pre-code phonetics for arbitrary text |
| **Fixed application strings** (game dialog, error messages) | ❌ No | Hand-code phonetics once; ship the phonetic strings. Much better quality |
| **Accessibility screen reader** | ✅ Yes | Essential — must speak whatever is on screen |
| **Demo/game with iconic lines** | ❌ No | Hand-tune phonetics, stress, and timing for maximum impact |
| **Multi-language support** | ❌ No | translator.library is English-only; use a third-party TTS or pre-recorded samples |
| **Phonetic research/analysis** | ⚠️ Maybe | Output is useful for analysis but not linguistically rigorous — use as a starting point |
| **Speaking numbers/dates** | ⚠️ Maybe | Library handles some abbreviations but not all; pre-process complex formats into spelled-out words |

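The pre-processing suggested for numbers can be as simple as spelling out digits before the text reaches `Translate()`. The sketch below is one possible approach in portable C, not something the library provides: `spell_digits()` is a hypothetical helper that reads digits one at a time ("42" becomes "four two"), which suits part numbers and phone numbers; quantities and dates would need a fuller number-to-words routine. For example, `"Call 42 now."` becomes `"Call four two now."`.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copies src to dst, replacing each ASCII digit
 * with its English name and inserting spaces so that names never run
 * into neighbouring text. Returns 0 on success, -1 if dst (dst_size
 * bytes) is too small. */
int spell_digits(const char *src, char *dst, size_t dst_size)
{
    static const char *const names[10] = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };
    size_t used = 0;
    int after_digit = 0;

    if (dst_size == 0)
        return -1;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        char single[2] = { (char)c, '\0' };
        const char *piece;
        size_t need_sep, n;

        if (isdigit(c)) {
            piece = names[c - '0'];
            /* separate the name from whatever precedes it */
            need_sep = (used > 0 && dst[used - 1] != ' ');
        } else {
            piece = single;
            /* separate ordinary text from a preceding digit name */
            need_sep = (after_digit && c != ' ');
        }
        n = strlen(piece);
        if (used + need_sep + n + 1 > dst_size)
            return -1;
        if (need_sep)
            dst[used++] = ' ';
        memcpy(dst + used, piece, n);
        used += n;
        after_digit = isdigit(c) != 0;
    }
    dst[used] = '\0';
    return 0;
}
```
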

---

## Pitfalls & Common Mistakes

### 1. Underestimating Phonetic Buffer Size

The phonetic representation is **almost always longer** than the input English. With the usual 2–4× expansion, a 100-character sentence typically produces 200–400 bytes of phonetics:

```c
/* BAD: Buffer barely larger than the input, so it overflows on the first long word */
STRPTR english = "The extraordinarily complicated implementation...";
UBYTE phonBuf[64];
LONG result = Translate(english, strlen(english), (STRPTR)phonBuf, sizeof(phonBuf));
/* result will most likely be negative: "extraordinarily" alone needs ~40 chars */

/* CORRECT: Allocate 4× input length plus 512 bytes of headroom */
#define PHONBUF_SIZE(maxInput) (((maxInput) * 4) + 512)
ULONG phonSize = PHONBUF_SIZE(strlen(english));
UBYTE *phonBuf = AllocMem(phonSize, MEMF_ANY);
/* ... check for NULL, translate, then FreeMem(phonBuf, phonSize) ... */
```

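The sizing rule can also be written as a function that guards against arithmetic overflow when the input length comes from an untrusted source. This is a small sketch in portable C; `phon_buf_size()` is a hypothetical name, and the 4×-plus-512 figure is the rule of thumb from this section rather than a limit documented by the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: output buffer size for input_len bytes of English,
 * using the 4x-plus-512 rule of thumb. Returns 0 if the result would not
 * fit in a size_t. */
size_t phon_buf_size(size_t input_len)
{
    const size_t headroom = 512;

    if (input_len > (SIZE_MAX - headroom) / 4)
        return 0;                       /* 4 * input_len + 512 would wrap */
    return input_len * 4 + headroom;
}
```

A 100-character input yields 912 bytes; a return of 0 signals that the request should be rejected rather than passed to the allocator.
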
### 2. Ignoring Negative Return Code

A negative return from `Translate()` is a **resumption offset**, not a fatal error:

```c
/* BAD: Treats partial translation as failure */
LONG rtn = Translate(text, len, buf, size);
if (rtn != 0) { /* panic — but text was partially translated! */ }

/* CORRECT: Resume from offset on negative return */
LONG offset = 0;
while (offset < len)
{
    LONG rtn = Translate(text + offset, len - offset, buf, BUF_SIZE);
    if (rtn > 0) break;             /* unexpected error */

    /* buf holds valid phonetics here: speak or store it before the next pass */

    if (rtn == 0) break;            /* done */
    offset += (-rtn);               /* resume from word boundary */
}
```

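The resume protocol can be exercised off-target with a stand-in translator. The sketch below is portable C and makes several assumptions: `mock_translate()` is not the real library (it merely upper-cases text), it imitates the documented return convention (0 when done, negative offset at a word boundary when the buffer fills), and it returns a positive error when even one word cannot fit. `translate_all()` is a hypothetical driver showing that each chunk must be consumed before the next call overwrites it.

```c
#include <ctype.h>
#include <string.h>

/* Stand-in for Translate(), NOT the real library: "translates" by
 * upper-casing and copies only whole words that fit in out (with NUL). */
long mock_translate(const char *in, long in_len, char *out, long out_size)
{
    long pos = 0, used = 0;

    if (out_size < 1)
        return 1;
    while (pos < in_len) {
        long end = pos;
        /* one unit = a word plus the spaces that follow it */
        while (end < in_len && in[end] != ' ') end++;
        while (end < in_len && in[end] == ' ') end++;
        if (used + (end - pos) + 1 > out_size)
            break;                          /* next word will not fit */
        for (; pos < end; pos++)
            out[used++] = (char)toupper((unsigned char)in[pos]);
    }
    out[used] = '\0';
    if (pos == in_len)
        return 0;                           /* everything translated */
    return pos > 0 ? -pos : 1;              /* resume offset, or error if stuck */
}

/* Hypothetical driver: runs the resume loop with a deliberately tiny
 * chunk buffer and appends every chunk to dst (dst_size >= 1 bytes).
 * Returns 0 on success, 1 on translator error, 2 if dst is too small. */
int translate_all(const char *text, char *dst, size_t dst_size)
{
    char chunk[16];
    long len = (long)strlen(text), offset = 0;

    dst[0] = '\0';
    while (offset < len) {
        long rtn = mock_translate(text + offset, len - offset,
                                  chunk, (long)sizeof(chunk));
        if (rtn > 0)
            return 1;
        if (strlen(dst) + strlen(chunk) + 1 > dst_size)
            return 2;
        strcat(dst, chunk);                 /* consume before the next pass */
        if (rtn == 0)
            break;
        offset += -rtn;                     /* continue at the word boundary */
    }
    return 0;
}
```

With a 16-byte chunk buffer, `"the quick brown fox jumps"` takes two passes: the first stops after "quick " and returns -10, the second finishes the remaining 15 characters.
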
### 3. Passing Non-Null-Terminated Input with Wrong Length

If `inputLen` doesn't match the actual string, `Translate()` reads garbage or stops early:

```c
/* BAD: strlen() on a buffer that may not be null-terminated */
UBYTE buf[256];
Read(fh, buf, 256);    /* may fill entire buffer — no terminator */
Translate((STRPTR)buf, strlen((STRPTR)buf), out, 1024);
/* strlen() may read past the buffer! */

/* CORRECT: Use the explicit read count */
LONG actual = Read(fh, buf, 256);    /* -1 on error, 0 at end of file */
if (actual > 0)
    Translate((STRPTR)buf, actual, out, 1024);
```

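When no byte count is available and a buffer may or may not contain a terminator, a bounded scan avoids the overrun. A minimal sketch in portable C, where `bounded_len()` is a hypothetical helper built on the standard `memchr()`:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text in buf without ever reading
 * past cap bytes. Stops at the first NUL if one exists within cap,
 * otherwise reports cap. */
size_t bounded_len(const unsigned char *buf, size_t cap)
{
    const unsigned char *nul = memchr(buf, '\0', cap);

    return nul ? (size_t)(nul - buf) : cap;
}
```

The result is safe to pass as `inputLen`, since it never exceeds the buffer's capacity.
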
### 4. Not Checking for Missing Disk-Based Library

Unlike ROM libraries, `translator.library` may not be present:

```c
/* BAD: Assumes library is always available */
struct Library *TranslatorBase = OpenLibrary("translator.library", 0);
Translate("Hello", 5, buf, 512);    /* crash if TranslatorBase == NULL! */

/* CORRECT: Always check the return */
struct Library *TranslatorBase = OpenLibrary("translator.library", 0);
if (TranslatorBase)
{
    Translate("Hello", 5, buf, 512);
    CloseLibrary(TranslatorBase);
}
else
{
    Printf("Speech not available: translator.library missing\n");
}
```

---

## Named Antipatterns

### "The Mumbler" — Unrealistic Rate/Pitch

Setting `rate` or `pitch` to extreme values makes speech unintelligible. The translator is not at fault here: the phonetics are valid, but the narrator receiving them is misconfigured:

```c
/* BAD: Chipmunk speech */
voiceIO->rate  = 400;    /* 400 words/min — unintelligible */
voiceIO->pitch = 255;    /* extremely high pitch */

/* Sensible defaults: */
voiceIO->rate  = 150;    /* natural conversational speed */
voiceIO->pitch = 110;    /* male baseline (85–110 for male, 160–220 for female) */
voiceIO->sex   = 0;      /* 0=male, 1=female */
```

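When rate and pitch come from user preferences, clamping them before they reach the request keeps out-of-range values away from the device. The sketch below is portable C with hypothetical helper names; the limits are written from memory of `devices/narrator.h` (`MINRATE`/`MAXRATE`, `MINPITCH`/`MAXPITCH`, `MINVOL`/`MAXVOL`) and should be checked against the NDK in use. Legal extremes can still sound poor, so an application may prefer a narrower range of its own.

```c
/* Limits as recalled from devices/narrator.h (an assumption; verify
 * against your NDK before relying on them). */
enum {
    SK_MINRATE  = 40, SK_MAXRATE  = 400,    /* words per minute */
    SK_MINPITCH = 65, SK_MAXPITCH = 320,    /* Hz */
    SK_MINVOL   = 0,  SK_MAXVOL   = 64
};

/* Clamp value into the inclusive range [lo, hi]. */
static long clamp_long(long value, long lo, long hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

long clamp_rate(long wpm)   { return clamp_long(wpm, SK_MINRATE,  SK_MAXRATE);  }
long clamp_pitch(long hz)   { return clamp_long(hz,  SK_MINPITCH, SK_MAXPITCH); }
long clamp_volume(long vol) { return clamp_long(vol, SK_MINVOL,   SK_MAXVOL);   }
```
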
### "The Silent Speaker" — Mismatched Audio Allocation

The narrator device must allocate audio channels from `audio.device`. The Amiga has four of them (0 and 3 on the left, 1 and 2 on the right). If another application holds them all, `OpenDevice("narrator.device", ...)` still succeeds but speech may not be audible:

```c
/* BAD: No allocation map, no check on the result */
OpenDevice("narrator.device", 0, (struct IORequest *)voiceIO, 0);
/* Speech may be silent if audio channels are all in use */

/* CORRECT: Supply an allocation map of acceptable channel combinations */
UBYTE chanMasks[] = { 3, 5, 10, 12 };    /* channels 0+1, 0+2, 1+3, 2+3: one left, one right */
voiceIO->ch_masks = chanMasks;
voiceIO->nm_masks = sizeof(chanMasks);
/* After the write, check voiceIO->message.io_Error for an allocation failure */
```

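A channel mask is a bit field over the four hardware channels, so only bits 0 to 3 are meaningful, and a mask is audible on both speakers only if it names at least one left channel (0 or 3) and one right channel (1 or 2). The following sketch in portable C validates a mask before it is placed in an allocation map; `is_stereo_mask()` is a hypothetical helper, not a system call. It accepts `0x03` and `0x0C` but rejects `0x30` (bits beyond channel 3) and `0x09` (both channels on the left).

```c
/* Amiga audio channel bits: channels 0 and 3 feed the left output,
 * channels 1 and 2 the right. */
#define LEFT_CHANNELS  0x09u    /* bits 0 and 3 */
#define RIGHT_CHANNELS 0x06u    /* bits 1 and 2 */

/* Hypothetical helper: returns 1 if mask names only real channels
 * (bits 0-3) and includes at least one left and one right channel,
 * otherwise 0. */
int is_stereo_mask(unsigned mask)
{
    if (mask == 0 || (mask & ~0x0Fu) != 0)
        return 0;                   /* empty, or bits beyond channel 3 */
    return (mask & LEFT_CHANNELS) != 0 && (mask & RIGHT_CHANNELS) != 0;
}
```
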
### "The Echo" — Forgetting io_Data Lifetime

When you send a `CMD_WRITE` to the narrator device, the `io_Data` pointer must remain valid until the I/O completes. Using a stack buffer with `DoIO()` is fine (blocking); using `SendIO()` (asynchronous) with a stack buffer is not:

```c
/* BAD: Stack buffer with async I/O */
void SpeakAsync(STRPTR text)
{
    UBYTE phonBuf[512];    /* stack — disappears on return! */
    Translate(text, strlen(text), (STRPTR)phonBuf, 512);
    voiceIO->message.io_Data = phonBuf;
    SendIO((struct IORequest *)voiceIO);    /* async: phonBuf is gone once SpeakAsync() returns */
}

/* CORRECT: Allocate or use static buffer for async */
static UBYTE phonBuf[2048];    /* static storage: stays valid after return */
void SpeakAsync(STRPTR text)
{
    Translate(text, strlen(text), (STRPTR)phonBuf, sizeof(phonBuf));
    voiceIO->message.io_Command = CMD_WRITE;
    voiceIO->message.io_Data    = phonBuf;
    voiceIO->message.io_Length  = strlen((STRPTR)phonBuf);
    SendIO((struct IORequest *)voiceIO);
    /* Leave phonBuf untouched until WaitIO() confirms the request has completed */
}
```

---

## FAQ

**Q: Can I use translator.library without narrator.device?**

Yes. The phonetic output is a plain ASCII string — you can save it, analyze it, send it over a network, or use it as input to a custom speech synthesizer. The translator and narrator are independent.

**Q: Why does the same word sometimes translate differently?**

The translator uses **context-sensitive** rules. The pronunciation of "read" depends on surrounding tense markers; "record" as a noun vs. verb gets different stress. The same word in different sentences may produce different phonetics — this is correct behavior.

**Q: How do I make the narrator sound female?**

Set `voiceIO->sex = 1` (female). This adjusts formant frequencies and baseline pitch. For manual fine-tuning, adjust `voiceIO->pitch` (160–220 Hz for female) and `voiceIO->F1adj` through `F3adj` (formant shifts).

**Q: Can translator.library handle multiple languages?**

No. The rule set and exception dictionary are English-only. German, French, or other languages will be treated as misspelled English and produce garbled phonetics. Use locale-specific TTS solutions for non-English speech.

**Q: How big does the output buffer really need to be?**

Empirically, 4× the input length plus a 512-byte safety margin. Even a very long single word such as "supercalifragilisticexpialidocious" (34 input characters) produces roughly 80 characters of phonetics. A typical sentence expands 2.5–3×.

**Q: Does Translate() handle punctuation?**

Yes. Punctuation marks (`.`, `,`, `?`, `!`, `;`, `:`) are passed through to the phonetic output. The narrator device interprets them as prosody cues: `.` = falling intonation, `?` = rising intonation.

---

## Use-Case Cookbook

### 1. Simple One-Shot Speech

The blocking pattern — suitable for alert messages, game notifications, short announcements:

```c
void Say(STRPTR english)
{
    static UBYTE audioChans[] = { 3, 5, 10, 12 };    /* left+right channel pairs */
    UBYTE phonBuf[2048];

    struct Library *TranslatorBase = OpenLibrary("translator.library", 0);
    if (!TranslatorBase) return;

    LONG rtn = Translate(english, strlen(english),
                         (STRPTR)phonBuf, sizeof(phonBuf));
    if (rtn == 0)
    {
        struct MsgPort *mp = CreatePort(NULL, 0);
        struct narrator_rb *vio = (struct narrator_rb *)
            CreateExtIO(mp, sizeof(struct narrator_rb));    /* NULL if mp is NULL */

        if (vio && OpenDevice("narrator.device", 0, (struct IORequest *)vio, 0) == 0)
        {
            vio->rate     = 150;
            vio->pitch    = 110;
            vio->volume   = 64;
            vio->sampfreq = 22200;
            vio->ch_masks = audioChans;
            vio->nm_masks = sizeof(audioChans);

            vio->message.io_Command = CMD_WRITE;
            vio->message.io_Data    = phonBuf;
            vio->message.io_Length  = strlen((STRPTR)phonBuf);
            DoIO((struct IORequest *)vio);

            CloseDevice((struct IORequest *)vio);
        }
        if (vio) DeleteExtIO((struct IORequest *)vio);
        if (mp)  DeletePort(mp);
    }
    CloseLibrary(TranslatorBase);
}

/* Usage: */
Say("Game over. Insert coin to continue.");
```

### 2. Animated Talking Head (with Mouth Shapes)

The narrator can generate mouth width/height data while speaking:

```c
/* Two I/O requests: one for speech, one for mouth data */
struct narrator_rb *voiceIO = /* ... already opened on narrator.device ... */;
struct mouth_rb *mouthIO = (struct mouth_rb *)
    CreateExtIO(mp, sizeof(struct mouth_rb));

/* The read request must target the same device and unit as the write */
mouthIO->voice.message.io_Device = voiceIO->message.io_Device;
mouthIO->voice.message.io_Unit   = voiceIO->message.io_Unit;

/* Enable mouth shape generation */
voiceIO->mouths = 1;    /* non-zero = generate mouth data */

/* Send speech command */
voiceIO->message.io_Command = CMD_WRITE;
voiceIO->message.io_Data    = phonBuf;
voiceIO->message.io_Length  = strlen((STRPTR)phonBuf);
SendIO((struct IORequest *)voiceIO);

/* While speaking, read mouth shapes */
while (!CheckIO((struct IORequest *)voiceIO))
{
    mouthIO->voice.message.io_Command = CMD_READ;
    mouthIO->voice.message.io_Error   = 0;
    if (DoIO((struct IORequest *)mouthIO) != 0)
        break;    /* read failed, typically because the write has finished */

    /* mouthIO->width  = 0..255 (closed → wide open) */
    /* mouthIO->height = 0..255 (closed → wide open) */
    AnimateMouth(mouthIO->width, mouthIO->height);    /* application-supplied drawing routine */
}
WaitIO((struct IORequest *)voiceIO);    /* collect the completed write request */
```

### 3. Progressive Translation of Long Text

For documents or long-form text where a single 2 KB buffer won't suffice:

```c
LONG TranslateLongText(STRPTR text, LONG totalLen, BPTR outputFH)
{
    UBYTE phonBuf[2048];
    LONG offset = 0;

    while (offset < totalLen)
    {
        LONG bytesAvail = totalLen - offset;
        LONG rtn = Translate(text + offset, bytesAvail,
                             (STRPTR)phonBuf, sizeof(phonBuf));

        if (rtn == 0)
        {
            /* Final chunk — write and done */
            LONG phonLen = strlen((STRPTR)phonBuf);
            Write(outputFH, phonBuf, phonLen);
            break;
        }
        else if (rtn < 0)
        {
            /* Write completed portion, resume at word boundary */
            LONG phonLen = strlen((STRPTR)phonBuf);
            Write(outputFH, phonBuf, phonLen);
            offset += (-rtn);
        }
        else
        {
            /* unexpected error */
            return rtn;
        }
    }
    return 0;
}
```

---

## Modern Analogies

| Amiga Concept | Modern Equivalent | Why It Maps | Where It Diverges |
|---|---|---|---|
| **translator.library** | macOS `NSSpeechSynthesizer` / Windows SAPI Text-to-Speech | Both sit at the front of a text-to-speech path: English text goes in, speech comes out the other end | Modern APIs bundle translation and synthesis; Amiga splits them into library (translate) and device (speak) |
| **ARPABET phonemes** | IPA (International Phonetic Alphabet) | Both encode pronunciation as discrete symbols. ARPABET is an ASCII notation for the phonemes of American English | ARPABET is English-only; IPA is universal. ARPABET uses ASCII, IPA uses Unicode |
| **450 context-sensitive rules** | Modern TTS neural networks (Tacotron, FastSpeech) | Both map spelling to pronunciation; the rules are a 1985 hand-crafted "model" | Neural TTS requires gigabytes of training data; rule-based works with zero training |
| **narrator.device formant synthesis** | Vocaloid / singing synthesis | Both generate vocal sounds from a phonetic score plus pitch and timing parameters | Narrator.device is a 1985-era 8-bit formant synth; Vocaloid uses concatenative sampling + ML |
| **`Say` command / `speak:` handler** | `say` command on macOS / `espeak` on Linux | Both provide command-line text-to-speech | Amiga `Say` feeds translator.library → narrator.device; macOS `say` uses a system-wide speech server |

---

## References

- ADCD 2.1: *ROM Kernel Reference Manual: Libraries* — Chapter 36: Translator Library
- ADCD 2.1: *ROM Kernel Reference Manual: Devices* — Chapter 8: Narrator Device
- NDK 3.9: `devices/narrator.h` — `narrator_rb` and `mouth_rb` structures
- NDK 3.9: `clib/translator_protos.h` — `Translate()` prototype
- AmigaOS Documentation Wiki: [Narrator Device](https://wiki.amigaos.net/wiki/Narrator_Device) — complete phoneme table and phonetic writing guide
- AmigaOS Documentation Wiki: [Translator Library](https://wiki.amigaos.net/wiki/Translator_Library) — OS 4.x interface reference
- See also: [audio.md](../10_devices/audio.md) — audio.device DMA channel allocation used by narrator
- See also: [iffparse.md](iffparse.md) — IFF FTXT parsing (the AmigaGuide format sometimes wraps speech metadata in IFF chunks)