Speech Providers
To get started with speech, see How to Read Messages Aloud and Dictate Questions. For the design rationale behind multiple providers, see Speech in Daneel.
Text-to-speech providers
Daneel supports three TTS providers. All implement the same interface, so switching is a single click in Settings > Speech.
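Because every provider implements the same interface, switching can be thought of as swapping one object behind a stable contract. The following is a minimal sketch of what such a contract might look like. `TtsProvider`, `SpeakOptions`, `ProviderRegistry`, and all method names are hypothetical and are not taken from Daneel's source; only the provider ids `web-speech` and `kokoro` come from this page.

```typescript
// Hypothetical contract: names are illustrative, not Daneel's actual API.
interface SpeakOptions {
  voice?: string; // provider-specific voice id
  rate?: number; // playback speed multiplier
}

interface TtsProvider {
  readonly id: string;
  speak(text: string, options?: SpeakOptions): Promise<void>;
  cancel(): void;
}

class ProviderRegistry {
  private readonly providers = new Map<string, TtsProvider>();
  private activeId: string | null = null;

  register(provider: TtsProvider): void {
    this.providers.set(provider.id, provider);
    // Assumption: the first registered provider acts as the default.
    if (this.activeId === null) this.activeId = provider.id;
  }

  // Cancels playback on the outgoing provider, then flips the active id.
  switchTo(id: string): void {
    if (!this.providers.has(id)) throw new Error(`Unknown TTS provider: ${id}`);
    const current = this.active();
    if (current !== undefined) current.cancel();
    this.activeId = id;
  }

  active(): TtsProvider | undefined {
    return this.activeId === null ? undefined : this.providers.get(this.activeId);
  }
}
```

Callers only ever talk to `registry.active()`, so the rest of the UI would not need to know which provider is selected.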
System voices (default)
Uses the browser’s built-in Speech Synthesis API. The voice catalog is whatever your operating system provides.
| Property | Value |
|---|---|
| Provider id | web-speech |
| Data residency | On-device, except optional cloud voices |
| Download size | 0 MB |
| Internet required | No (except for optional cloud voices) |
| Languages | All voices your OS provides |
| Streaming start | Instant |
| Cancellation latency | ~100 ms |
Chrome exposes a subset of voices named Google <language> that stream text to Google servers for higher-quality prosody. Daneel filters these by default. Flip Settings > Speech > Advanced > Allow Google cloud voices to expose them. They are clearly marked (cloud) in the voice list.
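The default filtering described above could be sketched roughly as follows. This assumes cloud voices are recognised purely by the `Google ` name prefix; `VoiceInfo`, `isGoogleCloudVoice`, and `buildVoiceList` are hypothetical names, and Daneel's real detection logic may differ.

```typescript
// Structural subset of the browser's SpeechSynthesisVoice, so the sketch runs anywhere.
interface VoiceInfo {
  name: string;
  lang: string;
  localService: boolean;
}

interface VoiceListEntry {
  label: string;
  voice: VoiceInfo;
}

// Assumption: a "Google " name prefix identifies a cloud-backed voice.
function isGoogleCloudVoice(voice: VoiceInfo): boolean {
  return voice.name.startsWith("Google ");
}

// Hides cloud voices unless allowed; marks the ones that remain with "(cloud)".
function buildVoiceList(voices: VoiceInfo[], allowCloud: boolean): VoiceListEntry[] {
  return voices
    .filter((v) => allowCloud || !isGoogleCloudVoice(v))
    .map((v) => ({
      label: isGoogleCloudVoice(v) ? `${v.name} (cloud)` : v.name,
      voice: v,
    }));
}
```

In the browser, the input array would come from `speechSynthesis.getVoices()`.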
Kokoro 82M (local)
A neural TTS model running entirely in your browser on WebGPU. 82 million parameters, 54 voices, seven languages.
| Property | Value |
|---|---|
| Provider id | kokoro |
| Data residency | On-device |
| Download size | ~326 MB (one-time) |
| Internet required | First download only |
| Languages | en-US, en-GB, es, fr, it, hi, ja, zh |
| Quantization (dtype) | fp32 (Xenova reference config for WebGPU) |
| Sample rate | 24 kHz mono |
| Cache location | Browser Cache API (transformers-cache + kokoro-voices) |
The 54-voice list is split by locale and gender. High-quality voices are marked with emoji in kokoro-js’s voice table (Heart ❤️, Bella 🔥, Nicole 🎧, Emma 🚺, George 🚹).
Voice style files are fetched on first use per voice and cached separately under the kokoro-voices Cache API bucket.
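Kokoro's published voice ids appear to encode locale and gender in a two-letter prefix (for example `af_heart` or `bm_george`). The sketch below groups ids on that assumption. The prefix table covers only the locales listed on this page, and `parseKokoroVoiceId` and `groupKokoroVoices` are hypothetical helpers, not part of kokoro-js.

```typescript
type Gender = "female" | "male";

interface KokoroVoice {
  id: string;
  locale: string;
  gender: Gender;
  name: string;
}

// Assumed mapping from the first id letter to the locales listed in the table above.
const LOCALE_BY_PREFIX: Record<string, string | undefined> = {
  a: "en-US", b: "en-GB", e: "es", f: "fr", i: "it", h: "hi", j: "ja", z: "zh",
};

// Returns null for ids that do not follow the assumed "<locale><gender>_<name>" shape.
function parseKokoroVoiceId(id: string): KokoroVoice | null {
  const match = /^([a-z])([fm])_([a-z0-9]+)$/.exec(id);
  if (match === null) return null;
  const [, prefix = "", sex = "", name = ""] = match;
  const locale = LOCALE_BY_PREFIX[prefix];
  if (locale === undefined) return null;
  return { id, locale, gender: sex === "f" ? "female" : "male", name };
}

// Buckets voices under keys such as "en-US/female".
function groupKokoroVoices(ids: string[]): Map<string, KokoroVoice[]> {
  const groups = new Map<string, KokoroVoice[]>();
  for (const id of ids) {
    const voice = parseKokoroVoiceId(id);
    if (voice === null) continue; // skip ids outside the assumed convention
    const key = `${voice.locale}/${voice.gender}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(voice);
    groups.set(key, bucket);
  }
  return groups;
}
```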
Moonshine (coming soon)
Placeholder provider. Catalog entries exist in the provider picker, but the card remains disabled. When the provider class ships, it will add local speech recognition with the same privacy guarantees as Kokoro.
Speech-to-text providers
Browser speech recognition (default)
Uses Chrome’s built-in SpeechRecognition API. Audio streams to Google servers for transcription.
| Property | Value |
|---|---|
| Provider id | web-speech |
| Data residency | Third-party cloud (Google) |
| Download size | 0 MB |
| Internet required | Yes |
| Languages | Any BCP-47 tag supported by Chrome |
| Offline Mode behavior | Blocked; the mic button is disabled and shows a tooltip |
Set the recognition language in Settings > Speech > Speech recognition > Language. The default is en-US.
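As a rough illustration of how the language setting might reach the recognizer, the helper below applies a BCP-47 tag to a recognition object and falls back to `en-US` when the tag is empty. `RecognitionLike` is a structural subset of the browser's SpeechRecognition object (its `lang`, `continuous`, and `interimResults` properties are real), while `configureRecognition` and the chosen flag values are assumptions, not Daneel's actual configuration.

```typescript
// Structural subset of SpeechRecognition, so the helper can run without a browser.
interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
}

const DEFAULT_RECOGNITION_LANG = "en-US";

function configureRecognition<T extends RecognitionLike>(recognition: T, lang?: string): T {
  const tag = lang === undefined ? "" : lang.trim();
  recognition.lang = tag.length > 0 ? tag : DEFAULT_RECOGNITION_LANG;
  recognition.continuous = false; // assumption: one utterance per mic press
  recognition.interimResults = true; // assumption: show partial text while listening
  return recognition;
}
```

In Chrome, the object passed in would typically be created with `new webkitSpeechRecognition()`.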
Moonshine Base / Tiny (coming soon)
Two sizes of a local English speech recognition model. Catalog entries exist; the provider classes are pending.
| Variant | Download | Use case |
|---|---|---|
| Moonshine Base | ~120 MB | Best accuracy |
| Moonshine Tiny | ~55 MB | Low-end devices |
Settings reference
All speech controls live under Settings > Speech, split into two sections.
Text-to-speech section
| Control | Values | Default |
|---|---|---|
| Enabled | on / off | on |
| Provider | System voices / Kokoro 82M | System voices |
| Voice | provider-specific list | provider default |
| Speed | 0.5× to 2.0× | 1.0× |
| Auto-read responses | on / off | off |
| Allow Google cloud voices | on / off | off (advanced) |
The voice picker updates based on the active provider. Kokoro’s picker is populated after the model is cached; before that, the card shows a Download button instead of the picker.
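The text-to-speech controls and defaults from the table can be expressed as a typed settings object. This is a sketch only: the field names, `DEFAULT_TTS_SETTINGS`, `clampSpeed`, and `withOverrides` are hypothetical, and the actual storage shape in Daneel is not documented here.

```typescript
interface TtsSettings {
  enabled: boolean;
  provider: "web-speech" | "kokoro";
  voice: string | null; // null stands for "use the provider default"
  speed: number;
  autoReadResponses: boolean;
  allowGoogleCloudVoices: boolean;
}

// Defaults copied from the settings table.
const DEFAULT_TTS_SETTINGS: TtsSettings = {
  enabled: true,
  provider: "web-speech",
  voice: null,
  speed: 1.0,
  autoReadResponses: false,
  allowGoogleCloudVoices: false,
};

const MIN_SPEED = 0.5;
const MAX_SPEED = 2.0;

// Keeps speed inside the documented 0.5x to 2.0x range; non-finite input falls back to the default.
function clampSpeed(value: number): number {
  if (!Number.isFinite(value)) return DEFAULT_TTS_SETTINGS.speed;
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, value));
}

// Merges user overrides onto the defaults and re-validates speed.
function withOverrides(overrides: Partial<TtsSettings>): TtsSettings {
  const merged = { ...DEFAULT_TTS_SETTINGS, ...overrides };
  return { ...merged, speed: clampSpeed(merged.speed) };
}
```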
Speech recognition section
| Control | Values | Default |
|---|---|---|
| Enabled | on / off | on |
| Provider | Browser speech recognition / Moonshine Base / Moonshine Tiny | Browser speech recognition |
| Recognition language | BCP-47 tag | en-US |
Keyboard shortcut
Alt+Space toggles dictation from anywhere on the page. The shortcut is registered via the toggle-stt Chrome extension command and can be reassigned at chrome://extensions/shortcuts.
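A listener for the `toggle-stt` command might look roughly like the sketch below. `chrome.commands.onCommand.addListener` is the standard extension API; the ambient `chrome` declaration is a minimal stand-in for the real type package, and `handleCommand` and `dictationActive` are hypothetical placeholders for however Daneel actually toggles dictation.

```typescript
// Minimal ambient declaration so the sketch type-checks without the Chrome type package.
declare const chrome:
  | { commands?: { onCommand: { addListener(cb: (command: string) => void): void } } }
  | undefined;

const TOGGLE_STT_COMMAND = "toggle-stt";

// Returns true when the command was recognised and the toggle callback ran.
function handleCommand(command: string, toggleDictation: () => void): boolean {
  if (command !== TOGGLE_STT_COMMAND) return false;
  toggleDictation();
  return true;
}

// Placeholder state; a real extension would drive the mic button instead.
let dictationActive = false;

// Only registers inside an extension context where chrome.commands exists.
if (typeof chrome !== "undefined" && chrome.commands) {
  chrome.commands.onCommand.addListener((command) => {
    handleCommand(command, () => {
      dictationActive = !dictationActive;
    });
  });
}
```

Keeping the command check in a plain function makes it easy to exercise without loading the extension.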
UI affordances
- Play button: appears in the hover action row on every assistant message, between Copy and Delete. Flips to Stop when that message is playing.
- Mic button: appears in the chat composer next to Send. Four states: idle (grey), requesting-permission (amber, pulsing), listening (red, pulsing), transcribing (amber, static).
- Test button: next to the voice picker in Settings. Plays a short sample of the currently selected voice at the current rate.
- Cloud badge: the (cloud) suffix on voice list entries indicates a voice that streams text to a remote service. Visible only when Allow Google cloud voices is enabled.
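The four mic button states lend themselves to a small lookup table. The sketch below encodes the colours and pulsing behaviour listed above; the state names follow this page, while `MIC_APPEARANCE`, `micClassName`, and the CSS class names are hypothetical.

```typescript
type MicState = "idle" | "requesting-permission" | "listening" | "transcribing";

interface MicAppearance {
  color: "grey" | "amber" | "red";
  pulsing: boolean;
}

// One entry per documented state; the Record type forces all four to be present.
const MIC_APPEARANCE: Record<MicState, MicAppearance> = {
  idle: { color: "grey", pulsing: false },
  "requesting-permission": { color: "amber", pulsing: true },
  listening: { color: "red", pulsing: true },
  transcribing: { color: "amber", pulsing: false },
};

// Builds a class string such as "mic mic--red mic--pulsing" (class names are illustrative).
function micClassName(state: MicState): string {
  const { color, pulsing } = MIC_APPEARANCE[state];
  return pulsing ? `mic mic--${color} mic--pulsing` : `mic mic--${color}`;
}
```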
Privacy profiles
Each provider carries a PrivacyProfile consulted by the Offline Mode network gate.
| Provider | leavesProcess | leavesMachine | dataObservers |
|---|---|---|---|
| System voices (local) | true | false | browser-vendor |
| System voices (Google cloud) | true | true | browser-vendor |
| Kokoro 82M | false | false | none |
| Web Speech STT | true | true | browser-vendor |
| Moonshine (planned) | false | false | none |
When a provider’s profile has leavesMachine: true and Offline Mode is in effect, the network gate blocks the call, and the relevant UI affordance (mic button, cloud-voice playback) is disabled with a tooltip.
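The gate rule can be sketched as a pure function over the profile table. The field names `leavesProcess`, `leavesMachine`, and `dataObservers` come from the table above; the profile keys, `GateDecision`, `checkNetworkGate`, the tooltip text, and the choice to model `dataObservers` as a single string are all assumptions.

```typescript
type DataObserver = "none" | "browser-vendor";

interface PrivacyProfile {
  leavesProcess: boolean;
  leavesMachine: boolean;
  dataObservers: DataObserver; // assumption: a single value, as shown in the table
}

type ProfileKey = "system-voices-local" | "system-voices-cloud" | "kokoro" | "web-speech-stt";

// Values copied from the privacy profile table (shipped providers only).
const PRIVACY_PROFILES: Record<ProfileKey, PrivacyProfile> = {
  "system-voices-local": { leavesProcess: true, leavesMachine: false, dataObservers: "browser-vendor" },
  "system-voices-cloud": { leavesProcess: true, leavesMachine: true, dataObservers: "browser-vendor" },
  kokoro: { leavesProcess: false, leavesMachine: false, dataObservers: "none" },
  "web-speech-stt": { leavesProcess: true, leavesMachine: true, dataObservers: "browser-vendor" },
};

interface GateDecision {
  allowed: boolean;
  tooltip?: string;
}

// Blocks only when data would leave the machine while Offline Mode is on.
function checkNetworkGate(profile: PrivacyProfile, offlineMode: boolean): GateDecision {
  if (offlineMode && profile.leavesMachine) {
    return { allowed: false, tooltip: "Unavailable in Offline Mode" };
  }
  return { allowed: true };
}
```

Under this rule, local system voices stay usable in Offline Mode even though `leavesProcess` is true, because only `leavesMachine` is consulted.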
What Daneel never touches
- Raw audio waveforms are not persisted. Neither the PCM produced by Kokoro nor the audio captured by the mic is written to storage. Everything lives in memory for the duration of the playback or recording.
- Transcripts are not saved outside the chat message. When dictation completes, the text lands in the composer. If you do not send the message, nothing is stored.
- No telemetry includes speech content. The analytics catalog explicitly forbids logging transcripts, voice IDs the user typed, or error messages; only enums, booleans, durations, and character counts are emitted.
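One way to make the telemetry rule hard to violate is to give events a type that has no free-text fields at all. The sketch below reduces a dictation session to enums, booleans, a duration, and a character count. The event shape, `DictationTelemetry`, and `summarizeDictation` are hypothetical and do not reflect Daneel's actual analytics catalog.

```typescript
type DictationOutcome = "completed" | "cancelled" | "error";

// No free-form string fields: only enums, booleans, and numbers.
interface DictationTelemetry {
  provider: "web-speech";
  outcome: DictationOutcome;
  sent: boolean;
  durationMs: number;
  charCount: number;
}

function summarizeDictation(
  transcript: string,
  startedAtMs: number,
  endedAtMs: number,
  outcome: DictationOutcome,
  sent: boolean,
): DictationTelemetry {
  return {
    provider: "web-speech",
    outcome,
    sent,
    durationMs: Math.max(0, Math.round(endedAtMs - startedAtMs)),
    // Only the length is read (in UTF-16 code units); the text itself is never copied.
    charCount: transcript.length,
  };
}
```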