Speech Providers

To get started with speech, see How to Read Messages Aloud and Dictate Questions. For the design rationale behind multiple providers, see Speech in Daneel.

Text-to-speech providers

Daneel supports three TTS providers. All implement the same interface, so switching is a single click in Settings > Speech.
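
As a rough illustration of what a shared provider interface makes possible, the sketch below models providers behind one contract and switches between them with a registry lookup. The names `TtsProvider`, `TtsVoice`, and `TtsRegistry` are hypothetical and are not taken from Daneel's source; only the provider ids `web-speech` and `kokoro` come from this page.

```typescript
// Hypothetical shared contract; Daneel's real interface may differ.
type TtsProviderId = "web-speech" | "kokoro";

interface TtsVoice {
  id: string;
  label: string;
  lang: string; // BCP-47 tag such as "en-US"
}

interface TtsProvider {
  readonly id: TtsProviderId;
  listVoices(): Promise<TtsVoice[]>;
  speak(text: string, opts: { voiceId?: string; rate: number }): Promise<void>;
  cancel(): void;
}

// Registry keyed by provider id: switching providers is a map lookup.
class TtsRegistry {
  private providers = new Map<TtsProviderId, TtsProvider>();
  private activeId: TtsProviderId = "web-speech"; // documented default

  register(provider: TtsProvider): void {
    this.providers.set(provider.id, provider);
  }

  setActive(id: TtsProviderId): void {
    if (!this.providers.has(id)) {
      throw new Error(`Unknown TTS provider: ${id}`);
    }
    // Stop any playback on the outgoing provider before switching.
    this.providers.get(this.activeId)?.cancel();
    this.activeId = id;
  }

  active(): TtsProvider {
    const provider = this.providers.get(this.activeId);
    if (!provider) {
      throw new Error(`Provider not registered: ${this.activeId}`);
    }
    return provider;
  }
}
```

Because callers only ever talk to `active()`, the rest of the UI would not need to know which engine is producing audio.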

System voices (default)

Uses the browser’s built-in Speech Synthesis API. The voice catalog is whatever your operating system provides.

Property	Value
Provider id	`web-speech`
Data residency	On-device (except optional cloud voices)
Download size	0 MB
Internet required	No (except for optional cloud voices)
Languages	All voices your OS provides
Streaming start	Instant
Cancellation latency	~100 ms

Chrome exposes a subset of voices named Google <language> that stream text to Google servers for higher-quality prosody. Daneel filters these out by default. Enable Settings > Speech > Advanced > Allow Google cloud voices to show them. They are clearly marked (cloud) in the voice list.
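
The filtering described above could look roughly like the following sketch. It assumes that Chrome's cloud voices can be recognized by a name starting with "Google " together with `localService === false` on the voice object; that detection rule, and the helper names `isGoogleCloudVoice` and `visibleVoices`, are assumptions for illustration rather than Daneel's actual logic.

```typescript
// Minimal subset of the SpeechSynthesisVoice fields this sketch relies on.
interface VoiceInfo {
  name: string;
  lang: string;
  localService: boolean;
}

// Assumption: cloud voices are named "Google <language>" and are not local.
function isGoogleCloudVoice(voice: VoiceInfo): boolean {
  return voice.name.startsWith("Google ") && !voice.localService;
}

// Returns the labels to show in the voice list.
// Cloud voices are dropped unless allowed, and suffixed with "(cloud)" when shown.
function visibleVoices(voices: VoiceInfo[], allowCloud: boolean): string[] {
  return voices
    .filter((v) => allowCloud || !isGoogleCloudVoice(v))
    .map((v) => (isGoogleCloudVoice(v) ? `${v.name} (cloud)` : v.name));
}
```

In a page context, the input would typically come from `speechSynthesis.getVoices()`, whose entries carry the `name`, `lang`, and `localService` fields used here.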

Kokoro 82M (local)

A neural TTS model that runs entirely in your browser on WebGPU. It has 82 million parameters and 54 voices across seven languages (eight locales, counting en-US and en-GB separately).

Property	Value
Provider id	`kokoro`
Data residency	On-device
Download size	~326 MB (one-time)
Internet required	First download only
Languages	en-US, en-GB, es, fr, it, hi, ja, zh
Quantization (dtype)	fp32 (Xenova reference config for WebGPU)
Sample rate	24 kHz mono
Cache location	Browser Cache API (`transformers-cache` + `kokoro-voices`)

The 54-voice list is split by locale and gender. High-quality voices are marked with emoji in kokoro-js’s voice table (Heart ❤️, Bella 🔥, Nicole 🎧, Emma 🚺, George 🚹).

Voice style files are fetched on first use per voice and cached separately under the kokoro-voices Cache API bucket.
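
To see whether the two cache buckets exist, a check along these lines may be useful. It is a minimal sketch: the bucket names come from the table above, while `CacheStorageLike` and `kokoroCacheStatus` are hypothetical names. Note that a bucket existing does not prove the model finished downloading, so treat the result as a hint only.

```typescript
// Structural subset of the browser's CacheStorage, so the sketch can run anywhere.
interface CacheStorageLike {
  has(cacheName: string): Promise<boolean>;
}

// Bucket names as listed under "Cache location".
const KOKORO_CACHE_BUCKETS = ["transformers-cache", "kokoro-voices"] as const;

// Reports, per bucket, whether it currently exists in the given cache storage.
async function kokoroCacheStatus(
  storage: CacheStorageLike,
): Promise<Record<string, boolean>> {
  const status: Record<string, boolean> = {};
  for (const bucket of KOKORO_CACHE_BUCKETS) {
    status[bucket] = await storage.has(bucket);
  }
  return status;
}
```

In a browser, passing the global `caches` object as `storage` should work, since `CacheStorage.has()` returns a promise that resolves to a boolean.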

Moonshine (coming soon)

Placeholder provider. Catalog entries exist in the provider picker, but the card remains disabled. When the provider class ships, it will extend local speech recognition with the same privacy guarantees as Kokoro.

Speech-to-text providers

Browser speech recognition (default)

Uses Chrome’s built-in SpeechRecognition API. Audio streams to Google servers for transcription.

Property	Value
Provider id	`web-speech`
Data residency	Third-party cloud (Google)
Download size	0 MB
Internet required	Yes
Languages	Any BCP-47 tag supported by Chrome
Offline Mode behavior	Blocked; the mic button is disabled and shows a tooltip

Set the recognition language in Settings > Speech > Speech recognition > Language. The default is en-US.

Moonshine Base / Tiny (coming soon)

Two sizes of a local English speech recognition model. Catalog entries exist; the provider classes are pending.

Variant	Download	Use case
Moonshine Base	~120 MB	Best accuracy
Moonshine Tiny	~55 MB	Low-end devices

Settings reference

All speech controls live under Settings > Speech, split into two sections.

Text-to-speech section

Control	Values	Default
Enabled	on / off	on
Provider	System voices / Kokoro 82M	System voices
Voice	provider-specific list	provider default
Speed	0.5× to 2.0×	1.0×
Auto-read responses	on / off	off
Allow Google cloud voices	on / off	off (advanced)

The voice picker updates based on the active provider. Kokoro’s picker is populated after the model is cached; before that, the card shows a Download button instead of the picker.

Speech recognition section

Control	Values	Default
Enabled	on / off	on
Provider	Browser speech recognition / Moonshine Base / Moonshine Tiny	Browser speech recognition
Recognition language	BCP-47 tag	`en-US`
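
The two settings tables could be modeled as a single typed object with the documented defaults, as in the sketch below. The type name, the field names, and the `moonshine-base` / `moonshine-tiny` ids are assumptions for illustration; only the values and defaults are taken from the tables.

```typescript
// Hypothetical settings shape; field names are illustrative.
interface SpeechSettings {
  tts: {
    enabled: boolean;
    provider: "web-speech" | "kokoro";
    voice: string | null; // null means "use the provider default"
    speed: number; // 0.5 to 2.0
    autoRead: boolean;
    allowGoogleCloudVoices: boolean;
  };
  stt: {
    enabled: boolean;
    // The Moonshine ids are assumed; the page does not state them.
    provider: "web-speech" | "moonshine-base" | "moonshine-tiny";
    language: string; // BCP-47 tag
  };
}

const DEFAULT_SPEECH_SETTINGS: SpeechSettings = {
  tts: {
    enabled: true,
    provider: "web-speech",
    voice: null,
    speed: 1.0,
    autoRead: false,
    allowGoogleCloudVoices: false,
  },
  stt: { enabled: true, provider: "web-speech", language: "en-US" },
};

// Keeps a requested speed inside the documented 0.5x to 2.0x range.
// Non-finite input falls back to the default of 1.0.
function clampSpeed(value: number): number {
  if (!Number.isFinite(value)) return 1.0;
  return Math.min(2.0, Math.max(0.5, value));
}
```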

Keyboard shortcut

Alt+Space toggles dictation from anywhere on the page. The shortcut is registered via the toggle-stt Chrome extension command and can be reassigned at chrome://extensions/shortcuts.
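
A handler for this command might be wired up as in the sketch below. `chrome.commands.onCommand` is the standard extension API for receiving shortcut commands; the helper `makeCommandHandler` and the toggle body are hypothetical, and the manifest fragment in the comment is an assumed shape rather than a copy of Daneel's manifest.

```typescript
// Assumed manifest.json entry (illustrative):
//   "commands": {
//     "toggle-stt": {
//       "suggested_key": { "default": "Alt+Space" },
//       "description": "Toggle dictation"
//     }
//   }

// Builds a listener that reacts only to the toggle-stt command.
function makeCommandHandler(toggle: () => void): (command: string) => void {
  return (command) => {
    if (command === "toggle-stt") toggle();
  };
}

// Illustrative dictation flag; real code would start or stop recognition.
let dictationActive = false;

// Guarded so the sketch also loads outside an extension context.
const chromeApi = (globalThis as any).chrome;
if (chromeApi?.commands?.onCommand) {
  chromeApi.commands.onCommand.addListener(
    makeCommandHandler(() => {
      dictationActive = !dictationActive;
    }),
  );
}
```

Keeping the command check in a small pure function makes the shortcut logic testable without a browser.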

UI affordances

Play button — appears in the hover action row on every assistant message, between Copy and Delete. Flips to Stop when that message is playing.
Mic button — appears in the chat composer next to Send. Four states: idle (grey), requesting-permission (amber, pulsing), listening (red, pulsing), transcribing (amber, static).
Test button — next to the voice picker in Settings. Plays a short sample of the currently selected voice at the current rate.
Cloud badge — the (cloud) suffix on voice list entries indicates a voice that streams text to a remote service. Visible only when Allow Google cloud voices is enabled.
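
The four mic button states lend themselves to a small state machine. The sketch below uses the state names and colors listed above; the event names and the transitions between states are assumptions, since this page does not specify them.

```typescript
type MicState = "idle" | "requesting-permission" | "listening" | "transcribing";

// Hypothetical events; the real component may use different triggers.
type MicEvent =
  | "click"
  | "permission-granted"
  | "permission-denied"
  | "speech-end"
  | "transcript-ready"
  | "error";

// Visual treatment per state, as described in the list above.
const MIC_STYLE: Record<MicState, { color: string; pulsing: boolean }> = {
  idle: { color: "grey", pulsing: false },
  "requesting-permission": { color: "amber", pulsing: true },
  listening: { color: "red", pulsing: true },
  transcribing: { color: "amber", pulsing: false },
};

// Assumed transition table: unknown events leave the state unchanged.
function nextMicState(state: MicState, event: MicEvent): MicState {
  switch (state) {
    case "idle":
      return event === "click" ? "requesting-permission" : state;
    case "requesting-permission":
      if (event === "permission-granted") return "listening";
      if (event === "permission-denied" || event === "error") return "idle";
      return state;
    case "listening":
      if (event === "click" || event === "speech-end") return "transcribing";
      if (event === "error") return "idle";
      return state;
    case "transcribing":
      if (event === "transcript-ready" || event === "error") return "idle";
      return state;
  }
}
```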

Privacy profiles

Each provider carries a PrivacyProfile consulted by the Offline Mode network gate.

Provider	leavesProcess	leavesMachine	dataObservers
System voices (local)	true	false	browser-vendor
System voices (Google cloud)	true	true	browser-vendor
Kokoro 82M	false	false	none
Web Speech STT	true	true	browser-vendor
Moonshine (planned)	false	false	none

When a provider's profile has leavesMachine set to true and Offline Mode is in effect, the network gate blocks the call, and the relevant UI affordance (mic button, cloud-voice playback) is disabled with a tooltip.
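
The gate rule can be sketched as a one-line predicate over the profile table. The `PrivacyProfile` field names and values mirror the table above; the record keys and the function name `isBlockedByOfflineMode` are hypothetical.

```typescript
interface PrivacyProfile {
  leavesProcess: boolean;
  leavesMachine: boolean;
  dataObservers: "none" | "browser-vendor";
}

// Values copied from the table; the keys are illustrative labels.
const PRIVACY_PROFILES: Record<string, PrivacyProfile> = {
  "system-voices-local": { leavesProcess: true, leavesMachine: false, dataObservers: "browser-vendor" },
  "system-voices-google-cloud": { leavesProcess: true, leavesMachine: true, dataObservers: "browser-vendor" },
  "kokoro": { leavesProcess: false, leavesMachine: false, dataObservers: "none" },
  "web-speech-stt": { leavesProcess: true, leavesMachine: true, dataObservers: "browser-vendor" },
  "moonshine-planned": { leavesProcess: false, leavesMachine: false, dataObservers: "none" },
};

// A call is blocked only when Offline Mode is in effect AND data would leave the machine.
function isBlockedByOfflineMode(
  profile: PrivacyProfile,
  offlineModeEffective: boolean,
): boolean {
  return offlineModeEffective && profile.leavesMachine;
}
```

Under this rule, local system voices and Kokoro keep working in Offline Mode, while Web Speech STT and Google cloud voices are blocked.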

What Daneel never touches

Raw audio waveforms are not persisted. Neither the PCM produced by Kokoro nor the audio captured by the mic is written to storage. Everything lives in memory for the duration of the playback or recording.
Transcripts are not saved outside the chat message. When dictation completes, the text lands in the composer. If you do not send the message, nothing is stored.
No telemetry includes speech content. The analytics catalog explicitly forbids logging transcripts, voice IDs the user typed, or error messages; only enums, booleans, durations, and character counts are emitted.
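
One way to honor the telemetry rule is to build events from a type that has no free-text fields at all. The sketch below is an assumption about how such an event might be shaped; the event name, field names, and `buildDictationEvent` are hypothetical. It carries only an enum, a boolean, a duration, and a character count.

```typescript
// Hypothetical event shape: enums, booleans, durations, and counts only.
interface DictationCompletedEvent {
  event: "dictation_completed";
  provider: "web-speech";
  succeeded: boolean;
  durationMs: number;
  charCount: number;
}

// Derives the event from a transcript without copying any of its text.
function buildDictationEvent(
  transcript: string,
  durationMs: number,
  succeeded: boolean,
): DictationCompletedEvent {
  return {
    event: "dictation_completed",
    provider: "web-speech",
    succeeded,
    durationMs: Math.round(durationMs),
    // Count Unicode code points rather than UTF-16 code units.
    charCount: Array.from(transcript).length,
  };
}
```

Since the type has no string field that could hold user text, a transcript or error message cannot be attached by accident.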