Speech Providers

To get started with speech, see How to Read Messages Aloud and Dictate Questions. For the design rationale behind multiple providers, see Speech in Daneel.

Daneel supports three TTS providers. All implement the same interface, so switching is a single click in Settings > Speech.

Uses the browser’s built-in Speech Synthesis API. The voice catalog is whatever your operating system provides.

| Property | Value |
|---|---|
| Provider id | web-speech |
| Data residency | On-device (mostly) |
| Download size | 0 MB |
| Internet required | No (except for optional cloud voices) |
| Languages | All voices your OS provides |
| Streaming start | Instant |
| Cancellation latency | ~100 ms |

Chrome exposes a subset of voices named Google <language> that stream text to Google servers for higher-quality prosody. Daneel filters these by default. Flip Settings > Speech > Advanced > Allow Google cloud voices to expose them. They are clearly marked (cloud) in the voice list.
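The filtering described above can be sketched as a pure function over the voice list. This is a minimal illustration, not Daneel's actual code: the names `VoiceLike`, `VoiceOption`, `isCloudVoice`, and `buildVoiceList` are hypothetical, and detecting cloud voices by the `Google ` name prefix is an assumption based on the naming pattern mentioned in this section.

```typescript
// Structural subset of the browser's SpeechSynthesisVoice used by this sketch.
interface VoiceLike {
  name: string;
  lang: string;
}

// Hypothetical shape of one entry in the voice picker.
interface VoiceOption {
  label: string;
  lang: string;
  cloud: boolean;
}

// Assumption: cloud voices are recognized by the "Google <language>" naming pattern.
function isCloudVoice(voice: VoiceLike): boolean {
  return voice.name.startsWith("Google ");
}

// Hides cloud voices unless the setting is on, and appends "(cloud)" to their labels.
function buildVoiceList(voices: VoiceLike[], allowCloudVoices: boolean): VoiceOption[] {
  return voices
    .filter((voice) => allowCloudVoices || !isCloudVoice(voice))
    .map((voice) => {
      const cloud = isCloudVoice(voice);
      return {
        label: cloud ? `${voice.name} (cloud)` : voice.name,
        lang: voice.lang,
        cloud,
      };
    });
}
```

In a browser, the input would typically come from `speechSynthesis.getVoices()`, with the second argument read from the Allow Google cloud voices setting.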

A neural TTS model running entirely in your browser on WebGPU. 82 million parameters, 54 voices, seven languages.

| Property | Value |
|---|---|
| Provider id | kokoro |
| Data residency | On-device |
| Download size | ~326 MB (one-time) |
| Internet required | First download only |
| Languages | en-US, en-GB, es, fr, it, hi, ja, zh |
| Quantization (dtype) | fp32 (Xenova reference config for WebGPU) |
| Sample rate | 24 kHz mono |
| Cache location | Browser Cache API (transformers-cache + kokoro-voices) |

The 54-voice list is split by locale and gender. High-quality voices are marked with emoji in kokoro-js’s voice table (Heart ❤️, Bella 🔥, Nicole 🎧, Emma 🚺, George 🚹).

Voice style files are fetched on first use per voice and cached separately under the kokoro-voices Cache API bucket.

Placeholder provider. Catalog entries exist in the provider picker but the card remains disabled. When the provider class ships, it will extend local speech recognition with the same privacy guarantees as Kokoro.

Uses Chrome’s built-in SpeechRecognition API. Audio streams to Google servers for transcription.

| Property | Value |
|---|---|
| Provider id | web-speech |
| Data residency | Third-party cloud (Google) |
| Download size | 0 MB |
| Internet required | Yes |
| Languages | Any BCP-47 tag supported by Chrome |
| Offline Mode behavior | Blocked, mic button disables with tooltip |

Set the recognition language in Settings > Speech > Speech recognition > Language. The default is en-US.
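As a rough illustration of handling the language setting, a user-supplied tag could be canonicalized before it is handed to the recognizer, falling back to the documented default when the tag is malformed. The helper name `normalizeRecognitionLang` is hypothetical. `Intl.getCanonicalLocales` only checks that a tag is structurally valid BCP-47; it does not tell you whether Chrome's recognizer actually supports that language.

```typescript
// Default taken from the setting described above.
const DEFAULT_RECOGNITION_LANG = "en-US";

// Returns the canonical form of a BCP-47 tag (for example "en-us" becomes "en-US"),
// or the default when the input is not a structurally valid tag.
function normalizeRecognitionLang(input: string): string {
  try {
    const [canonical] = Intl.getCanonicalLocales(input.trim());
    return canonical ?? DEFAULT_RECOGNITION_LANG;
  } catch {
    // Intl.getCanonicalLocales throws a RangeError for malformed tags.
    return DEFAULT_RECOGNITION_LANG;
  }
}
```

In the browser, the returned value would be assigned to the `lang` property of the SpeechRecognition instance before starting recognition.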

Two sizes of a local English speech recognition model. Catalog entries exist, but the provider classes are still pending.

| Variant | Download | Use case |
|---|---|---|
| Moonshine Base | ~120 MB | Best accuracy |
| Moonshine Tiny | ~55 MB | Low-end devices |

All speech controls live under Settings > Speech, split into two sections.

| Control | Values | Default |
|---|---|---|
| Enabled | on / off | on |
| Provider | System voices / Kokoro 82M | System voices |
| Voice | provider-specific list | provider default |
| Speed | 0.5× to 2.0× | 1.0× |
| Auto-read responses | on / off | off |
| Allow Google cloud voices | on / off | off (advanced) |

The voice picker updates based on the active provider. Kokoro’s picker is populated after the model is cached; before that, the card shows a Download button instead of the picker.
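One way to picture the text-to-speech controls and their defaults is as a typed settings object. This is only a sketch: the `TtsSettings` shape, its field names, and the helpers `clampSpeed` and `withOverrides` are hypothetical and do not come from Daneel's source. Only the values, ranges, and provider ids are taken from this page.

```typescript
// Provider ids as listed on this page.
type TtsProviderId = "web-speech" | "kokoro";

// Hypothetical settings shape mirroring the controls listed above.
interface TtsSettings {
  enabled: boolean;
  provider: TtsProviderId;
  voice: string | null; // null stands for "use the provider default"
  speed: number;
  autoReadResponses: boolean;
  allowGoogleCloudVoices: boolean;
}

const DEFAULT_TTS_SETTINGS: TtsSettings = {
  enabled: true,
  provider: "web-speech", // shown as "System voices" in the UI
  voice: null,
  speed: 1.0,
  autoReadResponses: false,
  allowGoogleCloudVoices: false,
};

const MIN_SPEED = 0.5;
const MAX_SPEED = 2.0;

// Keeps the speed inside the documented 0.5x to 2.0x range.
function clampSpeed(speed: number): number {
  if (!Number.isFinite(speed)) return DEFAULT_TTS_SETTINGS.speed;
  return Math.min(MAX_SPEED, Math.max(MIN_SPEED, speed));
}

// Merges stored overrides onto the defaults and sanitizes the speed.
function withOverrides(overrides: Partial<TtsSettings>): TtsSettings {
  const merged = { ...DEFAULT_TTS_SETTINGS, ...overrides };
  return { ...merged, speed: clampSpeed(merged.speed) };
}
```

For example, `withOverrides({ provider: "kokoro", speed: 3 })` yields Kokoro as the provider with the speed capped at 2.0.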

| Control | Values | Default |
|---|---|---|
| Enabled | on / off | on |
| Provider | Browser speech recognition / Moonshine Base / Moonshine Tiny | Browser speech recognition |
| Recognition language | BCP-47 tag | en-US |

Alt+Space toggles dictation from anywhere on the page. The shortcut is registered via the toggle-stt Chrome extension command and can be reassigned at chrome://extensions/shortcuts.

  • Play button — appears in the hover action row on every assistant message, between Copy and Delete. Flips to Stop when that message is playing.
  • Mic button — appears in the chat composer next to Send. Four states: idle (grey), requesting-permission (amber, pulsing), listening (red, pulsing), transcribing (amber, static).
  • Test button — next to the voice picker in Settings. Plays a short sample of the currently selected voice at the current rate.
  • Cloud badge — the (cloud) suffix on voice list entries indicates a voice that streams text to a remote service. Visible only when Allow Google cloud voices is enabled.
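The four mic button states listed above lend themselves to a small state machine. In the sketch below, the state names, colors, and pulsing flags come from this page, while the event names and the transitions between states are assumptions made for illustration; the real component may move between states differently.

```typescript
type MicState = "idle" | "requesting-permission" | "listening" | "transcribing";

interface MicAppearance {
  color: "grey" | "amber" | "red";
  pulsing: boolean;
}

// Visual treatment per state, as described for the mic button.
const MIC_APPEARANCE: Record<MicState, MicAppearance> = {
  "idle": { color: "grey", pulsing: false },
  "requesting-permission": { color: "amber", pulsing: true },
  "listening": { color: "red", pulsing: true },
  "transcribing": { color: "amber", pulsing: false },
};

// Hypothetical events; these names are not taken from Daneel's source.
type MicEvent =
  | "toggle"
  | "permission-granted"
  | "permission-denied"
  | "speech-ended"
  | "transcript-ready";

// Assumed transition table: events that do not apply leave the state unchanged.
function nextMicState(state: MicState, event: MicEvent): MicState {
  switch (state) {
    case "idle":
      return event === "toggle" ? "requesting-permission" : state;
    case "requesting-permission":
      if (event === "permission-granted") return "listening";
      if (event === "permission-denied" || event === "toggle") return "idle";
      return state;
    case "listening":
      if (event === "speech-ended" || event === "toggle") return "transcribing";
      return state;
    case "transcribing":
      return event === "transcript-ready" ? "idle" : state;
  }
}
```

Keeping the appearance in a lookup table separate from the transitions means the button can render purely from the current state.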

Each provider carries a PrivacyProfile consulted by the Offline Mode network gate.

| Provider | leavesProcess | leavesMachine | dataObservers |
|---|---|---|---|
| System voices (local) | true | false | browser-vendor |
| System voices (Google cloud) | true | true | browser-vendor |
| Kokoro 82M | false | false | none |
| Web Speech STT | true | true | browser-vendor |
| Moonshine (planned) | false | false | none |

When a provider’s profile has leavesMachine: true and Offline Mode is in effect, the network gate blocks the call and the relevant UI affordance (mic button, cloud-voice playback) disables with a tooltip.
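The gate rule can be sketched as a check over the profile fields. The field names `leavesProcess`, `leavesMachine`, and `dataObservers` and their values come from the profile listing above. The record keys, the single-string type for `dataObservers`, and the function name `isBlockedByOfflineMode` are assumptions for illustration.

```typescript
// Assumption: dataObservers is modeled as a single string, as it appears on this page.
type DataObserver = "none" | "browser-vendor";

interface PrivacyProfile {
  leavesProcess: boolean;
  leavesMachine: boolean;
  dataObservers: DataObserver;
}

// Hypothetical keys for the five profiles described on this page.
type ProfileKey =
  | "system-voices-local"
  | "system-voices-google-cloud"
  | "kokoro"
  | "web-speech-stt"
  | "moonshine";

const PRIVACY_PROFILES: Record<ProfileKey, PrivacyProfile> = {
  "system-voices-local": { leavesProcess: true, leavesMachine: false, dataObservers: "browser-vendor" },
  "system-voices-google-cloud": { leavesProcess: true, leavesMachine: true, dataObservers: "browser-vendor" },
  "kokoro": { leavesProcess: false, leavesMachine: false, dataObservers: "none" },
  "web-speech-stt": { leavesProcess: true, leavesMachine: true, dataObservers: "browser-vendor" },
  "moonshine": { leavesProcess: false, leavesMachine: false, dataObservers: "none" },
};

// A call is blocked only when Offline Mode is in effect and data would leave the machine.
function isBlockedByOfflineMode(profile: PrivacyProfile, offlineMode: boolean): boolean {
  return offlineMode && profile.leavesMachine;
}
```

Under this rule, local system voices stay usable in Offline Mode even though `leavesProcess` is true, because only `leavesMachine` feeds the gate.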

  • Raw audio waveforms are not persisted. Neither the PCM produced by Kokoro nor the audio captured by the mic is written to storage. Everything lives in memory for the duration of the playback or recording.
  • Transcripts are not saved outside the chat message. When dictation completes, the text lands in the composer. If you do not send the message, nothing is stored.
  • No telemetry includes speech content. The analytics catalog explicitly forbids logging transcripts, voice IDs the user typed, or error messages; only enums, booleans, durations, and character counts are emitted.