# AI Providers
To get started with a provider, see Connect a Cloud Provider. For a conceptual comparison, see The Provider Spectrum.
## LLM Providers

Daneel supports five LLM backends. All implement the same interface — switching providers changes the AI brain without affecting the rest of the experience.
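Conceptually, a shared provider interface might look like the following sketch. All names here are illustrative assumptions, not Daneel's actual API:

```typescript
// Hypothetical shape of the common provider interface (illustrative only).
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface LLMProvider {
  readonly name: string;
  isAvailable(): Promise<boolean>;
  // Streams response text chunks so the UI can render incrementally.
  chat(messages: ChatMessage[]): AsyncIterable<string>;
}

// Swapping providers is then a matter of swapping one object. A trivial
// stand-in implementation that echoes the last user message:
class EchoProvider implements LLMProvider {
  readonly name = "echo";
  async isAvailable(): Promise<boolean> {
    return true;
  }
  async *chat(messages: ChatMessage[]): AsyncIterable<string> {
    yield messages[messages.length - 1]?.content ?? "";
  }
}
```

Because every backend conforms to the same contract, the chat UI, tool loop, and history handling need no provider-specific branches.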
### WebGPU (Local)

Runs AI models directly on your GPU using WebGPU and ONNX Runtime. No server, no API key, no internet after the first model download.
| Property | Value |
|---|---|
| Data residency | On-device |
| Internet required | No (after model cache) |
| Tool calling | Experimental (prompt-based XML tags) |
| Streaming | Yes |
| Thinking/reasoning | Yes (model-dependent) |
| Cost | Free |
Available models:
Models are auto-selected based on your GPU capabilities. The catalog includes 20+ models from Liquid AI, Microsoft, HuggingFace, PrismML, DeepSeek, Zhipu, Alibaba, Meta, Google, and IBM, ranging from 350M to 3B+ parameters. Open Settings > AI Models to browse the full catalog with hardware compatibility scores.
Default model: Granite 4.0 Micro 3B (q4f16).
Quantization formats: Models ship in various quantization levels. Most use q4 (4-bit), which balances quality and size. Some models also offer q4f16 (4-bit with fp16 compute, requires shader-f16), q8 (8-bit), q2 (2-bit), and q1 (1-bit). The 1-bit and 2-bit formats are new, enabled by @huggingface/transformers 4.1.0.
Notable: Bonsai 1.7B from PrismML is available in both q4 (1.1 GB) and q1 (291 MB). The q1 variant is the lightest thinking-capable model in the catalog, designed for low-end GPUs and fast cold starts.
Configuration: Settings > WebGPU. Model selection, quantization level, and context window are auto-configured based on GPU detection.
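As a rough illustration of how quantization auto-selection could work: the rules and thresholds below are invented for the example, but they capture the constraint stated above that q4f16 requires shader-f16 support and that the 1-bit/2-bit formats target low-end GPUs:

```typescript
// Illustrative quantization picker; thresholds and ordering are assumptions,
// not Daneel's actual GPU-detection logic.
type Quant = "q1" | "q2" | "q4" | "q4f16" | "q8";

function pickQuantization(supportsShaderF16: boolean, vramGB: number): Quant {
  if (vramGB < 2) {
    // Low-end GPUs: fall back to the ultra-compact 1-bit/2-bit formats.
    return supportsShaderF16 ? "q2" : "q1";
  }
  if (supportsShaderF16) return "q4f16"; // fp16 compute path needs shader-f16
  return vramGB >= 8 ? "q8" : "q4";
}
```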
### Ollama (Local Server)

Connects to a local Ollama server via the OpenAI-compatible API.
| Property | Value |
|---|---|
| Data residency | Local network |
| Internet required | No (LAN only) |
| Tool calling | Yes (OpenAI function format) |
| Streaming | Yes |
| Thinking/reasoning | Yes (think-block stripping) |
| Cost | Free (self-hosted) |
Configuration: Settings > Ollama.
| Setting | Default | Description |
|---|---|---|
| Base URL | http://localhost:11434 | Ollama server address |
| Model | — | Selected from detected installed models |
| Availability timeout | 3,000 ms | How long to wait for server probe |
| Model list timeout | 5,000 ms | How long to wait for model enumeration |
Daneel auto-probes the Ollama server on settings open. Model management (pull, delete) is available in the Ollama settings panel.
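The think-block stripping noted in the table above can be sketched as a small post-processing step, assuming reasoning is emitted inside `<think>…</think>` tags (the convention used by DeepSeek-style reasoning models):

```typescript
// Remove <think>…</think> reasoning blocks from a model response before
// display. A sketch; the tag convention is an assumption about the model.
function stripThinkBlocks(text: string): string {
  return text.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}
```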
### Gemini Nano (Chrome Built-in)

Uses Chrome’s built-in Gemini Nano model via the Chrome AI API.
| Property | Value |
|---|---|
| Data residency | On-device |
| Internet required | No |
| Tool calling | Experimental (prompt-based XML tags) |
| Streaming | Yes |
| Thinking/reasoning | No |
| Cost | Free |
Configuration: Settings > Gemini Nano (language selection). Availability is auto-detected — requires Chrome 120+ with the Gemini Nano flag enabled.
### Claude (Anthropic API)

Connects to Anthropic’s Claude models via the API.
| Property | Value |
|---|---|
| Data residency | Third-party cloud (Anthropic) |
| Internet required | Yes |
| Tool calling | Yes (native tool_use blocks) |
| Streaming | Yes (SSE) |
| Thinking/reasoning | Yes |
| Cost | Per-token (see below) |
Available models:
| Model | Input cost | Output cost | Context |
|---|---|---|---|
| Claude Opus 4.7 | $5 / 1M tokens | $25 / 1M tokens | 200K |
| Claude Opus 4.6 | $5 / 1M tokens | $25 / 1M tokens | 200K |
| Claude Sonnet 4.6 | $3 / 1M tokens | $15 / 1M tokens | 200K |
| Claude Haiku 4.5 | $1 / 1M tokens | $5 / 1M tokens | 200K |
Cost annotations appear next to each response in the chat panel.
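The annotation math follows directly from the pricing table above. A minimal sketch — the model identifier strings used as keys here are assumptions:

```typescript
// USD per million tokens, taken from the pricing table above.
// The key strings are illustrative identifiers, not confirmed API model IDs.
const PRICING: Record<string, { input: number; output: number }> = {
  "claude-sonnet-4.6": { input: 3, output: 15 },
  "claude-haiku-4.5": { input: 1, output: 5 },
};

function responseCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`unknown model: ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```

For example, a Sonnet response consuming 1,000 input tokens and 500 output tokens costs (1,000 × $3 + 500 × $15) / 1,000,000 ≈ $0.0105.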
Configuration: Settings > Claude. API key is encrypted with AES-256-GCM and stored locally. The key never leaves your browser unencrypted.
### Azure OpenAI (Enterprise)

Connects to Azure OpenAI Service deployments.
| Property | Value |
|---|---|
| Data residency | Your Azure tenant |
| Internet required | Yes |
| Tool calling | Yes (OpenAI function format) |
| Streaming | Yes |
| Thinking/reasoning | Model-dependent |
| Cost | Per your Azure pricing |
Authentication: API Key or Entra ID (OAuth2). See How to Set Up Azure OpenAI for configuration steps.
## Embedding Providers

Daneel uses a local embedding model for all vector operations (site indexing, vault search, knowledge graph).
| Model | Dimensions | Context | Backend | Size |
|---|---|---|---|---|
| BGE Small EN v1.5 (default) | 384 | 512 tokens | WebGPU fp16 | ~23 MB |
| Granite Embedding | 384 | 1,024 tokens | WebGPU q8 | — |
| MiniLM-L6-v2 | 384 | 256 tokens | WebGPU q8 | — |
Embeddings always run locally regardless of your LLM provider choice. Requests are batched at a maximum of 32 chunks to prevent GPU memory issues.
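The batching rule described above amounts to a simple chunking helper — a generic illustration, not Daneel's internal code:

```typescript
// Split items into batches of at most `size` (default 32, matching the
// embedding batch limit described above).
function toBatches<T>(items: T[], size = 32): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

So 70 chunks would be embedded in three GPU passes: 32 + 32 + 6.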
### Vector Search

| Implementation | Persistence | Use case |
|---|---|---|
| IndexedDBVectorStore | Persistent (survives browser restart) | Production — site indexes, vaults |
| GPUCosineSearch | In-memory (GPU-accelerated) | <5ms search over 50k+ chunks |
| InMemoryVectorStore | Ephemeral | Testing only |
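In spirit, all three implementations reduce to cosine similarity over stored vectors. The CPU sketch below illustrates the idea (the real GPUCosineSearch runs the same computation as a GPU kernel):

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the indices of the k most similar stored vectors to the query.
function topK(query: number[], vectors: number[][], k: number): number[] {
  return vectors
    .map((v, i) => ({ i, score: cosine(query, v) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((r) => r.i);
}
```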
## Tool Calling Support by Provider

| Provider | Strategy | Reliability | Notes |
|---|---|---|---|
| Claude | Native tool_use blocks | High | Best MCP experience |
| Ollama | OpenAI function format | High | Depends on model |
| Azure OpenAI | OpenAI function format | High | Depends on deployment |
| WebGPU | Prompt-based XML tags | Low | Small models struggle with tool format |
| Gemini Nano | Prompt-based XML tags | Low | 3B model often misformats calls |
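For the prompt-based providers, tool calls must be recovered from plain model text. A sketch of what that parsing might look like — the exact tag format shown is an assumption, and the JSON guard illustrates why reliability is rated "Low" for small models:

```typescript
// Parse a prompt-based XML tool call of the assumed form:
//   <tool_call name="search">{"q": "..."}</tool_call>
// Returns null when no call is present or the arguments are malformed,
// which small models frequently produce.
interface ToolCall {
  name: string;
  args: unknown;
}

function parseToolCall(text: string): ToolCall | null {
  const m = text.match(/<tool_call name="([^"]+)">([\s\S]*?)<\/tool_call>/);
  if (!m) return null;
  try {
    return { name: m[1], args: JSON.parse(m[2]) };
  } catch {
    return null; // malformed JSON args — treat as a plain-text response
  }
}
```

Native strategies (Claude's tool_use blocks, the OpenAI function format) sidestep this fragility because the provider returns structured call objects rather than free text.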