
To get started with a provider, see [Connect a Cloud Provider](/guides/connect-provider/). For a conceptual comparison, see [The Provider Spectrum](/concepts/providers/).

## LLM Providers

Daneel supports five LLM backends. All implement the same interface — switching providers changes the AI brain without affecting the rest of the experience.
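
Conceptually, the shared interface looks something like this. This is a sketch only; the names `LLMProvider`, `generate`, and `selectProvider` are illustrative, not Daneel's actual API:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Every backend implements the same contract (illustrative shape).
interface LLMProvider {
  readonly id: string;
  isAvailable(): Promise<boolean>;
  // Streaming generation: yields text deltas as they arrive.
  generate(messages: ChatMessage[]): AsyncIterable<string>;
}

// Because all five backends share one interface, switching providers
// is a registry lookup, not a rewrite of the chat pipeline.
function selectProvider(
  registry: Map<string, LLMProvider>,
  id: string,
): LLMProvider {
  const provider = registry.get(id);
  if (!provider) throw new Error(`Unknown provider: ${id}`);
  return provider;
}
```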

### WebGPU (Local)

Runs AI models directly on your GPU using WebGPU and ONNX Runtime. No server, no API key, no internet after first model download.

| Property | Value |
|----------|-------|
| Data residency | On-device |
| Internet required | No (after model cache) |
| Tool calling | Experimental (prompt-based XML tags) |
| Streaming | Yes |
| Thinking/reasoning | Yes (model-dependent) |
| Cost | Free |

**Available models:**

Models are auto-selected based on your GPU capabilities. The catalog includes 20+ models from Liquid AI, Microsoft, HuggingFace, PrismML, DeepSeek, Zhipu, Alibaba, Meta, Google, and IBM, ranging from 350M to 3B+ parameters. Open **Settings > AI Models** to browse the full catalog with hardware compatibility scores.

Default model: **Granite 4.0 Micro 3B** (q4f16).

**Quantization formats:** Models ship in various quantization levels. Most use q4 (4-bit), which balances quality and size. Some models also offer q4f16 (4-bit with fp16 compute, requires shader-f16), q8 (8-bit), q2 (2-bit), and q1 (1-bit). The 1-bit and 2-bit formats are new, enabled by `@huggingface/transformers` 4.1.0.

Notable: [Bonsai 1.7B](https://huggingface.co/onnx-community/Bonsai-1.7B-ONNX) from [PrismML](https://prismml.com) is available in both q4 (1.1 GB) and q1 (291 MB). The q1 variant is the lightest thinking-capable model in the catalog, designed for low-end GPUs and fast cold starts.

**Configuration:** Settings > WebGPU. Model selection, quantization level, and context window are auto-configured based on GPU detection.
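
The q4f16-versus-q4 decision described under quantization formats can be sketched as a pure function over the adapter's feature set. The function name and the fallback policy are assumptions; only the shader-f16 requirement comes from the docs above:

```typescript
type Dtype = "q4f16" | "q4";

// q4f16 needs the WebGPU "shader-f16" feature; otherwise fall back
// to plain q4 (fallback choice is an assumption, not Daneel's code).
function pickDtype(gpuFeatures: Set<string>): Dtype {
  return gpuFeatures.has("shader-f16") ? "q4f16" : "q4";
}

// In the browser, the feature set would come from GPU detection:
//   const adapter = await navigator.gpu.requestAdapter();
//   const dtype = pickDtype(new Set(adapter.features));
```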

### Ollama (Local Server)

Connects to a local [Ollama](https://ollama.com/) server via the OpenAI-compatible API.

| Property | Value |
|----------|-------|
| Data residency | Local network |
| Internet required | No (LAN only) |
| Tool calling | Yes (OpenAI function format) |
| Streaming | Yes |
| Thinking/reasoning | Yes (think-block stripping) |
| Cost | Free (self-hosted) |

**Configuration:** Settings > Ollama.

| Setting | Default | Description |
|---------|---------|-------------|
| Base URL | `http://localhost:11434` | Ollama server address |
| Model | — | Selected from detected installed models |
| Availability timeout | 3,000 ms | How long to wait for the server probe |
| Model list timeout | 5,000 ms | How long to wait for model enumeration |

Daneel automatically probes the Ollama server when you open settings. Model management (pull, delete) is available in the Ollama settings panel.
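
The availability probe can be sketched with `fetch` and the documented 3,000 ms timeout. The exact endpoint Daneel requests is an assumption; Ollama's root endpoint happens to answer a plain GET with 200:

```typescript
// Returns true if an Ollama server answers within the timeout.
async function probeOllama(
  baseUrl = "http://localhost:11434",
  timeoutMs = 3000,
): Promise<boolean> {
  try {
    const res = await fetch(baseUrl, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch {
    return false; // unreachable, connection refused, or timed out
  }
}
```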

### Gemini Nano (Chrome Built-in)

Uses Chrome's built-in Gemini Nano model via the Chrome AI API.

| Property | Value |
|----------|-------|
| Data residency | On-device |
| Internet required | No |
| Tool calling | Experimental (prompt-based XML tags) |
| Streaming | Yes |
| Thinking/reasoning | No |
| Cost | Free |

**Configuration:** Settings > Gemini Nano, where you can select the response language. Availability is auto-detected; requires Chrome 120+ with the Gemini Nano flag enabled.

### Claude (Anthropic API)

Connects to Anthropic's Claude models via the API.

| Property | Value |
|----------|-------|
| Data residency | Third-party cloud (Anthropic) |
| Internet required | Yes |
| Tool calling | Yes (native `tool_use` blocks) |
| Streaming | Yes (SSE) |
| Thinking/reasoning | Yes |
| Cost | Per-token (see below) |

**Available models:**

| Model | Input cost | Output cost | Context |
|-------|-----------|-------------|---------|
| Claude Opus 4.7 | $5 / 1M tokens | $25 / 1M tokens | 200K |
| Claude Opus 4.6 | $5 / 1M tokens | $25 / 1M tokens | 200K |
| Claude Sonnet 4.6 | $3 / 1M tokens | $15 / 1M tokens | 200K |
| Claude Haiku 4.5 | $1 / 1M tokens | $5 / 1M tokens | 200K |

Cost annotations appear next to each response in the chat panel.
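
Given the pricing table, a per-response cost annotation reduces to simple arithmetic. The model ID strings and helper name here are illustrative, not Daneel's internal identifiers:

```typescript
// USD per 1M tokens, taken from the pricing table above.
const PRICING: Record<string, { input: number; output: number }> = {
  "claude-sonnet-4.6": { input: 3, output: 15 },
  "claude-haiku-4.5": { input: 1, output: 5 },
};

function responseCostUSD(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const p = PRICING[model];
  if (!p) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```

For example, a 2,000-token prompt with a 500-token reply on Sonnet 4.6 costs (2,000 × $3 + 500 × $15) / 1M = $0.0135.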

**Configuration:** Settings > Claude. API key is encrypted with AES-256-GCM and stored locally. The key never leaves your browser unencrypted.

### Azure OpenAI (Enterprise)

Connects to Azure OpenAI Service deployments.

| Property | Value |
|----------|-------|
| Data residency | Your Azure tenant |
| Internet required | Yes |
| Tool calling | Yes (OpenAI function format) |
| Streaming | Yes |
| Thinking/reasoning | Model-dependent |
| Cost | Per your Azure pricing |

**Authentication:** API Key or Entra ID (OAuth2). See [How to Set Up Azure OpenAI](/how-to/azure-openai/) for configuration steps.

## Embedding Providers

Daneel uses a local embedding model for all vector operations (site indexing, vault search, knowledge graph).

| Model | Dimensions | Context | Backend | Size |
|-------|-----------|---------|---------|------|
| BGE Small EN v1.5 (default) | 384 | 512 tokens | WebGPU fp16 | ~23 MB |
| Granite Embedding | 384 | 1,024 tokens | WebGPU q8 | — |
| MiniLM-L6-v2 | 384 | 256 tokens | WebGPU q8 | — |

Embeddings always run locally regardless of your LLM provider choice. Requests are batched at a maximum of 32 chunks to prevent GPU memory issues.
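
The batching rule can be sketched as follows. The batch size comes from the text above; the function name is illustrative:

```typescript
const MAX_BATCH = 32;

// Split a list of text chunks into GPU-sized batches.
function batchChunks<T>(chunks: T[], size = MAX_BATCH): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < chunks.length; i += size) {
    batches.push(chunks.slice(i, i + size));
  }
  return batches;
}
```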

:::caution
Switching embedding models clears all existing indexes and vault embeddings, since vector dimensions may differ. Back up your data first.
:::

## Vector Search

| Implementation | Persistence | Use case |
|----------------|------------|----------|
| IndexedDBVectorStore | Persistent (survives browser restart) | Production — site indexes, vaults |
| GPUCosineSearch | In-memory (GPU-accelerated) | <5ms search over 50k+ chunks |
| InMemoryVectorStore | Ephemeral | Testing only |
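
For clarity, this is the math all three implementations rank by, written for the CPU: cosine similarity between a query embedding and each stored chunk. The GPU version parallelizes the same computation; function names here are illustrative:

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks by similarity to the query and keep the top k.
function topK(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k = 5,
): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```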

## Tool Calling Support by Provider

| Provider | Strategy | Reliability | Notes |
|----------|----------|-------------|-------|
| Claude | Native `tool_use` blocks | High | Best MCP experience |
| Ollama | OpenAI function format | High | Depends on model |
| Azure OpenAI | OpenAI function format | High | Depends on deployment |
| WebGPU | Prompt-based XML tags | Low | Small models struggle with tool format |
| Gemini Nano | Prompt-based XML tags | Low | 3B model often misformats calls |
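
A sketch of how prompt-based tool calls might be extracted from model output, and why the strategy is fragile: the model is merely *asked* to emit a well-formed tag, and nothing enforces it. The `<tool_call>` tag name and the JSON argument format are assumptions:

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Extract the first <tool_call>…</tool_call> block and parse it.
function parseToolCall(output: string): ToolCall | null {
  const match = output.match(/<tool_call>([\s\S]*?)<\/tool_call>/);
  if (!match) return null;
  try {
    const parsed = JSON.parse(match[1]);
    if (typeof parsed.name !== "string") return null;
    return { name: parsed.name, args: parsed.args ?? {} };
  } catch {
    return null; // malformed JSON; small models often misformat these
  }
}
```

Every `null` return is a failed tool call, which is why the reliability column rates this strategy Low for small models.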
