AI Providers

To get started with a provider, see Connect a Cloud Provider. For a conceptual comparison, see The Provider Spectrum.

Daneel supports five LLM backends: WebGPU, Ollama, Gemini Nano, Claude, and Azure OpenAI. All implement the same interface, so switching providers changes the AI brain without affecting the rest of the experience.

WebGPU

Runs AI models directly on your GPU using WebGPU and ONNX Runtime. No server, no API key, no internet after the first model download.

| Property | Value |
| --- | --- |
| Data residency | On-device |
| Internet required | No (after model cache) |
| Tool calling | Experimental (prompt-based XML tags) |
| Streaming | Yes |
| Thinking/reasoning | Yes (model-dependent) |
| Cost | Free |

Available models:

Models are auto-selected based on your GPU capabilities. The catalog includes 20+ models from Liquid AI, Microsoft, HuggingFace, PrismML, DeepSeek, Zhipu, Alibaba, Meta, Google, and IBM, ranging from 350M to 3B+ parameters. Open Settings > AI Models to browse the full catalog with hardware compatibility scores.

Default model: Granite 4.0 Micro 3B (q4f16).

Quantization formats: Models ship in various quantization levels. Most use q4 (4-bit), which balances quality and size. Some models also offer q4f16 (4-bit with fp16 compute, requires shader-f16), q8 (8-bit), q2 (2-bit), and q1 (1-bit). The 1-bit and 2-bit formats are new, enabled by @huggingface/transformers 4.1.0.

Notable: Bonsai 1.7B from PrismML is available in both q4 (1.1 GB) and q1 (291 MB). The q1 variant is the lightest thinking-capable model in the catalog, designed for low-end GPUs and fast cold starts.

Configuration: Settings > WebGPU. Model selection, quantization level, and context window are auto-configured based on GPU detection.
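
As a sketch of what the loader does under the hood, running one of these models with @huggingface/transformers in the browser looks roughly like this (the model ID and generation options are illustrative, not Daneel's actual code):

```typescript
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Load a quantized model onto the GPU via WebGPU.
// dtype selects the quantization level (q4, q4f16, q8, ...).
const generator = await pipeline(
  "text-generation",
  "onnx-community/granite-4.0-micro-ONNX-web", // illustrative model ID
  { device: "webgpu", dtype: "q4f16" },
);

// Stream tokens to the UI as they are produced.
const streamer = new TextStreamer(generator.tokenizer, {
  skip_prompt: true,
  callback_function: (text) => console.log(text),
});

await generator(
  [{ role: "user", content: "Summarize this page in one sentence." }],
  { max_new_tokens: 128, streamer },
);
```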

Ollama

Connects to a local Ollama server via its OpenAI-compatible API.

| Property | Value |
| --- | --- |
| Data residency | Local network |
| Internet required | No (LAN only) |
| Tool calling | Yes (OpenAI function format) |
| Streaming | Yes |
| Thinking/reasoning | Yes (think-block stripping) |
| Cost | Free (self-hosted) |

Configuration: Settings > Ollama.

| Setting | Default | Description |
| --- | --- | --- |
| Base URL | http://localhost:11434 | Ollama server address |
| Model | | Selected from detected installed models |
| Availability timeout | 3,000 ms | How long to wait for the server probe |
| Model list timeout | 5,000 ms | How long to wait for model enumeration |

Daneel automatically probes the Ollama server when the settings panel is opened. Model management (pull, delete) is available in the same panel.
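
A minimal sketch of such a probe, using Ollama's native /api/tags endpoint and the timeouts listed above (Daneel's internal implementation may differ):

```typescript
const BASE_URL = "http://localhost:11434";

// Probe the server, then enumerate installed models.
async function probeOllama(): Promise<string[]> {
  // Availability probe: any response within 3 s means the server is up.
  await fetch(BASE_URL, { signal: AbortSignal.timeout(3_000) });

  // Model enumeration via the native /api/tags endpoint.
  const res = await fetch(`${BASE_URL}/api/tags`, {
    signal: AbortSignal.timeout(5_000),
  });
  const { models } = await res.json();
  return models.map((m: { name: string }) => m.name);
}
```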

Gemini Nano

Uses Chrome’s built-in Gemini Nano model via the Chrome AI API.

| Property | Value |
| --- | --- |
| Data residency | On-device |
| Internet required | No |
| Tool calling | Experimental (prompt-based XML tags) |
| Streaming | Yes |
| Thinking/reasoning | No |
| Cost | Free |

Configuration: Settings > Gemini Nano (language selection only). Availability is auto-detected; Chrome 120+ with the Gemini Nano flag enabled is required.
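
Availability detection might look roughly like this. Chrome's built-in Prompt API has changed shape across releases, so treat the names below as illustrative of one recent variant rather than a stable contract:

```typescript
// Provided by Chrome when the built-in model is present.
declare const LanguageModel: {
  availability(): Promise<"unavailable" | "downloadable" | "downloading" | "available">;
};

async function detectGeminiNano(): Promise<boolean> {
  // Feature-detect first: the global simply does not exist elsewhere.
  if (typeof LanguageModel === "undefined") return false;
  return (await LanguageModel.availability()) === "available";
}
```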

Claude

Connects to Anthropic’s Claude models via the Anthropic API.

| Property | Value |
| --- | --- |
| Data residency | Third-party cloud (Anthropic) |
| Internet required | Yes |
| Tool calling | Yes (native tool_use blocks) |
| Streaming | Yes (SSE) |
| Thinking/reasoning | Yes |
| Cost | Per-token (see below) |

Available models:

| Model | Input cost | Output cost | Context |
| --- | --- | --- | --- |
| Claude Opus 4.7 | $5 / 1M tokens | $25 / 1M tokens | 200K |
| Claude Opus 4.6 | $5 / 1M tokens | $25 / 1M tokens | 200K |
| Claude Sonnet 4.6 | $3 / 1M tokens | $15 / 1M tokens | 200K |
| Claude Haiku 4.5 | $1 / 1M tokens | $5 / 1M tokens | 200K |

Cost annotations appear next to each response in the chat panel.

Configuration: Settings > Claude. API key is encrypted with AES-256-GCM and stored locally. The key never leaves your browser unencrypted.
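
A minimal sketch of that kind of encryption with the WebCrypto API (key generation and storage details here are illustrative, not Daneel's exact scheme):

```typescript
// A 256-bit AES-GCM key, generated once and kept locally (e.g. IndexedDB).
const key = await crypto.subtle.generateKey(
  { name: "AES-GCM", length: 256 },
  false, // non-extractable: raw key material is never exposed to JS
  ["encrypt", "decrypt"],
);

async function encryptApiKey(apiKey: string) {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // 96-bit GCM nonce
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(apiKey),
  );
  return { iv, ciphertext }; // store both; the IV is needed to decrypt
}
```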

Azure OpenAI

Connects to Azure OpenAI Service deployments.

| Property | Value |
| --- | --- |
| Data residency | Your Azure tenant |
| Internet required | Yes |
| Tool calling | Yes (OpenAI function format) |
| Streaming | Yes |
| Thinking/reasoning | Model-dependent |
| Cost | Per your Azure pricing |

Authentication: API Key or Entra ID (OAuth2). See How to Set Up Azure OpenAI for configuration steps.
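
For orientation, a raw chat request against an Azure OpenAI deployment with API-key auth looks roughly like this (resource name, deployment name, and API version are placeholders for your own configuration):

```typescript
const apiKey = "<your-azure-openai-key>"; // placeholder
const endpoint =
  "https://<resource>.openai.azure.com/openai/deployments/<deployment>" +
  "/chat/completions?api-version=2024-06-01";

const res = await fetch(endpoint, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "api-key": apiKey, // Entra ID instead: "Authorization: Bearer <token>"
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Hello" }],
    stream: true, // tokens arrive as server-sent events
  }),
});
```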

Embeddings

Daneel uses a local embedding model for all vector operations (site indexing, vault search, knowledge graph).

| Model | Dimensions | Context | Backend | Size |
| --- | --- | --- | --- | --- |
| BGE Small EN v1.5 (default) | 384 | 512 tokens | WebGPU fp16 | ~23 MB |
| Granite Embedding | 384 | 1,024 tokens | WebGPU q8 | |
| MiniLM-L6-v2 | 384 | 256 tokens | WebGPU q8 | |

Embeddings always run locally, regardless of your LLM provider choice. Requests are batched at a maximum of 32 chunks to avoid GPU memory issues.
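
A sketch of this batched flow with @huggingface/transformers; the model ID below is the public checkpoint matching the default BGE Small EN v1.5, though Daneel's internal pipeline may differ:

```typescript
import { pipeline } from "@huggingface/transformers";

// Local embedding model on WebGPU, matching the default above.
const embed = await pipeline(
  "feature-extraction",
  "Xenova/bge-small-en-v1.5", // assumed public checkpoint
  { device: "webgpu", dtype: "fp16" },
);

const BATCH_SIZE = 32; // cap batches to avoid GPU memory spikes

async function embedChunks(chunks: string[]): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    const batch = chunks.slice(i, i + BATCH_SIZE);
    const out = await embed(batch, { pooling: "mean", normalize: true });
    vectors.push(...(out.tolist() as number[][]));
  }
  return vectors;
}
```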

Vector Stores

| Implementation | Persistence | Use case |
| --- | --- | --- |
| IndexedDBVectorStore | Persistent (survives browser restart) | Production: site indexes, vaults |
| GPUCosineSearch | In-memory (GPU-accelerated) | <5 ms search over 50k+ chunks |
| InMemoryVectorStore | Ephemeral | Testing only |
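
All three stores rank chunks by cosine similarity. A CPU reference version of that search is sketched below; GPUCosineSearch presumably performs the same arithmetic in a shader:

```typescript
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the indices of the k chunks most similar to the query vector.
function topK(query: Float32Array, chunks: Float32Array[], k = 5): number[] {
  return chunks
    .map((v, i) => [cosine(query, v), i] as const)
    .sort((x, y) => y[0] - x[0])
    .slice(0, k)
    .map(([, i]) => i);
}
```

Because the embedding pipeline normalizes its vectors, cosine similarity reduces to a plain dot product, which is what lets a GPU-accelerated variant run the search as essentially one matrix multiply.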

Tool Calling

| Provider | Strategy | Reliability | Notes |
| --- | --- | --- | --- |
| Claude | Native tool_use blocks | High | Best MCP experience |
| Ollama | OpenAI function format | High | Depends on model |
| Azure OpenAI | OpenAI function format | High | Depends on deployment |
| WebGPU | Prompt-based XML tags | Low | Small models struggle with tool format |
| Gemini Nano | Prompt-based XML tags | Low | 3B model often misformats calls |
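
For the two prompt-based providers, tool calls have to be parsed out of plain model text. An illustrative parser, assuming a tool_call JSON-in-XML convention (the exact tags Daneel uses may differ):

```typescript
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Extract <tool_call>{"name": ..., "arguments": {...}}</tool_call> spans
// from model output. Small models often misformat these, hence the Low
// reliability rating above.
function parseToolCalls(output: string): ToolCall[] {
  const calls: ToolCall[] = [];
  for (const match of output.matchAll(/<tool_call>([\s\S]*?)<\/tool_call>/g)) {
    try {
      calls.push(JSON.parse(match[1]) as ToolCall);
    } catch {
      // Skip malformed calls rather than failing the whole response.
    }
  }
  return calls;
}
```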