The Provider Spectrum
Daneel supports five LLM providers, each with different trade-offs in quality, privacy, cost, and setup. This page helps you understand when to use which.
For configuration steps, see Connect a Cloud Provider. For the full technical specs, see the AI Providers reference.
The spectrum
Providers are arranged from most private (left) to most capable (right):
WebGPU → Gemini Nano → Ollama → Azure OpenAI → Claude
   ↑                                              ↑
100% local                                 highest quality
zero cost                                  per-token cost
smaller models                             largest models

There is no universally “best” provider. The right choice depends on what you’re optimizing for.
Provider comparison
| | WebGPU | Gemini Nano | Ollama | Azure OpenAI | Claude |
|---|---|---|---|---|---|
| Privacy | On-device | On-device | Local network | Your cloud | Third-party |
| Quality | Good (up to 3B models) | Basic (3B) | Good to excellent | Excellent | Excellent |
| Cost | Free | Free | Free | Azure pricing | Per-token |
| Setup | None | Chrome flag | Install Ollama | Azure subscription | API key |
| Internet | No | No | LAN only | Yes | Yes |
| Tool calling | Experimental | Experimental | Yes | Yes | Yes (best) |
| Model variety | 20+ models | 1 model | Thousands | Your deployments | 3 models |
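The table’s trade-offs can be condensed into a rough decision rule. The sketch below is purely illustrative — the function and its inputs are hypothetical, not part of Daneel:

```python
def pick_provider(offline: bool, needs_tools: bool, max_quality: bool) -> str:
    """Rough decision rule mirroring the comparison table (hypothetical helper)."""
    if offline:
        # On-device or LAN-only options; among these, only Ollama
        # offers reliable (non-experimental) tool calling.
        return "Ollama" if needs_tools else "WebGPU"
    if max_quality:
        return "Claude"  # highest quality and best tool calling, per-token cost
    return "Azure OpenAI"  # cloud quality with enterprise data residency
```

For example, `pick_provider(offline=True, needs_tools=False, max_quality=False)` returns `"WebGPU"` — the fully on-device choice.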
When to use each
WebGPU — privacy above all
Choose WebGPU when:
- You need complete privacy with zero data leaving your machine
- You’re working with sensitive or confidential content
- You don’t have (or don’t want to use) API keys
- You have a decent GPU (most modern integrated GPUs work)
Limitations: smaller models (up to ~3B parameters) mean lower quality on complex reasoning tasks. Tool calling is experimental and unreliable. That said, models like Bonsai 1.7B (q1) weigh just 291 MB while still supporting step-by-step reasoning, making WebGPU viable even on low-end hardware.
Gemini Nano — zero-setup local AI
Choose Gemini Nano when:
- You want on-device inference with zero downloads
- You’re on a Chrome version that supports the AI API
- Quality requirements are modest
Limitations: single model, no model choice, limited capabilities, experimental tool calling.
Ollama — local power
Choose Ollama when:
- You want to run larger, more capable models locally
- Privacy matters but you’re comfortable with localhost/LAN traffic
- You want to experiment with many different models
- You need reliable tool calling with local models
Limitations: requires installing and running the Ollama server. Resource-heavy for large models.
Azure OpenAI — enterprise compliance
Choose Azure OpenAI when:
- Your organization requires Azure data residency
- You need enterprise-grade compliance and audit trails
- You have existing Azure OpenAI deployments
- Tool calling reliability matters
Limitations: requires Azure subscription and deployment setup.
Claude — maximum quality
Choose Claude when:
- Response quality is the top priority
- You need the best tool calling experience with MCP servers
- You’re comfortable sending prompts to Anthropic’s API
- Cost per token is acceptable for your use case
Limitations: requires API key, internet connection, and costs money per token.
Mixing providers
You can switch providers at any time from the chat panel dropdown. A common pattern:
- Daily browsing: WebGPU for quick, private page Q&A
- Deep research: Switch to Claude when you need high-quality synthesis across a large site index
- Tool workflows: Use Claude or Ollama when working with MCP-connected agents
Embedding always runs locally regardless of LLM provider. Your indexes and vaults work with any provider — switching only changes the AI that generates answers.
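The split between a fixed local index and a swappable answering model can be sketched as a toy — this is an illustration of the architecture, not Daneel’s actual code, and the character-count “embedding” stands in for a real embedding model:

```python
from typing import Callable, List, Tuple

Vector = Tuple[int, ...]

def embed(text: str) -> Vector:
    """Toy local 'embedding': letter-frequency vector (stands in for a real model)."""
    return tuple(text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz")

def build_index(docs: List[str]) -> List[Tuple[Vector, str]]:
    # Built once, entirely locally, regardless of which LLM answers later.
    return [(embed(d), d) for d in docs]

def retrieve(index: List[Tuple[Vector, str]], query: str) -> str:
    q = embed(query)
    # Nearest neighbour by a simple dot product.
    return max(index, key=lambda pair: sum(a * b for a, b in zip(pair[0], q)))[1]

def answer(index: List[Tuple[Vector, str]], query: str,
           llm: Callable[[str], str]) -> str:
    # Only this step changes when you switch providers.
    context = retrieve(index, query)
    return llm(f"Context: {context}\nQuestion: {query}")

index = build_index(["Ollama runs models locally", "Claude is a cloud API"])
```

Swapping the `llm` callable changes only the generation step; the same `index` is reused untouched, which is why switching providers never invalidates your indexes or vaults.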
Quality vs. model size
A rough rule of thumb for LLM quality:
- < 1B parameters — basic summarization, simple Q&A
- 1B–3B parameters — good for most page Q&A and document chat (WebGPU default)
- 7B–13B parameters — strong reasoning, reliable tool calling (Ollama sweet spot)
- 70B+ parameters — near state-of-the-art (large Ollama models, Claude, Azure)
Larger models need more memory and compute. WebGPU is limited by browser GPU memory. Ollama can use system RAM for larger models but inference is slower.
Quantization also matters. A 1.7B model at standard q4 weighs ~1.1 GB, but at q1 (1-bit) it drops to 291 MB. Daneel supports q1 and q2 quantization for models that are designed for it, like PrismML’s Bonsai family. Not every model benefits from extreme quantization, though: 1-bit works best when the model was trained for it from the start.
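These figures follow from simple arithmetic: raw weight size is roughly parameters × bits-per-weight ÷ 8, and real files run somewhat larger once quantization scales and embeddings are added (the overhead is approximate, not a fixed formula):

```python
def approx_size_mb(params: float, bits_per_weight: float) -> float:
    """Raw weight size in MB: parameters * bits / (8 bits per byte) / 1e6 bytes per MB."""
    return params * bits_per_weight / 8 / 1e6

# 1.7B parameters at 4-bit: 850 MB of raw weights; real q4 files land
# closer to ~1.1 GB once scales and embeddings are included.
print(approx_size_mb(1.7e9, 4))  # 850.0
# At 1-bit the raw weights drop to ~212 MB; Bonsai 1.7B at q1 ships at 291 MB.
print(approx_size_mb(1.7e9, 1))  # 212.5
```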