The Provider Spectrum
Daneel supports five LLM providers, each with different trade-offs in quality, privacy, cost, and setup. This page helps you understand when to use which.
For configuration steps, see Connect a Cloud Provider. For the full technical specs, see the AI Providers reference.
The spectrum
Providers are arranged from most private (left) to most capable (right):
WebGPU → Gemini Nano → Ollama → Azure OpenAI → Claude
   ↑                                              ↑
100% local                                 highest quality
zero cost                                  per-token cost
smaller models                             largest models

There is no universally “best” provider. The right choice depends on what you’re optimizing for.
Provider comparison
| | WebGPU | Gemini Nano | Ollama | Azure OpenAI | Claude |
|---|---|---|---|---|---|
| Privacy | On-device | On-device | Local network | Your cloud | Third-party |
| Quality | Good (up to 3B models) | Basic (3B) | Good to excellent | Excellent | Excellent |
| Cost | Free | Free | Free | Azure pricing | Per-token |
| Setup | None | Chrome flag | Install Ollama | Azure subscription | API key |
| Internet | No | No | LAN only | Yes | Yes |
| Tool calling | Experimental | Experimental | Yes | Yes | Yes (best) |
| Model variety | 20+ models | 1 model | Thousands | Your deployments | 3 models |
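The table’s trade-offs can be condensed into a rough decision rule. The sketch below is purely illustrative — the function and its inputs are hypothetical, not part of Daneel:

```python
def pick_provider(offline: bool, needs_tools: bool, max_quality: bool) -> str:
    """Rough decision rule mirroring the comparison table (hypothetical helper)."""
    if offline:
        # On-device or LAN-only options; among these, only Ollama
        # offers reliable (non-experimental) tool calling.
        return "Ollama" if needs_tools else "WebGPU"
    if max_quality:
        return "Claude"  # highest quality and best tool calling, per-token cost
    return "Azure OpenAI"  # cloud quality with enterprise data residency
```

For example, `pick_provider(offline=True, needs_tools=False, max_quality=False)` returns `"WebGPU"` — the fully on-device choice.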
When to use each
WebGPU — privacy above all
Choose WebGPU when:
- You need complete privacy with zero data leaving your machine
- You’re working with sensitive or confidential content
- You don’t have (or don’t want to use) API keys
- You have a decent GPU (most modern integrated GPUs work)
Limitations: smaller models (up to ~3B parameters) mean lower quality on complex reasoning tasks. Tool calling is experimental and unreliable. That said, models like Bonsai 1.7B (q1) weigh just 291 MB while still supporting step-by-step reasoning, making WebGPU viable even on low-end hardware.
Gemini Nano — zero-setup local AI
Choose Gemini Nano when:
- You want on-device inference with zero downloads
- You’re on a Chrome version that supports the AI API
- Quality requirements are modest
Limitations: single model, no model choice, limited capabilities, experimental tool calling.
Ollama — local power
Choose Ollama when:
- You want to run larger, more capable models locally
- Privacy matters but you’re comfortable with localhost/LAN traffic
- You want to experiment with many different models
- You need reliable tool calling with local models
Limitations: requires installing and running the Ollama server. Resource-heavy for large models.
Azure OpenAI — enterprise compliance
Choose Azure OpenAI when:
- Your organization requires Azure data residency
- You need enterprise-grade compliance and audit trails
- You have existing Azure OpenAI deployments
- Tool calling reliability matters
Limitations: requires Azure subscription and deployment setup.
Claude — maximum quality
Choose Claude when:
- Response quality is the top priority
- You need the best tool calling experience with MCP servers
- You’re comfortable sending prompts to Anthropic’s API
- Cost per token is acceptable for your use case
Limitations: requires API key, internet connection, and costs money per token.
Mixing providers
You can switch providers at any time from the chat panel dropdown. A common pattern:
- Daily browsing: WebGPU for quick, private page Q&A
- Deep research: Switch to Claude when you need high-quality synthesis across a large site index
- Tool workflows: Use Claude or Ollama when working with MCP-connected agents
Embedding always runs locally regardless of LLM provider. Your indexes and vaults work with any provider — switching only changes the AI that generates answers.
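The split between a fixed local index and a swappable answering model can be sketched as a toy — this is an illustration of the architecture, not Daneel’s actual code, and the character-count “embedding” stands in for a real embedding model:

```python
from typing import Callable, List, Tuple

Vector = Tuple[int, ...]

def embed(text: str) -> Vector:
    """Toy local 'embedding': letter-frequency vector (stands in for a real model)."""
    return tuple(text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz")

def build_index(docs: List[str]) -> List[Tuple[Vector, str]]:
    # Built once, entirely locally, regardless of which LLM answers later.
    return [(embed(d), d) for d in docs]

def retrieve(index: List[Tuple[Vector, str]], query: str) -> str:
    q = embed(query)
    # Nearest neighbour by a simple dot product.
    return max(index, key=lambda pair: sum(a * b for a, b in zip(pair[0], q)))[1]

def answer(index: List[Tuple[Vector, str]], query: str,
           llm: Callable[[str], str]) -> str:
    # Only this step changes when you switch providers.
    context = retrieve(index, query)
    return llm(f"Context: {context}\nQuestion: {query}")

index = build_index(["Ollama runs models locally", "Claude is a cloud API"])
```

Swapping the `llm` callable changes only the generation step; the same `index` is reused untouched, which is why switching providers never invalidates your indexes or vaults.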
Quality vs. model size
A rough rule of thumb for LLM quality:
- < 1B parameters — basic summarization, simple Q&A
- 1B–3B parameters — good for most page Q&A and document chat (WebGPU default)
- 7B–13B parameters — strong reasoning, reliable tool calling (Ollama sweet spot)
- 70B+ parameters — near state-of-the-art (large Ollama models, Claude, Azure)
Larger models need more memory and compute. WebGPU is limited by browser GPU memory. Ollama can use system RAM for larger models but inference is slower.
Quantization also matters. A 1.7B model at standard q4 weighs ~1.1 GB, but at q1 (1-bit) it drops to 291 MB. Daneel supports q1 and q2 quantization for models that are designed for it, like PrismML’s Bonsai family. Not every model benefits from extreme quantization, though: 1-bit works best when the model was trained for it from the start.
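These figures follow from simple arithmetic: raw weight size is roughly parameters × bits-per-weight ÷ 8, and real files run somewhat larger once quantization scales and embeddings are added (the overhead is approximate, not a fixed formula):

```python
def approx_size_mb(params: float, bits_per_weight: float) -> float:
    """Raw weight size in MB: parameters * bits / (8 bits per byte) / 1e6 bytes per MB."""
    return params * bits_per_weight / 8 / 1e6

# 1.7B parameters at 4-bit: 850 MB of raw weights; real q4 files land
# closer to ~1.1 GB once scales and embeddings are included.
print(approx_size_mb(1.7e9, 4))  # 850.0
# At 1-bit the raw weights drop to ~212 MB; Bonsai 1.7B at q1 ships at 291 MB.
print(approx_size_mb(1.7e9, 1))  # 212.5
```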