The Provider Spectrum

Daneel supports five LLM providers, each with different trade-offs in quality, privacy, cost, and setup. This page helps you understand when to use which.

For configuration steps, see Connect a Cloud Provider. For the full technical specs, see the AI Providers reference.

Providers are arranged from most private (left) to most capable (right):

WebGPU → Gemini Nano → Ollama → Azure OpenAI → Claude

  • WebGPU end: 100% local, zero cost, smaller models
  • Claude end: highest quality, per-token cost, largest models

There is no universally “best” provider. The right choice depends on what you’re optimizing for.

|               | WebGPU                 | Gemini Nano  | Ollama            | Azure OpenAI       | Claude      |
|---------------|------------------------|--------------|-------------------|--------------------|-------------|
| Privacy       | On-device              | On-device    | Local network     | Your cloud         | Third-party |
| Quality       | Good (up to 3B models) | Basic (3B)   | Good to excellent | Excellent          | Excellent   |
| Cost          | Free                   | Free         | Free              | Azure pricing      | Per-token   |
| Setup         | None                   | Chrome flag  | Install Ollama    | Azure subscription | API key     |
| Internet      | No                     | No           | LAN only          | Yes                | Yes         |
| Tool calling  | Experimental           | Experimental | Yes               | Yes                | Yes (best)  |
| Model variety | 20+ models             | 1 model      | Thousands         | Your deployments   | 3 models    |

Choose WebGPU when:

  • You need complete privacy with zero data leaving your machine
  • You’re working with sensitive or confidential content
  • You don’t have (or don’t want to use) API keys
  • You have a decent GPU (most modern integrated GPUs work)

Limitations: smaller models (up to ~3B parameters) mean lower quality on complex reasoning tasks. Tool calling is experimental and unreliable. That said, models like Bonsai 1.7B (q1) weigh just 291 MB while still supporting step-by-step reasoning, making WebGPU viable even on low-end hardware.

Choose Gemini Nano when:

  • You want on-device inference with zero downloads
  • You’re on a Chrome version that supports the AI API
  • Quality requirements are modest

Limitations: a single fixed model with no alternatives, limited capabilities, and experimental tool calling.

Choose Ollama when:

  • You want to run larger, more capable models locally
  • Privacy matters but you’re comfortable with localhost/LAN traffic
  • You want to experiment with many different models
  • You need reliable tool calling with local models

Limitations: requires installing and running the Ollama server. Resource-heavy for large models.
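Because Daneel reaches Ollama over localhost/LAN, it can be useful to check that the server is actually running before selecting it as a provider. The sketch below is an illustration, not part of Daneel; the default port 11434 and the `/api/tags` endpoint are Ollama's documented defaults for listing installed models.

```python
import json
import urllib.request


def list_ollama_models(host="http://localhost:11434", timeout=2):
    """Return the names of locally installed Ollama models,
    or None if no server is reachable at `host`."""
    try:
        with urllib.request.urlopen(f"{host}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except OSError:  # covers connection refused, timeouts, DNS errors
        return None


models = list_ollama_models()
if models is None:
    print("Ollama not running -- fall back to another provider")
else:
    print(f"{len(models)} local models available")
```

A `None` result means the Ollama server is down or unreachable, which is a reasonable cue to fall back to WebGPU or a cloud provider.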

Choose Azure OpenAI when:

  • Your organization requires Azure data residency
  • You need enterprise-grade compliance and audit trails
  • You have existing Azure OpenAI deployments
  • Tool calling reliability matters

Limitations: requires Azure subscription and deployment setup.

Choose Claude when:

  • Response quality is the top priority
  • You need the best tool calling experience with MCP servers
  • You’re comfortable sending prompts to Anthropic’s API
  • Cost per token is acceptable for your use case

Limitations: requires an API key and an internet connection, and incurs per-token costs.

You can switch providers at any time from the chat panel dropdown. A common pattern:

  • Daily browsing: WebGPU for quick, private page Q&A
  • Deep research: Switch to Claude when you need high-quality synthesis across a large site index
  • Tool workflows: Use Claude or Ollama when working with MCP-connected agents

Embedding always runs locally regardless of LLM provider. Your indexes and vaults work with any provider — switching only changes the AI that generates answers.

A rough rule of thumb for LLM quality:

  • < 1B parameters — basic summarization, simple Q&A
  • 1B–3B parameters — good for most page Q&A and document chat (WebGPU default)
  • 7B–13B parameters — strong reasoning, reliable tool calling (Ollama sweet spot)
  • 70B+ parameters — near state-of-the-art (large Ollama models, Claude, Azure)

Larger models need more memory and compute. WebGPU is limited by browser GPU memory. Ollama can use system RAM for larger models but inference is slower.

Quantization also matters. A 1.7B model at standard q4 weighs ~1.1 GB, but at q1 (1-bit) it drops to 291 MB. Daneel supports q1 and q2 quantization for models that are designed for it, like PrismML’s Bonsai family. Not every model benefits from extreme quantization, though: 1-bit works best when the model was trained for it from the start.
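The arithmetic behind those sizes is easy to sketch: raw weight storage is roughly parameter count times bits per weight. This back-of-the-envelope estimator is an illustration, not part of Daneel; the gap between the raw figure and the quoted download size presumably comes from format overhead and layers kept at higher precision.

```python
def approx_size_mb(params_billion: float, bits_per_weight: float) -> float:
    """Raw weight storage in MB: parameters x bits per weight.
    Real model files run somewhat larger (metadata, embeddings,
    and some layers stored at higher precision)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e6


print(approx_size_mb(1.7, 4))  # ~850 MB raw; ships as ~1.1 GB with overhead
print(approx_size_mb(1.7, 1))  # ~212 MB raw; quoted as 291 MB for Bonsai q1
```

The same formula explains why 70B-class models are out of reach for browser GPU memory even at aggressive quantization: 70B at 4-bit is ~35 GB of weights before any overhead.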