
Daneel supports five LLM providers, each with different trade-offs in quality, privacy, cost, and setup. This page helps you understand when to use which.

For configuration steps, see [Connect a Cloud Provider](/guides/connect-provider/). For the full technical specs, see the [AI Providers reference](/reference/providers/).

## The spectrum

Providers are arranged from most private (left) to most capable (right):

```
WebGPU → Gemini Nano → Ollama → Azure OpenAI → Claude
  ↑                                                 ↑
  100% local                              highest quality
  zero cost                               per-token cost
  smaller models                          largest models
```

There is no universally "best" provider. The right choice depends on what you're optimizing for.

## Provider comparison

| | WebGPU | Gemini Nano | Ollama | Azure OpenAI | Claude |
|---|---|---|---|---|---|
| **Privacy** | On-device | On-device | Local network | Your cloud | Third-party |
| **Quality** | Good (up to 3B models) | Basic (3B) | Good to excellent | Excellent | Excellent |
| **Cost** | Free | Free | Free | Azure pricing | Per-token |
| **Setup** | None | Chrome flag | Install Ollama | Azure subscription | API key |
| **Internet** | No | No | LAN only | Yes | Yes |
| **Tool calling** | Experimental | Experimental | Yes | Yes | Yes (best) |
| **Model variety** | 20+ models | 1 model | Thousands | Your deployments | 3 models |

## When to use each

### WebGPU — privacy above all

Choose WebGPU when:
- You need complete privacy with zero data leaving your machine
- You're working with sensitive or confidential content
- You don't have (or don't want to use) API keys
- You have a decent GPU (most modern integrated GPUs work)

Limitations: smaller models (up to ~3B parameters) mean lower quality on complex reasoning tasks. Tool calling is experimental and unreliable. That said, models like Bonsai 1.7B (q1) weigh just 291 MB while still supporting step-by-step reasoning, making WebGPU viable even on low-end hardware.
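
If you're unsure whether your browser and GPU qualify, the standard `navigator.gpu` API answers that directly. The sketch below is a generic feature check you can run from the DevTools console, not Daneel-specific code:

```ts
// Generic WebGPU feature check (standard browser API, not Daneel-specific).
// The cast is only there because TypeScript's default DOM types may not
// include WebGPU yet.
async function checkWebGPU(): Promise<boolean> {
  const gpu = (navigator as any).gpu;
  if (!gpu) {
    console.log("WebGPU is not available in this browser.");
    return false;
  }
  // requestAdapter() resolves to null when no suitable GPU is found.
  const adapter = await gpu.requestAdapter();
  if (!adapter) {
    console.log("WebGPU is exposed, but no suitable GPU adapter was found.");
    return false;
  }
  console.log("WebGPU is ready; on-device models should work.");
  return true;
}

checkWebGPU();
```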

### Gemini Nano — zero-setup local AI

Choose Gemini Nano when:
- You want on-device inference with zero downloads
- You're on a Chrome version that supports the AI API
- Quality requirements are modest

Limitations: a single built-in model with no alternatives, limited capabilities, and experimental tool calling.
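
To confirm whether your Chrome build exposes the built-in AI at all, a feature check is enough. One caveat: the exposed globals have changed across Chrome versions, so the names below (`LanguageModel`, `ai`) are assumptions based on the experimental Prompt API and may not match your build:

```ts
// Rough feature check for Chrome's built-in AI (Gemini Nano).
// CAUTION: the exposed globals have changed across Chrome versions; the names
// checked here are assumptions and may differ in your build.
function hasBuiltInAI(): boolean {
  const w = globalThis as unknown as Record<string, unknown>;
  return "LanguageModel" in w || "ai" in w;
}

console.log(
  hasBuiltInAI()
    ? "A built-in AI global is exposed; Gemini Nano may be usable."
    : "No built-in AI global found; check your Chrome version and flags."
);
```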

### Ollama — local power

Choose Ollama when:
- You want to run larger, more capable models locally
- Privacy matters but you're comfortable with localhost/LAN traffic
- You want to experiment with many different models
- You need reliable tool calling with local models

Limitations: requires installing and running the Ollama server. Resource-heavy for large models.
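
A quick way to confirm the server is reachable is Ollama's documented `/api/tags` endpoint, which lists the models you've pulled (11434 is the default port). A minimal sketch:

```ts
// List the models available on a local Ollama server via its /api/tags endpoint.
async function listOllamaModels(host = "http://localhost:11434"): Promise<string[]> {
  const res = await fetch(`${host}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with ${res.status}`);
  const data = (await res.json()) as { models: { name: string }[] };
  return data.models.map((m) => m.name);
}

listOllamaModels()
  .then((names) => console.log("Ollama is up. Models:", names))
  .catch(() => console.log("Could not reach Ollama on localhost:11434."));
```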

### Azure OpenAI — enterprise compliance

Choose Azure OpenAI when:
- Your organization requires Azure data residency
- You need enterprise-grade compliance and audit trails
- You have existing Azure OpenAI deployments
- Tool calling reliability matters

Limitations: requires an Azure subscription and deployment setup.
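
For a sense of what "deployment setup" means in practice: requests go to a deployment-specific URL on your own resource, authenticated with an `api-key` header. This is a raw-API sketch rather than Daneel's configuration format, and the resource and deployment names are placeholders:

```ts
// Minimal chat call against an Azure OpenAI deployment (Node-style sketch).
// "my-resource" and "my-deployment" are placeholders for your own names.
const endpoint =
  "https://my-resource.openai.azure.com/openai/deployments/my-deployment" +
  "/chat/completions?api-version=2024-02-01";

const res = await fetch(endpoint, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "api-key": process.env.AZURE_OPENAI_API_KEY ?? "",
  },
  body: JSON.stringify({
    messages: [{ role: "user", content: "Summarize this page in one sentence." }],
  }),
});
console.log((await res.json()).choices?.[0]?.message?.content);
```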

### Claude — maximum quality

Choose Claude when:
- Response quality is the top priority
- You need the best tool calling experience with MCP servers
- You're comfortable sending prompts to Anthropic's API
- Cost per token is acceptable for your use case

Limitations: requires an API key and an internet connection, and costs money per token.
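
For reference, "sending prompts to Anthropic's API" amounts to a single authenticated HTTPS call. The sketch below uses the raw Messages API (Node-style, key from the environment) rather than Daneel's configuration format; the model ID is one example and may not be the latest:

```ts
// Minimal call to Anthropic's Messages API.
const res = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "x-api-key": process.env.ANTHROPIC_API_KEY ?? "",
    "anthropic-version": "2023-06-01",
  },
  body: JSON.stringify({
    model: "claude-3-5-sonnet-20241022", // example model ID; check Anthropic's docs
    max_tokens: 256,
    messages: [{ role: "user", content: "Summarize this page in one sentence." }],
  }),
});
const data = await res.json();
console.log(data.content?.[0]?.text);
```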

## Mixing providers

You can switch providers at any time from the chat panel dropdown. A common pattern:

- **Daily browsing:** WebGPU for quick, private page Q&A
- **Deep research:** Switch to Claude when you need high-quality synthesis across a large site index
- **Tool workflows:** Use Claude or Ollama when working with MCP-connected agents

Embedding always runs locally regardless of LLM provider. Your indexes and vaults work with any provider — switching only changes the AI that generates answers.

## Quality vs. model size

A rough rule of thumb for LLM quality:

- **< 1B parameters** — basic summarization, simple Q&A
- **1B–3B parameters** — good for most page Q&A and document chat (WebGPU default)
- **7B–13B parameters** — strong reasoning, reliable tool calling (Ollama sweet spot)
- **70B+ parameters** — near state-of-the-art (large Ollama models, Claude, Azure)

Larger models need more memory and compute. WebGPU is limited by browser GPU memory. Ollama can use system RAM for larger models but inference is slower.

Quantization also matters. A 1.7B model at standard q4 weighs ~1.1 GB, but at q1 (1-bit) it drops to 291 MB. Daneel supports q1 and q2 quantization for models that are designed for it, like PrismML's Bonsai family. Not every model benefits from extreme quantization, though: 1-bit works best when the model was trained for it from the start.
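
The size figures follow from simple multiplication: parameters × bits per weight ÷ 8 gives the raw weight bytes. A back-of-the-envelope sketch (the 4.5 effective bits for q4 is an assumption, and real files also carry scales, embeddings, and metadata, so they run somewhat larger than this estimate):

```ts
// Back-of-the-envelope download size: parameters × bits per weight / 8 bytes.
// Real quantized files include extra overhead, so actual sizes are larger.
function approxSizeMB(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1e6;
}

console.log(approxSizeMB(1.7e9, 4.5).toFixed(0) + " MB"); // ~956 MB, near the ~1.1 GB q4 figure
console.log(approxSizeMB(1.7e9, 1.0).toFixed(0) + " MB"); // ~213 MB; the 291 MB q1 file includes overhead
```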
