
A knowledge graph is a structured representation of the entities (people, organizations, places, concepts) in your documents and the relationships between them. Daneel builds knowledge graphs from vault documents to help you see connections that aren't obvious from reading individual files.

To build one, follow [How to Build a Knowledge Graph](/how-to/knowledge-graph/). To explore one with analytics and Wikipedia lookup, follow [How to Explore Your Knowledge Graph](/how-to/explore-knowledge-graph/). For the analytics layer's underlying ideas (importance, topics, bridges, paths), see [Graph Analytics](/concepts/graph-analytics/). For configuration details, see [Settings > Knowledge Graph](/reference/settings/#knowledge-graph).

## What it does

When you enable the knowledge graph on a vault, Daneel:

1. Reads every document in the vault
2. Extracts named entities using a local NER (Named Entity Recognition) model
3. Resolves duplicates ("OpenAI", "Open AI", and "OPENAI" become one entity)
4. Identifies relationships based on co-occurrence within the same text passage
5. Builds an interactive 3D graph you can explore visually

The result is a map of your documents' key concepts and how they connect.

## Named Entity Recognition (NER)

NER is the process of identifying and classifying named things in text. Given the sentence:

> *"Satya Nadella announced that Microsoft would invest $10 billion in OpenAI."*

An NER model extracts:
- **Satya Nadella** — Person
- **Microsoft** — Organization
- **$10 billion** — Financial value
- **OpenAI** — Organization

Daneel uses GLiNER, an ONNX-based NER model that runs entirely in your browser via a dedicated web worker. No text is sent to any server for entity extraction.
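
In code terms, an NER pass returns a list of labeled spans over the input text. The field names below are illustrative (GLiNER's exact output shape may differ), and the filtering step reflects the common practice of dropping low-confidence predictions:

```typescript
// Hypothetical shape of NER output: labeled spans over the input text.
// Field names are illustrative, not GLiNER's exact API.
interface NerSpan {
  text: string;   // the surface form, e.g. "Satya Nadella"
  label: string;  // the entity type, e.g. "person"
  start: number;  // character offset where the span begins
  end: number;    // character offset where the span ends
  score: number;  // model confidence in [0, 1]
}

// Group confident spans by label, dropping low-scoring predictions.
function groupByLabel(spans: NerSpan[], minScore = 0.5): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const s of spans) {
    if (s.score < minScore) continue;
    const bucket = groups.get(s.label) ?? [];
    bucket.push(s.text);
    groups.set(s.label, bucket);
  }
  return groups;
}
```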

The model comes in four variants that trade download size against accuracy and language support:

| Model | Size | Languages | Best for |
|-------|------|-----------|----------|
| GLiNER Small v2.1 (fp32) | 583 MB | English | Maximum accuracy |
| GLiNER Small v2.1 (int8) | 183 MB | English | Good balance (default) |
| GLiNER Multi v2.1 (int8) | 349 MB | Multilingual | Non-English documents |
| GLiNER Multi v2.1 (fp16) | 580 MB | Multilingual | Best multilingual accuracy |

## Entity resolution

Raw NER output contains duplicates. "IBM", "I.B.M.", and "International Business Machines" might all refer to the same entity. Daneel's `EntityResolver` deduplicates using normalized string matching — comparing lowercased, whitespace-collapsed versions of entity names and merging those above a similarity threshold (default: 85%).

This is a heuristic, not perfect. It handles case variations and minor formatting differences well, but won't merge "IBM" and "Big Blue" (which would require semantic understanding). You can adjust the threshold in settings — lower values merge more aggressively, higher values are more conservative.
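
A minimal sketch of that matching step. The lowercase/whitespace normalization follows the description above; using a Levenshtein ratio as the similarity metric is an assumption, since this page doesn't specify which string metric Daneel uses:

```typescript
// Sketch of threshold-based deduplication. The normalization matches the
// description above; the Levenshtein-ratio similarity is an assumption.
function normalize(name: string): string {
  return name.toLowerCase().replace(/\s+/g, " ").trim();
}

// Classic dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Similarity in [0, 1]: identical normalized names score 1.
function similarity(a: string, b: string): number {
  const na = normalize(a);
  const nb = normalize(b);
  if (na === nb) return 1;
  return 1 - levenshtein(na, nb) / Math.max(na.length, nb.length);
}

// Merge when similarity clears the threshold (default 85%, as in settings).
function shouldMerge(a: string, b: string, threshold = 0.85): boolean {
  return similarity(a, b) >= threshold;
}
```

With this metric, `shouldMerge("OpenAI", "Open AI")` passes at the default threshold, while `shouldMerge("IBM", "Big Blue")` does not — matching the behavior described above.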

## Ontology presets

An ontology defines what types of entities the NER model looks for. Different domains have different relevant entity types. Daneel ships with 8 presets:

- **General** — people, organizations, places, events, concepts
- **Academic** — researchers, institutions, theories, publications
- **Legal** — cases, statutes, courts, parties
- **Medical** — conditions, treatments, drugs, anatomy
- **Programming** — languages, frameworks, APIs, data structures
- **Business** — companies, products, markets, financials
- **Travel** — destinations, landmarks, transport, accommodations
- **History** — historical figures, battles, treaties, eras

You can also define custom ontology labels for specialized domains. The ontology is configured per-vault, so a legal vault and a programming vault can use different entity types.
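
Conceptually, a per-vault ontology amounts to a list of labels handed to the model (GLiNER is a zero-shot model, so the label list itself defines what it looks for). A sketch with hypothetical field and preset contents:

```typescript
// Hypothetical per-vault ontology configuration; field names and label
// lists are illustrative, not Daneel's actual settings schema.
interface OntologyConfig {
  preset: string;         // e.g. "legal", "programming"
  customLabels: string[]; // extra labels for specialized domains
}

const legalVault: OntologyConfig = {
  preset: "legal",
  customLabels: ["jurisdiction", "exhibit"],
};

// GLiNER is zero-shot: the label list itself tells the model what to
// look for, so a preset is ultimately just a bundle of labels.
function labelsFor(
  config: OntologyConfig,
  presetLabels: Record<string, string[]>
): string[] {
  return [...(presetLabels[config.preset] ?? []), ...config.customLabels];
}
```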

## Relationships

Daneel infers relationships from co-occurrence: if two entities appear in the same text passage, they're likely related. The more often they co-occur, the stronger the relationship.

This is simpler than hand-curated knowledge graphs (like Wikidata) where relationships have explicit types ("works for", "located in"). But for document analysis, co-occurrence captures the important signal: these things are discussed together.
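
Co-occurrence counting is simple enough to show in full. Given per-passage lists of resolved entity names, an edge's weight is just the number of passages in which the pair appears together (a toy sketch, not Daneel's implementation):

```typescript
// Co-occurrence counting in miniature: input is per-passage lists of
// resolved entity names; output is undirected weighted edges.
function cooccurrenceEdges(passages: string[][]): Map<string, number> {
  const edges = new Map<string, number>();
  for (const passage of passages) {
    const unique = [...new Set(passage)]; // count each pair once per passage
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        // Sort the pair so (a, b) and (b, a) map to the same undirected edge.
        const key = [unique[i], unique[j]].sort().join("|");
        edges.set(key, (edges.get(key) ?? 0) + 1);
      }
    }
  }
  return edges;
}
```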

## The visualization

The 3D graph uses WebGL rendering (via ngraph):

- **Nodes** are entities, sized by mention frequency
- **Edges** are relationships, weighted by co-occurrence strength
- **Colors** map to entity types (configurable per type)
- **Physics simulation** positions nodes — related entities cluster together, unrelated ones drift apart

You can rotate, zoom, and hover to explore. Customizable parameters include charge strength (node repulsion), link opacity, particle animations, and bloom glow effects.
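
As a rough sketch, the tunable parameters above amount to a settings object like the following. Field names and defaults here are invented for illustration; see Settings > Knowledge Graph for the real options:

```typescript
// Hypothetical shape of the visualization settings described above;
// field names and default values are illustrative only.
interface GraphVisualSettings {
  chargeStrength: number;   // node repulsion in the physics simulation
  linkOpacity: number;      // edge transparency, 0 (invisible) to 1 (opaque)
  particleAnimation: boolean;
  bloom: { enabled: boolean; intensity: number };
  nodeColorByType: Record<string, string>; // entity type -> color
}

const defaults: GraphVisualSettings = {
  chargeStrength: -30,
  linkOpacity: 0.4,
  particleAnimation: false,
  bloom: { enabled: true, intensity: 1.0 },
  nodeColorByType: { person: "#e07a5f", organization: "#3d85c6" },
};

// Nodes are sized by mention frequency; a square root keeps heavily
// mentioned entities from dwarfing everything else.
const nodeRadius = (mentions: number, base = 2): number =>
  base * Math.sqrt(mentions);
```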

## Why it helps

Reading 50 documents individually, you might miss that three different papers all mention the same researcher, or that a concept from document A is the foundation for the technique described in document D. The knowledge graph surfaces these cross-document connections visually.

It's most useful for:

- **Literature reviews** — mapping the landscape of who studies what
- **Legal discovery** — seeing which entities appear across case files
- **Business intelligence** — understanding relationships between companies, people, and products
- **Research synthesis** — finding thematic connections across a corpus

## Beyond visualization: analytics

A graph by itself is hard to read at scale. With thousands of nodes, the visualization is impressive but not directly informative. Daneel adds an [analytics layer](/concepts/graph-analytics/) on top of the graph that summarizes it into actionable insights:

- **Key Entities** — entities that are structurally important via PageRank
- **Topics** — clusters of entities that travel together (Louvain communities)
- **Bridges** — entities that connect otherwise separate parts of the graph (betweenness centrality)
- **Paths** — shortest connection chains between any two entities (Dijkstra)
- **Graph Health** — fragmentation diagnostics and possible-duplicate detection
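
To make "structurally important" concrete, here is a minimal PageRank power iteration over an undirected weighted graph. This illustrates the idea behind Key Entities, not Daneel's actual implementation:

```typescript
// Minimal PageRank by power iteration over an undirected weighted graph.
// A sketch of the idea behind "Key Entities", not Daneel's code.
type WeightedGraph = Map<string, Map<string, number>>; // node -> (neighbor -> weight)

function pagerank(g: WeightedGraph, damping = 0.85, iters = 50): Map<string, number> {
  const nodes = [...g.keys()];
  const n = nodes.length;
  let rank = new Map<string, number>();
  for (const v of nodes) rank.set(v, 1 / n); // uniform starting distribution
  for (let it = 0; it < iters; it++) {
    // Every node keeps a baseline share, then receives rank from its
    // neighbors in proportion to edge weight.
    const next = new Map<string, number>();
    for (const v of nodes) next.set(v, (1 - damping) / n);
    for (const [v, nbrs] of g) {
      const total = [...nbrs.values()].reduce((acc, w) => acc + w, 0);
      for (const [u, w] of nbrs) {
        next.set(u, (next.get(u) ?? 0) + damping * (rank.get(v) ?? 0) * (w / total));
      }
    }
    rank = next;
  }
  return rank;
}
```

On a star-shaped graph the hub ends up with the highest score, which matches the intuition: entities that many others connect to surface as key entities.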

The analytics layer also enables **one-click Wikipedia lookup**: clicking any entity in the graph triggers a search and lets you read articles directly inside the document viewer.
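
For a sense of what such a lookup involves, the public MediaWiki search endpoint can be queried like this (Daneel's actual request shape isn't documented here):

```typescript
// Build a MediaWiki full-text search URL for an entity name, using the
// public Action API's `list=search` module. How Daneel issues its
// request is an assumption; the endpoint and parameters are real.
function wikipediaSearchUrl(entity: string, lang = "en"): string {
  const params = new URLSearchParams({
    action: "query",   // Action API read request
    list: "search",    // full-text search module
    srsearch: entity,  // the search terms
    format: "json",
    origin: "*",       // required for CORS requests from a browser
  });
  return `https://${lang}.wikipedia.org/w/api.php?${params}`;
}
```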

## Limitations

- **Co-occurrence is not causation.** Two entities appearing in the same paragraph doesn't mean they're directly related. The graph shows proximity, not meaning.
- **NER quality depends on the model.** The int8 model is fast but occasionally misidentifies entities or misses subtle ones. The fp32 model is more accurate but larger.
- **English bias.** The English-only models work best on English text. For other languages, use the multilingual variants.
- **No relationship typing.** Edges don't have labels like "works at" or "located in" — they just indicate co-occurrence strength.
