# Knowledge Graphs
A knowledge graph is a structured representation of the entities (people, organizations, places, concepts) in your documents and the relationships between them. Daneel builds knowledge graphs from vault documents to help you see connections that aren’t obvious from reading individual files.
To build one, follow How to Build a Knowledge Graph. To explore one with analytics and Wikipedia lookup, follow How to Explore Your Knowledge Graph. For the analytics layer’s underlying ideas (importance, topics, bridges, paths), see Graph Analytics. For configuration details, see Settings > Knowledge Graph.
## What it does

When you enable the knowledge graph on a vault, Daneel:
- Reads every document in the vault
- Extracts named entities using a local NER (Named Entity Recognition) model
- Resolves duplicates (“OpenAI”, “Open AI”, “OPENAI” become one entity)
- Identifies relationships based on co-occurrence within the same text passages
- Builds an interactive 3D graph you can explore visually
The result is a map of your documents’ key concepts and how they connect.
## Named Entity Recognition (NER)

NER is the process of identifying and classifying named things in text. Given the sentence:
“Satya Nadella announced that Microsoft would invest $10 billion in OpenAI.”
A NER model extracts:
- Satya Nadella — Person
- Microsoft — Organization
- $10 billion — Financial value
- OpenAI — Organization
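In code, NER output like this is naturally represented as typed spans with character offsets into the source text. The shape below is illustrative only, not Daneel's internal API:

```typescript
// Illustrative types for NER output; not Daneel's actual schema.
type EntityLabel = "Person" | "Organization" | "Financial value";

interface EntitySpan {
  text: string;       // surface form as it appears in the text
  label: EntityLabel; // entity type from the active ontology
  start: number;      // character offset of the first character
  end: number;        // character offset one past the last character
}

const sentence =
  "Satya Nadella announced that Microsoft would invest $10 billion in OpenAI.";

const spans: EntitySpan[] = [
  { text: "Satya Nadella", label: "Person", start: 0, end: 13 },
  { text: "Microsoft", label: "Organization", start: 29, end: 38 },
  { text: "$10 billion", label: "Financial value", start: 52, end: 63 },
  { text: "OpenAI", label: "Organization", start: 67, end: 73 },
];
```

Keeping offsets (rather than just names) is what lets a UI highlight each mention in place.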
Daneel uses GLiNER, an ONNX-based NER model that runs entirely in your browser via a dedicated web worker. No text is sent to any server for entity extraction.
The model comes in four variants that trade download size against accuracy and language coverage:
| Model | Size | Languages | Best for |
|---|---|---|---|
| GLiNER Small v2.1 (fp32) | 583 MB | English | Maximum accuracy |
| GLiNER Small v2.1 (int8) | 183 MB | English | Good balance (default) |
| GLiNER Multi v2.1 (int8) | 349 MB | Multilingual | Non-English documents |
| GLiNER Multi v2.1 (fp16) | 580 MB | Multilingual | Best multilingual accuracy |
## Entity resolution

Raw NER output contains duplicates. “IBM”, “I.B.M.”, and “International Business Machines” might all refer to the same entity. Daneel’s EntityResolver deduplicates using normalized string matching — comparing lowercased, whitespace-collapsed versions of entity names and merging those above a similarity threshold (default: 85%).
This is a heuristic, not perfect. It handles case variations and minor formatting differences well, but won’t merge “IBM” and “Big Blue” (which would require semantic understanding). You can adjust the threshold in settings — lower values merge more aggressively, higher values are more conservative.
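A minimal sketch of this resolution step. Two assumptions to flag: normalization here strips punctuation and whitespace entirely (slightly more aggressive than the lowercase-and-collapse described above, so that “Open AI” and “OpenAI” meet, as the example implies), and Sørensen–Dice similarity over character bigrams stands in for whatever metric EntityResolver actually uses:

```typescript
// Assumed normalization: lowercase, then drop everything except letters/digits.
function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "");
}

// Character-bigram counts for the similarity measure.
function bigrams(s: string): Map<string, number> {
  const grams = new Map<string, number>();
  for (let i = 0; i < s.length - 1; i++) {
    const g = s.slice(i, i + 2);
    grams.set(g, (grams.get(g) ?? 0) + 1);
  }
  return grams;
}

// Sørensen–Dice coefficient over bigrams of the normalized names (0..1).
function similarity(a: string, b: string): number {
  const na = normalize(a);
  const nb = normalize(b);
  if (na === nb) return 1;
  const ga = bigrams(na);
  const gb = bigrams(nb);
  let overlap = 0;
  for (const [g, count] of ga) overlap += Math.min(count, gb.get(g) ?? 0);
  let total = 0;
  for (const c of ga.values()) total += c;
  for (const c of gb.values()) total += c;
  return total === 0 ? 0 : (2 * overlap) / total;
}

// Greedy merge: each name joins the first cluster whose representative it
// matches above the threshold, otherwise it starts a new cluster.
function resolve(names: string[], threshold = 0.85): string[][] {
  const clusters: string[][] = [];
  for (const name of names) {
    const home = clusters.find(c => similarity(c[0], name) >= threshold);
    if (home) home.push(name);
    else clusters.push([name]);
  }
  return clusters;
}
```

At the default 0.85 threshold, “OpenAI” / “Open AI” / “OPENAI” collapse into one cluster, while “IBM” and “Big Blue” stay apart, matching the behavior described above.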
## Ontology presets

An ontology defines what types of entities the NER model looks for. Different domains have different relevant entity types. Daneel ships with 8 presets:
- General — people, organizations, places, events, concepts
- Academic — researchers, institutions, theories, publications
- Legal — cases, statutes, courts, parties
- Medical — conditions, treatments, drugs, anatomy
- Programming — languages, frameworks, APIs, data structures
- Business — companies, products, markets, financials
- Travel — destinations, landmarks, transport, accommodations
- History — historical figures, battles, treaties, eras
You can also define custom ontology labels for specialized domains. The ontology is configured per-vault, so a legal vault and a programming vault can use different entity types.
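A per-vault ontology configuration might look like the sketch below. The field names and label strings are hypothetical; what makes arbitrary label lists workable is that GLiNER is a zero-shot model, matching text against whatever labels it is given at inference time:

```typescript
// Hypothetical per-vault ontology config; not Daneel's internal schema.
interface OntologyConfig {
  preset: string;   // built-in preset name, or "custom"
  labels: string[]; // entity types the NER model is asked to find
}

const legalVault: OntologyConfig = {
  preset: "legal",
  labels: ["case", "statute", "court", "party"],
};

// A custom ontology for a specialized domain:
const biologyVault: OntologyConfig = {
  preset: "custom",
  labels: ["protein", "gene", "pathway"],
};
```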
## Relationships

Daneel infers relationships from co-occurrence: if two entities appear in the same text passage, they’re likely related. The more often they co-occur, the stronger the relationship.
This is simpler than hand-curated knowledge graphs (like Wikidata) where relationships have explicit types (“works for”, “located in”). But for document analysis, co-occurrence captures the important signal: these things are discussed together.
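The co-occurrence logic reduces to a few lines. Here each passage is assumed to already be reduced to its list of resolved entity names; how Daneel segments passages, and whether it applies any weighting beyond raw counts, are assumptions:

```typescript
type Edge = { weight: number };

// One edge per entity pair; weight counts how many passages link the pair.
function cooccurrenceEdges(passages: string[][]): Map<string, Edge> {
  const edges = new Map<string, Edge>();
  for (const entities of passages) {
    // Deduplicate within a passage so repeated mentions count once.
    const unique = [...new Set(entities)];
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        // Canonical key so (A, B) and (B, A) are the same edge.
        const key = [unique[i], unique[j]].sort().join("|");
        const edge = edges.get(key) ?? { weight: 0 };
        edge.weight += 1;
        edges.set(key, edge);
      }
    }
  }
  return edges;
}
```

Counting a pair once per passage (the `Set`) is a plausible but assumed choice; it makes edge weight mean "number of passages that discuss both" rather than raw mention counts.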
## The visualization

The 3D graph uses WebGL rendering (via ngraph):
- Nodes are entities, sized by mention frequency
- Edges are relationships, weighted by co-occurrence strength
- Colors map to entity types (configurable per type)
- Physics simulation positions nodes — related entities cluster together, unrelated ones drift apart
You can rotate, zoom, and hover to explore. Customizable parameters include charge strength (node repulsion), link opacity, particle animations, and bloom glow effects.
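The styling rules above (size by mention frequency, color by type) reduce to a small mapping. The scale function and palette here are illustrative, not Daneel's actual values:

```typescript
// Illustrative per-type palette; Daneel makes this configurable per type.
const typeColors: Record<string, string> = {
  Person: "#e06666",
  Organization: "#6fa8dc",
  Place: "#93c47d",
};

function nodeStyle(type: string, mentions: number) {
  return {
    // Log scale keeps very frequent entities from dwarfing the rest.
    radius: 4 + 2 * Math.log2(1 + mentions),
    color: typeColors[type] ?? "#999999", // fallback for unmapped types
  };
}
```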
## Why it helps

Reading 50 documents individually, you might miss that three different papers all mention the same researcher, or that a concept from document A is the foundation for the technique described in document D. The knowledge graph surfaces these cross-document connections visually.
It’s most useful for:
- Literature reviews — mapping the landscape of who studies what
- Legal discovery — seeing which entities appear across case files
- Business intelligence — understanding relationships between companies, people, and products
- Research synthesis — finding thematic connections across a corpus
## Beyond visualization: analytics

A graph by itself is hard to read at scale. With thousands of nodes, the visualization is impressive but not directly informative. Daneel adds an analytics layer on top of the graph that summarizes it into actionable insights:
- Key Entities — entities that are structurally important via PageRank
- Topics — clusters of entities that travel together (Louvain communities)
- Bridges — entities that connect otherwise separate parts of the graph (betweenness centrality)
- Paths — shortest connection chains between any two entities (Dijkstra)
- Graph Health — fragmentation diagnostics and possible-duplicate detection
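To make "structurally important" concrete, here is a minimal PageRank by power iteration over an undirected co-occurrence graph. Daneel's actual damping factor and iteration count are assumptions; 0.85 damping is the conventional default:

```typescript
function pageRank(
  nodes: string[],
  edges: [string, string][],
  damping = 0.85,
  iterations = 50,
): Map<string, number> {
  // Undirected co-occurrence graph: each edge contributes both directions.
  const neighbors = new Map<string, string[]>(
    nodes.map((n): [string, string[]] => [n, []]),
  );
  for (const [a, b] of edges) {
    neighbors.get(a)!.push(b);
    neighbors.get(b)!.push(a);
  }

  // Start uniform; each iteration redistributes rank along edges.
  let rank = new Map<string, number>(
    nodes.map((n): [string, number] => [n, 1 / nodes.length]),
  );
  for (let it = 0; it < iterations; it++) {
    const next = new Map<string, number>(
      nodes.map((n): [string, number] => [n, (1 - damping) / nodes.length]),
    );
    for (const n of nodes) {
      const out = neighbors.get(n)!;
      const share = out.length ? (damping * rank.get(n)!) / out.length : 0;
      for (const m of out) next.set(m, next.get(m)! + share);
    }
    rank = next;
  }
  return rank;
}
```

On a star-shaped graph the hub outranks every leaf, matching the intuition that entities connected to many others are the key ones.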
The analytics layer also enables one-click Wikipedia lookup: clicking any entity in the graph triggers a search and lets you read articles directly inside the document viewer.
## Limitations

- Co-occurrence is not causation. Two entities appearing in the same paragraph doesn’t mean they’re directly related. The graph shows proximity, not meaning.
- NER quality depends on the model. The int8 model is fast but occasionally misidentifies entities or misses subtle ones. The fp32 model is more accurate but larger.
- English bias. The English-only models work best on English text. For other languages, use the multilingual variants.
- No relationship typing. Edges don’t have labels like “works at” or “located in” — they just indicate co-occurrence strength.