# Knowledge Graphs
A knowledge graph is a structured representation of the entities (people, organizations, places, concepts) in your documents and the relationships between them. Daneel builds knowledge graphs from vault documents to help you see connections that aren’t obvious from reading individual files.
To build one, follow How to Build a Knowledge Graph. To explore one with analytics and Wikipedia lookup, follow How to Explore Your Knowledge Graph. For the analytics layer’s underlying ideas (importance, topics, bridges, paths), see Graph Analytics. For configuration details, see Settings > Knowledge Graph.
## What it does

When you enable the knowledge graph on a vault, Daneel:
- Reads every document in the vault
- Extracts named entities using a local NER (Named Entity Recognition) model
- Resolves duplicates (“OpenAI”, “Open AI”, “OPENAI” become one entity)
- Identifies relationships based on co-occurrence within the same text passages
- Builds an interactive 3D graph you can explore visually
The result is a map of your documents’ key concepts and how they connect.
## Named Entity Recognition (NER)

NER is the process of identifying and classifying named things in text. Given the sentence:
“Satya Nadella announced that Microsoft would invest $10 billion in OpenAI.”
A NER model extracts:
- Satya Nadella — Person
- Microsoft — Organization
- $10 billion — Financial value
- OpenAI — Organization
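In code, NER output like this is naturally represented as typed spans with character offsets into the source text. The shape below is illustrative only, not Daneel's internal API:

```typescript
// Illustrative types for NER output; not Daneel's actual schema.
type EntityLabel = "Person" | "Organization" | "Financial value";

interface EntitySpan {
  text: string;       // surface form as it appears in the text
  label: EntityLabel; // entity type from the active ontology
  start: number;      // character offset of the first character
  end: number;        // character offset one past the last character
}

const sentence =
  "Satya Nadella announced that Microsoft would invest $10 billion in OpenAI.";

const spans: EntitySpan[] = [
  { text: "Satya Nadella", label: "Person", start: 0, end: 13 },
  { text: "Microsoft", label: "Organization", start: 29, end: 38 },
  { text: "$10 billion", label: "Financial value", start: 52, end: 63 },
  { text: "OpenAI", label: "Organization", start: 67, end: 73 },
];
```

Keeping offsets (rather than just names) is what lets a UI highlight each mention in place.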
Daneel uses GLiNER, an ONNX-based NER model that runs entirely in your browser via a dedicated web worker. No text is sent to any server for entity extraction.
The model comes in four variants that trade download size against accuracy and language coverage:
| Model | Size | Languages | Best for |
|---|---|---|---|
| GLiNER Small v2.1 (fp32) | 583 MB | English | Maximum accuracy |
| GLiNER Small v2.1 (int8) | 183 MB | English | Good balance (default) |
| GLiNER Multi v2.1 (int8) | 349 MB | Multilingual | Non-English documents |
| GLiNER Multi v2.1 (fp16) | 580 MB | Multilingual | Best multilingual accuracy |
## Entity resolution

Raw NER output contains duplicates. “IBM”, “I.B.M.”, and “International Business Machines” might all refer to the same entity. Daneel’s EntityResolver deduplicates using normalized string matching — comparing lowercased, whitespace-collapsed versions of entity names and merging those above a similarity threshold (default: 85%).
This is a heuristic, not perfect. It handles case variations and minor formatting differences well, but won’t merge “IBM” and “Big Blue” (which would require semantic understanding). You can adjust the threshold in settings — lower values merge more aggressively, higher values are more conservative.
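A minimal sketch of this resolution step. Two assumptions to flag: normalization here strips punctuation and whitespace entirely (slightly more aggressive than the lowercase-and-collapse described above, so that “Open AI” and “OpenAI” meet, as the example implies), and Sørensen–Dice similarity over character bigrams stands in for whatever metric EntityResolver actually uses:

```typescript
// Assumed normalization: lowercase, then drop everything except letters/digits.
function normalize(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, "");
}

// Character-bigram counts for the similarity measure.
function bigrams(s: string): Map<string, number> {
  const grams = new Map<string, number>();
  for (let i = 0; i < s.length - 1; i++) {
    const g = s.slice(i, i + 2);
    grams.set(g, (grams.get(g) ?? 0) + 1);
  }
  return grams;
}

// Sørensen–Dice coefficient over bigrams of the normalized names (0..1).
function similarity(a: string, b: string): number {
  const na = normalize(a);
  const nb = normalize(b);
  if (na === nb) return 1;
  const ga = bigrams(na);
  const gb = bigrams(nb);
  let overlap = 0;
  for (const [g, count] of ga) overlap += Math.min(count, gb.get(g) ?? 0);
  let total = 0;
  for (const c of ga.values()) total += c;
  for (const c of gb.values()) total += c;
  return total === 0 ? 0 : (2 * overlap) / total;
}

// Greedy merge: each name joins the first cluster whose representative it
// matches above the threshold, otherwise it starts a new cluster.
function resolve(names: string[], threshold = 0.85): string[][] {
  const clusters: string[][] = [];
  for (const name of names) {
    const home = clusters.find(c => similarity(c[0], name) >= threshold);
    if (home) home.push(name);
    else clusters.push([name]);
  }
  return clusters;
}
```

At the default 0.85 threshold, “OpenAI” / “Open AI” / “OPENAI” collapse into one cluster, while “IBM” and “Big Blue” stay apart, matching the behavior described above.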
## Ontology presets

An ontology defines what types of entities the NER model looks for. Different domains have different relevant entity types. Daneel ships with 8 presets:
- General — people, organizations, places, events, concepts
- Academic — researchers, institutions, theories, publications
- Legal — cases, statutes, courts, parties
- Medical — conditions, treatments, drugs, anatomy
- Programming — languages, frameworks, APIs, data structures
- Business — companies, products, markets, financials
- Travel — destinations, landmarks, transport, accommodations
- History — historical figures, battles, treaties, eras
You can also define custom ontology labels for specialized domains. The ontology is configured per-vault, so a legal vault and a programming vault can use different entity types.
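A per-vault ontology configuration might look like the sketch below. The field names and label strings are hypothetical; what makes arbitrary label lists workable is that GLiNER is a zero-shot model, matching text against whatever labels it is given at inference time:

```typescript
// Hypothetical per-vault ontology config; not Daneel's internal schema.
interface OntologyConfig {
  preset: string;   // built-in preset name, or "custom"
  labels: string[]; // entity types the NER model is asked to find
}

const legalVault: OntologyConfig = {
  preset: "legal",
  labels: ["case", "statute", "court", "party"],
};

// A custom ontology for a specialized domain:
const biologyVault: OntologyConfig = {
  preset: "custom",
  labels: ["protein", "gene", "pathway"],
};
```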
## Relationships

Daneel infers relationships from co-occurrence: if two entities appear in the same text passage, they’re likely related. The more often they co-occur, the stronger the relationship.
This is simpler than hand-curated knowledge graphs (like Wikidata) where relationships have explicit types (“works for”, “located in”). But for document analysis, co-occurrence captures the important signal: these things are discussed together.
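The co-occurrence logic reduces to a few lines. Here each passage is assumed to already be reduced to its list of resolved entity names; how Daneel segments passages, and whether it applies any weighting beyond raw counts, are assumptions:

```typescript
type Edge = { weight: number };

// One edge per entity pair; weight counts how many passages link the pair.
function cooccurrenceEdges(passages: string[][]): Map<string, Edge> {
  const edges = new Map<string, Edge>();
  for (const entities of passages) {
    // Deduplicate within a passage so repeated mentions count once.
    const unique = [...new Set(entities)];
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        // Canonical key so (A, B) and (B, A) are the same edge.
        const key = [unique[i], unique[j]].sort().join("|");
        const edge = edges.get(key) ?? { weight: 0 };
        edge.weight += 1;
        edges.set(key, edge);
      }
    }
  }
  return edges;
}
```

Counting a pair once per passage (the `Set`) is a plausible but assumed choice; it makes edge weight mean "number of passages that discuss both" rather than raw mention counts.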
## The visualization

The 3D graph uses WebGL rendering (via ngraph):
- Nodes are entities, sized by mention frequency
- Edges are relationships, weighted by co-occurrence strength
- Colors map to entity types (configurable per type)
- Physics simulation positions nodes — related entities cluster together, unrelated ones drift apart
You can rotate, zoom, and hover to explore. Customizable parameters include charge strength (node repulsion), link opacity, particle animations, and bloom glow effects.
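The styling rules above (size by mention frequency, color by type) reduce to a small mapping. The scale function and palette here are illustrative, not Daneel's actual values:

```typescript
// Illustrative per-type palette; Daneel makes this configurable per type.
const typeColors: Record<string, string> = {
  Person: "#e06666",
  Organization: "#6fa8dc",
  Place: "#93c47d",
};

function nodeStyle(type: string, mentions: number) {
  return {
    // Log scale keeps very frequent entities from dwarfing the rest.
    radius: 4 + 2 * Math.log2(1 + mentions),
    color: typeColors[type] ?? "#999999", // fallback for unmapped types
  };
}
```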
## Why it helps

Reading 50 documents individually, you might miss that three different papers all mention the same researcher, or that a concept from document A is the foundation for the technique described in document D. The knowledge graph surfaces these cross-document connections visually.
It’s most useful for:
- Literature reviews — mapping the landscape of who studies what
- Legal discovery — seeing which entities appear across case files
- Business intelligence — understanding relationships between companies, people, and products
- Research synthesis — finding thematic connections across a corpus
## Beyond visualization: analytics

A graph by itself is hard to read at scale. With thousands of nodes, the visualization is impressive but not directly informative. Daneel adds an analytics layer on top of the graph that summarizes it into actionable insights:
- Key Entities — entities that are structurally important via PageRank
- Topics — clusters of entities that travel together (Louvain communities)
- Bridges — entities that connect otherwise separate parts of the graph (betweenness centrality)
- Paths — shortest connection chains between any two entities (Dijkstra)
- Graph Health — fragmentation diagnostics and possible-duplicate detection
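To make "structurally important" concrete, here is a minimal PageRank by power iteration over an undirected co-occurrence graph. Daneel's actual damping factor and iteration count are assumptions; 0.85 damping is the conventional default:

```typescript
function pageRank(
  nodes: string[],
  edges: [string, string][],
  damping = 0.85,
  iterations = 50,
): Map<string, number> {
  // Undirected co-occurrence graph: each edge contributes both directions.
  const neighbors = new Map<string, string[]>(
    nodes.map((n): [string, string[]] => [n, []]),
  );
  for (const [a, b] of edges) {
    neighbors.get(a)!.push(b);
    neighbors.get(b)!.push(a);
  }

  // Start uniform; each iteration redistributes rank along edges.
  let rank = new Map<string, number>(
    nodes.map((n): [string, number] => [n, 1 / nodes.length]),
  );
  for (let it = 0; it < iterations; it++) {
    const next = new Map<string, number>(
      nodes.map((n): [string, number] => [n, (1 - damping) / nodes.length]),
    );
    for (const n of nodes) {
      const out = neighbors.get(n)!;
      const share = out.length ? (damping * rank.get(n)!) / out.length : 0;
      for (const m of out) next.set(m, next.get(m)! + share);
    }
    rank = next;
  }
  return rank;
}
```

On a star-shaped graph the hub outranks every leaf, matching the intuition that entities connected to many others are the key ones.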
The analytics layer also enables one-click Wikipedia lookup: clicking any entity in the graph triggers a search and lets you read articles directly inside the document viewer.
## Limitations

- Co-occurrence is not causation. Two entities appearing in the same paragraph doesn’t mean they’re directly related. The graph shows proximity, not meaning.
- NER quality depends on the model. The int8 model is fast but occasionally misidentifies entities or misses subtle ones. The fp32 model is more accurate but larger.
- English bias. The English-only models work best on English text. For other languages, use the multilingual variants.
- No relationship typing. Edges don’t have labels like “works at” or “located in” — they just indicate co-occurrence strength.