How to Chat with a PDF

Daneel detects when Chrome is displaying a PDF and automatically extracts its text, letting you chat with the document, copy its content as Markdown, or save it to a vault.

Steps

1. Open any PDF in Chrome by clicking a link, pasting a URL, or navigating to it directly (e.g., arxiv.org/pdf/2601.00162).
2. The Daneel widget appears in the corner, just as it does on any web page.
3. Open the chat panel (sparkles icon). The mode button shows PDF instead of Page, and a green status bar confirms how much text was extracted.
4. Ask a question about the document:

Summarize the main contributions of this paper in bullet points.

The AI receives the extracted text as context and responds based on the PDF content.

Quick actions

| Action | How |
|---|---|
| Copy as Markdown | Single-click the Markdown button on the launcher to copy the PDF text to your clipboard. |
| Download as Markdown | Double-click the Markdown button to save a `.md` file named `daneel.{title}.{timestamp}.md`. |
| Save to Vault | Click + Vault in the chat panel and pick a vault; the PDF is imported with a descriptive filename (`{hostname}.{path}.{timestamp}.pdf.md`). |
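The two filename patterns above can be illustrated with a small TypeScript sketch. Only the patterns themselves come from this page: the helper names (`sanitize`, `timestamp`, `markdownDownloadName`, `vaultImportName`) are hypothetical, and the timestamp format, the flattening of `/` in the path to `.`, and the character sanitization are assumptions rather than Daneel's documented behavior.

```typescript
// Hypothetical helpers illustrating the two filename patterns in the table.
// ASSUMPTIONS: the timestamp format, path flattening, and sanitization rules
// are guesses, not Daneel's documented behavior.

/** Collapse runs of filename-unfriendly characters into "-". */
function sanitize(part: string): string {
  return part.replace(/[^A-Za-z0-9._-]+/g, "-").replace(/^-+|-+$/g, "");
}

/** Compact UTC timestamp, e.g. 20250101T120000 (assumed format). */
function timestamp(date: Date): string {
  return date.toISOString().replace(/[-:]/g, "").replace(/\.\d+Z$/, "");
}

/** daneel.{title}.{timestamp}.md */
function markdownDownloadName(title: string, date: Date): string {
  return `daneel.${sanitize(title)}.${timestamp(date)}.md`;
}

/** {hostname}.{path}.{timestamp}.pdf.md, with "/" in the path flattened to "." */
function vaultImportName(pdfUrl: string, date: Date): string {
  const url = new URL(pdfUrl);
  const path = url.pathname
    .split("/")
    .filter((segment) => segment.length > 0)
    .map(sanitize)
    .join(".");
  // Skip empty parts (e.g. a bare "/" path) so the name never contains "..".
  const parts = [url.hostname, path, timestamp(date)].filter((p) => p.length > 0);
  return `${parts.join(".")}.pdf.md`;
}
```

Building the timestamp from UTC keeps names sortable and independent of the local time zone; the real extension may use local time instead.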

How it works

Chrome’s modern PDF viewer (OOPIF, short for out-of-process iframe; Chrome 126+) renders PDFs at their original URL instead of redirecting to an internal chrome-extension:// page. Because the page keeps its normal URL, Daneel’s widget can inject into it just as it would on any other site.

When the widget detects a PDF page, it:

1. Detects the PDF using any of three signals: the `pdfoopifenabled` attribute on `<html>` (set by Chrome’s OOPIF viewer), `document.contentType`, or a `.pdf` URL suffix.
2. Fetches the PDF binary through the background service worker, which acts as a proxy and avoids the page’s CORS restrictions.
3. Extracts structured Markdown with EdgeParse (compiled to WebAssembly), preserving headings, tables, and reading order.
4. Caches the result so follow-up questions reuse the same extraction.
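A minimal TypeScript sketch of the detection and caching steps, assuming the caller reads the three signals from the page and passes them in as plain values so the logic stays testable. `PageSignals`, `isPdfPage`, `Extractor`, and `cachedExtractor` are hypothetical names, and the `Map`-based cache is an illustrative stand-in for whatever Daneel actually uses.

```typescript
// Hypothetical sketch of PDF detection and extraction caching.
// ASSUMPTIONS: all names and the Map-based cache are illustrative only.

interface PageSignals {
  /** True if <html> carries the pdfoopifenabled attribute. */
  hasPdfOopifAttribute: boolean;
  /** Value of document.contentType, e.g. "application/pdf". */
  contentType: string;
  /** Current page URL. */
  url: string;
}

/** Any one of the three signals is enough to treat the page as a PDF. */
function isPdfPage(signals: PageSignals): boolean {
  if (signals.hasPdfOopifAttribute) return true;
  if (signals.contentType.toLowerCase() === "application/pdf") return true;
  // Check only the path, so query strings and fragments don't hide the suffix.
  const path = new URL(signals.url).pathname;
  return path.toLowerCase().endsWith(".pdf");
}

type Extractor = (url: string) => Promise<string>;

/** Wrap an extractor so each URL is fetched and parsed at most once. */
function cachedExtractor(extract: Extractor): Extractor {
  const cache = new Map<string, Promise<string>>();
  return (url: string): Promise<string> => {
    const hit = cache.get(url);
    if (hit !== undefined) return hit;
    const pending = extract(url);
    cache.set(url, pending);
    // Forget failed extractions so a later question can retry.
    void pending.catch(() => cache.delete(url));
    return pending;
  };
}
```

In a content script the signals would come from `document.documentElement.hasAttribute("pdfoopifenabled")`, `document.contentType`, and `location.href`. Caching the promise rather than the finished string means two questions asked while extraction is still running share one fetch-and-parse instead of starting two.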

The extracted Markdown flows into the same prompt pipeline as any other page — context selection, prompt building, and streaming to whichever AI provider you have active.

What works differently on PDF pages

Site mode is disabled. A PDF has no sitemap or crawlable structure, so the Site toggle is hidden.
Page title comes from the URL. Chrome’s PDF viewer leaves `document.title` empty, so Daneel derives a display title from the URL path (e.g., `2601.00162` from arxiv.org/pdf/2601.00162).
DOM extraction is skipped. The PDF viewer renders its content inside a closed shadow root, which content scripts cannot read, so Daneel fetches and parses the PDF binary directly instead of reading the DOM.
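Deriving a title from the URL can be sketched in a few lines of TypeScript. The rule used here (take the last non-empty path segment, percent-decode it, drop a trailing `.pdf`, and fall back to the hostname) is an assumption that happens to reproduce the arxiv example; `titleFromUrl` is a hypothetical name.

```typescript
// Hypothetical helper: derive a display title when document.title is empty.
// ASSUMPTION: Daneel's actual rule may differ from this one.
function titleFromUrl(pdfUrl: string): string {
  const url = new URL(pdfUrl);
  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  // Last path segment, or the hostname for a bare "/" path.
  const last = segments.pop() ?? url.hostname;
  let decoded: string;
  try {
    decoded = decodeURIComponent(last);
  } catch {
    decoded = last; // malformed percent-encoding: keep the raw segment
  }
  return decoded.replace(/\.pdf$/i, "");
}
```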

Limitations

Scanned PDFs (image-only, with no selectable text) cannot be extracted. Daneel shows an error if every page yields fewer than 20 characters of text.
Very large PDFs work but may take a few seconds to fetch and extract. The context selection algorithm trims the text to fit the model’s token budget.
Local file:// PDFs require enabling file access for Daneel in Chrome’s extension settings (the “Allow access to file URLs” toggle), which is off by default.
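The scanned-PDF check amounts to a simple threshold test. In this TypeScript sketch only the 20-character threshold comes from the text above; representing the document as an array of per-page strings, measuring trimmed length, and the name `looksScanned` are assumptions.

```typescript
// Hypothetical sketch of the "no extractable text" check.
// ASSUMPTIONS: per-page strings as input and trimmed length as the measure.
const MIN_CHARS_PER_PAGE = 20;

/** True when no page has enough text to be worth chatting with. */
function looksScanned(pages: readonly string[]): boolean {
  return pages.every((page) => page.trim().length < MIN_CHARS_PER_PAGE);
}
```

Because `Array.prototype.every` returns `true` for an empty array, a PDF with no pages at all is also treated as unextractable.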

Next steps

Build a Document Vault to organize and search across multiple PDFs
How RAG works explains the chunking and search pipeline behind document Q&A
Your First Page Chat covers the general chat flow that PDFs build on

PDF extraction is powered by EdgeParse, created by Raphaël Mansuy and licensed under Apache 2.0.