How to Chat with a PDF
Daneel detects when Chrome is displaying a PDF and automatically extracts its text, letting you chat with the document, copy its content as Markdown, or save it to a vault.
- Open any PDF in Chrome (click a link, paste a URL, or navigate directly, e.g. arxiv.org/pdf/2601.00162).
- The Daneel widget appears in the corner, just like on any web page.
- Open the chat panel (sparkles icon). The mode button shows PDF instead of Page, and a green status bar confirms how much text was extracted.
- Ask a question about the document:
Summarize the main contributions of this paper in bullet points.
The AI receives the extracted text as context and responds based on the PDF content.
Quick actions
| Action | How |
|---|---|
| Copy as Markdown | Single-click the Markdown button on the launcher — PDF text is copied to your clipboard. |
| Download as Markdown | Double-click the Markdown button — saves a .md file named daneel.{title}.{timestamp}.md. |
| Save to Vault | Click + Vault in the chat panel, pick a vault, and the PDF is imported with a descriptive filename ({hostname}.{path}.{timestamp}.pdf.md). |
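
The two filename patterns in the table can be read as simple templates. The sketch below is illustrative only: the function names are hypothetical, and the timestamp format and the character-sanitizing rule are assumptions, since this page does not specify them.

```typescript
// Hypothetical helpers illustrating the documented filename patterns.
// ASSUMPTIONS: the timestamp format and the sanitizing rule are not specified
// in the docs; they are picked here purely for illustration.

function compactTimestamp(date: Date): string {
  // 2025-01-02T03:04:05.678Z -> 20250102T030405
  return date.toISOString().replace(/[-:]/g, "").replace(/\.\d+Z$/, "");
}

function sanitize(part: string): string {
  // Collapse runs of filename-unfriendly characters into one hyphen,
  // then trim hyphens from both ends.
  return part.replace(/[^A-Za-z0-9._-]+/g, "-").replace(/^-+|-+$/g, "");
}

// Pattern: daneel.{title}.{timestamp}.md
function downloadFilename(title: string, date: Date): string {
  return `daneel.${sanitize(title)}.${compactTimestamp(date)}.md`;
}

// Pattern: {hostname}.{path}.{timestamp}.pdf.md
function vaultFilename(pdfUrl: string, date: Date): string {
  const url = new URL(pdfUrl);
  const path = sanitize(url.pathname.replace(/\//g, "-"));
  return `${url.hostname}.${path}.${compactTimestamp(date)}.pdf.md`;
}
```

Under these assumptions, a PDF at `https://arxiv.org/pdf/2601.00162` would be saved to a vault under a name that starts with `arxiv.org.pdf-2601.00162.`; the real separator and timestamp layout may differ.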
How it works
Chrome’s modern PDF viewer (OOPIF, Chrome 126+) renders PDFs at the original URL rather than redirecting to an internal chrome-extension:// page. This means Daneel’s widget can inject normally.
When the widget detects a PDF page, it:
- Detects the PDF via three signals: the `pdfoopifenabled` attribute on `<html>` (set by Chrome’s OOPIF viewer), `document.contentType`, or a `.pdf` URL suffix.
- Fetches the PDF binary through the background service worker proxy (bypasses CORS restrictions).
- Extracts structured Markdown using EdgeParse WASM, preserving headings, tables, and reading order.
- Caches the result so subsequent questions reuse the same extraction.
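
The detection step in the list above can be sketched as a pure function over the three signals. This is a minimal illustration, not Daneel's actual code: the `PdfSignals` shape and the `looksLikePdf` name are hypothetical, and comparing `document.contentType` against `application/pdf` is an assumption about how the check is performed.

```typescript
// Hypothetical container for the three signals. In a content script they
// would be read from document.documentElement, document.contentType, and
// location.href respectively.
interface PdfSignals {
  hasOopifAttribute: boolean; // <html> carries the pdfoopifenabled attribute
  contentType: string;        // value of document.contentType
  url: string;                // current page URL
}

function looksLikePdf(signals: PdfSignals): boolean {
  // Signal 1: attribute set by Chrome's OOPIF viewer.
  if (signals.hasOopifAttribute) return true;

  // Signal 2: MIME type reported for the document (assumed comparison).
  if (signals.contentType.toLowerCase() === "application/pdf") return true;

  // Signal 3: .pdf suffix, checked on the path only so that query strings
  // and fragments do not hide it.
  try {
    return new URL(signals.url).pathname.toLowerCase().endsWith(".pdf");
  } catch {
    return false; // not a parseable URL
  }
}
```

Any one signal is treated as sufficient here, which matches the "or" wording of the list; whether Daneel weighs the signals differently is not stated.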
The extracted Markdown flows into the same prompt pipeline as any other page — context selection, prompt building, and streaming to whichever AI provider you have active.
What works differently on PDF pages
- Site mode is disabled. A PDF has no sitemap or crawlable structure, so the Site toggle is hidden.
- Page title comes from the URL. Chrome’s PDF viewer leaves `document.title` empty, so Daneel derives a display title from the URL path (e.g., `2601.00162` from `arxiv.org/pdf/2601.00162`).
- DOM extraction is skipped. The PDF viewer wraps its content in a closed shadow root that cannot be read. Daneel fetches the PDF binary directly instead of parsing the DOM.
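
Deriving a title from the URL path, as described in the list above, might look like the following sketch. The helper name is hypothetical, and taking the last path segment, stripping a trailing `.pdf`, and falling back to the hostname are all assumptions; the docs only give the arXiv example.

```typescript
// Hypothetical helper: derive a display title from the last URL path segment.
function titleFromPdfUrl(pdfUrl: string): string {
  const url = new URL(pdfUrl);
  const segments = url.pathname.split("/").filter((s) => s.length > 0);

  // Fall back to the hostname when the path is empty (assumption).
  const last = segments.length > 0 ? segments[segments.length - 1] : url.hostname;

  let decoded = last;
  try {
    decoded = decodeURIComponent(last); // turn %20 etc. back into characters
  } catch {
    // Malformed escape sequence: keep the raw segment.
  }

  // Drop a trailing .pdf extension, if any (assumption).
  return decoded.replace(/\.pdf$/i, "");
}
```

With this sketch, `titleFromPdfUrl("https://arxiv.org/pdf/2601.00162")` yields `2601.00162`, matching the example in the list.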
Limitations
- Scanned PDFs (image-only, no selectable text) cannot be extracted. Daneel will show an error if every page contains fewer than 20 characters.
- Very large PDFs work but may take a few seconds to fetch and extract. The context selection algorithm trims the text to fit the model’s token budget.
- `file://` PDFs require granting Daneel file access in Chrome’s extension settings; this is not enabled by default.
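
The scanned-PDF rule in the list above (every page under 20 characters) can be sketched as a small predicate. The names are hypothetical, and trimming whitespace before counting and treating an empty page list as scanned are assumptions; only the 20-character threshold comes from this page.

```typescript
// Threshold quoted in the Limitations list.
const MIN_CHARS_PER_PAGE = 20;

// Hypothetical check: pageTexts holds the extracted text of each page.
function isLikelyScanned(pageTexts: string[]): boolean {
  // No pages extracted at all: treat as unextractable (assumption).
  if (pageTexts.length === 0) return true;

  // Scanned only if EVERY page falls below the threshold; a single page
  // with real text is enough to proceed.
  return pageTexts.every((text) => text.trim().length < MIN_CHARS_PER_PAGE);
}
```

Because the rule requires every page to be short, a mostly scanned document with one text page would still pass this check and go on to extraction.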
Next steps
- Build a Document Vault to organize and search across multiple PDFs
- How RAG works explains the chunking and search pipeline behind document Q&A
- Your First Page Chat covers the general chat flow that PDFs build on
PDF extraction is powered by EdgeParse by Raphaël Mansuy. Apache 2.0 licensed.