
Daneel detects when Chrome is displaying a PDF and automatically extracts its text, letting you chat with the document, copy its content as Markdown, or save it to a vault.

## Steps

1. Open any PDF in Chrome by clicking a link or entering its URL (for example, `arxiv.org/pdf/2601.00162`).
2. The Daneel widget appears in the corner, just like on any web page.
3. Open the chat panel (sparkles icon). The mode button shows **PDF** instead of *Page*, and a green status bar confirms how much text was extracted.
4. Ask a question about the document:

> *Summarize the main contributions of this paper in bullet points.*

The AI receives the extracted text as context and responds based on the PDF content.

## Quick actions

| Action | How |
|--------|-----|
| **Copy as Markdown** | Single-click the Markdown button on the launcher — PDF text is copied to your clipboard. |
| **Download as Markdown** | Double-click the Markdown button — saves a `.md` file named `daneel.{title}.{timestamp}.md`. |
| **Save to Vault** | Click *+ Vault* in the chat panel, pick a vault, and the PDF is imported with a descriptive filename (`{hostname}.{path}.{timestamp}.pdf.md`). |

## How it works

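Chrome's modern PDF viewer, which runs as an out-of-process iframe (OOPIF, Chrome 126+), renders PDFs at the original URL rather than redirecting to an internal `chrome-extension://` page. Because the page keeps its original URL, Daneel's widget can inject just as it does on any other web page.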

When the widget detects a PDF page, it:

1. **Detects** the PDF via any of three signals: the `pdfoopifenabled` attribute on `<html>` (set by Chrome's OOPIF viewer), `document.contentType`, or a `.pdf` URL suffix.
2. **Fetches** the PDF binary through the background service worker proxy (bypasses CORS restrictions).
3. **Extracts** structured Markdown using [EdgeParse WASM](https://github.com/raphaelmansuy/edgeparse), preserving headings, tables, and reading order.
4. **Caches** the result so subsequent questions reuse the same extraction.
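
The detection step can be sketched as a pure function over the three signals. This is a minimal TypeScript sketch, not Daneel's actual code: the `PdfSignals` shape and the `looksLikePdf` name are hypothetical, and comparing against `application/pdf` is an assumption about the value `document.contentType` reports for a PDF.

```typescript
// Hypothetical input shape: the three signals from step 1, gathered by the caller.
interface PdfSignals {
  hasOopifAttribute: boolean; // <html> carries the `pdfoopifenabled` attribute
  contentType: string;        // value of `document.contentType`
  url: string;                // full page URL
}

// Returns true when any one of the three signals indicates a PDF.
function looksLikePdf(signals: PdfSignals): boolean {
  if (signals.hasOopifAttribute) return true;
  // Assumption: PDFs report the standard `application/pdf` MIME type.
  if (signals.contentType.toLowerCase() === "application/pdf") return true;
  try {
    // Check the path only, so query strings and fragments are ignored.
    return new URL(signals.url).pathname.toLowerCase().endsWith(".pdf");
  } catch {
    return false; // not a parseable absolute URL
  }
}
```

In a content script, the signals could be gathered with `document.documentElement.hasAttribute("pdfoopifenabled")`, `document.contentType`, and `location.href`. Keeping the decision in a pure function makes it easy to test outside the browser.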

The extracted Markdown then flows through the same prompt pipeline as any other page: context selection, prompt building, and streaming to whichever AI provider is active.
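
The caching step in the list above amounts to memoizing the extraction. A minimal sketch, assuming the cache is keyed by the page URL (this guide does not say what Daneel actually keys on); `cacheByUrl` and the `extract` callback are hypothetical names.

```typescript
// Wraps an extraction function so that each URL is processed at most once.
// `extract` is a hypothetical stand-in for the fetch-and-parse step.
function cacheByUrl<T>(extract: (url: string) => T): (url: string) => T {
  const cache = new Map<string, T>();
  return (url: string): T => {
    if (!cache.has(url)) {
      cache.set(url, extract(url));
    }
    return cache.get(url) as T;
  };
}
```

If `extract` returns a `Promise<string>`, the promise itself is what gets cached, so a second question asked while the first extraction is still running would share the same in-flight work.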

## What works differently on PDF pages

- **Site mode is disabled.** A PDF has no sitemap or crawlable structure, so the *Site* toggle is hidden.
- **Page title comes from the URL.** Chrome's PDF viewer leaves `document.title` empty, so Daneel derives a display title from the URL path (e.g., `2601.00162` from `arxiv.org/pdf/2601.00162`).
- **DOM extraction is skipped.** The PDF viewer wraps its content in a closed shadow root that cannot be read. Daneel fetches the PDF binary directly instead of parsing the DOM.
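
The URL-derived title can be illustrated with a short sketch. It assumes the title is simply the last non-empty path segment, which matches the `arxiv.org` example above; Daneel's exact rule (for instance, whether a trailing `.pdf` is stripped) is not documented here, and `titleFromUrl` is a hypothetical name.

```typescript
// Derives a display title from the last non-empty segment of the URL path.
// Falls back to the hostname when the path has no segments.
function titleFromUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  if (segments.length === 0) return url.hostname;
  const last = segments[segments.length - 1];
  try {
    return decodeURIComponent(last); // turn %20 and similar escapes back into text
  } catch {
    return last; // malformed escape sequence: keep the raw segment
  }
}

console.log(titleFromUrl("https://arxiv.org/pdf/2601.00162")); // prints "2601.00162"
```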

## Limitations

- **Scanned PDFs** (image-only, no selectable text) cannot be processed because there is no text to extract. Daneel shows an error if every page contains fewer than 20 characters.
- **Very large PDFs** work but may take a few seconds to fetch and extract. The context selection algorithm trims the text to fit the model's token budget.
- **`file://` PDFs** require granting Daneel file access in Chrome's extension settings (the *Allow access to file URLs* toggle), which is off by default.
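
The scanned-PDF rule above can be expressed as a single check over per-page text. A minimal sketch: the 20-character threshold comes from this guide, while the `looksScanned` name, the per-page string array, and trimming whitespace before counting are assumptions for illustration.

```typescript
const MIN_CHARS_PER_PAGE = 20; // threshold stated in the limitation above

// Returns true when every page yields fewer than MIN_CHARS_PER_PAGE characters.
// Assumption: surrounding whitespace is ignored when counting.
// Note: an empty page list also returns true, since `every` is vacuously true.
function looksScanned(pageTexts: string[]): boolean {
  return pageTexts.every((text) => text.trim().length < MIN_CHARS_PER_PAGE);
}
```

A single page with enough selectable text is sufficient for the document to pass, so mixed PDFs (some scanned pages, some text pages) would still be extracted under this rule.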

## Next steps

- [Build a Document Vault](/guides/vault/) to organize and search across multiple PDFs
- [How RAG works](/concepts/rag/) explains the chunking and search pipeline behind document Q&A
- [Your First Page Chat](/guides/first-page-chat/) covers the general chat flow that PDFs build on

---

*PDF extraction is powered by [EdgeParse](https://github.com/raphaelmansuy/edgeparse) by [Raphaël Mansuy](https://github.com/raphaelmansuy). Apache 2.0 licensed.*
