PDF Reader
MCP server for secure PDF reading and text extraction within project context
Empower your AI agents with the ability to securely read and extract information from PDF files using the Model Context Protocol (MCP).
exclusiveMinimum issue affecting Windsurf, Mistral API, and other tools
Install automatically for Claude Desktop:
npx -y @smithery/cli install @sylphxltd/pdf-reader-mcp --client claude
Install the package:
pnpm add @sylphx/pdf-reader-mcp # or npm install @sylphx/pdf-reader-mcp
Configure your MCP client (e.g., Claude Desktop, Cursor):
{ "mcpServers": { "pdf-reader-mcp": { "command": "npx", "args": ["@sylphx/pdf-reader-mcp"] } } }
Important: Make sure your MCP client sets the correct working directory (cwd) to your project root.
git clone https://github.com/sylphlab/pdf-reader-mcp.git
cd pdf-reader-mcp
pnpm install
pnpm run build
Then configure your MCP client to use node dist/index.js.
Once configured, your AI agent can read PDFs using the read_pdf tool:
Extract text from specific pages:
{ "sources": [ { "path": "documents/report.pdf", "pages": [1, 2, 3] } ], "include_metadata": true }
Get metadata and page count without the full text:
{ "sources": [{ "path": "documents/report.pdf" }], "include_metadata": true, "include_page_count": true, "include_full_text": false }
Read a PDF from a URL:
{ "sources": [ { "url": "https://example.com/document.pdf" } ], "include_full_text": true }
Process multiple sources in one request:
{ "sources": [ { "path": "doc1.pdf", "pages": "1-5" }, { "path": "doc2.pdf" }, { "url": "https://example.com/doc3.pdf" } ], "include_full_text": true }
Extract images along with text:
{ "sources": [ { "path": "presentation.pdf", "pages": [1, 2, 3] } ], "include_images": true, "include_full_text": true }
Response includes:
Note: Image extraction works best with JPEG and PNG images. Large PDFs with many images may produce large responses.
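The request examples above share one shape, which can be written down as TypeScript types. This is a minimal sketch inferred only from those examples: the type names and the `checkReadPdfArgs` helper are hypothetical, and the rule that each source names exactly one of `path` or `url` is an assumption, not something the server's schema is known to enforce.

```typescript
// Hypothetical types inferred from the request examples; the server's real
// schema may differ.
type PageSpec = number[] | string;

interface PdfSource {
  path?: string; // relative path inside the project
  url?: string; // remote PDF
  pages?: PageSpec; // omit to read every page
}

interface ReadPdfArgs {
  sources: PdfSource[];
  include_metadata?: boolean;
  include_page_count?: boolean;
  include_full_text?: boolean;
  include_images?: boolean;
}

// Returns a list of problems; an empty list means the request looks well formed.
// Assumption: each source names exactly one of `path` or `url`.
function checkReadPdfArgs(args: ReadPdfArgs): string[] {
  const problems: string[] = [];
  if (args.sources.length === 0) {
    problems.push("sources must not be empty");
  }
  args.sources.forEach((source, i) => {
    const hasPath = typeof source.path === "string" && source.path.length > 0;
    const hasUrl = typeof source.url === "string" && source.url.length > 0;
    if (hasPath === hasUrl) {
      problems.push(`sources[${i}] needs exactly one of "path" or "url"`);
    }
  });
  return problems;
}

const request: ReadPdfArgs = {
  sources: [
    { path: "documents/report.pdf", pages: "1-5" },
    { url: "https://example.com/doc3.pdf" },
  ],
  include_full_text: true,
};
const problems = checkReadPdfArgs(request); // no problems for this request
```

Typing the arguments on the client side catches misspelled option names at compile time, before a request reaches the server.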
You can specify pages in multiple ways:
- [1, 3, 5] (1-based indexing)
- "1-10" (extracts pages 1 through 10)
- "1-5,10-15,20" (commas separate ranges and individual pages)
- Omit the pages field to extract all pages
For large PDF files (>20 MB), extract specific pages instead of the full document:
{ "sources": [ { "path": "large-document.pdf", "pages": "1-10" } ] }
This prevents hitting AI model context limits and improves performance.
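To make the page formats above concrete, here is a small sketch that normalizes either form into a sorted list of unique 1-based page numbers. `parsePages` is a hypothetical helper written for illustration; it is not the server's own parser, and the server may treat whitespace, duplicates, or reversed ranges differently.

```typescript
// Sketch only: turns a `pages` value into sorted, unique, 1-based page numbers.
// `parsePages` is a hypothetical name, not part of the package's API.
function parsePages(spec: number[] | string): number[] {
  const collected: number[] = [];
  const add = (n: number): void => {
    if (Math.floor(n) !== n || n < 1) {
      throw new Error(`Invalid page number: ${n}`);
    }
    collected.push(n);
  };

  if (Array.isArray(spec)) {
    spec.forEach(add);
  } else {
    for (const part of spec.split(",")) {
      // Accepts "20" or "10-15", with optional surrounding spaces.
      const match = /^\s*(\d+)\s*(?:-\s*(\d+))?\s*$/.exec(part);
      if (!match) {
        throw new Error(`Invalid page range: "${part}"`);
      }
      const start = Number(match[1]);
      const end = match[2] === undefined ? start : Number(match[2]);
      if (end < start) {
        throw new Error(`Invalid page range: "${part}"`);
      }
      for (let n = start; n <= end; n++) {
        add(n);
      }
    }
  }

  // Sort ascending, then drop adjacent duplicates.
  const sorted = collected.sort((a, b) => a - b);
  return sorted.filter((n, i) => i === 0 || n !== sorted[i - 1]);
}

const mixed = parsePages("1-5,10-15,20");
const listed = parsePages([5, 1, 3, 3]);
```

Here `mixed` holds pages 1 to 5, 10 to 15, and 20, while `listed` holds 1, 3, and 5.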
Extract embedded images from PDF pages as base64-encoded data:
{ "sources": [{ "path": "document.pdf" }], "include_images": true }
Image data format:
{ "images": [ { "page": 1, "index": 0, "width": 800, "height": 600, "format": "rgb", "data": "base64-encoded-image-data..." } ] }
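Because image data arrives base64-encoded, it can help to estimate how many bytes a response carries before passing it on to a model. The sketch below mirrors the fields in the example above; the `ExtractedImage` interface and both helper names are hypothetical, and the calculation assumes standard padded base64.

```typescript
// Hypothetical shape mirroring the image entries in the example above.
interface ExtractedImage {
  page: number;
  index: number;
  width: number;
  height: number;
  format: string;
  data: string; // base64-encoded image data
}

// Every 4 base64 characters carry 3 bytes, minus one byte per trailing "=".
// Assumption: the string uses standard padded base64.
function decodedByteLength(base64: string): number {
  if (base64.length % 4 !== 0) {
    throw new Error("Expected padded base64");
  }
  let padding = 0;
  if (base64.charAt(base64.length - 1) === "=") padding++;
  if (base64.charAt(base64.length - 2) === "=") padding++;
  return (base64.length / 4) * 3 - padding;
}

// Sums the decoded size of every image in a response.
function totalImageBytes(images: ExtractedImage[]): number {
  return images.reduce((sum, image) => sum + decodedByteLength(image.data), 0);
}

const sample: ExtractedImage[] = [
  { page: 1, index: 0, width: 1, height: 1, format: "rgb", data: "aGVsbG8=" }, // 5 bytes
  { page: 1, index: 1, width: 1, height: 1, format: "rgb", data: "aGk=" }, // 2 bytes
];
const sampleBytes = totalImageBytes(sample); // 7
```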
Supported formats:
Important considerations:
- Set include_images: false (default) to extract text only
- Use the pages parameter to limit extraction scope
Text and images are now returned in exact document order!
The server uses Y-coordinates from PDF.js to preserve the natural reading flow of the document. This means AI models receive content parts in the same sequence as they appear on the page.
Example document layout:
Page 1:
  [Heading text]
  [Image: Chart]
  [Description text]
  [Image: Photo A]
  [Image: Photo B]
  [Conclusion text]
Content parts returned:
[
  { type: "text", text: "Heading text" },
  { type: "image", data: "base64..." },  // Chart
  { type: "text", text: "Description text" },
  { type: "image", data: "base64..." },  // Photo A
  { type: "image", data: "base64..." },  // Photo B
  { type: "text", text: "Conclusion text" }
]
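The ordering idea can be sketched in a few lines. This is an illustration under stated assumptions, not the server's code: `PositionedItem` and `orderContent` are hypothetical names, and the sketch assumes each item carries a single Y value in default PDF user space, where Y grows upward from the bottom of the page.

```typescript
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; data: string };

// Hypothetical wrapper pairing a content part with its vertical position.
interface PositionedItem {
  y: number; // assumed: default PDF user space, origin at the bottom-left
  part: ContentPart;
}

// With Y growing upward, items nearer the top of the page have larger Y,
// so sorting by descending Y gives top-to-bottom reading order.
// Ties keep their original relative order.
function orderContent(items: PositionedItem[]): ContentPart[] {
  return items
    .map((item, index) => ({ item, index }))
    .sort((a, b) => b.item.y - a.item.y || a.index - b.index)
    .map(({ item }) => item.part);
}

const ordered = orderContent([
  { y: 400, part: { type: "text", text: "Description text" } },
  { y: 700, part: { type: "text", text: "Heading text" } },
  { y: 550, part: { type: "image", data: "base64..." } },
]);
// ordered: heading text, then the image, then the description text
```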
Benefits:
When is ordering applied?
include_images: true
Important: The server only accepts relative paths for security reasons. Absolute paths are blocked to prevent unauthorized file system access.
✅ Good: "path": "documents/report.pdf"
❌ Bad: "path": "/Users/john/documents/report.pdf"
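A client can pre-check paths before sending a request. The sketch below is a hypothetical helper, not the server's validation logic. Rejecting absolute paths follows the rule stated above; rejecting paths that climb out of the project with ".." is an extra assumption added for illustration.

```typescript
// Hypothetical client-side check; the server's real validation may differ.
function isSafeRelativePath(input: string): boolean {
  if (input.length === 0) return false;

  // Absolute forms: "/..." (POSIX), "\..." or "\\server\..." (Windows),
  // and drive-letter paths such as "C:\...".
  if (/^[\\/]/.test(input) || /^[A-Za-z]:/.test(input)) return false;

  // Assumption: also reject paths that climb above the starting directory.
  let depth = 0;
  for (const segment of input.split(/[\\/]+/)) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      depth--;
      if (depth < 0) return false;
    } else {
      depth++;
    }
  }
  return true;
}

const good = isSafeRelativePath("documents/report.pdf"); // true
const bad = isSafeRelativePath("/Users/john/documents/report.pdf"); // false
```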
Solution: Configure the cwd (current working directory) in your MCP client settings.
Solution: Clear npm cache and reinstall:
npm cache clean --force
npx @sylphx/pdf-reader-mcp@latest
Restart your MCP client completely after updating.
Causes:
Solution: Use relative paths and configure cwd in your MCP client:
{ "mcpServers": { "pdf-reader-mcp": { "command": "npx", "args": ["@sylphx/pdf-reader-mcp"], "cwd": "/path/to/your/project" } } }
Solution: Update to the latest version (all recent compatibility issues have been fixed):
npm update @sylphx/pdf-reader-mcp@latest
Then restart your editor completely.
Benchmarks on a standard PDF file:
| Operation | Ops/sec | Speed | 
|---|---|---|
| Handle Non-Existent File | ~12,933 | Fastest | 
| Get Full Text | ~5,575 | |
| Get Specific Page | ~5,329 | |
| Get Multiple Pages | ~5,242 | |
| Get Metadata & Page Count | ~4,912 | Slowest | 
Performance varies based on PDF complexity and system resources.
See Performance Documentation for details.
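Throughput figures are easier to compare as time per operation. The short sketch below converts the approximate ops/sec values from the table into mean milliseconds per operation; `msPerOp` is a hypothetical helper, and the results are only as precise as the rounded table values.

```typescript
// Converts throughput (operations per second) to mean milliseconds per operation.
function msPerOp(opsPerSec: number): number {
  return 1000 / opsPerSec;
}

// Approximate values copied from the benchmark table.
const benchmarks: Array<[string, number]> = [
  ["Handle Non-Existent File", 12933],
  ["Get Full Text", 5575],
  ["Get Specific Page", 5329],
  ["Get Multiple Pages", 5242],
  ["Get Metadata & Page Count", 4912],
];

const latencies = benchmarks.map(
  ([name, ops]) => `${name}: ${msPerOp(ops).toFixed(3)} ms`
);
```

By this conversion, every successful operation in the table averages roughly 0.2 ms or less per call.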
See Design Philosophy for more details.
git clone https://github.com/sylphlab/pdf-reader-mcp.git
cd pdf-reader-mcp
pnpm install
pnpm run build       # Build TypeScript to dist/
pnpm run watch       # Build in watch mode
pnpm run test        # Run tests
pnpm run test:watch  # Run tests in watch mode
pnpm run test:cov    # Run tests with coverage
pnpm run check       # Run Biome (lint + format check)
pnpm run check:fix   # Fix Biome issues automatically
pnpm run lint        # Lint with Biome
pnpm run format      # Format with Biome
pnpm run typecheck   # TypeScript type checking
pnpm run benchmark   # Run performance benchmarks
pnpm run validate    # Full validation (check + test)
We maintain high test coverage using Vitest:
pnpm run test      # Run all tests
pnpm run test:cov  # Run with coverage report
All tests must pass before merging. Current: 31/31 tests passing ✅
The project uses Biome for fast, unified linting and formatting:
pnpm run check      # Check code quality
pnpm run check:fix  # Auto-fix issues
We welcome contributions! Please:
- Create a feature branch (git checkout -b feature/amazing-feature)
- Run pnpm run check:fix to format code
See CONTRIBUTING.md for detailed guidelines.
If you find this project useful, please:
This project is licensed under the MIT License.
Made with ❤️ by Sylphx