Overview
Integration details
| Class | Package | Local | Serializable | JS support |
|---|---|---|---|---|
| OpenDataLoader PDF | langchain-opendataloader-pdf | ✅ | ❌ | ❌ |
Loader features
| Source | Document Lazy Loading | Native Async Support |
|---|---|---|
| OpenDataLoaderPDFLoader | ✅ | ❌ |
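The feature table says lazy loading is supported but native async is not. The sketch below consumes both interfaces from any object that exposes LangChain's `BaseLoader` methods. `take_documents` and `collect_documents_async` are hypothetical helper names. The async path assumes the loader inherits LangChain's default `alazy_load`, which runs the synchronous `lazy_load()` iterator in a thread executor rather than doing native async I/O. This page does not say whether the PDF conversion itself runs incrementally.

```python
from itertools import islice
from typing import Any, List


def take_documents(loader: Any, limit: int) -> List[Any]:
    """Return at most `limit` documents from the loader's lazy iterator."""
    # lazy_load() yields Document objects one at a time; islice stops pulling after `limit`.
    return list(islice(loader.lazy_load(), limit))


async def collect_documents_async(loader: Any) -> List[Any]:
    """Gather every document from async code."""
    # Without native async support, alazy_load() is assumed to be LangChain's
    # default BaseLoader fallback, which drives lazy_load() in a thread executor.
    return [doc async for doc in loader.alazy_load()]


# Example usage inside a running event loop:
# docs = await collect_documents_async(loader)
```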
The OpenDataLoaderPDFLoader component lets you parse PDFs into structured Document objects.
Requirements
- Python >= 3.9
- Java 11 or newer, available on the system PATH
- opendataloader-pdf >= 1.1.1
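You can check the requirements above before you construct the loader. The following is a minimal sketch that uses only the standard library. The helper names (`numeric_version`, `java_major_version`, `check_requirements`) are hypothetical. Parsing the `java -version` output assumes the usual `version "..."` banner, and Java 8 and older are assumed to use the legacy `1.x` numbering.

```python
import re
import shutil
import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List, Optional, Tuple


def numeric_version(text: str) -> Tuple[int, ...]:
    """Leading dotted digits of a version string as ints: '1.8.0_291' -> (1, 8, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def java_major_version() -> Optional[int]:
    """Major version of the `java` executable on PATH, or None if it cannot be found."""
    if shutil.which("java") is None:
        return None
    try:
        # `java -version` prints its banner to stderr, e.g. 'openjdk version "17.0.9" ...'.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except OSError:
        return None
    match = re.search(r'version "([^"]+)"', result.stderr + result.stdout)
    parts = numeric_version(match.group(1)) if match else ()
    if not parts:
        return None
    # Java 8 and older report "1.x" versions, so the major number is the second field.
    return parts[1] if parts[0] == 1 and len(parts) > 1 else parts[0]


def check_requirements() -> List[str]:
    """List unmet requirements; an empty list means every check passed."""
    problems: List[str] = []
    if sys.version_info < (3, 9):
        problems.append("Python >= 3.9 is required")
    java = java_major_version()
    if java is None or java < 11:
        problems.append("Java 11 or newer must be available on the system PATH")
    try:
        # Distribution name as published on PyPI.
        if numeric_version(version("opendataloader-pdf")) < (1, 1, 1):
            problems.append("opendataloader-pdf >= 1.1.1 is required")
    except PackageNotFoundError:
        problems.append("opendataloader-pdf is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Unmet requirement:", problem)
```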
Installation
Install the integration package from PyPI with `pip install langchain-opendataloader-pdf`.
Quick start
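Here is a minimal quick-start sketch. It assumes the package installs a module named `langchain_opendataloader_pdf` (the package name with underscores) and uses only parameters listed under Parameters. `load_documents` and the example paths are hypothetical.

```python
from typing import Any, List


def load_documents(paths: List[str], output_format: str = "text") -> List[Any]:
    """Parse PDF files and/or directories into LangChain Document objects."""
    # Assumed module name: the PyPI package name with hyphens replaced by underscores.
    from langchain_opendataloader_pdf import OpenDataLoaderPDFLoader

    loader = OpenDataLoaderPDFLoader(
        file_path=paths,        # one or more PDF files or directories
        format=output_format,   # e.g. "text", "markdown", "html", "json"
        quiet=True,             # suppress the CLI's logging output
    )
    # load() returns all Documents as a list; lazy_load() yields them one at a time.
    return loader.load()


# Example usage (hypothetical paths; requires Java 11+ on PATH):
# docs = load_documents(["path/to/document.pdf", "path/to/folder"])
# for doc in docs:
#     print(doc.metadata, doc.page_content[:80])
```

The import sits inside the function only so that you can paste the snippet before the dependency is installed. A top-level import works equally well.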
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file_path | List[str] | ✅ Yes | — | One or more PDF file paths or directories to process. |
| format | str | No | None | Output formats (e.g. "json", "html", "markdown", "text"). |
| quiet | bool | No | False | Suppresses CLI logging output when True. |
| content_safety_off | Optional[List[str]] | No | None | List of content safety filters to disable (e.g. "all", "hidden-text", "off-page", "tiny", "hidden-ocg"). |
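To make the table's types concrete, here is a sketch of a hypothetical helper, `build_loader_kwargs`. It normalizes convenient inputs (a single path, `pathlib.Path` objects, a single filter name) into the shapes listed in the table. It also leaves out unset options so that the loader's own defaults apply. The format and filter strings in the table are examples, not a complete list, so the helper passes them through unchecked.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Sequence, Union

PathLike = Union[str, Path]


def build_loader_kwargs(
    file_path: Union[PathLike, Sequence[PathLike]],
    output_format: Optional[str] = None,
    quiet: bool = False,
    content_safety_off: Union[str, Sequence[str], None] = None,
) -> Dict[str, Any]:
    """Normalize convenient inputs into the argument types from the parameters table."""
    # file_path is List[str]: wrap a single path and convert Path objects to str.
    paths = [file_path] if isinstance(file_path, (str, Path)) else list(file_path)
    kwargs: Dict[str, Any] = {"file_path": [str(p) for p in paths], "quiet": quiet}
    # Optional settings are added only when given, so the loader's defaults apply otherwise.
    if output_format is not None:
        kwargs["format"] = output_format
    if content_safety_off is not None:
        # Wrap a lone filter name; list("all") would split it into characters.
        filters = [content_safety_off] if isinstance(content_safety_off, str) else content_safety_off
        kwargs["content_safety_off"] = list(filters)
    return kwargs


# Example usage (assumed import path; "report.pdf" is a placeholder):
# from langchain_opendataloader_pdf import OpenDataLoaderPDFLoader
# loader = OpenDataLoaderPDFLoader(**build_loader_kwargs("report.pdf", "markdown", True, "hidden-text"))
```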
Additional Resources
- LangChain OpenDataLoader PDF integration GitHub
- LangChain OpenDataLoader PDF integration PyPI package
- OpenDataLoader PDF GitHub
- OpenDataLoader PDF Homepage