Interface
Each document loader may define its own parameters, but they share a common API:
- .load(): Loads all documents at once.
- .loadAndSplit(): Loads all documents at once and splits them into smaller documents.
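To make the shared interface concrete, here is a minimal, self-contained sketch in TypeScript. It does not import LangChain.js: `SimpleDocument`, `SimpleLoader`, `InMemoryTextLoader`, and `splitText` are hypothetical names that only mirror the shape described above (an async load step, plus a variant that also splits the result). The fixed-size character chunking and the `chunkSize` parameter are assumptions for illustration; the library's own splitting behavior may differ.

```typescript
// Hypothetical stand-ins that mirror the shape of the loader interface.
interface SimpleDocument {
  pageContent: string;
  metadata: { [key: string]: unknown };
}

interface SimpleLoader {
  load(): Promise<SimpleDocument[]>;
  loadAndSplit(chunkSize?: number): Promise<SimpleDocument[]>;
}

// Split text into consecutive chunks of at most `chunkSize` characters.
function splitText(text: string, chunkSize: number): string[] {
  if (chunkSize <= 0) {
    throw new Error("chunkSize must be positive");
  }
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// A toy loader that "loads" from memory instead of a file or URL.
class InMemoryTextLoader implements SimpleLoader {
  private readonly entries: { source: string; text: string }[];

  constructor(entries: { source: string; text: string }[]) {
    this.entries = entries;
  }

  // Return one document per entry, all at once.
  async load(): Promise<SimpleDocument[]> {
    return this.entries.map((entry) => ({
      pageContent: entry.text,
      metadata: { source: entry.source },
    }));
  }

  // Load everything, then break each document into smaller documents.
  async loadAndSplit(chunkSize: number = 20): Promise<SimpleDocument[]> {
    const docs = await this.load();
    const result: SimpleDocument[] = [];
    for (const doc of docs) {
      const chunks = splitText(doc.pageContent, chunkSize);
      chunks.forEach((chunk, index) => {
        result.push({
          pageContent: chunk,
          metadata: { ...doc.metadata, chunk: index },
        });
      });
    }
    return result;
  }
}
```

In this sketch each chunk keeps the parent document's metadata and records its position, so a downstream step can still trace a chunk back to its source.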
By category
LangChain.js groups document loaders into two categories:
- File loaders, which load data into LangChain formats from your local filesystem.
- Web loaders, which load data from remote sources.
File loaders
If you’d like to contribute an integration, see Contributing integrations.
PDFs
| Document Loader | Description | Package/API |
|---|---|---|
| PDFLoader | Load and parse PDF files using pdf-parse | Package |
Common File Types
| Document Loader | Description | Package/API |
|---|---|---|
| CSV | Load data from CSV files with configurable column extraction | Package |
| JSON | Load JSON files using JSON pointer to target specific keys | Package |
| JSONLines | Load data from JSONLines/JSONL files | Package |
| Text | Load plain text files | Package |
| DOCX | Load Microsoft Word documents (.docx and .doc formats) | Package |
| EPUB | Load EPUB files with optional chapter splitting | Package |
| PPTX | Load PowerPoint presentations | Package |
| Subtitles | Load subtitle files (.srt format) | Package |
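The JSON entry in the table above mentions JSON pointers. As a rough illustration of how a pointer such as `/texts/1` selects a value, the sketch below resolves a pointer following the JSON Pointer rules (RFC 6901). `Json` and `resolvePointer` are hypothetical names written for this example, not part of LangChain.js, and the loader's own pointer handling may differ in detail.

```typescript
// A JSON value, written as a recursive type.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Resolve a JSON Pointer against a document.
// Returns undefined when the pointer does not match anything.
function resolvePointer(doc: Json, pointer: string): Json | undefined {
  // The empty pointer refers to the whole document.
  if (pointer === "") {
    return doc;
  }
  if (pointer.charAt(0) !== "/") {
    throw new Error("A non-empty JSON pointer must start with '/'");
  }
  // Unescape each reference token: "~1" means "/", "~0" means "~".
  // "~1" must be decoded first so that "~01" becomes "~1", not "/".
  const tokens = pointer
    .slice(1)
    .split("/")
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~"));

  let current: Json | undefined = doc;
  for (const token of tokens) {
    if (Array.isArray(current)) {
      // Array elements are addressed by a non-negative integer index.
      if (!/^(0|[1-9]\d*)$/.test(token)) {
        return undefined;
      }
      current = current[Number(token)];
    } else if (current !== null && typeof current === "object") {
      current = Object.prototype.hasOwnProperty.call(current, token)
        ? current[token]
        : undefined;
    } else {
      // Primitives (and undefined) have no children.
      return undefined;
    }
  }
  return current;
}
```

For a document like `{ "texts": ["a", "b"] }`, the pointer `/texts` would address the whole array and `/texts/1` its second element.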
Specialized File Loaders
| Document Loader | Description | Package/API |
|---|---|---|
| DirectoryLoader | Load all files from a directory with custom loader mappings | Package |
| UnstructuredLoader | Load multiple file types using Unstructured API | API |
| MultiFileLoader | Load data from multiple individual file paths | Package |
| ChatGPT | Load ChatGPT conversation exports | Package |
| Notion Markdown | Load Notion pages exported as Markdown | Package |
| OpenAI Whisper Audio | Transcribe audio files using OpenAI Whisper API | API |
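The DirectoryLoader entry above refers to custom loader mappings. One way to picture the idea is a table from file extension to a factory that builds a loader for a given path. The sketch below is a self-contained illustration of that dispatch step only; `LoaderFactory`, `extensionOf`, and `assignLoaders` are hypothetical helpers, it does not read the filesystem, and it is not the library's implementation.

```typescript
// Hypothetical factory type: given a file path, produce some loader-like value.
type LoaderFactory<T> = (path: string) => T;

// Return the lowercased extension including the dot (e.g. ".txt"),
// or "" when the file name has no extension (dotfiles like ".env" included).
function extensionOf(path: string): string {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot).toLowerCase();
}

// Pick a factory for each path by extension; collect paths with no mapping.
function assignLoaders<T>(
  paths: string[],
  mappings: { [extension: string]: LoaderFactory<T> }
): { loaders: T[]; unmatched: string[] } {
  const loaders: T[] = [];
  const unmatched: string[] = [];
  for (const path of paths) {
    const ext = extensionOf(path);
    const factory = Object.prototype.hasOwnProperty.call(mappings, ext)
      ? mappings[ext]
      : undefined;
    if (factory) {
      loaders.push(factory(path));
    } else {
      unmatched.push(path);
    }
  }
  return { loaders, unmatched };
}
```

Here unmatched files are returned rather than silently dropped, which is a design choice of this sketch; a real loader might instead warn, skip, or raise an error for unknown extensions.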
Web loaders
Webpages
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| Cheerio | Load webpages using Cheerio (lightweight, no JavaScript execution) | ✅ | Package |
| Playwright | Load dynamic webpages using Playwright (supports JavaScript rendering) | ❌ | Package |
| Puppeteer | Load dynamic webpages using Puppeteer (headless Chrome) | ❌ | Package |
| FireCrawl | Crawl and convert websites into LLM-ready markdown | ✅ | API |
| Spider | Fast crawler that converts websites into HTML, markdown, or text | ✅ | API |
| RecursiveUrlLoader | Recursively load webpages following links | ❌ | Package |
| Sitemap | Load all pages from a sitemap.xml | ✅ | Package |
| Browserbase | Load webpages using managed headless browsers with stealth mode | ✅ | API |
| WebPDFLoader | Load PDF files in web environments | ✅ | Package |
Cloud Providers
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| S3 | Load files from AWS S3 buckets | ❌ | Package |
| Azure Blob Storage Container | Load all files from an Azure Blob Storage container | ❌ | Package |
| Azure Blob Storage File | Load individual files from Azure Blob Storage | ❌ | Package |
| Google Cloud Storage | Load files from Google Cloud Storage buckets | ❌ | Package |
| Google Cloud SQL for PostgreSQL | Load documents from Cloud SQL PostgreSQL databases | ✅ | Package |
Productivity Tools
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| Notion API | Load Notion pages and databases via API | ✅ | API |
| Figma | Load Figma file data | ✅ | API |
| Confluence | Load pages from Confluence spaces | ❌ | API |
| GitHub | Load files from GitHub repositories | ✅ | API |
| GitBook | Load GitBook documentation pages | ✅ | Package |
| Jira | Load issues from Jira projects | ❌ | API |
| Airtable | Load records from Airtable bases | ✅ | API |
| Taskade | Load Taskade project data | ✅ | API |
Search & Data APIs
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| SearchAPI | Load web search results from SearchAPI (Google, YouTube, etc.) | ✅ | API |
| SerpAPI | Load web search results from SerpAPI | ✅ | API |
| Apify Dataset | Load scraped data from Apify platform | ✅ | API |
Audio & Video
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| YouTube | Load YouTube video transcripts | ✅ | Package |
| AssemblyAI | Transcribe audio and video files using AssemblyAI API | ✅ | API |
| Sonix | Transcribe audio files using Sonix API | ❌ | API |
Other
| Document Loader | Description | Web Support | Package/API |
|---|---|---|---|
| Couchbase | Load documents from a Couchbase database using SQL++ queries | ✅ | Package |
| LangSmith | Load datasets and traces from LangSmith | ✅ | API |
| Hacker News | Load Hacker News threads and comments | ✅ | Package |
| IMSDB | Load movie scripts from Internet Movie Script Database | ✅ | Package |
| College Confidential | Load college information from College Confidential | ✅ | Package |
| Blockchain Data | Load blockchain data (NFTs, transactions) via Sort.xyz API | ✅ | API |