Document loaders provide a standard interface for reading data from different sources (such as Slack, Notion, or Google Drive) into LangChain’s Document format.
This ensures that data can be handled consistently regardless of the source.
All document loaders implement the BaseLoader interface.
Interface
Each document loader may define its own parameters, but they share a common API:
load(): Loads all documents at once.
loadAndSplit(): Loads all documents at once and splits them into smaller documents.
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";

const loader = new CSVLoader(
  ... // <-- Integration-specific parameters here
);
const data = await loader.load();
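To make the shared API more concrete, here is a minimal, self-contained sketch of a loader that exposes `load()` and `loadAndSplit()`. It does not use the LangChain classes: `SketchDocument`, `SketchLoader`, `InMemoryLineLoader`, `splitDocument`, and the `chunkSize` parameter are hypothetical names chosen for illustration, and fixed-size character chunking stands in for whatever text splitter a real loader would use.

```typescript
// Hypothetical stand-ins for illustration; these are not the LangChain types.
interface SketchDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface SketchLoader {
  load(): Promise<SketchDocument[]>;
  // Assumption: a simplified signature that takes a chunk size directly.
  loadAndSplit(chunkSize: number): Promise<SketchDocument[]>;
}

// Split one document into fixed-size character chunks, copying its metadata.
function splitDocument(doc: SketchDocument, chunkSize: number): SketchDocument[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  const chunks: SketchDocument[] = [];
  for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
    chunks.push({
      pageContent: doc.pageContent.slice(start, start + chunkSize),
      metadata: { ...doc.metadata, chunkStart: start },
    });
  }
  return chunks;
}

// A toy loader that treats each non-empty line of a string as one document.
class InMemoryLineLoader implements SketchLoader {
  private readonly text: string;
  private readonly source: string;

  constructor(text: string, source: string) {
    this.text = text;
    this.source = source;
  }

  async load(): Promise<SketchDocument[]> {
    return this.text
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map((line, position) => ({
        pageContent: line,
        // position counts non-empty lines only, starting at 0.
        metadata: { source: this.source, position },
      }));
  }

  async loadAndSplit(chunkSize: number): Promise<SketchDocument[]> {
    const chunks: SketchDocument[] = [];
    for (const doc of await this.load()) {
      chunks.push(...splitDocument(doc, chunkSize));
    }
    return chunks;
  }
}

// Code written against the interface works with any loader that implements it.
async function countDocuments(loader: SketchLoader): Promise<number> {
  return (await loader.load()).length;
}
```

Because `countDocuments` depends only on the interface, the same function would accept any other loader with the same two methods, which is the consistency the shared API is meant to provide.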
By category
LangChain.js groups document loaders into two categories:
File loaders, which load data from your local filesystem into LangChain's Document format.
Web loaders, which load data from remote sources.
File loaders
PDFs
| Document Loader | Description | Package/API |
| --- | --- | --- |
| PDFLoader | Load and parse PDF files using pdf-parse | Package |
Common file types
| Document Loader | Description | Package/API |
| --- | --- | --- |
| CSV | Load data from CSV files with configurable column extraction | Package |
| JSON | Load JSON files using JSON pointer to target specific keys | Package |
| JSONLines | Load data from JSONLines/JSONL files | Package |
| Text | Load plain text files | Package |
| DOCX | Load Microsoft Word documents (.docx and .doc formats) | Package |
| EPUB | Load EPUB files with optional chapter splitting | Package |
| PPTX | Load PowerPoint presentations | Package |
| Subtitles | Load subtitle files (.srt format) | Package |
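The JSON entry above mentions targeting keys with a JSON pointer. As a rough illustration of what a pointer such as `/texts/1` refers to, the sketch below walks a parsed JSON value following the RFC 6901 rules. It is not the loader's implementation, and the loader may interpret pointers differently; `resolvePointer` and the `Json` type are hypothetical names used only here.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Resolve an RFC 6901 JSON pointer against a parsed value.
// Returns undefined when the pointer does not match anything.
function resolvePointer(root: Json, pointer: string): Json | undefined {
  if (pointer === "") {
    return root; // the empty pointer refers to the whole document
  }
  if (pointer.charAt(0) !== "/") {
    throw new SyntaxError(`JSON pointer must start with "/": ${pointer}`);
  }
  // Unescape "~1" (slash) before "~0" (tilde), as RFC 6901 requires.
  const tokens = pointer
    .slice(1)
    .split("/")
    .map((token) => token.replace(/~1/g, "/").replace(/~0/g, "~"));

  let current: Json | undefined = root;
  for (const token of tokens) {
    if (Array.isArray(current)) {
      // Array indices must be non-negative integers without leading zeros.
      if (!/^(0|[1-9]\d*)$/.test(token)) {
        return undefined;
      }
      current = current[Number(token)];
    } else if (current !== null && typeof current === "object") {
      current = Object.prototype.hasOwnProperty.call(current, token)
        ? current[token]
        : undefined;
    } else {
      return undefined; // cannot descend into a primitive value
    }
    if (current === undefined) {
      return undefined;
    }
  }
  return current;
}
```

For example, against `{ "texts": ["alpha", "beta"] }`, the pointer `/texts` selects the whole array and `/texts/1` selects the second element.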
Specialized file loaders
| Document Loader | Description | Package/API |
| --- | --- | --- |
| DirectoryLoader | Load all files from a directory with custom loader mappings | Package |
| UnstructuredLoader | Load multiple file types using Unstructured API | API |
| MultiFileLoader | Load data from multiple individual file paths | Package |
| ChatGPT | Load ChatGPT conversation exports | Package |
| Notion Markdown | Load Notion pages exported as Markdown | Package |
| OracleDocLoader | Ingest Oracle AI Vector Search tables or Oracle Text-supported files | Package |
| OpenAI Whisper Audio | Transcribe audio files using OpenAI Whisper API | API |
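The "custom loader mappings" idea in the DirectoryLoader entry can be pictured as a lookup from file extension to a factory that builds the right loader for each path. The sketch below shows only that dispatch step, with placeholder objects instead of real loaders. Every name in it (`PlannedLoader`, `LoaderFactory`, `loaderByExtension`, `extensionOf`, `pickLoaders`) is hypothetical, and how the real DirectoryLoader treats unmapped files is not assumed here.

```typescript
// Hypothetical placeholder describing which loader would handle a file.
interface PlannedLoader {
  kind: string;
  path: string;
}

type LoaderFactory = (path: string) => PlannedLoader;

// Map from file extension (including the dot) to a factory for that file type.
const loaderByExtension: Record<string, LoaderFactory> = {
  ".csv": (path) => ({ kind: "csv", path }),
  ".json": (path) => ({ kind: "json", path }),
  ".txt": (path) => ({ kind: "text", path }),
};

// Return the lower-cased extension of a path, or "" when there is none.
// Dotfiles such as ".env" are treated as having no extension.
function extensionOf(path: string): string {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot <= 0 ? "" : name.slice(dot).toLowerCase();
}

// Pair each path with a loader from the mapping; collect unmapped paths separately.
function pickLoaders(
  paths: string[],
  mapping: Record<string, LoaderFactory>,
): { loaders: PlannedLoader[]; skipped: string[] } {
  const loaders: PlannedLoader[] = [];
  const skipped: string[] = [];
  for (const path of paths) {
    const factory = mapping[extensionOf(path)];
    if (factory) {
      loaders.push(factory(path));
    } else {
      skipped.push(path);
    }
  }
  return { loaders, skipped };
}
```

Keeping the mapping as plain data means support for a new file type is a one-line addition, and the unmapped paths are reported to the caller instead of being silently dropped.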
Web loaders
Webpages
| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Cheerio | Load webpages using Cheerio (lightweight, no JavaScript execution) | ✅ | Package |
| Playwright | Load dynamic webpages using Playwright (supports JavaScript rendering) | ❌ | Package |
| Puppeteer | Load dynamic webpages using Puppeteer (headless Chrome) | ❌ | Package |
| FireCrawl | Crawl and convert websites into LLM-ready markdown | ✅ | API |
| Spider | Fast crawler that converts websites into HTML, markdown, or text | ✅ | API |
| RecursiveUrlLoader | Recursively load webpages following links | ❌ | Package |
| Sitemap | Load all pages from a sitemap.xml | ✅ | Package |
| Browserbase | Load webpages using managed headless browsers with stealth mode | ✅ | API |
| WebPDFLoader | Load PDF files in web environments | ✅ | Package |
Cloud providers
| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| S3 | Load files from AWS S3 buckets | ❌ | Package |
| Azure Blob Storage Container | Load all files from Azure Blob Storage container | ❌ | Package |
| Azure Blob Storage File | Load individual files from Azure Blob Storage | ❌ | Package |
| Google Cloud Storage | Load files from Google Cloud Storage buckets | ❌ | Package |
| Google Cloud SQL for PostgreSQL | Load documents from Cloud SQL PostgreSQL databases | ✅ | Package |

| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Notion API | Load Notion pages and databases via API | ✅ | API |
| Figma | Load Figma file data | ✅ | API |
| Confluence | Load pages from Confluence spaces | ❌ | API |
| GitHub | Load files from GitHub repositories | ✅ | API |
| GitBook | Load GitBook documentation pages | ✅ | Package |
| Jira | Load issues from Jira projects | ❌ | API |
| Airtable | Load records from Airtable bases | ✅ | API |
| Taskade | Load Taskade project data | ✅ | API |
Search & data APIs
| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| SearchAPI | Load web search results from SearchAPI (Google, YouTube, etc.) | ✅ | API |
| SerpApi | Load web search results from SerpApi | ✅ | API |
| Apify Dataset | Load scraped data from Apify platform | ✅ | API |
Audio & video
| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| YouTube | Load YouTube video transcripts | ✅ | Package |
| AssemblyAI | Transcribe audio and video files using AssemblyAI API | ✅ | API |
| Soniox | Transcribe multilingual audio files with optional translation using Soniox API | ✅ | API |
| Sonix | Transcribe audio files using Sonix API | ❌ | API |
Other
| Document Loader | Description | Web Support | Package/API |
| --- | --- | --- | --- |
| Couchbase | Load documents from Couchbase database using SQL++ queries | ✅ | Package |
| LangSmith | Load datasets and traces from LangSmith | ✅ | API |
| Hacker News | Load Hacker News threads and comments | ✅ | Package |
| IMSDB | Load movie scripts from Internet Movie Script Database | ✅ | Package |
| College Confidential | Load college information from College Confidential | ✅ | Package |
| Blockchain Data | Load blockchain data (NFTs, transactions) via Sort.xyz API | ✅ | API |
All document loaders
AssemblyAI Audio Transcript
Azure Blob Storage Container
Google Cloud SQL for PostgreSQL