Document loaders provide a standard interface for reading data from different sources (such as Slack, Notion, or Google Drive) into LangChain’s Document format.
This ensures that data can be handled consistently regardless of the source.
All document loaders implement the BaseLoader interface.
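As a rough illustration of what a shared loader interface buys you, here is a minimal, library-free sketch. The `Document` dataclass and the `TextLinesLoader` class are hypothetical stand-ins written for this example, not LangChain's own classes. They only mirror the general shape: a document carries content plus metadata, and a loader exposes both an eager and a lazy way to produce documents.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Document:
    """Hypothetical stand-in for a document: text plus source metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class TextLinesLoader:
    """Hypothetical loader that turns each non-empty line into a Document."""

    def __init__(self, text: str, source: str) -> None:
        self.text = text
        self.source = source

    def lazy_load(self) -> Iterator[Document]:
        # Yield one document at a time so callers need not hold everything in memory.
        for line_number, line in enumerate(self.text.splitlines(), start=1):
            if line.strip():
                yield Document(
                    page_content=line.strip(),
                    metadata={"source": self.source, "line": line_number},
                )

    def load(self) -> List[Document]:
        # Eager variant: materialize the lazy stream into a list.
        return list(self.lazy_load())


loader = TextLinesLoader("first note\n\nsecond note\n", source="notes.txt")
docs = loader.load()
for doc in docs:
    print(doc.page_content, doc.metadata)
```

Because downstream code only depends on the document shape and the two methods, a loader for a different source could be swapped in without changing the code that consumes the documents.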
Interface
Each document loader may define its own parameters, but they share a common API:
load() – Loads all documents at once.
lazy_load() – Streams documents lazily, useful for large datasets.
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    ...  # Integration-specific parameters here
)

# Load all documents
documents = loader.load()

# For large datasets, lazily load documents
for document in loader.lazy_load():
    print(document)
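One common reason to prefer lazy loading is to process documents in fixed-size batches, for example before sending them to a downstream step. The sketch below is one possible way to do that with the standard library only. The `in_batches` helper and the `fake_lazy_load` generator are hypothetical names introduced for illustration; `fake_lazy_load` stands in for a real loader's `lazy_load()` so the example runs without any integration installed.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls only the next `batch_size` items from the stream.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_lazy_load() -> Iterator[str]:
    """Hypothetical stand-in for loader.lazy_load(); yields five fake documents."""
    for i in range(5):
        yield f"document {i}"


batch_sizes = []
for batch in in_batches(fake_lazy_load(), batch_size=2):
    batch_sizes.append(len(batch))
    print(batch)
```

With a real loader, the call would presumably look like `in_batches(loader.lazy_load(), batch_size=...)`, so that only one batch of documents is held in memory at a time.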
By category
Webpages
The below document loaders allow you to load webpages.
| Document Loader | Description | Package/API |
| --- | --- | --- |
| Web | Uses urllib and BeautifulSoup to load and parse HTML web pages | Package |
| Unstructured | Uses Unstructured to load and parse web pages | Package |
| RecursiveURL | Recursively scrapes all child links from a root URL | Package |
| Sitemap | Scrapes all pages on a given sitemap | Package |
| Spider | Crawler and scraper that returns LLM-ready data | API |
| Firecrawl | API service that can be deployed locally | API |
| Apify Dataset | Load documents from Apify datasets | API |
| Docling | Uses Docling to load and parse web pages | Package |
| Hyperbrowser | Platform for running and scaling headless browsers, can be used to scrape/crawl any site | API |
| AgentQL | Web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt | API |
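To give a feel for what a web page loader does conceptually, the sketch below turns an HTML string into a document-like dictionary. It deliberately swaps in Python's standard-library `html.parser` in place of BeautifulSoup, and it parses an inline sample string instead of fetching a URL, so it runs offline. The `TextExtractor` class, the `html_to_document` function, and the dictionary layout are hypothetical choices for this example, not the behavior of any loader listed above.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_document(html: str, source: str) -> dict:
    """Hypothetical helper: parse HTML and return content plus source metadata."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "page_content": "\n".join(parser.chunks),
        "metadata": {"source": source},
    }


sample_html = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b>.</p></body></html>"
)
doc = html_to_document(sample_html, source="https://example.com")
print(doc["page_content"])
print(doc["metadata"])
```

A production loader would also need to handle fetching, encodings, and error cases, which this sketch leaves out.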
PDFs
The below document loaders allow you to load PDF documents.
| Document Loader | Description | Package/API |
| --- | --- | --- |
| PyPDF | Uses pypdf to load and parse PDFs | Package |
| Unstructured | Uses Unstructured’s open source library to load PDFs | Package |
| Amazon Textract | Uses AWS API to load PDFs | API |
| MathPix | Uses MathPix to load PDFs | Package |
| PDFPlumber | Load PDF files using PDFPlumber | Package |
| PyPDFDirectory | Load a directory with PDF files | Package |
| PyPDFium2 | Load PDF files using PyPDFium2 | Package |
| PyMuPDF | Load PDF files using PyMuPDF | Package |
| PyMuPDF4LLM | Load PDF content to Markdown using PyMuPDF4LLM | Package |
| PDFMiner | Load PDF files using PDFMiner | Package |
| Upstage Document Parse Loader | Load PDF files using UpstageDocumentParseLoader | Package |
| Docling | Load PDF files using Docling | Package |
| UnDatasIO | Load PDF files using UnDatasIO | Package |
| OpenDataLoader PDF | Load PDF files using OpenDataLoader PDF | Package |
Cloud providers
The below document loaders allow you to load documents from your favorite cloud providers.
| Document Loader | Description | Partner Package | API reference |
| --- | --- | --- | --- |
| AWS S3 Directory | Load documents from an AWS S3 directory | ❌ | S3DirectoryLoader |
| AWS S3 File | Load documents from an AWS S3 file | ❌ | S3FileLoader |
| Azure AI Data | Load documents from Azure AI services | ❌ | AzureAIDataLoader |
| Azure Blob Storage | Load documents from Azure Blob Storage | ✅ | AzureBlobStorageLoader |
| Dropbox | Load documents from Dropbox | ❌ | DropboxLoader |
| Google Cloud Storage Directory | Load documents from GCS bucket | ✅ | GCSDirectoryLoader |
| Google Cloud Storage File | Load documents from GCS file object | ✅ | GCSFileLoader |
| Google Drive | Load documents from Google Drive (Google Docs only) | ✅ | GoogleDriveLoader |
| Huawei OBS Directory | Load documents from Huawei Object Storage Service Directory | ❌ | OBSDirectoryLoader |
| Huawei OBS File | Load documents from Huawei Object Storage Service File | ❌ | OBSFileLoader |
| Microsoft OneDrive | Load documents from Microsoft OneDrive | ❌ | OneDriveLoader |
| Microsoft SharePoint | Load documents from Microsoft SharePoint | ❌ | SharePointLoader |
| Tencent COS Directory | Load documents from Tencent Cloud Object Storage Directory | ❌ | TencentCOSDirectoryLoader |
| Tencent COS File | Load documents from Tencent Cloud Object Storage File | ❌ | TencentCOSFileLoader |
Social platforms
The below document loaders allow you to load documents from different social media platforms.
Messaging services
The below document loaders allow you to load data from different messaging platforms.
Productivity tools
The below document loaders allow you to load data from commonly used productivity tools.
Common file types
The below document loaders allow you to load data from common data formats.
All document loaders
AssemblyAI Audio Transcripts
Azure AI Document Intelligence
Google AlloyDB for PostgreSQL
Google Cloud SQL for SQL Server
Google Cloud SQL for MySQL
Google Cloud SQL for PostgreSQL
Google Cloud Storage Directory
Google Cloud Storage File
Google Firestore in Datastore Mode
Google El Carro for Oracle Workloads
Google Firestore (Native Mode)
Google Memorystore for Redis
Open Document Format (ODT)
Oracle Autonomous Database
Pebblo Safe DocumentLoader
ReadTheDocs Documentation
UnstructuredMarkdownLoader