Document loader integrations

Document loaders provide a standard interface for reading data from different sources (such as Slack, Notion, or Google Drive) into LangChain’s Document format. This ensures that data can be handled consistently regardless of the source. All document loaders implement the BaseLoader interface.

Community document loaders are user-contributed and unverified. LangChain does not review or endorse these integrations; use them at your own risk.

Interface

Each document loader may define its own parameters, but they share a common API:

load() – Loads all documents at once.
lazy_load() – Streams documents lazily, useful for large datasets.

from langchain_docling.loader import DoclingLoader

FILE_PATH = "https://arxiv.org/pdf/2408.09869"

loader = DoclingLoader(file_path=FILE_PATH)

# Load all documents
documents = loader.load()

# For large datasets, lazily load documents
for document in loader.lazy_load():
    print(document)
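
To illustrate how the two methods relate, here is a minimal, dependency-free sketch of the pattern. It does not use LangChain: `SimpleDocument` and `LineLoader` are hypothetical stand-ins invented for this example, and the assumption that `load()` simply collects what `lazy_load()` yields is a common way to structure a loader, not a statement about any particular integration.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class LineLoader:
    """Hypothetical loader: each non-empty line of a string becomes a document."""

    def __init__(self, text: str, source: str = "in-memory") -> None:
        self.text = text
        self.source = source

    def lazy_load(self) -> Iterator[SimpleDocument]:
        # Yield one document at a time so the caller never holds the full set.
        for number, line in enumerate(self.text.splitlines(), start=1):
            if line.strip():
                yield SimpleDocument(
                    page_content=line.strip(),
                    metadata={"source": self.source, "line": number},
                )

    def load(self) -> list[SimpleDocument]:
        # Eager variant: materialize everything lazy_load() would produce.
        return list(self.lazy_load())


loader = LineLoader("first record\n\nsecond record\n")
documents = loader.load()
```

With this structure, a caller that only needs the first few documents can iterate over `lazy_load()` and stop early, while `load()` stays a convenience for small sources.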

By category

Productivity tools

The following document loaders load data from commonly used productivity tools.

Document Loader	API reference
AgentMail	`AgentMailLoader`

Webpages

The following document loaders load webpages.

Document Loader	Description	Package/API
Unstructured	Uses Unstructured to load and parse web pages	Package
Apify Dataset	Load documents from Apify datasets	API
Docling	Uses Docling to load and parse web pages	Package
Hyperbrowser	Platform for running and scaling headless browsers, can be used to scrape/crawl any site	API
AgentQL	Web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt	API
Browserbase	Load webpages using managed headless browsers with stealth mode	API

PDFs

The following document loaders load PDF documents.

Document Loader	Description	Package/API
Unstructured	Uses Unstructured’s open source library to load PDFs	Package
Upstage Document Parse Loader	Load PDF files using UpstageDocumentParseLoader	Package
Docling	Load PDF files using Docling	Package
UnDatasIO	Load PDF files using UnDatasIO	Package
OpenDataLoader PDF	Load PDF files using OpenDataLoader PDF	Package

Cloud providers

The following document loaders load documents from cloud providers.

Document Loader	Description	Partner Package	API reference
Google Cloud Storage Directory	Load documents from a GCS bucket	✅	`GCSDirectoryLoader`
Google Cloud Storage File	Load documents from a GCS file object	✅	`GCSFileLoader`
Google Drive	Load documents from Google Drive (Google Docs only)	✅	`GoogleDriveLoader`

Common file types

The following document loaders load data from common data formats.

Document Loader	Data Type
`Unstructured`	Many file types (see https://docs.unstructured.io/platform/supported-file-types)
`DoclingLoader`	Various file types (see https://ds4sd.github.io/docling/)
`PolarisAIDataInsightLoader`	Various file types (see https://datainsight.polarisoffice.com/documentation?docType=doc_extract)

All document loaders

AgentMail

AgentQLLoader

AirbyteLoader

Apify Dataset

AstraDB

Azure Blob Storage

Box

Browserbase

Copy Paste

Docling

Docugami

Google AlloyDB for PostgreSQL

Google BigQuery

Google Bigtable

Google Cloud SQL for SQL Server

Google Cloud SQL for MySQL

Google Cloud SQL for PostgreSQL

Google Cloud Storage Directory

Google Cloud Storage File

Google Firestore in Datastore Mode

Google Drive

Google El Carro for Oracle Workloads

Google Firestore (Native Mode)

Google Memorystore for Redis

Google Spanner

Google Speech-to-Text

HyperbrowserLoader

Kinetica

LangSmith

Near Blockchain

OpenDataLoader PDF

Oracle Autonomous Database

Oracle AI Database

Outline Document Loader

PaddleOCR-VL

Polaris AI DataInsight

Dell PowerScale

PyMuPDF4LLM

SingleStore

Soniox

UnDatasIO

Unstructured

Upstage

YoutubeLoaderDL
