Document loaders provide a standard interface for reading data from different sources (such as Slack, Notion, or Google Drive) into LangChain’s Document format. This ensures that data can be handled consistently regardless of the source. All document loaders implement the BaseLoader interface.

Interface

Each document loader may define its own parameters, but all loaders share a common API:
  • load() – Loads all documents at once and returns them as a list.
  • lazy_load() – Yields documents one at a time, which is useful for large datasets.

from langchain_docling.loader import DoclingLoader

FILE_PATH = "https://arxiv.org/pdf/2408.09869"

loader = DoclingLoader(file_path=FILE_PATH)

# Load all documents
documents = loader.load()

# For large datasets, lazily load documents
for document in loader.lazy_load():
    print(document)
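
To make the difference between the two methods concrete, here is a dependency-free sketch of the same load() / lazy_load() contract. SimpleDocument and LineLoader are hypothetical stand-ins written for illustration; they are not LangChain classes, and a real custom loader would typically subclass BaseLoader and return LangChain Document objects instead. The sketch assumes one reasonable design: lazy_load() is a generator, and load() simply drains it into a list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document: text plus metadata."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class LineLoader:
    """Hypothetical loader: one document per non-empty line of a text blob."""

    def __init__(self, text: str, source: str = "in-memory") -> None:
        self.text = text
        self.source = source

    def lazy_load(self) -> Iterator[SimpleDocument]:
        # Yield documents one at a time so the caller never has to hold
        # the full result set in memory.
        for line_number, line in enumerate(self.text.splitlines(), start=1):
            if line.strip():
                yield SimpleDocument(
                    page_content=line.strip(),
                    metadata={"source": self.source, "line": line_number},
                )

    def load(self) -> List[SimpleDocument]:
        # Eager variant: drain the lazy iterator into a list.
        return list(self.lazy_load())


loader = LineLoader("first record\n\nsecond record\n")

# Eager: every document is materialized before processing starts.
all_docs = loader.load()

# Lazy: only the first document is produced; the rest are never built.
first_doc = next(iter(loader.lazy_load()))
```

With this shape, a caller that only needs the first few documents, or that processes documents in a streaming fashion, can use lazy_load() and stop early, while load() remains a convenience for small sources.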

By category

Webpages

The following document loaders load webpages.
| Document Loader | Description | Package/API |
| --- | --- | --- |
| Unstructured | Uses Unstructured to load and parse web pages | Package |
| Apify Dataset | Load documents from Apify datasets | API |
| Docling | Uses Docling to load and parse web pages | Package |
| Hyperbrowser | Platform for running and scaling headless browsers, can be used to scrape/crawl any site | API |
| AgentQL | Web interaction and structured data extraction from any web page using an AgentQL query or a Natural Language prompt | API |
| Browserbase | Load webpages using managed headless browsers with stealth mode | API |

PDFs

The following document loaders load PDF documents.
| Document Loader | Description | Package/API |
| --- | --- | --- |
| Unstructured | Uses Unstructured’s open source library to load PDFs | Package |
| Upstage Document Parse Loader | Load PDF files using UpstageDocumentParseLoader | Package |
| Docling | Load PDF files using Docling | Package |
| UnDatasIO | Load PDF files using UnDatasIO | Package |
| OpenDataLoader PDF | Load PDF files using OpenDataLoader PDF | Package |

Cloud providers

The following document loaders load documents from cloud providers.
| Document Loader | Description | Partner Package | API reference |
| --- | --- | --- | --- |
| Google Cloud Storage Directory | Load documents from GCS bucket | | GCSDirectoryLoader |
| Google Cloud Storage File | Load documents from GCS file object | | GCSFileLoader |
| Google Drive | Load documents from Google Drive (Google Docs only) | | GoogleDriveLoader |

Common file types

The following document loaders load data from common file formats.

All document loaders

AgentQLLoader

AirbyteLoader

Apify Dataset

AstraDB

Azure Blob Storage

Box

Browserbase

Copy Paste

Docling

Docugami

Google AlloyDB for PostgreSQL

Google BigQuery

Google Bigtable

Google Cloud SQL for SQL Server

Google Cloud SQL for MySQL

Google Cloud SQL for PostgreSQL

Google Cloud Storage Directory

Google Cloud Storage File

Google Firestore in Datastore Mode

Google Drive

Google El Carro for Oracle Workloads

Google Firestore (Native Mode)

Google Memorystore for Redis

Google Spanner

Google Speech-to-Text

HyperbrowserLoader

Kinetica

LangSmith

Near Blockchain

OpenDataLoader PDF

Oracle Autonomous Database

Oracle AI Database

Outline Document Loader

PaddleOCR-VL

Polaris AI DataInsight

Dell PowerScale

PyMuPDF4LLM

SingleStore

Soniox

UnDatasIO

Unstructured

Upstage

YoutubeLoaderDL