DocumentLoaders load data into the standard LangChain Document format. Each DocumentLoader has its own specific parameters, but all of them are invoked the same way, with the .load method, which returns a list of Document objects. For example:
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    file_path="example.csv",  # <-- integration-specific parameters go here
)
data = loader.load()
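To illustrate the shared interface, here is a minimal, standard-library-only sketch of a CSV loader. It is not LangChain's implementation: the `Document` dataclass is a stand-in for LangChain's `Document` (which likewise carries `page_content` and `metadata`), and `SimpleCSVLoader` is a hypothetical class. It emits one document per row, writing each row as `column: value` lines and recording the source file and row index in the metadata, roughly mirroring what `CSVLoader` returns.

```python
import csv
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Document:
    """Stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SimpleCSVLoader:
    """Hypothetical loader that yields one Document per CSV row."""

    def __init__(self, file_path: str, encoding: str = "utf-8") -> None:
        self.file_path = file_path
        self.encoding = encoding

    def lazy_load(self) -> Iterator[Document]:
        # Stream rows so a large file is never held in memory all at once.
        with open(self.file_path, newline="", encoding=self.encoding) as f:
            for i, row in enumerate(csv.DictReader(f)):
                content = "\n".join(f"{k}: {v}" for k, v in row.items())
                yield Document(content, {"source": self.file_path, "row": i})

    def load(self) -> List[Document]:
        # The uniform entry point: materialize everything lazy_load() yields.
        return list(self.lazy_load())
```

Splitting `lazy_load` from `load` keeps the common `.load()` call while still letting callers stream very large inputs row by row.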

Webpages

The document loaders below load web pages. For a starting point, see the guide How to: load web pages.
Document Loader | Description | Package/API
Web | Uses urllib and BeautifulSoup to load and parse HTML web pages | Package
Unstructured | Uses Unstructured to load and parse web pages | Package
RecursiveURL | Recursively scrapes all child links from a root URL | Package
Sitemap | Scrapes all pages on a given sitemap | Package
Spider | Crawler and scraper that returns LLM-ready data | API
Firecrawl | API service that can be deployed locally | API
Docling | Uses Docling to load and parse web pages | Package
Hyperbrowser | Platform for running and scaling headless browsers; can be used to scrape or crawl any site | API
AgentQL | Web interaction and structured data extraction from any web page using an AgentQL query or a natural language prompt | API
Oxylabs | Web intelligence platform enabling access to various data sources | API
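The Web loader is described as fetching pages with urllib and parsing them with BeautifulSoup. The sketch below follows the same fetch-then-extract shape but swaps BeautifulSoup for the standard library's `html.parser`, so it runs without extra packages. `html_to_document` and `load_url` are hypothetical names, `Document` is a local stand-in, and this is not LangChain's `WebBaseLoader`.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.request import urlopen


@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


class _TextExtractor(HTMLParser):
    """Collects visible text and the <title>, ignoring <script>/<style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.title = ""
        self._skip = 0          # depth inside script/style elements
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip:
            return
        if self._in_title:
            self.title += text
        else:
            self.parts.append(text)


def html_to_document(html: str, source: str) -> Document:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return Document("\n".join(parser.parts), {"source": source, "title": parser.title})


def load_url(url: str) -> Document:
    # Network fetch; charset detection is deliberately minimal.
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return html_to_document(html, url)
```

Keeping parsing separate from fetching makes the extraction step testable on static HTML without network access.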

PDFs

The document loaders below load PDF documents. For a starting point, see the guide How to: load PDF files.
Document Loader | Description | Package/API
PyPDF | Uses pypdf to load and parse PDFs | Package
Unstructured | Uses Unstructured’s open-source library to load PDFs | Package
Amazon Textract | Uses the AWS API to load PDFs | API
MathPix | Uses MathPix to load PDFs | Package
PDFPlumber | Load PDF files using PDFPlumber | Package
PyPDFDirectory | Load a directory of PDF files | Package
PyPDFium2 | Load PDF files using PyPDFium2 | Package
PyMuPDF | Load PDF files using PyMuPDF | Package
PyMuPDF4LLM | Convert PDF content to Markdown using PyMuPDF4LLM | Package
PDFMiner | Load PDF files using PDFMiner | Package
Upstage Document Parse Loader | Load PDF files using UpstageDocumentParseLoader | Package
Docling | Load PDF files using Docling | Package
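A directory-level PDF loader such as PyPDFDirectory combines two steps: discovering PDF files, then turning each one into per-page documents, roughly mirroring the per-page output that loaders such as PyPDF produce by default. The sketch below assumes a pluggable, hypothetical `extract_pages` callable so it runs without a PDF library; `find_pdfs`, `load_pdf_dir`, and the `Document` stand-in are illustrative names, not LangChain APIs.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Iterator, List


@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


def find_pdfs(directory: str, recursive: bool = False) -> List[Path]:
    """PDF paths in a stable order; the extension match is case-insensitive."""
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def load_pdf_dir(
    directory: str,
    extract_pages: Callable[[Path], List[str]],
    recursive: bool = False,
) -> Iterator[Document]:
    """Yield one Document per page of every PDF found.

    `extract_pages` is a hypothetical hook. With pypdf installed it could be
    lambda p: [page.extract_text() for page in PdfReader(p).pages]
    """
    for path in find_pdfs(directory, recursive):
        for i, text in enumerate(extract_pages(path)):
            yield Document(text, {"source": str(path), "page": i})
```

Sorting the discovered paths keeps the output order reproducible across runs and file systems.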

Cloud Providers

The document loaders below load documents from common cloud storage providers.
Document Loader | Description | Partner Package | API reference
AWS S3 Directory | Load documents from an AWS S3 directory |  | S3DirectoryLoader
AWS S3 File | Load documents from an AWS S3 file |  | S3FileLoader
Azure AI Data | Load documents from Azure AI services |  | AzureAIDataLoader
Azure Blob Storage Container | Load documents from an Azure Blob Storage container |  | AzureBlobStorageContainerLoader
Azure Blob Storage File | Load documents from an Azure Blob Storage file |  | AzureBlobStorageFileLoader
Dropbox | Load documents from Dropbox |  | DropboxLoader
Google Cloud Storage Directory | Load documents from a GCS bucket |  | GCSDirectoryLoader
Google Cloud Storage File | Load documents from a GCS file object |  | GCSFileLoader
Google Drive | Load documents from Google Drive (Google Docs only) |  | GoogleDriveLoader
Huawei OBS Directory | Load documents from a Huawei Object Storage Service directory |  | OBSDirectoryLoader
Huawei OBS File | Load documents from a Huawei Object Storage Service file |  | OBSFileLoader
Microsoft OneDrive | Load documents from Microsoft OneDrive |  | OneDriveLoader
Microsoft SharePoint | Load documents from Microsoft SharePoint |  | SharePointLoader
Tencent COS Directory | Load documents from a Tencent Cloud Object Storage directory |  | TencentCOSDirectoryLoader
Tencent COS File | Load documents from a Tencent Cloud Object Storage file |  | TencentCOSFileLoader

Social Platforms

The document loaders below load documents from social media platforms.
Document Loader | API reference
Twitter | TwitterTweetLoader
Reddit | RedditPostsLoader

Messaging Services

The document loaders below load data from messaging platforms.
Document Loader | API reference
Telegram | TelegramChatFileLoader
WhatsApp | WhatsAppChatLoader
Discord | DiscordChatLoader
Facebook Chat | FacebookChatLoader
Mastodon | MastodonTootsLoader
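Chat loaders such as the WhatsApp one typically start from a text file exported by the app. The hypothetical `parse_chat` sketch below assumes export lines shaped like `1/23/23, 3:19 PM - Alice: Hello`; real exports vary by platform and locale, so treat the regular expression as an assumption to adapt. It skips sender-less system notices, folds continuation lines into the previous message, and flattens the messages into a single transcript string (a format chosen here for illustration).

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed line shape: "M/D/YY, H:MM AM - Sender: text". Adjust per export.
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), "
    r"(?P<time>\d{1,2}:\d{2}(?:\s?[AP]M)?) - "
    r"(?P<sender>[^:]+): (?P<text>.*)$"
)
# A timestamp with no "Sender:" part is treated as a system notice.
HEADER_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}")


@dataclass
class Message:
    timestamp: str
    sender: str
    text: str


def parse_chat(export: str) -> List[Message]:
    messages: List[Message] = []
    for line in export.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = f"{m['date']}, {m['time']}"
            messages.append(Message(ts, m["sender"], m["text"]))
        elif HEADER_RE.match(line):
            continue  # system notice such as an encryption banner
        elif messages and line.strip():
            messages[-1].text += "\n" + line  # continuation of a multi-line message
    return messages


def to_page_content(messages: List[Message]) -> str:
    # One transcript string per chat, one "sender on timestamp: text" entry per message.
    return "\n\n".join(f"{m.sender} on {m.timestamp}: {m.text}" for m in messages)
```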

Productivity tools

The document loaders below load data from commonly used productivity tools.
Document Loader | API reference
Figma | FigmaFileLoader
Notion | NotionDirectoryLoader
Slack | SlackDirectoryLoader
Quip | QuipLoader
Trello | TrelloLoader
Roam | RoamLoader
GitHub | GithubFileLoader

Common File Types

The document loaders below load data from common file formats.
Document Loader | Data Type
CSVLoader | CSV files
DirectoryLoader | All files in a given directory
Unstructured | Many file types (see https://docs.unstructured.io/platform/supported-file-types)
JSONLoader | JSON files
BSHTMLLoader | HTML files
DoclingLoader | Various file types (see https://ds4sd.github.io/docling/)
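LangChain's DirectoryLoader is configured with a glob pattern and a loader class that is applied to each matching file. The hypothetical `load_directory` sketch below varies that idea: it routes each matching file to a loader chosen by extension and skips files with no registered loader. `load_text`, `load_directory`, and the `Document` stand-in are illustrative names, not LangChain APIs.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, Iterator, List


@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


Loader = Callable[[Path], List[Document]]


def load_text(path: Path) -> List[Document]:
    # Simplest per-file loader: the whole file becomes one Document.
    return [Document(path.read_text(encoding="utf-8"), {"source": str(path)})]


def load_directory(
    directory: str,
    loaders: Dict[str, Loader],
    glob: str = "**/*",
) -> Iterator[Document]:
    """Route each matching file to the loader registered for its suffix.

    Directories and files whose suffix has no registered loader are skipped.
    """
    for path in sorted(Path(directory).glob(glob)):
        loader = loaders.get(path.suffix.lower())
        if path.is_file() and loader is not None:
            yield from loader(path)
```

Routing by suffix lets one pass over a mixed folder use a different parser per file type, whereas a single loader class assumes the glob already selects files of one kind.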

All document loaders

acreom

AgentQLLoader

AirbyteLoader

Airtable

Alibaba Cloud MaxCompute

Amazon Textract

Apify Dataset

ArcGIS

ArxivLoader

AssemblyAI Audio Transcripts

AstraDB

Async Chromium

AsyncHtml

Athena

AWS S3 Directory

AWS S3 File

AZLyrics

Azure AI Data

Azure Blob Storage Container

Azure Blob Storage File

Azure AI Document Intelligence

BibTeX

BiliBili

Blackboard

Blockchain

Box

Brave Search

Browserbase

Browserless

BSHTMLLoader

Cassandra

ChatGPT Data

College Confidential

Concurrent Loader

Confluence

CoNLL-U

Copy Paste

Couchbase

CSV

Cube Semantic Layer

Datadog Logs

Dedoc

Diffbot

Discord

Docling

Docugami

Docusaurus

Dropbox

DuckDB

Email

EPub

Etherscan

EverNote

Facebook Chat

Fauna

Figma

FireCrawl

Geopandas

Git

GitBook

GitHub

Glue Catalog

Google AlloyDB for PostgreSQL

Google BigQuery

Google Bigtable

Google Cloud SQL for SQL Server

Google Cloud SQL for MySQL

Google Cloud SQL for PostgreSQL

Google Cloud Storage Directory

Google Cloud Storage File

Google Firestore in Datastore Mode

Google Drive

Google El Carro for Oracle Workloads

Google Firestore (Native Mode)

Google Memorystore for Redis

Google Spanner

Google Speech-to-Text

Grobid

Gutenberg

Hacker News

Huawei OBS Directory

Huawei OBS File

HuggingFace Dataset

HyperbrowserLoader

iFixit

Images

Image Captions

IMSDb

Iugu

Joplin

JSONLoader

Jupyter Notebook

Kinetica

lakeFS

LangSmith

LarkSuite (FeiShu)

LLM Sherpa

Mastodon

MathPixPDFLoader

MediaWiki Dump

Merge Documents Loader

MHTML

Microsoft Excel

Microsoft OneDrive

Microsoft OneNote

Microsoft PowerPoint

Microsoft SharePoint

Microsoft Word

Near Blockchain

Modern Treasury

MongoDB

Needle Document Loader

News URL

Notion DB

Nuclia

Obsidian

Open Document Format (ODT)

Open City Data

Oracle Autonomous Database

Oracle AI Vector Search

Org-mode

Outline Document Loader

Oxylabs

Pandas DataFrame

PDFMinerLoader

PDFPlumber

Pebblo Safe DocumentLoader

Polars DataFrame

Dell PowerScale

Psychic

PubMed

PullMdLoader

PyMuPDFLoader

PyMuPDF4LLM

PyPDFDirectoryLoader

PyPDFium2Loader

PyPDFLoader

PySpark

Quip

ReadTheDocs Documentation

Recursive URL

Reddit

Roam

Rockset

rspace

RSS Feeds

RST

scrapfly

ScrapingAnt

SingleStore

Sitemap

Slack

Snowflake

Source Code

Spider

Spreedly

Stripe

Subtitle

SurrealDB

Telegram

Tencent COS Directory

Tencent COS File

TensorFlow Datasets

TiDB

2Markdown

TOML

Trello

TSV

Twitter

Unstructured

UnstructuredMarkdownLoader

UnstructuredPDFLoader

Upstage

URL

Vsdx

Weather

WebBaseLoader

WhatsApp Chat

Wikipedia

UnstructuredXMLLoader

Xorbits Pandas DataFrame

YouTube Audio

YouTube Transcripts

YoutubeLoaderDL

Yuque

ZeroxPDFLoader