# Document loader integrations

> Integrate with document loaders using LangChain Python.

Document loaders provide a **standard interface** for reading data from different sources (such as Slack, Notion, or Google Drive) into LangChain’s [Document](https://reference.langchain.com/python/langchain-core/documents/base/Document) format.
This ensures that data can be handled consistently regardless of the source.

All document loaders implement the [`BaseLoader`](https://reference.langchain.com/python/langchain-core/document_loaders/base/BaseLoader) interface.

## Interface

Each document loader may define its own parameters, but they share a common API:

* `load()` – Loads all documents at once.
* `lazy_load()` – Streams documents lazily, useful for large datasets.

```python
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    ...  # Integration-specific parameters here
)

# Load all documents
documents = loader.load()

# For large datasets, lazily load documents
for document in loader.lazy_load():
    print(document)
```
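
To make the relationship between the two methods concrete, here is a minimal sketch that uses only the standard library. `SketchDocument` and `LineLoader` are hypothetical stand-ins, not LangChain classes. They only mirror the shape of `Document` (`page_content` plus `metadata`) and assume that `load()` simply collects whatever `lazy_load()` yields, which is how `BaseLoader` behaves by default.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class SketchDocument:
    """Stand-in for LangChain's Document: text plus a metadata dict."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class LineLoader:
    """Hypothetical loader that yields one document per non-empty line."""

    def __init__(self, text: str, source: str = "inline") -> None:
        self.text = text
        self.source = source

    def lazy_load(self) -> Iterator[SketchDocument]:
        # Yield documents one at a time so the caller never has to hold
        # the full collection in memory.
        for line_number, line in enumerate(self.text.splitlines(), start=1):
            if line.strip():
                yield SketchDocument(
                    page_content=line.strip(),
                    metadata={"source": self.source, "line": line_number},
                )

    def load(self) -> List[SketchDocument]:
        # Eager loading is the lazy stream collected into a list.
        return list(self.lazy_load())


loader = LineLoader("alpha\n\nbeta\ngamma")

# Eager: all documents at once.
documents = loader.load()

# Lazy: pull only the first document from the stream.
first = next(iter(loader.lazy_load()))
```

In a real integration you would typically subclass `BaseLoader` and yield `Document` objects from `lazy_load()` instead of using these stand-ins.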

## By category

### Webpages

The following document loaders allow you to load webpages.

| Document Loader                                                             | Description                                                                                                          | Package/API |
| --------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- | ----------- |
| [Web](/oss/python/integrations/document_loaders/web_base)                   | Uses urllib and BeautifulSoup to load and parse HTML web pages                                                       | Package     |
| [Unstructured](/oss/python/integrations/document_loaders/unstructured_file) | Uses Unstructured to load and parse web pages                                                                        | Package     |
| [RecursiveURL](/oss/python/integrations/document_loaders/recursive_url)     | Recursively scrapes all child links from a root URL                                                                  | Package     |
| [Sitemap](/oss/python/integrations/document_loaders/sitemap)                | Scrapes all pages on a given sitemap                                                                                 | Package     |
| [Spider](/oss/python/integrations/document_loaders/spider)                  | Crawler and scraper that returns LLM-ready data                                                                      | API         |
| [Firecrawl](/oss/python/integrations/document_loaders/firecrawl)            | API service that can be deployed locally                                                                             | API         |
| [Apify Dataset](/oss/python/integrations/document_loaders/apify_dataset)    | Load documents from Apify datasets                                                                                   | API         |
| [Docling](/oss/python/integrations/document_loaders/docling)                | Uses Docling to load and parse web pages                                                                             | Package     |
| [Hyperbrowser](/oss/python/integrations/document_loaders/hyperbrowser)      | Platform for running and scaling headless browsers; can be used to scrape or crawl any site                          | API         |
| [AgentQL](/oss/python/integrations/document_loaders/agentql)                | Web interaction and structured data extraction from any web page using an AgentQL query or a natural language prompt | API         |

### PDFs

The following document loaders allow you to load PDF documents.

| Document Loader                                                                    | Description                                          | Package/API |
| ---------------------------------------------------------------------------------- | ---------------------------------------------------- | ----------- |
| [PyPDF](/oss/python/integrations/document_loaders/pypdfloader)                     | Uses `pypdf` to load and parse PDFs                  | Package     |
| [Unstructured](/oss/python/integrations/document_loaders/unstructured_file)        | Uses Unstructured's open source library to load PDFs | Package     |
| [Amazon Textract](/oss/python/integrations/document_loaders/amazon_textract)       | Uses AWS API to load PDFs                            | API         |
| [MathPix](/oss/python/integrations/document_loaders/mathpix)                       | Uses MathPix to load PDFs                            | Package     |
| [PDFPlumber](/oss/python/integrations/document_loaders/pdfplumber)                 | Load PDF files using PDFPlumber                      | Package     |
| [PyPDFDirectory](/oss/python/integrations/document_loaders/pypdfdirectory)         | Load a directory with PDF files                      | Package     |
| [PyPDFium2](/oss/python/integrations/document_loaders/pypdfium2)                   | Load PDF files using PyPDFium2                       | Package     |
| [PyMuPDF](/oss/python/integrations/document_loaders/pymupdf)                       | Load PDF files using PyMuPDF                         | Package     |
| [PyMuPDF4LLM](/oss/python/integrations/document_loaders/pymupdf4llm)               | Load PDF content to Markdown using PyMuPDF4LLM       | Package     |
| [PDFMiner](/oss/python/integrations/document_loaders/pdfminer)                     | Load PDF files using PDFMiner                        | Package     |
| [Upstage Document Parse Loader](/oss/python/integrations/document_loaders/upstage) | Load PDF files using UpstageDocumentParseLoader      | Package     |
| [Docling](/oss/python/integrations/document_loaders/docling)                       | Load PDF files using Docling                         | Package     |
| [UnDatasIO](/oss/python/integrations/document_loaders/undatasio)                   | Load PDF files using UnDatasIO                       | Package     |
| [OpenDataLoader PDF](/oss/python/integrations/document_loaders/opendataloader_pdf) | Load PDF files using OpenDataLoader PDF              | Package     |

### Cloud providers

The following document loaders allow you to load documents from your favorite cloud providers.

| Document Loader                                                                                            | Description                                                 | Partner Package | API reference                                                                                                                                              |
| ---------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------- | --------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [AWS S3 Directory](/oss/python/integrations/document_loaders/aws_s3_directory)                             | Load documents from an AWS S3 directory                     | ❌               | [`S3DirectoryLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/s3_directory/S3DirectoryLoader)                          |
| [AWS S3 File](/oss/python/integrations/document_loaders/aws_s3_file)                                       | Load documents from an AWS S3 file                          | ❌               | [`S3FileLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/s3_file/S3FileLoader)                                         |
| [Azure AI Data](/oss/python/integrations/document_loaders/azure_ai_data)                                   | Load documents from Azure AI services                       | ❌               | [`AzureAIDataLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/azure_ai_data/AzureAIDataLoader)                         |
| [Azure Blob Storage](/oss/python/integrations/document_loaders/azure_blob_storage)                         | Load documents from Azure Blob Storage                      | ✅               | [`AzureBlobStorageLoader`](https://reference.langchain.com/python/integrations/langchain_azure/storage/)                                                   |
| [Dropbox](/oss/python/integrations/document_loaders/dropbox)                                               | Load documents from Dropbox                                 | ❌               | [`DropboxLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/dropbox/DropboxLoader)                                       |
| [Google Cloud Storage Directory](/oss/python/integrations/document_loaders/google_cloud_storage_directory) | Load documents from a GCS bucket                            | ✅               | [`GCSDirectoryLoader`](https://reference.langchain.com/python/langchain-google-community/gcs_directory/GCSDirectoryLoader)                                 |
| [Google Cloud Storage File](/oss/python/integrations/document_loaders/google_cloud_storage_file)           | Load documents from a GCS file object                       | ✅               | [`GCSFileLoader`](https://reference.langchain.com/python/langchain-google-community/gcs_file/GCSFileLoader)                                                |
| [Google Drive](/oss/python/integrations/document_loaders/google_drive)                                     | Load documents from Google Drive (Google Docs only)         | ✅               | [`GoogleDriveLoader`](https://reference.langchain.com/python/langchain-google-community/drive/GoogleDriveLoader)                                           |
| [Huawei OBS Directory](/oss/python/integrations/document_loaders/huawei_obs_directory)                     | Load documents from Huawei Object Storage Service Directory | ❌               | [`OBSDirectoryLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/obs_directory/OBSDirectoryLoader)                       |
| [Huawei OBS File](/oss/python/integrations/document_loaders/huawei_obs_file)                               | Load documents from Huawei Object Storage Service File      | ❌               | [`OBSFileLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/obs_file/OBSFileLoader)                                      |
| [Microsoft OneDrive](/oss/python/integrations/document_loaders/microsoft_onedrive)                         | Load documents from Microsoft OneDrive                      | ❌               | [`OneDriveLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/onedrive/OneDriveLoader)                                    |
| [Microsoft SharePoint](/oss/python/integrations/document_loaders/microsoft_sharepoint)                     | Load documents from Microsoft SharePoint                    | ❌               | [`SharePointLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/sharepoint/SharePointLoader)                              |
| [Tencent COS Directory](/oss/python/integrations/document_loaders/tencent_cos_directory)                   | Load documents from Tencent Cloud Object Storage Directory  | ❌               | [`TencentCOSDirectoryLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/tencent_cos_directory/TencentCOSDirectoryLoader) |
| [Tencent COS File](/oss/python/integrations/document_loaders/tencent_cos_file)                             | Load documents from Tencent Cloud Object Storage File       | ❌               | [`TencentCOSFileLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/tencent_cos_file/TencentCOSFileLoader)                |

### Social platforms

The following document loaders allow you to load documents from different social media platforms.

| Document Loader                                              | API reference                                                                                                                  |
| ------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------ |
| [Twitter](/oss/python/integrations/document_loaders/twitter) | [`TwitterTweetLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/twitter/TwitterTweetLoader) |
| [Reddit](/oss/python/integrations/document_loaders/reddit)   | [`RedditPostsLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/reddit/RedditPostsLoader)    |

### Messaging services

The following document loaders allow you to load data from different messaging platforms.

| Document Loader                                                          | API reference                                                                                                                           |
| ------------------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------- |
| [Telegram](/oss/python/integrations/document_loaders/telegram)           | [`TelegramChatFileLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/telegram/TelegramChatFileLoader) |
| [WhatsApp](/oss/python/integrations/document_loaders/whatsapp_chat)      | [`WhatsAppChatLoader`](https://reference.langchain.com/python/langchain-community/chat_loaders/whatsapp/WhatsAppChatLoader)             |
| [Discord](/oss/python/integrations/document_loaders/discord)             | [`DiscordChatLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/discord/DiscordChatLoader)            |
| [Facebook Chat](/oss/python/integrations/document_loaders/facebook_chat) | [`FacebookChatLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/facebook_chat/FacebookChatLoader)    |
| [Mastodon](/oss/python/integrations/document_loaders/mastodon)           | [`MastodonTootsLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/mastodon/MastodonTootsLoader)       |

### Productivity tools

The following document loaders allow you to load data from commonly used productivity tools.

| Document Loader                                            | API reference                                                                                                                              |
| ---------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| [Figma](/oss/python/integrations/document_loaders/figma)   | [`FigmaFileLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/figma/FigmaFileLoader)                     |
| [Notion](/oss/python/integrations/document_loaders/notion) | [`NotionDirectoryLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/notion/NotionDirectoryLoader)        |
| [Slack](/oss/python/integrations/document_loaders/slack)   | [`SlackDirectoryLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/slack_directory/SlackDirectoryLoader) |
| [Quip](/oss/python/integrations/document_loaders/quip)     | [`QuipLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/quip/QuipLoader)                                |
| [Trello](/oss/python/integrations/document_loaders/trello) | [`TrelloLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/trello/TrelloLoader)                          |
| [Roam](/oss/python/integrations/document_loaders/roam)     | [`RoamLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/roam/RoamLoader)                                |
| [GitHub](/oss/python/integrations/document_loaders/github) | [`GithubFileLoader`](https://reference.langchain.com/python/langchain-community/document_loaders/github/GithubFileLoader)                  |

### Common file types

The following document loaders allow you to load data from common data formats.

| Document Loader                                                                                  | Data Type                                                                                                                                                                    |
| ------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`CSVLoader`](/oss/python/integrations/document_loaders/csv)                                     | CSV files                                                                                                                                                                    |
| [`Unstructured`](/oss/python/integrations/document_loaders/unstructured_file)                    | Many file types (see [https://docs.unstructured.io/platform/supported-file-types](https://docs.unstructured.io/platform/supported-file-types))                               |
| [`JSONLoader`](/oss/python/integrations/document_loaders/json)                                   | JSON files                                                                                                                                                                   |
| [`BSHTMLLoader`](/oss/python/integrations/document_loaders/bshtml)                               | HTML files                                                                                                                                                                   |
| [`DoclingLoader`](/oss/python/integrations/document_loaders/docling)                             | Various file types (see [https://ds4sd.github.io/docling/](https://ds4sd.github.io/docling/))                                                                                |
| [`PolarisAIDataInsightLoader`](/oss/python/integrations/document_loaders/polaris_ai_datainsight) | Various file types (see [https://datainsight.polarisoffice.com/documentation?docType=doc\_extract](https://datainsight.polarisoffice.com/documentation?docType=doc_extract)) |

## All document loaders

<Columns cols={3}>
  <Card title="acreom" icon="link" href="/oss/python/integrations/document_loaders/acreom" arrow="true" cta="View guide" />

  <Card title="AgentQLLoader" icon="link" href="/oss/python/integrations/document_loaders/agentql" arrow="true" cta="View guide" />

  <Card title="AirbyteLoader" icon="link" href="/oss/python/integrations/document_loaders/airbyte" arrow="true" cta="View guide" />

  <Card title="Airtable" icon="link" href="/oss/python/integrations/document_loaders/airtable" arrow="true" cta="View guide" />

  <Card title="Alibaba Cloud MaxCompute" icon="link" href="/oss/python/integrations/document_loaders/alibaba_cloud_maxcompute" arrow="true" cta="View guide" />

  <Card title="Amazon Textract" icon="link" href="/oss/python/integrations/document_loaders/amazon_textract" arrow="true" cta="View guide" />

  <Card title="Apify Dataset" icon="link" href="/oss/python/integrations/document_loaders/apify_dataset" arrow="true" cta="View guide" />

  <Card title="ArxivLoader" icon="link" href="/oss/python/integrations/document_loaders/arxiv" arrow="true" cta="View guide" />

  <Card title="AssemblyAI Audio Transcripts" icon="link" href="/oss/python/integrations/document_loaders/assemblyai" arrow="true" cta="View guide" />

  <Card title="AstraDB" icon="link" href="/oss/python/integrations/document_loaders/astradb" arrow="true" cta="View guide" />

  <Card title="Async Chromium" icon="link" href="/oss/python/integrations/document_loaders/async_chromium" arrow="true" cta="View guide" />

  <Card title="AsyncHtml" icon="link" href="/oss/python/integrations/document_loaders/async_html" arrow="true" cta="View guide" />

  <Card title="Athena" icon="link" href="/oss/python/integrations/document_loaders/athena" arrow="true" cta="View guide" />

  <Card title="AWS S3 Directory" icon="link" href="/oss/python/integrations/document_loaders/aws_s3_directory" arrow="true" cta="View guide" />

  <Card title="AWS S3 File" icon="link" href="/oss/python/integrations/document_loaders/aws_s3_file" arrow="true" cta="View guide" />

  <Card title="AZLyrics" icon="link" href="/oss/python/integrations/document_loaders/azlyrics" arrow="true" cta="View guide" />

  <Card title="Azure AI Data" icon="link" href="/oss/python/integrations/document_loaders/azure_ai_data" arrow="true" cta="View guide" />

  <Card title="Azure Blob Storage" icon="link" href="/oss/python/integrations/document_loaders/azure_blob_storage" arrow="true" cta="View guide" />

  <Card title="Azure AI Document Intelligence" icon="link" href="/oss/python/integrations/document_loaders/azure_document_intelligence" arrow="true" cta="View guide" />

  <Card title="BibTeX" icon="link" href="/oss/python/integrations/document_loaders/bibtex" arrow="true" cta="View guide" />

  <Card title="BiliBili" icon="link" href="/oss/python/integrations/document_loaders/bilibili" arrow="true" cta="View guide" />

  <Card title="Blackboard" icon="link" href="/oss/python/integrations/document_loaders/blackboard" arrow="true" cta="View guide" />

  <Card title="Blockchain" icon="link" href="/oss/python/integrations/document_loaders/blockchain" arrow="true" cta="View guide" />

  <Card title="Box" icon="link" href="/oss/python/integrations/document_loaders/box" arrow="true" cta="View guide" />

  <Card title="Brave Search" icon="link" href="/oss/python/integrations/document_loaders/brave_search" arrow="true" cta="View guide" />

  <Card title="Browserbase" icon="link" href="/oss/python/integrations/document_loaders/browserbase" arrow="true" cta="View guide" />

  <Card title="Browserless" icon="link" href="/oss/python/integrations/document_loaders/browserless" arrow="true" cta="View guide" />

  <Card title="BSHTMLLoader" icon="link" href="/oss/python/integrations/document_loaders/bshtml" arrow="true" cta="View guide" />

  <Card title="Cassandra" icon="link" href="/oss/python/integrations/document_loaders/cassandra" arrow="true" cta="View guide" />

  <Card title="ChatGPT Data" icon="link" href="/oss/python/integrations/document_loaders/chatgpt_loader" arrow="true" cta="View guide" />

  <Card title="College Confidential" icon="link" href="/oss/python/integrations/document_loaders/college_confidential" arrow="true" cta="View guide" />

  <Card title="Concurrent Loader" icon="link" href="/oss/python/integrations/document_loaders/concurrent" arrow="true" cta="View guide" />

  <Card title="Confluence" icon="link" href="/oss/python/integrations/document_loaders/confluence" arrow="true" cta="View guide" />

  <Card title="CoNLL-U" icon="link" href="/oss/python/integrations/document_loaders/conll-u" arrow="true" cta="View guide" />

  <Card title="Copy Paste" icon="link" href="/oss/python/integrations/document_loaders/copypaste" arrow="true" cta="View guide" />

  <Card title="Couchbase" icon="link" href="/oss/python/integrations/document_loaders/couchbase" arrow="true" cta="View guide" />

  <Card title="CSV" icon="link" href="/oss/python/integrations/document_loaders/csv" arrow="true" cta="View guide" />

  <Card title="Cube Semantic Layer" icon="link" href="/oss/python/integrations/document_loaders/cube_semantic" arrow="true" cta="View guide" />

  <Card title="Datadog Logs" icon="link" href="/oss/python/integrations/document_loaders/datadog_logs" arrow="true" cta="View guide" />

  <Card title="Dedoc" icon="link" href="/oss/python/integrations/document_loaders/dedoc" arrow="true" cta="View guide" />

  <Card title="Diffbot" icon="link" href="/oss/python/integrations/document_loaders/diffbot" arrow="true" cta="View guide" />

  <Card title="Discord" icon="link" href="/oss/python/integrations/document_loaders/discord" arrow="true" cta="View guide" />

  <Card title="Docling" icon="link" href="/oss/python/integrations/document_loaders/docling" arrow="true" cta="View guide" />

  <Card title="Docugami" icon="link" href="/oss/python/integrations/document_loaders/docugami" arrow="true" cta="View guide" />

  <Card title="Docusaurus" icon="link" href="/oss/python/integrations/document_loaders/docusaurus" arrow="true" cta="View guide" />

  <Card title="Dropbox" icon="link" href="/oss/python/integrations/document_loaders/dropbox" arrow="true" cta="View guide" />

  <Card title="Email" icon="link" href="/oss/python/integrations/document_loaders/email" arrow="true" cta="View guide" />

  <Card title="EPub" icon="link" href="/oss/python/integrations/document_loaders/epub" arrow="true" cta="View guide" />

  <Card title="Etherscan" icon="link" href="/oss/python/integrations/document_loaders/etherscan" arrow="true" cta="View guide" />

  <Card title="EverNote" icon="link" href="/oss/python/integrations/document_loaders/evernote" arrow="true" cta="View guide" />

  <Card title="Facebook Chat" icon="link" href="/oss/python/integrations/document_loaders/facebook_chat" arrow="true" cta="View guide" />

  <Card title="Fauna" icon="link" href="/oss/python/integrations/document_loaders/fauna" arrow="true" cta="View guide" />

  <Card title="Figma" icon="link" href="/oss/python/integrations/document_loaders/figma" arrow="true" cta="View guide" />

  <Card title="Firecrawl" icon="link" href="/oss/python/integrations/document_loaders/firecrawl" arrow="true" cta="View guide" />

  <Card title="Geopandas" icon="link" href="/oss/python/integrations/document_loaders/geopandas" arrow="true" cta="View guide" />

  <Card title="Git" icon="link" href="/oss/python/integrations/document_loaders/git" arrow="true" cta="View guide" />

  <Card title="GitBook" icon="link" href="/oss/python/integrations/document_loaders/gitbook" arrow="true" cta="View guide" />

  <Card title="GitHub" icon="link" href="/oss/python/integrations/document_loaders/github" arrow="true" cta="View guide" />

  <Card title="Glue Catalog" icon="link" href="/oss/python/integrations/document_loaders/glue_catalog" arrow="true" cta="View guide" />

  <Card title="Google AlloyDB for PostgreSQL" icon="link" href="/oss/python/integrations/document_loaders/google_alloydb" arrow="true" cta="View guide" />

  <Card title="Google BigQuery" icon="link" href="/oss/python/integrations/document_loaders/google_bigquery" arrow="true" cta="View guide" />

  <Card title="Google Bigtable" icon="link" href="/oss/python/integrations/document_loaders/google_bigtable" arrow="true" cta="View guide" />

  <Card title="Google Cloud SQL for SQL Server" icon="link" href="/oss/python/integrations/document_loaders/google_cloud_sql_mssql" arrow="true" cta="View guide" />

  <Card title="Google Cloud SQL for MySQL" icon="link" href="/oss/python/integrations/document_loaders/google_cloud_sql_mysql" arrow="true" cta="View guide" />

  <Card title="Google Cloud SQL for PostgreSQL" icon="link" href="/oss/python/integrations/document_loaders/google_cloud_sql_pg" arrow="true" cta="View guide" />

  <Card title="Google Cloud Storage Directory" icon="link" href="/oss/python/integrations/document_loaders/google_cloud_storage_directory" arrow="true" cta="View guide" />

  <Card title="Google Cloud Storage File" icon="link" href="/oss/python/integrations/document_loaders/google_cloud_storage_file" arrow="true" cta="View guide" />

  <Card title="Google Firestore in Datastore Mode" icon="link" href="/oss/python/integrations/document_loaders/google_datastore" arrow="true" cta="View guide" />

  <Card title="Google Drive" icon="link" href="/oss/python/integrations/document_loaders/google_drive" arrow="true" cta="View guide" />

  <Card title="Google El Carro for Oracle Workloads" icon="link" href="/oss/python/integrations/document_loaders/google_el_carro" arrow="true" cta="View guide" />

  <Card title="Google Firestore (Native Mode)" icon="link" href="/oss/python/integrations/document_loaders/google_firestore" arrow="true" cta="View guide" />

  <Card title="Google Memorystore for Redis" icon="link" href="/oss/python/integrations/document_loaders/google_memorystore_redis" arrow="true" cta="View guide" />

  <Card title="Google Spanner" icon="link" href="/oss/python/integrations/document_loaders/google_spanner" arrow="true" cta="View guide" />

  <Card title="Google Speech-to-Text" icon="link" href="/oss/python/integrations/document_loaders/google_speech_to_text" arrow="true" cta="View guide" />

  <Card title="Grobid" icon="link" href="/oss/python/integrations/document_loaders/grobid" arrow="true" cta="View guide" />

  <Card title="Gutenberg" icon="link" href="/oss/python/integrations/document_loaders/gutenberg" arrow="true" cta="View guide" />

  <Card title="Hacker News" icon="link" href="/oss/python/integrations/document_loaders/hacker_news" arrow="true" cta="View guide" />

  <Card title="Huawei OBS Directory" icon="link" href="/oss/python/integrations/document_loaders/huawei_obs_directory" arrow="true" cta="View guide" />

  <Card title="Huawei OBS File" icon="link" href="/oss/python/integrations/document_loaders/huawei_obs_file" arrow="true" cta="View guide" />

  <Card title="HuggingFace Dataset" icon="link" href="/oss/python/integrations/document_loaders/hugging_face_dataset" arrow="true" cta="View guide" />

  <Card title="HyperbrowserLoader" icon="link" href="/oss/python/integrations/document_loaders/hyperbrowser" arrow="true" cta="View guide" />

  <Card title="iFixit" icon="link" href="/oss/python/integrations/document_loaders/ifixit" arrow="true" cta="View guide" />

  <Card title="Images" icon="link" href="/oss/python/integrations/document_loaders/image" arrow="true" cta="View guide" />

  <Card title="Image Captions" icon="link" href="/oss/python/integrations/document_loaders/image_captions" arrow="true" cta="View guide" />

  <Card title="IMSDb" icon="link" href="/oss/python/integrations/document_loaders/imsdb" arrow="true" cta="View guide" />

  <Card title="Iugu" icon="link" href="/oss/python/integrations/document_loaders/iugu" arrow="true" cta="View guide" />

  <Card title="Joplin" icon="link" href="/oss/python/integrations/document_loaders/joplin" arrow="true" cta="View guide" />

  <Card title="JSONLoader" icon="link" href="/oss/python/integrations/document_loaders/json" arrow="true" cta="View guide" />

  <Card title="Jupyter Notebook" icon="link" href="/oss/python/integrations/document_loaders/jupyter_notebook" arrow="true" cta="View guide" />

  <Card title="Kinetica" icon="link" href="/oss/python/integrations/document_loaders/kinetica" arrow="true" cta="View guide" />

  <Card title="lakeFS" icon="link" href="/oss/python/integrations/document_loaders/lakefs" arrow="true" cta="View guide" />

  <Card title="LangSmith" icon="link" href="/oss/python/integrations/document_loaders/langsmith" arrow="true" cta="View guide" />

  <Card title="LarkSuite (FeiShu)" icon="link" href="/oss/python/integrations/document_loaders/larksuite" arrow="true" cta="View guide" />

  <Card title="LLM Sherpa" icon="link" href="/oss/python/integrations/document_loaders/llmsherpa" arrow="true" cta="View guide" />

  <Card title="Mastodon" icon="link" href="/oss/python/integrations/document_loaders/mastodon" arrow="true" cta="View guide" />

  <Card title="MathPixPDFLoader" icon="link" href="/oss/python/integrations/document_loaders/mathpix" arrow="true" cta="View guide" />

  <Card title="MediaWiki Dump" icon="link" href="/oss/python/integrations/document_loaders/mediawikidump" arrow="true" cta="View guide" />

  <Card title="Merge Documents Loader" icon="link" href="/oss/python/integrations/document_loaders/merge_doc" arrow="true" cta="View guide" />

  <Card title="MHTML" icon="link" href="/oss/python/integrations/document_loaders/mhtml" arrow="true" cta="View guide" />

  <Card title="Microsoft Excel" icon="link" href="/oss/python/integrations/document_loaders/microsoft_excel" arrow="true" cta="View guide" />

  <Card title="Microsoft OneDrive" icon="link" href="/oss/python/integrations/document_loaders/microsoft_onedrive" arrow="true" cta="View guide" />

  <Card title="Microsoft OneNote" icon="link" href="/oss/python/integrations/document_loaders/microsoft_onenote" arrow="true" cta="View guide" />

  <Card title="Microsoft PowerPoint" icon="link" href="/oss/python/integrations/document_loaders/microsoft_powerpoint" arrow="true" cta="View guide" />

  <Card title="Microsoft SharePoint" icon="link" href="/oss/python/integrations/document_loaders/microsoft_sharepoint" arrow="true" cta="View guide" />

  <Card title="Microsoft Word" icon="link" href="/oss/python/integrations/document_loaders/microsoft_word" arrow="true" cta="View guide" />

  <Card title="NEAR Blockchain" icon="link" href="/oss/python/integrations/document_loaders/mintbase" arrow="true" cta="View guide" />

  <Card title="Modern Treasury" icon="link" href="/oss/python/integrations/document_loaders/modern_treasury" arrow="true" cta="View guide" />

  <Card title="MongoDB" icon="link" href="/oss/python/integrations/document_loaders/mongodb" arrow="true" cta="View guide" />

  <Card title="Needle Document Loader" icon="link" href="/oss/python/integrations/document_loaders/needle" arrow="true" cta="View guide" />

  <Card title="News URL" icon="link" href="/oss/python/integrations/document_loaders/news" arrow="true" cta="View guide" />

  <Card title="Notion DB" icon="link" href="/oss/python/integrations/document_loaders/notion" arrow="true" cta="View guide" />

  <Card title="Nuclia" icon="link" href="/oss/python/integrations/document_loaders/nuclia" arrow="true" cta="View guide" />

  <Card title="Obsidian" icon="link" href="/oss/python/integrations/document_loaders/obsidian" arrow="true" cta="View guide" />

  <Card title="OpenDataLoader PDF" icon="link" href="/oss/python/integrations/document_loaders/opendataloader_pdf" arrow="true" cta="View guide" />

  <Card title="Open Document Format (ODT)" icon="link" href="/oss/python/integrations/document_loaders/odt" arrow="true" cta="View guide" />

  <Card title="Open City Data" icon="link" href="/oss/python/integrations/document_loaders/open_city_data" arrow="true" cta="View guide" />

  <Card title="Oracle Autonomous Database" icon="link" href="/oss/python/integrations/document_loaders/oracleadb_loader" arrow="true" cta="View guide" />

  <Card title="Oracle AI Database" icon="link" href="/oss/python/integrations/document_loaders/oracleai" arrow="true" cta="View guide" />

  <Card title="Org-mode" icon="link" href="/oss/python/integrations/document_loaders/org_mode" arrow="true" cta="View guide" />

  <Card title="Outline Document Loader" icon="link" href="/oss/python/integrations/document_loaders/outline" arrow="true" cta="View guide" />

  <Card title="PaddleOCR-VL" icon="link" href="/oss/python/integrations/document_loaders/paddleocr_vl" arrow="true" cta="View guide" />

  <Card title="Pandas DataFrame" icon="link" href="/oss/python/integrations/document_loaders/pandas_dataframe" arrow="true" cta="View guide" />

  <Card title="PDFMinerLoader" icon="link" href="/oss/python/integrations/document_loaders/pdfminer" arrow="true" cta="View guide" />

  <Card title="PDFPlumber" icon="link" href="/oss/python/integrations/document_loaders/pdfplumber" arrow="true" cta="View guide" />

  <Card title="Pebblo Safe DocumentLoader" icon="link" href="/oss/python/integrations/document_loaders/pebblo" arrow="true" cta="View guide" />

  <Card title="Polaris AI DataInsight" icon="link" href="/oss/python/integrations/document_loaders/polaris_ai_datainsight" arrow="true" cta="View guide" />

  <Card title="Polars DataFrame" icon="link" href="/oss/python/integrations/document_loaders/polars_dataframe" arrow="true" cta="View guide" />

  <Card title="Dell PowerScale" icon="link" href="/oss/python/integrations/document_loaders/powerscale" arrow="true" cta="View guide" />

  <Card title="Psychic" icon="link" href="/oss/python/integrations/document_loaders/psychic" arrow="true" cta="View guide" />

  <Card title="PubMed" icon="link" href="/oss/python/integrations/document_loaders/pubmed" arrow="true" cta="View guide" />

  <Card title="PyMuPDFLoader" icon="link" href="/oss/python/integrations/document_loaders/pymupdf" arrow="true" cta="View guide" />

  <Card title="PyMuPDF4LLM" icon="link" href="/oss/python/integrations/document_loaders/pymupdf4llm" arrow="true" cta="View guide" />

  <Card title="PyPDFDirectoryLoader" icon="link" href="/oss/python/integrations/document_loaders/pypdfdirectory" arrow="true" cta="View guide" />

  <Card title="PyPDFium2Loader" icon="link" href="/oss/python/integrations/document_loaders/pypdfium2" arrow="true" cta="View guide" />

  <Card title="PyPDFLoader" icon="link" href="/oss/python/integrations/document_loaders/pypdfloader" arrow="true" cta="View guide" />

  <Card title="PySpark" icon="link" href="/oss/python/integrations/document_loaders/pyspark_dataframe" arrow="true" cta="View guide" />

  <Card title="Quip" icon="link" href="/oss/python/integrations/document_loaders/quip" arrow="true" cta="View guide" />

  <Card title="ReadTheDocs Documentation" icon="link" href="/oss/python/integrations/document_loaders/readthedocs_documentation" arrow="true" cta="View guide" />

  <Card title="Recursive URL" icon="link" href="/oss/python/integrations/document_loaders/recursive_url" arrow="true" cta="View guide" />

  <Card title="Reddit" icon="link" href="/oss/python/integrations/document_loaders/reddit" arrow="true" cta="View guide" />

  <Card title="Roam" icon="link" href="/oss/python/integrations/document_loaders/roam" arrow="true" cta="View guide" />

  <Card title="Rockset" icon="link" href="/oss/python/integrations/document_loaders/rockset" arrow="true" cta="View guide" />

  <Card title="RSpace" icon="link" href="/oss/python/integrations/document_loaders/rspace" arrow="true" cta="View guide" />

  <Card title="RSS Feeds" icon="link" href="/oss/python/integrations/document_loaders/rss" arrow="true" cta="View guide" />

  <Card title="RST" icon="link" href="/oss/python/integrations/document_loaders/rst" arrow="true" cta="View guide" />

  <Card title="Scrapfly" icon="link" href="/oss/python/integrations/document_loaders/scrapfly" arrow="true" cta="View guide" />

  <Card title="ScrapingAnt" icon="link" href="/oss/python/integrations/document_loaders/scrapingant" arrow="true" cta="View guide" />

  <Card title="SingleStore" icon="link" href="/oss/python/integrations/document_loaders/singlestore" arrow="true" cta="View guide" />

  <Card title="Sitemap" icon="link" href="/oss/python/integrations/document_loaders/sitemap" arrow="true" cta="View guide" />

  <Card title="Slack" icon="link" href="/oss/python/integrations/document_loaders/slack" arrow="true" cta="View guide" />

  <Card title="Snowflake" icon="link" href="/oss/python/integrations/document_loaders/snowflake" arrow="true" cta="View guide" />

  <Card title="Soniox" icon="link" href="/oss/python/integrations/document_loaders/soniox" arrow="true" cta="View guide" />

  <Card title="Source Code" icon="link" href="/oss/python/integrations/document_loaders/source_code" arrow="true" cta="View guide" />

  <Card title="Spider" icon="link" href="/oss/python/integrations/document_loaders/spider" arrow="true" cta="View guide" />

  <Card title="Spreedly" icon="link" href="/oss/python/integrations/document_loaders/spreedly" arrow="true" cta="View guide" />

  <Card title="Stripe" icon="link" href="/oss/python/integrations/document_loaders/stripe" arrow="true" cta="View guide" />

  <Card title="Subtitle" icon="link" href="/oss/python/integrations/document_loaders/subtitle" arrow="true" cta="View guide" />

  <Card title="SurrealDB" icon="link" href="/oss/python/integrations/document_loaders/surrealdb" arrow="true" cta="View guide" />

  <Card title="Telegram" icon="link" href="/oss/python/integrations/document_loaders/telegram" arrow="true" cta="View guide" />

  <Card title="Tencent COS Directory" icon="link" href="/oss/python/integrations/document_loaders/tencent_cos_directory" arrow="true" cta="View guide" />

  <Card title="Tencent COS File" icon="link" href="/oss/python/integrations/document_loaders/tencent_cos_file" arrow="true" cta="View guide" />

  <Card title="TensorFlow Datasets" icon="link" href="/oss/python/integrations/document_loaders/tensorflow_datasets" arrow="true" cta="View guide" />

  <Card title="TiDB" icon="link" href="/oss/python/integrations/document_loaders/tidb" arrow="true" cta="View guide" />

  <Card title="2Markdown" icon="link" href="/oss/python/integrations/document_loaders/tomarkdown" arrow="true" cta="View guide" />

  <Card title="TOML" icon="link" href="/oss/python/integrations/document_loaders/toml" arrow="true" cta="View guide" />

  <Card title="Trello" icon="link" href="/oss/python/integrations/document_loaders/trello" arrow="true" cta="View guide" />

  <Card title="TSV" icon="link" href="/oss/python/integrations/document_loaders/tsv" arrow="true" cta="View guide" />

  <Card title="Twitter" icon="link" href="/oss/python/integrations/document_loaders/twitter" arrow="true" cta="View guide" />

  <Card title="UnDatasIO" icon="link" href="/oss/python/integrations/document_loaders/undatasio" arrow="true" cta="View guide" />

  <Card title="Unstructured" icon="link" href="/oss/python/integrations/document_loaders/unstructured_file" arrow="true" cta="View guide" />

  <Card title="UnstructuredMarkdownLoader" icon="link" href="/oss/python/integrations/document_loaders/unstructured_markdown" arrow="true" cta="View guide" />

  <Card title="UnstructuredPDFLoader" icon="link" href="/oss/python/integrations/document_loaders/unstructured_pdfloader" arrow="true" cta="View guide" />

  <Card title="Upstage" icon="link" href="/oss/python/integrations/document_loaders/upstage" arrow="true" cta="View guide" />

  <Card title="URL" icon="link" href="/oss/python/integrations/document_loaders/url" arrow="true" cta="View guide" />

  <Card title="Vsdx" icon="link" href="/oss/python/integrations/document_loaders/vsdx" arrow="true" cta="View guide" />

  <Card title="Weather" icon="link" href="/oss/python/integrations/document_loaders/weather" arrow="true" cta="View guide" />

  <Card title="WebBaseLoader" icon="link" href="/oss/python/integrations/document_loaders/web_base" arrow="true" cta="View guide" />

  <Card title="WhatsApp Chat" icon="link" href="/oss/python/integrations/document_loaders/whatsapp_chat" arrow="true" cta="View guide" />

  <Card title="Wikipedia" icon="link" href="/oss/python/integrations/document_loaders/wikipedia" arrow="true" cta="View guide" />

  <Card title="UnstructuredXMLLoader" icon="link" href="/oss/python/integrations/document_loaders/xml" arrow="true" cta="View guide" />

  <Card title="Xorbits Pandas DataFrame" icon="link" href="/oss/python/integrations/document_loaders/xorbits" arrow="true" cta="View guide" />

  <Card title="YouTube Audio" icon="link" href="/oss/python/integrations/document_loaders/youtube_audio" arrow="true" cta="View guide" />

  <Card title="YouTube Transcripts" icon="link" href="/oss/python/integrations/document_loaders/youtube_transcript" arrow="true" cta="View guide" />

  <Card title="YoutubeLoaderDL" icon="link" href="/oss/python/integrations/document_loaders/yt_dlp" arrow="true" cta="View guide" />

  <Card title="Yuque" icon="link" href="/oss/python/integrations/document_loaders/yuque" arrow="true" cta="View guide" />

  <Card title="ZeroxPDFLoader" icon="link" href="/oss/python/integrations/document_loaders/zeroxpdfloader" arrow="true" cta="View guide" />
</Columns>

