# Document loader integrations

> Integrate with document loaders using LangChain JavaScript.

Document loaders provide a **standard interface** for reading data from different sources (such as Slack, Notion, or Google Drive) into LangChain's [Document](https://reference.langchain.com/javascript/langchain-core/documents/Document) format.
This ensures that data can be handled consistently regardless of the source.

All document loaders extend the abstract [`BaseDocumentLoader`](https://reference.langchain.com/javascript/classes/_langchain_core.document_loaders_base.BaseDocumentLoader.html) class from `@langchain/core`.
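Building a loader means subclassing `BaseDocumentLoader` and implementing its `load()` method. The following is a minimal sketch, assuming `@langchain/core` is installed; `InMemoryTextLoader` is a hypothetical class written for illustration, not something LangChain ships.

```typescript
import { BaseDocumentLoader } from "@langchain/core/document_loaders/base";
import { Document } from "@langchain/core/documents";

// Hypothetical loader (illustration only): turns an array of strings into Documents.
class InMemoryTextLoader extends BaseDocumentLoader {
  private readonly texts: string[];
  private readonly source: string;

  constructor(texts: string[], source = "memory") {
    super();
    this.texts = texts;
    this.source = source;
  }

  // load() is the one method a subclass must implement; it resolves to Document[].
  async load(): Promise<Document[]> {
    return this.texts.map(
      (text, index) =>
        new Document({
          pageContent: text,
          // Metadata stays attached to each Document in later pipeline steps.
          metadata: { source: this.source, index },
        })
    );
  }
}
```

Because the class extends `BaseDocumentLoader`, it also inherits `loadAndSplit()` without any extra code.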

## Interface

Each document loader may define its own parameters, but they share a common API:

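* `load()`: Loads all documents at once and returns a promise that resolves to an array of `Document` objects.
* `loadAndSplit(splitter)`: Loads all documents and splits them into smaller documents with the given text splitter. Recent versions of `@langchain/core` deprecate this method in favor of calling `load()` and then the splitter's `splitDocuments()` method.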

```typescript
import { CSVLoader } from "@langchain/community/document_loaders/fs/csv";

// Constructor parameters are integration-specific. CSVLoader takes a file path
// (or Blob) and, optionally, the name of a single column to extract.
const loader = new CSVLoader("path/to/example.csv");

// load() resolves to an array of Document objects (one per CSV row).
const data = await loader.load();
```
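The split step can also be done explicitly: call `load()`, then pass the result to a text splitter. The sketch below assumes `@langchain/core` and `@langchain/textsplitters` are installed, uses a hand-built `Document` as a stand-in for a loader's `load()` output, and picks arbitrary chunk sizes for illustration.

```typescript
import { Document } from "@langchain/core/documents";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Stand-in for `await loader.load()`: one long Document with source metadata.
const loadedDocs = [
  new Document({
    pageContent: "Document loaders return Document objects. ".repeat(10).trim(),
    metadata: { source: "example.csv" },
  }),
];

// Split explicitly instead of calling loader.loadAndSplit(splitter).
async function splitLoadedDocs(docs: Document[]): Promise<Document[]> {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 100, // maximum characters per chunk
    chunkOverlap: 0, // must be smaller than chunkSize
  });
  // splitDocuments copies each parent Document's metadata onto its chunks.
  return splitter.splitDocuments(docs);
}
```

Keeping the two steps separate makes it easy to swap in a different splitter or inspect the raw documents before chunking.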

## By category

LangChain.js groups document loaders into two categories:

* [File loaders](/oss/javascript/integrations/document_loaders/file_loaders/), which load data into LangChain formats from your local filesystem.
* [Web loaders](/oss/javascript/integrations/document_loaders/web_loaders/), which load data from remote sources.
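Many file loaders in LangChain.js, such as the CSV loader above, accept either a filesystem path or a `Blob`; the `Blob` form lets a loader run where no filesystem is available. The sketch below imitates that pattern with a hypothetical `PathOrBlobTextLoader` class (not a LangChain export), assuming `@langchain/core` is installed. Node's `fs` module is imported lazily, so `Blob` inputs never touch it.

```typescript
import { BaseDocumentLoader } from "@langchain/core/document_loaders/base";
import { Document } from "@langchain/core/documents";

// Hypothetical file loader: reads plain text from a path (Node.js) or a Blob (any runtime).
class PathOrBlobTextLoader extends BaseDocumentLoader {
  private readonly input: string | Blob;

  constructor(input: string | Blob) {
    super();
    this.input = input;
  }

  async load(): Promise<Document[]> {
    const input = this.input;
    let text: string;
    let source: string;
    if (typeof input === "string") {
      // Import fs only when a path is given, so Blob inputs need no filesystem.
      const { readFile } = await import("node:fs/promises");
      text = await readFile(input, "utf8");
      source = input;
    } else {
      text = await input.text();
      source = "blob";
    }
    return [new Document({ pageContent: text, metadata: { source } })];
  }
}
```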

### File loaders

<Info>
  If you'd like to contribute an integration, see [Contributing integrations](/oss/javascript/contributing#add-a-new-integration).
</Info>

#### PDFs

| Document Loader                                                               | Description                              | Package/API |
| ----------------------------------------------------------------------------- | ---------------------------------------- | ----------- |
| [`PDFLoader`](/oss/javascript/integrations/document_loaders/file_loaders/pdf) | Load and parse PDF files using pdf-parse | Package     |

#### Common file types

| Document Loader                                                                     | Description                                                  | Package/API |
| ----------------------------------------------------------------------------------- | ------------------------------------------------------------ | ----------- |
| [CSV](/oss/javascript/integrations/document_loaders/file_loaders/csv)               | Load data from CSV files with configurable column extraction | Package     |
| [JSON](/oss/javascript/integrations/document_loaders/file_loaders/json)             | Load JSON files, using JSON pointers to target specific keys | Package     |
| [`JSONLines`](/oss/javascript/integrations/document_loaders/file_loaders/jsonlines) | Load data from JSONLines/JSONL files                         | Package     |
| [`Text`](/oss/javascript/integrations/document_loaders/file_loaders/text)           | Load plain text files                                        | Package     |
| [`DOCX`](/oss/javascript/integrations/document_loaders/file_loaders/docx)           | Load Microsoft Word documents (.docx and .doc formats)       | Package     |
| [`EPUB`](/oss/javascript/integrations/document_loaders/file_loaders/epub)           | Load EPUB files with optional chapter splitting              | Package     |
| [`PPTX`](/oss/javascript/integrations/document_loaders/file_loaders/pptx)           | Load PowerPoint presentations                                | Package     |
| [`Subtitles`](/oss/javascript/integrations/document_loaders/file_loaders/subtitles) | Load subtitle files (.srt format)                            | Package     |

#### Specialized file loaders

| Document Loader                                                                                         | Description                                                          | Package/API |
| ------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------- | ----------- |
| [`DirectoryLoader`](/oss/javascript/integrations/document_loaders/file_loaders/directory)               | Load all files from a directory with custom loader mappings          | Package     |
| [`UnstructuredLoader`](/oss/javascript/integrations/document_loaders/file_loaders/unstructured)         | Load multiple file types using the Unstructured API                  | API         |
| [`MultiFileLoader`](/oss/javascript/integrations/document_loaders/file_loaders/multi_file)              | Load data from multiple individual file paths                        | Package     |
| [`ChatGPT`](/oss/javascript/integrations/document_loaders/file_loaders/chatgpt)                         | Load ChatGPT conversation exports                                    | Package     |
| [Notion Markdown](/oss/javascript/integrations/document_loaders/file_loaders/notion_markdown)           | Load Notion pages exported as Markdown                               | Package     |
| [`OracleDocLoader`](/oss/javascript/integrations/document_loaders/file_loaders/oracleai)                | Ingest Oracle AI Vector Search tables or Oracle Text-supported files | Package     |
| [OpenAI Whisper Audio](/oss/javascript/integrations/document_loaders/file_loaders/openai_whisper_audio) | Transcribe audio files using the OpenAI Whisper API                  | API         |

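### Web loaders

The **Web Support** column indicates whether a loader can also run in web environments, such as browsers and edge runtimes, rather than only in Node.js.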
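Web loaders follow the same `load()` contract but fetch their input from a remote source. As a minimal sketch, assuming `@langchain/core` is installed and the runtime provides a global `fetch` (Node.js 18+, browsers, edge runtimes), a hypothetical `FetchTextLoader` (not a LangChain export) could look like this. The loaders listed below add HTML parsing, crawling, or browser rendering on top of this basic idea.

```typescript
import { BaseDocumentLoader } from "@langchain/core/document_loaders/base";
import { Document } from "@langchain/core/documents";

// Hypothetical web loader: fetches a URL and returns its body as a single Document.
class FetchTextLoader extends BaseDocumentLoader {
  private readonly url: string;

  constructor(url: string) {
    super();
    this.url = url;
  }

  async load(): Promise<Document[]> {
    const response = await fetch(this.url);
    if (!response.ok) {
      throw new Error(`Failed to fetch ${this.url}: HTTP ${response.status}`);
    }
    const text = await response.text();
    // Record where the content came from so it can be cited later.
    return [new Document({ pageContent: text, metadata: { source: this.url } })];
  }
}
```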

#### Webpages

| Document Loader                                                                                        | Description                                                            | Web Support | Package/API |
| ------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------- | :---------: | ----------- |
| [`Cheerio`](/oss/javascript/integrations/document_loaders/web_loaders/web_cheerio)                     | Load webpages using Cheerio (lightweight, no JavaScript execution)     |      ✅      | Package     |
| [`Playwright`](/oss/javascript/integrations/document_loaders/web_loaders/web_playwright)               | Load dynamic webpages using Playwright (supports JavaScript rendering) |      ❌      | Package     |
| [`Puppeteer`](/oss/javascript/integrations/document_loaders/web_loaders/web_puppeteer)                 | Load dynamic webpages using Puppeteer (headless Chrome)                |      ❌      | Package     |
| [`FireCrawl`](/oss/javascript/integrations/document_loaders/web_loaders/firecrawl)                     | Crawl and convert websites into LLM-ready Markdown                     |      ✅      | API         |
| [`Spider`](/oss/javascript/integrations/document_loaders/web_loaders/spider)                           | Crawl websites and convert them into HTML, Markdown, or text           |      ✅      | API         |
| [`RecursiveUrlLoader`](/oss/javascript/integrations/document_loaders/web_loaders/recursive_url_loader) | Recursively load webpages following links                              |      ❌      | Package     |
| [`Sitemap`](/oss/javascript/integrations/document_loaders/web_loaders/sitemap)                         | Load all pages from a sitemap.xml                                      |      ✅      | Package     |
| [`Browserbase`](/oss/javascript/integrations/document_loaders/web_loaders/browserbase)                 | Load webpages using managed headless browsers with stealth mode        |      ✅      | API         |
| [`WebPDFLoader`](/oss/javascript/integrations/document_loaders/web_loaders/pdf)                        | Load PDF files in web environments                                     |      ✅      | Package     |

#### Cloud providers

| Document Loader                                                                                                        | Description                                        | Web Support | Package/API |
| ---------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------- | :---------: | ----------- |
| [`S3`](/oss/javascript/integrations/document_loaders/web_loaders/s3)                                                   | Load files from AWS S3 buckets                     |      ❌      | Package     |
| [Azure Blob Storage Container](/oss/javascript/integrations/document_loaders/web_loaders/azure_blob_storage_container) | Load all files from an Azure Blob Storage container |      ❌      | Package     |
| [Azure Blob Storage File](/oss/javascript/integrations/document_loaders/web_loaders/azure_blob_storage_file)           | Load individual files from Azure Blob Storage      |      ❌      | Package     |
| [Google Cloud Storage](/oss/javascript/integrations/document_loaders/web_loaders/google_cloud_storage)                 | Load files from Google Cloud Storage buckets       |      ❌      | Package     |
| [Google Cloud SQL for PostgreSQL](/oss/javascript/integrations/document_loaders/web_loaders/google_cloudsql_pg)        | Load documents from Cloud SQL PostgreSQL databases |      ✅      | Package     |

#### Productivity tools

| Document Loader                                                                      | Description                             | Web Support | Package/API |
| ------------------------------------------------------------------------------------ | --------------------------------------- | :---------: | ----------- |
| [Notion API](/oss/javascript/integrations/document_loaders/web_loaders/notionapi)    | Load Notion pages and databases via API |      ✅      | API         |
| [`Figma`](/oss/javascript/integrations/document_loaders/web_loaders/figma)           | Load Figma file data                    |      ✅      | API         |
| [`Confluence`](/oss/javascript/integrations/document_loaders/web_loaders/confluence) | Load pages from Confluence spaces       |      ❌      | API         |
| [`GitHub`](/oss/javascript/integrations/document_loaders/web_loaders/github)         | Load files from GitHub repositories     |      ✅      | API         |
| [`GitBook`](/oss/javascript/integrations/document_loaders/web_loaders/gitbook)       | Load GitBook documentation pages        |      ✅      | Package     |
| [`Jira`](/oss/javascript/integrations/document_loaders/web_loaders/jira)             | Load issues from Jira projects          |      ❌      | API         |
| [`Airtable`](/oss/javascript/integrations/document_loaders/web_loaders/airtable)     | Load records from Airtable bases        |      ✅      | API         |
| [`Taskade`](/oss/javascript/integrations/document_loaders/web_loaders/taskade)       | Load Taskade project data               |      ✅      | API         |

#### Search & data APIs

| Document Loader                                                                          | Description                                                    | Web Support | Package/API |
| ---------------------------------------------------------------------------------------- | -------------------------------------------------------------- | :---------: | ----------- |
| [`SearchAPI`](/oss/javascript/integrations/document_loaders/web_loaders/searchapi)       | Load web search results from SearchAPI (Google, YouTube, etc.) |      ✅      | API         |
| [`SerpApi`](/oss/javascript/integrations/document_loaders/web_loaders/serpapi)           | Load web search results from SerpApi                           |      ✅      | API         |
| [Apify Dataset](/oss/javascript/integrations/document_loaders/web_loaders/apify_dataset) | Load scraped data from the Apify platform                      |      ✅      | API         |

#### Audio & video

| Document Loader                                                                                          | Description                                                                    | Web Support | Package/API |
| -------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ | :---------: | ----------- |
| [`YouTube`](/oss/javascript/integrations/document_loaders/web_loaders/youtube)                           | Load YouTube video transcripts                                                 |      ✅      | Package     |
| [`AssemblyAI`](/oss/javascript/integrations/document_loaders/web_loaders/assemblyai_audio_transcription) | Transcribe audio and video files using the AssemblyAI API                      |      ✅      | API         |
| [`Soniox`](/oss/javascript/integrations/document_loaders/web_loaders/soniox)                             | Transcribe multilingual audio files, with optional translation, using the Soniox API |      ✅      | API         |
| [`Sonix`](/oss/javascript/integrations/document_loaders/web_loaders/sonix_audio_transcription)           | Transcribe audio files using the Sonix API                                     |      ❌      | API         |

#### Other

| Document Loader                                                                                        | Description                                                | Web Support | Package/API |
| ------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------- | :---------: | ----------- |
| [`Couchbase`](/oss/javascript/integrations/document_loaders/web_loaders/couchbase)                     | Load documents from a Couchbase database using SQL++ queries |      ✅      | Package     |
| [`LangSmith`](/oss/javascript/integrations/document_loaders/web_loaders/langsmith)                     | Load datasets and traces from LangSmith                    |      ✅      | API         |
| [Hacker News](/oss/javascript/integrations/document_loaders/web_loaders/hn)                            | Load Hacker News threads and comments                      |      ✅      | Package     |
| [`IMSDB`](/oss/javascript/integrations/document_loaders/web_loaders/imsdb)                             | Load movie scripts from the Internet Movie Script Database |      ✅      | Package     |
| [College Confidential](/oss/javascript/integrations/document_loaders/web_loaders/college_confidential) | Load college information from College Confidential         |      ✅      | Package     |
| [Blockchain Data](/oss/javascript/integrations/document_loaders/web_loaders/sort_xyz_blockchain)       | Load blockchain data (NFTs, transactions) via the Sort.xyz API |      ✅      | API         |

## All document loaders

<Columns cols={3}>
  <Card title="Airtable" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/airtable" arrow="true" cta="View guide" />

  <Card title="Apify Dataset" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/apify_dataset" arrow="true" cta="View guide" />

  <Card title="AssemblyAI Audio Transcript" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/assemblyai_audio_transcription" arrow="true" cta="View guide" />

  <Card title="Azure Blob Storage Container" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/azure_blob_storage_container" arrow="true" cta="View guide" />

  <Card title="Azure Blob Storage File" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/azure_blob_storage_file" arrow="true" cta="View guide" />

  <Card title="Blockchain Data" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/sort_xyz_blockchain" arrow="true" cta="View guide" />

  <Card title="Browserbase" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/browserbase" arrow="true" cta="View guide" />

  <Card title="ChatGPT" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/chatgpt" arrow="true" cta="View guide" />

  <Card title="Cheerio" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/web_cheerio" arrow="true" cta="View guide" />

  <Card title="College Confidential" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/college_confidential" arrow="true" cta="View guide" />

  <Card title="Confluence" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/confluence" arrow="true" cta="View guide" />

  <Card title="Couchbase" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/couchbase" arrow="true" cta="View guide" />

  <Card title="CSV" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/csv" arrow="true" cta="View guide" />

  <Card title="DirectoryLoader" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/directory" arrow="true" cta="View guide" />

  <Card title="DOCX" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/docx" arrow="true" cta="View guide" />

  <Card title="EPUB" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/epub" arrow="true" cta="View guide" />

  <Card title="Figma" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/figma" arrow="true" cta="View guide" />

  <Card title="FireCrawl" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/firecrawl" arrow="true" cta="View guide" />

  <Card title="GitHub" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/github" arrow="true" cta="View guide" />

  <Card title="GitBook" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/gitbook" arrow="true" cta="View guide" />

  <Card title="Google Cloud SQL for PostgreSQL" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/google_cloudsql_pg" arrow="true" cta="View guide" />

  <Card title="Google Cloud Storage" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/google_cloud_storage" arrow="true" cta="View guide" />

  <Card title="Hacker News" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/hn" arrow="true" cta="View guide" />

  <Card title="IMSDB" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/imsdb" arrow="true" cta="View guide" />

  <Card title="Jira" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/jira" arrow="true" cta="View guide" />

  <Card title="JSON" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/json" arrow="true" cta="View guide" />

  <Card title="JSONLines" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/jsonlines" arrow="true" cta="View guide" />

  <Card title="LangSmith" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/langsmith" arrow="true" cta="View guide" />

  <Card title="MultiFileLoader" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/multi_file" arrow="true" cta="View guide" />

  <Card title="Notion API" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/notionapi" arrow="true" cta="View guide" />

  <Card title="Notion Markdown" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/notion_markdown" arrow="true" cta="View guide" />

  <Card title="OpenAI Whisper Audio" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/openai_whisper_audio" arrow="true" cta="View guide" />

  <Card title="OracleDocLoader" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/oracleai" arrow="true" cta="View guide" />

  <Card title="PDFLoader" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/pdf" arrow="true" cta="View guide" />

  <Card title="Playwright" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/web_playwright" arrow="true" cta="View guide" />

  <Card title="PPTX" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/pptx" arrow="true" cta="View guide" />

  <Card title="Puppeteer" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/web_puppeteer" arrow="true" cta="View guide" />

  <Card title="RecursiveUrlLoader" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/recursive_url_loader" arrow="true" cta="View guide" />

  <Card title="S3" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/s3" arrow="true" cta="View guide" />

  <Card title="SearchAPI" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/searchapi" arrow="true" cta="View guide" />

  <Card title="SerpApi" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/serpapi" arrow="true" cta="View guide" />

  <Card title="Sitemap" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/sitemap" arrow="true" cta="View guide" />

  <Card title="Soniox" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/soniox" arrow="true" cta="View guide" />

  <Card title="Sonix Audio" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/sonix_audio_transcription" arrow="true" cta="View guide" />

  <Card title="Spider" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/spider" arrow="true" cta="View guide" />

  <Card title="Subtitles" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/subtitles" arrow="true" cta="View guide" />

  <Card title="Taskade" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/taskade" arrow="true" cta="View guide" />

  <Card title="Text" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/text" arrow="true" cta="View guide" />

  <Card title="UnstructuredLoader" icon="link" href="/oss/javascript/integrations/document_loaders/file_loaders/unstructured" arrow="true" cta="View guide" />

  <Card title="WebPDFLoader" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/pdf" arrow="true" cta="View guide" />

  <Card title="YouTube" icon="link" href="/oss/javascript/integrations/document_loaders/web_loaders/youtube" arrow="true" cta="View guide" />
</Columns>

