
# Unstructured integrations

> Integrate with Unstructured using LangChain Python.

> The `unstructured` package from
> [Unstructured.IO](https://www.unstructured.io/) extracts clean text from raw source documents like
> PDFs and Word documents.
> This page covers how to use the [`unstructured`](https://github.com/Unstructured-IO/unstructured)
> ecosystem within LangChain.

## Installation and setup

If you are using a loader that runs locally, use the following steps to get `unstructured` and its
dependencies running.

* For the smallest installation footprint and access to features not available in the
  open-source `unstructured` package, install the Python SDK with `pip install unstructured-client`
  along with `pip install langchain-unstructured`, then use `UnstructuredLoader` to partition
  documents remotely against the Unstructured API. This loader lives
  in a LangChain partner repo rather than the `langchain-community` repo, and it requires an
  `api_key`. You can [generate a free key on the Unstructured API key page](https://unstructured.io/api-key/).
  * Unstructured's SDK documentation is available at
    [https://docs.unstructured.io/api-reference/api-services/sdk](https://docs.unstructured.io/api-reference/api-services/sdk).

* To run everything locally, install the open-source Python package with `pip install unstructured`
  along with `pip install langchain-community` and use the same `UnstructuredLoader` as mentioned above.
  * You can install document-specific dependencies with extras, e.g. `pip install "unstructured[docx]"`. Learn more about extras in the [full installation documentation](https://docs.unstructured.io/open-source/installation/full-installation).
  * To install the dependencies for all document types, use `pip install "unstructured[all-docs]"`.

* Install the following system dependencies if they are not already available on your system, for example with `brew install` on macOS.
  Depending on which document types you're parsing, you may not need all of them.
  * `libmagic-dev` (filetype detection)
  * `poppler-utils` (images and PDFs)
  * `tesseract-ocr` (images and PDFs)
  * `qpdf` (PDFs)
  * `libreoffice` (MS Office docs)
  * `pandoc` (EPUBs)

* When running locally, Unstructured also recommends using Docker [by following this
  guide](https://docs.unstructured.io/open-source/installation/docker-installation) to ensure all
  system dependencies are installed correctly.
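
Before partitioning locally, it can help to confirm which of the dependencies above are actually present. The script below is a minimal sketch that uses only the Python standard library. `check_environment` is a hypothetical helper, not part of `unstructured` or LangChain, and the executable names it looks for (`pdftoppm` for poppler-utils, `soffice` for LibreOffice, and so on) are assumptions about what each system package installs; they may differ on your platform.

```python
import ctypes.util
import importlib.util
import shutil

# Python packages mentioned on this page (import names, not pip names).
PYTHON_PACKAGES = ["unstructured", "unstructured_client", "langchain_unstructured"]

# Assumption: the executable each system package puts on PATH.
SYSTEM_BINARIES = {
    "poppler-utils": "pdftoppm",
    "tesseract-ocr": "tesseract",
    "qpdf": "qpdf",
    "libreoffice": "soffice",
    "pandoc": "pandoc",
}


def check_environment() -> dict[str, bool]:
    """Report which optional Unstructured dependencies can be found (hypothetical helper)."""
    report: dict[str, bool] = {}
    for module in PYTHON_PACKAGES:
        # find_spec returns None when the top-level module is not installed.
        report[module] = importlib.util.find_spec(module) is not None
    # libmagic is a shared library rather than an executable, so look it up as a library.
    report["libmagic"] = ctypes.util.find_library("magic") is not None
    for package, binary in SYSTEM_BINARIES.items():
        report[package] = shutil.which(binary) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:25} {'found' if found else 'missing'}")
```

A missing entry is only a hint: you may not need every dependency for the document types you parse, and the Docker image described above sidesteps these checks entirely.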

The Unstructured API requires API keys to make requests.
You can [request an API key](https://unstructured.io/api-key-hosted) and start using it today!
Check out the [Unstructured API README](https://github.com/Unstructured-IO/unstructured-api) to get started making API calls.
We'd love to hear your feedback; let us know how it goes in our [community Slack](https://join.slack.com/t/unstructuredw-kbe4326/shared_invite/zt-1x7cgo0pg-PTptXWylzPQF9xZolzCnwQ).
And stay tuned for improvements to both quality and performance!
Check out the [Docker self-hosting instructions](https://github.com/Unstructured-IO/unstructured-api#dizzy-instructions-for-using-the-docker-image) if you'd like to self-host the Unstructured API or run it locally.

## Data loaders

In LangChain, Unstructured is primarily used through its data loaders.

### UnstructuredLoader

See this [usage example](/oss/python/integrations/document_loaders/unstructured_file) for how to use
this loader to partition documents both locally and remotely with the serverless Unstructured API.

```python theme={"theme":{"light":"catppuccin-latte","dark":"catppuccin-mocha"}}
from langchain_unstructured import UnstructuredLoader
```
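
The following is a minimal sketch of switching between local and remote partitioning with one loader. It assumes that `UnstructuredLoader` accepts `file_path`, `partition_via_api`, and `api_key` keyword arguments and that your API key is stored in an `UNSTRUCTURED_API_KEY` environment variable; check the linked usage example for the loader's current parameters. The helper names `loader_kwargs` and `load_documents` are hypothetical.

```python
import os
from typing import Any


def loader_kwargs(file_path: str, *, remote: bool = False) -> dict[str, Any]:
    """Build keyword arguments for UnstructuredLoader (hypothetical helper).

    remote=False partitions locally with the open-source `unstructured` package;
    remote=True sends the file to the Unstructured API, which needs an API key.
    """
    kwargs: dict[str, Any] = {"file_path": file_path}
    if remote:
        # Assumption: the key lives in the UNSTRUCTURED_API_KEY environment variable.
        api_key = os.environ.get("UNSTRUCTURED_API_KEY")
        if not api_key:
            raise RuntimeError("Set UNSTRUCTURED_API_KEY to partition via the API")
        # Assumption: these parameter names match the installed UnstructuredLoader.
        kwargs.update(partition_via_api=True, api_key=api_key)
    return kwargs


def load_documents(file_path: str, *, remote: bool = False):
    """Load a file into LangChain Documents (hypothetical helper)."""
    # Imported lazily so loader_kwargs works without langchain-unstructured installed.
    from langchain_unstructured import UnstructuredLoader

    loader = UnstructuredLoader(**loader_kwargs(file_path, remote=remote))
    return loader.load()
```

For example, `load_documents("example.pdf")` would partition locally (requiring `unstructured` and the system dependencies listed earlier), while `load_documents("example.pdf", remote=True)` would send the file to the Unstructured API. Reading the key from the environment keeps it out of source control.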

