The undatasio package from UnDatasIO extracts clean text from raw source documents like PDFs. This page covers how to use the undatasio ecosystem within LangChain.

Installation and Setup

  • Install the Python SDK with
    pip install undatasio
    along with the LangChain integration package
    pip install langchain-undatasio
    to use UnDatasIOLoader, which processes documents remotely through the UnDatasIO API. You will need an API key, which you can generate for free at undatas.io.
  • No local system dependencies are required; all processing runs in the cloud.

Data Loaders

The primary usage of UnDatasIO is through the document loader.

UnDatasIOLoader

See the usage example for single-file parsing and lazy loading.
    from langchain_undatasio import UnDatasIOLoader
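Lazy loading refers to the general LangChain loader pattern: `load()` returns every Document at once as a list, while `lazy_load()` returns an iterator that produces Documents one at a time. The sketch below illustrates only that pattern. `SketchLoader` is a hypothetical stand-in that does not contact the UnDatasIO API; its constructor, its in-memory "pages", and the `page` metadata key are assumptions made so the example runs offline. The real UnDatasIOLoader's constructor arguments are not described on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class Document:
    # Stand-in mirroring the field names of LangChain's Document.
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class SketchLoader:
    """Hypothetical loader illustrating the lazy_load()/load() contract.

    It does NOT call the UnDatasIO API: it "parses" in-memory strings,
    so the example needs no network access or API key.
    """

    def __init__(self, pages: List[str], source: str) -> None:
        self.pages = pages
        self.source = source

    def lazy_load(self) -> Iterator[Document]:
        # Generator: each Document is built only when the caller asks for it.
        for i, text in enumerate(self.pages):
            yield Document(
                page_content=text.strip(),
                metadata={"source": self.source, "page": i},  # hypothetical keys
            )

    def load(self) -> List[Document]:
        # Eager variant: materialize every Document into a list.
        return list(self.lazy_load())


loader = SketchLoader(["  first page  ", "second page"], source="example.pdf")
first = next(loader.lazy_load())
print(first.page_content)  # first page
docs = loader.load()
print(len(docs))  # 2
```

When handling many or large files, iterating over `lazy_load()` lets each Document be processed and released before the next one is produced, whereas `load()` keeps them all in memory at once.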