LLM Sherpa
LLM Sherpa can load files of many types. It supports different file formats including DOCX, PPTX, HTML, TXT, and XML.
LLMSherpaFileLoader uses LayoutPDFReader, which is part of the LLMSherpa library. This tool is designed to parse PDFs while preserving their layout information, which is often lost when using most PDF-to-text parsers.
Here are some key features of LayoutPDFReader:
INFO: This library fails on some PDF files, so use it with caution.
Set llmsherpa_api_url, or use the default.
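As a rough illustration of the llmsherpa_api_url choice, the sketch below builds the loader arguments and only passes llmsherpa_api_url when one is supplied, so the loader's own default applies otherwise. The helper names loader_kwargs and load_documents are hypothetical, and the import path langchain_community.document_loaders.llmsherpa assumes a recent LangChain install with the llmsherpa package available. Because the library can fail on some PDFs, callers may want to wrap load_documents in their own error handling.

```python
from typing import Any, Dict, Optional


def loader_kwargs(file_path: str, llmsherpa_api_url: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for LLMSherpaFileLoader (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"file_path": file_path}
    # Only override the endpoint when the caller provides one;
    # otherwise the loader falls back to its built-in default URL.
    if llmsherpa_api_url is not None:
        kwargs["llmsherpa_api_url"] = llmsherpa_api_url
    return kwargs


def load_documents(file_path: str, llmsherpa_api_url: Optional[str] = None):
    """Load a file through LLMSherpaFileLoader and return its documents."""
    # Imported lazily so loader_kwargs stays usable without the package installed.
    # Assumption: this import path matches your installed LangChain version.
    from langchain_community.document_loaders.llmsherpa import LLMSherpaFileLoader

    loader = LLMSherpaFileLoader(**loader_kwargs(file_path, llmsherpa_api_url))
    return loader.load()
```

Calling load_documents("report.pdf") would use the default endpoint, while passing a second argument points the loader at whatever URL you supply; the file name here is only a placeholder.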