Dedoc is an open-source library/service that extracts texts, tables, attached files and document structure (e.g., titles, list items, etc.) from files of various formats.
Dedoc supports DOCX, XLSX, PPTX, EML, HTML, PDF, images and more.
The full list of supported formats can be found here.
You can install Dedoc using pip. In this case, you will also need to install its dependencies; please go here to get more information.
If you plan to use the Dedoc API, you don’t need to install the dedoc library. In this case, you should run the Dedoc service, e.g. as a Docker container (please see the documentation for more details).
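Before pointing a loader at the service, it can help to confirm that something is actually listening. The sketch below is only a plain TCP reachability check, not part of Dedoc itself. The helper name `service_reachable` is hypothetical, and the default port 1231 is an assumption about how the container was started; adjust it to match your setup.

```python
import socket


def service_reachable(host: str = "localhost", port: int = 1231, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds.

    The default port 1231 is an assumption, not something this page specifies.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

A successful check only shows that the port is open; it does not verify that the process behind it is the Dedoc service.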
For files of any format (supported by Dedoc), you can use DedocFileLoader.
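A minimal sketch of using DedocFileLoader follows. It assumes the class is imported from `langchain_community.document_loaders` and that both that package and the dedoc library are installed. The wrapper `load_any_file` is a hypothetical helper, and the listed values for `split` are an assumption about the loader's options.

```python
from pathlib import Path


def load_any_file(file_path: str, split: str = "document"):
    """Parse a file with DedocFileLoader and return a list of LangChain Documents."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")

    # Imported lazily: requires langchain-community and dedoc to be installed.
    from langchain_community.document_loaders import DedocFileLoader

    # `split` is assumed to accept "document", "page", "node", or "line".
    loader = DedocFileLoader(str(path), split=split)
    return loader.load()
```

For example, `load_any_file("report.docx")` (a hypothetical file name) would return the parsed documents, whose text is available through each item's `page_content` attribute.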
For PDF files, you can use DedocPDFLoader.
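A comparable sketch for DedocPDFLoader follows, again assuming the class lives in `langchain_community.document_loaders`. The helper `load_pdf` is hypothetical, and the `pdf_with_text_layer` parameter with its value `"auto_tabby"` is an assumption about the loader's options, so check it against the loader's documentation before relying on it.

```python
from pathlib import Path


def load_pdf(file_path: str, text_layer: str = "auto_tabby"):
    """Parse a PDF with DedocPDFLoader and return a list of LangChain Documents."""
    path = Path(file_path)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {file_path}")

    # Imported lazily: requires langchain-community and dedoc to be installed.
    from langchain_community.document_loaders import DedocPDFLoader

    # `pdf_with_text_layer` (assumed parameter) tells the parser whether the
    # PDF contains an extractable text layer or needs OCR.
    loader = DedocPDFLoader(str(path), pdf_with_text_layer=text_layer)
    return loader.load()
```

The extension check runs before the import, so a wrong file type is reported even when the optional packages are missing.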
To handle files without installing the library, you can use the Dedoc API with DedocAPIFileLoader.
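The API-based loader can be sketched the same way. This assumes DedocAPIFileLoader is imported from `langchain_community.document_loaders` and accepts a `url` argument pointing at a running Dedoc service. The helpers `dedoc_api_url` and `load_via_api`, as well as the default host and port, are illustrative assumptions.

```python
from typing import Optional


def dedoc_api_url(host: str = "localhost", port: int = 1231) -> str:
    """Build the base URL of a Dedoc service (host and port are assumed defaults)."""
    return f"http://{host}:{port}"


def load_via_api(file_path: str, url: Optional[str] = None):
    """Send a file to a running Dedoc service and return LangChain Documents."""
    # Imported lazily: requires langchain-community, but not the dedoc library.
    from langchain_community.document_loaders import DedocAPIFileLoader

    loader = DedocAPIFileLoader(file_path, url=url or dedoc_api_url())
    return loader.load()
```

Because parsing happens in the service, only the LangChain package is needed on the client side in this setup.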