Doctran is a Python package that uses LLMs and open-source
NLP libraries to transform raw text into clean, structured, information-dense documents
optimized for vector space retrieval. You can think of Doctran
as a black box: messy strings go in, and clean, labelled strings come out.
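
To make the "messy strings in, clean labelled strings out" idea concrete, here is a minimal sketch using only the Python standard library. It is not Doctran's API and does not call an LLM: the names `LabelledDocument`, `toy_transform`, and `EMAIL_RE` are hypothetical, and a whitespace cleanup plus a simple email regex stand in for the real transformations.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LabelledDocument:
    """Hypothetical container (not a Doctran class): cleaned text plus labels."""
    content: str
    metadata: dict = field(default_factory=dict)


# Deliberately simple email pattern, used only for illustration.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def toy_transform(raw: str) -> LabelledDocument:
    """Toy stand-in for the black box: clean the string, then label it."""
    # Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    cleaned = re.sub(r"\s+", " ", raw).strip()
    # Attach a label: any email addresses found in the cleaned text.
    emails = EMAIL_RE.findall(cleaned)
    return LabelledDocument(content=cleaned, metadata={"emails": emails})


doc = toy_transform("  Contact   us at\n\n support@example.com   for help. ")
print(doc.content)
print(doc.metadata)
```

The output pairs the cleaned text with a small metadata dictionary, which is the general shape a retrieval pipeline can index. In this sketch the labels come from a regex, not from an LLM or an NLP library.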