PyMuPDF
document loader. For detailed documentation of all PyMuPDFLoader features and configurations, head to the API reference.
Class | Package | Local | Serializable | JS support |
---|---|---|---|---|
PyMuPDFLoader | langchain-community | ✅ | ❌ | ❌ |
Source | Document Lazy Loading | Native Async Support | Extract Images | Extract Tables |
---|---|---|---|---|
PyMuPDFLoader | ✅ | ❌ | ✅ | ✅ |
You can use `open` to read the binary content of either a PDF or a markdown file, but you need different parsing logic to convert that binary data into text.
As a result, it can be helpful to decouple the parsing logic from the loading logic, which makes it easier to re-use a given parser regardless of how the data was loaded.
You can use this strategy to analyze different files with the same parsing parameters.
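The separation between loading and parsing can be illustrated with a minimal standard-library sketch. This is not LangChain's API: `load_bytes`, `parse_markdown`, `parse_plain_text`, `PARSERS`, and `load_and_parse` are hypothetical names chosen for illustration, and plain text stands in for PDF so the example runs without extra dependencies. A PDF parser would presumably slot into the same table.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


def load_bytes(path: Path) -> bytes:
    """Loading step: read raw bytes. Knows nothing about the file format."""
    with open(path, "rb") as f:
        return f.read()


def parse_markdown(data: bytes) -> str:
    """Parsing step for markdown: decode and drop leading '#' heading markers."""
    lines = data.decode("utf-8").splitlines()
    return "\n".join(line.lstrip("#").strip() for line in lines)


def parse_plain_text(data: bytes) -> str:
    """Parsing step for plain text: decode and trim surrounding whitespace."""
    return data.decode("utf-8").strip()


# Hypothetical registry mapping a file suffix to its parser.
# The loader is shared; only the parser changes per format.
PARSERS: Dict[str, Callable[[bytes], str]] = {
    ".md": parse_markdown,
    ".txt": parse_plain_text,
}


def load_and_parse(path: Path) -> str:
    """Combine the format-agnostic loader with the format-specific parser."""
    parser = PARSERS[path.suffix]
    return parser(load_bytes(path))


# Demo: two files in different formats go through the same loading code.
with tempfile.TemporaryDirectory() as tmp:
    md_path = Path(tmp) / "notes.md"
    md_path.write_bytes(b"# Title\nBody text\n")
    txt_path = Path(tmp) / "notes.txt"
    txt_path.write_bytes(b"  plain body  \n")

    md_text = load_and_parse(md_path)
    txt_text = load_and_parse(txt_path)
```

Because the parser only receives bytes, the same parser can be reused whether those bytes came from a local file, a network response, or an in-memory buffer.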
For detailed documentation of all `PyMuPDFLoader` features and configurations, head to the API reference: https://python.langchain.com/api_reference/community/document_loaders/langchain_community.document_loaders.pdf.PyMuPDFLoader.html