DocxLoader allows you to extract text data from Microsoft Word documents. It supports both the modern .docx format and the legacy .doc format. Depending on the file type, additional dependencies are required.

To use DocxLoader, you’ll need the @langchain/community integration along with either the mammoth or the word-extractor package:
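mammoth: For processing .docx files.
word-extractor: For handling .doc files.

For .docx Files
Install @langchain/community together with mammoth.

For .doc Files
Install @langchain/community together with word-extractor.

Loading .docx Files
For .docx files, there is no need to explicitly specify any parameters when initializing the loader.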
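
As a minimal sketch of the default case, the snippet below wraps the loader in a small function. It assumes that mammoth is installed and that DocxLoader is imported from @langchain/community/document_loaders/fs/docx. The loadDocx helper and the ./example.docx path are hypothetical names used only for illustration.

```typescript
import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";

// Hypothetical helper: load a .docx file and return the resulting documents.
async function loadDocx(filePath: string) {
  // No options are passed, so the loader treats the file as .docx.
  const loader = new DocxLoader(filePath);
  return loader.load();
}

// Example call (the path is a placeholder, not a file shipped with the library):
// const docs = await loadDocx("./example.docx");
// console.log(docs[0]?.pageContent);
```

Each returned document should carry the extracted text in its pageContent field. Nothing is read from disk until load() is called.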

Loading .doc Files
For .doc files, you must explicitly set the type to doc when initializing the loader.
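
The explicit type can be sketched as follows, assuming word-extractor is installed and the same import path as for .docx files. The wordTypeFor and loadWordFile helpers are hypothetical and not part of the library. They only show one way to derive the type from the file extension, so that a .doc path results in { type: "doc" } being passed to the loader.

```typescript
import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";

// Hypothetical helper: choose the loader type from the file extension.
function wordTypeFor(filePath: string): "doc" | "docx" {
  return filePath.toLowerCase().endsWith(".doc") ? "doc" : "docx";
}

// Hypothetical helper: load either kind of Word file.
async function loadWordFile(filePath: string) {
  // For a .doc path this is equivalent to:
  //   new DocxLoader(filePath, { type: "doc" })
  const loader = new DocxLoader(filePath, { type: wordTypeFor(filePath) });
  return loader.load();
}

// Example call (placeholder path):
// const docs = await loadWordFile("./legacy.doc");
```

Deriving the type from the extension is a convenience for mixed folders of Word files. When the format is known in advance, passing { type: "doc" } directly is simpler.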