DocxLoader allows you to extract text data from Microsoft Word documents. It supports both the modern .docx format and the legacy .doc format. Depending on the file type, additional dependencies are required.
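The loader configuration and the extra package you need both follow from the file extension, so a small dispatch function can hide that choice. The sketch below assumes Node.js with @langchain/community installed. createWordLoader is a hypothetical helper, not part of LangChain, and it relies on the loader's type option described under Usage.

```typescript
import { extname } from "node:path";
import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";

// Hypothetical helper (not a LangChain API): pick the DocxLoader
// configuration from the file extension.
//   .doc  -> type "doc" (parsed with word-extractor)
//   .docx -> default settings (parsed with mammoth)
export function createWordLoader(filePath: string): DocxLoader {
  const ext = extname(filePath).toLowerCase();
  if (ext === ".doc") {
    return new DocxLoader(filePath, { type: "doc" });
  }
  if (ext === ".docx") {
    return new DocxLoader(filePath);
  }
  // Anything else is rejected early instead of failing inside the parser.
  throw new Error(`Unsupported Word file extension: ${ext || "(none)"}`);
}
```

Constructing the loader does not read the file; the file is only opened when load() is called, so the helper can run before the file exists.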
Setup
To use DocxLoader, you’ll need the @langchain/community integration along with either the mammoth or the word-extractor package:
mammoth: for processing .docx files.
word-extractor: for handling .doc files.
Installation
For .docx Files
npm install @langchain/community @langchain/core mammoth
For .doc Files
npm install @langchain/community @langchain/core word-extractor
Usage
Loading .docx Files
For .docx files, there is no need to explicitly specify any parameters when initializing the loader.
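A minimal sketch of loading a .docx file, assuming @langchain/community and mammoth are installed. The file path and the loadDocxText function are hypothetical placeholders; only DocxLoader and its load() method come from the library.

```typescript
import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";

// Placeholder path (assumption): replace with a real .docx file.
// No options are passed, so the loader treats the file as .docx
// and extracts its text with mammoth.
export const docxLoader = new DocxLoader("path/to/example.docx");

// Hypothetical wrapper: load() resolves to an array of Document objects,
// and the extracted text lives in each document's pageContent.
export async function loadDocxText(): Promise<string> {
  const docs = await docxLoader.load();
  return docs.map((doc) => doc.pageContent).join("\n");
}
```

Calling await loadDocxText() then returns the extracted text as one string; keep the array returned by load() instead if you want to pass the documents on to a splitter or vector store.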
Loading .doc Files
For .doc files, you must explicitly set the type option to "doc" when initializing the loader.
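A minimal sketch of the .doc case, assuming @langchain/community and word-extractor are installed. The file path and loadDocText are hypothetical placeholders; the only difference from the .docx case is the { type: "doc" } options object passed to the constructor.

```typescript
import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";

// Placeholder path (assumption): replace with a real legacy .doc file.
// type: "doc" switches the loader to its .doc parsing path,
// which relies on the word-extractor package.
export const docLoader = new DocxLoader("path/to/example.doc", {
  type: "doc",
});

// Hypothetical wrapper: join the pageContent of every loaded Document.
export async function loadDocText(): Promise<string> {
  const docs = await docLoader.load();
  return docs.map((doc) => doc.pageContent).join("\n");
}
```

Omitting the options object for a .doc file would make the loader treat it as .docx, so the type has to be set whenever the input is in the legacy format.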