Docx files

The DocxLoader extracts text from Microsoft Word documents. It supports both the modern .docx format and the legacy .doc format, and each format requires its own additional dependency.

Setup

To use DocxLoader, you’ll need the @langchain/community integration package along with either the mammoth or the word-extractor package:

mammoth: For processing .docx files.
word-extractor: For handling .doc files.

Installation

For `.docx` Files

npm install @langchain/community @langchain/core mammoth

For `.doc` Files

npm install @langchain/community @langchain/core word-extractor

Usage

Loading `.docx` Files

For .docx files, there is no need to explicitly specify any parameters when initializing the loader:

import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";

const loader = new DocxLoader(
  "src/document_loaders/tests/example_data/attention.docx"
);

const docs = await loader.load();
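
Once `load()` resolves, it can help to take a quick look at what came back. The following is a minimal sketch, not part of the library: `LoadedDoc` is a local stand-in that assumes each returned document exposes a `pageContent` string and a `metadata` object, and `previewDocs` is a hypothetical helper name.

```typescript
// Local stand-in for the document shape (assumption: each loaded document
// exposes `pageContent` and `metadata`). Not imported from any library.
interface LoadedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: builds a one-line preview for each loaded document.
function previewDocs(docs: LoadedDoc[], maxChars = 80): string[] {
  return docs.map((doc, i) => {
    // Collapse runs of whitespace so the preview fits on one line.
    const text = doc.pageContent.replace(/\s+/g, " ").trim();
    const preview =
      text.length > maxChars ? `${text.slice(0, maxChars)}...` : text;
    return `[${i}] ${text.length} chars: ${preview}`;
  });
}
```

Under those assumptions, `previewDocs(docs).forEach((line) => console.log(line));` would print one summary line per loaded document.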

Loading `.doc` Files

For .doc files, you must explicitly specify the type as doc when initializing the loader:

import { DocxLoader } from "@langchain/community/document_loaders/fs/docx";

const loader = new DocxLoader(
  "src/document_loaders/tests/example_data/attention.doc",
  {
    type: "doc",
  }
);

const docs = await loader.load();
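
When file paths arrive at runtime, the choice between the two initializations can be derived from the file extension. The sketch below is one possible approach, not part of @langchain/community: `docxLoaderOptionsFor` and `DocOptions` are hypothetical names, and the helper returns `undefined` for .docx paths on the assumption that omitting the options falls back to the loader's default .docx handling.

```typescript
// Hypothetical options type mirroring the `{ type: "doc" }` object shown above.
type DocOptions = { type: "doc" };

// Hypothetical helper: picks loader options from the file extension.
// Returns `{ type: "doc" }` for legacy .doc files and `undefined` for .docx
// files (assumption: no options means the default .docx handling applies).
function docxLoaderOptionsFor(filePath: string): DocOptions | undefined {
  const lower = filePath.toLowerCase();
  if (lower.endsWith(".docx")) return undefined;
  if (lower.endsWith(".doc")) return { type: "doc" };
  throw new Error(`Unsupported Word file extension: ${filePath}`);
}
```

A call such as `new DocxLoader(filePath, docxLoaderOptionsFor(filePath))` would then cover both formats, assuming the constructor treats an `undefined` second argument the same as an omitted one.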
