The MetadataTagger document transformer automates this process by extracting metadata from each document according to a schema you provide. It uses a configurable OpenAI Functions-powered chain under the hood, so if you pass a custom LLM instance, it must be an OpenAI model with functions support.
Note: This document transformer works best with complete documents, so run it on whole documents before doing any splitting or other processing.
Usage
For example, let’s say you wanted to index a set of movie reviews. You could initialize the document transformer with a schema that describes the metadata to extract from each review.
There is also a createMetadataTagger method that accepts a valid JSON Schema object.
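To make the schema-driven flow concrete, here is a minimal, dependency-free sketch of what tagging a set of movie reviews amounts to. It does not call the library: the field names (movie_title, critic, tone), the Doc shape, and the helpers tagDocuments, conformsToSchema, and stubExtractor are all hypothetical, and the stub stands in for the LLM call that the real transformer would make.

```typescript
// Hypothetical sketch: none of these names come from the library itself.

// A JSON Schema object describing the metadata to extract from each review.
// The field names are assumptions chosen for illustration.
const reviewSchema = {
  type: "object",
  properties: {
    movie_title: { type: "string" },
    critic: { type: "string" },
    tone: { type: "string", enum: ["positive", "negative"] },
  },
  required: ["movie_title", "critic", "tone"],
} as const;

// Simplified document shape: text plus free-form metadata.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Stand-in for the model call: maps document text to extracted fields.
type Extractor = (text: string) => Record<string, unknown>;

// Shallow check: every required key is present and no undeclared key appears.
function conformsToSchema(extracted: Record<string, unknown>): boolean {
  const allowed: string[] = Object.keys(reviewSchema.properties);
  const hasRequired = reviewSchema.required.every((key) => key in extracted);
  const onlyDeclared = Object.keys(extracted).every((key) => allowed.includes(key));
  return hasRequired && onlyDeclared;
}

// Extract metadata for each document and merge it into the existing metadata.
// In this sketch, extracted fields overwrite existing keys with the same name;
// that precedence is a choice made here, not confirmed library behavior.
function tagDocuments(docs: Doc[], extract: Extractor): Doc[] {
  return docs.map((doc) => {
    const extracted = extract(doc.pageContent);
    if (!conformsToSchema(extracted)) {
      throw new Error("Extracted metadata does not match the schema");
    }
    return { ...doc, metadata: { ...doc.metadata, ...extracted } };
  });
}

// Stub extractor: a keyword rule used only so the sketch runs offline.
const stubExtractor: Extractor = (text) => ({
  movie_title: "Untitled",
  critic: "Anonymous",
  tone: /loved|great|excellent/i.test(text) ? "positive" : "negative",
});

const reviews: Doc[] = [
  { pageContent: "I loved every minute of it.", metadata: { source: "review-1" } },
  { pageContent: "Two hours I will never get back.", metadata: { source: "review-2" } },
];

const taggedReviews = tagDocuments(reviews, stubExtractor);
console.log(JSON.stringify(taggedReviews, null, 2));
```

With the real transformer, the schema would be handed to the library and the extraction step would be performed by the OpenAI Functions-powered chain rather than a keyword rule.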
Customization
You can pass the underlying tagging chain the standard LLMChain arguments in the second options parameter. For example, if you wanted to ask the LLM to focus on specific details in the input documents, or extract metadata in a certain style, you could pass in a custom prompt.
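As a rough illustration of what a custom tagging prompt might contain, the sketch below defines a template and renders it for one document. The template wording, the {input} placeholder name, and the fillTemplate helper are assumptions made for this example; the source does not state which variables the tagging chain's prompt expects, so check the library before relying on them.

```typescript
// Hypothetical sketch: the template text, the {input} placeholder, and
// fillTemplate are illustrative assumptions, not the library's API.

// A custom instruction that steers the model toward specific details.
const taggingTemplate = `Extract the desired information from the following passage.
Focus on the critic's overall verdict and ignore plot summaries.

Passage:
{input}
`;

// Replace each {name} placeholder with the matching value, failing loudly
// when a placeholder has no value so that typos are caught early.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, name: string) => {
    const value = values[name];
    if (value === undefined) {
      throw new Error(`Missing value for placeholder "${name}"`);
    }
    return value;
  });
}

// Render the prompt for a single review to inspect what the model would see.
const renderedPrompt = fillTemplate(taggingTemplate, {
  input: "A tense, beautifully shot thriller that overstays its welcome.",
});
console.log(renderedPrompt);
```

Rendering the template by hand like this is only a way to preview the text. In practice the template would be wrapped in the library's prompt object and supplied through the second options parameter described above.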

