Middleware provides a way to more tightly control what happens inside the agent. The core agent loop involves calling a model, letting it choose tools to execute, and then finishing when it calls no more tools:
Core agent loop diagram
Middleware exposes hooks before and after each of those steps:
Middleware flow diagram

What can middleware do?

Monitor

Track agent behavior with logging, analytics, and debugging

Modify

Transform prompts, tool selection, and output formatting

Control

Add retries, fallbacks, and early termination logic

Enforce

Apply rate limits, guardrails, and PII detection
Add middleware by passing it to createAgent:
import {
  createAgent,
  summarizationMiddleware,
  humanInTheLoopMiddleware,
} from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [...],
  middleware: [
    summarizationMiddleware({ model: "openai:gpt-4o-mini" }),
    humanInTheLoopMiddleware({ interruptOn: { /* ... */ } }),
  ],
});

Built-in middleware

LangChain provides prebuilt middleware for common use cases:

Summarization

Automatically summarize conversation history when approaching token limits.
Perfect for:
  • Long-running conversations that exceed context windows
  • Multi-turn dialogues with extensive history
  • Applications where preserving full conversation context matters
import { createAgent, summarizationMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [weatherTool, calculatorTool],
  middleware: [
    summarizationMiddleware({
      model: "openai:gpt-4o-mini",
      maxTokensBeforeSummary: 4000, // Trigger summarization at 4000 tokens
      messagesToKeep: 20, // Keep last 20 messages after summary
      summaryPrompt: "Custom prompt for summarization...", // Optional
    }),
  ],
});
  • model (string, required): Model for generating summaries
  • maxTokensBeforeSummary (number): Token threshold for triggering summarization
  • messagesToKeep (number, default: 20): Recent messages to preserve after summarization
  • tokenCounter (function): Custom token counting function. Defaults to character-based counting.
  • summaryPrompt (string): Custom prompt template. Uses the built-in template if not specified.
  • summaryPrefix (string, default: "## Previous conversation summary:"): Prefix for summary messages
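As a sketch of what a custom tokenCounter might look like (the 4-characters-per-token ratio below is an assumption for illustration, not the library's exact default):

```typescript
interface SimpleMessage {
  content: string;
}

// Rough character-based count: ~4 characters per token is a common
// approximation for English text (an assumption, not the library's default).
const approxTokenCounter = (messages: SimpleMessage[]): number => {
  const chars = messages.reduce((total, m) => total + m.content.length, 0);
  return Math.ceil(chars / 4);
};
```

A counter like this would be passed via the tokenCounter option, e.g. summarizationMiddleware({ model: "openai:gpt-4o-mini", tokenCounter: approxTokenCounter }).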

Human-in-the-loop

Pause agent execution for human approval, editing, or rejection of tool calls before they execute.
Perfect for:
  • High-stakes operations requiring human approval (database writes, financial transactions)
  • Compliance workflows where human oversight is mandatory
  • Long-running conversations where human feedback is used to guide the agent
import { createAgent, humanInTheLoopMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [readEmailTool, sendEmailTool],
  middleware: [
    humanInTheLoopMiddleware({
      interruptOn: {
        // Require approval, editing, or rejection for sending emails
        send_email: {
          allowAccept: true,
          allowEdit: true,
          allowRespond: true,
        },
        // Auto-approve reading emails
        read_email: false,
      }
    })
  ]
});
  • interruptOn (object, required): Mapping of tool names to approval configs
Tool approval config options:
  • allowAccept (boolean, default: false): Whether approval is allowed
  • allowEdit (boolean, default: false): Whether editing is allowed
  • allowRespond (boolean, default: false): Whether responding/rejection is allowed
Important: Human-in-the-loop middleware requires a checkpointer to maintain state across interruptions. See the human-in-the-loop documentation for complete examples and integration patterns.

Anthropic prompt caching

Reduce costs by caching repetitive prompt prefixes with Anthropic models.
Perfect for:
  • Applications with long, repeated system prompts
  • Agents that reuse the same context across invocations
  • Reducing API costs for high-volume deployments
Learn more about Anthropic Prompt Caching strategies and limitations.
import { createAgent, HumanMessage, anthropicPromptCachingMiddleware } from "langchain";

const LONG_PROMPT = `
Please be a helpful assistant.

<Lots more context ...>
`;

const agent = createAgent({
  model: "anthropic:claude-sonnet-4-latest",
  prompt: LONG_PROMPT,
  middleware: [anthropicPromptCachingMiddleware({ ttl: "5m" })],
});

// cache store
await agent.invoke({
  messages: [new HumanMessage("Hi, my name is Bob")]
});

// cache hit, system prompt is cached
const result = await agent.invoke({
  messages: [new HumanMessage("What's my name?")]
});
  • ttl (string, default: "5m"): Time to live for cached content. Valid values: "5m" or "1h".

Model call limit

Limit the number of model calls to prevent infinite loops or excessive costs.
Perfect for:
  • Preventing runaway agents from making too many API calls
  • Enforcing cost controls on production deployments
  • Testing agent behavior within specific call budgets
import { createAgent, modelCallLimitMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [...],
  middleware: [
    modelCallLimitMiddleware({
      threadLimit: 10, // Max 10 calls per thread (across runs)
      runLimit: 5, // Max 5 calls per run (single invocation)
      exitBehavior: "end", // Or "error" to throw exception
    }),
  ],
});
  • threadLimit (number): Maximum model calls across all runs in a thread. Defaults to no limit.
  • runLimit (number): Maximum model calls per single invocation. Defaults to no limit.
  • exitBehavior (string, default: "end"): Behavior when a limit is reached. Options: "end" (graceful termination) or "error" (throw an exception).
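The limit check itself amounts to a simple counter comparison. The sketch below illustrates the semantics described above (it is not the library's implementation; threadCount would persist across runs via a checkpointer, while runCount resets each invocation):

```typescript
type ExitBehavior = "end" | "error";

interface LimitOptions {
  threadLimit?: number;
  runLimit?: number;
  exitBehavior?: ExitBehavior;
}

// Returns "continue" while under both limits, "end" for graceful
// termination, or throws when exitBehavior is "error".
const checkCallLimit = (
  threadCount: number,
  runCount: number,
  opts: LimitOptions
): "continue" | "end" => {
  const exceeded =
    (opts.threadLimit !== undefined && threadCount >= opts.threadLimit) ||
    (opts.runLimit !== undefined && runCount >= opts.runLimit);
  if (!exceeded) return "continue";
  if ((opts.exitBehavior ?? "end") === "error") {
    throw new Error("Model call limit exceeded");
  }
  return "end";
};
```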

Tool call limit

Limit the number of tool calls to specific tools or all tools.
Perfect for:
  • Preventing excessive calls to expensive external APIs
  • Limiting web searches or database queries
  • Enforcing rate limits on specific tool usage
import { createAgent, toolCallLimitMiddleware } from "langchain";

// Limit all tool calls
const globalLimiter = toolCallLimitMiddleware({ threadLimit: 20, runLimit: 10 });

// Limit specific tool
const searchLimiter = toolCallLimitMiddleware({
  toolName: "search",
  threadLimit: 5,
  runLimit: 3,
});

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [...],
  middleware: [globalLimiter, searchLimiter],
});
  • toolName (string): Specific tool to limit. If not provided, limits apply to all tools.
  • threadLimit (number): Maximum tool calls across all runs in a thread. Defaults to no limit.
  • runLimit (number): Maximum tool calls per single invocation. Defaults to no limit.
  • exitBehavior (string, default: "end"): Behavior when a limit is reached. Options: "end" (graceful termination) or "error" (throw an exception).

Model fallback

Automatically fallback to alternative models when the primary model fails.
Perfect for:
  • Building resilient agents that handle model outages
  • Cost optimization by falling back to cheaper models
  • Provider redundancy across OpenAI, Anthropic, etc.
import { createAgent, modelFallbackMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o", // Primary model
  tools: [...],
  middleware: [
    modelFallbackMiddleware(
      "openai:gpt-4o-mini", // Try first on error
      "anthropic:claude-3-5-sonnet-20241022" // Then this
    ),
  ],
});
The middleware accepts a variable number of string arguments representing fallback models in order:
  • ...models (string[], required): One or more fallback model strings to try in order when the primary model fails
modelFallbackMiddleware(
  "first-fallback-model",
  "second-fallback-model",
  // ... more models
)

PII detection

Detect and handle Personally Identifiable Information in conversations.
Perfect for:
  • Healthcare and financial applications with compliance requirements
  • Customer service agents that need to sanitize logs
  • Any application handling sensitive user data
import { createAgent, piiRedactionMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [...],
  middleware: [
    // Redact emails in user input
    piiRedactionMiddleware({
      piiType: "email",
      strategy: "redact",
      applyToInput: true,
    }),
    // Mask credit cards (show last 4 digits)
    piiRedactionMiddleware({
      piiType: "credit_card",
      strategy: "mask",
      applyToInput: true,
    }),
    // Custom PII type with regex
    piiRedactionMiddleware({
      piiType: "api_key",
      detector: /sk-[a-zA-Z0-9]{32}/,
      strategy: "block", // Throw error if detected
    }),
  ],
});
  • piiType (string, required): Type of PII to detect. Can be a built-in type (email, credit_card, ip, mac_address, url) or a custom type name.
  • strategy (string, default: "redact"): How to handle detected PII. Options:
    • "block": Throw an error when detected
    • "redact": Replace with [REDACTED_TYPE]
    • "mask": Partially mask (e.g., ****-****-****-1234)
    • "hash": Replace with a deterministic hash
  • detector (RegExp): Custom detector regex pattern. If not provided, uses the built-in detector for the PII type.
  • applyToInput (boolean, default: true): Check user messages before the model call
  • applyToOutput (boolean, default: false): Check AI messages after the model call
  • applyToToolResults (boolean, default: false): Check tool result messages after execution
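The four strategies can be illustrated with a plain regex over a simplified credit-card pattern (the library ships its own detectors; this is only a sketch of the behaviors described above):

```typescript
import { createHash } from "node:crypto";

// Simplified pattern for illustration only.
const CARD_RE = /\b\d{4}-\d{4}-\d{4}-\d{4}\b/g;

// "redact": replace with [REDACTED_TYPE]
const redact = (text: string) =>
  text.replace(CARD_RE, "[REDACTED_CREDIT_CARD]");

// "mask": keep only the last four digits
const mask = (text: string) =>
  text.replace(CARD_RE, (m) => `****-****-****-${m.slice(-4)}`);

// "hash": deterministic replacement, so equal values map to equal hashes
const hash = (text: string) =>
  text.replace(CARD_RE, (m) =>
    createHash("sha256").update(m).digest("hex").slice(0, 8)
  );

// "block": refuse to continue when PII is present
const block = (text: string): string => {
  if (new RegExp(CARD_RE.source).test(text)) {
    throw new Error("PII detected: credit_card");
  }
  return text;
};
```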

Planning

Add todo list management capabilities for complex multi-step tasks.
This middleware automatically provides agents with a write_todos tool and system prompts to guide effective task planning.
import { createAgent, HumanMessage, todoListMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [
    /* ... */
  ],
  middleware: [todoListMiddleware()] as const,
});

const result = await agent.invoke({
  messages: [new HumanMessage("Help me refactor my codebase")],
});
console.log(result.todos); // Array of todo items with status tracking
No configuration options available (uses defaults).

LLM tool selector

Use an LLM to intelligently select relevant tools before calling the main model.
Perfect for:
  • Agents with many tools (10+) where most aren’t relevant per query
  • Reducing token usage by filtering irrelevant tools
  • Improving model focus and accuracy
import { createAgent, llmToolSelectorMiddleware } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [tool1, tool2, tool3, tool4, tool5, ...], // Many tools
  middleware: [
    llmToolSelectorMiddleware({
      model: "openai:gpt-4o-mini", // Use cheaper model for selection
      maxTools: 3, // Limit to 3 most relevant tools
      alwaysInclude: ["search"], // Always include certain tools
    }),
  ],
});
  • model (string): Model for tool selection. Defaults to the agent’s main model.
  • maxTools (number): Maximum number of tools to select. Defaults to no limit.
  • alwaysInclude (string[]): Array of tool names to always include in the selection

Context editing

Manage conversation context by trimming, summarizing, or clearing tool uses.
Perfect for:
  • Long conversations that need periodic context cleanup
  • Removing failed tool attempts from context
  • Custom context management strategies
import { createAgent, contextEditingMiddleware, ClearToolUsesEdit } from "langchain";

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [...],
  middleware: [
    contextEditingMiddleware({
      edits: [
        new ClearToolUsesEdit({ maxTokens: 1000 }), // Clear old tool uses
      ],
    }),
  ],
});
  • edits (ContextEdit[], default: [new ClearToolUsesEdit()]): Array of ContextEdit strategies to apply
ClearToolUsesEdit options:
  • maxTokens (number, default: 1000): Token count that triggers the edit

Custom middleware

Build custom middleware by implementing hooks that run at specific points in the agent execution flow.

Two hook styles

Node-style hooks

Run sequentially at specific execution points. Use for logging, validation, and state updates.

Wrap-style hooks

Intercept execution with full control over handler calls. Use for retries, caching, and transformation.

Node-style hooks

Run at specific points in the execution flow:
  • beforeAgent - Before agent starts (once per invocation)
  • beforeModel - Before each model call
  • afterModel - After each model response
  • afterAgent - After agent completes (up to once per invocation)
Example: Logging middleware
import { createMiddleware } from "langchain";

const loggingMiddleware = createMiddleware({
  name: "LoggingMiddleware",
  beforeModel: (state) => {
    console.log(`About to call model with ${state.messages.length} messages`);
    return;
  },
  afterModel: (state) => {
    const lastMessage = state.messages[state.messages.length - 1];
    console.log(`Model returned: ${lastMessage.content}`);
    return;
  },
});
Example: Conversation length limit
import { createMiddleware, AIMessage } from "langchain";

const createMessageLimitMiddleware = (maxMessages: number = 50) => {
  return createMiddleware({
    name: "MessageLimitMiddleware",
    beforeModel: (state) => {
      if (state.messages.length >= maxMessages) {
        return {
          messages: [new AIMessage("Conversation limit reached.")],
          jumpTo: "end",
        };
      }
      return;
    },
  });
};

Wrap-style hooks

Intercept execution and control when the handler is called:
  • wrapModelCall - Around each model call
  • wrapToolCall - Around each tool call
You decide if the handler is called zero times (short-circuit), once (normal flow), or multiple times (retry logic).
Example: Model retry middleware
import { createMiddleware } from "langchain";

const createRetryMiddleware = (maxRetries: number = 3) => {
  return createMiddleware({
    name: "RetryMiddleware",
    wrapModelCall: async (request, handler) => {
      for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
          return await handler(request);
        } catch (e) {
          if (attempt === maxRetries - 1) {
            throw e;
          }
          console.log(`Retry ${attempt + 1}/${maxRetries} after error: ${e}`);
        }
      }
      throw new Error("Unreachable");
    },
  });
};
Example: Dynamic model selection
import { createMiddleware, initChatModel } from "langchain";

const dynamicModelMiddleware = createMiddleware({
  name: "DynamicModelMiddleware",
  wrapModelCall: async (request, handler) => {
    // Use a different model based on conversation length
    const modifiedRequest = { ...request };
    if (request.messages.length > 10) {
      modifiedRequest.model = await initChatModel("openai:gpt-4o");
    } else {
      modifiedRequest.model = await initChatModel("openai:gpt-4o-mini");
    }
    return handler(modifiedRequest);
  },
});
Example: Tool call monitoring
import { createMiddleware } from "langchain";

const toolMonitoringMiddleware = createMiddleware({
  name: "ToolMonitoringMiddleware",
  wrapToolCall: async (request, handler) => {
    console.log(`Executing tool: ${request.toolCall.name}`);
    console.log(`Arguments: ${JSON.stringify(request.toolCall.args)}`);

    try {
      const result = await handler(request);
      console.log("Tool completed successfully");
      return result;
    } catch (e) {
      console.log(`Tool failed: ${e}`);
      throw e;
    }
  },
});

Custom state schema

Middleware can extend the agent’s state with custom properties. Define a custom state schema and set it as the middleware’s stateSchema:
import { createMiddleware, createAgent, HumanMessage } from "langchain";
import * as z from "zod";

// Middleware with custom state requirements
const callCounterMiddleware = createMiddleware({
  name: "CallCounterMiddleware",
  stateSchema: z.object({
    modelCallCount: z.number().default(0),
    userId: z.string().optional(),
  }),
  beforeModel: (state) => {
    // Access custom state properties
    if (state.modelCallCount > 10) {
      return { jumpTo: "end" };
    }
    return;
  },
  afterModel: (state) => {
    // Update custom state
    return { modelCallCount: state.modelCallCount + 1 };
  },
});
const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [...],
  middleware: [callCounterMiddleware] as const,
});

// TypeScript enforces required state properties
const result = await agent.invoke({
  messages: [new HumanMessage("Hello")],
  modelCallCount: 0, // Optional due to default value
  userId: "user-123", // Optional
});

Context extension

Context properties are configuration values passed through the runnable config. Unlike state, context is read-only and typically used for configuration that doesn’t change during execution. Middleware can define context requirements that must be satisfied through the agent’s configuration:
import * as z from "zod";
import { createMiddleware, HumanMessage } from "langchain";

const rateLimitMiddleware = createMiddleware({
  name: "RateLimitMiddleware",
  contextSchema: z.object({
    maxRequestsPerMinute: z.number(),
    apiKey: z.string(),
  }),
  beforeModel: async (state, runtime) => {
    // Access context through runtime
    const { maxRequestsPerMinute, apiKey } = runtime.context;

    // Implement rate limiting logic (checkRateLimit is an application-defined helper)
    const allowed = await checkRateLimit(apiKey, maxRequestsPerMinute);
    if (!allowed) {
      return { jumpTo: "end" };
    }

    return;
  },
});

// Context is provided through config
await agent.invoke(
  { messages: [new HumanMessage("Process data")] },
  {
    context: {
      maxRequestsPerMinute: 60,
      apiKey: "api-key-123",
    },
  }
);

Execution order

When using multiple middleware, understanding execution order is important:
const agent = createAgent({
  model: "openai:gpt-4o",
  middleware: [middleware1, middleware2, middleware3],
  tools: [...],
});
Before hooks run in order:
  1. middleware1.beforeAgent()
  2. middleware2.beforeAgent()
  3. middleware3.beforeAgent()
Agent loop starts:
  1. middleware1.beforeModel()
  2. middleware2.beforeModel()
  3. middleware3.beforeModel()
Wrap hooks nest like function calls:
  middleware1.wrapModelCall() → middleware2.wrapModelCall() → middleware3.wrapModelCall() → model
After hooks run in reverse order:
  1. middleware3.afterModel()
  2. middleware2.afterModel()
  3. middleware1.afterModel()
Agent loop ends:
  1. middleware3.afterAgent()
  2. middleware2.afterAgent()
  3. middleware1.afterAgent()
Key rules:
  • before hooks (beforeAgent, beforeModel): run first to last
  • after hooks (afterModel, afterAgent): run last to first (reverse)
  • wrap hooks (wrapModelCall, wrapToolCall): nest, with the first middleware wrapping all others
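These ordering rules can be demonstrated with plain functions standing in for hooks (a simulation of the control flow, not the library's internals):

```typescript
type Handler = () => string;

const order: string[] = [];

// Three middleware, each recording when its hooks fire.
const middlewares = ["m1", "m2", "m3"].map((name) => ({
  before: () => order.push(`${name}.beforeModel`),
  after: () => order.push(`${name}.afterModel`),
  wrap: (handler: Handler): string => {
    order.push(`${name}.wrapModelCall:enter`);
    const result = handler();
    order.push(`${name}.wrapModelCall:exit`);
    return result;
  },
}));

// before hooks: first to last
middlewares.forEach((m) => m.before());

// wrap hooks: compose so the first middleware is outermost
const callModel: Handler = middlewares.reduceRight<Handler>(
  (next, m) => () => m.wrap(next),
  () => {
    order.push("model");
    return "response";
  }
);
callModel();

// after hooks: last to first
[...middlewares].reverse().forEach((m) => m.after());
```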

Agent jumps

To exit early from middleware, return an object with jumpTo:
import { createMiddleware, AIMessage } from "langchain";

const earlyExitMiddleware = createMiddleware({
  name: "EarlyExitMiddleware",
  beforeModel: (state) => {
    // Check some condition
    if (shouldExit(state)) {
      return {
        messages: [new AIMessage("Exiting early due to condition.")],
        jumpTo: "end",
      };
    }
    return;
  },
});
Available jump targets:
  • "end": Jump to the end of the agent execution
  • "tools": Jump to the tools node
  • "model": Jump to the model node (or the first beforeModel hook)
Important: When jumping from beforeModel or afterModel, jumping to "model" will cause all beforeModel middleware to run again:
import { createMiddleware } from "langchain";

const conditionalMiddleware = createMiddleware({
  name: "ConditionalMiddleware",
  afterModel: (state) => {
    if (someCondition(state)) {
      return { jumpTo: "end" };
    }
    return;
  },
});

Best practices

  1. Keep middleware focused - each should do one thing well
  2. Handle errors gracefully - don’t let middleware errors crash the agent
  3. Use appropriate hook types:
    • Node-style for sequential logic (logging, validation)
    • Wrap-style for control flow (retry, fallback, caching)
  4. Clearly document any custom state properties
  5. Unit test middleware independently before integrating
  6. Consider execution order - place critical middleware first in the list
  7. Use built-in middleware when possible; don’t reinvent the wheel :)

Examples

Dynamically selecting tools

Select relevant tools at runtime to improve performance and accuracy.
Benefits:
  • Shorter prompts - Reduce complexity by exposing only relevant tools
  • Better accuracy - Models choose correctly from fewer options
  • Permission control - Dynamically filter tools based on user access
import { createAgent, createMiddleware } from "langchain";

const toolSelectorMiddleware = createMiddleware({
  name: "ToolSelector",
  wrapModelCall: (request, handler) => {
    // Select a small, relevant subset of tools based on state/context
    const relevantTools = selectRelevantTools(request.state, request.runtime);
    const modifiedRequest = { ...request, tools: relevantTools };
    return handler(modifiedRequest);
  },
});

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: allTools, // All available tools need to be registered upfront
  // Middleware can be used to select a smaller subset that's relevant for the given run.
  middleware: [toolSelectorMiddleware],
});
