Choose the hooks your middleware needs. You can choose between node-style hooks and wrap-style hooks.

Node-style hooks run sequentially at specific execution points. Use them for logging, validation, and state updates:
| Hook | When it runs |
| --- | --- |
| beforeAgent | Before agent starts (once per invocation) |
| beforeModel | Before each model call |
| afterModel | After each model response |
| afterAgent | After agent completes (once per invocation) |
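To make the ordering in the table concrete, here is a small library-free sketch. It only models the order in which the hooks would fire for an invocation that makes a given number of model calls; `hookOrder` is a hypothetical helper for illustration, not part of the middleware API, and tool calls between model calls are left out.

```typescript
// Simplified model of the hook order described in the table above.
// This is an illustration, not the library's implementation.
type Hook = "beforeAgent" | "beforeModel" | "afterModel" | "afterAgent";

// Hypothetical helper: lists the hooks that would fire for one invocation
// that makes `modelCalls` model calls.
function hookOrder(modelCalls: number): Hook[] {
  const order: Hook[] = ["beforeAgent"]; // once per invocation
  for (let i = 0; i < modelCalls; i++) {
    order.push("beforeModel"); // before each model call
    order.push("afterModel"); // after each model response
  }
  order.push("afterAgent"); // once per invocation
  return order;
}

console.log(hookOrder(2).join(" -> "));
// beforeAgent -> beforeModel -> afterModel -> beforeModel -> afterModel -> afterAgent
```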
Wrap-style hooks run around each call, giving you control over execution. They intercept execution and control when the handler is called. Use them for retries, caching, and transformation.

You decide whether the handler is called zero times (short-circuit), once (normal flow), or multiple times (retry logic).

Available hooks: wrapModelCall (runs around each model call) and wrapToolCall (runs around each tool call).
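The three cases (zero, one, or several handler calls) can be sketched without the library. The example below is a simplified, synchronous model: the `Handler` and `Wrap` types and the `withRetry` and `withCache` helpers are hypothetical names introduced here, and real wrap-style hooks are typically async and receive library-specific request objects.

```typescript
// Simplified, synchronous model of a wrap-style hook.
// All names here are hypothetical and exist only for illustration.
type Handler<Req, Res> = (request: Req) => Res;
type Wrap<Req, Res> = (request: Req, handler: Handler<Req, Res>) => Res;

// Retry logic: the handler may be called several times.
function withRetry<Req, Res>(maxAttempts: number): Wrap<Req, Res> {
  return (request, handler) => {
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return handler(request); // returns on the first successful attempt
      } catch (err) {
        lastError = err; // remember the failure and try again
      }
    }
    throw lastError;
  };
}

// Short-circuit: on a cache hit the handler is called zero times.
function withCache<Res>(cache: Map<string, Res>): Wrap<string, Res> {
  return (request, handler) => {
    const hit = cache.get(request);
    if (hit !== undefined) return hit; // handler never runs
    const result = handler(request); // normal flow: exactly one call
    cache.set(request, result);
    return result;
  };
}

// A flaky stand-in for a model call that fails on its first invocation.
let calls = 0;
const flaky: Handler<string, string> = (prompt) => {
  calls++;
  if (calls === 1) throw new Error("transient failure");
  return `echo: ${prompt}`;
};

const retried = withRetry<string, string>(3)("hi", flaky);
console.log(retried, calls); // echo: hi 2

const cached = withCache(new Map<string, string>());
cached("hi", flaky); // cache miss: handler runs (calls becomes 3)
cached("hi", flaky); // cache hit: handler is skipped
console.log(calls); // 3
```

The wrapper, not the handler, owns the control flow, which is why retries and caching fit here rather than in node-style hooks.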
Both node-style and wrap-style hooks can update agent state. The mechanism differs:
Node-style hooks (beforeAgent, beforeModel, afterModel, afterAgent): Return an object directly. The object is applied to the agent state using the graph’s reducers.
Wrap-style hooks (wrapModelCall, wrapToolCall): Return a Command directly. For model calls, the Command injects state updates alongside the model response. Use these when you need to track or update state based on logic that runs during the model or tool call, such as summarization trigger points, usage metadata, or custom fields calculated from the request or response.
When multiple middleware layers return responses, the framework passes on the last AIMessages produced:
AIMessage flows through: Each middleware’s handler() receives the AIMessage from the previous layer. When a middleware returns an AIMessage, that becomes the input to the next middleware’s handler.
Command without message updates is pass-through: If a middleware returns a Command whose state update does not touch messages, the framework treats it as a no-op for message flow. The next middleware’s handler receives the AIMessage from the middleware before the one that returned the Command.
Reducer behavior and retry-safety: Commands still apply through reducers (messages additive, outer wins on conflicts). Retry logic discards commands from earlier calls.
    import * as z from "zod";
    import { createMiddleware } from "langchain";
    import { Command, StateSchema, ReducedValue } from "@langchain/langgraph";
    import { AIMessage, SystemMessage } from "@langchain/core/messages";

    /** Last-wins reducer: when both middleware write, outer overwrites inner. */
    const customMiddlewareStateSchema = new StateSchema({
      traceLayer: new ReducedValue(
        z.string().optional(),
        { reducer: (a, b) => b },
      ),
    });

    const outerMiddleware = createMiddleware({
      name: "OuterMiddleware",
      stateSchema: customMiddlewareStateSchema,
      wrapModelCall: async (_request, handler) => {
        await handler(_request);
        return new Command({
          update: {
            traceLayer: "outer",
            messages: [new SystemMessage({ content: "[Outer ran]" })],
          },
        });
      },
    });

    const innerMiddleware = createMiddleware({
      name: "InnerMiddleware",
      stateSchema: customMiddlewareStateSchema,
      wrapModelCall: async (_request, handler) => {
        await handler(_request);
        return new Command({
          update: {
            traceLayer: "inner",
            messages: [new SystemMessage({ content: "[Inner ran]" })],
          },
        });
      },
    });
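The following library-free sketch models how two such commands might combine. It assumes the inner command is applied before the outer one (which is what "outer wins" implies under a last-wins reducer) and represents messages as plain strings; `applyUpdate` and the `reducers` object are hypothetical stand-ins, not the framework's implementation, and the model's own AIMessage is left out.

```typescript
// Simplified model of reducer-based merging of middleware commands.
// Names and types are hypothetical; messages are plain strings here.
type Update = { traceLayer?: string; messages?: string[] };
type State = { traceLayer?: string; messages: string[] };

const reducers = {
  // Last-wins: a later write replaces an earlier one.
  traceLayer: (_current: string | undefined, next: string): string => next,
  // Additive: new messages are appended to the existing ones.
  messages: (current: string[], next: string[]): string[] => [...current, ...next],
};

// Apply one update to the state, field by field, through the reducers.
function applyUpdate(state: State, update: Update): State {
  return {
    traceLayer:
      update.traceLayer !== undefined
        ? reducers.traceLayer(state.traceLayer, update.traceLayer)
        : state.traceLayer,
    messages:
      update.messages !== undefined
        ? reducers.messages(state.messages, update.messages)
        : state.messages,
  };
}

// Assumption: the inner command is applied first, then the outer one.
const innerUpdate: Update = { traceLayer: "inner", messages: ["[Inner ran]"] };
const outerUpdate: Update = { traceLayer: "outer", messages: ["[Outer ran]"] };

const merged = [innerUpdate, outerUpdate].reduce<State>(applyUpdate, { messages: [] });
console.log(merged.traceLayer); // outer
console.log(merged.messages); // [ '[Inner ran]', '[Outer ran]' ]
```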
If your middleware needs to track state across hooks, it can extend the agent’s state with custom properties. This enables middleware to:
Track state across execution: Maintain counters, flags, or other values that persist throughout the agent’s execution lifecycle
Share data between hooks: Pass information from beforeModel to afterModel or between different middleware instances
Implement cross-cutting concerns: Add functionality like rate limiting, usage tracking, user context, or audit logging without modifying the core agent logic
Make conditional decisions: Use accumulated state to determine whether to continue execution, jump to different nodes, or modify behavior dynamically
State fields can be either public or private. Fields that start with an underscore (_) are considered private and will not be included in the agent’s result. Only public fields (those without a leading underscore) are returned.

This is useful for storing internal middleware state that shouldn’t be exposed to the caller, such as temporary tracking variables or internal flags:
    import { createAgent, createMiddleware, HumanMessage } from "langchain";
    import { StateSchema } from "@langchain/langgraph";
    import * as z from "zod";

    const PrivateState = new StateSchema({
      // Public field - included in invoke result
      publicCounter: z.number().default(0),
      // Private field - excluded from invoke result
      _internalFlag: z.boolean().default(false),
    });

    const middleware = createMiddleware({
      name: "ExampleMiddleware",
      stateSchema: PrivateState,
      afterModel: (state) => {
        // Both fields are accessible during execution
        if (state._internalFlag) {
          return { publicCounter: state.publicCounter + 1 };
        }
        return { _internalFlag: true };
      },
    });

    // Agent setup mirrors the context example in this section
    const agent = createAgent({
      model: "gpt-5.4",
      tools: [],
      middleware: [middleware],
    });

    const result = await agent.invoke({
      messages: [new HumanMessage("Hello")],
      publicCounter: 0,
    });

    // result only contains publicCounter, not _internalFlag
    console.log(result.publicCounter); // 0 after one model call; increments on later calls
    console.log(result._internalFlag); // undefined
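The visibility rule itself is easy to model in isolation. The sketch below is a minimal illustration of "drop underscore-prefixed keys from the result"; `publicFields` is a hypothetical helper, not a function the library exposes, and the framework's actual filtering may differ in detail.

```typescript
// Simplified model of the public/private rule: keys starting with "_"
// are treated as private and left out of the returned object.
// `publicFields` is a hypothetical helper for illustration only.
function publicFields(state: Record<string, unknown>): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const key of Object.keys(state)) {
    if (key.charAt(0) !== "_") {
      result[key] = state[key]; // keep public fields only
    }
  }
  return result;
}

const internalState = { publicCounter: 2, _internalFlag: true };
const visible = publicFields(internalState);
console.log(visible); // { publicCounter: 2 }
```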
Middleware can define a custom context schema to access per-invocation metadata. Unlike state, context is read-only and not persisted between invocations. This makes it ideal for:
User information: Pass user ID, roles, or preferences that don’t change during execution
Configuration overrides: Provide per-invocation settings like rate limits or feature flags
Tenant/workspace context: Include organization-specific data for multi-tenant applications
Request metadata: Pass request IDs, API keys, or other metadata needed by middleware
Define a context schema using Zod and access it via runtime.context in middleware hooks. Required fields in the context schema will be enforced at the TypeScript level, ensuring you must provide them when calling agent.invoke().
    import { createAgent, createMiddleware, HumanMessage } from "langchain";
    import * as z from "zod";

    const contextSchema = z.object({
      userId: z.string(),
      tenantId: z.string(),
      apiKey: z.string().optional(),
    });

    const userContextMiddleware = createMiddleware({
      name: "UserContextMiddleware",
      contextSchema,
      wrapModelCall: (request, handler) => {
        // Access context from runtime
        const { userId, tenantId } = request.runtime.context;

        // Add user context to system message
        const contextText = `User ID: ${userId}, Tenant: ${tenantId}`;
        const newSystemMessage = request.systemMessage.concat(contextText);

        return handler({
          ...request,
          systemMessage: newSystemMessage,
        });
      },
    });

    const agent = createAgent({
      model: "gpt-5.4",
      middleware: [userContextMiddleware],
      tools: [],
      contextSchema,
    });

    const result = await agent.invoke(
      { messages: [new HumanMessage("Hello")] },
      // Required fields (userId, tenantId) must be provided
      {
        context: {
          userId: "user-123",
          tenantId: "acme-corp",
        },
      },
    );
Required context fields: When you define required fields in your contextSchema (fields without .optional() or .default()), TypeScript will enforce that these fields must be provided during agent.invoke() calls. This ensures type safety and prevents runtime errors from missing required context.
    // This will cause a TypeScript error if userId or tenantId are missing
    const result = await agent.invoke(
      { messages: [new HumanMessage("Hello")] },
      { context: { userId: "user-123" } } // Error: tenantId is required
    );
Dynamically modify the system prompt at runtime to inject context, user-specific instructions, or other information before each model call. This is one of the most common middleware use cases.

Use the systemMessage field in ModelRequest to read and modify the system prompt. It contains a SystemMessage object (even if the agent was created with a string systemPrompt).

Select relevant tools at runtime to improve performance and accuracy. This section covers filtering pre-registered tools. For registering tools that are discovered at runtime (e.g., from MCP servers), see Runtime tool registration.

Benefits:
Shorter prompts - Reduce complexity by exposing only relevant tools
Better accuracy - Models choose correctly from fewer options
Permission control - Dynamically filter tools based on user access
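As a rough illustration of permission-based filtering, the sketch below wraps a stand-in model call and hands it a reduced tool list. Everything here is hypothetical and library-free: `ToyTool`, `ToyRequest`, `filterToolsByRole`, and the role names are invented for the example, and in real middleware the role would more plausibly come from runtime.context than from the request itself.

```typescript
// Simplified model of runtime tool selection in a wrap-style hook.
// All types and names are hypothetical; this is not the library's API.
type Role = "viewer" | "admin";

interface ToyTool {
  name: string;
  requiredRole: Role;
}

interface ToyRequest {
  tools: ToyTool[];
  userRole: Role;
}

type ToyHandler = (request: ToyRequest) => string[];

// Pass the handler a copy of the request that only exposes permitted tools.
function filterToolsByRole(request: ToyRequest, handler: ToyHandler): string[] {
  const allowed = request.tools.filter(
    (tool) => tool.requiredRole === "viewer" || request.userRole === "admin",
  );
  return handler({ ...request, tools: allowed });
}

// Stand-in for the model call: reports which tools it was offered.
const listOfferedTools: ToyHandler = (request) => request.tools.map((t) => t.name);

const registered: ToyTool[] = [
  { name: "search_docs", requiredRole: "viewer" },
  { name: "delete_record", requiredRole: "admin" },
];

const viewerTools = filterToolsByRole({ tools: registered, userRole: "viewer" }, listOfferedTools);
const adminTools = filterToolsByRole({ tools: registered, userRole: "admin" }, listOfferedTools);
console.log(viewerTools); // [ 'search_docs' ]
console.log(adminTools); // [ 'search_docs', 'delete_record' ]
```

The original request is copied rather than mutated, so other middleware layers still see the full set of registered tools.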
When working with Anthropic models, use structured content blocks with cache control directives to cache large system prompts:
Decorator:

    from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
    from langchain.messages import SystemMessage
    from typing import Callable


    @wrap_model_call
    def add_cached_context(
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        # Always work with content blocks
        new_content = list(request.system_message.content_blocks) + [
            {
                "type": "text",
                "text": "Here is a large document to analyze:\n\n<document>...</document>",
                # content up until this point is cached
                "cache_control": {"type": "ephemeral"}
            }
        ]
        new_system_message = SystemMessage(content=new_content)
        return handler(request.override(system_message=new_system_message))

Class:

    from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
    from langchain.messages import SystemMessage
    from typing import Callable


    class CachedContextMiddleware(AgentMiddleware):
        def wrap_model_call(
            self,
            request: ModelRequest,
            handler: Callable[[ModelRequest], ModelResponse],
        ) -> ModelResponse:
            # Always work with content blocks
            new_content = list(request.system_message.content_blocks) + [
                {
                    "type": "text",
                    "text": "Here is a large document to analyze:\n\n<document>...</document>",
                    "cache_control": {"type": "ephemeral"}  # This content will be cached
                }
            ]
            new_system_message = SystemMessage(content=new_content)
            return handler(request.override(system_message=new_system_message))
Notes:
ModelRequest.system_message is always a SystemMessage object, even if the agent was created with system_prompt="string"
Use SystemMessage.content_blocks to access content as a list of blocks, regardless of whether the original content was a string or list
When modifying system messages, use content_blocks and append new blocks to preserve existing structure
You can pass SystemMessage objects directly to create_agent’s system_prompt parameter for advanced use cases like cache control
Modify system messages in middleware using the systemMessage field in ModelRequest. It contains a SystemMessage object (even if the agent was created with a string systemPrompt).

Example: Chaining middleware - Different middleware can use different approaches: