Choose the hooks your middleware needs. You can choose between node-style hooks and wrap-style hooks.

Node-style hooks run sequentially at specific execution points. Use them for logging, validation, and state updates:
| Hook | When it runs |
| --- | --- |
| `before_agent` | Before the agent starts (once per invocation) |
| `before_model` | Before each model call |
| `after_model` | After each model response |
| `after_agent` | After the agent completes (once per invocation) |
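To make the ordering concrete, here is a minimal pure-Python sketch of the lifecycle (an illustration only, not the LangChain API): one invocation with two model calls fires the hooks in the order shown.

```python
# Minimal sketch of the node-style hook lifecycle (not the real LangChain API).
calls = []

def before_agent():
    calls.append("before_agent")

def before_model():
    calls.append("before_model")

def after_model():
    calls.append("after_model")

def after_agent():
    calls.append("after_agent")

def run_agent(num_model_calls: int) -> None:
    before_agent()                # once per invocation
    for _ in range(num_model_calls):
        before_model()            # before each model call
        calls.append("model")     # the model call itself
        after_model()             # after each model response
    after_agent()                 # once per invocation

run_agent(2)
print(calls)
```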
Wrap-style hooks run around each call, giving you control over execution. They intercept execution and control when the handler is called. Use them for retries, caching, and transformation. You decide whether the handler is called zero times (short-circuit), once (normal flow), or multiple times (retry logic).

Available hooks:
- `wrap_model_call` - Around each model call
- `wrap_tool_call` - Around each tool call
Example:
Decorator:

```python
from typing import Callable

from langchain.agents.middleware import ModelRequest, ModelResponse, wrap_model_call


@wrap_model_call
def retry_model(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    for attempt in range(3):
        try:
            return handler(request)
        except Exception as e:
            if attempt == 2:
                raise
            print(f"Retry {attempt + 1}/3 after error: {e}")
```

Class:

```python
from typing import Callable

from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse


class RetryMiddleware(AgentMiddleware):
    def __init__(self, max_retries: int = 3):
        super().__init__()
        self.max_retries = max_retries

    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        for attempt in range(self.max_retries):
            try:
                return handler(request)
            except Exception as e:
                if attempt == self.max_retries - 1:
                    raise
                print(f"Retry {attempt + 1}/{self.max_retries} after error: {e}")
```
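The "zero, once, or multiple times" control can be sketched with plain Python (no LangChain imports; `handler` here is just a callable): a caching wrapper short-circuits the handler entirely on a hit, and calls it exactly once on a miss.

```python
# Plain-Python sketch of wrap-style control flow (not the real LangChain API).
cache = {}
model_calls = 0

def model_handler(request: str) -> str:
    """Stand-in for the real model handler."""
    global model_calls
    model_calls += 1
    return f"response to {request!r}"

def cached_call(request: str, handler) -> str:
    # Zero handler calls on a cache hit (short-circuit),
    # exactly one call on a miss.
    if request not in cache:
        cache[request] = handler(request)
    return cache[request]

first = cached_call("hello", model_handler)   # miss: handler runs
second = cached_call("hello", model_handler)  # hit: handler skipped
print(model_calls)  # 1
```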
Both node-style and wrap-style hooks can update agent state. The mechanism differs:
- **Node-style hooks** (`before_agent`, `before_model`, `after_model`, `after_agent`): Return a dict directly. The dict is applied to the agent state using the graph's reducers.
- **Wrap-style hooks** (`wrap_model_call`, `wrap_tool_call`): For model calls, return `ExtendedModelResponse` with a `Command` to inject state updates alongside the model response. For tool calls, return a `Command` directly. Use these when you need to track or update state based on logic that runs during the model or tool call, such as summarization trigger points, usage metadata, or custom fields calculated from the request or response.
When multiple middleware layers return `ExtendedModelResponse`, their commands compose:
- **Commands are applied through reducers**: Each `Command` becomes a separate state update. For messages, this means they are additive.
- **Outer wins on conflicts**: For non-reducer state fields, commands are applied inner-first, then outer. The outermost middleware's value takes precedence on conflicting keys.
- **Retry-safe**: If the outer middleware calls `handler()` more than once (for example, retry logic), commands from earlier calls are discarded.
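The merge order can be sketched with plain dictionaries (an illustration of the semantics, not the actual reducer implementation): updates apply inner-first, so the outer middleware's value lands last and wins on conflicting plain fields, while a field with an additive reducer (like messages) accumulates.

```python
# Sketch of inner-first command application (illustration only).
def apply_commands(state: dict, commands: list[dict]) -> dict:
    """Apply updates inner-first; later (outer) updates win on conflicts,
    while the 'messages' field uses an additive reducer."""
    new_state = dict(state)
    for update in commands:  # commands ordered inner -> outer
        for key, value in update.items():
            if key == "messages":          # reducer field: additive
                new_state[key] = new_state.get(key, []) + value
            else:                          # plain field: last write (outermost) wins
                new_state[key] = value
    return new_state

inner_cmd = {"messages": ["inner note"], "model_name": "inner"}
outer_cmd = {"messages": ["outer note"], "model_name": "outer"}

state = apply_commands({"messages": []}, [inner_cmd, outer_cmd])
print(state)
```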
Class-based middleware is more powerful for complex middleware with multiple hooks or configuration. Use classes when you need to define both sync and async implementations for the same hook, or when you want to combine multiple hooks in a single middleware.
If your middleware needs to track state across hooks, it can extend the agent's state with custom properties. This enables middleware to:
- **Track state across execution**: Maintain counters, flags, or other values that persist throughout the agent's execution lifecycle
- **Share data between hooks**: Pass information from `before_model` to `after_model` or between different middleware instances
- **Implement cross-cutting concerns**: Add functionality like rate limiting, usage tracking, user context, or audit logging without modifying the core agent logic
- **Make conditional decisions**: Use accumulated state to determine whether to continue execution, jump to different nodes, or modify behavior dynamically
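As a plain-Python sketch (not the LangChain state schema), the pattern amounts to a custom field threaded through the hooks, e.g. a model-call counter that a later hook reads to make a decision:

```python
# Sketch of tracking custom state across hooks (illustration only).
def before_model(state: dict) -> dict:
    # Increment a custom counter field before every model call.
    return {"model_call_count": state.get("model_call_count", 0) + 1}

def after_model(state: dict) -> dict:
    # A later hook can make decisions based on the accumulated value.
    if state["model_call_count"] >= 3:
        return {"should_stop": True}
    return {}

state = {"model_call_count": 0}
for _ in range(3):
    state.update(before_model(state))
    state.update(after_model(state))

print(state)
```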
Dynamically modify the system prompt at runtime to inject context, user-specific instructions, or other information before each model call. This is one of the most common middleware use cases.

Use the `system_message` field on `ModelRequest` to read and modify the system prompt. It contains a `SystemMessage` object (even if the agent was created with a string `system_prompt`).
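The shape of the pattern can be sketched with plain dataclasses standing in for the request type (the real `ModelRequest`/`SystemMessage` API differs; names here are illustrative): a wrap-style hook builds a new system prompt and forwards an overridden copy of the request.

```python
# Plain-Python sketch of system-prompt injection (not the real LangChain types).
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Request:
    system_prompt: str
    user_name: str  # stand-in for runtime/user context

def inject_context(request: Request, handler):
    # Prepend user-specific instructions before each model call.
    new_prompt = f"You are helping {request.user_name}.\n{request.system_prompt}"
    return handler(replace(request, system_prompt=new_prompt))

result = inject_context(
    Request(system_prompt="Be concise.", user_name="Ada"),
    handler=lambda req: req.system_prompt,
)
print(result)
```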
Select relevant tools at runtime to improve performance and accuracy. This section covers filtering pre-registered tools. For registering tools that are discovered at runtime (e.g., from MCP servers), see Runtime tool registration.

Benefits:
- **Shorter prompts** - Reduce complexity by exposing only relevant tools
- **Better accuracy** - Models choose correctly from fewer options
- **Permission control** - Dynamically filter tools based on user access
Decorator:

```python
from typing import Callable

from langchain.agents import create_agent
from langchain.agents.middleware import ModelRequest, ModelResponse, wrap_model_call


@wrap_model_call
def select_tools(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    """Middleware to select relevant tools based on state/context."""
    # Select a small, relevant subset of tools based on state/context
    relevant_tools = select_relevant_tools(request.state, request.runtime)
    return handler(request.override(tools=relevant_tools))


agent = create_agent(
    model="gpt-5.4",
    tools=all_tools,  # All available tools need to be registered upfront
    middleware=[select_tools],
)
```

Class:

```python
from typing import Callable

from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse


class ToolSelectorMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        """Middleware to select relevant tools based on state/context."""
        # Select a small, relevant subset of tools based on state/context
        relevant_tools = select_relevant_tools(request.state, request.runtime)
        return handler(request.override(tools=relevant_tools))


agent = create_agent(
    model="gpt-5.4",
    tools=all_tools,  # All available tools need to be registered upfront
    middleware=[ToolSelectorMiddleware()],
)
```
When working with Anthropic models, use structured content blocks with cache control directives to cache large system prompts:
Decorator:

```python
from typing import Callable

from langchain.agents.middleware import ModelRequest, ModelResponse, wrap_model_call
from langchain.messages import SystemMessage


@wrap_model_call
def add_cached_context(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    # Always work with content blocks
    new_content = list(request.system_message.content_blocks) + [
        {
            "type": "text",
            "text": "Here is a large document to analyze:\n\n<document>...</document>",
            # Content up until this point is cached
            "cache_control": {"type": "ephemeral"},
        }
    ]
    new_system_message = SystemMessage(content=new_content)
    return handler(request.override(system_message=new_system_message))
```

Class:

```python
from typing import Callable

from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from langchain.messages import SystemMessage


class CachedContextMiddleware(AgentMiddleware):
    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        # Always work with content blocks
        new_content = list(request.system_message.content_blocks) + [
            {
                "type": "text",
                "text": "Here is a large document to analyze:\n\n<document>...</document>",
                # Content up until this point is cached
                "cache_control": {"type": "ephemeral"},
            }
        ]
        new_system_message = SystemMessage(content=new_content)
        return handler(request.override(system_message=new_system_message))
```
Notes:

- `ModelRequest.system_message` is always a `SystemMessage` object, even if the agent was created with `system_prompt="string"`
- Use `SystemMessage.content_blocks` to access content as a list of blocks, regardless of whether the original content was a string or list
- When modifying system messages, use `content_blocks` and append new blocks to preserve existing structure
- You can pass `SystemMessage` objects directly to `create_agent`'s `system_prompt` parameter for advanced use cases like cache control