Build custom middleware by implementing hooks that run at specific points in the agent execution flow.

Hooks

Middleware provides two styles of hooks to intercept agent execution:

Node-style hooks

Run sequentially at specific execution points. Use for logging, validation, and state updates. Available hooks:
  • before_agent - Before agent starts (once per invocation)
  • before_model - Before each model call
  • after_model - After each model response
  • after_agent - After agent completes (once per invocation)
Example:
from langchain.agents.middleware import before_model, after_model, AgentState
from langchain.messages import AIMessage
from langgraph.runtime import Runtime
from typing import Any


@before_model(can_jump_to=["end"])
def check_message_limit(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    if len(state["messages"]) >= 50:
        return {
            "messages": [AIMessage("Conversation limit reached.")],
            "jump_to": "end"
        }
    return None

@after_model
def log_response(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    print(f"Model returned: {state['messages'][-1].content}")
    return None

Wrap-style hooks

Intercept execution and control when the handler is called. Use for retries, caching, and transformation. You decide whether the handler is called zero times (short-circuit), once (normal flow), or multiple times (retry logic). Available hooks:
  • wrap_model_call - Around each model call
  • wrap_tool_call - Around each tool call
Example:
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
from typing import Callable


@wrap_model_call
def retry_model(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    for attempt in range(3):
        try:
            return handler(request)  # normal flow: call the handler once
        except Exception as e:
            if attempt == 2:
                raise  # out of retries: re-raise the last error
            print(f"Retry {attempt + 1}/3 after error: {e}")

Create middleware

You can create middleware in two ways:

Decorator-based middleware

Quick and simple for single-hook middleware. Use decorators to wrap individual functions. Available decorators:
Node-style:
  • @before_agent - Runs before agent starts (once per invocation)
  • @before_model - Runs before each model call
  • @after_model - Runs after each model response
  • @after_agent - Runs after agent completes (once per invocation)
Wrap-style:
  • @wrap_model_call - Wraps each model call
  • @wrap_tool_call - Wraps each tool call
Example:
from langchain.agents.middleware import before_model, wrap_model_call
from langchain.agents.middleware import AgentState, ModelRequest, ModelResponse
from langchain.agents import create_agent
from langgraph.runtime import Runtime
from typing import Any, Callable


@before_model
def log_before_model(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    print(f"About to call model with {len(state['messages'])} messages")
    return None

@wrap_model_call
def retry_model(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    for attempt in range(3):
        try:
            return handler(request)  # normal flow: call the handler once
        except Exception as e:
            if attempt == 2:
                raise  # out of retries: re-raise the last error
            print(f"Retry {attempt + 1}/3 after error: {e}")

agent = create_agent(
    model="gpt-4o",
    middleware=[log_before_model, retry_model],
    tools=[...],
)
When to use decorators:
  • Single hook needed
  • No complex configuration
  • Quick prototyping

Class-based middleware

More powerful for complex middleware with multiple hooks or configuration. Example:
from langchain.agents.middleware import AgentMiddleware, AgentState, ModelRequest, ModelResponse
from langgraph.runtime import Runtime
from typing import Any, Callable

class LoggingMiddleware(AgentMiddleware):
    def before_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        print(f"About to call model with {len(state['messages'])} messages")
        return None

    def after_model(self, state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
        print(f"Model returned: {state['messages'][-1].content}")
        return None

agent = create_agent(
    model="gpt-4o",
    middleware=[LoggingMiddleware()],
    tools=[...],
)
When to use classes:
  • Multiple hooks needed
  • Complex configuration required
  • Reuse across projects with init-time configuration (see the sketch below)
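
For init-time configuration, keep the knobs in __init__ and the logic in hook methods. A minimal sketch, assuming a configurable retry count (max_retries is an illustrative parameter, not a built-in option):
from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware, ModelRequest, ModelResponse
from typing import Callable

class RetryMiddleware(AgentMiddleware):
    """Retries failed model calls; the retry count is fixed at construction."""

    def __init__(self, max_retries: int = 3) -> None:
        super().__init__()
        self.max_retries = max_retries

    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelResponse:
        for attempt in range(self.max_retries):
            try:
                return handler(request)
            except Exception as e:
                if attempt == self.max_retries - 1:
                    raise  # out of retries: re-raise the last error
                print(f"Retry {attempt + 1}/{self.max_retries} after error: {e}")

agent = create_agent(
    model="gpt-4o",
    middleware=[RetryMiddleware(max_retries=5)],
    tools=[...],
)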

Custom state schema

Middleware can extend the agent’s state with custom properties.
Example:
from langchain.agents import create_agent
from langchain.agents.middleware import AgentState, before_model, after_model
from langchain.messages import HumanMessage
from langgraph.runtime import Runtime
from typing import Any
from typing_extensions import NotRequired

class CustomState(AgentState):
    model_call_count: NotRequired[int]
    user_id: NotRequired[str]

@before_model(state_schema=CustomState, can_jump_to=["end"])
def check_call_limit(state: CustomState, runtime: Runtime) -> dict[str, Any] | None:
    count = state.get("model_call_count", 0)
    if count > 10:
        return {"jump_to": "end"}
    return None

@after_model(state_schema=CustomState)
def increment_counter(state: CustomState, runtime: Runtime) -> dict[str, Any] | None:
    return {"model_call_count": state.get("model_call_count", 0) + 1}

agent = create_agent(
    model="gpt-4o",
    middleware=[check_call_limit, increment_counter],
    tools=[...],
)

# Invoke with custom state
result = agent.invoke({
    "messages": [HumanMessage("Hello")],
    "model_call_count": 0,
    "user_id": "user-123",
})

Execution order

When using multiple middleware, understand how they execute:
agent = create_agent(
    model="gpt-4o",
    middleware=[middleware1, middleware2, middleware3],
    tools=[...],
)
Before hooks run in order:
  1. middleware1.before_agent()
  2. middleware2.before_agent()
  3. middleware3.before_agent()
Agent loop starts
  1. middleware1.before_model()
  2. middleware2.before_model()
  3. middleware3.before_model()
Wrap hooks nest like function calls (the first middleware is outermost):
  middleware1.wrap_model_call(
    middleware2.wrap_model_call(
      middleware3.wrap_model_call(model)
    )
  )
After hooks run in reverse order:
  1. middleware3.after_model()
  2. middleware2.after_model()
  3. middleware1.after_model()
Agent loop ends
  1. middleware3.after_agent()
  2. middleware2.after_agent()
  3. middleware1.after_agent()
Key rules:
  • before_* hooks: First to last
  • after_* hooks: Last to first (reverse)
  • wrap_* hooks: Nested (first middleware wraps all others)
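
The nesting is easiest to see with two wrap hooks that print around the handler. A minimal sketch (the prints exist only to show ordering):
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
from typing import Callable


@wrap_model_call
def outer(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    print("outer: before")
    response = handler(request)  # invokes inner, which invokes the model
    print("outer: after")
    return response

@wrap_model_call
def inner(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    print("inner: before")
    response = handler(request)  # invokes the model
    print("inner: after")
    return response

# With middleware=[outer, inner], each model call prints:
# outer: before → inner: before → inner: after → outer: after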

Agent jumps

To exit early from middleware, return a dictionary that includes a jump_to key. Available jump targets:
  • 'end': Jump to the end of the agent execution (or the first after_agent hook)
  • 'tools': Jump to the tools node
  • 'model': Jump to the model node (or the first before_model hook)
Example:
from langchain.agents.middleware import after_model, hook_config, AgentState
from langchain.messages import AIMessage
from langgraph.runtime import Runtime
from typing import Any


@after_model
@hook_config(can_jump_to=["end"])
def check_for_blocked(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    last_message = state["messages"][-1]
    if "BLOCKED" in last_message.content:
        return {
            "messages": [AIMessage("I cannot respond to that request.")],
            "jump_to": "end"
        }
    return None

Best practices

  1. Keep middleware focused - each should do one thing well
  2. Handle errors gracefully - don’t let middleware errors crash the agent (see the sketch after this list)
  3. Use appropriate hook types:
    • Node-style for sequential logic (logging, validation)
    • Wrap-style for control flow (retry, fallback, caching)
  4. Clearly document any custom state properties
  5. Unit test middleware independently before integrating
  6. Consider execution order - place critical middleware first in the list
  7. Use built-in middleware when possible
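
For practice 2, a hook can catch its own failures so observability code never takes the agent down. A minimal sketch, assuming standard-library logging (the logger setup is illustrative):
import logging
from typing import Any

from langchain.agents.middleware import before_model, AgentState
from langgraph.runtime import Runtime

logger = logging.getLogger(__name__)

@before_model
def safe_logging(state: AgentState, runtime: Runtime) -> dict[str, Any] | None:
    try:
        logger.info("About to call model with %d messages", len(state["messages"]))
    except Exception:
        # Swallow the error: a broken log line should not crash the agent.
        logger.exception("Logging middleware failed; continuing")
    return None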

Examples

Dynamic model selection

from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
from langchain.chat_models import init_chat_model
from typing import Callable


complex_model = init_chat_model("gpt-4o")
simple_model = init_chat_model("gpt-4o-mini")

@wrap_model_call
def dynamic_model(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    # Use a different model based on conversation length
    if len(request.messages) > 10:
        model = complex_model
    else:
        model = simple_model
    return handler(request.override(model=model))

Tool call monitoring

from langchain.agents.middleware import wrap_tool_call
from langchain.tools.tool_node import ToolCallRequest
from langchain_core.messages import ToolMessage
from langgraph.types import Command
from typing import Callable


@wrap_tool_call
def monitor_tool(
    request: ToolCallRequest,
    handler: Callable[[ToolCallRequest], ToolMessage | Command],
) -> ToolMessage | Command:
    print(f"Executing tool: {request.tool_call['name']}")
    print(f"Arguments: {request.tool_call['args']}")
    try:
        result = handler(request)
        print(f"Tool completed successfully")
        return result
    except Exception as e:
        print(f"Tool failed: {e}")
        raise

Dynamically selecting tools

Select relevant tools at runtime to improve performance and accuracy. Benefits:
  • Shorter prompts - Reduce complexity by exposing only relevant tools
  • Better accuracy - Models choose correctly from fewer options
  • Permission control - Dynamically filter tools based on user access
from langchain.agents import create_agent
from langchain.agents.middleware import wrap_model_call, ModelRequest, ModelResponse
from typing import Callable


@wrap_model_call
def select_tools(
    request: ModelRequest,
    handler: Callable[[ModelRequest], ModelResponse],
) -> ModelResponse:
    """Middleware to select relevant tools based on state/context."""
    # select_relevant_tools is a user-defined helper (sketched below)
    relevant_tools = select_relevant_tools(request.state, request.runtime)
    return handler(request.override(tools=relevant_tools))

agent = create_agent(
    model="gpt-4o",
    tools=all_tools,  # All available tools need to be registered upfront
    middleware=[select_tools],
)
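
select_relevant_tools above is a user-defined helper, not part of the library. A minimal sketch, assuming tools are matched by name against the latest user message (a deliberately naive heuristic; a real selector would use stronger signals such as embeddings or user permissions):
def select_relevant_tools(state, runtime):
    """Hypothetical helper: keep tools whose name appears in the last message."""
    last_message = state["messages"][-1]
    text = str(getattr(last_message, "content", "")).lower()
    selected = [tool for tool in all_tools if tool.name.lower() in text]
    return selected or all_tools  # fall back to everything if nothing matches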
