LangChain provides prebuilt middleware for common use cases. Each middleware is production-ready and configurable for your specific needs.

Provider-agnostic middleware

The following middleware work with any LLM provider:
  • Summarization - Automatically summarize conversation history when approaching token limits.
  • Human-in-the-loop - Pause execution for human approval of tool calls.
  • Model call limit - Limit the number of model calls to prevent excessive costs.
  • Tool call limit - Control tool execution by limiting call counts.
  • Model fallback - Automatically fall back to alternative models when the primary model fails.
  • PII detection - Detect and handle Personally Identifiable Information (PII).
  • To-do list - Equip agents with task planning and tracking capabilities.
  • LLM tool selector - Use an LLM to select relevant tools before calling the main model.
  • Tool retry - Automatically retry failed tool calls with exponential backoff.
  • LLM tool emulator - Emulate tool execution using an LLM for testing purposes.
  • Context editing - Manage conversation context by trimming or clearing tool uses.

Summarization

Automatically summarize conversation history when approaching token limits, preserving recent messages while compressing older context. Summarization is useful for the following:
  • Long-running conversations that exceed context windows.
  • Multi-turn dialogues with extensive history.
  • Applications where preserving full conversation context matters.
import { createAgent, summarizationMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [weatherTool, calculatorTool],
  middleware: [
    summarizationMiddleware({
      model: "gpt-4o-mini",
      trigger: { tokens: 4000 },
      keep: { messages: 20 },
    }),
  ],
});
model
string | BaseChatModel
required
Model for generating summaries. Can be a model identifier string (e.g., 'openai:gpt-4o-mini') or a BaseChatModel instance.
trigger
object | object[]
Conditions for triggering summarization. Can be:
  • A single condition object (all properties must be met - AND logic)
  • An array of condition objects (any condition must be met - OR logic)
Each condition can include:
  • fraction (number): Fraction of model’s context size (0-1)
  • tokens (number): Absolute token count
  • messages (number): Message count
At least one property must be specified per condition. If not provided, summarization will not trigger automatically.
keep
object
default:"{messages: 20}"
How much context to preserve after summarization. Specify exactly one of:
  • fraction (number): Fraction of model’s context size to keep (0-1)
  • tokens (number): Absolute token count to keep
  • messages (number): Number of recent messages to keep
tokenCounter
function
Custom token counting function. Defaults to character-based counting.
summaryPrompt
string
Custom prompt template for summarization. Uses built-in template if not specified. The template should include {messages} placeholder where conversation history will be inserted.
trimTokensToSummarize
number
default:"4000"
Maximum number of tokens to include when generating the summary. Messages will be trimmed to fit this limit before summarization.
summaryPrefix
string
Prefix to add to the summary message. If not provided, a default prefix is used.
maxTokensBeforeSummary
number
deprecated
Deprecated: Use trigger: { tokens: value } instead. Token threshold for triggering summarization.
messagesToKeep
number
deprecated
Deprecated: Use keep: { messages: value } instead. Recent messages to preserve.
The summarization middleware monitors message token counts and automatically summarizes older messages when thresholds are reached.
Trigger conditions control when summarization runs:
  • Single condition object (all properties must be met - AND logic)
  • Array of conditions (any condition must be met - OR logic)
  • Each condition can use fraction (of model’s context size), tokens (absolute count), or messages (message count)
Keep conditions control how much context to preserve (specify exactly one):
  • fraction - Fraction of model’s context size to keep
  • tokens - Absolute token count to keep
  • messages - Number of recent messages to keep
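The trigger rules above can be sketched as a small predicate. This illustrates the AND/OR semantics only and is not the middleware's actual implementation: the `TriggerCondition` and `ConversationStats` types, the `shouldSummarize` helper, and the choice to treat a threshold as met once the count is greater than or equal to it are all assumptions.

```typescript
// Hypothetical types for illustration; not the library's internals.
interface TriggerCondition {
  fraction?: number; // fraction of the model's context size (0-1)
  tokens?: number;   // absolute token count
  messages?: number; // message count
}

interface ConversationStats {
  tokenCount: number;
  messageCount: number;
  maxInputTokens: number; // assumed to be known for the model in use
}

// A single condition object: every property that is present must be met (AND).
function conditionMet(c: TriggerCondition, s: ConversationStats): boolean {
  const checks: boolean[] = [];
  if (c.fraction !== undefined) checks.push(s.tokenCount >= c.fraction * s.maxInputTokens);
  if (c.tokens !== undefined) checks.push(s.tokenCount >= c.tokens);
  if (c.messages !== undefined) checks.push(s.messageCount >= c.messages);
  return checks.length > 0 && checks.every((ok) => ok);
}

// An array of conditions: any one of them being met is enough (OR).
// With no trigger at all, summarization never fires automatically.
function shouldSummarize(
  trigger: TriggerCondition | TriggerCondition[] | undefined,
  stats: ConversationStats
): boolean {
  if (trigger === undefined) return false;
  const conditions = Array.isArray(trigger) ? trigger : [trigger];
  return conditions.some((c) => conditionMet(c, stats));
}
```

With 4,500 tokens and 8 messages, `{ tokens: 4000, messages: 10 }` does not fire because the message threshold is unmet, while `[{ tokens: 5000, messages: 3 }, { tokens: 3000, messages: 6 }]` fires through its second entry.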
import { createAgent, summarizationMiddleware } from "langchain";

// Single condition: both thresholds must be met (AND)
const agent = createAgent({
  model: "gpt-4o",
  tools: [weatherTool, calculatorTool],
  middleware: [
    summarizationMiddleware({
      model: "gpt-4o-mini",
      trigger: { tokens: 4000, messages: 10 },
      keep: { messages: 20 },
    }),
  ],
});

// Multiple conditions: summarize when any one condition is met (OR)
const agent2 = createAgent({
  model: "gpt-4o",
  tools: [weatherTool, calculatorTool],
  middleware: [
    summarizationMiddleware({
      model: "gpt-4o-mini",
      trigger: [
        { tokens: 5000, messages: 3 },
        { tokens: 3000, messages: 6 },
      ],
      keep: { messages: 20 },
    }),
  ],
});

// Using fractional limits
const agent3 = createAgent({
  model: "gpt-4o",
  tools: [weatherTool, calculatorTool],
  middleware: [
    summarizationMiddleware({
      model: "gpt-4o-mini",
      trigger: { fraction: 0.8 },
      keep: { fraction: 0.3 },
    }),
  ],
});

Human-in-the-loop

Pause agent execution for human approval, editing, or rejection of tool calls before they execute. Human-in-the-loop is useful for the following:
  • High-stakes operations requiring human approval (e.g. database writes, financial transactions).
  • Compliance workflows where human oversight is mandatory.
  • Long-running conversations where human feedback guides the agent.
Human-in-the-loop middleware requires a checkpointer to maintain state across interruptions.
import { createAgent, humanInTheLoopMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [readEmailTool, sendEmailTool],
  middleware: [
    humanInTheLoopMiddleware({
      interruptOn: {
        send_email: {
          allowAccept: true,
          allowEdit: true,
          allowRespond: true,
        },
        read_email: false,
      }
    })
  ]
});
For complete examples, configuration options, and integration patterns, see the Human-in-the-loop documentation.

Model call limit

Limit the number of model calls to prevent infinite loops or excessive costs. Model call limit is useful for the following:
  • Preventing runaway agents from making too many API calls.
  • Enforcing cost controls on production deployments.
  • Testing agent behavior within specific call budgets.
import { createAgent, modelCallLimitMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [...],
  middleware: [
    modelCallLimitMiddleware({
      threadLimit: 10,
      runLimit: 5,
      exitBehavior: "end",
    }),
  ],
});
threadLimit
number
Maximum model calls across all runs in a thread. Defaults to no limit.
runLimit
number
Maximum model calls per single invocation. Defaults to no limit.
exitBehavior
string
default:"end"
Behavior when limit is reached. Options: 'end' (graceful termination) or 'error' (throw exception)
The middleware tracks model calls across two scopes:
  • Thread limit - Max calls across all runs in a conversation thread (requires checkpointer)
  • Run limit - Max calls per single invocation (resets each turn)
Exit behaviors:
  • 'end' - Graceful termination (default)
  • 'error' - Throw an exception
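The two scopes and exit behaviors described above can be pictured with a small counter. This is a sketch under assumptions, not the middleware's code: `createCallBudget`, `startRun`, and `tryModelCall` are hypothetical names, the thread count is held in memory rather than in a checkpointer, and a limit is assumed to be reached once the count equals it.

```typescript
// Hypothetical counter for illustration only.
type ExitBehavior = "end" | "error";

interface CallLimitOptions {
  threadLimit?: number;
  runLimit?: number;
  exitBehavior?: ExitBehavior;
}

function createCallBudget(options: CallLimitOptions) {
  let threadCount = 0; // persists across runs (what a checkpointer would save)
  let runCount = 0;    // reset at the start of every invocation

  return {
    startRun(): void {
      runCount = 0;
    },
    // Returns true when the model call may proceed, false when the run should end gracefully.
    tryModelCall(): boolean {
      const { threadLimit, runLimit, exitBehavior = "end" } = options;
      const threadHit = threadLimit !== undefined && threadCount >= threadLimit;
      const runHit = runLimit !== undefined && runCount >= runLimit;
      if (threadHit || runHit) {
        if (exitBehavior === "error") {
          throw new Error("Model call limit reached");
        }
        return false;
      }
      threadCount += 1;
      runCount += 1;
      return true;
    },
  };
}
```

With `threadLimit: 3` and `runLimit: 2`, the first run stops after two calls, and the second run stops after one more call because the thread total has reached three.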
import { createAgent, modelCallLimitMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [searchTool, calculatorTool],
  middleware: [
    modelCallLimitMiddleware({
      threadLimit: 10,
      runLimit: 5,
      exitBehavior: "end",
    }),
  ],
});

Tool call limit

Control agent execution by limiting the number of tool calls, either globally across all tools or for specific tools. Tool call limits are useful for the following:
  • Preventing excessive calls to expensive external APIs.
  • Limiting web searches or database queries.
  • Enforcing rate limits on specific tool usage.
  • Protecting against runaway agent loops.
import { createAgent, toolCallLimitMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [searchTool, databaseTool],
  middleware: [
    toolCallLimitMiddleware({ threadLimit: 20, runLimit: 10 }),
    toolCallLimitMiddleware({
      toolName: "search",
      threadLimit: 5,
      runLimit: 3,
    }),
  ],
});
toolName
string
Name of specific tool to limit. If not provided, limits apply to all tools globally.
threadLimit
number
Maximum tool calls across all runs in a thread (conversation). Persists across multiple invocations with the same thread ID. Requires a checkpointer to maintain state. undefined means no thread limit.
runLimit
number
Maximum tool calls per single invocation (one user message → response cycle). Resets with each new user message. undefined means no run limit.
Note: At least one of threadLimit or runLimit must be specified.
exitBehavior
string
default:"continue"
Behavior when limit is reached:
  • 'continue' (default) - Block exceeded tool calls with error messages, let other tools and the model continue. The model decides when to end based on the error messages.
  • 'error' - Throw a ToolCallLimitExceededError exception, stopping execution immediately
  • 'end' - Stop execution immediately with a ToolMessage and AI message for the exceeded tool call. Only works when limiting a single tool; throws error if other tools have pending calls.
Specify limits with:
  • Thread limit - Max calls across all runs in a conversation (requires checkpointer)
  • Run limit - Max calls per single invocation (resets each turn)
Exit behaviors:
  • 'continue' (default) - Block exceeded calls with error messages, agent continues
  • 'error' - Throw an exception immediately
  • 'end' - Stop with ToolMessage + AI message (single-tool scenarios only)
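To make the 'continue' behavior above concrete, here is a minimal sketch of how a batch of tool calls might be split into allowed and blocked calls within one run. The `ToolCall` shape, the `partitionByRunLimit` helper, and the error text are assumptions for illustration; thread-level counting and the other exit behaviors are left out.

```typescript
// Hypothetical shapes for illustration; not the library's types.
interface ToolCall {
  id: string;
  name: string;
}

interface BlockedCall {
  call: ToolCall;
  error: string; // what the model would see in place of a tool result
}

function partitionByRunLimit(
  calls: ToolCall[],
  runLimit: number,
  toolName?: string // undefined means the limit applies to all tools
): { allowed: ToolCall[]; blocked: BlockedCall[] } {
  const allowed: ToolCall[] = [];
  const blocked: BlockedCall[] = [];
  let counted = 0;

  for (const call of calls) {
    const isLimited = toolName === undefined || call.name === toolName;
    if (!isLimited) {
      allowed.push(call); // other tools are unaffected by a per-tool limit
    } else if (counted < runLimit) {
      counted += 1;
      allowed.push(call);
    } else {
      blocked.push({ call, error: `Tool call limit exceeded for '${call.name}'` });
    }
  }
  return { allowed, blocked };
}
```

Blocked calls come back as error messages rather than exceptions, so the remaining tools still run and the model can decide how to proceed.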
import { createAgent, toolCallLimitMiddleware } from "langchain";

const globalLimiter = toolCallLimitMiddleware({ threadLimit: 20, runLimit: 10 });
const searchLimiter = toolCallLimitMiddleware({ toolName: "search", threadLimit: 5, runLimit: 3 });
const databaseLimiter = toolCallLimitMiddleware({ toolName: "query_database", threadLimit: 10 });
const strictLimiter = toolCallLimitMiddleware({ toolName: "scrape_webpage", runLimit: 2, exitBehavior: "error" });

const agent = createAgent({
  model: "gpt-4o",
  tools: [searchTool, databaseTool, scraperTool],
  middleware: [globalLimiter, searchLimiter, databaseLimiter, strictLimiter],
});

Model fallback

Automatically fall back to alternative models when the primary model fails. Model fallback is useful for the following:
  • Building resilient agents that handle model outages.
  • Cost optimization by falling back to cheaper models.
  • Provider redundancy across OpenAI, Anthropic, etc.
import { createAgent, modelFallbackMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [...],
  middleware: [
    modelFallbackMiddleware(
      "gpt-4o-mini",
      "claude-3-5-sonnet-20241022"
    ),
  ],
});
The middleware accepts a variable number of string arguments representing fallback models in order:
...models
string[]
required
One or more fallback model strings to try in order when the primary model fails
modelFallbackMiddleware(
  "first-fallback-model",
  "second-fallback-model",
  // ... more models
)
The middleware tries fallback models in order when the primary model fails.
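The ordering can be sketched as a plain loop. This is an illustration of the try-in-order idea rather than the middleware's implementation: `callWithFallback` and the `ModelCall` callback type are hypothetical, and rethrowing the last error once every model has failed is an assumption.

```typescript
// Hypothetical signature: a function that sends one request to the named model.
type ModelCall<T> = (modelName: string) => Promise<T>;

async function callWithFallback<T>(
  primary: string,
  fallbacks: string[],
  call: ModelCall<T>
): Promise<T> {
  let lastError: unknown;
  // Try the primary model first, then each fallback in the order given.
  for (const modelName of [primary, ...fallbacks]) {
    try {
      return await call(modelName);
    } catch (error) {
      lastError = error; // remember the failure and move on to the next model
    }
  }
  // Every model failed: surface the most recent error.
  throw lastError;
}
```

Because the loop returns on the first success, later fallbacks are never contacted unless every earlier model has thrown.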
import { createAgent, modelFallbackMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [searchTool, calculatorTool],
  middleware: [
    modelFallbackMiddleware(
      "gpt-4o-mini",
      "claude-3-5-sonnet-20241022",
      "claude-3-haiku-20240307"
    ),
  ],
});

PII detection

Detect and handle Personally Identifiable Information (PII) in conversations using configurable strategies. PII detection is useful for the following:
  • Healthcare and financial applications with compliance requirements.
  • Customer service agents that need to sanitize logs.
  • Any application handling sensitive user data.
import { createAgent, piiRedactionMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [...],
  middleware: [
    piiRedactionMiddleware({
      piiType: "email",
      strategy: "redact",
      applyToInput: true,
    }),
    piiRedactionMiddleware({
      piiType: "credit_card",
      strategy: "mask",
      applyToInput: true,
    }),
  ],
});
piiType
string
required
Type of PII to detect. Can be a built-in type (email, credit_card, ip, mac_address, url) or a custom type name.
strategy
string
default:"redact"
How to handle detected PII. Options:
  • 'block' - Throw error when detected
  • 'redact' - Replace with [REDACTED_TYPE]
  • 'mask' - Partially mask (e.g., ****-****-****-1234)
  • 'hash' - Replace with deterministic hash
detector
RegExp
Custom detector regex pattern. If not provided, uses built-in detector for the PII type.
applyToInput
boolean
default:"true"
Check user messages before model call
applyToOutput
boolean
default:"false"
Check AI messages after model call
applyToToolResults
boolean
default:"false"
Check tool result messages after execution
The middleware supports detecting built-in PII types (email, credit_card, ip, mac_address, url) or custom types with regex patterns.
Detection strategies:
  • 'block' - Throw an error when detected
  • 'redact' - Replace with [REDACTED_TYPE]
  • 'mask' - Partially mask (e.g., ****-****-****-1234)
  • 'hash' - Replace with deterministic hash
Application scope:
  • applyToInput - Check user messages before model call
  • applyToOutput - Check AI messages after model call
  • applyToToolResults - Check tool result messages after execution
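As a rough picture of what the strategies do to a string, the sketch below applies 'block', 'redact', and 'mask' with a caller-supplied regex. It is not the middleware's implementation: `handlePii` and `maskValue` are hypothetical helpers, the placeholder is assumed to be `[REDACTED_<TYPE>]` with the type name upper-cased, masking is assumed to keep the last four alphanumeric characters, and 'hash' is omitted.

```typescript
type PiiStrategy = "block" | "redact" | "mask";

// Replace every letter or digit except the last `visible` ones with "*",
// leaving separators such as "-" in place.
function maskValue(value: string, visible = 4): string {
  const total = (value.match(/[A-Za-z0-9]/g) ?? []).length;
  let seen = 0;
  return value.replace(/[A-Za-z0-9]/g, (ch) => {
    seen += 1;
    return seen > total - visible ? ch : "*";
  });
}

function handlePii(
  text: string,
  piiType: string,
  detector: RegExp,
  strategy: PiiStrategy
): string {
  // Work on a global copy so every occurrence is handled, not just the first.
  const flags = detector.flags.includes("g") ? detector.flags : detector.flags + "g";
  const pattern = new RegExp(detector.source, flags);

  if (strategy === "block") {
    if (pattern.test(text)) {
      throw new Error(`Detected ${piiType} in message`);
    }
    return text;
  }
  return text.replace(pattern, (match) =>
    strategy === "redact" ? `[REDACTED_${piiType.toUpperCase()}]` : maskValue(match)
  );
}
```

Redaction hides the value entirely, masking keeps just enough to recognize it, and blocking stops processing as soon as a match appears.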
import { createAgent, piiRedactionMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [databaseTool, emailTool],
  middleware: [
    piiRedactionMiddleware({ piiType: "email", strategy: "redact", applyToInput: true }),
    piiRedactionMiddleware({ piiType: "credit_card", strategy: "mask", applyToInput: true }),
    piiRedactionMiddleware({ piiType: "api_key", detector: /sk-[a-zA-Z0-9]{32}/, strategy: "block" }),
    piiRedactionMiddleware({ piiType: "ssn", detector: /\d{3}-\d{2}-\d{4}/, strategy: "hash", applyToToolResults: true }),
  ],
});

To-do list

Equip agents with task planning and tracking capabilities for complex multi-step tasks. To-do lists are useful for the following:
  • Complex multi-step tasks requiring coordination across multiple tools.
  • Long-running operations where progress visibility is important.
This middleware automatically provides agents with a write_todos tool and system prompts to guide effective task planning.
import { createAgent, todoListMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [readFile, writeFile, runTests],
  middleware: [todoListMiddleware()] as const,
});
No configuration options available (uses defaults).
Just as humans are more effective when they write down and track tasks, agents benefit from structured task management to break down complex problems.
import { createAgent, HumanMessage, todoListMiddleware, tool } from "langchain";
import * as z from "zod";

const readFile = tool(
  async ({ filePath }) => "file contents",
  {
    name: "read_file",
    description: "Read contents of a file",
    schema: z.object({ filePath: z.string() }),
  }
);

const agent = createAgent({
  model: "gpt-4o",
  tools: [readFile],
  middleware: [todoListMiddleware()] as const,
});

const result = await agent.invoke({
  messages: [new HumanMessage("Refactor the authentication module")],
});

console.log(result.todos);

LLM tool selector

Use an LLM to intelligently select relevant tools before calling the main model. LLM tool selectors are useful for the following:
  • Agents with many tools (10+) where most aren’t relevant per query.
  • Reducing token usage by filtering irrelevant tools.
  • Improving model focus and accuracy.
import { createAgent, llmToolSelectorMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [tool1, tool2, tool3, tool4, tool5, ...],
  middleware: [
    llmToolSelectorMiddleware({
      model: "gpt-4o-mini",
      maxTools: 3,
      alwaysInclude: ["search"],
    }),
  ],
});
model
string | BaseChatModel
Model for tool selection. Can be a model identifier string (e.g., 'openai:gpt-4o-mini') or a BaseChatModel instance. Defaults to the agent’s main model.
maxTools
number
Maximum number of tools to select. Defaults to no limit.
alwaysInclude
string[]
Array of tool names to always include in the selection
The middleware uses a (typically cheaper) LLM to analyze the user’s query and select the most relevant subset of tools.
Benefits:
  • Shorter prompts - Reduce complexity by exposing only relevant tools
  • Better accuracy - Models choose correctly from fewer options
  • Cost savings - Use cheaper model for selection
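Once the selection model has returned tool names, the remaining work is a filter. The sketch below shows one plausible way to combine that selection with maxTools and alwaysInclude. `filterTools` is a hypothetical helper, and two behaviors are assumptions rather than documented facts: always-included tools do not count against maxTools, and names the model invents are ignored.

```typescript
interface SelectorOptions {
  maxTools?: number;
  alwaysInclude?: string[];
}

function filterTools(
  allToolNames: string[],
  selectedByLlm: string[],
  options: SelectorOptions = {}
): string[] {
  const { maxTools, alwaysInclude = [] } = options;

  // Drop unknown names, and drop always-included tools so they are not counted twice.
  const picked = selectedByLlm.filter(
    (name) => allToolNames.includes(name) && !alwaysInclude.includes(name)
  );
  const limited = maxTools === undefined ? picked : picked.slice(0, maxTools);

  // Return the surviving tools in their original registration order.
  const keep = new Set([...alwaysInclude, ...limited]);
  return allToolNames.filter((name) => keep.has(name));
}
```

Only the filtered list would then be exposed to the main model, which is where the shorter prompt comes from.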
import { createAgent, llmToolSelectorMiddleware } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [searchWeb, queryDatabase, sendEmail, getWeather, ...],
  middleware: [
    llmToolSelectorMiddleware({
      model: "gpt-4o-mini",
      maxTools: 3,
      alwaysInclude: ["search_web"],
    }),
  ],
});

Tool retry

Automatically retry failed tool calls with configurable exponential backoff. Tool retry is useful for the following:
  • Handling transient failures in external API calls.
  • Improving reliability of network-dependent tools.
  • Building resilient agents that gracefully handle temporary errors.
This middleware is only available in Python. For JavaScript/TypeScript, consider implementing retry logic in your tool definitions or using a wrap-style middleware.
max_retries
number
default:"2"
Maximum number of retry attempts after the initial call (3 total attempts with default)
tools
list[BaseTool | str]
Optional list of tools or tool names to apply retry logic to. If None, applies to all tools.
retry_on
tuple[type[Exception], ...] | callable
default:"(Exception,)"
Either a tuple of exception types to retry on, or a callable that takes an exception and returns True if it should be retried.
on_failure
string | callable
default:"return_message"
Behavior when all retries are exhausted. Options:
  • 'return_message' - Return a ToolMessage with error details (allows LLM to handle failure)
  • 'raise' - Re-raise the exception (stops agent execution)
  • Custom callable - Function that takes the exception and returns a string for the ToolMessage content
backoff_factor
number
default:"2.0"
Multiplier for exponential backoff. Each retry waits initial_delay * (backoff_factor ** retry_number) seconds. Set to 0.0 for constant delay.
initial_delay
number
default:"1.0"
Initial delay in seconds before first retry
max_delay
number
default:"60.0"
Maximum delay in seconds between retries (caps exponential backoff growth)
jitter
boolean
default:"true"
Whether to add random jitter (±25%) to delay to avoid thundering herd
The middleware automatically retries failed tool calls with exponential backoff.
Key configuration:
  • max_retries - Number of retry attempts (default: 2)
  • backoff_factor - Multiplier for exponential backoff (default: 2.0)
  • initial_delay - Starting delay in seconds (default: 1.0)
  • max_delay - Cap on delay growth (default: 60.0)
  • jitter - Add random variation (default: True)
Failure handling:
  • on_failure='return_message' - Return error message
  • on_failure='raise' - Re-raise exception
  • Custom callable - Function returning error message
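Since this middleware is Python-only, JavaScript/TypeScript users who write their own retry logic may find it useful to see the delay schedule spelled out. The sketch below computes the wait before each retry from the parameters described above. It is an illustration, not a port of the Python code: `retryDelaySeconds` and the camel-cased option names are hypothetical, the retry number is assumed to start at 0 so the first retry waits the initial delay, and jitter is assumed to be applied after the cap.

```typescript
interface BackoffOptions {
  initialDelay?: number;  // seconds before the first retry
  backoffFactor?: number; // multiplier per retry; 0 means constant delay
  maxDelay?: number;      // upper bound in seconds
  jitter?: boolean;       // add up to ±25% random variation
}

function retryDelaySeconds(
  retryNumber: number, // 0 for the first retry, 1 for the second, ...
  options: BackoffOptions = {},
  random: () => number = Math.random // injectable for deterministic tests
): number {
  const { initialDelay = 1.0, backoffFactor = 2.0, maxDelay = 60.0, jitter = true } = options;

  // A factor of 0 is treated as "wait the same amount every time".
  const base =
    backoffFactor === 0 ? initialDelay : initialDelay * Math.pow(backoffFactor, retryNumber);
  const capped = Math.min(base, maxDelay);
  if (!jitter) return capped;

  // Scale by a factor drawn uniformly from [0.75, 1.25).
  const factor = 1 + (random() * 2 - 1) * 0.25;
  return Math.max(0, capped * factor);
}
```

With the defaults and jitter disabled, the schedule is 1 s, 2 s, 4 s, and so on until it reaches the 60 s cap.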

LLM tool emulator

Emulate tool execution using an LLM for testing purposes, replacing actual tool calls with AI-generated responses. LLM tool emulators are useful for the following:
  • Testing agent behavior without executing real tools.
  • Developing agents when external tools are unavailable or expensive.
  • Prototyping agent workflows before implementing actual tools.
This middleware is only available in Python. For JavaScript/TypeScript, consider creating mock tool implementations for testing.

Context editing

Manage conversation context by trimming, summarizing, or clearing tool uses. Context editing is useful for the following:
  • Long conversations that need periodic context cleanup.
  • Removing failed tool attempts from context.
  • Custom context management strategies.
import { createAgent, contextEditingMiddleware, ClearToolUsesEdit } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [...],
  middleware: [
    contextEditingMiddleware({
      edits: [
        new ClearToolUsesEdit({
          triggerTokens: 100000,
          keep: 3,
        }),
      ],
    }),
  ],
});
edits
ContextEdit[]
default:"[new ClearToolUsesEdit()]"
Array of ContextEdit strategies to apply
ClearToolUsesEdit options:
triggerTokens
number
default:"100000"
Token count that triggers the edit. When the conversation exceeds this token count, older tool outputs will be cleared.
clearAtLeast
number
default:"0"
Minimum number of tokens to reclaim when the edit runs. If set to 0, clears as much as needed.
keep
number
default:"3"
Number of most recent tool results that must be preserved. These will never be cleared.
clearToolInputs
boolean
default:"false"
Whether to clear the originating tool call parameters on the AI message. When true, tool call arguments are replaced with empty objects.
excludeTools
string[]
default:"[]"
List of tool names to exclude from clearing. These tools will never have their outputs cleared.
placeholder
string
default:"[cleared]"
Placeholder text inserted for cleared tool outputs. This replaces the original tool message content.
The middleware applies context editing strategies when token limits are reached. The most common strategy is ClearToolUsesEdit, which clears older tool results while preserving recent ones.
How it works:
  1. Monitor token count in conversation
  2. When threshold is reached, clear older tool outputs
  3. Keep most recent N tool results
  4. Optionally preserve tool call arguments for context
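The steps above can be sketched as a pure function over a message list. This is a simplified illustration, not ClearToolUsesEdit itself: the `Message` shape and `clearOldToolResults` are hypothetical, tokens are approximated at four characters each, and the clearAtLeast and clearToolInputs options are left out.

```typescript
// Hypothetical message shape for illustration only.
interface Message {
  role: "human" | "ai" | "tool";
  content: string;
  toolName?: string;
}

interface ClearOptions {
  triggerTokens?: number;
  keep?: number;
  excludeTools?: string[];
  placeholder?: string;
}

// Rough stand-in for a tokenizer: about four characters per token.
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

function clearOldToolResults(messages: Message[], options: ClearOptions = {}): Message[] {
  const { triggerTokens = 100000, keep = 3, excludeTools = [], placeholder = "[cleared]" } = options;

  // Step 1: measure the conversation; do nothing while it is under the threshold.
  const total = messages.reduce((sum, m) => sum + approxTokens(m.content), 0);
  if (total <= triggerTokens) return messages;

  // Steps 2-3: every tool result except the most recent `keep` is a candidate.
  const toolIndexes: number[] = [];
  messages.forEach((m, i) => {
    if (m.role === "tool") toolIndexes.push(i);
  });
  const clearable = new Set(toolIndexes.slice(0, Math.max(0, toolIndexes.length - keep)));

  // Replace candidate contents with the placeholder, skipping excluded tools.
  return messages.map((m, i) =>
    clearable.has(i) && !excludeTools.includes(m.toolName ?? "")
      ? { ...m, content: placeholder }
      : m
  );
}
```

The message count and ordering stay the same; only the contents of older tool results shrink, which keeps tool calls and their results paired.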
import { createAgent, contextEditingMiddleware, ClearToolUsesEdit } from "langchain";

const agent = createAgent({
  model: "gpt-4o",
  tools: [searchTool, calculatorTool, databaseTool],
  middleware: [
    contextEditingMiddleware({
      edits: [
        new ClearToolUsesEdit({
          triggerTokens: 2000,
          keep: 3,
          clearToolInputs: false,
          excludeTools: [],
          placeholder: "[cleared]",
        }),
      ],
    }),
  ],
});

Provider-specific middleware

These middleware are optimized for specific LLM providers.

Anthropic

Middleware specifically designed for Anthropic’s Claude models.
  • Prompt caching - Reduce costs by caching repetitive prompt prefixes.

Anthropic prompt caching

Reduce costs by caching repetitive prompt prefixes with Anthropic models. Prompt caching is useful for the following:
  • Applications with long, repeated system prompts.
  • Agents that reuse the same context across invocations.
  • Reducing API costs for high-volume deployments.
Learn more about Anthropic prompt caching strategies and limitations.
import { createAgent, anthropicPromptCachingMiddleware } from "langchain";

const agent = createAgent({
  model: "claude-sonnet-4-5-20250929",
  prompt: "<Your long system prompt here>",
  middleware: [anthropicPromptCachingMiddleware({ ttl: "5m" })],
});
ttl
string
default:"5m"
Time to live for cached content. Valid values: '5m' or '1h'
import { createAgent, HumanMessage, anthropicPromptCachingMiddleware } from "langchain";

const LONG_PROMPT = `
Please be a helpful assistant.

<Lots more context ...>
`;

const agent = createAgent({
  model: "claude-sonnet-4-5-20250929",
  prompt: LONG_PROMPT,
  middleware: [anthropicPromptCachingMiddleware({ ttl: "5m" })],
});

// cache store
await agent.invoke({
  messages: [new HumanMessage("Hi, my name is Bob")]
});

// cache hit
const result = await agent.invoke({
  messages: [new HumanMessage("What's my name?")]
});

OpenAI

Middleware specifically designed for OpenAI models.
Coming soon! Check back for OpenAI-specific middleware optimizations.
