| Middleware | Description |
|---|---|
| Prompt caching | Reduce costs by caching repetitive prompt prefixes |
## Prompt caching

Reduce costs and latency by caching static or repetitive prompt content (such as system prompts, tool definitions, and conversation history) on Anthropic's servers. This middleware implements a conversational caching strategy that places cache breakpoints after the most recent message, allowing the entire conversation history (including the latest user message) to be cached and reused in subsequent API calls; a conceptual sketch of this breakpoint placement follows the list below.

Prompt caching is useful for the following:

- Applications with long, static system prompts that don't change between requests
- Agents with many tool definitions that remain constant across invocations
- Conversations where early message history is reused across multiple turns
- High-volume deployments where reducing API costs and latency is critical
Learn more about Anthropic prompt caching strategies and limitations.
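Conceptually, the middleware marks the most recent message with a cache breakpoint before the request reaches Anthropic, so everything up to and including that message becomes eligible for cache reuse. The sketch below illustrates the idea in terms of Anthropic's `cache_control` content blocks; it is a simplified illustration, not the middleware's actual implementation, and it assumes plain-string message content.

```python
def add_conversational_cache_breakpoint(messages: list[dict]) -> list[dict]:
    """Simplified illustration (not the middleware's actual code): mark the
    newest message with an 'ephemeral' cache_control block so Anthropic can
    cache everything up to and including it."""
    if not messages:
        return messages
    last = dict(messages[-1])
    # Anthropic attaches cache_control to a content block, not to a raw string,
    # so wrap the text in a single block carrying the breakpoint.
    last["content"] = [
        {
            "type": "text",
            "text": last["content"],
            "cache_control": {"type": "ephemeral"},
        }
    ]
    return [*messages[:-1], last]
```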
### Configuration options
Time to live for cached content. Valid values: `'5m'` or `'1h'`.
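A short sketch of overriding the cache lifetime when constructing the middleware; the `AnthropicPromptCachingMiddleware` class name, its import path, and the `ttl` parameter name are assumptions to verify against your installed LangChain version.

```python
# Assumed import path and parameter name; check against your LangChain version.
from langchain.agents.middleware import AnthropicPromptCachingMiddleware

# Cache entries persist for one hour ('1h') rather than five minutes ('5m').
caching_middleware = AnthropicPromptCachingMiddleware(ttl="1h")
```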
### Full example
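The end-to-end sketch below ties the pieces together for the two-turn conversation described next. It assumes LangChain's `create_agent` entry point, the `AnthropicPromptCachingMiddleware` class, and an Anthropic chat model identifier; the names, import paths, and parameters are assumptions rather than verified API, so adjust them to your installed version.

```python
from langchain.agents import create_agent  # assumed import path
from langchain.agents.middleware import AnthropicPromptCachingMiddleware  # assumed import path

LONG_SYSTEM_PROMPT = """
<A long, static system prompt and detailed instructions go here.>
"""

agent = create_agent(
    model="anthropic:claude-sonnet-4-5",  # placeholder model identifier
    system_prompt=LONG_SYSTEM_PROMPT,
    middleware=[AnthropicPromptCachingMiddleware(ttl="5m")],
)

# First request: the system prompt, tool definitions, and this message
# are sent to the API and cached.
first = agent.invoke(
    {"messages": [{"role": "user", "content": "Hi, my name is Bob"}]}
)

# Second request: the previously cached prefix is reused; only the model's
# first response and the new message "What's my name?" are processed fresh.
second = agent.invoke(
    {
        "messages": first["messages"]
        + [{"role": "user", "content": "What's my name?"}]
    }
)
```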
The middleware caches content up to and including the latest message in each request. On subsequent requests within the TTL window (5 minutes or 1 hour), previously seen content is retrieved from cache rather than reprocessed, significantly reducing costs and latency.

How it works:

- First request: System prompt, tools, and the user message "Hi, my name is Bob" are sent to the API and cached
- Second request: The cached content (system prompt, tools, and first message) is retrieved from cache. Only the model's response from the first turn and the new message "What's my name?" need to be processed
- This pattern continues for each turn, with each request reusing the cached conversation history