| Middleware | Description |
|---|---|
| Prompt caching | Reduce costs by caching repetitive prompt prefixes |
## Prompt caching
Reduce inference latency and input token costs by caching frequently reused prompt prefixes on Amazon Bedrock. This middleware automatically places cache checkpoints after the system prompt, tool definitions, and the most recent message, so the model can skip recomputing previously seen content on subsequent requests.

Prompt caching is useful for the following:
- Multi-turn conversations with long, consistent system prompts
- Agents with many tool definitions that remain constant across invocations
- Document-based Q&A where users ask multiple questions over the same uploaded context
- Batch processing workloads with repeated static content
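The checkpoint-placement rule described above (after the system prompt, after tool definitions, and after the most recent message) can be sketched in plain Python. This is an illustrative model only; `Message` and `place_checkpoints` are hypothetical names, not the middleware's real API.

```python
# Hypothetical sketch of where the middleware places cache checkpoints.
# These names are illustrative, not the real langchain-aws API.
from dataclasses import dataclass

@dataclass
class Message:
    role: str      # "system", "tool_defs", "user", or "assistant"
    content: str

def place_checkpoints(messages: list[Message]) -> list[int]:
    """Return indices after which a cache checkpoint would be inserted:
    after the system prompt, after tool definitions, and after the
    most recent message."""
    points = set()
    for i, m in enumerate(messages):
        if m.role in ("system", "tool_defs"):
            points.add(i)
    points.add(len(messages) - 1)  # most recent message
    return sorted(points)

convo = [
    Message("system", "You are a helpful assistant."),
    Message("tool_defs", "search(query), calculator(expr)"),
    Message("user", "What is the capital of France?"),
    Message("assistant", "Paris."),
    Message("user", "And of Spain?"),
]
print(place_checkpoints(convo))  # [0, 1, 4]
```

Everything before each checkpoint becomes a cacheable prefix, which is why long, stable system prompts and tool definitions benefit the most.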
Supported model families:
- Anthropic Claude
- Amazon Nova
Learn more about Amazon Bedrock prompt caching strategies and limitations. Cached content must exceed 1,024 tokens for a cache checkpoint to take effect (the minimum is higher for some models). See supported models, Regions, and limits.
`BedrockPromptCachingMiddleware` works with both `ChatBedrockConverse` and `ChatBedrock` chat models.
## Configuration options
- **Cache type**: For `ChatBedrock`, only `'ephemeral'` is currently supported. For `ChatBedrockConverse`, this value is ignored because the Converse API always uses the `"default"` cache type.
- **TTL**: Time to live for cached content. Valid values: `'5m'` or `'1h'`. Note that Amazon Nova models only support `'5m'`.
- **Minimum messages**: Minimum number of messages in the conversation before caching starts.
- **Unsupported model behavior**: Behavior when the middleware is used with an unsupported model. Options: `'ignore'`, `'warn'`, or `'raise'`.
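The three unsupported-model behaviors can be sketched as a simple dispatch. This is a hypothetical illustration of the semantics; `handle_unsupported_model` is not the middleware's real function.

```python
# Hypothetical sketch of the 'ignore' / 'warn' / 'raise' semantics.
# The function name is illustrative, not part of langchain-aws.
import warnings

def handle_unsupported_model(model_id: str, behavior: str = "warn") -> bool:
    """Return False to signal that caching should be skipped for this model."""
    message = f"Prompt caching is not supported for model {model_id!r}"
    if behavior == "raise":
        raise ValueError(message)       # fail fast
    if behavior == "warn":
        warnings.warn(message)          # emit a warning, then skip caching
    elif behavior != "ignore":
        raise ValueError(f"Unknown behavior: {behavior!r}")
    return False                        # request proceeds without caching
```

Under `'ignore'` and `'warn'`, the request still succeeds; the middleware simply does not add cache checkpoints.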
## Full example
The middleware caches content up to and including the latest message in each request. On subsequent requests within the TTL window (5 minutes or 1 hour), previously seen content is retrieved from the cache rather than reprocessed, reducing costs and latency.

How it works:
- First request: System prompt, tools, and the user message are sent to the API and cached
- Second request: The cached content is retrieved from cache. Only the new message needs to be processed
- This pattern continues for each turn, with each request reusing the cached conversation history
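The turn-by-turn pattern above can be simulated with rough token arithmetic. This is a hypothetical sketch: token counts and the `simulate_turns` helper are illustrative, and it assumes every request lands inside the TTL window.

```python
# Hypothetical simulation of cache reuse across conversation turns.
# Token counts are illustrative; real billing depends on the tokenizer.

def simulate_turns(prefix_tokens: int, turn_tokens: list[int]) -> list[dict]:
    """Split each turn's input tokens into cache reads vs newly processed
    tokens, assuming all requests fall within the cache TTL window."""
    cached = 0
    rows = []
    for i, _ in enumerate(turn_tokens):
        total = prefix_tokens + sum(turn_tokens[: i + 1])
        fresh = total - cached            # tokens not yet in the cache
        rows.append({"turn": i + 1, "cache_read": cached, "new": fresh})
        cached = total                    # this request's prefix is now cached
    return rows

# A 2,000-token system prompt + tools prefix, then three short turns:
for row in simulate_turns(prefix_tokens=2000, turn_tokens=[50, 40, 60]):
    print(row)
```

The first turn pays for the full prefix; every later turn reprocesses only its new message, which is where the cost and latency savings come from.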
Prompt caching reduces API costs by caching tokens, but it does not provide conversation memory. To persist conversation history across invocations, use a checkpointer such as `MemorySaver`.

## Model-specific behavior
The middleware handles differences between APIs and model families automatically:

| Feature | ChatBedrockConverse (Anthropic) | ChatBedrockConverse (Nova) | ChatBedrock (Anthropic) |
|---|---|---|---|
| System prompt caching | ✅ | ✅ | ✅ |
| Tool definition caching | ✅ | ❌ | ✅ |
| Message caching | ✅ | ✅ (excludes tool result messages) | ✅ |
| Extended TTL (1h) | ✅ | ❌ | ✅ |

