UpstashRatelimitHandler
This handler uses the ratelimit library of Upstash, which utilizes Upstash Redis.
Upstash Ratelimit works by sending an HTTP request to Upstash Redis every time the limit method is called. The remaining tokens/requests of the user are checked and updated. Based on the remaining tokens, we can stop the execution of costly operations like invoking an LLM or querying a vector store.
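The check-then-run pattern described above can be sketched without any network calls. The following is a minimal in-memory illustration, not the Upstash API: `FixedWindowLimiter` and `run_if_allowed` are hypothetical names, and a real limiter would keep its counters in Upstash Redis rather than in a local dictionary.

```python
import time


class FixedWindowLimiter:
    """In-memory stand-in for a remote rate limiter (hypothetical, not the Upstash API)."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # identifier -> (window_index, request_count)

    def limit(self, identifier):
        """Count one request; return True if it fits in the current window."""
        window_index = int(self.clock() // self.window_seconds)
        stored_index, count = self._windows.get(identifier, (window_index, 0))
        if stored_index != window_index:
            count = 0  # a new window has started, so the counter resets
        if count >= self.max_requests:
            self._windows[identifier] = (window_index, count)
            return False
        self._windows[identifier] = (window_index, count + 1)
        return True


def run_if_allowed(limiter, identifier, costly_operation):
    """Run costly_operation only when the limiter accepts the request."""
    if limiter.limit(identifier):
        return costly_operation()
    return None
```

Here the costly operation (an LLM call or a vector store query) is never started once the limiter reports that the identifier has used up its allowance for the current window.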
UpstashRatelimitHandler allows you to incorporate the ratelimit logic into your chain in a few minutes.
First, you will need to go to the Upstash Console and create a Redis database (see our docs). After creating a database, you will need to set the environment variables for its REST URL and token (UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN).
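Before building a chain, it can help to confirm that the variables are actually set. The sketch below assumes the two names conventionally read by the Upstash Redis client, `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN`; the helper `missing_upstash_vars` is a hypothetical name introduced for illustration.

```python
import os

# Assumed variable names: the ones the Upstash Redis client conventionally reads.
REQUIRED_VARS = ("UPSTASH_REDIS_REST_URL", "UPSTASH_REDIS_REST_TOKEN")


def missing_upstash_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_upstash_vars()` at startup and failing early when the list is non-empty avoids discovering a missing credential only on the first rate limit check.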
Ratelimiting per request
Let’s imagine that we want to allow our users to invoke our chain 10 times per minute. Achieving this takes only a few lines. Note that the handler is passed to the invoke method instead of being passed when defining the chain.
For rate limiting algorithms other than FixedWindow, see the upstash-ratelimit docs.
Before executing any steps in our pipeline, ratelimit will check whether the user has passed the request limit. If so, UpstashRatelimitError is raised.
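The up-front check can be illustrated with a small, self-contained sketch. Everything here is hypothetical and only mirrors the behaviour described above: `SketchRatelimitError` stands in for UpstashRatelimitError, `CountingLimiter` replaces the Redis-backed limiter (window reset is left out for brevity), and `invoke` is a toy pipeline runner rather than a real chain.

```python
class SketchRatelimitError(Exception):
    """Hypothetical stand-in for UpstashRatelimitError."""


class CountingLimiter:
    """Allows max_requests calls per identifier; window reset is omitted for brevity."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.counts = {}

    def limit(self, identifier):
        used = self.counts.get(identifier, 0)
        if used >= self.max_requests:
            return False
        self.counts[identifier] = used + 1
        return True


class SketchRatelimitHandler:
    """Checks the request limit once, before the first pipeline step."""

    def __init__(self, identifier, limiter):
        self.identifier = identifier
        self.limiter = limiter

    def on_chain_start(self):
        if not self.limiter.limit(self.identifier):
            raise SketchRatelimitError(f"request limit reached for {self.identifier}")


def invoke(steps, value, handler):
    """Run steps in order, letting the handler veto the run up front."""
    handler.on_chain_start()
    for step in steps:
        value = step(value)
    return value
```

Because the error is raised before the loop over `steps`, none of the pipeline's work is performed for a request that exceeds the limit.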
Ratelimiting per token
Another option is to rate limit chain invocations based on:
- number of tokens in prompt
- number of tokens in prompt and LLM completion
This requires an LLM in the chain that returns token usage in its LLMOutput.
How it works
The handler will get the remaining tokens before calling the LLM. If more than 0 tokens remain, the LLM will be called. Otherwise, UpstashRatelimitError will be raised.
After the LLM is called, the token usage information is subtracted from the remaining tokens of the user. No error is raised at this stage of the chain.
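The two stages above can be sketched as follows. This is an in-memory illustration under stated assumptions, not the handler's implementation: `TokenBudget`, `SketchRatelimitError`, `call_llm_with_token_limit`, and the `include_output_tokens` flag are names chosen for this sketch, and the usage keys `prompt_tokens` and `completion_tokens` are assumed.

```python
class SketchRatelimitError(Exception):
    """Hypothetical stand-in for UpstashRatelimitError."""


class TokenBudget:
    """Tracks remaining tokens per identifier (window reset omitted for brevity)."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.remaining = {}

    def get_remaining(self, identifier):
        return self.remaining.get(identifier, self.max_tokens)

    def consume(self, identifier, tokens):
        self.remaining[identifier] = self.get_remaining(identifier) - tokens


def call_llm_with_token_limit(budget, identifier, prompt, llm, include_output_tokens=False):
    """Check the budget before the call, subtract the reported usage after it."""
    # Stage 1: refuse to call the LLM when no tokens remain.
    if budget.get_remaining(identifier) <= 0:
        raise SketchRatelimitError(f"token limit reached for {identifier}")

    # llm is assumed to return (completion_text, usage_dict).
    completion, usage = llm(prompt)

    # Stage 2: subtract usage; the balance may go negative, and no error is raised here.
    used = usage["prompt_tokens"]
    if include_output_tokens:
        used += usage["completion_tokens"]
    budget.consume(identifier, used)
    return completion
```

Note that a single call can overshoot the budget, since usage is only known after the LLM responds; the overshoot is caught by the check on the next call.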
Configuration
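For the first configuration, simply initialize the handler with a token ratelimit. Request-based and token-based rate limiting can also be used at the same time by passing both the request_ratelimit and token_ratelimit parameters.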
Here is an example with a chain utilizing an LLM: