

Private beta: The LLM Gateway is in private beta. Sign up for the waitlist to get access.
A spend policy defines a cost cap for a specific scope (organization, workspace, API key, or user) over a time window (monthly, weekly, daily, or hourly). The LLM Gateway tracks spend in real time and blocks any request that would push spend past the cap, returning a 402 response:
API Error: 402 request blocked by gateway policies: R&D Spend Cap
The blocked request is traced to LangSmith with the policy violation recorded as metadata, so you can see exactly what was blocked and why.
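A policy block is not a transient failure: retrying will keep failing until the time window resets or the cap is raised, so client code should detect it and surface it rather than retry. A minimal sketch of that check, assuming only the HTTP status and error text shown above (the exact response shape may differ):

```python
def is_policy_block(status_code: int, body: str) -> bool:
    """Return True if a gateway response is a spend-policy block (HTTP 402)."""
    return status_code == 402 and "blocked by gateway policies" in body

# A blocked request should not be retried: the cap is still exceeded
# until the window resets or the policy is adjusted.
assert is_policy_block(
    402, "API Error: 402 request blocked by gateway policies: R&D Spend Cap"
)
assert not is_policy_block(429, "rate limited")
```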

Policy dimensions

Spend policies are evaluated from broadest to most specific. All matching policies are checked, and if any one returns a block, the request is rejected. You can set a policy as a default (applying a blanket spend cap to all workspaces, users, or API keys) or as a granular policy (individual limits or limits on a group of entities).
| Scope | What it caps | Example |
| --- | --- | --- |
| Organization | Total spend across all workspaces in the org | "The entire org cannot spend more than $10,000/month on LLM calls" |
| Workspace | Total spend within a single workspace or group of workspaces | "The workspaces related to R&D cannot spend more than $2,000/month" |
| API key | Spend by a single API key or group of API keys (maps to a service or agent) | "The customer support agent keys cannot spend more than $500/month cumulatively" |
| User | Spend by a single user or group of users (resolved from the API key's identity) | "No individual developer can spend more than $50/day" |

Conflict resolution

By default, LLM Gateway assesses the broadest scope first. If a granular policy applies, the most restrictive policy wins. Narrower scopes can only tighten limits, never loosen them. If an org-level policy caps spend at $10,000/month and a workspace-level policy caps at $15,000/month, the $10,000 org cap still applies.
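Because narrower scopes can only tighten limits, the effective cap for a request is simply the minimum of all matching caps. A one-line sketch of that rule, using the numbers from the example above:

```python
def effective_cap(matching_caps_usd):
    """Most restrictive policy wins: the smallest matching cap applies."""
    return min(matching_caps_usd)

# Org cap of $10,000/month vs. workspace cap of $15,000/month:
# the org cap still applies.
assert effective_cap([10_000, 15_000]) == 10_000
```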

Defaults vs. granular policies

Spend policies have two aspects:
  1. Sums across a dimension: the total cap for that scope. Example: “This workspace’s total spend cannot exceed $5,000/month.”
  2. Defaults for each member of a dimension: a base limit that applies to every API key or user within a scope unless overridden. Example: “Each API key in this workspace gets a $200/month default cap.” Individual API keys can receive additional policies that raise their specific limit, but no policy can loosen a cap set at a broader scope.
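The two aspects above combine into a simple resolution rule for any single entity. A sketch under stated assumptions (the helper and its parameters are hypothetical, not the gateway's API): an entity's cap is its granular override if one exists, otherwise the scope default, and in either case it can never exceed the cap set at the broader scope.

```python
def per_key_cap(default_usd, override_usd, workspace_cap_usd):
    """A key's cap is its override (or the scope default), but a granular
    policy can never loosen a cap set at a broader scope."""
    cap = override_usd if override_usd is not None else default_usd
    return min(cap, workspace_cap_usd)

assert per_key_cap(200, None, 5_000) == 200       # default applies
assert per_key_cap(200, 800, 5_000) == 800        # override raises this key's limit
assert per_key_cap(200, 8_000, 5_000) == 5_000    # cannot exceed the workspace cap
```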

Time windows

| Window | Resets | Use case |
| --- | --- | --- |
| Monthly | First of each month | Budget alignment, overall cost control |
| Weekly | Midnight UTC each Monday | Weekly budgeting |
| Daily | Midnight UTC | Prevent single-day cost spikes (for example, a coding agent in a retry loop overnight) |
| Hourly | Top of each hour | Catch runaway agents quickly |
You can apply multiple time windows to the same scope. For example, a workspace can have both a $5,000/month cap and a $500/day cap. Both are enforced independently.
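Independent enforcement of stacked windows can be sketched as below. The spend lookups are assumed to exist; the point is only that every window's cap is checked separately and any single breach blocks the request.

```python
def allowed(request_cost_usd, caps_usd, spend_so_far_usd):
    """Each time window's cap is enforced independently;
    breaching any one of them blocks the request."""
    return all(spend_so_far_usd[window] + request_cost_usd <= cap
               for window, cap in caps_usd.items())

caps = {"monthly": 5_000, "daily": 500}
assert allowed(50, caps, {"monthly": 4_000, "daily": 400})       # within both caps
assert not allowed(150, caps, {"monthly": 4_000, "daily": 400})  # daily cap breached
```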

Create a spend policy

Creating and managing policies requires organization:manage permission. For the full permissions breakdown, refer to Traces, Engine, and access control.
  1. Go to Settings → Gateway → LLM Gateway.
  2. Click Create policy.
  3. Select the scope (organization, workspace, API key, or user).
  4. Set the time window (monthly, weekly, daily, or hourly).
  5. Set the spend cap in USD.
  6. Save.
Policies take effect immediately. The gateway evaluates them on every incoming request with sub-second enforcement latency.

View spend

The spend visibility dashboard shows real-time cost rollups so you can understand where your LLM budget is going before you reach the limit. From the gateway settings page, you can view how much each policy has spent against its cap.

Integration with LangSmith Engine

When a spend policy blocks a request, the violation is recorded as metadata on the trace. These violations surface as issues in LangSmith Engine, where you can click through from the issue to the trace to understand what the agent was doing when it hit the limit. This is useful for diagnosing whether a blocked request represents a genuine cost problem (a coding agent in a retry loop) or a policy that needs adjustment (a legitimate workload that grew beyond its cap).

Next steps