Custom model providers

Private beta: The LLM Gateway is in private beta. Sign up for the waitlist to get access.

In addition to the built-in providers, the LLM Gateway can proxy requests to any OpenAI-compatible endpoint, such as a self-hosted open-source model served through an inference server (vLLM, Ollama, and similar).

How it works

A custom provider is defined by an OpenAI Compatible Endpoint model configuration that you save under Settings → Model configurations in the LangSmith UI. The gateway uses the following options from the configuration:

A base URL: the upstream endpoint the gateway forwards requests to.
A model name: the model identifier your upstream expects.
An API key: stored as a workspace secret, never sent by the client.

You address the saved configuration by name through one of two routes, depending on whether you want callers to choose the model or want to enforce the configured one:

Route	Model name in the request body
`https://gateway.smith.langchain.com/providers/{configName}`	Forwarded to the upstream as-is—the client picks the model.
`https://gateway.smith.langchain.com/models/{configName}`	Overridden with the configuration’s model name—the client’s value is ignored.

Both routes look up the same configuration, resolve the same secret, and proxy to the same upstream URL; they differ only in whether the model name is enforced.

{configName} is the configuration name from your workspace model configuration. If the name contains characters that aren’t URL-safe (such as / or spaces), URL-encode them in the path. For example, a configuration named meta-llama/Llama-3.1-8B-Instruct becomes https://gateway.smith.langchain.com/providers/meta-llama%2FLlama-3.1-8B-Instruct/v1/chat/completions.
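As a rough sketch of the encoding step, the following Bash snippet percent-encodes a configuration name before placing it in the route. The urlencode helper and the variable names are illustrative and not part of the gateway, and the helper assumes an ASCII name.

```shell
#!/usr/bin/env bash
# Percent-encode every character outside the RFC 3986 unreserved set.
# Illustrative helper (not part of the gateway); assumes an ASCII name.
urlencode() {
  local LC_ALL=C            # keep the character ranges below ASCII-only
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;                    # unreserved: copy as-is
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;  # everything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

# Configuration name taken from the example above.
CONFIG_NAME="meta-llama/Llama-3.1-8B-Instruct"
echo "https://gateway.smith.langchain.com/providers/$(urlencode "$CONFIG_NAME")/v1/chat/completions"
```

For the name used here, the printed URL matches the encoded example in the paragraph above. The same encoded segment applies to the /models/{configName} route.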

1. Create a custom provider configuration

Add the upstream endpoint’s API key as a workspace secret under Settings → Integrations → Provider Secrets. Give it a descriptive name (for example, MY_PROVIDER_API_KEY).
Go to Settings → Model configurations and create a configuration with OpenAI Compatible Endpoint as the provider.
Set the Base URL to your upstream endpoint (for example, https://my-inference-server.example.com/v1) and the Model Name to a model identifier the endpoint expects.
Set the API Key Name to the secret you created.
Save the configuration with a name. This name is what you’ll use in the gateway route.

The Model Name you save only matters if you call the configuration through /models/{configName}. Through /providers/{configName}, the client’s model field is sent through unchanged, so a single configuration can serve any model your upstream supports (for example, any model pulled into a shared Ollama instance).

2. Make a call

Call the saved configuration by name (my-custom-endpoint in the following examples).

Any model: `/providers/{configName}`

Use this route when the upstream serves multiple models and you want callers to pick which one:

curl https://gateway.smith.langchain.com/providers/my-custom-endpoint/v1/chat/completions \
    -H "Authorization: Bearer $LANGSMITH_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"llama3.1:8b","messages":[{"role":"user","content":"ping"}]}'

The gateway forwards the request body’s model field to the upstream as-is.
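To illustrate one configuration serving several upstream models, here is a minimal sketch that builds a request body per model and sends it through the same /providers route. The chat_body helper is illustrative, and mistral:7b is a hypothetical second model assumed to exist on the upstream. The script only prints the requests unless LANGSMITH_API_KEY is set.

```shell
#!/usr/bin/env bash
# Sketch: call the /providers route with a caller-chosen model.
# "my-custom-endpoint" comes from the examples above;
# "mistral:7b" is a hypothetical second model on the same upstream.
GATEWAY="https://gateway.smith.langchain.com"
CONFIG_NAME="my-custom-endpoint"

# Build a minimal chat-completions body. The arguments are interpolated
# verbatim, so they must not contain double quotes or backslashes.
chat_body() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

for model in "llama3.1:8b" "mistral:7b"; do
  body="$(chat_body "$model" "ping")"
  echo "POST $GATEWAY/providers/$CONFIG_NAME/v1/chat/completions"
  echo "$body"
  # Send the request only when an API key is available.
  if [ -n "${LANGSMITH_API_KEY:-}" ]; then
    curl -sS "$GATEWAY/providers/$CONFIG_NAME/v1/chat/completions" \
      -H "Authorization: Bearer $LANGSMITH_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$body"
    echo
  fi
done
```

Only the model field changes between iterations; the route, configuration name, and credentials stay the same. For prompts with arbitrary text, a JSON-aware tool is a safer way to build the body than printf.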

One model: `/models/{configName}`

Use this route to pin every call through this configuration to a single model, regardless of what the client requests—useful for enforcing model behavior for a team or application:

curl https://gateway.smith.langchain.com/models/my-custom-endpoint/v1/chat/completions \
    -H "Authorization: Bearer $LANGSMITH_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"ping"}]}'

The gateway overrides the request body’s model field with the model name from the saved configuration, so any value the client passes is ignored and the field can be omitted entirely, as in the example. To serve multiple pinned models from the same upstream, create one configuration per model (each with its own name and /models/{configName} route).

Supported endpoints

Both /providers/{configName} and /models/{configName} use the same allowlist as the built-in OpenAI provider: POST /v1/chat/completions (including streaming), POST /v1/responses, and the GET /v1/models listing endpoints. Any other path returns 501 Not Implemented.
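A client can mirror this allowlist locally to catch unsupported calls before sending them. The sketch below is a client-side convenience only, not how the gateway implements the check. The is_supported helper is illustrative, and it matches only the exact /v1/models path because the text above does not spell out which listing paths are covered.

```shell
#!/usr/bin/env bash
# Client-side pre-check mirroring the allowlist described above.
# Illustrative helper; the gateway remains the source of truth.
is_supported() {
  local method="$1" path="$2"
  case "$method $path" in
    "POST /v1/chat/completions"|"POST /v1/responses"|"GET /v1/models") return 0 ;;
    *) return 1 ;;
  esac
}

# /v1/embeddings stands in for any path outside the allowlist.
for req in "POST /v1/chat/completions" "POST /v1/embeddings"; do
  # Split "METHOD PATH" on the first space.
  if is_supported "${req%% *}" "${req#* }"; then
    echo "$req: supported"
  else
    echo "$req: expect 501 Not Implemented"
  fi
done
```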

Next steps

Spend policies: apply cost limits to custom providers.
PII and secrets redaction: redact sensitive data before it reaches your endpoint.
