LangSmith Engine on self-hosted

Engine for self-hosted is coming soon!

LangSmith Engine is an agent within LangSmith that monitors your production traces, clusters them into issues, diagnoses each issue against your source code, proposes a fix as a PR, and identifies ground truth evals to add to your datasets. For a product overview, see Engine. This page explains how Engine runs in a self-hosted deployment, how it reaches managed inference, and what that means for your data. Engine works with three kinds of data:

Code (optional): your agent’s source, which Engine reads to diagnose issues and propose fixes.
Traces: runtime data from your agents, which can include user messages, tool outputs, and PII.
Model: the LLM calls Engine makes to run diagnosis, generate fixes, and write evaluators.

In a self-hosted deployment, Engine's orchestration runs inside your VPC as part of LangSmith: it reads traces, reads code, and runs its detect, fix, and verify loop. The models it calls are the exception. Rather than running them locally, Engine reaches a LangChain-managed, zero data retention (ZDR) inference service over a private network connection. Traces are the most sensitive of the three data types, and the architecture below keeps trace data on private networks inside your cloud service provider (CSP).

How inference works

Engine’s inference is delivered through a LangChain-managed service, LangSmith Intelligence. The service is fully isolated from LangSmith and does not trace, store, or persist any customer data. The flow:

Your self-hosted deployment connects to LangSmith Intelligence over a private link (AWS PrivateLink, Azure Private Link, or Private Service Connect on GCP).
LangSmith Intelligence runs models hosted within your cloud provider’s environment (Bedrock, Vertex, or Foundry), so no data leaves your CSP.
The service records only request metadata for billing, plus telemetry needed to keep the service reliable.

Each inference request carries the trace content, code, and intermediate outputs the model needs to reason. The request is processed by a model running inside your cloud provider and is never stored. No copy of your data is persisted outside your CSP.
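The retention behavior above can be pictured as a request handler that returns the model's output but keeps only a metadata record. This is a hypothetical sketch, not LangSmith Intelligence's implementation. The names `InferenceRequest`, `UsageRecord`, and `handle_request` are assumptions made for illustration, as are the metadata fields (request ID, step, token counts). Reliability telemetry is omitted, and `call_model` stands in for a model hosted inside your cloud provider.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class InferenceRequest:
    # Hypothetical request shape: the content the model needs to reason over.
    request_id: str
    step: str  # e.g. "diagnose" or "fix"
    trace_content: str
    code: str
    intermediate_outputs: list[str]


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical metadata-only record kept for billing; it has no content fields.
    request_id: str
    step: str
    input_tokens: int
    output_tokens: int


def count_tokens(text: str) -> int:
    # Crude whitespace token count, used only to make the sketch runnable.
    return len(text.split())


def handle_request(
    request: InferenceRequest,
    call_model: Callable[[str], str],
    usage_log: list[UsageRecord],
) -> str:
    """Run inference and record only metadata, never request or response content."""
    prompt = "\n".join([request.trace_content, request.code, *request.intermediate_outputs])
    output = call_model(prompt)  # stands in for a model running inside your CSP
    usage_log.append(
        UsageRecord(
            request_id=request.request_id,
            step=request.step,
            input_tokens=count_tokens(prompt),
            output_tokens=count_tokens(output),
        )
    )
    # The prompt and output are returned or discarded; only the UsageRecord is kept.
    return output


usage_log: list[UsageRecord] = []
request = InferenceRequest(
    request_id="req-1",
    step="diagnose",
    trace_content="user: my order never arrived",
    code="def ship(): ...",
    intermediate_outputs=[],
)
handle_request(request, lambda prompt: "Likely cause: retry logic", usage_log)
print(usage_log[0])
# UsageRecord(request_id='req-1', step='diagnose', input_tokens=8, output_tokens=4)
```

The design point is that the record type has no field that could hold trace content or code. Metadata-only retention is therefore a property of the data model, not something each code path has to remember.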

AWS

Architecture diagram showing a self-hosted LangSmith deployment on AWS connecting to LangSmith Intelligence over PrivateLink, with inference served by Bedrock inside the customer's account

GCP

Architecture diagram showing a self-hosted LangSmith deployment on GCP connecting to LangSmith Intelligence over Private Service Connect, with inference served by Vertex inside the customer's project

Azure

Model selection and quality

Model selection drives much of what makes Engine effective. Engine uses different models, tuned differently, for each step of its work: clustering issues, diagnosing root cause against your code, generating a fix, and writing the evaluator that verifies it. LangChain tunes these models for both quality and token efficiency, and upgrades them as better models ship. Managed inference makes that possible. Because Engine always runs the model LangChain has tuned for each step, behavior stays consistent and improves as those models are upgraded. A bring-your-own-key setup would instead tie Engine to the models you have configured, so tuning and token efficiency would vary from request to request.
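The contrast between managed inference and bring-your-own-key can be sketched as two routing policies. This is a hypothetical illustration. The step names follow the paragraph above, but `MANAGED_ROUTES`, `route`, and the model identifiers are placeholders, because this page does not name the models Engine uses.

```python
from typing import Optional

# Engine's steps, as listed in the paragraph above.
STEPS = ("cluster", "diagnose", "fix", "write_evaluator")

# Placeholder identifiers: each step maps to a model tuned for that step.
# The real model names and versions are not documented here.
MANAGED_ROUTES = {
    "cluster": "managed-model-a",
    "diagnose": "managed-model-b",
    "fix": "managed-model-c",
    "write_evaluator": "managed-model-d",
}


def route(step: str, byok_model: Optional[str] = None) -> str:
    """Pick the model for one step.

    With managed inference (byok_model is None), each step gets the model
    tuned for it. In a hypothetical bring-your-own-key setup, every step
    falls back to the single model the customer configured.
    """
    if step not in MANAGED_ROUTES:
        raise ValueError(f"unknown step: {step}")
    return byok_model if byok_model is not None else MANAGED_ROUTES[step]


for step in STEPS:
    print(step, route(step), route(step, byok_model="customer-configured-model"))
```

Under managed routing, swapping one entry in the table changes that step for every deployment. Under bring-your-own-key, per-step tuning is lost because the table is bypassed.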

What this means for your data

In a self-hosted deployment, Engine adds two data-locality guarantees on top of the controls common to every deployment:

Private networks only: all data in transit travels over a private link, never the public internet.
In-CSP: models run inside your CSP, so data never leaves it.
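The two guarantees can be expressed as checks over every hop an inference request takes. This is a hypothetical sketch for reasoning about a topology, not a LangSmith tool. `Hop`, `check_guarantees`, and the hop names are illustrative. A real deployment would verify these properties through your cloud provider's network and account configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    # One network hop an inference request takes (hypothetical model).
    source: str
    destination: str
    private_network: bool  # stays on private networks, never the public internet
    in_csp: bool  # destination runs inside your cloud service provider


def check_guarantees(path: list[Hop]) -> list[str]:
    """Return violations of the two guarantees; an empty list means the path complies."""
    violations = []
    for hop in path:
        if not hop.private_network:
            violations.append(f"{hop.source} -> {hop.destination}: crosses the public internet")
        if not hop.in_csp:
            violations.append(f"{hop.source} -> {hop.destination}: leaves your CSP")
    return violations


# Illustrative AWS path, following the inference flow described on this page.
aws_path = [
    Hop("LangSmith (your VPC)", "LangSmith Intelligence", private_network=True, in_csp=True),
    Hop("LangSmith Intelligence", "Bedrock", private_network=True, in_csp=True),
]

print(check_guarantees(aws_path))  # []
```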

Engine’s deployment-independent data handling, including zero data retention with every model provider and no use of customer data to train or fine-tune models, is described in Engine security.
