Load Pattern (Reads/Writes) | Concurrent Frontend Users | Traces Submitted per Second | Frontend Replicas | Platform Backend Replicas | Queue Replicas | Backend Replicas | Redis Resources | ClickHouse Resources | ClickHouse Setup | Postgres Resources | Blob Storage |
---|---|---|---|---|---|---|---|---|---|---|---|
Low/Low | 5 | 10 | 1 (default) | 3 (default) | 3 (default) | 2 (default) | 8 Gi (default) | 4 CPU, 16 Gi memory (default) | Single instance | 2 CPU, 8 GB memory, 10 GB storage (external) | Disabled |
Low/High | 5 | 1000 | 4 | 20 | 160 | 5 | 200 Gi (external) | 10 CPU, 32 Gi memory | Single instance | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled |
High/Low | 50 | 10 | 2 | 3 (default) | 6 | 40 | 8 Gi (default) | 8 CPU, 16 Gi memory per replica | 3-node replicated cluster | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled |
Medium/Medium | 20 | 100 | 2 | 3 (default) | 10 | 16 | 13 Gi (external) | 16 CPU, 24 Gi memory | Single instance | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled |
High/High | 50 | 1000 | 4 | 20 | 160 | 50 | 200 Gi (external) | 14 CPU, 24 Gi memory per replica | 3-node replicated cluster | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled |
For each load pattern, we provide a `values.yaml` snippet for you to start with for your self-hosted LangSmith instance.
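As a rough starting point, the sketch below shows how the table columns might map onto Helm values, using the Low/Low (default) row as the example. The key paths (`frontend.deployment.replicas`, `platformBackend.deployment.replicas`, and so on) are assumptions about a typical chart layout, not the chart's actual schema; confirm them against the `values.yaml` shipped with your chart version.

```yaml
# Illustrative mapping from table columns to (assumed) Helm value paths.
frontend:
  deployment:
    replicas: 1        # "Frontend Replicas"
platformBackend:
  deployment:
    replicas: 3        # "Platform Backend Replicas"
queue:
  deployment:
    replicas: 3        # "Queue Replicas"
backend:
  deployment:
    replicas: 2        # "Backend Replicas"
redis:
  statefulSet:
    resources:
      requests:
        memory: 8Gi    # "Redis Resources"
clickhouse:
  statefulSet:
    resources:
      requests:
        cpu: "4"       # "ClickHouse Resources"
        memory: 16Gi
```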
The low-read, high-write pattern covers applications that submit most of their traces through the `@traceable` wrapper or the `/runs/multipart` endpoint but rarely call the `/runs/query` or `/runs/<run-id>` API endpoints.
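As a sketch of how the Low/High row could translate into values (same caveat as above: the key paths are assumptions, and the bundled ClickHouse/Redis layout may differ in your chart version):

```yaml
# Low/High sketch — write-heavy: scale the queue and platform backend,
# size Redis at ~200 Gi (external), keep ClickHouse single-instance but larger.
# Key paths are assumed, not taken from the chart.
frontend:
  deployment:
    replicas: 4
platformBackend:
  deployment:
    replicas: 20
queue:
  deployment:
    replicas: 160
backend:
  deployment:
    replicas: 5
clickhouse:
  statefulSet:
    resources:
      requests:
        cpu: "10"
        memory: 32Gi
```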
The high-read, low-write pattern covers applications that call the `/runs/query` or `/runs/<run-id>` endpoints frequently.
For this, we strongly recommend setting up a replicated ClickHouse cluster to enable high read scale at low latency. See our external ClickHouse doc for more guidance on how to set up a replicated ClickHouse cluster. For this load pattern, we recommend a 3-node replicated setup, where each replica in the cluster has resource requests of 8+ cores and 16+ GB of memory and a resource limit of 12 cores and 32 GB of memory.
For this, we recommend a configuration like this:
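(The snippet below is an illustrative sketch: the figures come from the High/Low row and the guidance above, but the key paths are assumptions; confirm them against your chart's `values.yaml`.)

```yaml
# High/Low sketch — read-heavy: scale the backend for query traffic.
# Key paths are assumed, not taken from the chart.
frontend:
  deployment:
    replicas: 2
queue:
  deployment:
    replicas: 6
backend:
  deployment:
    replicas: 40
# ClickHouse: run a 3-node replicated cluster (see the external ClickHouse doc).
# Size each replica with requests of 8+ CPU / 16+ Gi memory and limits of
# 12 CPU / 32 Gi memory.
```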
The high-read, high-write pattern covers applications that submit traces at high volume while also querying the `/runs/query` or `/runs/<run-id>` endpoints heavily.
For this, we very strongly recommend setting up a replicated ClickHouse cluster to prevent degraded read performance at high write scale. See our external ClickHouse doc for more guidance on how to set up a replicated ClickHouse cluster. For this load pattern, we recommend a 3-node replicated setup, where each replica in the cluster has resource requests of 14+ cores and 24+ GB of memory and a resource limit of 20 cores and 48 GB of memory. We also recommend that each ClickHouse node/instance have 600 Gi of volume storage for each day of TTL that you enable (as per the configuration below).
Overall, we recommend a configuration like this:
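(The snippet below is an illustrative sketch: the figures come from the High/High row and the guidance above, but the key paths are assumptions; confirm them against your chart's `values.yaml`.)

```yaml
# High/High sketch — heavy reads and writes. Key paths are assumed.
frontend:
  deployment:
    replicas: 4
platformBackend:
  deployment:
    replicas: 20
queue:
  deployment:
    replicas: 160
backend:
  deployment:
    replicas: 50
# Redis: ~200 Gi, external.
# ClickHouse: 3-node replicated cluster (see the external ClickHouse doc);
# per replica: requests of 14+ CPU / 24+ Gi memory, limits of 20 CPU / 48 Gi,
# plus ~600 Gi of volume storage for each day of trace TTL.
```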
Ensure that all pods are in the `Running` state. Pods stuck in `Pending` may indicate that you are reaching node pool limits or need larger nodes. Also, ensure that any ingress controller deployed on the cluster can handle the desired load to prevent bottlenecks.