Load Pattern (Reads/Writes) | Concurrent Frontend Users | Traces Submitted per Second | Frontend Replicas | Platform Backend Replicas | Queue Replicas | Backend Replicas | Redis Resources | ClickHouse Resources | ClickHouse Setup | Postgres Resources | Blob Storage |
---|---|---|---|---|---|---|---|---|---|---|---|
Low/Low | 5 | 10 | 1 (default) | 3 (default) | 3 (default) | 2 (default) | 8 Gi (default) | 4 CPU, 16 Gi memory (default) | Single instance | 2 CPU, 8 GB memory, 10 GB storage (external) | Disabled |
Low/High | 5 | 1000 | 4 | 20 | 160 | 5 | 200 Gi (external) | 10 CPU, 32 Gi memory | Single instance | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled |
High/Low | 50 | 10 | 2 | 3 (default) | 6 | 40 | 8 Gi (default) | 8 CPU, 16 Gi memory per replica | 3-node replicated cluster | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled |
Medium/Medium | 20 | 100 | 2 | 3 (default) | 10 | 16 | 13 Gi (external) | 16 CPU, 24 Gi memory | Single instance | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled |
High/High | 50 | 1000 | 4 | 20 | 160 | 50 | 200 Gi (external) | 14 CPU, 24 Gi memory per replica | 3-node replicated cluster | 2 CPU, 8 GB memory, 10 GB storage (external) | Enabled |
For each load pattern, we provide a `values.yaml` snippet for you to start with for your self-hosted LangSmith instance.
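As a rough starting point, the sketch below shows how the table columns might map onto Helm values, using the Low/Low (default) row as the example. The key paths (`frontend.deployment.replicas`, `platformBackend.deployment.replicas`, and so on) are assumptions about a typical chart layout, not the chart's actual schema; confirm them against the `values.yaml` shipped with your chart version.

```yaml
# Illustrative mapping from table columns to (assumed) Helm value paths.
frontend:
  deployment:
    replicas: 1        # "Frontend Replicas"
platformBackend:
  deployment:
    replicas: 3        # "Platform Backend Replicas"
queue:
  deployment:
    replicas: 3        # "Queue Replicas"
backend:
  deployment:
    replicas: 2        # "Backend Replicas"
redis:
  statefulSet:
    resources:
      requests:
        memory: 8Gi    # "Redis Resources"
clickhouse:
  statefulSet:
    resources:
      requests:
        cpu: "4"       # "ClickHouse Resources"
        memory: 16Gi
```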
The low-read, high-write pattern covers applications that submit most of their traces through the `@traceable` wrapper or the `/runs/multipart` endpoint but rarely call the `/runs/query` or `/runs/<run-id>` API endpoints.
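As a sketch of how the Low/High row could translate into values (same caveat as above: the key paths are assumptions, and the bundled ClickHouse/Redis layout may differ in your chart version):

```yaml
# Low/High sketch — write-heavy: scale the queue and platform backend,
# size Redis at ~200 Gi (external), keep ClickHouse single-instance but larger.
# Key paths are assumed, not taken from the chart.
frontend:
  deployment:
    replicas: 4
platformBackend:
  deployment:
    replicas: 20
queue:
  deployment:
    replicas: 160
backend:
  deployment:
    replicas: 5
clickhouse:
  statefulSet:
    resources:
      requests:
        cpu: "10"
        memory: 32Gi
```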
The high-read, low-write pattern covers applications that call the `/runs/query` or `/runs/<run-id>` endpoints frequently.
For this, we strongly recommend setting up a replicated ClickHouse cluster to enable high read scale at low latency. See our external ClickHouse doc for more guidance on how to set up a replicated ClickHouse cluster. For this load pattern, we recommend a 3-node replicated setup, where each replica in the cluster has resource requests of 8+ cores and 16+ GB of memory and a resource limit of 12 cores and 32 GB of memory.
For this, we recommend a configuration like this:
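(The snippet below is an illustrative sketch: the figures come from the High/Low row and the guidance above, but the key paths are assumptions; confirm them against your chart's `values.yaml`.)

```yaml
# High/Low sketch — read-heavy: scale the backend for query traffic.
# Key paths are assumed, not taken from the chart.
frontend:
  deployment:
    replicas: 2
queue:
  deployment:
    replicas: 6
backend:
  deployment:
    replicas: 40
# ClickHouse: run a 3-node replicated cluster (see the external ClickHouse doc).
# Size each replica with requests of 8+ CPU / 16+ Gi memory and limits of
# 12 CPU / 32 Gi memory.
```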
The high-read, high-write pattern covers applications that submit traces at high volume while also querying the `/runs/query` or `/runs/<run-id>` endpoints heavily.
For this, we very strongly recommend setting up a replicated ClickHouse cluster to prevent degraded read performance at high write scale. See our external ClickHouse doc for more guidance on how to set up a replicated ClickHouse cluster. For this load pattern, we recommend a 3-node replicated setup, where each replica in the cluster has resource requests of 14+ cores and 24+ GB of memory and a resource limit of 20 cores and 48 GB of memory. We also recommend that each ClickHouse node/instance have 600 Gi of volume storage for each day of TTL that you enable (as per the configuration below).
Overall, we recommend a configuration like this:
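(The snippet below is an illustrative sketch: the figures come from the High/High row and the guidance above, but the key paths are assumptions; confirm them against your chart's `values.yaml`.)

```yaml
# High/High sketch — heavy reads and writes. Key paths are assumed.
frontend:
  deployment:
    replicas: 4
platformBackend:
  deployment:
    replicas: 20
queue:
  deployment:
    replicas: 160
backend:
  deployment:
    replicas: 50
# Redis: ~200 Gi, external.
# ClickHouse: 3-node replicated cluster (see the external ClickHouse doc);
# per replica: requests of 14+ CPU / 24+ Gi memory, limits of 20 CPU / 48 Gi,
# plus ~600 Gi of volume storage for each day of trace TTL.
```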
Ensure that all pods are in the `Running` state. Pods stuck in `Pending` may indicate that you are reaching node pool limits or need larger nodes. Also, ensure that any ingress controller deployed on the cluster can handle the desired load to prevent bottlenecks.