This guide shows you how to run an evaluation using OpenTelemetry tracing with LangSmith. If you're already using OpenTelemetry to trace your LLM application, you can run evaluations by routing traces to an experiment session. This approach is useful when you want to evaluate applications that are instrumented with OpenTelemetry but don't use the LangSmith SDK's evaluate() function.
## Overview
When evaluating with OpenTelemetry, you need to:
- Create an experiment session in LangSmith.
- Configure OpenTelemetry to send traces to LangSmith.
- Add specific span attributes to link traces to the experiment and dataset examples.
- Run your application for each example in the dataset.
## Prerequisites
This guide assumes you have:
- An application instrumented with OpenTelemetry that sends traces to LangSmith.
- A dataset created in LangSmith with examples to evaluate. You can create a dataset via the LangSmith UI or via the SDK.
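For reference, a small dataset can be created via the SDK roughly as follows. This is a sketch: it assumes the langsmith package is installed and LANGSMITH_API_KEY is set, and the example inputs and outputs are placeholders.

```python
def create_eval_dataset(name: str):
    """Sketch: create a dataset and add one example via the LangSmith SDK."""
    from langsmith import Client  # requires the `langsmith` package

    client = Client()  # reads LANGSMITH_API_KEY from the environment
    dataset = client.create_dataset(dataset_name=name)
    client.create_example(
        dataset_id=dataset.id,
        inputs={"question": "What is OpenTelemetry?"},
        outputs={"answer": "A vendor-neutral observability framework."},
    )
    return dataset
```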
If you're self-hosting LangSmith, replace OTEL_EXPORTER_OTLP_ENDPOINT with your self-hosted URL and append /api/v1/otel. For example: OTEL_EXPORTER_OTLP_ENDPOINT = "https://ai-company.com/api/v1/otel". Replace LANGSMITH_ENDPOINT with your LangSmith API endpoint. For example: LANGSMITH_ENDPOINT = "https://ai-company.com/api/v1".

## Step 1. Create an experiment session
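Creating the session can be sketched as follows. This assumes the langsmith package is installed and LANGSMITH_API_KEY is set; create_project and its reference_dataset_id parameter come from the LangSmith SDK, but verify them against your SDK version.

```python
import uuid

def create_experiment_session(dataset_id: str) -> str:
    """Sketch: create a session (project) that groups this experiment's
    traces and is linked to the dataset being evaluated."""
    from langsmith import Client  # requires the `langsmith` package

    client = Client()  # reads LANGSMITH_API_KEY from the environment
    session = client.create_project(
        # A unique suffix helps avoid name collisions across runs.
        project_name=f"otel-experiment-{uuid.uuid4().hex[:8]}",
        reference_dataset_id=dataset_id,
    )
    return str(session.id)  # used in later steps to route traces
```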
An experiment session groups all evaluation traces together. Create one using the LangSmith client.

## Step 2. Define an application and configure OpenTelemetry
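Routing traces to your experiment session typically means setting the OTLP endpoint and headers before your tracer starts. The following is a sketch using environment variables; the x-api-key and Langsmith-Project header names follow LangSmith's OTel integration, but treat the specifics as illustrative and adjust the endpoint if self-hosting, as noted above.

```python
import os

api_key = os.environ.get("LANGSMITH_API_KEY", "<your-api-key>")
experiment_session_id = "<experiment-session-id>"  # returned in Step 1

# OTLP endpoint for LangSmith cloud; for self-hosted deployments use
# your own URL with /api/v1/otel appended (see the note above).
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://api.smith.langchain.com/otel"

# OTLP headers are one comma-separated key=value string. The
# Langsmith-Project header routes traces to the experiment session.
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = (
    f"x-api-key={api_key},Langsmith-Project={experiment_session_id}"
)
```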
First, you need an application that uses OpenTelemetry for tracing. This example uses a Strands agent, but any OpenTelemetry-instrumented application works. Set up OpenTelemetry to route traces to your experiment session by including the experiment ID in the OTEL headers. The goal of this step is simply an agent or application instrumented with OpenTelemetry. TypeScript examples are not provided for this step because the Strands TypeScript SDK does not currently support OpenTelemetry observability (as of February 2026).

## Step 3. Set up key span attributes
Add the required span attributes to each application run. These attributes link each trace to the experiment and the specific dataset example. The following attributes are relevant for experiment evaluation:

| Attribute | Purpose |
|---|---|
| langsmith.trace.session_id | Routes the trace to your experiment session |
| langsmith.reference_example_id | Links the trace to a specific dataset example |
| langsmith.span.kind | Sets the span type (e.g., “llm”, “chain”, “tool”) |
| inputs | Records the input to your application |
| outputs | Records the output from your application |
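As a sketch, these attributes can be attached to the root span of each run. annotate_span below is a hypothetical helper, not part of any SDK; it works with any object exposing OpenTelemetry's set_attribute method.

```python
import json

def annotate_span(span, session_id, example_id, inputs, outputs):
    """Attach the LangSmith-specific attributes from the table above to an
    OpenTelemetry span (any object with a set_attribute method)."""
    span.set_attribute("langsmith.trace.session_id", session_id)
    span.set_attribute("langsmith.reference_example_id", example_id)
    span.set_attribute("langsmith.span.kind", "chain")
    # Serialize inputs/outputs so they render in the LangSmith UI.
    span.set_attribute("inputs", json.dumps(inputs))
    span.set_attribute("outputs", json.dumps(outputs))
```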
## Step 4. Run evaluation by iterating through dataset examples
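Putting the steps together, the evaluation loop fetches the dataset's examples and invokes the instrumented application once per example. The following is a sketch: it assumes the langsmith package is installed and LANGSMITH_API_KEY is set, and run_app stands in for your OpenTelemetry-instrumented application from the earlier steps.

```python
def run_experiment(dataset_name: str, session_id: str, run_app) -> None:
    """Sketch: invoke the instrumented app once per dataset example.
    `run_app` should create a span annotated with session_id and the
    example's id (Step 3) so LangSmith links the trace to the example."""
    from langsmith import Client  # requires the `langsmith` package

    client = Client()  # reads LANGSMITH_API_KEY from the environment
    dataset = client.read_dataset(dataset_name=dataset_name)
    for example in client.list_examples(dataset_id=dataset.id):
        run_app(
            inputs=example.inputs,
            session_id=session_id,
            reference_example_id=str(example.id),
        )
```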
Each experiment run creates traces in LangSmith that are linked to your dataset examples. In the experiment view you can see:
- Individual trace details for each example
- Evaluator scores and feedback
- Comparisons between different experiment runs

