This guide shows you how to run an evaluation using OpenTelemetry tracing with LangSmith.
If you’re already using OpenTelemetry for tracing your LLM application, you can run evaluations by routing traces to an experiment session. This approach is useful when you want to evaluate applications that are instrumented with OpenTelemetry but don’t use the LangSmith SDK’s evaluate() function.
Overview
When evaluating with OpenTelemetry, you need to:
- Create an experiment session in LangSmith.
- Configure OpenTelemetry to send traces to LangSmith.
- Add specific span attributes to link traces to the experiment and dataset examples.
- Run your application for each example in the dataset.
Prerequisites
This guide assumes you have:
- An application instrumented with OpenTelemetry that sends traces to LangSmith.
- A dataset created in LangSmith with examples to evaluate. You can create a dataset via the LangSmith UI or via the SDK.
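For example, here is a minimal sketch of creating a small dataset with the SDK, assuming the dependencies and environment variables described below are already configured. The dataset name and example contents are placeholders, and the exact create_examples keywords can vary slightly between SDK versions:
from langsmith import Client

client = Client()

# Placeholder dataset name and example; replace with your own data.
dataset = client.create_dataset(dataset_name="strands-agent-eval")
client.create_examples(
    inputs=[{"input": "Write a Python function that reverses a string."}],
    outputs=[{"output": "def reverse(s):\n    return s[::-1]"}],
    dataset_id=dataset.id,
)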
This tutorial uses Strands agents as example implementations, but the approach works with any application instrumented with OpenTelemetry.
Install dependencies:
pip install langsmith strands-agents strands-agents-tools opentelemetry-sdk opentelemetry-exporter-otlp
Set the following environment variables:
# Tracing configuration
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY="<your-langsmith-api-key>"
OTEL_EXPORTER_OTLP_ENDPOINT="https://api.smith.langchain.com/otel/"
# AWS Credentials
AWS_ACCESS_KEY_ID="<your-aws-access-key-id>"
AWS_SECRET_ACCESS_KEY="<your-aws-secret-access-key>"
AWS_REGION_NAME="<your-aws-region>"
If you’re self-hosting LangSmith, replace OTEL_EXPORTER_OTLP_ENDPOINT with your self-hosted URL and append /api/v1/otel, for example OTEL_EXPORTER_OTLP_ENDPOINT="https://ai-company.com/api/v1/otel". Also replace LANGSMITH_ENDPOINT with your LangSmith API endpoint, for example LANGSMITH_ENDPOINT="https://ai-company.com/api/v1".
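If you prefer to set this configuration in code (for example, in a notebook), here is a minimal sketch using os.environ; the values are the same placeholders as above:
import os

# Same configuration as the shell variables above; substitute your own values.
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGSMITH_API_KEY"] = "<your-langsmith-api-key>"
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://api.smith.langchain.com/otel/"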
Step 1. Create an experiment session
An experiment session groups all evaluation traces together. Create one using the LangSmith client:
from langsmith import Client
# Initialize LangSmith client
client = Client()
experiment_name = "strands-agent-experiment"
# Assumes a dataset has been created. You can find the dataset ID in the LangSmith UI or via the SDK.
dataset_id = "<your-dataset-id>"
# Create an experiment session linked to the dataset
project = client.create_project(
    project_name=experiment_name,
    reference_dataset_id=dataset_id
)
experiment_id = str(project.id)
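If you only have the dataset name, here is a minimal sketch of looking up its ID with the SDK (the dataset name is a placeholder):
# Look up the dataset ID by name instead of copying it from the UI.
dataset = client.read_dataset(dataset_name="strands-agent-eval")
dataset_id = str(dataset.id)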
Additionally, you can create evaluators in the LangSmith UI and bind them to your dataset. Evaluators bound to a dataset automatically run on its experiment traces.
To learn more about evaluators, see Evaluators.
Step 2. Configure OpenTelemetry to send traces to the experiment
Next, configure your OpenTelemetry-instrumented application to route traces to the experiment session by including the experiment ID in the OTEL_EXPORTER_OTLP_HEADERS value. This example uses a Strands agent, but you can use any application instrumented with OpenTelemetry.
TypeScript examples are not provided for this step as the Strands TypeScript SDK does not currently support OpenTelemetry observability (as of February 2026).
import os
from strands import Agent
from strands_tools import file_read, file_write, python_repl, shell, journal
from strands.telemetry import StrandsTelemetry
# Set OTEL headers with experiment ID as the project
api_key = os.getenv('LANGSMITH_API_KEY')
os.environ['OTEL_EXPORTER_OTLP_HEADERS'] = f"x-api-key={api_key},Langsmith-Project={experiment_id}"
# Initialize telemetry
strands_telemetry = StrandsTelemetry()
strands_telemetry.setup_otlp_exporter()
# Create an agent (Strands automatically creates OTel spans)
agent = Agent(
    tools=[file_read, file_write, python_repl, shell, journal],
    system_prompt="You are an Expert Software Developer.",
    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
)
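If your application is not a Strands agent, here is a minimal sketch of a generic OpenTelemetry setup that exports to LangSmith, assuming the OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS environment variables shown earlier are set:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# The exporter reads the OTLP endpoint and headers from the environment,
# so no arguments are needed here.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)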
For details on setting up OpenTelemetry tracing with LangSmith, see Trace with OpenTelemetry.
Step 3. Set up key span attributes
Add the required span attributes to each application run. These attributes link each trace to the experiment and the specific dataset example.
The following attributes are relevant for experiment evaluation:
| Attribute | Purpose |
|---|---|
| langsmith.trace.session_id | Routes the trace to your experiment session |
| langsmith.reference_example_id | Links the trace to a specific dataset example |
| langsmith.span.kind | Sets the span type (e.g., “llm”, “chain”, “tool”) |
| inputs | Records the input to your application |
| outputs | Records the output from your application |
For a complete list of supported OpenTelemetry attributes, see Trace with OpenTelemetry.
from opentelemetry import trace
def evaluate_with_opentelemetry(agent, example_id: str, example_input: str, experiment_id: str):
    tracer = trace.get_tracer(__name__)

    # Wrapper span to add experiment metadata
    with tracer.start_as_current_span("experiment_evaluation") as span:
        # Route trace to the experiment
        span.set_attribute("langsmith.trace.session_id", experiment_id)
        # Link trace to the specific dataset example
        span.set_attribute("langsmith.reference_example_id", example_id)
        # Record input
        span.set_attribute("inputs", example_input)

        # Run the application
        response = agent(example_input)

        # Record output
        output_text = getattr(response, "output", str(response))
        span.set_attribute("outputs", output_text)

    return output_text
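The wrapper above doesn’t set langsmith.span.kind; if you create additional child spans for model or tool calls, you can set it so LangSmith renders them with the right type. Here is a minimal, illustrative sketch (the span name is hypothetical):
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

# Illustrative child span for a model call.
with tracer.start_as_current_span("model_call") as llm_span:
    # langsmith.span.kind tells LangSmith how to render the span (llm, chain, tool, ...).
    llm_span.set_attribute("langsmith.span.kind", "llm")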
Step 4. Run evaluation by iterating through dataset examples
Iterate through the dataset examples and run your application on each one using the helper from Step 3. Each run creates a trace in LangSmith that is linked to its dataset example.
# Iterate through dataset examples
for example in client.list_examples(dataset_id=dataset_id):
    # Extract input from the example inputs dictionary.
    # Adjust the key based on your dataset structure
    # (e.g., "input", "question", etc.)
    example_input = example.inputs.get("input")

    evaluate_with_opentelemetry(
        agent=agent,
        example_id=str(example.id),
        example_input=str(example_input),
        experiment_id=experiment_id,
    )
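Spans exported through a batch processor are delivered asynchronously, so a short-lived script can exit before everything has been sent. Here is a minimal sketch that flushes pending spans at the end of the run, assuming the globally registered tracer provider is the SDK’s TracerProvider:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

# Flush any spans still buffered by the batch span processor before exiting.
provider = trace.get_tracer_provider()
if isinstance(provider, TracerProvider):
    provider.force_flush()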
After running the evaluation, navigate to your experiment in the LangSmith UI to analyze the results. You can see:
- Individual trace details for each example
- Evaluator scores and feedback
- Comparisons between different experiment runs