When you are iterating on your LLM application (for example, changing the model or the prompt), you may want to compare the results of different experiments. LangSmith supports a comparison view that lets you home in on key differences, regressions, and improvements between experiments.

Open the comparison view

  1. To access the experiment comparison view, navigate to the Datasets & Experiments page.
  2. Select a dataset, which will open the Experiments tab.
  3. Select two or more experiments and then click Compare.
[Screenshot: the Experiments view with three experiments selected and the Compare button highlighted.]

Adjust the table display

You can toggle between different display options using the sidebar on the right-hand side of the Comparing Experiments page.
[Screenshot: the table display options.]

Filters

You can apply filters to the experiment comparison view to narrow down to specific examples. Common filters include:
  • Examples whose input or output contains specific text.
  • Runs with a status of success or error.
  • Runs with latency greater than x seconds.
  • Runs with specific metadata, tags, or feedback.
In addition to applying filters on the overall experiment view, you can apply filters on individual columns as well.
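Conceptually, each of these filters is just a predicate applied to run records. Here is a minimal sketch in plain Python; the field names (`status`, `latency_s`, and so on) are illustrative, not LangSmith's actual schema:

```python
# Sketch: experiment-comparison filters as predicates over run records.
# Field names ("input", "output", "status", "latency_s") are illustrative.

def contains_text(run, text):
    """Match runs whose input or output contains the given text."""
    return text in str(run.get("input", "")) or text in str(run.get("output", ""))

def has_status(run, status):
    """Match runs with a given status, e.g. "success" or "error"."""
    return run.get("status") == status

def slower_than(run, seconds):
    """Match runs whose latency exceeds the given number of seconds."""
    return run.get("latency_s", 0) > seconds

runs = [
    {"input": "hi", "output": "hello!", "status": "success", "latency_s": 0.4},
    {"input": "sum 2+2", "output": "4", "status": "error", "latency_s": 3.1},
]

# Filters compose with "and": error runs slower than 2 seconds.
matches = [r for r in runs if has_status(r, "error") and slower_than(r, 2)]
print(len(matches))  # 1
```

Applying a filter on an individual column works the same way, just restricted to that column's experiment.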
[Screenshot: filtering on specific columns.]

Columns

You can show or hide individual feedback keys and metrics in the Columns settings to isolate the information you need in the comparison view.

Full vs. Compact view

  • Full: shows the full text of the input, output, and reference output for each run. If the output is too long to display in the table, click Expand to view the full content.
  • Compact: shows a truncated preview of the experiment results for each example.
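The practical difference between the two views is whether long text is truncated for the table cell. A rough sketch of compact-style truncation (the character limit here is illustrative):

```python
def preview(text, limit=80):
    """Truncate long text for a compact table cell, marking the cut with an ellipsis."""
    # Illustrative limit; the actual cell width depends on the table layout.
    return text if len(text) <= limit else text[: limit - 1] + "…"

print(preview("short answer"))   # short answer
print(len(preview("x" * 500)))   # 80
```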

Display types

There are three built-in display types for experiment results: Default, YAML, and JSON.

View regressions and improvements

In the comparison view, runs that regressed on any feedback key relative to your baseline experiment are highlighted in red, while runs that improved are highlighted in green. At the top of each feedback column, you can see how many runs in that experiment did better and how many did worse than the baseline. Click the regressions or improvements button at the top of a column to filter to the runs that regressed or improved in that experiment.
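Under the hood, this amounts to a per-run score comparison against the baseline for each feedback key. A minimal sketch, with scores keyed by example id (the names and the `higher_is_better` flag are illustrative, not LangSmith internals):

```python
def classify_runs(baseline, candidate, higher_is_better=True):
    """Compare a candidate experiment's feedback scores against a baseline.

    `baseline` and `candidate` map example id -> score for one feedback key.
    Returns (improved_ids, regressed_ids); ties are neither.
    """
    improved, regressed = [], []
    for example_id, base_score in baseline.items():
        if example_id not in candidate:
            continue  # example not run in the candidate experiment
        delta = candidate[example_id] - base_score
        if not higher_is_better:
            delta = -delta  # e.g. for a latency-like metric, lower is better
        if delta > 0:
            improved.append(example_id)
        elif delta < 0:
            regressed.append(example_id)
    return improved, regressed

baseline = {"ex1": 0.8, "ex2": 0.5, "ex3": 0.9}
candidate = {"ex1": 0.9, "ex2": 0.4, "ex3": 0.9}
improved, regressed = classify_runs(baseline, candidate)
print(improved, regressed)  # ['ex1'] ['ex2']
```

Flipping `higher_is_better` (as you can do per feedback key, described below) inverts which runs count as improvements.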
[Screenshot: the comparison view comparing three experiments, with regressions highlighted in red and improvements in green.]

Update baseline experiment and metric

To track regressions across experiments, you can:
  1. Set the baseline: at the top of the comparison view, hover over the experiment icons and select any experiment as the Baseline to compare against. You can also add or remove experiments here. By default, the first experiment you selected is used as the baseline.
[Screenshot: configuring the baseline experiment in the dropdown.]
  2. Configure the score direction: within the Feedback columns, you can configure whether a higher score is better for each feedback key. This preference is stored. By default, a higher score is assumed to be better.
[Screenshot: the dropdown on a feedback metric column for configuring whether a higher score is better.]

Open a trace

If the example you’re evaluating is from an ingested run, you can hover over the output cell and click on the trace icon to open the trace view for that run. This will open up a trace in the side panel.
[Screenshot: the View trace icon highlighted on an ingested run.]

Expand detailed view

You can click on any cell to open up a detailed view of the experiment result on that particular example input, along with feedback keys and scores.
[Screenshot: an example in the expanded Comparing Experiments view.]

Use experiment metadata as chart labels

You can configure the x-axis labels for the charts based on experiment metadata. Select a metadata key in the charts dropdown to change the x-axis labels.
[Screenshot: the x-axis dropdown listing the metadata attached to the experiment.]
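Choosing a metadata key effectively swaps each experiment's name for its value under that key. A minimal sketch with illustrative experiment records (in practice, metadata is attached when you run the experiment, for example via the SDK's `metadata` argument):

```python
# Sketch: derive chart x-axis labels from experiment metadata.
# Experiment records and the "model" key are illustrative.
experiments = [
    {"name": "exp-1", "metadata": {"model": "gpt-4o"}},
    {"name": "exp-2", "metadata": {"model": "claude-3-5-sonnet"}},
    {"name": "exp-3", "metadata": {"model": "gpt-4o-mini"}},
]

def x_axis_labels(experiments, key):
    """Label each experiment by a metadata value, falling back to its name."""
    return [e["metadata"].get(key, e["name"]) for e in experiments]

print(x_axis_labels(experiments, "model"))
# ['gpt-4o', 'claude-3-5-sonnet', 'gpt-4o-mini']
```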
