LangSmith Insights automatically analyzes your traces to detect usage patterns, common agent behaviors, and failure modes, without requiring you to manually review thousands of traces. Insights uses hierarchical categorization to make sense of your data and highlight actionable trends.
Insights is in Beta and under active development. To provide feedback or use this feature, reach out to the LangSmith team.

Prerequisites

An OpenAI API key — generate one from the OpenAI dashboard.

Run your first Insights job

From the LangSmith UI:
  1. Navigate to Tracing Projects in the left-hand menu and select a tracing project.
  2. Click +New in the top-right corner, then click New Insights Job to open the job creation pane.
  3. Enter a name for your job.
  4. Click the icon in the top right of the job creation pane to set your OpenAI API key as a workspace secret. If your workspace already has an OpenAI API key set, you can skip this step.
  5. Click Create.
This will kick off a background Insights job. Jobs can take up to 20 minutes to complete.
Generating insights over 1,000 runs typically costs $0.50 to $1.00 in OpenAI API calls. The cost scales linearly with the number of runs and the size of each run.
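
As a rough illustration of the linear scaling described above, the sketch below extrapolates the quoted $0.50 to $1.00 per 1,000 runs to other job sizes. The function name and parameters are hypothetical, and the estimate ignores run size, which also affects cost, so treat the result as a ballpark figure only.

```python
def estimate_insights_cost(num_runs: int,
                           low_per_1k: float = 0.50,
                           high_per_1k: float = 1.00) -> tuple[float, float]:
    """Return a (low, high) USD estimate that scales linearly with run count.

    Assumption: the per-1,000-run range quoted in the docs applies to runs
    of typical size; unusually large runs will cost more.
    """
    if num_runs < 0:
        raise ValueError("num_runs must be non-negative")
    scale = num_runs / 1000
    return (round(low_per_1k * scale, 2), round(high_per_1k * scale, 2))


low, high = estimate_insights_cost(400)
print(f"Estimated cost for 400 runs: ${low:.2f} to ${high:.2f}")
```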

Understand the results

Once your job has completed, you can navigate to the Insights tab where you’ll see a table of Insights jobs. Each job contains insights generated over a specific sample of runs from the tracing project.

Insights jobs for a single tracing project

Click into your job to see traces organized into a set of auto-generated categories. You can drill down through categories and subcategories to view the underlying traces, feedback, and run statistics.

Common topics of conversations with the https://chat.langchain.com chatbot

Top-level categories

Your traces are automatically grouped into top-level categories that represent the broadest patterns in your data. The distribution bars show how frequently each pattern occurs, making it easy to spot behaviors that happen more or less than expected. Each category has a brief description and displays aggregated metrics over the traces it contains, including:
  • Typical run stats (such as error rate, latency, and cost)
  • Feedback scores from your evaluators
  • Attributes extracted as part of the job
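
To make the per-category aggregation concrete, here is a minimal sketch that computes each category's share of traces, error rate, and median latency over a handful of made-up run records. The record fields and the "Agents" category are illustrative assumptions, not the LangSmith run schema or the exact metrics Insights computes.

```python
from collections import defaultdict
from statistics import median

# Hypothetical run records; field names are illustrative only.
runs = [
    {"category": "Data & Retrieval", "error": False, "latency_s": 1.2},
    {"category": "Data & Retrieval", "error": True, "latency_s": 3.4},
    {"category": "Agents", "error": False, "latency_s": 0.8},
]


def aggregate_by_category(runs):
    """Group runs by category and compute simple aggregate stats."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["category"]].append(run)

    total = len(runs)
    stats = {}
    for category, members in grouped.items():
        stats[category] = {
            # Fraction of all runs in this category (the "distribution bar").
            "share": len(members) / total,
            # Booleans sum as 0/1, so this is the fraction of errored runs.
            "error_rate": sum(r["error"] for r in members) / len(members),
            "median_latency_s": median(r["latency_s"] for r in members),
        }
    return stats


for category, values in aggregate_by_category(runs).items():
    print(category, values)
```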

Subcategories

Clicking on any category shows a breakdown into subcategories, which gives you a more granular understanding of the interaction patterns within that category. In the Chat LangChain example pictured above, “Data & Retrieval” contains subcategories such as “Vector Stores” and “Data Ingestion”.

Individual traces

You can view the traces assigned to each category or subcategory by clicking through to see the runs table. From there, you can click into any trace to see the full conversation details.

Configure a job

When kicking off an Insights job, you can configure the following:

Select runs

  • Sample size: The maximum number of traces to analyze. Currently capped at 1,000.
  • Time range: The window from which traces are sampled.
  • Filters: Additional run filters. As you adjust filters, you’ll see how many traces match your criteria.
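
The three settings above interact as follows: the time range and filters define the pool of matching traces, and the sample size (capped at 1,000) limits how many are drawn from that pool. The sketch below models this on local data. The function, the record fields, and the use of uniform random sampling are all assumptions for illustration; the docs do not state how Insights samples internally.

```python
import random
from datetime import datetime, timedelta

MAX_SAMPLE_SIZE = 1000  # cap stated in the docs


def select_runs(runs, start, end, sample_size, predicate=lambda run: True, seed=None):
    """Filter runs by time range and predicate, then sample up to the cap.

    Assumption: uniform random sampling without replacement.
    """
    sample_size = min(sample_size, MAX_SAMPLE_SIZE)
    matching = [
        r for r in runs
        if start <= r["start_time"] <= end and predicate(r)
    ]
    if len(matching) <= sample_size:
        return matching
    return random.Random(seed).sample(matching, sample_size)


# Hypothetical data: 1,500 runs, one per minute, every tenth one errored.
now = datetime(2025, 1, 8)
runs = [
    {"id": i, "start_time": now - timedelta(minutes=i), "error": i % 10 == 0}
    for i in range(1500)
]

selected = select_runs(runs, now - timedelta(days=7), now, sample_size=5000, seed=0)
errored = select_runs(runs, now - timedelta(days=7), now, sample_size=200,
                      predicate=lambda r: r["error"])
print(len(selected), len(errored))
```

Requesting 5,000 traces still yields at most 1,000 because of the cap, while the filtered selection returns every match because fewer traces match than the requested sample size.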

Categories

By default, top-level categories are generated automatically, bottom-up, from the underlying traces. If you already know which categories you are interested in, you can have the job bucket traces into those predefined categories instead. The Categories section of the config lets you do this by listing the name and description of each top-level category you want to use. Subcategories are still auto-generated by the algorithm within the predefined top-level categories.
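
Predefined categories are entered in the UI, but it can help to draft them as structured data first. The sketch below shows one way to hold name and description pairs and catch blank or duplicate names before typing them in. The CategorySpec class, the validation rules, and the "Agent Orchestration" entry are hypothetical and not part of LangSmith.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategorySpec:
    """A predefined top-level category: a name plus a description."""
    name: str
    description: str


def validate_categories(categories):
    """Reject blank fields and case-insensitive duplicate names."""
    normalized = [c.name.strip().lower() for c in categories]
    for category, name in zip(categories, normalized):
        if not name:
            raise ValueError("category name must not be blank")
        if not category.description.strip():
            raise ValueError(f"category {category.name!r} needs a description")
    if len(set(normalized)) != len(normalized):
        raise ValueError("category names must be unique")
    return list(categories)


categories = validate_categories([
    CategorySpec("Data & Retrieval", "Questions about vector stores, loaders, and ingestion."),
    CategorySpec("Agent Orchestration", "Questions about building and debugging agents."),
])
print([c.name for c in categories])
```

Descriptions matter here because they are what distinguishes neighboring categories, so a clear, specific description for each name is a reasonable habit.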

Summary prompt

The first step of the job is to create a brief summary of every trace; these summaries are what get categorized. Extracting the right information in the summary is essential for getting useful categories. You can edit the prompt used to generate these summaries. Keep two things in mind when editing it:
  • Summarization instructions: Any information that isn’t in the trace summary won’t affect the categories that get generated, so make sure to provide clear instructions on what information is important to extract from each trace.
  • Trace content: Use mustache formatting to specify which parts of each trace are passed to the summarizer. Large traces with lots of inputs and outputs can be expensive and noisy. Reducing the prompt to only include the most relevant parts of the trace can improve your results.
To specify trace content, access run inputs via {{run.inputs}} and run outputs via {{run.outputs}}. For example, the prompt "Summarize this: {{run.inputs}}" will include (a JSON serialization of) all of the run inputs. The prompt "Summarize this: {{run.inputs.foo.bar}}" will include only the “bar” value within the “foo” value of the run inputs.
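
To see how the dotted paths resolve, the sketch below implements a deliberately minimal stand-in for mustache variable substitution and applies it to a made-up run. It is not the renderer Insights uses: it handles only {{dotted.path}} tags, JSON-serializes non-string values to mirror the behavior described above, and skips sections, escaping, and other mustache features. The sample run contents are hypothetical.

```python
import json
import re

# Matches {{ dotted.path }} tags with optional surrounding whitespace.
_TAG = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace each {{a.b.c}} tag with the value found at that path."""
    def resolve(match):
        value = context
        for key in match.group(1).split("."):
            if not isinstance(value, dict) or key not in value:
                return ""  # missing keys render as empty, as in mustache
            value = value[key]
        # Strings are inserted as-is; everything else is JSON-serialized.
        return value if isinstance(value, str) else json.dumps(value)

    return _TAG.sub(resolve, template)


# Hypothetical run payload for illustration.
run = {
    "inputs": {"foo": {"bar": "hello", "baz": 1}},
    "outputs": {"answer": "hi"},
}

print(render("Summarize this: {{run.inputs.foo.bar}}", {"run": run}))
print(render("Summarize this: {{run.inputs}}", {"run": run}))
```

The first prompt passes only the single "bar" value to the summarizer, while the second passes the whole inputs object, which is why narrowing the path is an effective way to cut cost and noise on large traces.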

Attributes

Along with a summary, you can define additional categorical, numerical, and boolean attributes to be extracted from each trace. These attributes will influence the categorization step — traces with similar attribute values will tend to be categorized together. You can also see aggregations of these attributes per category. As an example, you might want to extract the attribute user_satisfied: boolean from each trace to steer the algorithm towards categories that split up positive and negative user experiences, and to see the average user satisfaction per category.
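
As an illustration of the per-category aggregation for a boolean attribute, the sketch below averages user_satisfied over made-up traces that have already been assigned categories. The trace structure and the "Agents" category are assumptions for the example; in practice the attribute values are extracted by the Insights job itself.

```python
from collections import defaultdict

# Hypothetical categorized traces with an extracted boolean attribute.
traces = [
    {"category": "Data & Retrieval", "attributes": {"user_satisfied": True}},
    {"category": "Data & Retrieval", "attributes": {"user_satisfied": False}},
    {"category": "Data & Retrieval", "attributes": {"user_satisfied": True}},
    {"category": "Agents", "attributes": {"user_satisfied": False}},
    {"category": "Agents", "attributes": {}},  # attribute not extracted
]


def satisfaction_by_category(traces):
    """Return the fraction of satisfied users per category.

    Traces without a boolean user_satisfied value are skipped.
    """
    totals = defaultdict(lambda: [0, 0])  # category -> [satisfied, counted]
    for trace in traces:
        value = trace["attributes"].get("user_satisfied")
        if isinstance(value, bool):
            totals[trace["category"]][0] += int(value)
            totals[trace["category"]][1] += 1
    return {cat: satisfied / counted for cat, (satisfied, counted) in totals.items()}


for category, rate in satisfaction_by_category(traces).items():
    print(f"{category}: {rate:.0%} satisfied")
```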

Save your config

You can optionally save Insights job configs for future reuse. This is especially useful if you want to periodically run Insights and compare results over time to identify changes in user and agent behavior. Select from previously saved configs in the dropdown in the top-left corner of the pane when creating a new Insights job.