LangSmith Polly

Polly is in beta. Your feedback on Polly is invaluable as the team refines its capabilities.

LangSmith Polly is an AI assistant embedded directly in your LangSmith workspace to help you analyze and understand your application data. Polly helps you gain insight from your traces, conversation threads, and prompts without having to dig through data manually. By asking natural language questions, you can quickly understand agent performance, debug issues, and analyze user sentiment.

Polly appears in the bottom-right corner of the following locations in the LangSmith UI:

Observability & Debugging:

Trace pages: Analyze individual runs and execution traces.
Thread views: Understand conversation threads and user interactions.

Prompt Engineering:

Prompt Playground: Edit and optimize prompts.
Prompt Hub pages: Explore and understand shared prompts.

Evaluation & Testing:

Dataset Experiments: Analyze experiment results and compare runs.
Dataset Examples: Browse and understand dataset structure.
Annotation Queues: Review runs and make informed annotation decisions.

Get started

Before you start using Polly, add an API key for your model provider as a workspace secret in the LangSmith UI:

Navigate to Settings and open the Secrets tab.
Select Add secret, then enter the environment variable name as the key (e.g., OPENAI_API_KEY or ANTHROPIC_API_KEY) and your API key as the Value.
Select Save secret.

When adding workspace secrets in the LangSmith UI, make sure the secret keys match the environment variable names expected by your model provider.
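The naming rule above can be checked before you paste a key name into the UI. The following is a minimal sketch, not part of LangSmith: the `EXPECTED_SECRET_KEYS` mapping and the `check_secret_key` helper are hypothetical names introduced for illustration, and only the two variable names come from the examples above.

```python
import re

# Hypothetical mapping: provider label -> environment variable name.
# Only these two names come from the examples above; extend as needed.
EXPECTED_SECRET_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

# Conventional shape of an environment variable name.
ENV_NAME_PATTERN = re.compile(r"^[A-Z_][A-Z0-9_]*$")


def check_secret_key(provider: str, secret_key: str) -> list[str]:
    """Return a list of problems with a workspace secret key name (empty if none)."""
    problems = []
    if secret_key != secret_key.strip():
        problems.append("key has leading or trailing whitespace")
    if not ENV_NAME_PATTERN.match(secret_key.strip()):
        problems.append("key is not an uppercase environment variable name")
    expected = EXPECTED_SECRET_KEYS.get(provider.lower())
    if expected is None:
        problems.append(f"no expected key recorded for provider {provider!r}")
    elif secret_key.strip() != expected:
        problems.append(f"expected {expected!r} for provider {provider!r}")
    return problems


print(check_secret_key("openai", "OPENAI_API_KEY"))    # no problems
print(check_secret_key("anthropic", "anthropic_key"))  # two problems
```

An empty list means the key name matches what the sketch expects for that provider. Confirm the exact variable name against your model provider's own documentation.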

Observability

Trace pages

On an individual trace, Polly analyzes the run data and execution trajectory. Polly examines the full trace context, including run metadata, inputs, outputs, intermediate steps, and configuration, to help you understand what happened and identify areas for improvement. Example questions:

“Is there anything that the agent could have done better here?”
“Why did this run fail?”
“What took the most time in this trace?”
“Summarize what happened in this trace”
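To make a question like "What took the most time in this trace?" concrete, the sketch below computes the same answer by hand over a simplified trace. The `Step` record and the sample data are hypothetical. They are not LangSmith's run schema, and this does not describe how Polly works internally.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Step:
    """Hypothetical, simplified stand-in for one step in a trace."""
    name: str
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def slowest_step(steps: list[Step]) -> Step:
    """Return the step with the longest wall-clock duration."""
    return max(steps, key=lambda s: s.duration)


# Made-up sample data: three sequential steps in one trace.
t0 = datetime(2025, 1, 1, 12, 0, 0)
trace = [
    Step("retrieve_docs", t0, t0 + timedelta(seconds=0.4)),
    Step("call_model", t0 + timedelta(seconds=0.4), t0 + timedelta(seconds=3.1)),
    Step("format_answer", t0 + timedelta(seconds=3.1), t0 + timedelta(seconds=3.2)),
]

slowest = slowest_step(trace)
print(f"{slowest.name}: {slowest.duration.total_seconds():.1f}s")  # call_model: 2.7s
```

Asking Polly the question directly saves you from exporting run timings and writing this kind of comparison yourself.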

Thread views

Under the Threads tab, Polly analyzes conversation threads to help you understand user sentiment, conversation outcomes, and interaction patterns. Use Polly to identify user pain points and understand whether issues were resolved. Example questions:

“Did the user seem frustrated?”
“What issues is the user experiencing?”
“Was the user’s problem solved?”
“What was the main topic of this thread?”

Prompt engineering

Prompt Playground

In the Playground, Polly helps you edit and optimize your prompts. Use automated options like Optimize prompt, Generate a tool, or Generate an output schema, or give Polly custom instructions for editing your prompt. Example questions:

“Make it respond in Italian”
“Add more context about the user’s role”
“Make the tone more professional”
“Simplify the instructions”

[Screenshot: Prompt Playground with the Polly chat in the sidebar, showing information on a generated tool.]

Prompt Hub pages

When viewing a prompt in the LangSmith Hub, Polly helps you understand the prompt’s structure, messages, tools, and configuration. This is useful for exploring and learning from shared prompts. Example questions:

“What does this prompt do?”
“What tools does this prompt use?”
“Explain the structure of this prompt”
“What are the key instructions in this prompt?”

Evaluation

Dataset Experiments

On the Datasets page under the Experiments tab, Polly analyzes experiment results and helps you compare runs across different experiments. Polly can identify patterns, summarize performance, and help you understand which approaches work best. Example questions:

“Which experiment performed best?”
“What are the main differences between these runs?”
“Summarize the results of this experiment”
“What patterns do you see in the failures?”

Dataset Examples

On the Datasets page under the Examples tab, Polly helps you understand your dataset structure, browse examples, and identify data patterns. This is useful for understanding what data you’re working with and preparing datasets for experiments. Example questions:

“What type of data is in this dataset?”
“Show me examples with errors”
“What patterns do you see in the inputs?”
“How many examples are in this dataset?”

Annotation Queues

In Annotation Queues, Polly helps you analyze runs before making annotation decisions. Whether you’re reviewing runs individually or comparing them pairwise, Polly provides insights into run behavior, errors, and execution patterns to inform your scoring. Example questions:

“What went wrong in this run?”
“Summarize what happened in this run”
“Compare these two runs”
“What should I consider when scoring this?”

What’s next

Learn more about the features that Polly helps you explore:

Observability: Learn more about tracing and monitoring your LLM applications.
Threads: Understand how threads work in LangSmith.
Prompt Engineering: Create and iterate on prompts in the playground.
Evaluation: Evaluate and test your applications systematically.
