LangSmith Polly is an AI assistant embedded directly in your LangSmith workspace to help you analyze and understand your application data. Polly helps you gain insight from your traces, conversation threads, and prompts without digging through data manually. By asking natural language questions, you can quickly understand agent performance, debug issues, and analyze user sentiment.

Polly appears in the bottom-right corner of the following locations within the LangSmith UI:

Observability & Debugging:
  • Projects: Browse and filter runs across a project.
  • Trace pages: Analyze individual runs and execution traces.
  • Thread views: Understand conversation threads and user interactions.
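Prompt Engineering:
  • Playground: Edit and optimize prompts conversationally.
  • Prompt Hub pages: Understand a prompt’s structure, messages, tools, and configuration.
Evaluation & Testing:
  • Dataset Experiments: Analyze and compare experiment results.
  • Dataset Examples: Browse examples and identify data patterns.
  • Annotation Queues: Analyze runs before making annotation decisions.
  • Evaluators: Write and refine evaluator logic.

[Image: Polly chat in the sidebar on a dataset view]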

Get started

Before you start using Polly, add the API key for your model provider as a workspace secret in the LangSmith UI:
  1. Navigate to Settings and then move to the Secrets tab.
  2. Select Add secret, enter the environment variable name (e.g., OPENAI_API_KEY or ANTHROPIC_API_KEY) as the Key, and enter your API key as the Value.
  3. Select Save secret.
When adding workspace secrets in the LangSmith UI, make sure the secret keys match the environment variable names expected by your model provider.
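
As a rough illustration of the naming rule above, the following Python sketch checks that a secret key exactly matches the expected environment variable name. Only OPENAI_API_KEY and ANTHROPIC_API_KEY come from this page; the helper `check_secret_key` and the `EXPECTED_KEYS` mapping are hypothetical and not part of LangSmith.

```python
# Hypothetical helper, not part of LangSmith. It only illustrates that a
# workspace secret key must match the provider's environment variable exactly.

# Only these two names appear on this page; extend the mapping with the
# variable names documented by your own model provider.
EXPECTED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def check_secret_key(provider: str, secret_key: str) -> bool:
    """Return True if secret_key is exactly the name expected for provider."""
    expected = EXPECTED_KEYS.get(provider.lower())
    if expected is None:
        raise ValueError(f"No expected key recorded for provider {provider!r}")
    # The comparison is exact: case and surrounding whitespace both matter.
    return secret_key == expected


print(check_secret_key("openai", "OPENAI_API_KEY"))  # True
print(check_secret_key("openai", "openai_api_key"))  # False
```

A check like this could catch a lowercase or padded key name before you save the secret.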

Supported models

Polly supports the following model providers out of the box:
  • Anthropic (Claude)
  • OpenAI
  • Google Gemini
  • AWS Bedrock
  • Groq
  • Mistral
  • xAI
  • DeepSeek
  • Fireworks AI
You can also use any custom model you’ve configured in Playground Settings by enabling the Available in Polly toggle on that configuration. Workspace admins manage which custom models are available.

Keyboard shortcuts

Action                   | Mac         | Windows/Linux
Toggle Polly open/closed | Cmd+I       | Ctrl+I
Clear current thread     | Cmd+Shift+O | Ctrl+Shift+O

Observability

Projects

On a project’s run list, Polly can browse and filter runs across the entire project, create datasets, and add examples. Use Polly to quickly explore what’s happening across your traces without manually paging through results. Example questions:
  • “Show me all the failed runs from the last 24 hours”
  • “Which runs took the longest?”
  • “Add the failing runs to my test dataset”
  • “How many runs errored this week?”
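
Polly answers questions like these in chat. For comparison, a question such as “Show me all the failed runs from the last 24 hours” could be scripted outside Polly. The sketch below assumes the LangSmith Python SDK’s `Client.list_runs` method with its `project_name`, `error`, and `start_time` parameters. The function name `failed_runs_last_24h` and the project name are hypothetical, and this is not a description of how Polly works internally.

```python
from datetime import datetime, timedelta, timezone


def failed_runs_last_24h(client, project_name: str) -> list:
    """List runs in project_name that errored during the last 24 hours.

    `client` is assumed to be a langsmith.Client, or any object exposing a
    compatible list_runs method.
    """
    since = datetime.now(timezone.utc) - timedelta(hours=24)
    return list(
        client.list_runs(
            project_name=project_name,
            error=True,        # only runs that recorded an error
            start_time=since,  # only runs started after this time
        )
    )


# Usage sketch (requires `pip install langsmith` and an API key in the environment):
#   from langsmith import Client
#   for run in failed_runs_last_24h(Client(), "my-project"):  # hypothetical project name
#       print(run.id, run.name, run.error)
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object before pointing it at a real workspace.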

Trace pages

On an individual trace, Polly analyzes the run data and execution trajectory. It examines the full trace context, including run metadata, inputs, outputs, intermediate steps, and configuration, to help you understand what happened and identify areas for improvement. Example questions:
  • “Is there anything that the agent could have done better here?”
  • “Why did this run fail?”
  • “What took the most time in this trace?”
  • “Summarize what happened in this trace”
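
To make “What took the most time in this trace?” concrete, here is a minimal Python sketch of the underlying comparison: finding the step with the longest duration. The step records, their field names (`name`, `start_time`, `end_time`), and the function `slowest_step` are hypothetical illustrations and do not reflect Polly’s implementation or an exact LangSmith data format.

```python
from datetime import datetime


def slowest_step(steps: list[dict]) -> tuple[str, float]:
    """Return (name, seconds) for the step with the longest duration."""

    def duration(step: dict) -> float:
        return (step["end_time"] - step["start_time"]).total_seconds()

    slowest = max(steps, key=duration)
    return slowest["name"], duration(slowest)


# Hypothetical steps from a single trace.
steps = [
    {"name": "retriever", "start_time": datetime(2024, 1, 1, 12, 0, 0),
     "end_time": datetime(2024, 1, 1, 12, 0, 0, 400000)},
    {"name": "llm_call", "start_time": datetime(2024, 1, 1, 12, 0, 1),
     "end_time": datetime(2024, 1, 1, 12, 0, 3, 500000)},
    {"name": "parser", "start_time": datetime(2024, 1, 1, 12, 0, 4),
     "end_time": datetime(2024, 1, 1, 12, 0, 4, 100000)},
]

print(slowest_step(steps))  # ('llm_call', 2.5)
```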

Thread views

Under the Threads tab, Polly analyzes conversation threads to help you understand user sentiment, conversation outcomes, and interaction patterns. Use Polly to identify user pain points and understand whether issues were resolved. Example questions:
  • “Did the user seem frustrated?”
  • “What issues is the user experiencing?”
  • “Was the user’s problem solved?”
  • “What was the main topic of this thread?”

Prompt engineering

Playground

In the Playground, Polly helps you edit and optimize your prompts. Use automated options like Optimize prompt, Generate a tool, or Generate an output schema, or give Polly custom instructions for editing your prompt. Polly can directly modify the playground state (messages, tools, output schemas, and examples), so you can iterate on prompts conversationally. Example requests:
  • “Make it respond in Italian”
  • “Add more context about the user’s role”
  • “Make the tone more professional”
  • “Simplify the instructions”

Prompt Hub pages

When viewing a prompt in the LangSmith Hub, Polly helps you understand the prompt’s structure, messages, tools, and configuration. This is useful for exploring and learning from shared prompts. Example questions:
  • “What does this prompt do?”
  • “What tools does this prompt use?”
  • “Explain the structure of this prompt”
  • “What are the key instructions in this prompt?”

Evaluation

Dataset Experiments

On the Datasets page under the Experiments tab, Polly analyzes experiment results and helps you compare runs across different experiments. Polly can identify patterns, summarize performance, and help you understand which approaches work best. Example questions:
  • “Which experiment performed best?”
  • “What are the main differences between these runs?”
  • “Summarize the results of this experiment”
  • “What patterns do you see in the failures?”

Dataset Examples

On the Datasets page under the Examples tab, Polly helps you understand your dataset structure, browse examples, and identify data patterns. This is useful for understanding what data you’re working with and preparing datasets for experiments. Example questions:
  • “What type of data is in this dataset?”
  • “Show me examples with errors”
  • “What patterns do you see in the inputs?”
  • “How many examples are in this dataset?”

Annotation Queues

In Annotation Queues, Polly helps you analyze runs before making annotation decisions. Whether you’re reviewing runs individually or comparing them pairwise, Polly provides insights into run behavior, errors, and execution patterns to inform your scoring. Example questions:
  • “What went wrong in this run?”
  • “Summarize what happened in this run”
  • “Compare these two runs”
  • “What should I consider when scoring this?”

Evaluators

In the Evaluators builder, Polly helps you write and refine evaluator logic. Polly can generate evaluator code, suggest improvements, and help you test your evaluator against examples. Example questions:
  • “Write an evaluator that checks for hallucinations”
  • “Improve the accuracy of this evaluator”
  • “What does this evaluator check for?”
  • “Add handling for edge cases”
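
For orientation, the following is a hand-written Python sketch of a simple code evaluator; code generated by Polly may look different. It assumes an evaluator shaped as a function that receives the run outputs and the reference outputs and returns a key and a score. The `answer` field and the `exact_match` name are hypothetical.

```python
def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Score 1 if the predicted answer matches the reference, else 0.

    The comparison ignores case and surrounding whitespace.
    """
    predicted = str(outputs.get("answer", "")).strip().lower()
    expected = str(reference_outputs.get("answer", "")).strip().lower()

    # Edge case: a missing or empty reference cannot be matched, so score 0.
    if not expected:
        return {"key": "exact_match", "score": 0}

    return {"key": "exact_match", "score": int(predicted == expected)}


print(exact_match({"answer": " Paris "}, {"answer": "paris"}))
# {'key': 'exact_match', 'score': 1}
```

A request such as “Add handling for edge cases” would target branches like the empty-reference check above.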

What’s next

Learn more about the features that Polly helps you explore:

  • Observability: Learn more about tracing and monitoring your LLM applications.
  • Threads: Understand how threads work in LangSmith.
  • Prompt Engineering: Create and iterate on prompts in the Playground.
  • Evaluation: Evaluate and test your applications systematically.