Docs by LangChain home page
LangSmith
Platform for LLM observability and evaluation
Evaluation
Welcome to the LangSmith Evaluation documentation. The following sections help you create datasets, run evaluations, and analyze results:

- Datasets: Create and manage datasets for evaluation, whether through the UI or the SDK, and maintain existing datasets.
- Evaluations: Run evaluations on your applications using different evaluator types and evaluation techniques.
- Analyze experiment results: View and analyze your evaluation results, including comparing experiments, filtering results, and downloading data.
- Annotation & human feedback: Collect human feedback on your application outputs through annotation queues and inline annotation.
- Tutorials: Follow step-by-step tutorials to evaluate different types of applications, from chatbots to complex agents.

For terminology definitions and core concepts, refer to the introduction on evaluation.
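To make the "Run evaluations" item concrete, here is a minimal sketch of a custom evaluator written as a plain Python function. It assumes the function-based evaluator shape the LangSmith Python SDK's `evaluate()` accepts (a function that receives the application's outputs and the dataset example's reference outputs and returns a feedback dict); the `exact_match` name and the `"answer"` keys are hypothetical choices for illustration, not part of the SDK.

```python
def exact_match(outputs: dict, reference_outputs: dict) -> dict:
    """Hypothetical custom evaluator: score 1.0 when the application's
    answer matches the reference answer exactly (case-insensitive)."""
    predicted = (outputs.get("answer") or "").strip().lower()
    expected = (reference_outputs.get("answer") or "").strip().lower()
    # LangSmith feedback is typically a key (metric name) plus a score.
    return {"key": "exact_match", "score": 1.0 if predicted == expected else 0.0}
```

With the SDK installed and an API key configured, a function like this could then be passed to `evaluate(...)` via its `evaluators` list alongside a target function and a dataset name; see the Evaluations section for the full workflow.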