To define code evaluators programmatically using the SDK, refer to How to define a code evaluator (SDK).
Step 1. Create the evaluator
- Create an evaluator from one of the following pages in the LangSmith UI:
  - In the playground or from a dataset: Select the + Evaluator button.
  - From a rule: Select Add rules, configure your rule, and select Apply evaluator.
- Give your evaluator a clear name that describes what it measures (e.g., “Exact Match”).
- Select Create code evaluator from the evaluator type options.
Step 2. Write your evaluator code
Custom code evaluator restrictions:
- Allowed libraries: You can import all standard library functions, as well as a set of allowlisted public packages.
- Network access: You cannot access the internet from a custom code evaluator.
Your evaluator must be a function named `perform_eval` and should:
- Accept `run` and `example` parameters.
- Access data via `run['inputs']`, `run['outputs']`, and `example['outputs']`.
- Return a dictionary where each key is a metric name and each value is the score for that metric. Each key represents a piece of feedback you want to return. For example, `{"correctness": 1, "silliness": 0}` would create two pieces of feedback on the run.
Function signature
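A minimal sketch of the expected shape, following the requirements above (the metric name is a placeholder):

```python
def perform_eval(run, example):
    # `run` holds the run being evaluated: run['inputs'], run['outputs'].
    # `example` holds the dataset reference: example['outputs'].
    # Return a dict mapping each metric name to its score.
    return {"metric_name": 1}
```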
Example: Exact match evaluator
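One way to sketch an exact-match evaluator, assuming the run and example both store their text under an `output` key (adjust the key names to match your own schema):

```python
def perform_eval(run, example):
    # Compare the run's output to the dataset's reference output.
    # The "output" key is an assumed field name for illustration.
    matches = run["outputs"]["output"] == example["outputs"]["output"]
    return {"exact_match": 1 if matches else 0}
```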
Example: Input-based evaluator
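An input-based evaluator scores the run using `run['inputs']` rather than the outputs. A hypothetical sketch, assuming the input has a `question` field, that flags questions over a length threshold:

```python
def perform_eval(run, example):
    # Score based on the run's inputs; the "question" key is an
    # assumed input field name for illustration.
    question = run["inputs"].get("question", "")
    return {"question_too_long": 1 if len(question) > 100 else 0}
```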
Step 3. Test and save
- Test your evaluator on example data to ensure it works as expected.
- Click Save to make the evaluator available for use.
Use your code evaluator
Once created, you can use your code evaluator:
- When running evaluations from the playground
- As part of a dataset to automatically run evaluations on experiments
Related
- LLM-as-a-judge evaluator (UI): Use an LLM to evaluate outputs
- Composite evaluators: Combine multiple evaluator scores