When running evaluations on large datasets, you may encounter failures on a small subset of examples due to rate limits, network issues, or other transient errors. Rather than re-running the entire evaluation, you can identify and retry only the failed examples. This guide shows how to build that retry logic into your evaluation workflow in Python: use the error_handling='ignore' parameter to skip logging errored runs, then identify the unsuccessful examples and re-run them against the same experiment.
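The snippets in this guide reference a target function and evaluators without defining them. As a minimal placeholder sketch (the names and signatures here are illustrative assumptions, not prescribed by the LangSmith API, and evaluator signatures may differ by SDK version), they might look like:
# Placeholder target and evaluator for illustration only.
# Replace these with your own application and scoring logic.
def target(inputs: dict) -> dict:
    # Call your model or application here
    return {"answer": inputs["question"].upper()}

def correctness(outputs: dict, reference_outputs: dict) -> bool:
    # Exact-match check against the dataset's reference output
    return outputs["answer"] == reference_outputs["answer"]

# In the snippets below, evaluators=[your_evaluators] corresponds to e.g. evaluators=[correctness]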

Step 1. Run the initial evaluation

Run the initial evaluation, ignoring errors to prevent errored runs from being logged:
from langsmith import Client

client = Client()

# Run initial evaluation, ignoring errors
# error_handling='ignore' prevents errored runs from being logged
results = await client.aevaluate(
    target,
    data="dataset",
    evaluators=[your_evaluators],
    error_handling='ignore'
)

Step 2. Retry the failed examples and log to the same experiment

First, identify the examples that don't yet have a run logged to the experiment:
# Identify unsuccessful examples
runs = client.list_runs(project_name=results.experiment_name)
successful_example_ids = {r.reference_example_id for r in runs}
unsuccessful_examples = [e for e in client.list_examples(dataset_name="dataset") if e.id not in successful_example_ids]
Next, re-run all the failed examples and log them to the same experiment:
# Retry only the failed examples, logging to the same experiment
results_retry = await client.aevaluate(
    target,
    data=unsuccessful_examples,
    evaluators=[your_evaluators],
    experiment=results.experiment_name,
    error_handling='ignore'
)
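Optionally, you can repeat the check from Step 2 to confirm that every example now has a run logged to the experiment. A minimal sketch, reusing the same client calls as above:
# Re-check which examples still have no logged run after the retry
runs_after = client.list_runs(project_name=results.experiment_name)
covered_ids = {r.reference_example_id for r in runs_after}
all_example_ids = {e.id for e in client.list_examples(dataset_name="dataset")}
still_missing = all_example_ids - covered_ids
print(f"{len(still_missing)} examples still have no logged run")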
