Repetitions
Repetitions run an experiment multiple times. Because LLM outputs are non-deterministic, repeating the experiment yields a more reliable estimate of performance. Configure repetitions by passing the num_repetitions argument to evaluate / aevaluate (Python, TypeScript). Each repetition re-runs both the target function and all evaluators.
Learn more in the repetitions how-to guide.
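For example, a minimal Python sketch (the target function, exact_match evaluator, and dataset name "my-dataset" are illustrative placeholders, not part of the SDK):

```python
from langsmith import evaluate

def target(inputs: dict) -> dict:
    # Stub standing in for a real model call.
    return {"answer": f"You asked: {inputs['question']}"}

def exact_match(outputs: dict, reference_outputs: dict) -> bool:
    return outputs["answer"] == reference_outputs["answer"]

# Run the whole experiment 3 times; each repetition re-runs the
# target function and all evaluators on every dataset example.
results = evaluate(
    target,
    data="my-dataset",
    evaluators=[exact_match],
    num_repetitions=3,
)
```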
Concurrency
Concurrency controls how many examples run simultaneously during an experiment. Configure it by passing the max_concurrency argument to evaluate / aevaluate. The semantics differ between the two functions:
evaluate
The max_concurrency argument specifies the maximum number of concurrent threads for running both the target function and evaluators.
aevaluate
The max_concurrency argument limits concurrent tasks with a semaphore. aevaluate creates one task per example, and each task runs the target function and all evaluators for that example, so max_concurrency caps the number of examples processed at once.
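For example, a minimal async sketch (same illustrative target, evaluator, and dataset name as above):

```python
import asyncio

from langsmith import aevaluate

async def target(inputs: dict) -> dict:
    # Stub standing in for a real async model call.
    return {"answer": f"You asked: {inputs['question']}"}

def exact_match(outputs: dict, reference_outputs: dict) -> bool:
    return outputs["answer"] == reference_outputs["answer"]

async def main():
    # A semaphore caps in-flight tasks: at most 5 examples are
    # processed at once, each running the target plus all evaluators.
    await aevaluate(
        target,
        data="my-dataset",
        evaluators=[exact_match],
        max_concurrency=5,
    )

asyncio.run(main())
```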
Caching
Caching stores API call results on disk to speed up future experiments. Set the LANGSMITH_TEST_CACHE environment variable to a folder path with write access. Future experiments that make identical API calls will reuse the cached results instead of making new requests.
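For example, one way to set this from Python before running an experiment (the cache path is illustrative; any writable folder works):

```python
import os

# Must be set before the experiment runs; identical API calls in
# later runs are then served from this on-disk cache.
os.environ["LANGSMITH_TEST_CACHE"] = "tests/cassettes"
```

Setting the variable in your shell (e.g. via export) works equally well.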