Repetitions
Repetitions run an experiment multiple times to account for LLM output variability. Because LLM outputs are non-deterministic, averaging over several repetitions yields a more reliable performance estimate. Configure repetitions by passing the num_repetitions argument to evaluate / aevaluate (Python, TypeScript). Each repetition re-runs both the target function and all evaluators.
Learn more in the repetitions how-to guide.
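As a minimal sketch in Python, assuming the langsmith SDK's evaluate entry point; the dataset name, target, and evaluator below are hypothetical placeholders:

```python
from langsmith import evaluate

def target(inputs: dict) -> dict:
    # Stand-in for a real LLM call; outputs would vary run to run.
    return {"answer": f"Echo: {inputs['question']}"}

def exact_match(outputs: dict, reference_outputs: dict) -> bool:
    # Simple evaluator: compare the output to the reference answer.
    return outputs["answer"] == reference_outputs["answer"]

results = evaluate(
    target,
    data="my-dataset",        # hypothetical dataset name
    evaluators=[exact_match],
    num_repetitions=3,        # re-run target + evaluators 3 times per example
)
```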
Concurrency
Concurrency controls how many examples run simultaneously during an experiment. Configure it by passing the max_concurrency argument to evaluate / aevaluate. The semantics differ between the two functions:
evaluate
The max_concurrency argument specifies the maximum number of concurrent threads for running both the target function and evaluators.
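A sketch of the threaded case, again with placeholder names:

```python
from langsmith import evaluate

def target(inputs: dict) -> dict:
    return {"answer": inputs["question"].upper()}  # stand-in for an LLM call

results = evaluate(
    target,
    data="my-dataset",      # hypothetical dataset name
    evaluators=[],
    max_concurrency=4,      # up to 4 threads run target + evaluators
)
```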
aevaluate
The max_concurrency argument limits concurrent tasks with a semaphore. aevaluate creates one task per example, and each task runs the target function and all evaluators for that example, so max_concurrency caps the number of examples processed concurrently.
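A corresponding async sketch, assuming aevaluate is importable from the langsmith package; the dataset name and target are placeholders:

```python
import asyncio
from langsmith import aevaluate

async def async_target(inputs: dict) -> dict:
    # An awaited LLM call would go here; stubbed for illustration.
    return {"answer": f"Echo: {inputs['question']}"}

async def main():
    await aevaluate(
        async_target,
        data="my-dataset",   # hypothetical dataset name
        evaluators=[],
        max_concurrency=8,   # semaphore allows 8 example tasks at once
    )

asyncio.run(main())
```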
Caching
Caching stores API call results to disk to speed up future experiments. Set the LANGSMITH_TEST_CACHE environment variable to a valid folder path with write access. Future experiments that make identical API calls will reuse cached results instead of making new requests.
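For example, pointing the cache at a writable folder before running the experiment (the path here is an arbitrary example):

```python
import os

# Any folder with write access works; identical API calls in later
# experiments are then served from this cache instead of re-requested.
os.environ["LANGSMITH_TEST_CACHE"] = "/tmp/langsmith-cache"
```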