Create your own LLM-as-a-judge evaluator
For complete control of evaluator logic, create your own LLM-as-a-judge evaluator and run it using the LangSmith SDK (Python / TypeScript). Requires langsmith>=0.2.0.
An LLM-as-a-judge evaluator consists of three key components:
- Evaluator function: A function that receives the example inputs and application outputs, then uses an LLM to score the quality. The function should return a boolean, number, string, or dictionary with score information.
- Target function: Your application logic being evaluated (wrapped with @traceable for observability).
- Dataset and evaluation: A dataset of test examples and the evaluate() function that runs your target function on each example and applies your evaluators.
Example
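The three components above might fit together roughly as follows. This is a minimal sketch, not the official example: the helper names (`make_correctness_evaluator`, `run_experiment`), the prompt text, the `question` / `answer` field names, and the `judge` callable (any function that sends a prompt to your LLM and returns its text reply) are all assumptions made for illustration. The evaluator is written so it can be exercised without network access; the LangSmith imports are deferred until an experiment is actually run.

```python
from typing import Callable

# Hypothetical grading prompt; adjust the wording and fields to your dataset.
JUDGE_PROMPT = (
    "You are grading an answer to a question.\n"
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Submitted answer: {answer}\n"
    "Reply with exactly one word: CORRECT or INCORRECT."
)


def make_correctness_evaluator(judge: Callable[[str], str]):
    """Build an evaluator function that delegates scoring to `judge`.

    `judge` is an assumption: any callable that takes a prompt string and
    returns the LLM's text reply (for example, a thin wrapper around your
    model client).
    """

    def correctness(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
        prompt = JUDGE_PROMPT.format(
            question=inputs["question"],            # assumed dataset input key
            reference=reference_outputs["answer"],  # assumed reference key
            answer=outputs["answer"],               # assumed target output key
        )
        verdict = judge(prompt).strip().upper()
        # "INCORRECT" does not start with "CORRECT", so this check is safe.
        return {"key": "correctness", "score": verdict.startswith("CORRECT")}

    return correctness


def run_experiment(dataset_name: str, judge: Callable[[str], str]):
    """Run the target over an existing dataset and apply the evaluator.

    Assumes langsmith>=0.2.0 is installed, a LangSmith API key is configured,
    and a dataset named `dataset_name` exists with the keys used above.
    """
    from langsmith import evaluate, traceable  # deferred: needs the SDK

    @traceable
    def target(inputs: dict) -> dict:
        # Placeholder application logic; replace with your real app call.
        return {"answer": f"(stub answer to: {inputs['question']})"}

    return evaluate(
        target,
        data=dataset_name,
        evaluators=[make_correctness_evaluator(judge)],
        experiment_prefix="llm-judge-sketch",  # hypothetical prefix
    )
```

Passing the judge in as a callable keeps the scoring logic independent of any particular model provider, so the evaluator can be checked with a stub before it is run against a real LLM and dataset.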