LangSmith
evaluations. You would need to first define an evaluator function to judge the results from an agent, such as final outputs or trajectory. Depending on your evaluation technique, this may or may not involve a reference output:
AgentEvals
package:
superset
will accept output trajectory as valid if it’s a superset of the reference one. Other options include: strict, unordered and subset{"messages": [...]}
input messages to call the agent with.{"messages": [...]}
expected message history in the agent output. For trajectory evaluation, you can choose to keep only assistant messages.