The examples below assume that a list of prompts has already been defined in a variable named prompts.
Responses can be generated with the ResponseGenerator class. First, we must create a langchain LLM object. Below we use ChatVertexAI, but any of LangChain's LLM classes may be used instead. Note that InMemoryRateLimiter is used to avoid rate limit errors.
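A minimal sketch of this setup, assuming the langchain-google-vertexai package is installed and Vertex AI credentials are configured. The helper name build_llm, the model name, the temperature, and the rate-limit values are illustrative assumptions and should be adjusted to your own quota.

```python
def build_llm(requests_per_second: float = 4.5):
    """Return a ChatVertexAI model throttled by an in-memory rate limiter."""
    # Imported inside the function so the sketch can be loaded
    # even when the optional packages are not installed.
    from langchain_core.rate_limiters import InMemoryRateLimiter
    from langchain_google_vertexai import ChatVertexAI

    rate_limiter = InMemoryRateLimiter(
        requests_per_second=requests_per_second,  # assumed quota
        check_every_n_seconds=0.5,  # how often to check for a free slot
        max_bucket_size=280,  # upper bound on burst size
    )
    # "gemini-pro" is a placeholder model name (assumption).
    return ChatVertexAI(
        model_name="gemini-pro",
        temperature=0.3,
        rate_limiter=rate_limiter,
    )
```

Any other LangChain chat model that accepts a rate_limiter argument could be returned here instead.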
Next, we use ResponseGenerator.generate_responses to generate 25 responses for each prompt, as is convention for toxicity evaluation.
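The generation step might look like the sketch below. It assumes llm is a LangChain LLM object and that the returned dictionary exposes the prompts and responses under a "data" key, as in LangFair's published examples; the wrapper name generate_for_toxicity is hypothetical, and the behavior should be verified against the installed version.

```python
async def generate_for_toxicity(llm, prompts, count=25):
    """Generate `count` responses per prompt and return aligned lists."""
    # Imported inside the function so the sketch loads without langfair.
    from langfair.generator import ResponseGenerator

    rg = ResponseGenerator(langchain_llm=llm)
    generations = await rg.generate_responses(prompts=prompts, count=count)
    data = generations["data"]
    # Each prompt is repeated `count` times so that prompts and
    # responses line up one-to-one.
    return data["prompt"], data["response"]

# In a notebook:
#   duplicated_prompts, responses = await generate_for_toxicity(llm, prompts)
# In a script:
#   import asyncio
#   duplicated_prompts, responses = asyncio.run(generate_for_toxicity(llm, prompts))
```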
Toxicity metrics can be computed with ToxicityMetrics. Note that use of torch.device is optional and should be used if a GPU is available to speed up toxicity computation.
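A sketch of the toxicity step, assuming prompts and responses are equal-length lists of strings. The wrapper name compute_toxicity and its use_gpu_if_available flag are hypothetical; the ToxicityMetrics arguments follow LangFair's published examples and may differ between versions.

```python
def compute_toxicity(prompts, responses, use_gpu_if_available=True):
    """Return toxicity metrics for the given prompt/response pairs."""
    # Imported inside the function so the sketch loads without torch/langfair.
    import torch
    from langfair.metrics.toxicity import ToxicityMetrics

    # Fall back to CPU when no CUDA device is present.
    use_cuda = use_gpu_if_available and torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    tm = ToxicityMetrics(device=device)
    result = tm.evaluate(prompts=prompts, responses=responses, return_data=True)
    return result["metrics"]
```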
Stereotype metrics can be computed with StereotypeMetrics.
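A sketch of the stereotype step, assuming responses is a list of generated strings. The wrapper name compute_stereotype and the choice of "gender" as the only category are illustrative assumptions.

```python
def compute_stereotype(responses, categories=("gender",)):
    """Return stereotype metrics for the given responses."""
    # Imported inside the function so the sketch loads without langfair.
    from langfair.metrics.stereotype import StereotypeMetrics

    sm = StereotypeMetrics()
    result = sm.evaluate(responses=responses, categories=list(categories))
    return result["metrics"]
```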
Counterfactual responses can be generated with CounterfactualGenerator.
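A sketch of counterfactual generation, assuming llm is a LangChain LLM object and the protected attribute of interest is gender. The wrapper name is hypothetical, and the keys of the returned data (for gender, "male_response" and "female_response" in LangFair's published examples) should be checked against the installed version.

```python
async def generate_counterfactuals(llm, prompts, attribute="gender", count=25):
    """Generate responses to counterfactual variations of each prompt."""
    # Imported inside the function so the sketch loads without langfair.
    from langfair.generator.counterfactual import CounterfactualGenerator

    cg = CounterfactualGenerator(langchain_llm=llm)
    cf_generations = await cg.generate_responses(
        prompts=prompts, attribute=attribute, count=count
    )
    # One list of responses per attribute group, e.g. for gender:
    #   data["male_response"], data["female_response"]
    return cf_generations["data"]
```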
Counterfactual metrics can be computed with CounterfactualMetrics.
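A sketch of the counterfactual metrics step, assuming texts1 and texts2 are equal-length lists of responses for the two groups being compared (for example, male and female variants of the same prompts). The wrapper name compute_counterfactual is hypothetical.

```python
def compute_counterfactual(texts1, texts2, attribute="gender"):
    """Return counterfactual fairness metrics for two aligned response lists."""
    # Imported inside the function so the sketch loads without langfair.
    from langfair.metrics.counterfactual import CounterfactualMetrics

    cm = CounterfactualMetrics()
    result = cm.evaluate(texts1=texts1, texts2=texts2, attribute=attribute)
    return result["metrics"]
```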
AutoEval
The AutoEval class conducts a multi-step process that completes all of the aforementioned steps with two lines of code.
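The two lines in question might look like the sketch below, wrapped in a hypothetical helper named run_auto_eval. It assumes llm is a LangChain LLM object and prompts is a list of strings; the constructor arguments and the "metrics" key follow LangFair's published examples and should be verified against the installed version.

```python
async def run_auto_eval(llm, prompts):
    """Run the semi-automated evaluation and return the computed metrics."""
    # Imported inside the function so the sketch loads without langfair.
    from langfair.auto import AutoEval

    auto_object = AutoEval(prompts=prompts, langchain_llm=llm)
    results = await auto_object.evaluate()
    return results["metrics"]

# In a notebook:
#   metrics = await run_auto_eval(llm, prompts)
```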