Moderation
The Moderation metric allows you to evaluate the appropriateness of the LLM's output. It does this by asking an LLM to act as a judge and rate how appropriate the content is, returning a moderation score between 0 and 1, where 0 is safe and 1 is unsafe.
How to use the Moderation metric
You can use the Moderation metric as follows:
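The snippet below is a minimal sketch, assuming the Moderation class is importable from opik.evaluation.metrics and that score accepts the text to evaluate through an output argument:

```python
from opik.evaluation.metrics import Moderation

# Instantiate the metric and score a single LLM output
metric = Moderation()

result = metric.score(
    output="The quick brown fox jumps over the lazy dog."
)

print(result.value)   # moderation score between 0.0 and 1.0
print(result.reason)  # the judge's explanation for the score
```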
Asynchronous scoring is also supported with the ascore scoring method.
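As a sketch, assuming ascore mirrors the score signature:

```python
import asyncio

from opik.evaluation.metrics import Moderation


async def main():
    metric = Moderation()
    # ascore mirrors score but can be awaited inside an existing event loop
    result = await metric.ascore(
        output="The quick brown fox jumps over the lazy dog."
    )
    print(result.value)


asyncio.run(main())
```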
The moderation score is a float between 0 and 1. A score of 0 indicates that the content was deemed safe, while a score of 1 indicates that the content was deemed unsafe.
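For example, you could gate downstream behaviour on the score; the 0.5 cutoff below is an arbitrary illustration, not a threshold defined by the metric:

```python
from opik.evaluation.metrics import Moderation

metric = Moderation()
result = metric.score(output="Some response produced by your application.")

# 0.5 is an arbitrary cutoff chosen for illustration; tune it for your use case
if result.value > 0.5:
    print(f"Potentially unsafe content: {result.reason}")
else:
    print("Content deemed safe")
```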
Moderation Prompt
Opik uses an LLM as a Judge to moderate content. For this, we have a prompt template that is used to generate the prompt for the LLM. By default, the gpt-4o model is used to detect moderation issues, but you can change this to any model supported by LiteLLM by setting the model parameter. You can learn more about customizing models in the Customize models for LLM as a Judge metrics section.
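For example, a sketch assuming the model parameter accepts a LiteLLM model name as a string ("gpt-4o-mini" is only an illustrative choice):

```python
from opik.evaluation.metrics import Moderation

# Pass any LiteLLM-supported model name; "gpt-4o-mini" is only an example
metric = Moderation(model="gpt-4o-mini")

result = metric.score(
    output="The quick brown fox jumps over the lazy dog."
)
```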
The template uses a few-shot prompting technique to detect moderation issues. The template is as follows:
with VERDICT_KEY being moderation_score and REASON_KEY being reason.