Opik Archives

March 26, 2025

Abby Morgan

SelfCheckGPT for LLM Evaluation

Detecting hallucinations in language models is challenging. There are three general approaches: measuring token-level probability distributions for indications that a…

Product Tutorials Machine Learning LLMOps Comet Community Hub

January 28, 2025

Abby Morgan

G-Eval for LLM Evaluation

LLM-as-a-judge evaluators have gained widespread adoption due to their flexibility, scalability, and close alignment with human judgment. They excel at…

Tutorials LLMOps Comet Community Hub

December 19, 2024

Abby Morgan

BERTScore For LLM Evaluation

BERTScore represents a pivotal shift in LLM evaluation, moving beyond traditional heuristic-based metrics like BLEU and ROUGE to a…

Tutorials LLMOps Comet Community Hub

November 21, 2024

Abby Morgan

Perplexity for LLM Evaluation

Perplexity is, historically speaking, one of the "standard" evaluation metrics for language models. And while recent years have seen a…
