Tutorials Archives

March 26, 2025

Abby Morgan

SelfCheckGPT for LLM Evaluation

Detecting hallucinations in language models is challenging. There are three general approaches: measuring token-level probability distributions for indications that a…


Tutorials LLMOps Comet Community Hub

February 24, 2025

Abby Morgan

LLM Juries for Evaluation

Evaluating the correctness of generated responses is an inherently challenging task. LLM-as-a-Judge evaluators have gained popularity for their ability to…


Tutorials Machine Learning LLMOps

February 19, 2025

Claire Longo

A Simple Recipe for LLM Observability

So, you’re building an AI application on top of an LLM, and you’re planning on setting it live in production.…


Product Tutorials Machine Learning LLMOps Comet Community Hub

January 28, 2025

Abby Morgan

G-Eval for LLM Evaluation

LLM-as-a-judge evaluators have gained widespread adoption due to their flexibility, scalability, and close alignment with human judgment. They excel at…


Tutorials LLMOps

January 13, 2025

Paul Iusztin, Decoding ML

Build Multi-Index Advanced RAG Apps

Welcome to Lesson 12 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You’ll learn…


Tutorials LLMOps

January 13, 2025

Paul Iusztin, Decoding ML

Build a scalable RAG ingestion pipeline using 74.3% less code

Welcome to Lesson 11 of 12 in our free course series, LLM Twin: Building Your Production-Ready AI Replica. You’ll learn…


Tutorials LLMOps Comet Community Hub

December 19, 2024

Abby Morgan

BERTScore For LLM Evaluation

BERTScore represents a pivotal shift in LLM evaluation, moving beyond traditional heuristic-based metrics like BLEU and ROUGE to a…

