NEW YORK – September 17, 2024 – Comet, a leading end-to-end model evaluation platform, today announced Opik, a new large language model (LLM) evaluation product. The platform is a true open-source project, with the full suite of tools included free in the source code.
While building LLM-based applications is increasingly prevalent, it remains a challenging task for developers, due to a low tolerance for failure in many use cases. A bridge between software engineering and data science, Opik enables developers to evaluate, test and ship LLM applications with various observability tools designed to improve language model interactions across the development life cycle.
The platform’s three core components contribute to optimizing and benchmarking LLM applications with ease:
- Observability: Gain visibility with the ability to record, sort, search and understand each step an LLM application takes to generate a response. Manually annotate, view and compare LLM responses. Tracing is available during development and in production.
- Model unit testing: A convenient SDK library allows developers to choose and run metrics, as well as consult built-in LLM judges for complex issues like hallucination detection, factuality and moderation. This automates evaluation and eliminates the need to manually review LLM responses, which improves scalability.
- Scoring: Store test cases as datasets, run evaluation experiments and compare the results. Score individual LLM outputs and aggregate performance across application versions.
Because Opik is fully open source, individual users can download the code from GitHub and run it locally. Opik is also compatible with any LLM and comes with a direct OpenAI integration out of the box, helping developers work efficiently.
“Comet has been contributing to machine learning open source for seven years and will continue to do so,” said Comet Co-founder and CEO Gideon Mendels. “While we previously open-sourced smaller components of our platform with our ML analysis and visualization tool, Kangas, Opik will allow any developer to evaluate their AI applications and models.”
A highly scalable, industry-compliant version is also available to enterprise teams. It offers additional benefits such as implementation flexibility, team collaboration and user management for enhanced safety and security.
Opik is a natural extension of Comet’s mission to help data scientists, engineers and team leaders accelerate and optimize artificial intelligence. Comet’s tools, which focus on experiment management, model management and production monitoring, address fundamental pain points and reduce friction in the AI workflow.
In addition to its open-source contributions, Comet offers a platform used by hundreds of organizations, including Netflix, Uber, Cisco, Ancestry, Etsy and Zappos. Founded in 2017, Comet is headquartered in New York City, with a remote team spanning 14 countries. Comet has secured $70 million in funding to empower practitioners and teams to achieve business value with AI.
To learn more about Comet and Opik, visit www.comet.com.
About Comet
Comet provides an end-to-end model evaluation platform for AI developers, with best-in-class LLM evaluation, experiment tracking and production monitoring. Comet’s platform is trusted by over 150 enterprise customers, including Netflix, Cepsa, Etsy, Uber and Zappos. Individuals and academic teams use Comet’s platform to advance research in their fields of study. Founded in 2017, Comet is headquartered in New York, NY, with a remote workforce in 14 countries on four continents. Comet is free to individuals and academic teams. Startup, team and enterprise licensing is also available. To learn more, visit www.comet.com.
Editorial Contact:
Claire Peña
VP of Marketing
clairep@comet.com