July 29, 2024
Welcome to issue #11 of The Comet Newsletter! To kick things off this week, we have a really special and exciting update to share with you. We’re thrilled to announce the release of Comet Artifacts, a new set of tools that provides ML teams a convenient way to log, version, and browse data from all parts of their experimentation pipelines.
But while our team has been hard at work on this set of new features, we’ve also been following the latest industry news and perspectives, so check out the abridged list of links at the end of this issue to see what our team has been paying attention to.
And be sure to follow us on Twitter and LinkedIn — drop us a note if you have something we should cover in an upcoming issue!
Happy Reading,
Austin
Head of Community, Comet
FROM TEAM COMET
Today, we’re thrilled to introduce Comet Artifacts, a new set of tools that provides ML teams with the capability to log, version, and access their data across entire experimentation workflows.
Machine learning typically involves experimenting with different models, hyperparameters, and versions of datasets.
In addition to the metrics and parameters being measured and tested, machine learning also involves keeping track of the inputs to and outputs of each experiment. An experiment run can produce all sorts of interesting output data—files containing model predictions, model weights, and much more.
And often, the outputs from one experiment can be used as the inputs for other experiments. This can become complex to track without the right structure or a single source of truth.
In Comet, an Artifact is a versioned object, where each version is an immutable snapshot of files and assets, arranged in a folder-like logical structure. This snapshot can be tracked using metadata, a version number, tags, and aliases. A version tracks which experiments consumed it, and which experiment produced it.
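To make this concrete, here is a minimal sketch of how one experiment might produce a new Artifact version using the Comet Python SDK. The workspace, project, file path, artifact name, and metadata are placeholders, and exact argument names may differ slightly from the current SDK.

```python
from comet_ml import Experiment, Artifact

# Experiment that produces an artifact version (placeholder workspace/project names)
experiment = Experiment(project_name="preprocessing", workspace="my-team")

# An Artifact version is an immutable snapshot of files,
# tracked with metadata, tags, and aliases
artifact = Artifact(
    name="training-data",
    artifact_type="dataset",
    aliases=["latest"],
    metadata={"source": "july-export"},
)
artifact.add("./data/train.csv")  # add a local file to the snapshot

# Logging the artifact records this experiment as the producer of the new version
experiment.log_artifact(artifact)
experiment.end()
```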
This means that with Artifacts, you can structure your experiments as multi-stage pipelines or DAGs (Directed Acyclic Graphs), and ensure centralized, managed, and versioned access to any of the intermediate data produced in the process.
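A downstream stage of the pipeline can then fetch that same artifact by name (or pin a specific version or alias), and Comet records that experiment as a consumer, capturing the lineage between stages. Again, this is a sketch with placeholder names rather than a definitive implementation.

```python
from comet_ml import Experiment

# Experiment in a later pipeline stage that consumes the artifact
experiment = Experiment(project_name="training", workspace="my-team")

# Fetch the artifact by name (the latest version, unless a version or alias is specified);
# Comet records this experiment as a consumer of that version
logged_artifact = experiment.get_artifact("training-data")
local_dir = logged_artifact.download("./artifacts")

# ...train a model on the downloaded files...
experiment.end()
```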
Specifically, Artifacts enable you and your team to:
For a deeper dive into working with Artifacts—or to jump right in and try it for yourself—check out these additional resources:
And stay tuned! We’ll be exploring more of the use cases and capabilities of Comet Artifacts in the near future.
WHAT WE’RE READING
OpenAI Codex, which powers GitHub’s new “AI pair programmer” Copilot, has been pitched as the future of AI-assisted software development. Built on the GPT-3 architecture, Codex is a specialized implementation of it, yet in their paper, researchers from OpenAI revealed that none of the other versions of GPT-3 were able to solve any of the code-generation problems that Codex is designed to tackle.
This finding has an important implication: GPT-3 was initially developed on the premise that a large enough language model, trained on a big enough text dataset, could match or outperform models designed for more specialized tasks. Codex’s results suggest that, powerful as the architecture is, specialized training still matters for tasks like code generation.
In addition to this dynamic, author Ben Dickson walks through some of the other key findings in the research, covering areas like model size vs. cost, text generation vs. understanding, and the responsible use and reporting of these kinds of models.
Read Ben Dickson’s full article here
This compelling article by Rasmus Berg Palm and Pola Schwöbel discusses two essential—and deeply complex—concepts when it comes to how we evaluate ML models: fairness and transparency. The two authors wade into these challenging waters, touching on ML’s potential and pitfalls in decision-making as it relates to race, gender, sexual orientation, and more.
They also pose another fundamental question that we’ll all likely grapple with for years to come—are humans actually better at making the kinds of decisions we’re attempting to automate?
Read the full essay in The Gradient
An incredibly thorough and information-rich resource from Jay Alammar, who focuses on visualizing Transformer architectures via input saliency and neuron activation. This range of visualizations and the accompanying commentary provide a wide variety of entry points into one of the most popular deep learning architectures of the past few years.