Kubeflow Integration¶
Kubeflow is a machine learning toolkit built to run on Kubernetes that allows you, in part, to orchestrate and schedule machine learning training runs.
The Comet integration allows you to track not only track single tasks but also the state of the pipeline as a whole.
Visualizing Kubeflow pipelines¶
The Kubeflow integration allows you to track both individual tasks and the state of the pipeline as a whole. The state of the pipeline can be visualized using the Kubeflow Panel available in the featured
tab.
You can re-create the view above by:
- Grouping experiments by
kubeflow_run_name
- Adding the
Kubeflow panel
available in the featured tab - Saving the view as
Kubeflow dashboard
The Kubeflow panel
can be used to either visualize the latest state of the DAG or the static graph similar to what is available in the Kubeflow UI. In addition all tasks that have the Comet logo are also tracked in Comet and can be accessed by clicking on the task.
Integration Overview¶
The Kubeflow integration relies on two steps: 1. Adding a Comet logger component to track the state of the pipeline 2. Adding Comet to each individual tasks that you would like track as a Comet experiment
Logging the state of a pipeline¶
The Kubeflow integration relies on a component that runs in parallel to the rest of the pipeline. The comet_logger_component
logs the state of the pipeline as whole and reports it back to the Comet UI allowing you to track the pipeline progress as a whole. This component can be used like this:
```python import comet_ml.integration.kubeflow import kfp.dsl as dsl
@dsl.pipeline(name='ML training pipeline') def ml_training_pipeline(): # Add the Comet logger component workflow_uid = "{{workflow.uid}}"
comet_ml.integrations.kubeflow.comet_logger_component(
workflow_uid=workflow_uid,
api_key="<>",
workspace="<>",
project="<>")
# Rest of the training pipeline
```
The string "{{workflow.uid}}"
is a placeholder that is replaced at runtime by Kubeflow, you do not need to update it before running the pipeline.
The Comet logger component can also be configured using environment variables, in which case you will not need to specify the api_key
, workspace
or project
arguments.
Logging the state of each task¶
In addition to the component, you will need to initialize the Kubeflow task logger within each task of the pipeline that you wish to run. This can be done by creating an experiment within each task and using the initialize_task_logger
function:
```python def my_component() -> None: import comet_ml.integration.kubeflow
experiment = comet_ml.Experiment()
workflow_uid = "{{workflow.uid}}"
pod_name = "{{pod_name}}"
comet_ml.integration.kubeflow.initialize_comet_logger(experiment, workflow_uid, pod_name)
# Rest of the task code
```
The strings "{{workflow.uid}}"
, "{{pod_name}}"
are placeholders that are replaced at runtime by Kubeflow, you do not need to update them before running the pipeline.