Skip to content

Integrate with Kubeflow¶

Comet integrates with Kubeflow.

Kubeflow is an open-source machine learning platform that enables using machine learning pipelines to orchestrate complicated workflows running on Kubernetes.

Visualize Kubeflow pipelines¶

The Comet-Kubeflow integration allows you to track both individual tasks and the state of the pipeline as a whole. The state of the pipeline can be visualized using the Kubeflow Panel available in the featured tab.

kubeflow-integration

You can recreate the view above by performing these steps:

  1. Grouping experiments by kubeflow_run_name.
  2. Adding the Kubeflow panel available in the featured tab.
  3. Saving the view as Kubeflow dashboard.

The Kubeflow panel can be used to either visualize the latest state of the DAG or the static graph similar to what is available in the Kubeflow UI. In addition all tasks that have the Comet logo are also tracked in Comet and can be accessed by clicking on the task.

Integration overview¶

The Kubeflow integration is performed in two steps:

  1. Adding a Comet logger component to track the state of the pipeline.
  2. Adding Comet to each individual tasks that you would like track as a Comet experiment.

Log the state of a pipeline¶

The Kubeflow integration relies on a component that runs in parallel to the rest of the pipeline. The comet_logger_component logs the state of the pipeline as whole and reports it back to the Comet UI allowing you to track the pipeline progress as a whole. This component can be used like this:

import comet_ml.integration.kubeflow
import kfp.dsl as dsl

@dsl.pipeline(name='ML training pipeline')
def ml_training_pipeline():
    # Add the Comet logger component
    workflow_uid = "{{workflow.uid}}"

    comet_ml.integrations.kubeflow.comet_logger_component(
        workflow_uid=workflow_uid,
        api_key="<>",
        workspace="<>",
        project_name="<>")

    # Rest of the training pipeline

The string "{{workflow.uid}}"is a placeholder that is replaced at runtime by Kubeflow. You do not need to update it before running the pipeline.

The Comet logger component can also be configured using environment variables, in which case you will not need to specify the api_key, workspace or project_name arguments.

Log the state of each task¶

In addition to the component, you will need to initialize the Kubeflow task logger within each task of the pipeline that you wish to run. This can be done by creating an experiment within each task and using the initialize_comet_logger function:

def my_component() -> None:
    import comet_ml.integration.kubeflow

    experiment = comet_ml.start(
        api_key="<Your API Key>", project_name="<Your Project Name>"
    )
    workflow_uid = "{{workflow.uid}}"
    pod_name = "{{pod_name}}"

    comet_ml.integration.kubeflow.initialize_comet_logger(
        experiment, workflow_uid, pod_name
    )

    # Rest of the task code

The strings "{{workflow.uid}}", "{{pod_name}}" are placeholders that are replaced at runtime by Kubeflow. You do not need to update them before running the pipeline.

Note

There are alternatives to setting the API key programatically. See more here.

Nov. 18, 2024