Integrate with SageMaker¶
Many ML practitioners use AWS SageMaker in combination with Comet.
Comet is best used for experiment management, artifact management, production monitoring, and as the go-to UI and reporting tool for data science teams. SageMaker provides a complementary set of tools for infrastructure, resource management, and compute (training, orchestration, deployment).
SageMaker Model Training¶
Training job with a custom script and container¶
If you can use a custom training script and container, you can use the Comet Python SDK directly. See the following examples to get started, and check out the Quickstart guide for more information.
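As a rough illustration of using the SDK directly from a custom training script, the sketch below logs parameters and a metric from a toy loop. The `train` and `main` helpers, the project name `sagemaker-demo`, and the assumption that `COMET_API_KEY` is set in the container environment are all illustrative and not taken from this page; the loop itself is a placeholder for real training code.

```python
def train(experiment, epochs=3, learning_rate=0.1):
    """Toy training loop that reports to a Comet experiment object."""
    experiment.log_parameters({"epochs": epochs, "learning_rate": learning_rate})
    loss = 1.0
    for epoch in range(epochs):
        # Stand-in for a real optimisation step: shrink the loss each epoch.
        loss *= 1.0 - learning_rate
        experiment.log_metric("loss", loss, step=epoch)
    return loss


def main():
    # Imported lazily so this module loads even where comet_ml is not installed.
    import comet_ml

    # Assumes COMET_API_KEY is available in the training container's environment.
    experiment = comet_ml.Experiment(project_name="sagemaker-demo")  # hypothetical name
    try:
        train(experiment)
    finally:
        # Flush pending data before the container exits.
        experiment.end()
```

Calling `main()` from the script's entry point would run the loop inside the training container. Keeping `train` independent of how the experiment is created also makes it easy to exercise locally with a stand-in object.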
Training job with Built-in algorithms¶
If you trained a model with one of SageMaker's built-in algorithms and you cannot change the container or training script, you can still import your SageMaker training jobs as Comet Experiments.
The recommended way is to use comet_ml.integration.sagemaker.log_sagemaker_training_job_v1 if you have access to the Estimator object. This imports the latest training job that was scheduled using that Estimator object.
You can also use comet_ml.integration.sagemaker.log_sagemaker_training_job_by_name_v1 if you have the name of the SageMaker training job that you want to import.
Lastly, you can use comet_ml.integration.sagemaker.log_last_sagemaker_training_job_v1 to import the last SageMaker training job. Only use this function if you are the only person using the AWS account, because the most recent job in a shared account may not be yours.
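The choice between the three import functions could be wrapped in a small helper like the one below. This is a minimal sketch: `import_training_job` is a hypothetical name, and the positional argument order (`estimator` or job name first, then the Comet experiment) is an assumption that should be checked against the Comet API reference for your installed version.

```python
def import_training_job(experiment, estimator=None, job_name=None):
    """Import a SageMaker training job into `experiment`.

    Prefers the Estimator object, then an explicit job name, and only
    falls back to "last job in the account" when neither is given.
    Returns a short label naming the path that was taken.
    """
    # Imported lazily so this module loads even where comet_ml is not installed.
    from comet_ml.integration import sagemaker as comet_sagemaker

    if estimator is not None:
        # Assumed argument order: (estimator, experiment).
        comet_sagemaker.log_sagemaker_training_job_v1(estimator, experiment)
        return "estimator"
    if job_name is not None:
        # Assumed argument order: (job name, experiment).
        comet_sagemaker.log_sagemaker_training_job_by_name_v1(job_name, experiment)
        return "job_name"
    # Only appropriate when you are the sole user of the AWS account.
    comet_sagemaker.log_last_sagemaker_training_job_v1(experiment)
    return "last"
```

For example, after `estimator.fit(...)` has completed, `import_training_job(comet_ml.Experiment(), estimator=estimator)` would take the recommended path. The fallback order mirrors the recommendations above: the most specific handle on the job is tried first.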
Log automatically¶
When a SageMaker training job is imported as a new Comet experiment, the following metadata is logged:
- All Hyper-Parameters.
- All metrics defined in the Algorithm Definition.
- Pip packages from the environment where comet_ml.integration.sagemaker.log_* is called.
- If you are calling comet_ml.integration.sagemaker.log_* from an IPython environment (like SageMaker Studio or a SageMaker hosted notebook), the source code of the notebook.
- Tags as Experiment tags.
- The following SageMaker metadata fields as Comet Other fields:
  - BillableTimeInSeconds
  - EnableInterContainerTrafficEncryption
  - EnableManagedSpotTraining
  - EnableNetworkIsolation
  - RoleArn
  - TrainingJobArn
  - TrainingJobName
  - TrainingJobStatus
  - TrainingTimeInSeconds
  - TrainingImage
  - TrainingInputMode
  - All metadata for "ModelArtifacts"
  - All metadata for "OutputDataConfig"
  - All metadata for "ResourceConfig"
  - All metadata for "InputDataConfig"