Run hyperparameter tuning in parallel
When tuning hyperparameters, you can often shorten the overall search time by parallelizing the tuning runs.
Tip
While random and grid search can run fully in parallel, the bayes search algorithm requires visibility into previous tuning runs to intelligently select the next set of hyperparameters to try. For Bayesian tuning, be careful not to fully parallelize execution.
How to parallelize Comet Optimizer
Comet allows you to parallelize the execution of the Comet Optimizer using the comet optimize CLI command.
To execute the Comet Optimizer through the command line, you first need to:
- Format your training code inside an executable Python script.
- Move the Optimizer config to a separate file called optimizer.config.
- Update the training script to read the optimizer config via sys.argv (see the sketch after this list).
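For instance, a minimal sketch of a training script that meets these requirements, with the actual training logic elided, could look like this:

```python
import sys

import comet_ml

# comet optimize passes the path of the optimizer config file
# through to the training script as its first argument.
optimizer_config = sys.argv[1]

opt = comet_ml.Optimizer(config=optimizer_config)

for experiment in opt.get_experiments():
    # Fetch each suggested hyperparameter with experiment.get_parameter(...),
    # train your model, and log the metric named in the config.
    ...
    experiment.end()
```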
Then, you can simply run the comet optimize command with the -j or --parallel argument, followed by the number of parallel runs to perform.
For example, if you want to parallelize the hyperparameter tuning across two processes, you can use:
```bash
comet optimize -j 2 training_script.py optimizer.config
```
Comet automatically splits the hyperparameter selections across the two parallel processes.
Note
Each parallel process executes (maxCombo * trials) / j tuning runs, where:
- maxCombo and trials are defined inside the optimizer.config file, and
- j is specified in the command itself (e.g., 2 in the example above).
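For example, with the configuration used in the end-to-end example below (maxCombo = 20, trials = 10) and -j 2, each of the two processes executes (20 * 10) / 2 = 100 tuning runs.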
Discover more about the optimizer configuration in the Configure the Optimizer page.
End-to-end Example¶
This section showcases how to execute the end-to-end example from the Comet Optimizer Quickstart in parallel.
Simply run:
```bash
comet optimize -j 2 training_script.py optimizer.config
```
where:
the training_script.py is defined as:

```python
import comet_ml
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import sys

# Initialize the Comet SDK
comet_ml.login(project_name="example-optimizer")

# Get the optimizer config file from args
optimizer_config = sys.argv[1]

# Create a dataset
X, y = make_classification(n_samples=5000, n_informative=3, random_state=25)

# Split dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=True, test_size=0.25, random_state=25
)

# Initialize the Comet Optimizer
opt = comet_ml.Optimizer(config=optimizer_config)

# Run optimization
for experiment in opt.get_experiments():
    # Initialize the algorithm, and set the parameters to be optimized with get_parameter
    random_forest = RandomForestClassifier(
        n_estimators=experiment.get_parameter("n_estimators"),
        criterion=experiment.get_parameter("criterion"),
        min_samples_leaf=experiment.get_parameter("min_samples_leaf"),
        random_state=25,
    )

    # Train the model and make predictions
    random_forest.fit(X_train, y_train)
    y_hat = random_forest.predict(X_test)

    # Log the random state and accuracy of each model
    experiment.log_parameter("random_state", 25)
    experiment.log_metric("accuracy", accuracy_score(y_test, y_hat))
    experiment.log_confusion_matrix(y_test, y_hat)

    # End the current experiment
    experiment.end()
```
and the optimizer.config is defined as:

```json
{
    "algorithm": "bayes",
    "spec": {
        "maxCombo": 20,
        "objective": "maximize",
        "metric": "accuracy",
        "minSampleSize": 500,
        "retryLimit": 20,
        "retryAssignLimit": 0
    },
    "parameters": {
        "n_estimators": {
            "type": "integer",
            "scaling_type": "uniform",
            "min": 100,
            "max": 300
        },
        "criterion": {
            "type": "categorical",
            "values": ["gini", "entropy"]
        },
        "min_samples_leaf": {
            "type": "discrete",
            "values": [1, 3, 5, 7, 9]
        }
    },
    "name": "Bayes Optimization",
    "trials": 10
}
```