Integrate with Ludwig Toolbox¶
Comet integrates with Ludwig Toolbox.
Ludwig is a TensorFlow-based toolbox that lets you train and test deep learning models without writing code. By offering a well-defined, codeless deep learning pipeline from beginning to end, Ludwig enables practitioners and researchers alike to quickly train and test their models and obtain strong baselines to compare experiments against. Ludwig offers CLI commands for preprocessing data, training, issuing predictions, and creating visualizations.
Install Ludwig¶
Install Ludwig for Python, along with the spaCy English model, which this example needs because it uses text features. The following examples have been tested with Python 3.6 and Ludwig 0.2.
$ pip install ludwig
$ python -m spacy download en
If you encounter problems installing gmpy, install libgmp or gmp:
- On Debian-based Linux distributions:
sudo apt-get install libgmp3-dev
- On MacOS:
brew install gmp
Install Comet¶
If you haven't already, install Comet:
$ pip install comet_ml
Log on to Comet.
Make sure to set up your Comet credentials. Get your API key in the Settings page.
Make your API key available to Ludwig and set the Comet project that the Ludwig experiment details should be reported to. Replace each ... below with the appropriate value:
$ export COMET_API_KEY="..."
$ export COMET_PROJECT_NAME="..."
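A missing or empty variable is easy to overlook, so a quick check before launching Ludwig can help. The following is a minimal Python sketch, not part of Ludwig or Comet: the helper name missing_comet_settings is hypothetical, and it assumes only the two variable names shown above.

```python
import os

# The two environment variables exported in the step above.
REQUIRED_VARS = ("COMET_API_KEY", "COMET_PROJECT_NAME")


def missing_comet_settings(environ=None):
    """Return the names of required variables that are unset or empty.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to try out without touching the shell.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_comet_settings()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("Comet settings found.")
```

Run it in the same shell session in which you exported the variables, since environment variables set with export do not carry over to other terminals.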
We recommend that you create a new directory for each Ludwig experiment.
Some background: every time you want to create a new model and train it, you will use one of two commands:
ludwig train
ludwig experiment
When you run either command with the --comet flag, a .comet.config file is created in the current directory. This file pulls your API key and Comet project name from the environment variables you set above and creates an Experiment key for use in this directory.
To run another Experiment, create a new directory and run the command from there. Because the new directory has no .comet.config file yet, a new Experiment is created.
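One way to follow the one-directory-per-experiment recommendation is to generate a fresh, timestamped directory for each run. This is a minimal sketch using only the Python standard library; the function name new_experiment_dir and the ludwig_run prefix are hypothetical choices, not Ludwig or Comet conventions.

```python
from datetime import datetime
from pathlib import Path


def new_experiment_dir(base=".", prefix="ludwig_run"):
    """Create and return a new, uniquely named directory under `base`."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    candidate = Path(base) / f"{prefix}_{stamp}"
    suffix = 1
    # If a directory with this timestamp already exists (two requests
    # within the same second), append a counter until the name is free.
    while candidate.exists():
        candidate = Path(base) / f"{prefix}_{stamp}_{suffix}"
        suffix += 1
    candidate.mkdir(parents=True)
    return candidate
```

You would then change into the returned directory (or pass it as the working directory of the process that runs Ludwig) so that the .comet.config file is written there.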
Download the dataset¶
For this example, we will work on a text classification problem using Reuters-21578, a well-known newswire dataset. It contains only 21,578 newswire documents grouped into six categories. Two are 'big' categories (many positive documents), two are 'medium' categories, and two are 'small' categories (few positive documents).
- Small categories: heat.csv, housing.csv
- Medium categories: coffee.csv, gold.csv
- Big categories: acq.csv, earn.csv
To get the dataset, we use the curl command-line program:
$ curl http://boston.lti.cs.cmu.edu/classes/95-865-K/HW/HW2/reuters-allcats-6.zip \
-o reuters-allcats-6.zip
$ unzip reuters-allcats-6.zip
You can also just download the file and place it in this directory.
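Because the categories differ so much in size, it can be useful to check the class balance before training. The sketch below counts rows per label with the standard library. It assumes the CSV has a header row and that the label column is named class, matching the output feature in the model definition used later in this guide; adjust label_column if your file differs. The function name count_labels is hypothetical.

```python
import csv
from collections import Counter


def count_labels(csv_path, label_column="class"):
    """Count how many rows carry each value of `label_column`.

    Assumes the file has a header row containing `label_column`.
    Undecodable bytes are replaced rather than raising an error.
    """
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as handle:
        reader = csv.DictReader(handle)
        return Counter(row[label_column] for row in reader)
```

For example, count_labels("reuters-allcats.csv").most_common() would list the labels from most to least frequent, assuming that file name and column layout.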
Define the model¶
Define the model you wish to build with the input and output features you want. Create a file named model_definition.yaml with these contents:
input_features:
    -
        name: text
        type: text
        level: word
        encoder: parallel_cnn

output_features:
    -
        name: class
        type: category
Train the model¶
Train the model with the new --comet flag:
$ ludwig experiment --comet --data_csv reuters-allcats.csv \
--model_definition_file model_definition.yaml
Once you run this, a Comet experiment will be created. Check your output for that Comet experiment URL.
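If you prefer to launch training from a script, the same command can be assembled and run with Python's subprocess module. This is a sketch rather than an official Ludwig or Comet API: the function names are hypothetical, and the only flags used are the ones shown in the command above. It assumes the ludwig executable is on your PATH.

```python
import subprocess


def build_experiment_command(data_csv, model_definition_file, use_comet=True):
    """Assemble the `ludwig experiment` argument list shown above."""
    command = ["ludwig", "experiment"]
    if use_comet:
        command.append("--comet")
    command += [
        "--data_csv", data_csv,
        "--model_definition_file", model_definition_file,
    ]
    return command


def run_experiment(data_csv, model_definition_file, cwd=None):
    """Run the experiment in `cwd`; raises CalledProcessError on failure."""
    command = build_experiment_command(data_csv, model_definition_file)
    return subprocess.run(command, cwd=cwd, check=True)
```

Calling run_experiment("reuters-allcats.csv", "model_definition.yaml") from the experiment directory is equivalent to typing the command by hand. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.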
Analysis¶
In Comet, even while the Experiment above is still running, you can see:
- Your live model metrics, updated in real time, on the Charts tab.
- The bash command you ran to train your Experiment, along with any run arguments, on the Code tab.
- The hyperparameters that Ludwig is using (defaults) on the Hyperparameter tab, and much more.
If you create visualizations with Ludwig, you can also upload them to Comet's Image tab by running:
$ ludwig visualize --comet \
--visualization learning_curves \
--training_statistics \
./results/experiment_run_0/training_statistics.json
To keep up to date with Ludwig, consider these resources: