Ludwig
Ludwig is a TensorFlow-based toolbox for training and testing deep learning models without writing code. Because it offers a well-defined, codeless deep learning pipeline from beginning to end, practitioners and researchers alike can quickly train and test models and obtain strong baselines to compare experiments against. Ludwig provides CLI commands for preprocessing data, training, making predictions, and creating visualizations.
Running Ludwig with Comet
Install Ludwig
Install Ludwig for Python, along with the English spaCy model, which is needed because this example uses text features. The following examples have been tested with Python 3.6 and Ludwig 0.2.
```shell
$ pip install ludwig
$ python -m spacy download en
```
If you encounter problems installing `gmpy`, please install `libgmp` or `gmp`. On Debian-based Linux distributions: `sudo apt-get install libgmp3-dev`. On macOS: `brew install gmp`.
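Before moving on, it can help to confirm that the command-line tools this walkthrough relies on are on your `PATH`. The following is a minimal POSIX shell sketch. `require_cmds` is a hypothetical helper name and is not part of Ludwig or Comet.

```shell
# Hypothetical helper (not part of Ludwig): report any commands missing from PATH.
# Returns 0 if every named command is found, 1 otherwise.
require_cmds() {
  rc=0
  for cmd in "$@"; do
    # `command -v` is the POSIX way to ask whether a command can be found.
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `require_cmds ludwig python curl unzip` should succeed once `pip install ludwig` has put the `ludwig` CLI on your `PATH`.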
Install Comet
If you haven't already, install `comet_ml`.
```bash
$ pip install comet_ml
```
Make sure to set up your Comet credentials. You can get your API key at www.comet.ml.
Make your API key available to Ludwig, and set the Comet project that the Ludwig experiment details should be reported to. Replace each `...` below with the appropriate value:
```bash
$ export COMET_API_KEY="..."
$ export COMET_PROJECT_NAME="..."
```
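A missing variable is easy to overlook until the run fails to report to Comet. This minimal sketch checks both variables before you train. `check_comet_env` is a hypothetical helper, not a Ludwig or Comet command.

```shell
# Hypothetical helper: fail fast if the Comet variables exported above are missing.
# It only checks that they are non-empty, not that the API key is valid.
check_comet_env() {
  rc=0
  if [ -z "${COMET_API_KEY:-}" ]; then
    echo "COMET_API_KEY is not set" >&2
    rc=1
  fi
  if [ -z "${COMET_PROJECT_NAME:-}" ]; then
    echo "COMET_PROJECT_NAME is not set" >&2
    rc=1
  fi
  return "$rc"
}
```

Run `check_comet_env` before `ludwig experiment --comet`. Note that the literal `...` placeholders would pass this check, so it catches only unset or empty variables.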
We recommend that you create a new directory for each Ludwig experiment.
```bash
$ mkdir experiment1
$ cd experiment1
```
Some background: every time you want to create a new model and train it, you will use one of two commands:
- ludwig train
- ludwig experiment
Once you run either command with the `--comet` flag, a `.comet.config` file is created in the current directory. This `.comet.config` file pulls your API key and Comet project name from the environment variables you set above and creates an Experiment key for use in this directory.
If you want to run another experiment, we recommend creating a new directory. Running Ludwig there creates a new `.comet.config`, and with it a new Comet Experiment.
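If you run many experiments, creating the next directory by hand gets tedious. Here is a minimal sketch that follows the `experiment1` naming used above and creates the first unused `experimentN` directory. `next_experiment_dir` is a hypothetical helper, not part of Ludwig.

```shell
# Hypothetical helper: create the first unused experimentN directory in the
# current directory and print its name.
next_experiment_dir() {
  n=1
  while [ -e "experiment$n" ]; do
    n=$((n + 1))
  done
  mkdir "experiment$n"
  echo "experiment$n"
}
```

Run it from the parent directory that holds your experiments, for example `cd "$(next_experiment_dir)"`. If you run it from inside an existing experiment directory, the new directory will be nested there.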
Download the dataset
For this example, we will work on a text classification problem using Reuters-21578, a well-known newswire dataset. It contains 21,578 newswire documents grouped into 6 categories. Two are 'big' categories (many positive documents), two are 'medium' categories, and two are 'small' categories (few positive documents).
- Small categories: heat.csv, housing.csv
- Medium categories: coffee.csv, gold.csv
- Big categories: acq.csv, earn.csv
To get the dataset, we use the `curl` command-line program:
```bash
$ curl http://boston.lti.cs.cmu.edu/classes/95-865-K/HW/HW2/reuters-allcats-6.zip \
    -o reuters-allcats-6.zip
$ unzip reuters-allcats-6.zip
```
Alternatively, download the file manually, place it in this directory, and unzip it there.
Define the model
Define the model you want to build, along with its input and output features. Create a file named `model_definition.yaml` with these contents:
```yaml
input_features:
  - name: text
    type: text
    level: word
    encoder: parallel_cnn

output_features:
  - name: class
    type: category
```
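If you prefer to stay in the terminal, you can write the same model definition with a shell here-document instead of an editor. This is only a convenience sketch. The YAML content matches the definition given above.

```shell
# Write the model definition into the current experiment directory.
# Quoting the 'EOF' delimiter stops the shell from expanding anything inside the YAML.
cat > model_definition.yaml <<'EOF'
input_features:
  - name: text
    type: text
    level: word
    encoder: parallel_cnn

output_features:
  - name: class
    type: category
EOF
```

YAML is indentation-sensitive, so keep the spaces exactly as shown when copying this.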
Train the model
Train the model with the new `--comet` flag:
```bash
$ ludwig experiment --comet --data_csv reuters-allcats.csv \
    --model_definition_file model_definition.yaml
```
Running this command creates a Comet experiment. Look for the Comet experiment URL in the command's output and open it in your browser.
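The URL is easy to lose in a long training log. One option is to keep a copy of the output with `tee` and search it afterwards. This sketch rests on two assumptions. First, the URL appears in the output as a plain `http(s)` link containing `comet.ml` or `comet.com`. Second, the names `run_experiment`, `find_comet_url`, and `train.log` are hypothetical choices of your own, not part of Ludwig.

```shell
# Run the experiment from above while keeping a copy of everything it prints.
# 2>&1 folds stderr into the log too, in case the URL is printed there.
# Note: the pipeline's exit status is tee's, not ludwig's.
run_experiment() {
  ludwig experiment --comet --data_csv reuters-allcats.csv \
    --model_definition_file model_definition.yaml 2>&1 | tee train.log
}

# Hypothetical helper: print the first Comet URL found in a log file
# (defaults to train.log). Assumes the URL is a plain http(s) link.
find_comet_url() {
  grep -Eo 'https?://[^[:space:]]*comet\.(ml|com)[^[:space:]]*' "${1:-train.log}" | head -n 1
}
```

After `run_experiment` finishes (or in a second terminal while it runs), `find_comet_url` prints the link, or nothing if no matching URL was logged.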
Analysis
In Comet, even while the experiment above is still running, you can see:
- live model metrics, updated in real time, on the Charts tab
- the bash command you ran to train the experiment, along with its run arguments, on the Code tab
- the hyperparameters Ludwig is using (including defaults) on the Hyperparameters tab, and much more
If you create visualizations with Ludwig, you can also upload them to Comet's Image tab by adding the `--comet` flag, for example:
```bash
$ ludwig visualize --comet \
    --visualization learning_curves \
    --training_statistics \
    ./results/experiment_run_0/training_statistics.json
```
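The path above points at one specific run. If you have trained more than once in the same directory, this minimal sketch picks the most recently modified `training_statistics.json` under `./results` by file modification time, so it does not depend on how Ludwig names later run directories. `latest_training_stats` is a hypothetical helper, not a Ludwig command.

```shell
# Hypothetical helper: print the newest training_statistics.json under ./results
# (by modification time). Prints nothing if no such file exists.
latest_training_stats() {
  ls -t results/*/training_statistics.json 2>/dev/null | head -n 1
}
```

You could then run `ludwig visualize --comet --visualization learning_curves --training_statistics "$(latest_training_stats)"` from the experiment directory.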
To keep up to date with Ludwig, consider these resources: