Integrate with GPT-NeoX¶

GPT-NeoX is a library for efficiently training large language models with tens of billions of parameters in a multi-machine distributed context. The library is currently maintained by EleutherAI.

Instrument your runs with Comet to start managing experiments, creating dataset versions, and tracking hyperparameters for faster and easier reproducibility and collaboration.

| Comet SDK  | Minimum SDK version | Minimum GPT-NeoX version |
|------------|---------------------|--------------------------|
| Python-SDK | 3.45.0              | master                   |

Start logging¶

Add the following config or create a separate configuration file:

{
  "use_comet": true,
  "comet_project": "<your-project-name>",
  "comet_experiment_name": "<your-experiment-name>",
  "comet_tags": ["<experiment-tag>"],
  "comet_others": { "<experiment-other-name>": "<experiment-other-value>" }
}
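A malformed fragment (for example, a stray trailing comma) fails at launch time, so it can be worth parsing the file up front. A minimal sketch using Python's standard json module, with placeholder values standing in for your own project settings:

```python
import json

# The Comet config fragment from above, with hypothetical example values.
raw = """
{
  "use_comet": true,
  "comet_project": "my-project",
  "comet_experiment_name": "my-experiment",
  "comet_tags": ["baseline"],
  "comet_others": {"seed": "1234"}
}
"""

# json.loads raises ValueError on malformed JSON (e.g. trailing commas).
config = json.loads(raw)

# Basic checks on the keys the integration reads.
assert config["use_comet"] is True
assert isinstance(config["comet_tags"], list)
print("config OK:", sorted(config))
```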

Tip

Find a full list of GPT-NeoX recipe configs in the configs directory of the GPT-NeoX repository.

Log automatically¶

When using the integration, Comet automatically logs the following items by default, with no additional configuration:

  • Training metrics like train/lm_loss, timers/forward and runtime/flops_per_sec_per_gpu.
  • All hyperparameters, such as data_path, the DeepSpeed configuration, and anything else included in the config file.

End-to-end example¶

The following is a basic example of using Comet with GPT-NeoX, training the 1.3B-parameter GPT model defined in configs/1-3B.yml.

Clone the repo¶

git clone https://github.com/EleutherAI/gpt-neox/

Install dependencies¶

python -m pip install -r gpt-neox/requirements/requirements.txt -r gpt-neox/requirements/requirements-comet.txt

Log in to Comet¶

comet_ml login
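For non-interactive setups (for example, a batch job on a cluster), the API key can instead be supplied through the COMET_API_KEY environment variable before launching training:

```shell
# Non-interactive alternative to `comet_ml login`.
# Replace the placeholder with your actual API key from comet.com.
export COMET_API_KEY="<your-api-key>"
```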

Download the training dataset¶

cd gpt-neox && python prepare_data.py enwik8

Write the GPT-NeoX config with Comet¶

Write the following config file to gpt-neox/configs/comet.yml:

{ "use_comet": true }

Run the example on a single node¶

python ./gpt-neox/deepy.py ./gpt-neox/train.py ./gpt-neox/configs/1-3B.yml ./gpt-neox/configs/slurm_local.yml ./gpt-neox/configs/comet.yml
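deepy.py accepts multiple config files and combines them into a single configuration, which is why the Comet settings can live in their own file. A simplified sketch of that merging idea in Python, assuming strict-JSON fragments; the real NeoXArgs loader is stricter and also handles YAML, so this is illustrative only:

```python
import json
import tempfile
from pathlib import Path

def merge_configs(paths):
    """Combine several JSON-style config fragments into one dict.

    Simplified sketch: here, later files override earlier ones on key
    collisions, which is not guaranteed to match GPT-NeoX's own
    (stricter) NeoXArgs handling.
    """
    merged = {}
    for path in paths:
        merged.update(json.loads(Path(path).read_text()))
    return merged

# Demo with two throwaway fragments standing in for 1-3B.yml and comet.yml.
with tempfile.TemporaryDirectory() as tmp:
    model_cfg = Path(tmp) / "model.yml"
    comet_cfg = Path(tmp) / "comet.yml"
    model_cfg.write_text('{"train_iters": 1000}')
    comet_cfg.write_text('{"use_comet": true}')
    config = merge_configs([model_cfg, comet_cfg])

print(config)  # {'train_iters': 1000, 'use_comet': True}
```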
Nov. 18, 2024