
Integrate with unsloth¶

unsloth dramatically improves the speed and efficiency of LLM fine-tuning for models including Llama, Phi-3, Gemma, Mistral, and more. For a full list of the 100+ supported unsloth models, see here.

Instrument your runs with Comet to manage experiments, create dataset versions, and track hyperparameters for faster and easier reproducibility and collaboration.


Comet SDK     Minimum SDK version     Minimum transformers version
Python-SDK    3.31.5                  4.43.0

Comet + unsloth dashboard

Start logging¶

Connect Comet to your existing code by adding a simple Comet Experiment.

Add the following lines of code to your script or notebook:

import os

import comet_ml
import torch
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel, is_bfloat16_supported

comet_ml.login()
exp = comet_ml.Experiment(project_name="<YOUR-PROJECT-NAME>")

# 1. Enable logging of model checkpoints
os.environ["COMET_LOG_ASSETS"] = "True"

# 2. Define your model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    ...
)

# 3. Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    ...
)

# 4. Train your model
trainer = SFTTrainer(
    ...
)

trainer.train()

Note

There are other ways to configure Comet. See more here.
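For example, here is a minimal sketch of configuring Comet explicitly in code rather than through the interactive prompt; the API key, workspace, and project name below are placeholders to replace with your own values:

import comet_ml

# Placeholder credentials; replace with your own values.
exp = comet_ml.Experiment(
    api_key="<YOUR-API-KEY>",
    workspace="<YOUR-WORKSPACE>",
    project_name="<YOUR-PROJECT-NAME>",
)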

Log automatically¶

By integrating with the Transformers Trainer object, Comet automatically logs the following items with no additional configuration:

  • Metrics (such as loss and accuracy)
  • Hyperparameters
  • Assets (such as checkpoints and log files)

Note

Don't see what you need to log here? We have your back. You can manually log any kind of data to Comet using the Experiment object. For example, use experiment.log_image to log images, or experiment.log_audio to log audio.
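For instance, here is a minimal sketch of manual logging against the Experiment object created above; the file paths, metric names, and values are placeholders:

# Log an image and an audio clip stored on disk (placeholder file paths).
exp.log_image("confusion_matrix.png", name="confusion-matrix")
exp.log_audio("validation_sample.wav", metadata={"split": "validation"})

# Log arbitrary scalars and parameters alongside the automatic logging.
exp.log_metric("validation_perplexity", 4.2, step=60)
exp.log_parameter("prompt_template", "alpaca")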

End-to-end example¶

Get started with a basic example of using Comet with unsloth and Llama-3.1 (8b).

You can check out the results of this example unsloth experiment for a preview of what's to come.

The example below fine-tunes Llama-3.1 (8B) with LoRA adapters on the Alpaca dataset.

Install dependencies¶

pip install comet_ml
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git" "torch>=2.4.0" xformers trl peft accelerate bitsandbytes triton
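As an optional sanity check, the snippet below confirms that the environment meets the minimum versions listed above and that a CUDA GPU is visible (it assumes the packages installed into the active environment):

import comet_ml
import torch
import transformers

# Confirm the minimum SDK and transformers versions from the table above.
print("comet_ml:", comet_ml.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())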

Run the example¶

import os

import comet_ml

# Enable logging of model checkpoints
os.environ["COMET_LOG_ASSETS"] = "True"

comet_ml.login(project_name="comet-example-unsloth")

import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import FastLanguageModel, is_bfloat16_supported

max_seq_length = 2048
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True

# Download model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = HF_TOKEN, # Your Hugging Face access token; set HF_TOKEN to your own token string
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # rank stabilized LoRA
    loftq_config = None, # LoftQ
)

# Data download and preparation
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

# Train the model
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Set to True to make training up to 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer_stats = trainer.train()
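Optionally, you can save the trained LoRA adapters and attach them to the Comet experiment as a model asset. This is a minimal sketch: "lora_model" is an arbitrary local directory, and it assumes comet_ml.get_global_experiment() returns the experiment created automatically by the Transformers integration:

# Save the LoRA adapters and tokenizer to a local directory.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

# Attach the saved adapters to the running Comet experiment.
experiment = comet_ml.get_global_experiment()
if experiment is not None:
    experiment.log_model("llama-3.1-8b-lora", "lora_model")
    experiment.end()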

Try it out!¶

Don't just take our word for it; try it out for yourself.

Configure Comet with Unsloth¶

Unsloth uses the same configuration method as Comet's Transformers integration.

For more information about using environment parameters in Comet, see Configure Comet.
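As a sketch, the most common settings can also be provided as environment variables before the trainer is created; the values below are placeholders:

import os

# Placeholder values; set these before creating the SFTTrainer.
os.environ["COMET_API_KEY"] = "<YOUR-API-KEY>"
os.environ["COMET_PROJECT_NAME"] = "comet-example-unsloth"
os.environ["COMET_LOG_ASSETS"] = "True"  # also log model checkpoints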
