Using NCSA Delta

Introduction

We are using the NCSA Delta cluster, which has a lengthy usage guide. The "right way" to run a long task, such as fine-tuning, is to use the sbatch command, which is described here.
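For reference, a batch script for this kind of job looks roughly like the sketch below. The resource values mirror the Jupyter settings recommended later in this guide, the environment it activates is described under One-Time Setup, and finetune.py is a placeholder name; check the Delta usage guide for the exact flags.

#!/bin/bash
#SBATCH --account=bchk-delta-gpu
#SBATCH --partition=gpuA40x4
#SBATCH --gpus=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=48g
#SBATCH --time=02:00:00

# Activate the shared environment (see One-Time Setup) and run your
# training script. "finetune.py" is a placeholder for your own code.
source /scratch/bchk/aguha/venv/bin/activate
python finetune.py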

However, I recommend using JupyterLab instead. It will make it a lot easier to debug the problems that you will inevitably encounter the first time you try to fine-tune models.

When you create a Jupyter notebook, make sure you select:

  • bchk-delta-gpu as the name of the account
  • A partition with GPUs. I recommend gpuA40x4
  • A Duration of job that seems reasonable, e.g., 1-2 hours.
  • Sufficient CPU (e.g., 8) and RAM (48GB). The more CPU/RAM you select, the longer you may need to wait for the job to start.
  • The number of GPUs should be 1, unless you are actually experimenting with multi-GPU code that I haven’t provided.
  • The working directory should be /scratch/bchk/<USERNAME>.

One-Time Setup

You have a login on the Delta cluster and can use the bchk account. Disk space in your home directory is limited, but you have more space in /scratch/bchk/<USERNAME>. You can set up models and an environment yourself.

However, I recommend using the environment we have set up for you and the models that we have already installed.

Start a Jupyter session and from the terminal run the following:

source /scratch/bchk/aguha/venv/bin/activate
python -m ipykernel install --user --name=shared-venv

The first line activates a Python virtual environment that has everything needed to fine-tune recent models. The second line registers that virtual environment as a Jupyter kernel named shared-venv, which you can then select when creating notebooks.
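Before going further, it is worth checking that your notebook is actually running on the shared-venv kernel and can see a GPU. These are standard PyTorch calls, nothing Delta-specific:

import torch

print(torch.__version__)              # the version installed in the shared venv
print(torch.cuda.is_available())      # should print True on a GPU partition
print(torch.cuda.get_device_name(0))  # e.g., an NVIDIA A40 on gpuA40x4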

Fine-Tuning

I recommend using the Hugging Face TRL library to fine-tune models, specifically the supervised fine-tuning trainer (SFTTrainer). Here is a small script that fine-tunes Llama 3.2 1B on GSM8K, which you can adapt.

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Each GSM8K answer is step-by-step reasoning followed by "####" and the
# final answer. Split the two apart and reformat into a single text field.
def format_item(item):
    q = item["question"].strip()
    answer = item["answer"].strip()
    explanation, a = answer.split("####")
    explanation = explanation.strip()
    a = a.strip()
    return { "content": f"Problem: {q}\nAnswer: {a}\nExplanation: {explanation}" }

# Load GSM8K, hold out 1% of the training split for evaluation, reformat
# each item, and drop the original columns.
data_dict = load_dataset(
    "openai/gsm8k",
    "main",
    split="train"
).train_test_split(
    test_size=0.01
).map(
    format_item
).remove_columns(
    ["question", "answer"]
)

# Training hyperparameters. bf16 works on the A40 (and other Ampere or
# newer GPUs); checkpoints are written to the "finetuned" directory.
sft_config = SFTConfig(
    dataset_text_field="content",
    max_seq_length=2048,
    output_dir="finetuned",
    learning_rate=3e-05,
    lr_scheduler_type="cosine",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    bf16=True,
    logging_steps=10,
)

# Load the locally installed Llama 3.2 1B base model in bfloat16 and move
# it to the GPU.
model = AutoModelForCausalLM.from_pretrained(
    "/scratch/bchk/aguha/models/llama3p2_1b_base",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Train on the reformatted training split, keeping the 1% split for
# evaluation.
trainer = SFTTrainer(
    model,
    train_dataset=data_dict["train"],
    eval_dataset=data_dict["test"],
    args=sft_config,
)
trainer.train()
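Once training finishes, it is worth sampling from the model as a quick sanity check. Below is a minimal sketch; finetuned/final is a hypothetical path that assumes you first ran trainer.save_model("finetuned/final"), but you can also load any checkpoint the trainer wrote under finetuned/.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path: assumes trainer.save_model("finetuned/final") was run
# after training.
model = AutoModelForCausalLM.from_pretrained(
    "finetuned/final",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Fine-tuning does not change the tokenizer, so load it from the base model.
tok = AutoTokenizer.from_pretrained("/scratch/bchk/aguha/models/llama3p2_1b_base")

# Prompt in the same format that format_item produced during training.
prompt = "Problem: What is 12 * 7?\nAnswer:"
inputs = tok(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))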

There are a few other models in /scratch/bchk/aguha/models that you can use instead (ls that directory to see them); to switch, change the path passed to from_pretrained.

Acknowledgements

For this class, we use the Delta cluster at the National Center for Supercomputing Applications through allocation CIS240017 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by U.S. National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296.