Fine-Tuning Python-API

The unified LLMBoostFinetune API simplifies and streamlines fine-tuning of your LLM while keeping it customizable. LLMBoostFinetune supports multiple models on both NVIDIA and AMD GPUs, exposes a range of configurable fine-tuning hyperparameters, and is designed for fast, efficient fine-tuning.

Step 0: Getting to know the LLMBoostFinetune API

LLMBoostFinetune is LLMBoost's main class for fine-tuning large language models. Its input arguments let you customize the fine-tuning run; each one is described in detail below.

Basic Args:

  • stage (str): The finetuning stage to run. Can be chosen from "pt", "sft", "rm", "ppo", "dpo", "kto". - [Required].
  • model_name_or_path (str): Path to the pretrained model or model identifier from HuggingFace's model hub. - [Required].
  • dataset (str): The name or path of the dataset to be used for finetuning. - [Required].
  • template (str): The prompt template or format applied to the dataset for finetuning. - [Required].
  • finetuning_type (str): Type of finetuning to apply (e.g., "full", "lora", "freeze"), which determines which parts of the model are updated. - [Required].

Training Hyperparameters:

  • per_device_train_batch_size (int): Number of samples per batch for each device during training. - [Required].
  • gradient_accumulation_steps (int): Number of steps over which gradients are accumulated before each optimizer update. - [Required].
  • num_train_epochs (int): Total number of epochs to train the model. - [Required].

Args with Default Values:

  • do_train (bool): Whether to run training. Setting it to False runs evaluation only. Defaults to True.
  • learning_rate (float): Learning rate to use for training. Defaults to 1e-4.
  • lr_scheduler_type (str): Type of learning rate scheduler to use (e.g., "cosine", "linear", "constant"). Defaults to cosine.
  • overwrite_cache (bool): Whether to overwrite the output dir. Defaults to True.
  • bf16 (bool): Whether to use bfloat16 precision for training (if supported by hardware). Defaults to True.
  • output_dir (str): Directory to save the finetuned model, logs, and checkpoints. Defaults to /workspace/save/finetune/.
  • logging_steps (int): Interval (in steps) at which to log training metrics. Defaults to 100.
  • save_steps (int): Interval (in steps) at which to save model checkpoints. Defaults to 500.
  • plot_loss (bool): Whether to plot and save the training loss curve. Defaults to False.

LLMBoostFinetune also provides a simple method to start fine-tuning. After instantiating the engine, call LLMBoostFinetune.run() to start the fine-tuning run.
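As a rough illustration of how the required arguments and the documented defaults fit together, the sketch below merges user-supplied values with the defaults listed above. The helpers build_finetune_kwargs and run_finetune are hypothetical names introduced here for illustration and are not part of LLMBoost; only the LLMBoostFinetune import path, the argument names, and the default values come from this page.

```python
# Required arguments and documented defaults, copied from the parameter list above.
REQUIRED_ARGS = (
    "stage",
    "model_name_or_path",
    "dataset",
    "template",
    "finetuning_type",
    "per_device_train_batch_size",
    "gradient_accumulation_steps",
    "num_train_epochs",
)

DOCUMENTED_DEFAULTS = {
    "do_train": True,
    "learning_rate": 1e-4,
    "lr_scheduler_type": "cosine",
    "overwrite_cache": True,
    "bf16": True,
    "output_dir": "/workspace/save/finetune/",
    "logging_steps": 100,
    "save_steps": 500,
    "plot_loss": False,
}


def build_finetune_kwargs(**user_args):
    """Hypothetical helper: check required arguments and fill in documented defaults."""
    missing = [name for name in REQUIRED_ARGS if name not in user_args]
    if missing:
        raise ValueError("missing required arguments: " + ", ".join(missing))
    # Values passed by the user take precedence over the documented defaults.
    return {**DOCUMENTED_DEFAULTS, **user_args}


def run_finetune(kwargs):
    """Hypothetical helper: instantiate the engine and start fine-tuning."""
    # Imported lazily so build_finetune_kwargs can be used without LLMBoost installed.
    from llmboost.llmboost_finetune import LLMBoostFinetune

    LLMBoostFinetune(**kwargs).run()


# Illustrative values: override three defaults, keep the rest.
kwargs = build_finetune_kwargs(
    stage="sft",
    model_name_or_path="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="alpaca_en_demo",
    template="llama3",
    finetuning_type="lora",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=2,
    learning_rate=5e-5,
    logging_steps=10,
    plot_loss=True,
)
print(kwargs["learning_rate"], kwargs["save_steps"])
```

Inside the LLMBoost container, run_finetune(kwargs) would then start the run. Passing the defaults explicitly is not required; it only makes the effective configuration visible in one place.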

Step 1: Launch the Docker container

Please start the LLMBoost container by running the following command.

docker run --rm -it \
--gpus all \
--network host \
--ipc host \
--uts host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--group-add video \
--device /dev/dri:/dev/dri \
mangollm/mb-llmboost-training:cuda-prod

Note: You might need permission to access this Docker image. If so, please contact our support at contact@mangoboost.io.

All of the following commands are run inside the Docker container.

By default, the model tokenizer is downloaded from HuggingFace, so please authenticate to HuggingFace via the HuggingFace CLI by running the following commands:

# EXAMPLE COMMANDS
export HUGGING_FACE_HUB_TOKEN=<your-hf-token>
huggingface-cli login

To use the full functionality of our fine-tuning software, you also need to provide your LLMBoost license inside the container. Please contact us at contact@mangoboost.io if you don't have an LLMBoost license. You can place your license inside the container by running:

echo "<your-llmboost-license>" > /workspace/llmboost_license.skm
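Before starting a long run, it can help to confirm that both prerequisites are in place. The following is a minimal sketch, not part of LLMBoost: the function name preflight_problems is hypothetical, and it only checks the HUGGING_FACE_HUB_TOKEN environment variable and the license file path used above. It does not detect a token cached by huggingface-cli login, and it does not validate the license contents.

```python
import os
from pathlib import Path

# License location used by the echo command above.
LICENSE_PATH = Path("/workspace/llmboost_license.skm")


def preflight_problems(env=None, license_path=LICENSE_PATH):
    """Hypothetical check: return a list of human-readable setup problems."""
    env = os.environ if env is None else env
    license_path = Path(license_path)
    problems = []

    # Only the environment variable is inspected here.
    if not env.get("HUGGING_FACE_HUB_TOKEN"):
        problems.append("HUGGING_FACE_HUB_TOKEN is not set")

    # The file must exist and be non-empty; its contents are not validated.
    if not license_path.is_file():
        problems.append(f"license file not found: {license_path}")
    elif license_path.stat().st_size == 0:
        problems.append(f"license file is empty: {license_path}")

    return problems


for problem in preflight_problems():
    print("WARNING:", problem)
```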

Step 2: Using LLM Fine-Tuning in a Python program

The following Python script shows an example of fine-tuning the meta-llama/Meta-Llama-3-8B-Instruct model. You can also find this example script in /workspace/apps/examples/finetune/llama3_8b_finetune.py.

# /workspace/apps/examples/finetune/llama3_8b_finetune.py
from llmboost.llmboost_finetune import LLMBoostFinetune

if __name__ == "__main__":
    # Example of fine-tuning Llama-3-8B
    llmboost_finetuner = LLMBoostFinetune(
        stage="sft",
        model_name_or_path="meta-llama/Meta-Llama-3-8B-Instruct",
        dataset="alpaca_en_demo",
        template="llama3",
        finetuning_type="lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=2,
    )
    llmboost_finetuner.run()

You now have a Python script that fine-tunes a Llama model. To start fine-tuning, run the command below.

Note: The current system doesn't support visualizing the fine-tuning process in a GUI. Please select NO when the system asks whether to visualize the run.

torchrun --nproc_per_node=8 /workspace/apps/examples/finetune/llama3_8b_finetune.py
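The batch-size arguments in the script combine with the number of processes started by torchrun. As a small worked example, assuming plain data-parallel training in which each of the 8 processes handles its own per-device batch (an assumption about how LLMBoost distributes the work, not something this page states), the number of samples contributing to each optimizer update can be estimated as follows. The function name is hypothetical.

```python
def effective_batch_size(per_device_train_batch_size, gradient_accumulation_steps, num_processes):
    """Samples per optimizer update under plain data parallelism (assumed)."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_processes


# Values from the example script (1 and 8) and from --nproc_per_node=8.
print(effective_batch_size(1, 8, 8))  # 64
```

If you launch with fewer GPUs, raising gradient_accumulation_steps by the same factor keeps this product unchanged.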

We also provide several examples for fine-tuning meta-llama/Llama-2-70b-chat-hf and mistralai/Mixtral-8x7B-v0.1 to help you get started quickly with LLMBoost's fine-tuning API.

To run multi-node fine-tuning, launch torchrun on every node with the same master node address and port and a distinct node rank, e.g. torchrun --nnodes <number of nodes> --node_rank <node rank> --master_addr <master node ip> --master_port <port> --nproc_per_node <GPUs per node> <script to run>. torchrun then sets MASTER_ADDR, MASTER_PORT, and RANK for each worker process. Please refer to PyTorch's torchrun documentation for more details.
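Inside each worker process, torchrun exposes the rendezvous settings as environment variables (RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT). The sketch below reads them, which can be handy for confirming that every node joined with the expected rank. The DistributedInfo class and read_distributed_info function are hypothetical helpers, and the fallback values are assumptions that only apply when the script runs outside torchrun.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistributedInfo:
    rank: int          # global rank of this worker across all nodes
    local_rank: int    # rank of this worker on its own node
    world_size: int    # total number of workers
    master_addr: str
    master_port: int


def read_distributed_info(env=None):
    """Hypothetical helper: read the variables torchrun sets for each worker."""
    env = os.environ if env is None else env
    return DistributedInfo(
        # Fallbacks are assumptions for a single-process run without torchrun.
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
        master_addr=env.get("MASTER_ADDR", "127.0.0.1"),
        master_port=int(env.get("MASTER_PORT", 29500)),
    )


info = read_distributed_info()
if info.rank == 0:
    # Print once per job rather than once per worker.
    print(f"world size {info.world_size}, master {info.master_addr}:{info.master_port}")
```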