Welcome back, data scientists! In my previous post, we explored how easy it is to set up a machine learning environment with Canonical’s Data Science Stack (DSS) and run your first model using Hugging Face’s Smol Course. Today, let’s take it a step further with experiment tracking. Experimentation is at the heart of data science, and having the right tool to support it can make all the difference. That’s why we bundle MLFlow in DSS – to help you track, compare, and reproduce your experiments effortlessly.
When you’re exploring new ideas and fine-tuning models, it can be challenging to keep track of all your experiments manually. Imagine having to remember which hyperparameters led to which results or trying to reproduce an experiment you did weeks ago. MLFlow solves this problem by automatically logging your experiment details – from parameters and metrics to model artifacts – so you can always pick up where you left off.
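To make that concrete, the core tracking API boils down to a handful of calls. Here is a minimal, self-contained sketch (the experiment name and values are placeholders, not taken from the fine-tuning example):

import mlflow

# Group related runs under a named experiment
mlflow.set_experiment("my-first-experiment")

with mlflow.start_run():
    # Parameters: configuration values you chose before training
    mlflow.log_param("learning_rate", 5e-5)
    # Metrics: numbers produced during or after training
    mlflow.log_metric("train_loss", 0.42)
    # Artifacts: any file you want to keep alongside the run
    with open("notes.txt", "w") as f:
        f.write("baseline run")
    mlflow.log_artifact("notes.txt")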
To access MLFlow from your DSS environment, type the following in your terminal:
dss status
Copy and paste the MLFlow URL into your browser, and you’ll be directed to the MLFlow UI. At this point it will probably be empty.
Let’s work on top of our previous fine-tuning example, and see how we can start tracking our training runs with just a few lines of code.
Note: before proceeding, make sure to install the MLFlow dependencies. We also pin the packaging library to avoid conflicts with the Hugging Face Smol Course dependencies (adjust the pin if your environment requires a different version):
pip install mlflow
pip install "packaging<24"
This snippet builds on the code from our previous article, adding experiment tracking capabilities that log key parameters, capture metrics, and store the model artifact.
import mlflow

mlflow.set_experiment("FineTuning")

with mlflow.start_run():
    # Log key configuration parameters
    mlflow.log_param("max_steps", 3)
    mlflow.log_param("batch_size", 4)
    mlflow.log_param("learning_rate", 5e-5)
    mlflow.log_param("logging_steps", 10)
    mlflow.log_param("save_steps", 3)
    mlflow.log_param("eval_steps", 2)
    mlflow.log_param("use_mps_device", True if device == "mps" else False)
    mlflow.log_param("hub_model_id", finetune_name)

    # Configure the SFTTrainer
    sft_config = SFTConfig(
        output_dir="./sft_output",
        max_steps=3,  # Adjust based on dataset size and desired training duration
        per_device_train_batch_size=4,  # Set according to your GPU memory capacity
        learning_rate=5e-5,  # Common starting point for fine-tuning
        logging_steps=10,  # Frequency of logging training metrics
        save_steps=3,  # Frequency of saving model checkpoints
        evaluation_strategy="steps",  # Evaluate the model at regular intervals
        eval_steps=2,  # Frequency of evaluation
        use_mps_device=(True if device == "mps" else False),  # Use MPS on Apple Silicon devices
        hub_model_id=finetune_name,  # Set a unique name for your model
    )

    # Initialize the SFTTrainer
    trainer = SFTTrainer(
        model=model,
        args=sft_config,
        train_dataset=ds["train"],
        tokenizer=tokenizer,
        eval_dataset=ds["test"],
    )

    # Train the model
    trainer.train()

    # Save the model
    trainer.save_model(f"./{finetune_name}")

    # Log the saved model as an MLFlow artifact
    mlflow.pytorch.log_model(model, "fine_tuned_model")
If you run this snippet, you will log your experiment and your model, along with its parameters, to the MLFlow dashboard.
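If you also want the training and evaluation metrics to appear next to those parameters, one option (not shown in the snippet above) is to log the summary metrics that the trainer returns. For example, inside the same mlflow.start_run() block you could replace the plain trainer.train() call with:

# trainer.train() returns a TrainOutput whose .metrics dict holds summary values
# such as train_loss and train_runtime
train_result = trainer.train()
mlflow.log_metrics(train_result.metrics)

# Evaluation metrics (eval_loss, eval_runtime, ...) can be logged the same way
eval_metrics = trainer.evaluate()
mlflow.log_metrics(eval_metrics)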
If you want to explore various parameters automatically, or even be smart about it and iterate over specific hyperparameters programmatically, you could do the following:
# Define a list of learning rates to experiment with
learning_rates = [5e-5, 3e-5, 1e-5]

for lr in learning_rates:
    with mlflow.start_run():
        # Log the current learning rate
        mlflow.log_param("learning_rate", lr)
        # Add the same training code as above here, but parameterize the learning rate with this new variable
In this snippet, we iterate over a list of learning rates to explore how each setting impacts the model. For each learning rate, we start a new MLFlow run to log the experiment parameters, train the model, and save the fine-tuned model. This enables you to later compare the results across different runs.
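To make the placeholder comment in the loop concrete, here is one way the body could look. This is a sketch that reuses the objects from the earlier snippet (model, tokenizer, ds, finetune_name, device) and only swaps the hard-coded learning rate for the loop variable:

for lr in learning_rates:
    with mlflow.start_run(run_name=f"lr-{lr}"):
        # Log the hyperparameters for this run
        mlflow.log_param("learning_rate", lr)
        mlflow.log_param("max_steps", 3)

        sft_config = SFTConfig(
            output_dir=f"./sft_output_lr_{lr}",
            max_steps=3,
            per_device_train_batch_size=4,
            learning_rate=lr,  # the only value that changes between runs
            logging_steps=10,
            save_steps=3,
            evaluation_strategy="steps",
            eval_steps=2,
            use_mps_device=(device == "mps"),
            hub_model_id=finetune_name,
        )

        trainer = SFTTrainer(
            model=model,
            args=sft_config,
            train_dataset=ds["train"],
            tokenizer=tokenizer,
            eval_dataset=ds["test"],
        )

        trainer.train()
        trainer.save_model(f"./{finetune_name}-lr-{lr}")
        mlflow.pytorch.log_model(model, "fine_tuned_model")

One caveat: because the loop reuses the same model object, each run continues training from the previous run’s weights. If you want the runs to be independent, reload the base model at the top of the loop.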
After a few training runs with custom parameters, you’ll see something like this in the MLFlow dashboard:
Click on one of the runs, and you’ll see that MLFlow saved all the parameters along with plenty of internal details about the model.
To evaluate the trained model, head over to the MLFlow UI, click on the run you want to evaluate, and copy the run ID from the top left (it will look something like be1193d43a1a40c1bc84866b9462dddf). Go back to your notebook and change the Smol Course evaluation code to use MLFlow to retrieve and load the model:
# Paste the run ID you copied from the MLFlow UI
run_id = ""
model_uri = f"runs:/{run_id}/fine_tuned_model"

# Load the fine-tuned model from MLFlow and move it to the same device as the inputs
loaded_model = mlflow.pytorch.load_model(model_uri).to(device)

# Generate a response with the loaded model
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = loaded_model.generate(**inputs, max_new_tokens=300)
print("After training:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Once loaded, the model generates a response for your prompt, allowing you to directly see the improvements from your fine-tuning. This process not only confirms that your experiments are correctly logged but also makes it easy to compare different runs and choose the best-performing model – all without manually searching through local files.
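If you would rather pick the best run programmatically than copy run IDs from the UI, MLFlow’s search API can do that too. A small sketch, assuming your runs logged an eval_loss metric (as in the optional metrics snippet above):

import mlflow

# Find the run with the lowest evaluation loss in the FineTuning experiment
runs = mlflow.search_runs(
    experiment_names=["FineTuning"],
    order_by=["metrics.eval_loss ASC"],
    max_results=1,
)

best_run_id = runs.loc[0, "run_id"]
best_model = mlflow.pytorch.load_model(f"runs:/{best_run_id}/fine_tuned_model")
print(f"Loaded best model from run {best_run_id}")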
Integrating MLFlow with Canonical’s Data Science Stack takes your experimentation to the next level. You no longer need to worry about manually keeping track of each run; the entire process is streamlined and automated. This means you can focus more on the creative aspects of model building and less on managing experimental details.
MLFlow is capable of much more than simply tracking your metrics and logging models. Some of the major capabilities MLFlow offers include:
- a Model Registry for versioning models and managing their lifecycle from staging to production
- MLFlow Projects for packaging your code so experiments can be reproduced on other machines
- built-in model evaluation utilities for comparing models against baselines
- model packaging and serving, so a logged model can be deployed behind a REST endpoint
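As a taste of the first of these, promoting a logged model into the Model Registry only takes a couple of lines. A sketch, assuming your MLFlow tracking server has a database-backed registry store and using a hypothetical model name:

import mlflow

run_id = ""  # paste the run ID of your best run here
result = mlflow.register_model(
    model_uri=f"runs:/{run_id}/fine_tuned_model",
    name="smol-finetuned",  # hypothetical registry name; pick your own
)
print(f"Registered version {result.version} of {result.name}")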
Ready to elevate your data science game? Give MLFlow in DSS a try and discover how effortless and powerful experiment tracking can be. Happy experimenting!
Learn more about Canonical’s Data Science Stack.
Watch our on-demand webinar to explore how to get your ML environment set up in 3 commands on Ubuntu.