Welcome back, data scientists! In my previous post, we explored how easy it is to set up a machine learning environment with Canonical’s Data Science Stack (DSS) and run your first model using Hugging Face’s Smol Course. Today, let’s take it a step further with experiment tracking. Experimentation is at the heart of data science, and having the right tool to support it can make all the difference. That’s why we bundle MLFlow in DSS – to help you track, compare, and reproduce your experiments effortlessly.
When you’re exploring new ideas and fine-tuning models, it can be challenging to keep track of all your experiments manually. Imagine having to remember which hyperparameters led to which results or trying to reproduce an experiment you did weeks ago. MLFlow solves this problem by automatically logging your experiment details – from parameters and metrics to model artifacts – so you can always pick up where you left off.
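To make that concrete, the core tracking API boils down to a handful of calls. Here is a minimal, self-contained sketch (the experiment name and values are placeholders, not taken from the fine-tuning example):

import mlflow

# Group related runs under a named experiment
mlflow.set_experiment("my-first-experiment")

with mlflow.start_run():
    # Parameters: configuration values you chose before training
    mlflow.log_param("learning_rate", 5e-5)
    # Metrics: numbers produced during or after training
    mlflow.log_metric("train_loss", 0.42)
    # Artifacts: any file you want to keep alongside the run
    with open("notes.txt", "w") as f:
        f.write("baseline run")
    mlflow.log_artifact("notes.txt")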
To access MLFlow from your DSS environment, type the following in your terminal:
dss status
Copy and paste the MLFlow URL into your browser, and you’ll be directed to the MLFlow UI. At this point it will probably be empty.
Let’s work on top of our previous fine-tuning example, and see how we can start tracking our training runs with just a few lines of code.
Note: before proceeding, make sure to install the MLFlow dependencies. We also pin the packaging library to avoid conflicts with the Hugging Face Smol Course dependencies (adjust the pin if your environment requires a different version):
pip install mlflow
pip install "packaging<24"
This snippet builds on the code from our previous article, adding experiment tracking capabilities that log key parameters, capture metrics, and store the model artifact.
import mlflow

mlflow.set_experiment("FineTuning")

with mlflow.start_run():
    # Log key configuration parameters
    mlflow.log_param("max_steps", 3)
    mlflow.log_param("batch_size", 4)
    mlflow.log_param("learning_rate", 5e-5)
    mlflow.log_param("logging_steps", 10)
    mlflow.log_param("save_steps", 3)
    mlflow.log_param("eval_steps", 2)
    mlflow.log_param("use_mps_device", True if device == "mps" else False)
    mlflow.log_param("hub_model_id", finetune_name)

    # Configure the SFTTrainer
    sft_config = SFTConfig(
        output_dir="./sft_output",
        max_steps=3,  # Adjust based on dataset size and desired training duration
        per_device_train_batch_size=4,  # Set according to your GPU memory capacity
        learning_rate=5e-5,  # Common starting point for fine-tuning
        logging_steps=10,  # Frequency of logging training metrics
        save_steps=3,  # Frequency of saving model checkpoints
        evaluation_strategy="steps",  # Evaluate the model at regular intervals
        eval_steps=2,  # Frequency of evaluation
        use_mps_device=(True if device == "mps" else False),  # Use MPS on Apple Silicon devices
        hub_model_id=finetune_name,  # Set a unique name for your model
    )

    # Initialize the SFTTrainer
    trainer = SFTTrainer(
        model=model,
        args=sft_config,
        train_dataset=ds["train"],
        tokenizer=tokenizer,
        eval_dataset=ds["test"],
    )

    # Train the model
    trainer.train()

    # Save the model
    trainer.save_model(f"./{finetune_name}")

    # Log the saved model as an MLFlow artifact
    mlflow.pytorch.log_model(model, "fine_tuned_model")
If you run this snippet, you will log your experiment and your model, along with its parameters, to the MLFlow dashboard.
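If you also want the training and evaluation metrics to appear next to those parameters, one option (not shown in the snippet above) is to log the summary metrics that the trainer returns. For example, inside the same mlflow.start_run() block you could replace the plain trainer.train() call with:

# trainer.train() returns a TrainOutput whose .metrics dict holds summary values
# such as train_loss and train_runtime
train_result = trainer.train()
mlflow.log_metrics(train_result.metrics)

# Evaluation metrics (eval_loss, eval_runtime, ...) can be logged the same way
eval_metrics = trainer.evaluate()
mlflow.log_metrics(eval_metrics)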
If you want to explore various parameters automatically, or even be smart about it and iterate over specific hyperparameters programmatically, you could do the following:
# Define a list of learning rates to experiment with
learning_rates = [5e-5, 3e-5, 1e-5]

for lr in learning_rates:
    with mlflow.start_run():
        # Log the current learning rate
        mlflow.log_param("learning_rate", lr)
        # Add the same training code as above here, but parameterize the learning rate with this new variable
In this snippet, we iterate over a list of learning rates to explore how each setting impacts the model. For each learning rate, we start a new MLFlow run to log the experiment parameters, train the model, and save the fine-tuned model. This enables you to later compare the results across different runs.
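To make the placeholder comment in the loop concrete, here is one way the body could look. This is a sketch that reuses the objects from the earlier snippet (model, tokenizer, ds, finetune_name, device) and only swaps the hard-coded learning rate for the loop variable:

for lr in learning_rates:
    with mlflow.start_run(run_name=f"lr-{lr}"):
        # Log the hyperparameters for this run
        mlflow.log_param("learning_rate", lr)
        mlflow.log_param("max_steps", 3)

        sft_config = SFTConfig(
            output_dir=f"./sft_output_lr_{lr}",
            max_steps=3,
            per_device_train_batch_size=4,
            learning_rate=lr,  # the only value that changes between runs
            logging_steps=10,
            save_steps=3,
            evaluation_strategy="steps",
            eval_steps=2,
            use_mps_device=(device == "mps"),
            hub_model_id=finetune_name,
        )

        trainer = SFTTrainer(
            model=model,
            args=sft_config,
            train_dataset=ds["train"],
            tokenizer=tokenizer,
            eval_dataset=ds["test"],
        )

        trainer.train()
        trainer.save_model(f"./{finetune_name}-lr-{lr}")
        mlflow.pytorch.log_model(model, "fine_tuned_model")

One caveat: because the loop reuses the same model object, each run continues training from the previous run’s weights. If you want the runs to be independent, reload the base model at the top of the loop.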
After a few training runs with custom parameters, you’ll see something like this in the MLFlow dashboard:
Click on one of the runs, and you’ll see that MLFlow saved all the parameters along with plenty of internal details about the model.
To evaluate the trained model, head over to the MLFlow UI, click on the run you want to evaluate, and copy the run ID from the top left (it will look something like be1193d43a1a40c1bc84866b9462dddf). Go back to your notebook and change the Smol Course evaluation code to use MLFlow to retrieve and load the model:
# Paste the run ID you copied from the MLFlow UI
run_id = ""
model_uri = f"runs:/{run_id}/fine_tuned_model"

# Load the fine-tuned model from MLFlow and move it to the same device as the inputs
loaded_model = mlflow.pytorch.load_model(model_uri).to(device)

# Generate a response with the loaded model
inputs = tokenizer(formatted_prompt, return_tensors="pt").to(device)
outputs = loaded_model.generate(**inputs, max_new_tokens=300)
print("After training:")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Once loaded, the model generates a response for your prompt, allowing you to directly see the improvements from your fine-tuning. This process not only confirms that your experiments are correctly logged but also makes it easy to compare different runs and choose the best-performing model – all without manually searching through local files.
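If you would rather pick the best run programmatically than copy run IDs from the UI, MLFlow’s search API can do that too. A small sketch, assuming your runs logged an eval_loss metric (as in the optional metrics snippet above):

import mlflow

# Find the run with the lowest evaluation loss in the FineTuning experiment
runs = mlflow.search_runs(
    experiment_names=["FineTuning"],
    order_by=["metrics.eval_loss ASC"],
    max_results=1,
)

best_run_id = runs.loc[0, "run_id"]
best_model = mlflow.pytorch.load_model(f"runs:/{best_run_id}/fine_tuned_model")
print(f"Loaded best model from run {best_run_id}")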
Integrating MLFlow with Canonical’s Data Science Stack takes your experimentation to the next level. You no longer need to worry about manually keeping track of each run; the entire process is streamlined and automated. This means you can focus more on the creative aspects of model building and less on managing experimental details.
MLFlow is capable of much more than simply tracking your metrics and logging models. Some of the major capabilities MLFlow offers include:
- a Model Registry for versioning models and managing their lifecycle from staging to production
- MLFlow Projects for packaging your code so experiments can be reproduced on other machines
- built-in model evaluation utilities for comparing models against baselines
- model packaging and serving, so a logged model can be deployed behind a REST endpoint
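As a taste of the first of these, promoting a logged model into the Model Registry only takes a couple of lines. A sketch, assuming your MLFlow tracking server has a database-backed registry store and using a hypothetical model name:

import mlflow

run_id = ""  # paste the run ID of your best run here
result = mlflow.register_model(
    model_uri=f"runs:/{run_id}/fine_tuned_model",
    name="smol-finetuned",  # hypothetical registry name; pick your own
)
print(f"Registered version {result.version} of {result.name}")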
Ready to elevate your data science game? Give MLFlow in DSS a try and discover how effortless and powerful experiment tracking can be. Happy experimenting!
Learn more about Canonical’s Data Science Stack.
Watch our on-demand webinar to explore how to get your ML environment set up in 3 commands on Ubuntu.