MLflow is an open source platform for managing machine learning workflows. It was launched in 2018 and has grown in popularity ever since, reaching 10 million users in November 2022. AI enthusiasts and professionals had long struggled with experiment tracking, model management and code reproducibility, so MLflow addressed pressing problems in the market when it launched. MLflow is lightweight and can run on an average-priced machine, but it also integrates with more complex tools, which makes it suitable for running AI at scale.
Since MLflow was first released in June 2018, the community behind it has run a recurring survey to better understand user needs and ensure the roadmap addresses real-life challenges. About a year after the launch, MLflow 1.0 was released, introducing features such as improved metric visualisations, metric X coordinates, improved search functionality and HDFS support. It also offered API stability for the Python, Java, R, and REST APIs.
MLflow 2.0 landed in November 2022, when the product also celebrated 10 million users. This version incorporates extensive community feedback to simplify data science workflows and deliver innovative, first-class tools for MLOps. Features and improvements include extensions to MLflow Recipes (formerly MLflow Pipelines) such as AutoML, hyperparameter tuning, and classification support, as well as improved integrations with the ML ecosystem, a revamped MLflow Tracking UI, a refresh of core APIs across MLflow’s platform components, and much more.
In September 2023, Canonical released Charmed MLflow, a distribution of the upstream project.
MLflow is often considered the most popular ML platform. It enables users to perform different activities, including:
MLflow is an end-to-end platform to manage the machine learning lifecycle. It has four primary components:
MLflow Tracking records information about pipeline runs, such as metrics, hyperparameters, feature parameters, code versions, and other artifacts. The logs can later be used to visualise or compare results between experiments, users, or environments. They can be stored on a local system or on remote servers.
With MLflow Models, an ML model can be packaged in different formats, for example as a TensorFlow DAG or a Python function, with a descriptor file that defines the available formats. This enables the model to be used across a host of downstream tools and platforms, such as Docker or AWS SageMaker, which makes the model lifecycle easier to manage.
MLflow Projects offer a convention for packaging ML projects and reusable code. Fundamentally, a project is a directory with a descriptor file that defines its structure and dependencies. Additionally, when the MLflow API is used in a project, MLflow automatically records the parameters and project details.
The MLflow Model Registry is a centralised model store with a set of APIs and a UI. It aims to govern the end-to-end ML pipeline by tracking model lineage and versions.
MLflow is built around two key concepts: runs and experiments. A run is a single execution of model code, and an experiment groups related runs.
Both Kubeflow and MLflow are open source solutions designed for the machine learning landscape. They have received massive support from industry leaders, and are driven by thriving communities whose contributions are making a difference in the development of the projects. The main purpose of both Kubeflow and MLflow is to create a collaborative environment for data scientists and machine learning engineers, and to enable teams to develop and deploy machine learning models in a scalable, portable and reproducible manner.
However, comparing Kubeflow and MLflow is like comparing apples to oranges. From the very beginning, they were designed for different purposes. The projects evolved over time and now have overlapping features. But most importantly, they have different strengths. On the one hand, Kubeflow excels at machine learning workflow automation using pipelines, as well as model development. On the other hand, MLflow is great for experiment tracking and model registry. From a user perspective, MLflow requires fewer resources and is easier for beginners to deploy and use, whereas Kubeflow is a heavier solution, ideal for scaling up machine learning projects.
Charmed MLflow is Canonical’s distribution of the upstream project. It is part of Canonical’s growing MLOps portfolio. It has all the features of the upstream project, to which we add enterprise-grade capabilities such as: