This blog post explores the technical and strategic benefits of deploying open-source AI models on Ubuntu. We’ll highlight why it makes sense to use Ubuntu with open-source AI models, and outline the deployment process on Azure.
Authored by Gauthier Jolly, Software Engineer, CPC, and Jehudi Castro-Sierra, Public Cloud Alliance Director, both from Canonical.
Ubuntu Pro elevates the security and compliance aspects of deploying AI models, offering extended security maintenance, comprehensive patching, and automated compliance features that are vital for enterprise-grade applications. Its integration with Confidential VMs on Azure enhances the protection of sensitive data and model integrity, making it an indispensable tool for tasks requiring stringent security measures like ML training, inference, and confidential multi-party data analytics.
Using a public cloud like Azure gives straightforward access to powerful GPUs and Confidential Compute capabilities, essential for intensive AI tasks. These features significantly reduce the time and complexity involved in setting up and running AI models, without compromising on security and privacy. Although some may opt for on-prem deployment due to specific requirements, Azure’s scalable and secure environment offers a compelling argument for cloud-based deployments.
We are going to explore using open models on Azure by creating an instance with Ubuntu, installing NVIDIA drivers for GPU support, and setting up Ollama for running the models. The process is technical, involving CLI commands for creating the resource group, VM, and configuring NVIDIA drivers. Ollama, the chosen tool for running models like Mixtral, is best installed using Snap for a hassle-free experience, encapsulating dependencies and simplifying updates.
Begin by creating a resource group and then a VM with the Ubuntu image using the Azure CLI.
az group create --location westus --resource-group ml-workload

az vm create \
  --resource-group ml-workload \
  --name jammy \
  --image Ubuntu2204 \
  --generate-ssh-keys \
  --size Standard_NC4as_T4_v3 \
  --admin-username ubuntu \
  --license-type UBUNTU_PRO
Note the publicIpAddress from the output – you’ll need it to SSH into the VM.
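If you didn't save the address, one way to retrieve it later is with the Azure CLI, assuming the resource group and VM names used above:

```shell
# Fetch the VM's public IP; the publicIps field is only populated
# when --show-details (-d) is passed to az vm show
IP_ADDR=$(az vm show -d --resource-group ml-workload --name jammy \
  --query publicIps -o tsv)

ssh ubuntu@"${IP_ADDR}"
```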
For GPU capabilities, install NVIDIA drivers using Ubuntu’s package management system. Restart the system after installation.
sudo apt update -y
sudo apt full-upgrade -y
sudo apt install -y ubuntu-drivers-common
sudo ubuntu-drivers install
sudo systemctl reboot
Important: Standard NVIDIA drivers don’t support vGPUs (fractional GPUs). See instructions on the Azure site for installing GRID drivers, which might involve building an unsigned kernel module (which may be incompatible with Secure Boot).
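Before building an unsigned kernel module, it is worth checking whether Secure Boot is enabled on your VM. One way to do this is with mokutil (installed here for illustration; it may not be present by default on cloud images):

```shell
sudo apt install -y mokutil
# Prints "SecureBoot enabled" or "SecureBoot disabled"
mokutil --sb-state
```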
Snap simplifies the installation of Ollama and its dependencies, ensuring compatibility and streamlined updates. The --beta flag gives you access to the latest features and versions, which may still be under development.
sudo snap install --beta ollama
Configure Ollama to store models on the ephemeral disk (mounted at /mnt on Azure VMs):
sudo mkdir /mnt/models
sudo snap connect ollama:removable-media  # allow the snap to reach /mnt
sudo snap set ollama models=/mnt/models
At this point, you can run one of the open models available out of the box, like mixtral or llama2. If you have a fine-tuned version of these models (a process that involves further training on a specific dataset), you can run those as well.
ollama run mixtral
The first run might take a while to download the model.
Now you can use the model through the console interface:
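Besides the interactive console, Ollama also serves a local REST API (on port 11434 by default), which you can call from scripts. A minimal sketch, assuming the snap exposes Ollama's standard API on the default port:

```shell
# Ask the model a question via Ollama's HTTP API; "stream": false
# returns a single JSON object instead of a stream of chunks
curl http://localhost:11434/api/generate -d '{
  "model": "mixtral",
  "prompt": "Summarise the benefits of Ubuntu Pro in one sentence.",
  "stream": false
}'
```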
This step is optional: installing Open WebUI gives you a chat UI in your web browser.
sudo snap install --beta open-webui
To quickly access the UI without opening ports in the Azure network security group, you can create an SSH tunnel to your VM using the following command:
ssh -L 8080:localhost:8080 ubuntu@${IP_ADDR}
Go to http://localhost:8080 in your web browser on your local machine (the command above tunnels the traffic from your localhost to the instance on Azure).
If you want to make this service public, refer to the Azure documentation on opening ports in a network security group.
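As a sketch, one way to expose the port is with az vm open-port; the rule priority below is an arbitrary example, and note that this makes the service reachable from the internet, so configure authentication first:

```shell
# Add an inbound rule for port 8080 to the VM's network security group
az vm open-port --resource-group ml-workload --name jammy \
  --port 8080 --priority 1010
```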
To verify that the ollama process is using the GPU, monitor it with:

sudo watch -n2 nvidia-smi

You should see something like this:
+---------------------------------------------------------------------------+
| Processes:                                                                |
|  GPU   GI   CI        PID   Type   Process name               GPU Memory  |
|        ID   ID                                                Usage       |
|===========================================================================|
|    0   N/A  N/A      1063      C   /snap/ollama/13/bin/ollama     4882MiB |
+---------------------------------------------------------------------------+
Ubuntu’s open-source foundation and robust ecosystem make it a compelling choice for deploying open-source AI models. When combined with Azure’s GPU capabilities and Confidential Compute features, you gain a flexible, secure, and performant AI solution.