
A look into Ubuntu Core 24: Deploying AI models in FPGAs for production

Welcome to this blog series, which explores innovative uses of Ubuntu Core. Throughout the series, Canonical’s engineers will show what you can build with the Core 24 release, highlighting the features and tools available to you.

In this second blog, Talha Can Havadar, senior engineer from our ODM Partner team, will show you how to deploy optimised AI models in Field Programmable Gate Arrays (FPGAs). Deploying AI models for inference in FPGAs is particularly useful for developers who need both flexibility and hardware acceleration as their workloads change. Coupled with the Ubuntu Core architecture, this gives developers an end-to-end infrastructure for managing a secure, modular deployment.


By the end of this blog, you’ll know how to package, load and run AI models on FPGAs. 

AI inference in FPGAs 

If you don’t know much about FPGAs, resources like FPGAKey and AMD FPGA can help. The main takeaway is that FPGAs are reprogrammable chips: their logic blocks and interconnects can be reconfigured for a specific workload to accelerate the compute. If the workload, inputs, or sensors change, the processing can adapt accordingly.

Now, think of the architecture of a convolutional neural network. As you can see in the picture below, there are two main stages: feature extraction and classification. Once your model has been trained, inference is a repetitive task: convolution operations extract features and feed them to the tuned classification network. FPGAs can be programmed to accelerate these operations, with one configuration for the convolution operations and another for the classification network. The FPGA’s logic fabric takes on the architecture of your network, giving you compute dedicated to your model.

Convolutional Neural Network (Source, Kh. Nafizul Haque on LinkedIn)
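To make the repetitive part concrete, here is a minimal Python sketch of the multiply-accumulate loop at the heart of a convolution layer. It is only an illustration: the function name `conv2d_valid`, the toy image and the kernel are made up, and it computes the sliding-window product the way most deep learning frameworks do (without flipping the kernel).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # One output value is a sum of element-wise products
            # (a multiply-accumulate over the kernel window).
            acc = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    acc += image[row + i][col + j] * kernel[i][j]
            out_row.append(acc)
        output.append(out_row)
    return output


# Hypothetical 3x4 input and 2x2 kernel, chosen only to keep the numbers small.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[-4, -4, 2], [-4, -4, 4]]
```

Each output value depends only on its own window of the input, so the windows can be computed independently. That independence is what lets dedicated hardware process many of them in parallel.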

This customizable hardware configuration is why FPGAs are well suited to the parallel processing demands of AI algorithms. The result can be faster processing and lower latency than on general-purpose processors. FPGAs are also highly power efficient, which makes them ideal for applications where energy efficiency is crucial, such as embedded devices. Their reprogrammable nature provides flexibility too, enabling integration with a wide range of sensors, interfaces and workloads.

Deploying AI models in FPGAs with Ubuntu Core

For this blog, we will examine the AMD Kria KV260 with Ubuntu Core, using its dynamically configurable FPGA as an AI accelerator to implement a car number plate detection system as an example. If you want to try something different, there are a couple of other example applications in the KV260 apps docs that you can play with.

We will also use the NLP SmartVision application as an example. In this post, we will give you an alternative way to handle all of this in a more secure, easy-to-maintain Ubuntu Core fashion.

In Ubuntu Core, every element of the system is containerized using snaps. To create a demo application similar to the one listed in the AMD wiki, we need to start by identifying our needs in terms of tools, libraries and snap interfaces.

Since this is an AI application, we need a trained neural network model that we can use for the detection of car plates. For this blog, we will use Vitis-AI.

Vitis AI

Vitis-AI is an Integrated Development Environment that can be leveraged to accelerate AI inference on AMD adaptable platforms. It provides optimized IP, tools, libraries and models, as well as resources such as example designs. Below is an overview of how Vitis-AI is structured.

Vitis-AI development environment (Source, AMD)

With the help of the functionality provided by Vitis-AI, such as the AI Optimizer, AI Quantizer and AI Compiler, we will be able to use the full performance of the FPGA we have in our hands. AI optimizers and AI quantizers are already well known in the deep learning community and have proven their worth. They can reduce the complexity of neural networks by up to 50 times, and they have a very significant impact on speed and computing efficiency.
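As a rough illustration of what quantization means, the sketch below maps floating-point weights to 8-bit integers with a single scale factor. This is a generic symmetric scheme written for illustration, not the algorithm the Vitis-AI Quantizer implements; `quantize_int8`, `dequantize` and the sample weights are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, not taken from any real model.
weights = [0.4, -1.0, 0.1, 0.0]
q_weights, scale = quantize_int8(weights)
print(q_weights)  # [51, -127, 13, 0]
print(dequantize(q_weights, scale))  # close to the original weights
```

The integers need a quarter of the storage of 32-bit floats, and integer arithmetic is cheaper to build in hardware. The price is a small rounding error, at most half a quantization step per value in this scheme.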

In this post, though, I want to give special recognition to the AI Compiler in Vitis-AI, because this is where the magic happens. It maps the AI model to a highly efficient instruction set and data flow. It also performs sophisticated optimizations, such as layer fusion and instruction scheduling, and reuses on-chip memory as much as possible. This way, the Deep Learning Processor Unit (DPU) inside the FPGA is used effectively to process the data coming into the neural network.

From AI model to DPU instructions (Source, AMD)
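Layer fusion is easier to picture with a small example. A common textbook case, assumed here for illustration and not taken from the AI Compiler documentation, is folding a batch normalization step into the layer before it, so that two operations become one. The function names and numbers below are hypothetical, and the layer is reduced to a single scalar weight to keep the arithmetic visible.

```python
import math


def unfused(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Two separate steps: a linear layer, then batch normalization."""
    y = weight * x + bias
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Precompute one weight and one bias that give the same result."""
    factor = gamma / math.sqrt(var + eps)
    fused_weight = weight * factor
    fused_bias = (bias - mean) * factor + beta
    return fused_weight, fused_bias


# Hypothetical parameters for a single channel.
params = dict(weight=2.0, bias=0.5, gamma=1.5, beta=-0.25, mean=0.1, var=4.0)
fused_weight, fused_bias = fold_batchnorm(**params)

x = 3.0
# The fused layer needs one multiply and one add per input value.
fused_result = fused_weight * x + fused_bias
print(math.isclose(fused_result, unfused(x, **params)))  # True
```

The fused version gives the same output with fewer operations and no intermediate result to store, which is the general idea behind fusing layers before they reach the accelerator.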

Packaging your AI model in a snap

Now that you know your application will work with Vitis-AI, the next step is to package your model. In this demonstration, we are using one of the models provided in the Vitis-AI Model Zoo, specifically the same one used in the nlp-smartvision demo.


In the snap we will need to package: 

  • Vitis-AI library
  • Plate detection xmodel from the model zoo
  • DPU bitstream to load into the FPGA

Below you can see an example of how to declare these parts in the snap’s snapcraft.yaml file:

parts:
  # Builds the demo binaries from src and installs them under bin/.
  kv260-platedetect-demo:
    # See 'snapcraft plugins'
    plugin: nil
    source: src
    override-build: |
      ./build.sh
      cp test_*_platedetect $SNAPCRAFT_PART_INSTALL
    build-packages:
      - vitis-ai-library
      - libgoogle-glog-dev
      - libopencv-dev
      - libprotobuf-dev
      - libjson-c-dev
    stage-packages:
      - vitis-ai-runtime
      - vitis-ai-library
      - libblas3
    organize:
      test_*_platedetect: bin/

  # Fetches the plate detection xmodel archive and places its files under models/.
  platedetect-xmodel:
    plugin: dump
    source: https://www.xilinx.com/bin/public/openDownload?filename=plate_detect-kv260_DPUCZDX8G_ISA1_B3136-r2.5.0.tar.gz
    override-build: |
      cp plate_detect.* $SNAPCRAFT_PART_INSTALL
    organize:
      plate_detect.*: models/

  # Stages the firmware package that ships the DPU bitstream and moves it under firmware/.
  nlp-smartvision-bitstream:
    plugin: nil
    stage-packages:
      - xlnx-firmware-kv260-nlp-smartvision
    organize:
      lib/firmware/xilinx/kv260-nlp-smartvision: firmware/kv260-nlp-smartvision

Details of the complete YAML implementation can be found here.

This is all you need to package your model in a snap and benefit from everything snaps offer, including over-the-air updates, delta updates, automatic rollback on failure, strict security and tamper-proofing. Your model is ready to be deployed using the same infrastructure that millions of developers use today. To learn more about creating snaps, see our docs.

Loading the DPU firmware to the FPGA

Ok, but how do we load the DPU firmware into the FPGA? Thanks to the dfx-mgr tool provided by AMD, it is possible to load and unload bitstreams on the Kria FPGA at runtime. This tool allows users to manage the FPGA dynamically.

Of course, to use this tool in Ubuntu Core, it needs to be a snap. We have created a snap of this tool for demo purposes. Once you have the dfx-mgr snap installed on an Ubuntu Core system, all you need to do is load the DPU bitstream into the FPGA, using either the default-firmware option of dfx-mgr or the `-loadPackage` command with the bitstream name. For example:

dfx-mgr -loadPackage kv260-nlp-smartvision

After this point, the only thing we need to do is run the application we created. It is quite simple:

  • Install the snap: sudo snap install --dangerous kv260-platedetect-demo
  • Then load the bitstream into the FPGA as shown before: dfx-mgr -loadPackage kv260-nlp-smartvision
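If you want to script these two steps, a small wrapper could look like the sketch below. It only assembles the two commands quoted above and, by default, prints them instead of running them. The function names and the dry-run switch are hypothetical, and the sketch assumes a Python 3 interpreter is available wherever you run it.

```python
import shlex
import subprocess

# Names taken from the commands shown in this post.
SNAP_NAME = "kv260-platedetect-demo"
BITSTREAM = "kv260-nlp-smartvision"


def deployment_commands(snap=SNAP_NAME, bitstream=BITSTREAM):
    """Return the two commands from the steps above, in order."""
    return [
        ["sudo", "snap", "install", "--dangerous", snap],
        ["dfx-mgr", "-loadPackage", bitstream],
    ]


def deploy(dry_run=True):
    """Print the commands, or run them and stop at the first failure."""
    handled = []
    for command in deployment_commands():
        if dry_run:
            print("would run:", shlex.join(command))
        else:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(command, check=True)
        handled.append(command)
    return handled


if __name__ == "__main__":
    deploy(dry_run=True)
```

Keeping the dry run as the default lets you review the exact commands on a development machine before running them on the device.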

That’s it. 

You can even separate your model and your bitstream to have a more modular way to handle updates in your application. Let’s say you have improved the model and you want to update the devices in the field. In this case, Ubuntu Core and the Snap Store will help you deliver these updates reliably and securely.

Thanks to Ubuntu Core, you can focus on your development and not worry about the infrastructure for updating devices in the field or maintaining the security of third-party software.

What’s next?

With their capacity for both parallelism and determinism, FPGAs can effectively implement and modify sensor fusion algorithms, accelerate pre- and post-data processing, and ensure deterministic networking and motor control for real-time response. They can segregate safety-critical functions to guarantee fail-safe operations and facilitate hardware redundancies and fault resilience. Their value for deploying embedded AI devices is indisputable. 

Now it’s your turn. Why don’t you try packaging your optimized models using snaps? You will see for yourself the benefits of this infrastructure for managing and deploying software at the edge. With Ubuntu Core, you can build your production image with your snaps for your targeted hardware, which makes it easy to flash devices on production lines. Plus, in your image, you can define the user experience you want to deliver. From automatically running the applications that you want, to clearly defining what the final user should have access to, your model will always be secure in its sandbox.
