Data is at the heart of all machine learning (ML) initiatives – and bad actors know it. As AI continues to occupy the limelight of modern tech discourse, ML systems are becoming increasingly attractive targets for attack. With the Identity Theft Resource Center reporting a 72% spike in data breaches in 2023, it’s critical to take the proper precautions to ensure your ML projects don’t provide a back door to your data.
This blog gives an overview of machine learning security risks, highlighting the key threats and challenges. But it isn’t all doom and gloom; we’ll also explain best practices and explore possible solutions, including the role of open source.
Every technology is subject to security concerns, but the challenge is even greater with machine learning because of the shortage of specialised security talent and the novelty of AI applications entering production.
With so many moving parts and potential avenues of attack, machine learning projects are subject to a large number of security risks, and this number is only growing as more and more ML applications enter production. Here are the four most common threats you should be aware of.
Depending on the regulations of each industry and organisation, companies must ensure that the software they use does not contain any critical or high vulnerabilities. However, ML projects often depend on thousands of packages, so it’s easy for vulnerabilities to slip through the cracks. These vulnerabilities can appear at all layers of the stack, from the operating system to the applications, and they can become major security risks when exploited maliciously. An infamous example in the AI space is ShellTorch, a set of critical vulnerabilities in the TorchServe model-serving tool that exposed code used during development and allowed attackers to access the models.
To mitigate this risk, you should have a clear understanding of the packages that your ML projects use, as well as their dependencies. You should implement regular scans that report vulnerabilities and have a strategy for fixing them. This includes having regular updates and upgrades of the tools used, following the latest news and security updates and having a trusted advisor.
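A first step towards that clear understanding is an automated inventory of what is actually installed. The snippet below is a minimal sketch using only the Python standard library; the resulting list is what you would feed to a dedicated scanner such as pip-audit (the workflow here is illustrative, not a prescribed toolchain):

```python
# Minimal sketch: build an inventory of installed packages and versions,
# the starting point for any vulnerability scan. In practice you would
# feed this list to a scanner such as pip-audit or an SBOM generator.
from importlib import metadata

def dependency_inventory() -> dict[str, str]:
    """Return a mapping of installed package names to their versions."""
    return {
        dist.metadata["Name"]: dist.version
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip malformed metadata entries
    }

if __name__ == "__main__":
    for name, version in sorted(dependency_inventory().items()):
        print(f"{name}=={version}")
```

Running this regularly (for example in CI) gives you a reproducible snapshot of your dependency surface, so that new or changed packages never go unnoticed.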
Data poisoning happens when attackers tamper with the data used to train a model, altering outcomes to damage the system’s performance. Malicious data introduced into the training pipeline causes the model to learn something that is both inaccurate and unintended. For example, if your training dataset includes surveillance camera footage, attackers might stage the scene so that only red cars appear for a certain period of time, deliberately skewing what the model learns. Even a small amount of incorrectly labelled data can affect the model’s performance, making it unreliable for production.
With a clear understanding of how data can be influenced by possible attackers, you can implement measures to mitigate these risks. A continuous re-training pipeline ensures models always stay up to date, while drift monitoring for both the model and the data ensures that professionals are informed in a timely manner when a model’s accuracy degrades or the data distribution shifts.
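The drift-monitoring idea can be sketched in a few lines. The toy below flags a batch whose mean has moved several standard deviations away from the training baseline – one symptom a poisoned data stream can show. A real pipeline would use a dedicated monitoring tool and richer statistics; the numbers here are invented for illustration:

```python
# Minimal drift-monitoring sketch (illustrative, not a production tool):
# flag a feature as drifting when the mean of a new batch moves more than
# `threshold` baseline standard deviations away from the training mean.
from statistics import mean, stdev

def detect_drift(baseline: list[float], new_batch: list[float],
                 threshold: float = 3.0) -> bool:
    """True when new_batch's mean deviates from the baseline mean by
    more than `threshold` baseline standard deviations."""
    base_mean = mean(baseline)
    base_std = stdev(baseline)
    if base_std == 0:
        return mean(new_batch) != base_mean
    return abs(mean(new_batch) - base_mean) / base_std > threshold

# Example: a clean batch versus one shifted far from the training range
baseline = [10.0, 10.5, 9.8, 10.2, 9.9, 10.1]
clean_batch = [10.0, 10.3, 9.7]
poisoned_batch = [25.0, 26.1, 24.8]
print(detect_drift(baseline, clean_batch))     # expect False
print(detect_drift(baseline, poisoned_batch))  # expect True
```

Wiring a check like this into the ingestion step means suspicious batches can be quarantined before they ever reach a training set.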
Adversarial attacks are the most commonly used type of attack in the machine learning space. They involve feeding the model inputs crafted by the attacker to produce a specific, incorrect output. These attacks exploit the fact that the decision boundary a model learns – the surface separating different classes based on input features – rarely matches the true boundary exactly, so small, targeted perturbations can push an input across it. Such perturbations are often hard to detect, both for the human eye and for monitoring systems.
Adversarial attacks reduce model accuracy and can discourage professionals from running certain projects in production at all. Organisations should consider adversarial training and have a clear strategy for building ML projects and cleaning data. Not all data that is produced should land directly in a training set. In addition, not everyone should have access to all the models created within an organisation, or to capabilities such as experiment tracking, model stores and model performance trackers.
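To make the decision-boundary intuition concrete, here is a minimal FGSM-style sketch against a hand-written logistic classifier (the weights and epsilon are made up for illustration). Because the gradient of the loss with respect to the input points along the weight vector, a small nudge of each feature in that direction is enough to flip the prediction:

```python
# FGSM-style sketch on a hand-written logistic classifier (illustrative,
# no ML framework needed). For a score w.x + b, the loss gradient with
# respect to the input points along w, so shifting x by eps * sign(w)
# is the cheapest way to push it across the decision boundary.
import math

def predict(w: list[float], b: float, x: list[float]) -> float:
    """Probability of the positive class under logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_perturb(w: list[float], x: list[float], eps: float) -> list[float]:
    """Shift x against the model: subtract eps along sign(w), which
    lowers w.x and hence the predicted probability."""
    return [xi - eps * math.copysign(1.0, wi) for wi, xi in zip(w, x)]

w, b = [2.0, -1.0], 0.0
x = [1.0, 0.5]                       # clean input: w.x + b = 1.5 -> class 1
x_adv = fgsm_perturb(w, x, eps=0.9)  # perturbed input crosses the boundary

print(predict(w, b, x))      # well above 0.5
print(predict(w, b, x_adv))  # pushed below 0.5 by a small perturbation
```

Adversarial training counters exactly this: perturbed examples like `x_adv` are added back into the training set with their correct labels, so the learned boundary becomes harder to cross with small nudges.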
ML algorithms are built to predict or generate new data by looking only at existing information. Unlike individuals, companies have access to the personal data of millions of people. Whenever an ML system is given access to data, the data’s confidentiality is at risk because of the new workflows involved.
Organisations should build a clear privacy policy that all users read and agree to, and create ML systems that protect everyone. They should also be mindful of the data they gather and the processes involved in handling it. Best practices, such as removing identifiers and maintaining clear visibility of data flows, protect the privacy of both organisations and the people who interact with them.
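Removing identifiers can be as simple as replacing them with a keyed hash before data enters the ML pipeline. The sketch below uses HMAC-SHA256 to produce stable, non-reversible pseudonyms; the key value and the field names are illustrative assumptions, and a real deployment should treat this as one layer of a broader anonymisation strategy:

```python
# Minimal pseudonymisation sketch: replace direct identifiers with a
# keyed hash before data enters the ML pipeline. The secret key shown
# here is a placeholder (illustrative); in practice it would live in a
# secrets manager. A key prevents simple rainbow-table reversal.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical key

def pseudonymise(identifier: str) -> str:
    """Return a stable, non-reversible token for a direct identifier."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

def scrub_record(record: dict) -> dict:
    """Replace identifier fields with pseudonyms, keep the rest intact."""
    sensitive = {"email", "name", "phone"}  # illustrative field list
    return {
        k: pseudonymise(v) if k in sensitive else v
        for k, v in record.items()
    }

record = {"email": "alice@example.com", "age": 34, "purchases": 12}
print(scrub_record(record))
```

Because the pseudonym is stable, records belonging to the same person can still be joined for training, while the raw identifier never reaches the model or the people working on it.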
The four threats outlined above are just some of the machine learning security risks that projects face. They stand alongside the traditional software threats that are always present in any technology. As such, protecting your ML systems requires a specialised approach that considers the unique risks present in the AI space while also drawing on broader security best practices.
Open source is at the heart of machine learning development. The Linux Foundation’s AI & Data landscape shows the abundance of tools that data scientists, ML engineers and architects now have available to experiment with and to run their ML projects in production. It includes open source tools used to develop, deploy, monitor and store models. Many of them focus on security, which we will explore further.
Canonical has open source in its DNA. The company’s promise is to provide secure open source software across all layers of the stack, from the operating system to the cloud-native applications. We will further explore how our security tools and capabilities enhance ML systems.
Livepatch is a solution that periodically checks for kernel patches and applies them without rebooting the system. It enables organisations to keep their machines up to date with the latest kernel patches, reducing downtime and unplanned work. This is especially useful for long-running training jobs, which can continue uninterrupted without risking the outcome or causing project delays. Additionally, it enables organisations to build and follow their own update strategy by planning the patching time and rollout policy. Livepatch is available on Ubuntu as part of Ubuntu Pro.
Confidential computing has roots going back to the late 1970s, but the rise of AI has accelerated its adoption. Using innovative technology at the silicon level, it protects the confidentiality and integrity of sensitive data hosted on-prem or on a public cloud. Highly regulated industries such as healthcare and financial services often adopt it. Ubuntu is at the heart of confidential computing, already available in confidential VMs on Microsoft Azure and on Intel TDX-enabled hardware. Learn more about confidential computing.
New common vulnerabilities and exposures (CVEs) emerge daily and need timely patching. Ubuntu Pro helps teams address this need by covering security fixes for over 30,000 packages as part of the subscription. This includes machine learning tools such as Pandas, Python, NumPy, TensorFlow and PyTorch, enabling professionals to develop models securely. Read more about how to secure your MLOps tooling.
Machine learning operations (MLOps) platforms such as Charmed Kubeflow enable organisations to run AI at scale. They ensure that ML systems have features such as authentication and network isolation to better control and protect data and ML models. They are a foundational piece for running the entire ML lifecycle within one tool, reducing the number of security gaps that could appear throughout the ML pipeline.
Snaps are a secure and scalable way to embed applications on Linux devices. They can also be used for ML models that are packaged and deployed to edge devices. Snaps simplify maintenance and enable models to benefit from over-the-air (OTA) updates and automatic rollback in case of failure. Brand stores can also help you manage multiple models. Learn more about AI at the edge with open source.
ML systems are compelling targets for malicious actors, but that fact shouldn’t hold you back from innovating with AI. By developing a strong understanding of the threat landscape, implementing best practices, and taking advantage of open source solutions, you can protect your models and data and enjoy the benefits of AI/ML without putting your organisation at risk.