
Solving newcomer data science challenges with Canonical’s Data Science Stack – now in beta

Data science is one of the most exciting topics of the last century. With its utility in industries of all kinds, it’s easy to see why it has been rated as one of the top 20 fastest-growing occupations in the US, according to the Bureau of Labor Statistics. However, entering this fast-growing space isn’t easy: newcomers face significant challenges in setting up their environments, dealing with package dependencies, and accessing compute resources. Given these obstacles, it’s easy to see why a talent shortage persists in the data science field, and why overcoming these challenges is vital for teams and companies.

Is it easy to get started on data science?

Data science is a rewarding career, but getting started can be challenging. Here are the most common obstacles that new data scientists face:

Time spent on tooling: Data scientists often spend more time configuring and fixing their tools than building models. Between tool selection, integration and package dependencies, people working in this field constantly need to make sure the system still works. An out-of-the-box solution seems the most obvious answer; however, tools that are seamlessly integrated and can be deployed within minutes are also a viable option.
Configurations: Whether it’s GPU configuration or managing package dependencies, data scientists need to complete tedious tasks before they can get started. A 2023 report from Anaconda found that approximately a quarter of commercial data scientists reported being blocked by package dependency management or access to compute resources.
Learning curve: Something new comes up every other day in this field, and it often feels overwhelming for newcomers, who are under pressure to upskill quickly in many different areas at once, from programming to development tool maintenance. Data scientists are constantly upskilling through a number of channels, often by themselves: according to the latest Stack Overflow Developer Survey, a majority of developers upskill using online courses, blogs and technical documentation. This shows that data scientists need time and space to focus on the skills they are trying to acquire, rather than on preparing the environment to start learning.
Initial cost: Data science can be costly, and newcomers would like to keep their initial investment low before they commit to it as a long-term career path. Open-source tooling is a great way to save on set-up costs: it enables future data scientists and ML engineers to get started at no cost and to build on projects that are already available.
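
To make the configuration obstacle concrete, here is a minimal sketch of the kind of check a newcomer might run before starting work. It uses only the Python standard library. The list of packages is a hypothetical example, and finding nvidia-smi on the PATH is only a rough hint that a GPU driver is installed, not proof that the GPU is usable.

```python
import shutil
from importlib import metadata

# Hypothetical list of libraries a newcomer might want; adjust to your project.
WANTED = ["numpy", "pandas", "torch", "tensorflow"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def gpu_tool_available():
    """Return True if the nvidia-smi command is on PATH (a rough GPU-driver hint)."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    for name, version in installed_versions(WANTED).items():
        print(f"{name}: {version or 'not installed'}")
    print("nvidia-smi found:", gpu_tool_available())
```

A check like this only reports what is missing; resolving version conflicts between the libraries is the tedious part that an integrated environment aims to take away.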


As you can see, new data scientists typically face a rough start. However, the good news is that once they are on track, it gets easier and easier every day.

How to choose a Data Science platform

As I mentioned before, it seems that a new tool, framework or library for data science or machine learning is launched every other day. This can be overwhelming. How do you actually choose from this wide variety of options?

Before we get into the weeds of tools, let’s take a moment to look at the main capabilities and key considerations that a data science platform should have:

Exploratory data analysis: Being able to perform initial exploratory data analysis is crucial, especially for people looking to use a data science tool on a workstation. It enables them to focus on the initial stages of the machine learning lifecycle: understanding the data set, producing some data visualisations and doing initial data preprocessing.
Machine learning lifecycle: The main purpose of any professional or enthusiast who is active in this space is to build models. Therefore, they need tools that cover multiple parts of the machine learning lifecycle, enabling them to build and store models, and to track and reproduce experiments. Covering these early stages of the lifecycle makes model development easier.
Popular tools: For any beginner, the scale of adoption of their chosen tools can make or break them. When a tool is used by more people, its bugs, challenges and workarounds are typically better known and better documented. In the open source space, the community provides extensive support and guidance, enabling professionals from different areas to benefit from continuous improvements, fixes and workarounds for popular tools and platforms.
Ease of use: Everyone wants tools that are easy to use. The main objective of a data scientist is not endless tinkering with tools, so an intuitive platform that accelerates project delivery and flattens the learning curve is vital for their work.
Scalability: While many AI projects start small, every data scientist should also have a long-term vision and consider how well a platform scales. This helps data scientists to grow as the project matures, without needing to upskill in other tools.
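
As an illustration of the exploratory data analysis step described above, here is a minimal sketch of a first pass over a data set: counting missing values and computing basic statistics per column. The data is made up and inlined so that the example is self-contained, and it sticks to the Python standard library. In practice, a library such as Pandas covers the same ground in a call or two.

```python
import csv
import io
import statistics

# Tiny made-up data set, inlined so the example is self-contained.
SAMPLE_CSV = """height_cm,weight_kg
170,65
182,80
,72
165,58
"""


def summarise(csv_text):
    """Return count, missing count, mean, min and max for each numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        # Empty cells are treated as missing values and left out of the statistics.
        values = [float(r[column]) for r in rows if r[column] != ""]
        summary[column] = {
            "count": len(values),
            "missing": len(rows) - len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


if __name__ == "__main__":
    for column, stats in summarise(SAMPLE_CSV).items():
        print(column, stats)
```

Even a summary this small surfaces the first preprocessing decision: one height value is missing, so the row has to be dropped or the gap filled before a model is trained.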

Join our webinar to learn more about data science tools

Now that we have an idea of what to look for in a data science tool or platform, let’s take a closer look at the popular options that data scientists use.

Returning to the preference for open source, we should look at the entire stack and how open-source tooling can accelerate the whole process. Linux pioneered the open source space, and Ubuntu is among its most widely adopted distributions. It has a powerful command line that data scientists and machine learning engineers enjoy using, and it simplifies their operational tasks. Open source has a lot more to offer anyone starting out in data science. Python is a great example: it’s the preferred programming language in data science, and many of its libraries, such as Pandas, NumPy, PyTorch and TensorFlow, have been widely adopted in countless data science projects.

What is Data Science Stack (DSS)?


Data Science Stack (DSS) is an out-of-the-box solution for data scientists and machine learning engineers, published by Canonical. It is a ready-made environment for ML enthusiasts that enables them to develop and optimise models without spending time on the necessary tooling. It is designed to run on any Ubuntu AI workstation, maximising the GPU’s capability and simplifying its usage. Are you curious?

Try it now

DSS includes leading open source tools, such as Jupyter Notebook and MLflow, fully integrated with each other. By default, it ships with two of the most widely adopted ML images, PyTorch and TensorFlow. They can be deployed using an intuitive command line interface (CLI), after which the tools’ UIs are available to dive into data science.

Beyond giving access to an ML solution, DSS also takes care of package dependencies, ensuring that all the tools, libraries and frameworks work seamlessly together and are compatible with the machine’s hardware. In addition, DSS simplifies GPU configuration by including the GPU operator and all the benefits that come with it.

Try Canonical’s Data Science Stack

DSS is available in beta, and we invite data scientists, machine learning engineers and AI enthusiasts to share their feedback with us. You can easily deploy it on your Ubuntu machine, tell us about your experience, and benefit from this ongoing community feedback.

Try it now

Share your feedback

Join our webinar

If you would like to learn more about data science tools, join our webinar on [date]. Together with Michal Hucko, we will talk about:

Key considerations when selecting data science tools
Challenges of the data science landscape
Data science with open source tooling
Demo of the DSS

Join our webinar to learn more about data science tools
