Large language models (LLMs) are machine-learning models that specialise in understanding natural language. They rose to fame once ChatGPT was adopted around the world, but their applications go well beyond chatbots: LLMs are well suited to tasks such as translation and content summarisation. This blog will explain large language models (LLMs), including their benefits, challenges, famous projects and what the future holds.
Large language models (LLMs) are machine-learning models that build on the latest advances in deep learning. They perform a range of language-related tasks that go beyond text generation. They are trained on very large unstructured datasets to learn patterns and identify relationships in text. Conditioned on a prompt, they can then carry out useful tasks in natural language or code.
Language models can vary in complexity. Usually, LLM refers to models that use deep learning techniques to capture complex patterns and produce text. They have a large number of parameters and are usually trained using self-supervised learning. Behind the scenes, a large language model is typically a large transformer model, often too massive to run on a single machine, which is why LLMs are usually provided as an API or web interface.
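To make the self-supervised objective concrete, here is a minimal toy sketch: the model learns to predict the next token from raw, unlabelled text, with the "labels" coming from the text itself. Real LLMs use transformer networks with billions of parameters; this bigram counter is only an illustration of the training signal, and the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Self-supervised learning: the next token in the raw text serves as
# the training label, so no human annotation is needed.
corpus = "the cat sat on the mat the cat ate"
tokens = corpus.split()

counts = defaultdict(Counter)
for current, following in zip(tokens, tokens[1:]):
    counts[current][following] += 1  # count observed (token, next-token) pairs

def next_token_probs(token):
    """Estimate P(next token | token) from the corpus counts."""
    total = sum(counts[token].values())
    return {t: c / total for t, c in counts[token].items()}

# After "the", the corpus makes "cat" twice as likely as "mat".
print(next_token_probs("the"))
```

An actual LLM optimises the same kind of conditional probability, but over long contexts rather than a single preceding token, using gradient descent on a neural network instead of counting.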
There are multiple use cases for LLMs. These include not only plain text generation but also translation, conversational interaction and summarisation. Organisations use them to solve various problems, including:
Depending on the application, LLMs are used for content generation, whether triggered by a user prompt or not. While the output usually needs refining, LLMs generate great first drafts, which are ideal for brainstorming, answering questions or finding inspiration. They should not be treated as fact books that hold the source of truth.
LLMs are widely used to power chatbots, helping with customer support, troubleshooting or even open-ended conversations. They also accelerate the process of gathering information to address recurring issues or questions.
Machine translation was the main driver that kickstarted language-modelling efforts in the 1950s. These days, LLMs enable content localisation by automatically translating content into various languages. While they are expected to work well, it is worth noting that output quality depends on the volume of training data available in each language.
LLMs can analyse the emotions and opinions expressed in text in order to gauge sentiment. Organisations often use this capability to gather data, summarise feedback and quickly identify improvement opportunities. It helps enterprises both improve customer satisfaction and identify development and feature needs.
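In production, this is usually done by prompting an LLM (for example, "Classify the sentiment of this review as positive, negative or neutral"). As a stand-in that runs without any model, the toy lexicon scorer below only illustrates the shape of the task, text in, label out; the word lists and function name are hypothetical, and a real LLM handles negation, sarcasm and context far better.

```python
# Toy sentiment classifier illustrating the task LLMs are applied to.
# An LLM would be prompted with the text instead of using word lists.
POSITIVE = {"great", "love", "excellent", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "bad", "confusing", "crash"}

def classify_sentiment(text: str) -> str:
    """Label text by counting positive vs negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("Great product, support was fast and helpful"))
print(classify_sentiment("The app is slow and confusing"))
```

Aggregating these labels over a feedback stream is what lets organisations summarise customer opinion at scale.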
These are just some of the use cases that benefit from LLMs. Some other applications include text clustering, content summarisation or code generation.
LLMs seem to be a complex yet innovative solution that helps enterprises and gets AI enthusiasts excited. But building LLMs comes with a set of challenges:
As the adoption of AI grows across the board and more LLMs are built, it is worth reiterating the benefits that large language models bring. Thanks to their ability to reproduce human language, LLMs appeal to a wide audience: companies from various industries, engineers who are passionate about deep learning, and professionals who work across different fields.
2023 saw the emergence of open source LLMs backed by thriving communities. Hugging Face is just one example whose activity intensified after the release of ChatGPT, with the goal of bringing instruction-following large language models to different applications. This led to an explosion of open source LLMs such as Guanaco, h2oGPT or OpenAssistant. When it comes to open source LLMs, it’s important to bear the following in mind:
Out-of-the-box solutions will remain attractive to enterprises, but in the long term, open source communities are likely to expand their efforts to make LLMs available in new environments, including laptops. This could also lead to an unprecedented collaboration between organisations that have proprietary LLMs and open source communities, where the former focus on building the models (since they have the computing power) and the latter work on fine-tuning them.
Large language models require large volumes of data and performant hardware. They also need tooling for experiment tracking, data cleaning and pipeline automation. Open source ML platforms, such as Canonical’s Charmed Kubeflow, are great options since they enable developers to run the end-to-end machine-learning lifecycle within one tool. Charmed Kubeflow enables professionals to start on a public cloud, either by using an appliance or by following the guide on EKS. It has been tested and certified on performant hardware such as NVIDIA DGX. Canonical’s portfolio also includes Charmed MLflow and an observability stack.