Note: This post is co-authored by Jon Seager, an engineering director at Canonical responsible for Juju, the Charmed Operator Framework, and a number of charmed operator development teams working across software domains including observability, data platforms, MLOps, identity, and more.
Juju re-imagines the world of operating software securely, reliably, and at scale, realizing the promise of model-driven operations. Excellent observability is undeniably a key ingredient for operating software well, which is why the Charmed Operator ecosystem has long provided operators with the ability to run a variety of open source monitoring software. We collectively refer to these operators as the Logs, Metrics, and Alerts (LMA) stack.
With the advent of cloud native software and microservices, and the resulting increase in the complexity of systems, we decided it was time to create the next generation of LMA, running on Kubernetes. It needed to be capable of monitoring workloads running on Kubernetes, virtual machines, bare metal, or at the edge. Going back to the drawing board, we also reassessed which components would be part of this new cloud native LMA. The resulting design is composed of open source projects led by, or very heavily contributed to by, Grafana Labs. Let us tell you why.
Juju takes a fundamentally different approach to building and operating infrastructure. Rather than a focus on the low-level components, such as virtual machines, subnets, or containers, Juju focuses the user on operating applications while it takes care of the rest.
At the heart of any Juju deployment lies a controller, which is aware of a set of underlying clouds and the state of the models and applications it’s managing. Models in Juju are essentially workspaces for groups of related applications, and applications are known as Charmed Operators.
A Charmed Operator comprises code that understands how to install, configure, integrate, and generally operate a piece of software. It uses an agent, provisioned automatically when you deploy an application, to communicate with the Juju controller. Juju’s advantage is that Charmed Operators function the same whether your underlying cloud is AWS, Azure, OpenStack, VMware, Kubernetes, or something else. You’re always just one juju deploy foo away from competently operated software, and the user experience and tooling are consistent across clouds, enabling seamless integration between them.
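To make that concrete, here is a minimal sketch of the workflow; the cloud and charm names are illustrative, and the exact commands can vary slightly between Juju versions:

    # Create a controller on a cloud Juju already knows about (MicroK8s in this example).
    juju bootstrap microk8s
    # A model is a workspace for a group of related applications.
    juju add-model demo
    # Deploy a Charmed Operator straight from Charmhub.
    juju deploy grafana-k8s
    # The controller tracks and reports the state of the model and its applications.
    juju status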
There is an ever-growing collection of Charmed Operators available on Charmhub. Go have a look!
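If you have a recent Juju client, you can also browse Charmhub without leaving the terminal; a quick sketch (flags and output vary by client version):

    # Search Charmhub for charms matching a keyword.
    juju find grafana
    # Show the details of a specific charm.
    juju info grafana-k8s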
The new LMA stack, which we call “LMA2,” has the following design goals:
The design goals above are ambitious, and we are delighted with how the results of our efforts are shaping up! Our success is thanks in no small part to the amazing quality of the Prometheus, Grafana, and Grafana Loki projects, and to how well their philosophy aligns with the design goals for LMA2, so let’s dive deeper into that.
At Canonical, open source is part of our DNA. Juju and the entire Charmed Operator ecosystem are open source. We see eye-to-eye on this with the Grafana folks, who are excellent open source citizens, very responsive to the community, and overall a pleasure to work with.
But, of course, not only must the people be nice, the software must be nice, too! The open source projects led and contributed to by Grafana Labs don’t disappoint! Put very succinctly, the reasons why we decided to build LMA2 by composing Grafana projects are the following (not necessarily in order of importance!):
To expand on the above: Prometheus, Alertmanager, and Grafana were staples of the previous iteration of LMA, and relying on them again was simply a no-brainer. Familiarity for end users, quality, ease of use, consistency in design philosophy, resource efficiency: it’s all there!
Loki, the “Prometheus for logs,” has displaced Graylog as the log analytics component of choice in LMA2. We ran a detailed evaluation covering, among other aspects, ease of operations, ease of integration with the other LMA2 components, sparsity of dependencies, and scalability. We were really excited to see that Loki 2.0 did away with a dedicated index store, further reducing the footprint of LMA2. Moreover, Loki uses object storage rather than complex databases, which is amazing for reliability and ease of operations. In terms of consistent user experience, Loki is very well integrated with Grafana, and LogQL feels very familiar to Prometheus users.
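As a small illustration of that familiarity, here is a hypothetical query pair using logcli, Loki's command-line client; the job label and its value are made up for the example. The first command returns log lines containing "error" for one job, and the second turns them into a per-job error rate using the same rate, sum, and by syntax a Prometheus user would expect:

    # Fetch log lines containing "error" for one job; the label selector mirrors PromQL.
    logcli query '{job="my-app"} |= "error"'
    # Aggregate those lines into a per-job error rate, PromQL-style.
    logcli query 'sum(rate({job="my-app"} |= "error" [5m])) by (job)'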
Grafana Agent also deserves a loving mention. Monitoring benefits greatly from the network effect: the more you monitor together, the easier it is to find correlations and root causes. Easy telemetry collection helps to achieve this network effect: a single agent capable of collecting and forwarding metrics, logs, and distributed traces goes a long way toward reducing the cost and complexity of rolling out monitoring to many systems. We like the Grafana Agent so much that we will build the self-monitoring capabilities of LMA2 with it! We foresee two modes for self-monitoring: one in which the data goes to a remote LMA2 stack (think of it like Prometheus federation, but for the whole stack), and one in which the stack monitors itself, to be used for self-contained deployments.
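To give a flavor of what a single agent covering multiple telemetry types can look like, below is an abbreviated, hypothetical Grafana Agent configuration. This is not LMA2's actual configuration: the endpoints are placeholders and the exact key names vary between Agent versions. The point is that one process scrapes metrics and tails log files, forwarding the former to a Prometheus-compatible remote-write endpoint and the latter to Loki.

    # Metrics: scrape local targets and forward them via remote_write (placeholder URL).
    metrics:
      wal_directory: /tmp/agent/wal
      configs:
        - name: default
          scrape_configs:
            - job_name: node
              static_configs:
                - targets: ['localhost:9100']
          remote_write:
            - url: http://prometheus.example:9090/api/v1/write
    # Logs: tail local files and push them to a Loki endpoint (placeholder URL).
    logs:
      configs:
        - name: default
          positions:
            filename: /tmp/agent/positions.yaml
          clients:
            - url: http://loki.example:3100/loki/api/v1/push
          scrape_configs:
            - job_name: varlogs
              static_configs:
                - targets: [localhost]
                  labels:
                    job: varlogs
                    __path__: /var/log/*.log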
The declarative nature of Juju lends itself really well to graphical representations.
The diagram above represents the deployment of LMA2 in a Juju model and how the various Charmed Operators interact with one another. (Note that not everything in the diagram is implemented just yet.)
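For a concrete sense of what assembling such a model looks like from the command line, here is a rough sketch using charm names as published on Charmhub; depending on charm versions, the relation endpoints may need to be named explicitly:

    # Create a model for the observability stack on a MicroK8s-backed controller.
    juju add-model lma
    # Deploy the core LMA2 components as Charmed Operators.
    juju deploy prometheus-k8s
    juju deploy loki-k8s
    juju deploy grafana-k8s
    juju deploy alertmanager-k8s
    # Relate the operators so they can discover and configure one another.
    juju relate prometheus-k8s grafana-k8s
    juju relate loki-k8s grafana-k8s
    juju relate alertmanager-k8s prometheus-k8s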
As mentioned before, self-monitoring of the stack via the Grafana Agent is something we look forward to implementing soon. Similarly, we are working on charming an Ingress controller (charming is what the Juju community calls the process of creating a Charmed Operator). For resilience reasons, we want LMA2 to be hosted in a dedicated MicroK8s cluster and to share as little infrastructure as possible with the workloads it monitors, which means several endpoints used to monitor, or to receive data from, the outside world must be exposed beyond that cluster. The endpoints that will need to be exposed outside of the MicroK8s cluster are:
(Note: We currently do not plan to implement support for Prometheus federation or remote_read, but if the use-case arises…)
With regard to telemetry types, the work on the new cloud native LMA2 is currently focused on metrics, logs, and alerts. But there is more to observability than that, especially in cloud native environments. On the "What is observability?" page we go through a number of other telemetry types, like distributed tracing or end user and synthetic monitoring. The Grafana ecosystem has projects, such as Grafana Tempo and k6, that fit those bills, which makes us confident that, as LMA2 grows and becomes more capable, it will be able to leverage projects with the same consistent philosophy and quality it benefits from today with Prometheus, Grafana, and Loki.
Another direction we want to pursue with LMA2 is high scalability and availability. We are looking with great interest at Cortex as a way of bringing multi-tenancy and resilience to the Prometheus experience, and Loki already has many of the capabilities we need in that dimension.
Join us on the Charmhub Mattermost in making model-driven observability a first-class experience, have a look at the LMA2 documentation, or jump right in and take LMA Light for a spin!