In the first part of this blog journey (I’d call it a post, but it’s actually two posts) we explored what operational excellence looks like in public cloud deployments. And while I do not want to spoil it for you, the main takeaway was that it is not easy and can become resource-intensive. With this in mind, you might be wondering what you can do to achieve excellence without focusing all your resources on operations. You may be asking yourself questions like “Am I still able to innovate?” or “Do I have enough resources to cover all of this?” Worry not, friend, for I am here to guide you through this maze.
To cover your environment’s operations, you’ve got two options: you can do it yourself, or you can partner with someone who can do it for you. Regardless of your choice, the operations are the same. But depending on where you are in your journey, and what your main business scope is, some options could be more advantageous than others. Let’s look at what each choice would mean for your business.
I’ve mentioned above that the operations themselves are similar regardless of your choice. On top of that, even the requirements to achieve the operational excellence I’ve described in Blog 1 are the same, whether you choose to try your hand at self-management or opt for a managed service. And while it would be impossible to map out exactly what you need in order to operate your public cloud clusters at full efficiency, there are a few key requirements without which no public cloud deployment could operate. These are:
Perhaps the most essential part of any project is people. This is especially true in the case of public cloud deployments. Unfortunately, in this scenario, the requirement is not for any type of person, but for seasoned software engineers. It is difficult to estimate precisely how many members a team requires, but an industry rule of thumb is that one engineer should usually focus on operating fewer than 100 nodes or clusters.
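To make that rule of thumb concrete, here is a minimal sketch of the arithmetic in Python. The function name `min_engineers` and the default ceiling of 99 nodes per engineer (my reading of "fewer than 100") are assumptions for illustration only. Treat the result as a rough lower bound, since it ignores on-call rotations, holidays, and seniority mix.

```python
import math


def min_engineers(nodes: int, max_nodes_per_engineer: int = 99) -> int:
    """Rough lower bound on operations headcount for a given node count.

    The default of 99 is an assumption derived from the "fewer than 100
    nodes per engineer" rule of thumb; adjust it to your own environment.
    """
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    if max_nodes_per_engineer < 1:
        raise ValueError("max_nodes_per_engineer must be at least 1")
    # Round up: a partially loaded engineer is still a whole hire.
    return math.ceil(nodes / max_nodes_per_engineer)


if __name__ == "__main__":
    for fleet_size in (50, 100, 250):
        print(fleet_size, "nodes ->", min_engineers(fleet_size), "engineer(s)")
```

Lowering `max_nodes_per_engineer` is a quick way to see how a more complex or more volatile environment pushes the headcount up.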
Attracting and retaining people in a team can be challenging. It is no secret that as an industry, we are currently battling a major software engineering workforce shortage, and so the market is quite fierce. Engineers tend to be motivated by purpose, complexity, scope, and, of course, competitive salaries, and it is all well deserved. Good engineers can make the difference between success and failure for a project, and therefore for entire businesses. In short, without people, there are no operations. Without good people, there are no good operations.
Automation plays a key role in management, and operations are no exception. However, when it comes to monitoring, alerting, and incident recovery, there is only so much automation one can do. For various reasons, a big part of operations will always remain manual, requiring large amounts of time from engineers. Ideally, within a team, the senior members will focus on innovation and troubleshooting the most difficult incidents, while the junior and middle members will cover lighter operational tasks. This, however, changes with availability, which makes time a scarce yet essential resource for achieving operational excellence.
Let’s suppose we’ve got a team of people, and they have enough time to cover operations. And so we send them to work, and they come back screaming – they realise how dynamic and volatile public cloud operations can be, and how much they need to focus on not only building but also maintaining their operational skills. Knowledge is to operations what natural pearls are to jewellery: scarce, incredibly valuable, hard to obtain, and harder to maintain. Training an engineering team must be a continuous and evergreen process, in which members are constantly learning new and innovative ways to perform their tasks, whilst also continuously challenging themselves to improve and grow their seniority. Only by staying up to date with market trends and requirements can a team ensure that an environment operates at full efficiency and produces reliable and accurate results. Knowledge must be one of the key values of any operational team – I’d even dare to say any team at all.
I’m sure you’ll agree that the three items above are rather intuitive. You’ll say “I knew that – anyone with half a brain knows that you need skilled people who have time to do what you want them to do”. And I will agree with you. However, I’ve chosen to list them so clearly because they lead directly to the next point: all of the above require significant pecuniary investments from the company that undertakes the creation and formation of an operational team.
Hiring and maintaining talent costs a lot of money. Replacing talent costs even more. Training and re-training a team adds further costs, and the time spent on operations will always incur an opportunity cost (which, in the case of senior engineers, is often dizzyingly high). What’s more, these costs are often unpredictable. It is difficult to be sure when the required headcount will be met, what training each engineer will need, or how scarce engineering time will be divided during periods of high uncertainty or innovation. Therefore, strong financial forecasting is necessary to ensure the smooth development of operations.
Now let’s look at what options you’ve got for applying those resources towards the operation of a public cloud app deployment. Taking matters into your own hands is always admirable because it is an act of courage. Doing it all yourself means that you assume full responsibility for whatever happens. This option essentially entails building a team of operational experts that will manage your public cloud application environment from its conception until its decommissioning. As mentioned above, you will need to nurture and grow this team and ensure that everyone is well-trained and well acquainted with the fast-moving dynamics of the public cloud app ecosystem.
Succeeding at the DIY approach is one thing. However, there are several factors that will make you more likely to achieve excellence and truly thrive. The scenario in which we would actually recommend that you take the DIY approach is if you are a tech-first company (meaning, your main scope is related to software or hardware). Being tech-first would make you an attractive workplace for top engineering professionals from an ideological perspective, more than a remunerative one, and it would also empower you to use the energy of all your teams to maintain technical correctness. In addition to that, top professionals tend to always gravitate towards intellectual challenges and autonomy for innovation.
Worry not if you are not tech-first, though. You can still try your hand at managing your public cloud clusters all by yourself, or with the help of your in-house IT team. Let’s look at the advantages and disadvantages of the self-managed route.
Opting for a managed service provider (MSP) is rather self-explanatory in this context. Instead of taking care of your operations by yourself, you choose a trusted company and hire their services at a fixed cost. In return, they operate your environments to your specifications, for as long as you need them to.
During my tenure as product manager for managed services, I have been constantly baffled by the occasional stigma associated with selecting a managed service for open source ecosystems. There seems to be a fear that this choice is an indirect declaration of lack of skill. If this is a worry you have, allow me to dispel it for you: it is a very strong declaration of the opposite. Choosing a managed service liberates your engineering resources to focus on innovation, which actually honours their skills and training. The truth is that operations, despite their essential indirect presence in an innovative project, have a very low direct contribution to innovation itself. Keeping the lights on won’t help you create the next big thing – but you certainly won’t be able to do it at all in the dark. So, if you’ve got a talented engineer – and trust me, I know, they’re scarce – then I find that allowing them to build directly towards your innovation and competitive edge is a gesture of grace, respect, and corporate maturity.
But managed services are certainly not for everyone. Like with self-management, there are upsides and downsides:
If you’re thinking of choosing an MSP, it is important to consider a couple of variables. Are they large enough to cover your environment? Do they have enough experience? Are they flexible enough? Do they tie you down? All of these questions and more would require answers before you opt for an MSP. I explore this in more detail, with a high focus on the current and highly interesting topic of AI, in my whitepaper, An Executive Guide to Managed AI Infrastructure. I’ll also go over these questions in a future post.
Let’s tell it like it is: I’m likely biased. My entire career revolves around helping enterprises like yours achieve operational excellence through managed services. Are managed services a fully assured way to achieve your business goals? No. But can they get you closer? Absolutely. That does not mean you can’t achieve the same success without opting for a managed service, but it can mean that it would take more money, more time, and overall more effort. That is why I strongly believe in the power of operational management.
If you’re interested in the services Canonical offers in the field, check out our webpage: https://ubuntu.com/managed
Until next time, stay well!
Adrian