In this series, we’ve been building up an Apache Spark cluster on Kubernetes using MicroK8s, Ubuntu Core OS and LXD.
But we’re not finished quite yet. The setup we’ve built so far is vertically scalable. But we want to be able to scale horizontally and add more Ubuntu Core instances to a cluster in order to handle big data workloads with Spark. So in this final Part 4 of the series, we’ll get to that using LXD clustering and fan networking.
Fan networking builds a virtual, overlay software-defined network on top of the real, underlay network, so that VMs and Linux Containers can communicate with one another across hosts seamlessly. LXD’s fan networking implementation maps a /8 overlay network onto up to a /16 underlay, so in theory it can address around 65,000 hosts with roughly 250 VMs or Linux Containers each – about 16 million overlay addresses in total. So for our Spark cluster, it should be good!
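To make that concrete, here’s a sketch of how the address mapping works, assuming the fan’s default 240.0.0.0/8 overlay and a hypothetical 10.132.0.0/16 underlay:
# With bridge.mode=fan and fan.underlay_subnet=10.132.0.0/16, each host's
# underlay address selects its own /24 slice of the overlay:
#   underlay host 10.132.0.5  ->  overlay subnet 240.0.5.0/24
#   underlay host 10.132.0.7  ->  overlay subnet 240.0.7.0/24
# Local VMs and containers get addresses from their host's slice, and
# cross-host traffic is VXLAN-encapsulated over the underlay.
lxc network show lxdfan0 # inspect the fan bridge once the cluster is up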
The first step is to deploy three Ubuntu Core VMs on GCE. We’ll take a smaller instance size this time, so this project shouldn’t end up costing you more than a regular oat milk latte from your favourite coffee shop. You can terminate and delete the VM we used last time if you want; we’ll be starting afresh. While we’re here, we’ll also adjust the MTU of the VPC so that everything works as expected.
Use the following commands:
gcloud compute networks update default --mtu=1500
for x in {0..2};
do
gcloud beta compute instances create ubuntu-core-20-$x \
  --zone=europe-west1-b \
  --machine-type=n2-standard-2 \
  --network-interface network=default \
  --network-tier=PREMIUM \
  --maintenance-policy=MIGRATE \
  --service-account=@developer.gserviceaccount.com \
  --scopes=https://www.googleapis.com/auth/devstorage.read_only,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/trace.append \
  --min-cpu-platform="Intel Cascade Lake" \
  --image=ubuntu-core-20-secureboot \
  --boot-disk-size=60GB \
  --boot-disk-type=pd-balanced \
  --boot-disk-device-name=ubuntu-core-20-$x \
  --shielded-secure-boot \
  --no-shielded-vtpm \
  --no-shielded-integrity-monitoring \
  --reservation-affinity=any \
  --enable-nested-virtualization \
  --create-disk=size=60,mode=rw,auto-delete=yes,name=storage-disk-$x,device-name=storage-disk-$x
done
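Once the loop finishes, it’s worth a quick sanity check that all three VMs are up, noting their internal and external IPs as we go – something along these lines:
gcloud compute instances list --filter="name~ubuntu-core-20" --format="table(name, networkInterfaces[0].networkIP, networkInterfaces[0].accessConfigs[0].natIP, status)"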
Cool! We got the three systems up and deployed! Since we preinstalled LXD in the VM disk image back in Part 2, we can just log into the first one and initialize the LXD cluster and the fan network from it.
Time to cluster up, kids; here we go! Run the following commands to initialize the cluster:
GCE_IIP0=$(gcloud compute instances list | grep ubuntu-core-20-0 | awk '{ print $5 }') # external IP
INT0_IP=$(gcloud compute instances list | grep ubuntu-core-20-0 | awk '{ print $4 }') # internal IP
SUBNET=$(echo $INT0_IP | sed 's/\.[0-9]*$/.0/') # zero the last octet to get the subnet base
The preseed enables clustering, creates a ZFS storage pool on the second disk (assuming it appears as /dev/sdb), and sets up the fan bridge; pick your own trust password:
cat > preseed-0.yaml << EOF
config:
  core.https_address: $INT0_IP:8443
  core.trust_password: <choose-a-trust-password>
networks:
- name: lxdfan0
  type: bridge
  config:
    bridge.mode: fan
    fan.underlay_subnet: $SUBNET/24
storage_pools:
- name: local
  driver: zfs
  config:
    source: /dev/sdb
profiles:
- name: default
  devices:
    eth0:
      name: eth0
      network: lxdfan0
      type: nic
    root:
      path: /
      pool: local
      type: disk
cluster:
  server_name: ubuntu-core-20-0
  enabled: true
EOF
scp preseed-0.yaml @$GCE_IIP0:.
ssh @$GCE_IIP0 "cat preseed-0.yaml | sudo lxd init --preseed"
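If the preseed applied cleanly, the bootstrap node should already report itself as a one-member cluster:
ssh @$GCE_IIP0 "sudo lxc cluster list"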
Oh wow, that’s the first node bootstrapped. On to building the rest of the LXD cluster now. We just need the cert data from the bootstrap server in order to get going. We’ll build preseed files and apply them to the other two nodes:
GCE_IIP1=$(gcloud compute instances list | grep ubuntu-core-20-1 | awk '{ print $5 }')
GCE_IIP2=$(gcloud compute instances list | grep ubuntu-core-20-2 | awk '{ print $5 }')
INT1_IP=$(gcloud compute instances list | grep ubuntu-core-20-1 | awk '{ print $4 }')
INT2_IP=$(gcloud compute instances list | grep ubuntu-core-20-2 | awk '{ print $4 }')
CERT_DATA=$(ssh @$GCE_IIP0 "sudo cat /writable/system-data/var/snap/lxd/common/lxd/cluster.crt" | sed ':a;N;$!ba;s/\n/\\n/g') # flatten the cert to one line with \n escapes for YAML
The join preseeds point each node at the bootstrap server and mirror its storage configuration (the same /dev/sdb assumption applies):
cat > preseed-1.yaml << EOF
cluster:
  enabled: true
  server_name: ubuntu-core-20-1
  server_address: $INT1_IP:8443
  cluster_address: $INT0_IP:8443
  cluster_certificate: "$CERT_DATA"
  cluster_password: <the trust password you chose above>
  member_config:
  - entity: storage-pool
    name: local
    key: source
    value: /dev/sdb
EOF
cat > preseed-2.yaml << EOF
cluster:
  enabled: true
  server_name: ubuntu-core-20-2
  server_address: $INT2_IP:8443
  cluster_address: $INT0_IP:8443
  cluster_certificate: "$CERT_DATA"
  cluster_password: <the trust password you chose above>
  member_config:
  - entity: storage-pool
    name: local
    key: source
    value: /dev/sdb
EOF
scp preseed-1.yaml @$GCE_IIP1:.
ssh -t @$GCE_IIP1 "cat preseed-1.yaml | sudo lxd init --preseed"
scp preseed-2.yaml @$GCE_IIP2:.
ssh -t @$GCE_IIP2 "cat preseed-2.yaml | sudo lxd init --preseed"
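At this point all three members should show up as ONLINE:
ssh @$GCE_IIP0 "sudo lxc cluster list"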
Your LXD cluster should be alive now. Alive! Now we can pour MicroK8s over the top, but this time all clustered and highly available. Since LXD has made our Ubuntu Core instances operate as one, we can launch several MicroK8s instances from one host, and they’ll be distributed across the LXD cluster as needed.
One gotcha to be aware of is that our LXD fan network, like our MicroK8s Calico network, uses VXLAN encapsulation, and each VXLAN layer consumes roughly 50 bytes of every packet for its headers. That means we need to step the MTU down at each layer of the stack. We already set the MTU for the VPC to 1500 in a previous step. In the next steps, we’ll set the MTU for the VMs running MicroK8s to 1450 and the MTU for the Calico network to 1400, and then deploy and cluster MicroK8s:
for x in {0..2};
do
ssh -t @$GCE_IIP0 "sudo lxc init ubuntu:20.04 microk8s-$x --vm -c limits.memory=6GB -c limits.cpu=2"
ssh -t @$GCE_IIP0 "sudo lxc config device override microk8s-$x root size=40GB"
ssh -t @$GCE_IIP0 "sudo lxc start microk8s-$x"
sleep 90 # give the VM time to boot
# append the MTU to the VM's netplan config: lxc exec forwards stdin, so tee
# runs inside the guest; the indentation matches the interface stanza
ssh -t @$GCE_IIP0 "echo '            mtu: 1450' | sudo lxc exec microk8s-$x -- tee -a /etc/netplan/50-cloud-init.yaml"
ssh -t @$GCE_IIP0 "sudo lxc exec microk8s-$x -- netplan apply"
ssh -t @$GCE_IIP0 "sudo lxc exec microk8s-$x -- snap install microk8s --classic"
ssh -t @$GCE_IIP0 "sudo lxc exec microk8s-$x -- microk8s.kubectl patch configmap/calico-config -n kube-system --type merge -p '{\"data\":{\"veth_mtu\": \"1400\"}}'"
ssh -t @$GCE_IIP0 "sudo lxc exec microk8s-$x -- microk8s.kubectl rollout restart daemonset calico-node -n kube-system"
done
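Before clustering the MicroK8s nodes, it’s worth checking that the MTUs landed where we expect; something like:
ssh -t @$GCE_IIP0 "sudo lxc exec microk8s-0 -- ip link" # the main interface should report mtu 1450
ssh -t @$GCE_IIP0 "sudo lxc exec microk8s-0 -- microk8s.kubectl get configmap calico-config -n kube-system -o jsonpath='{.data.veth_mtu}'" # should print 1400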
for x in {1..2};
do
JOIN_CMD=$(ssh @$GCE_IIP0 "sudo lxc exec microk8s-0 -- microk8s add-node" | grep "microk8s join" | tail -n 1) # no -t here: a pseudo-tty would sprinkle carriage returns into JOIN_CMD
ssh -t @$GCE_IIP0 "sudo lxc exec microk8s-$x -- $JOIN_CMD"
done
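If everything went to plan, the control plane should now see all three nodes:
ssh -t @$GCE_IIP0 "sudo lxc exec microk8s-0 -- microk8s.kubectl get nodes" # expect three nodes in Ready state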
Just a highly available Kubernetes cluster, on a highly available LXD cluster, on a bunch of next-gen Ubuntu Core OS boxes using nested virtualisation! To get Spark deployed and running on top, we can replay the steps from Part 3 of this series.
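As a pointer, a Spark job submission against the MicroK8s API server looks roughly like this (the container image and example jar path below are placeholders; substitute the ones you built in Part 3):
spark-submit \
  --master k8s://https://<microk8s-node-address>:16443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar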
To recap: over this series we built a custom Ubuntu Core image with LXD preinstalled (Part 2), deployed Spark on MicroK8s (Part 3), and in this final part scaled the whole stack out horizontally with LXD clustering and fan networking (Part 4).
Thanks for being such great company on this journey – but now it’s your turn to take this further. I want you to have some fun, and don’t forget to tag us on Twitter (@ubuntu) with some examples of your nicest projects!
Contact us to learn more about MicroK8s, LXD, Ubuntu Core or any of the technologies that we’ve explored in this series. We’ll gladly set up a call so that you can find out more about how our technologies and services can help you to jump-start your next implementation.