In a previous blog, we talked about patterns to run a database in a highly available manner. In this blog, we present our open source recipe for PostgreSQL high availability.
A good recipe should always start by presenting the benefits of the dish. Therefore, we will start this section with a brief introduction to the benefits of using PostgreSQL and why high availability is often a requirement.
PostgreSQL is one of the most widely used databases, as the latest StackOverflow developer survey shows. PostgreSQL has been used successfully for more than 35 years for a variety of workloads, such as Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP). It also provides a wide range of features that make it suitable for a large number of industries, including finance and healthcare, as Timescale's 2022 State of PostgreSQL report explains.
Now that we have an overview of PostgreSQL, we will discuss why running only one instance of PostgreSQL is generally a bad idea.
Reports like ManageForce's or Phenom's estimate the cost of an outage at around $500,000 per hour, on average. A database outage often cascades to the applications that consume the database and degrades the services that depend on them. This makes a database outage one of the worst scenarios a company can face.
Unfortunately, there is more to outages than impact on revenue. According to IDC’s 2021 Worldwide State of Data Protection and DR Survey, an outage might also lead to loss of productivity, data loss and reputation damage.
This is why running a highly available PostgreSQL deployment is beneficial for your company. So how can you achieve it? In the next section, we will discuss the components we chose at Canonical to automate the deployment of highly available PostgreSQL clusters.
The following table provides an overview of the components we use to provide a highly available PostgreSQL deployment:
| Component | Role/Functions | Version (major) |
| --- | --- | --- |
| PostgreSQL | Database server | 14 |
| PgBouncer | Connection pooling, connection failover | 1 |
| Patroni | Availability manager, replication manager | 3 |
| pgBackRest | Backup and restore | 2 |
| Charmed operator | Operator for PostgreSQL that automates a number of management tasks, like deployment and scaling | latest/stable |
| Juju | Operator framework that allows the user to run an operator on top of a variety of clouds, including K8s, OpenStack, AWS, Azure and GCP | 3.1 |
Let’s cover how to install these components.
The first step is to install Juju, our operator framework:
sudo snap install juju --channel=3.1/stable
Next, we will install LXD. LXD is a system container manager that will allow us to emulate a cloud environment on our local machine.
sudo snap install lxd --channel=5.0/stable
The next step is to bootstrap a Juju controller by running the following command and answering the prompts:
juju bootstrap
Clouds
aws
aws-china
aws-gov
azure
azure-china
equinix
localhost
oracle
Select a cloud [localhost]: localhost
Enter a name for the Controller [localhost-localhost]: demo-postgres
Finally, we can start deploying a single PostgreSQL instance:
juju add-model mymodel
juju deploy postgresql --channel edge
Please note that the first time you run the above commands, it might take Juju several minutes to download the required charm (i.e. Juju-based operator or application) and its dependencies. Subsequent runs should be faster.
In order to check for the status of the deployment, you can use the following command:
juju status
# You can type 'juju status --watch 1s' to continuously monitor the output
For more details about what is happening in the background, you can type:
juju debug-log --tail --replay
After a few minutes (your mileage may vary), typing juju status should yield an output similar to the following:
Model Controller Cloud/Region Version SLA Timestamp
mymodel demo-postgres localhost/localhost 3.1.0 unsupported 15:00:31+02:00
App Version Status Scale Charm Channel Rev Exposed Message
postgresql active 1 postgresql edge 281 no
Unit Workload Agent Machine Public address Ports Message
postgresql/0* active idle 0 …
Machine State Address Inst id Base AZ Message
0 started … juju-3d56ca-0 ubuntu@22.04 Running
And our first elephant is ready!
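If you would rather script this check than eyeball the table, the unit lines can be filtered out of the plain juju status output. This is a sketch: it assumes the default tabular format shown above.

```shell
# Print each postgresql unit together with its workload status,
# e.g. "postgresql/0* active" (assumes the default tabular output).
juju status | awk '/^postgresql\// {print $1, $2}'
```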
As we said earlier, running a single instance is not a good idea. So we will now explore another neat feature of our charms, on-demand scaling. Adding replicas to PostgreSQL is as simple as running the following command:
juju add-unit -n 2 postgresql
After some minutes, running juju status should yield an output similar to the following:
Model Controller Cloud/Region Version SLA Timestamp
mymodel demo-postgres localhost/localhost 3.1.0 unsupported 15:14:14+02:00
App Version Status Scale Charm Channel Rev Exposed Message
postgresql active 3 postgresql edge 281 no
Unit Workload Agent Machine Public address Ports Message
postgresql/0* active executing 0 …
postgresql/1 active executing 1 …
postgresql/2 active executing 2 …
Machine State Address Inst id Base AZ Message
0 started … juju-3d56ca-0 ubuntu@22.04 Running
1 started … juju-3d56ca-1 ubuntu@22.04 Running
2 started … juju-3d56ca-2 ubuntu@22.04 Running
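A quick, scriptable way to confirm that all three units registered is to count the unit lines (again a sketch against the default tabular output above):

```shell
# Count the postgresql units reported by juju status; after the
# scale-out we expect this to print 3.
juju status | grep -c '^postgresql/'
```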
Let’s deploy the PgBouncer component next, using the following command:
juju deploy pgbouncer --channel edge
After some minutes, running juju status should show a new application:
Model Controller Cloud/Region Version SLA Timestamp
mymodel demo-postgres localhost/localhost 3.1.0 unsupported 15:16:36+02:00
App Version Status Scale Charm Channel Rev Exposed Message
pgbouncer unknown 0 pgbouncer edge 25 no
postgresql active 3 postgresql edge 281 no
Unit Workload Agent Machine Public address Ports Message
postgresql/0* active idle 0 …
postgresql/1 active idle 1 …
postgresql/2 active idle 2 …
Machine State Address Inst id Base AZ Message
0 started … juju-3d56ca-0 ubuntu@22.04 Running
1 started … juju-3d56ca-1 ubuntu@22.04 Running
2 started … juju-3d56ca-2 ubuntu@22.04 Running
As you might have noticed, PgBouncer reports unknown as its status. Do not worry: this is expected. PgBouncer is a subordinate charm, which means it is deployed inside the same system container as the application it serves, and it only becomes active once it is related to one.
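You can confirm this yourself: before any relation exists, the application is listed with a scale of 0 and no pgbouncer/N unit lines. A sketch, assuming the tabular layout shown above:

```shell
# Show only the pgbouncer lines of the status output; at this stage
# there is an App line but no pgbouncer/N unit lines yet.
juju status | grep '^pgbouncer'
```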
In order to emulate an application that uses PostgreSQL, we will use the Data Integrator charm:
juju deploy data-integrator --channel edge --config database-name=test-database
After some minutes (or hours if you go snacking like me 🙂), running juju status should yield an output similar to the following:
Model Controller Cloud/Region Version SLA Timestamp
mymodel demo-postgres localhost/localhost 3.1.0 unsupported 17:24:20+02:00
App Version Status Scale Charm Channel Rev Exposed Message
data-integrator blocked 1 data-integrator edge 10 no Please relate the data-integrator with the desired product
pgbouncer unknown 0 pgbouncer edge 25 no
postgresql active 3 postgresql edge 281 no
Unit Workload Agent Machine Public address Ports Message
data-integrator/0* blocked idle 5 … Please relate the data-integrator with the desired product
postgresql/0* active idle 0 … Primary
postgresql/1 active idle 1 …
postgresql/2 active idle 2 …
Machine State Address Inst id Base AZ Message
0 started … juju-3d56ca-0 ubuntu@22.04 Running
1 started … juju-3d56ca-1 ubuntu@22.04 Running
2 started … juju-3d56ca-2 ubuntu@22.04 Running
5 started … juju-3d56ca-5 ubuntu@22.04 Running
Juju provides a powerful integration abstraction (a.k.a. a relation) that establishes a communication link between two workloads (e.g. PgBouncer and the PostgreSQL server).
Therefore, relating PgBouncer to PostgreSQL is as simple as running the following command:
juju relate postgresql pgbouncer
And relating the Data Integrator to PgBouncer is as simple as the previous operation:
juju relate data-integrator pgbouncer
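The established relations can be inspected directly: juju status accepts a --relations flag that appends an integration section to the usual output. A sketch (the relation and interface names in the test sample are illustrative, not confirmed):

```shell
# List the model's status including relations; the pgbouncer lines
# confirm that the pooler now sits between client and database.
juju status --relations | grep pgbouncer
```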
After some minutes, running juju status should yield an output similar to the following:
Model Controller Cloud/Region Version SLA Timestamp
mymodel demo-postgres localhost/localhost 3.1.0 unsupported 17:31:43+02:00
App Version Status Scale Charm Channel Rev Exposed Message
data-integrator active 1 data-integrator edge 10 no
pgbouncer active 1 pgbouncer edge 25 no
postgresql active 3 postgresql edge 281 no
Unit Workload Agent Machine Public address Ports Message
data-integrator/0* active idle 5 …
pgbouncer/0* active idle …
postgresql/0* active idle 0 … Primary
postgresql/1 active idle 1 …
postgresql/2 active idle 2 …
… The rest was omitted to save some bytes (and the elephant's environment).
And voilà! The communication links between our applications are now established.
Now that our meal is ready, let’s taste it.
We will continue our culinary experience by communicating with PostgreSQL through the Data Integrator charm. The following command will create a user and display its credentials:
juju run data-integrator/leader get-credentials
The command will display the username and password only once, in the following format:
ok: "True"
postgresql:
database: test-database
endpoints: localhost:
password:
username:
version: "14.7"
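Since the credentials are only displayed once, it is handy to capture them in a script. The sketch below pulls the fields out of the YAML-style action output shown above; the field names (username, password) are exactly those in the output, everything else is an assumption.

```shell
# Run the action once and keep its output.
creds=$(juju run data-integrator/leader get-credentials)
# Extract the username and password fields from the YAML-style output.
user=$(printf '%s\n' "$creds" | awk '/^ *username:/ {print $2}')
pass=$(printf '%s\n' "$creds" | awk '/^ *password:/ {print $2}')
echo "connecting as $user"
```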
We can now connect to the deployed PostgreSQL by using the following commands:
juju ssh postgresql/0
psql --host=<IP address of postgresql/0 as displayed in the juju status output> \
--username=<username from get-credentials> \
--password test-database
After typing the previously provided password we get a prompt where we can run queries against our PostgreSQL deployment. For example, we can issue the following query:
test-database=> SELECT VERSION();
You can exit the prompt by typing \q, and you can exit the system container by typing exit.
High availability is not only about deploying several replicas of PostgreSQL. It is also about providing automatic failover when a problem affects one of the replicas. Let’s check together what our charmed operator will do for us when we simulate a failure in one of the PostgreSQL units.
We recommend, from now on, to execute the upcoming instructions in a new terminal/tab so that you can follow the behaviour of our charmed operator.
We will simulate a first issue in the PostgreSQL primary by doing the following:
# Connect to the system container where the primary elephant is hosted
juju ssh postgresql/0
# The following will display all running processes
ps -edf
# The following will terminate all postgresql related processes
sudo pkill postgres
ps -edf
Patroni should restart all the PostgreSQL processes shortly after they are killed. As a consumer of PostgreSQL, you will not notice any issue.
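You can watch the recovery from inside the same container. A sketch: poll with pgrep until Patroni has respawned the processes, then list them.

```shell
# Poll until at least one postgres process is back, then list them all.
until pgrep -x postgres > /dev/null; do sleep 1; done
pgrep -a postgres
```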
We will now abruptly stop the system container where the primary PostgreSQL is running. This emulates a server crash or a sudden network isolation of the primary.
exit
juju status
# Note the ip address of the primary PostgreSQL
# Then note the Inst id corresponding to the primary (in the Machine section of juju status’ output)
lxc list
# The above will display all the system containers managed by lxd/lxc
# Identify the system container used by the primary by comparing the lxc’s NAME to the previously identified Inst id
lxc stop --force --timeout 0 <NAME of the primary's container>
# The above will abruptly stop the Primary’s host
lxc list
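Mapping the primary's Juju machine to its LXD container can also be scripted. A sketch: it assumes machine 0 hosts the primary, as in this walkthrough, and the default tabular output of juju machines, where the Inst id is the fourth column.

```shell
# Look up the Inst id (the LXD container name) of machine 0, then
# abruptly stop that container.
inst=$(juju machines | awk '$1 == "0" {print $4}')
lxc stop --force --timeout 0 "$inst"
```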
By checking juju status, you can see that an automatic failover happened and that our cluster self healed after only a few seconds!
Running juju status should present an output similar to the following:
Model Controller Cloud/Region Version SLA Timestamp
mymodel demo-postgres localhost/localhost 3.1.0 unsupported 19:07:11+02:00
App Version Status Scale Charm Channel Rev Exposed Message
data-integrator active 1 data-integrator edge 10 no
pgbouncer active 1 pgbouncer edge 25 no
postgresql active 2/3 postgresql edge 281 no
Unit Workload Agent Machine Public address Ports Message
data-integrator/0* active idle 5 …
pgbouncer/0* active idle …
postgresql/0 unknown lost 0 … agent lost, see 'juju show-status-log postgresql/0'
postgresql/1 active idle 1 … Primary
postgresql/2* active idle 2 …
Machine State Address Inst id Base AZ Message
0 down … juju-3d56ca-0 ubuntu@22.04 Running
1 started … juju-3d56ca-1 ubuntu@22.04 Running
2 started … juju-3d56ca-2 ubuntu@22.04 Running
5 started … juju-3d56ca-5 ubuntu@22.04 Running
At Canonical, we are committed to open source software. Therefore, all of our charms are open source and publicly available.
So if you like PostgreSQL as much as we do, please do not hesitate to submit feedback, propose a contribution, or contact us on Mattermost to discuss your ideas and requests.