Patroni is a tool for deploying PostgreSQL servers in high-availability configurations. Read on to get your feet wet with Patroni.
This post assumes you are familiar with PostgreSQL streaming replication, as well as replication topologies.
Patroni
Patroni is a fork of the now-unmaintained Governor from Compose. It is open source (GitHub repo) and is documented here. Among other things, Patroni supports bootstrapping, synchronous standbys, automatic failover, and a maximum replication lag limit on failover.
Patroni needs a distributed configuration manager for coordinating its actions, and supports ZooKeeper, etcd and Consul. Patroni itself is written in Python and is available on PyPI.
Let’s see how to set up an etcd-based Patroni that manages a primary-standby PostgreSQL cluster. In practice, you’d have a 3-node or 5-node etcd cluster, with the primary and the standby on separate servers. However, since we’re only interested in getting a working Patroni setup to explore further, we’ll set everything up on a single server.
The commands below are for Debian 9; similar or identical commands should work on other distros too.
Setup etcd
etcd is available from its GitHub releases page. Let’s download it and run it:
wget https://github.com/coreos/etcd/releases/download/v3.3.2/etcd-v3.3.2-linux-amd64.tar.gz
tar xvf etcd-v3.3.2-linux-amd64.tar.gz
cd etcd-v3.3.2-linux-amd64
./etcd
This will start and run etcd in the foreground. (Use ^C to exit.) etcd is being run as a single-node cluster, which is useless from a redundancy perspective but is good enough for experimentation and development use.
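By default it also listens only on localhost. If the Patroni nodes were on other servers, you’d at least make etcd listen on a reachable address; a minimal sketch (the name, data directory and the 10.0.0.10 address are placeholders):
./etcd --name etcd-1 \
       --data-dir /var/lib/etcd \
       --listen-client-urls http://0.0.0.0:2379 \
       --advertise-client-urls http://10.0.0.10:2379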
Install Patroni
Patroni and its dependencies are installed via pip:
sudo pip install patroni[etcd]
The command should be typed verbatim, including the square brackets: they tell pip to also pull in the dependencies needed for etcd support.
If you don’t have pip, you can install it with apt-get:
sudo apt-get install python-pip
On RHEL/CentOS-based distros, you’ll need to run “sudo yum install epel-release python-pip gcc” before you can pip install.
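To check that the install worked, you can ask for the version and confirm that patronictl is on your PATH:
patroni --version
patronictl --help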
Installing Postgres
Obviously, the Postgres binaries must be present on the system. You can install the Postgres that your distro bundles, or follow the instructions on the Postgres download page. Patroni does not appear to be too dependent on the Postgres version.
For this example, we’ll use the 9.6 version included in Debian 9:
sudo apt-get install postgresql-9.6
Whichever method and version you choose, make a note of the path to the Postgres binaries; we’ll need it later. In this case, the path is:
/usr/lib/postgresql/9.6/bin
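To double-check the path, ask one of the binaries there for its version (the exact minor version will differ on your system):
/usr/lib/postgresql/9.6/bin/postgres --version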
Patroni Configuration Files
Patroni is invoked with the path to a configuration file. We’ll have two configuration files, one for the primary and one for the standby. We’ll run both on the same server, on different ports (5432 and 5433).
We’ll call the nodes “pg-1” and “pg-2”. They each have their own configuration files. Here is the first one, “pg-1.yml”:
scope: my-ha-cluster
name: pg-1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 127.0.0.1:8008

etcd:
  host: 127.0.0.1:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_segments: 8
        max_wal_senders: 5
        max_replication_slots: 5
        checkpoint_timeout: 30

  initdb:
    - encoding: UTF8

  pg_hba:
    - host all dba all md5
    - host replication repl all md5

  users:
    dba:
      password: secret
      options:
        - createrole
        - createdb
    repl:
      password: secret
      options:
        - replication

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 127.0.0.1:5432
  data_dir: /data/pg-1
  config_dir: /data/pg-1
  bin_dir: /usr/lib/postgresql/9.6/bin
  authentication:
    replication:
      username: repl
      password: secret
    superuser:
      username: dba
      password: secret
  parameters:
    unix_socket_directories: '/tmp'
Here’s an explanation of the fields:
- scope: The name of the Patroni-managed cluster, which can be used in patronictl to refer to your cluster. All nodes should have the same value for this key.
- name: The name of the node, unique within the cluster.
- restapi: Patroni exposes a REST API, which is started at this address (“listen”). The “connect_address” is the address with which other nodes can connect to this API, so the IP here should be the IP by which this node can be reached from other nodes (typically over a private VLAN).
- etcd: The configuration to connect to the etcd cluster. For a 3-node etcd cluster, use “hosts: ip1:port1,ip2:port2,ip3:port3” instead (a sample follows this list).
- bootstrap: These values are used when a Patroni cluster is created. The values under “postgresql.parameters” are actual postgresql.conf configuration parameters. Some of them, like wal_level and max_wal_senders, are required for streaming replication to work.
- initdb: When bootstrapping the first node of a cluster and the PostgreSQL data directory does not exist, these parameters are used to invoke initdb.
- pg_hba: The entries that Patroni will add to the pg_hba.conf file of the database it creates. See the “users” section below.
- users: Patroni creates the users listed here. These users (who should have been given access in pg_hba above) are then used in the postgresql.authentication section below to let Patroni log in to the Postgres server. Here, the users “dba” (for admin access by Patroni) and “repl” (for replication access from standbys) are created.
- postgresql: These parameters describe the PostgreSQL server (“pg-1”) managed by this Patroni node. The IP in “connect_address” should be the IP from which other servers can reach this server (typically over a private VLAN). We’re using 127.0.0.1 here since everything is on the same node. The bin_dir is the path we noted earlier, and we’re setting the data and config dirs to “/data/pg-1”. For now, this directory does not exist (“/data” should, though). The “authentication” parameters should refer to the replication and admin users that we created above in the “users” section. Finally, the “parameters” section again contains postgresql.conf configuration parameters that Patroni will pass to pg_ctl to start the database.
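As mentioned in the etcd item above, with a real 3-node etcd cluster the etcd section would list all the members instead of a single host; for example (the IPs below are placeholders):
etcd:
  hosts: 10.0.0.11:2379,10.0.0.12:2379,10.0.0.13:2379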
And here is the second file, “pg-2.yml”:
scope: my-ha-cluster
name: pg-2

restapi:
  listen: 0.0.0.0:8009
  connect_address: 127.0.0.1:8009

etcd:
  host: 127.0.0.1:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_segments: 8
        max_wal_senders: 5
        max_replication_slots: 5
        checkpoint_timeout: 30

  initdb:
    - encoding: UTF8

  pg_hba:
    - host all dba all md5
    - host replication repl all md5

  users:
    dba:
      password: secret
      options:
        - createrole
        - createdb
    repl:
      password: secret
      options:
        - replication

postgresql:
  listen: 0.0.0.0:5433
  connect_address: 127.0.0.1:5433
  data_dir: /data/pg-2
  config_dir: /data/pg-2
  bin_dir: /usr/lib/postgresql/9.6/bin
  authentication:
    replication:
      username: repl
      password: secret
    superuser:
      username: dba
      password: secret
  parameters:
    unix_socket_directories: '/tmp'
    port: 5433
It looks, and is, mostly similar to the first file. There are only a few changes:
- name: This has to be different; we’re naming this one “pg-2”.
- restapi: The ports are different, since we’re running two Patroni instances on the same node. This is typically not the case.
- postgresql: The ports have been changed from 5432 to 5433, and the data and config directories are now “/data/pg-2”. The port is also specified as 5433 in the “parameters” section.
These configuration files support many more options; you can read about them here. We’re only using a minimal set to bring up a working cluster.
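Since the two files differ only in these few keys, a quick diff is a handy sanity check once you’ve saved them both:
diff pg-1.yml pg-2.yml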
Starting the Cluster
Note that we don’t have any actual databases yet. We’re going to let Patroni invoke initdb for us, using the parameters from the configuration files.
We already have etcd running. Let’s also make sure any default PostgreSQL service is stopped first:
sudo systemctl stop postgresql
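Also create the /data directory that the configuration files point at (Patroni will create /data/pg-1 and /data/pg-2 itself); here we assume the current user is the one that will run Patroni:
sudo mkdir -p /data
sudo chown $(whoami) /data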
Now we can start the first node, by calling the “patroni” script with the path to the first configuration file:
$ patroni pg-1.yml
2018-03-13 07:47:05,853 INFO: Failed to import patroni.dcs.consul
2018-03-13 07:47:05,867 INFO: Selected new etcd server http://127.0.0.1:2379
2018-03-13 07:47:05,878 INFO: Lock owner: None; I am pg-1
2018-03-13 07:47:05,882 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "vagrant".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
creating directory /data/pg-1 ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
Success. You can now start the database server using:
/usr/lib/postgresql/9.6/bin/pg_ctl -D /data/pg-1 -l logfile start
2018-03-13 07:47:08,850 INFO: postmaster pid=1658
localhost:5432 - no response
LOG: database system was shut down at 2018-03-13 07:47:08 GMT
LOG: MultiXact member wraparound protections are now enabled
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
localhost:5432 - accepting connections
localhost:5432 - accepting connections
2018-03-13 07:47:09,916 INFO: establishing a new patroni connection to the postgres cluster
2018-03-13 07:47:09,926 INFO: running post_bootstrap
2018-03-13 07:47:09,940 WARNING: Could not activate Linux watchdog device: "Can't open watchdog device: [Errno 2] No such file or directory: '/dev/watchdog'"
2018-03-13 07:47:09,948 INFO: initialized a new cluster
2018-03-13 07:47:19,943 INFO: Lock owner: pg-1; I am pg-1
2018-03-13 07:47:19,951 INFO: Lock owner: pg-1; I am pg-1
2018-03-13 07:47:19,960 INFO: no action. i am the leader with the lock
2018-03-13 07:47:29,992 INFO: Lock owner: pg-1; I am pg-1
2018-03-13 07:47:30,000 INFO: no action. i am the leader with the lock
As you can see, Patroni has invoked initdb to create a new database at /data/pg-1, and started the server. You can connect to the server using psql -h /tmp -U dba postgres at this point.
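To confirm that this node is running as a primary and not in recovery, you can run a quick query over the same connection; it should print “f”:
psql -h /tmp -U dba postgres -c 'SELECT pg_is_in_recovery();'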
Now let’s bring up the standby, by running “patroni” with the second configuration file, in another terminal:
$ patroni pg-2.yml
2018-03-13 07:58:55,411 INFO: Failed to import patroni.dcs.consul
2018-03-13 07:58:55,425 INFO: Selected new etcd server http://127.0.0.1:2379
2018-03-13 07:58:55,438 INFO: Lock owner: pg-1; I am pg-2
2018-03-13 07:58:55,441 INFO: trying to bootstrap from leader 'pg-1'
2018-03-13 07:58:55,962 INFO: replica has been created using basebackup
2018-03-13 07:58:55,963 INFO: bootstrapped from leader 'pg-1'
2018-03-13 07:58:55,999 INFO: postmaster pid=1752
localhost:5433 - no response
LOG: database system was interrupted; last known up at 2018-03-13 07:58:55 GMT
LOG: entering standby mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/20000F8
LOG: database system is ready to accept read only connections
FATAL: could not start WAL streaming: ERROR: replication slot "pg_2" does not exist
FATAL: could not start WAL streaming: ERROR: replication slot "pg_2" does not exist
localhost:5433 - accepting connections
localhost:5433 - accepting connections
2018-03-13 07:58:57,075 INFO: Lock owner: pg-1; I am pg-2
2018-03-13 07:58:57,075 INFO: does not have lock
2018-03-13 07:58:57,075 INFO: establishing a new patroni connection to the postgres cluster
2018-03-13 07:58:57,090 INFO: no action. i am a secondary and i am following a leader
2018-03-13 07:58:59,958 INFO: Lock owner: pg-1; I am pg-2
2018-03-13 07:58:59,958 INFO: does not have lock
2018-03-13 07:58:59,962 INFO: no action. i am a secondary and i am following a leader
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
Patroni has taken a backup using pg_basebackup, set up streaming replication, and brought up the PostgreSQL server. To verify, you can try connecting to the standby with psql -h /tmp -p 5433 -U dba postgres. We’d specified “hot_standby” as “on”, so read-only connections to the standby are possible.
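You can also verify the replication from both sides: the same recovery check on the standby should return “t”, and (assuming the dba user has sufficient privileges) pg_stat_replication on the primary should show one connected standby:
psql -h /tmp -p 5433 -U dba postgres -c 'SELECT pg_is_in_recovery();'
psql -h /tmp -U dba postgres -c 'SELECT application_name, state, sync_state FROM pg_stat_replication;'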
Using Patronictl
Patroni includes a command called “patronictl” which can be used to control the cluster. Let’s check the status of the cluster:
$ patronictl -d etcd://127.0.0.1:2379 list my-ha-cluster
+---------------+--------+-----------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------------+--------+-----------+--------+---------+-----------+
| my-ha-cluster | pg-1 | 127.0.0.1 | Leader | running | 0.0 |
| my-ha-cluster | pg-2 | 127.0.0.1 | | running | 0.0 |
+---------------+--------+-----------+--------+---------+-----------+
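Besides patronictl, each node’s REST API can be queried directly; the /patroni endpoint returns a JSON description of the node’s state (here on the ports 8008 and 8009 that we configured):
curl -s http://127.0.0.1:8008/patroni
curl -s http://127.0.0.1:8009/patroni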
Manual Failover
The patronictl command can be used to do a manual failover. Since pg-1 is the current master, we want to promote pg-2. Let’s do that with:
$ patronictl -d etcd://127.0.0.1:2379 failover my-ha-cluster
Candidate ['pg-2'] []: pg-2
Current cluster topology
+---------------+--------+-----------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------------+--------+-----------+--------+---------+-----------+
| my-ha-cluster | pg-1 | 127.0.0.1 | Leader | running | 0.0 |
| my-ha-cluster | pg-2 | 127.0.0.1 | | running | 0.0 |
+---------------+--------+-----------+--------+---------+-----------+
Are you sure you want to failover cluster my-ha-cluster, demoting current master pg-1? [y/N]: y
2018-03-13 08:06:27.49165 Successfully failed over to "pg-2"
+---------------+--------+-----------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------------+--------+-----------+--------+---------+-----------+
| my-ha-cluster | pg-1 | 127.0.0.1 | | stopped | unknown |
| my-ha-cluster | pg-2 | 127.0.0.1 | Leader | running | |
+---------------+--------+-----------+--------+---------+-----------+
Looks like the failover happened. Let’s list the nodes again to confirm:
$ patronictl -d etcd://127.0.0.1:2379 list my-ha-cluster
+---------------+--------+-----------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------------+--------+-----------+--------+---------+-----------+
| my-ha-cluster | pg-1 | 127.0.0.1 | | running | 0.0 |
| my-ha-cluster | pg-2 | 127.0.0.1 | Leader | running | 0.0 |
+---------------+--------+-----------+--------+---------+-----------+
The node pg-2 has now indeed become the master. The Patroni output for pg-2 should contain log lines like:
2018-03-13 08:09:07,939 INFO: Lock owner: pg-2; I am pg-2
2018-03-13 08:09:07,952 INFO: no action. i am the leader with the lock
Automatic Failover
When the current leader, pg-2, goes off the network, becomes unreachable, or crashes, the Patroni at pg-1 should take over and promote pg-1. Let’s see if that happens by killing the pg-2 Patroni process with a ^C:
2018-03-13 08:18:12,469 INFO: Lock owner: pg-2; I am pg-2
2018-03-13 08:18:12,477 INFO: no action. i am the leader with the lock
^CLOG: received fast shutdown request
LOG: aborting any active transactions
FATAL: terminating connection due to administrator command
LOG: autovacuum launcher shutting down
LOG: shutting down
LOG: database system is shut down
2018-03-13 08:18:17,008 INFO: Lock owner: pg-2; I am pg-2
$
Over in pg-1’s terminal, we can see it promoting itself:
2018-03-13 08:18:12,476 INFO: Lock owner: pg-2; I am pg-1
2018-03-13 08:18:12,476 INFO: does not have lock
2018-03-13 08:18:12,479 INFO: no action. i am a secondary and i am following a leader
LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 4 at 0/5000838.
FATAL: could not send end-of-streaming message to primary: no COPY in progress
LOG: invalid record length at 0/5000838: wanted 24, got 0
FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "127.0.0.1" and accepting
TCP/IP connections on port 5433?
2018-03-13 08:18:17,030 WARNING: request failed: GET http://127.0.0.1:8009/patroni (('Connection aborted.', error(104, 'Connection reset by peer')))
2018-03-13 08:18:17,162 INFO: promoted self to leader by acquiring session lock
server promoting
LOG: received promote request
LOG: redo done at 0/50007C8
2018-03-13 08:18:17,173 INFO: cleared rewind state after becoming the leader
LOG: selected new timeline ID: 5
LOG: archive recovery complete
LOG: MultiXact member wraparound protections are now enabled
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
2018-03-13 08:18:18,449 INFO: Lock owner: pg-1; I am pg-1
2018-03-13 08:18:18,469 INFO: no action. i am the leader with the lock
and the node status confirms it:
$ patronictl -d etcd://localhost:2379 list my-ha-cluster
+---------------+--------+-----------+--------+---------+-----------+
| Cluster | Member | Host | Role | State | Lag in MB |
+---------------+--------+-----------+--------+---------+-----------+
| my-ha-cluster | pg-1 | 127.0.0.1 | Leader | running | 0.0 |
+---------------+--------+-----------+--------+---------+-----------+
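To bring pg-2 back as a standby of the new leader, start its Patroni again in the other terminal; since use_pg_rewind is enabled, it should re-sync or rewind its data directory as needed and resume following pg-1:
patroni pg-2.yml
Running the patronictl list command again should then show both members, with pg-1 as the Leader.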
Onward!
The aim of this post was to get you up and running with a Patroni cluster on which you can start experimenting. Patroni can do much more than what we just did with it.
You can read more about Patroni in the docs, or go through the source here. The original Compose article here also has some information about how the solution works.
Monitoring PostgreSQL With OpsDash
With our own product, OpsDash, you can quickly start monitoring your PostgreSQL servers, and get instant insight into key performance and health metrics including replication stats.
Here’s a default, pre-configured dashboard for PostgreSQL.
OpsDash strives to save you the tedious work of setting up a useful dashboard. The metrics you see here were carefully chosen to represent the most relevant health and performance indicators for a typical PostgreSQL instance.
OpsDash understands the streaming replication feature of PostgreSQL and displays per-slave replication status on the master:
The replication lag (as a length of time) is also visible on a slave’s dashboard. In the graph below (part of the dashboard of a slave), we can see that the slave could not catch up to the master’s changes for a while.
Sign up for a free 14-day trial of OpsDash SaaS today!