PostgreSQL 9.4 and later come with a feature called “replication slots”. They can be used to make WAL archiving and streaming replication more robust and efficient. Read on to learn more, with hands-on examples.
PostgreSQL provides ACID semantics for your data by sequencing transactions into a long series of changes, like an ever-growing changelog. Descriptions of these changes are encoded into “WAL files” – a series of ordered 16MB binary files stored alongside other files inside the server’s data directory.
This means that if you keep around all the WAL files since the creation of the database, you can recreate that database by reading and replaying the changes from the WAL files. This is the basis for streaming replication, WAL archiving and Point-In-Time-Recovery (PITR) in PostgreSQL.
(Note: some objects, like unlogged tables and hash indexes are not captured in WAL files – but that does not compromise the integrity of the data.)
WAL stands for Write Ahead Log, by the way.
Archiving WAL Files
Copying out WAL files as they are generated, into a “safe” location is called WAL archiving. A “safe” location is typically an NFS-mounted disk, file servers, S3 bucket etc.
Traditionally, a user-supplied script is invoked by PostgreSQL each time a WAL file is generated. The script then copies (using rsync, scp or the like) the file to one or more safe locations.
The WAL files stored in the safe location becomes a form of incremental backup from which a database can be recovered up to any specified point in time.
Instead of copying the WAL files to a safe location, it can instead be copied to another machine where another PostgreSQL server is running. This server can be configured to be in “recovery mode”, meaning all it does is to apply any new WAL files that it gets. This second server then, becomes a “standby” – it will be an identical, warm backup of the “primary” PostgreSQL server.
(The standby can also be made a “hot standby”, where it can also serve read-only queries, thus being a “read replica”.)
The process of “shipping” WAL files (rather than “archiving” them) to one ore more standby servers is called “log shipping”. Typically, log shipping is also done using the same archive script as before – it copies newly-created WAL files at the primary over the network to the standby servers.
Streaming replication was added to PostgreSQL inorder to improve the log shipping process. With streaming replication, the standby server will connect to the primary server over the network (using a different protocol than the standard client protocol). The primary server can then send WAL file contents directly over this connection without having to rely on user-supplied scripts.
This also allows for better tracking. You can know for example, how far the standby has caught up with the primary in terms of receiving, applying and replaying the changes from the primary.
Streaming replication can be, and is popularly, used to maintain a failover-ready standby that can be promoted to the primary in case the primary fails.
WAL File Retention
Which brings us to the topic of how long the primary should retain WAL files.
Typically, you’ll have a WAL archival script that will copy WAL files to one or more long-term archive locations. If you don’t have any standbys or other streaming replication clients, then basically the server can discard the WAL file once the archive script reports success. You’ll still need some recent WAL files for crash recovery (data from recent WAL files get replayed during crash recovery).
(BTW: PostgreSQL does not actually discard the file, it renames and recycles them.)
If you have standbys though, there is a problem – you need to keep around WAL files long enough for as long as your slowest standby needs them. If the standby that was taken down for a while comes back online and asks the primary for a WAL file that the primary no longer has, then the replication fails.
The logical solution is to make the primary to keep track of how far behind the standby is, and to not delete WAL files that any standbys still need. And this feature is called replication slots.
(Note: there are “physical” and “logical” replication slots. We cover only “physical” replication slots in this article.)
Replication slots are uniquely identified by names. Each slot is associated with:
- the oldest WAL required by the consumer of the slot - WAL files later than this are not deleted during checkpoints
- the oldest transaction ID required to be retained by the consumer of the slot - rows needed by any transactions later than this are not deleted by vacuum
Let’s try it out. First, we need a new database to experiment with:
Make sure that
data/pg_hba.conf allows normal and replication connections:
(This is about as insecure as you can get, but works great for experimenting!) Let’s start the server now:
Creating a Replication Slot
Now that we have a running PostgreSQL server, let’s create a replication slot.
We can do this using the
Let’s also run pgbench for some traffic. Make a database first:
Then initialize and run pgbench:
Now that we have a pgbench busy in the background, we can see that the server is generating WAL files:
Receiving WALs Using a Slot
We can receive these WAL files using the
pg_receivewal command (on versions
before 10, this is
pg_receivexlog). Let’s run it now, and ask it to use the
“slot1” that we created earlier:
We created a directory called “wals” and asked pg_receivewal to fetch and store
the WALs into this directory. Since this is the first time we’re using the slot,
pg_receivewal starts fetching from the WAL location at the time it was started –
in this case from
0/17000000. When it has fetched 17, 18 completely and is
partway through 19, here is how the “wals” directory looks:
There is a catalog view called
pg_replication_slots which gives you
information about these slots:
PostgreSQL will not remove WAL files prior to “restart_lsn”, even at the risk of filling up the disk.
Let’s stop pg_receivewal for a while now, and let pgbench do more transactions:
After the pgbench, the current WAL has crept up from 19 to 2B:
while our slot remains at 19. Let’s see what happens when we restart pg_receivewal:
The server had been saving up WAL files from 19 onwards for slot1’s sake, which
it serves up immediately on the consumer’s startup, and the consumer then
continues to wait for more. The server’s WAL directory (
data/pg_wal in v10,
data/pg_xlog in earlier versions) can be checked to verify whether the old
files were retained or deleted as intended.
WAL Archiving with Slots
What we just did was in fact WAL archiving – saving a copy of every WAL generated by the server. We did not use the traditional “archive_command” method, but the set of WAL files that we have are just as good for restoration and PITR.
Streaming Replication with Slots
Let’s try doing streaming replication using slots. First, let’s create another slot:
We added a “true” parameter to the function this time – this sets the “restart_lsn” of the slot to the current REDO LSN. If we take a “pg_basebackup” at this point, the data files will be guaranteed to contain changes up to REDO LSN. Having the slot’s “restart_lsn” set to this value will guarantee that our new standby, bootstrapped from a pg_basebackup will be able to fetch all changes until it is ready.
Let’s get the backup next. We’ll use the “-S / –slot” parameter to supply the slot name to the pg_basebackup command:
Because we created the slot earlier, changes made between the time the slot was created, and the completion of the pg_basebackup command, will be included in the newly created “data2” directory. pg_basebackup informs us that it was written WAL 32 partially, and pg_replication_slots confirms that the restart_lsn has advanced in sync:
The recovery.conf for the standby should contain the slot name (we made the pg_basebackup write this using the -R flag):
The “primary_slot_name” make the standby use that slot for it’s streaming replication. Let’s start the standby now:
We hit a checkpoint in between the backup and the start, so the WAL file has crept up to 33.
Examining Replication Status
We now have two active replication slot consumers (pg_receivewal and the standby). We can check the replication status using the pgmetrics tool:
Where to go from here
Replication in PosgreSQL represents a whole lot of features, knobs and levers accumulated into a stable codebase over more than a decade. Understandably, it takes a lot of reading and experimenting to get to any level of comfort with it.
Read more about replication and standby servers in the docs here.
We didn’t cover temporary physical replication slots, a feature added in v10. pg_basebackup can also make use of such slots by default.
We also didn’t cover logical replication slots or logical decoding.
Monitoring PostgreSQL With OpsDash
With our own product, OpsDash, you can quickly start monitoring your PostgreSQL servers, and get instant insight into key performance and health metrics including replication stats.
Here’s a default, pre-configured dashboard for PostgreSQL.
OpsDash strives to save you the tedious work of setting up a useful dashboard. The metrics you see here were carefully chosen to represent the most relevant health and performance indicators for a typical PostgreSQL instance.
OpsDash understands the streaming replication feature of PostgreSQL and displays per-slave replication status on the master:
The replication lag (as a length of time) is also visible on a slave’s dashboard. In the graph below (part of the dashboard of a slave), we can see that the slave could not catch up to the master’s changes for a while.
Sign up for a free 14-day trial of OpsDash SaaS today!