Question:
- There are 2 nodes in the cluster: node1 (primary) and node2 (replica).
- I will shut down node1 and promote node2 to primary.
- I then want to add node1 back to the cluster (as a replica this time), so I am using the following procedure on the failed node1:
systemctl stop postgresql-12
rm -rf /var/lib/pgsql/12/data/*
pg_basebackup -h node2 -D /var/lib/pgsql/12/data/ -U replicator -P -v -R -X stream -C -S pgstandby1
systemctl start postgresql-12
Is there a better way? I don’t like deleting everything from /var/lib/pgsql/12/data/, and the restore takes a long time, especially when there’s a lot of data.
What are your thoughts, gentlemen?
Answer:
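That’s exactly what pg_rewind was written for. It undoes any changes that were made on node1 after node2 was promoted, so that node1 can follow node2 as a replica. In this case it can be seen as a fast version of pg_basebackup, because it copies mainly the data blocks that changed after the two nodes diverged instead of the whole data directory.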
There is no guarantee that pg_rewind will succeed. It depends on whether node1 still has all the WAL since the last common checkpoint of node1 and node2. If some of that WAL is missing, you have to resort to pg_basebackup.
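For the setup in the question, the rewind could look roughly like the sketch below. It is only an outline under several assumptions that are not in the question: the PGDG binary path /usr/pgsql-12/bin, a superuser named postgres that node1 can connect to on node2, password handling via .pgpass or similar, and a cluster that runs with wal_log_hints = on or data checksums (pg_rewind needs one of the two). The run helper and the DRY_RUN variable are made up for illustration. If node1 crashed instead of being stopped cleanly, it would have to be started and stopped once before the rewind on version 12.

```shell
#!/usr/bin/env bash
# Sketch: turn the old primary (node1) into a standby of node2 with pg_rewind.
# Run on node1 as root. Every value below is an assumption; adjust to your setup.
set -euo pipefail

PGDATA=/var/lib/pgsql/12/data
PGBIN=/usr/pgsql-12/bin        # assumed PGDG binary directory
NEW_PRIMARY=node2
REWIND_USER=postgres           # assumed superuser that node1 may connect as
REPL_USER=replicator           # replication role from the question
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands, 0 = execute them

# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

rejoin_as_standby() {
    # pg_rewind on version 12 needs a cleanly shut down target.
    run systemctl stop postgresql-12

    # Rewind node1's data directory to the state of node2.
    run sudo -u postgres "$PGBIN/pg_rewind" \
        --target-pgdata="$PGDATA" \
        --source-server="host=$NEW_PRIMARY user=$REWIND_USER dbname=postgres" \
        --progress

    # Version 12 has no pg_rewind option to write the standby configuration,
    # so create standby.signal and set primary_conninfo by hand.
    run sudo -u postgres touch "$PGDATA/standby.signal"
    run sudo -u postgres sh -c \
        "echo \"primary_conninfo = 'host=$NEW_PRIMARY user=$REPL_USER'\" >> '$PGDATA/postgresql.auto.conf'"

    run systemctl start postgresql-12
}

rejoin_as_standby
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0. pg_rewind also has its own --dry-run option if you want to check that a rewind is possible without changing anything.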
You can make sure that old WAL is kept around for a while by setting wal_keep_size (wal_keep_segments in releases before PostgreSQL 13, which includes the version 12 used here) appropriately.
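As a rough illustration, the relevant settings in postgresql.conf might look like this. The values are assumptions picked for the example, not recommendations; size them to the amount of WAL your system writes during the longest outage you want to survive.

```conf
# postgresql.conf on both nodes (illustrative values)
wal_log_hints = on         # needed by pg_rewind unless data checksums are enabled; requires a restart
wal_keep_segments = 64     # PostgreSQL 12 and older: 64 segments of 16 MB each = 1 GB
#wal_keep_size = 1GB       # PostgreSQL 13 and newer, replaces wal_keep_segments
```

Keeping more WAL costs disk space in pg_wal, so this is a trade-off between storage and the chance that pg_rewind still finds what it needs.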