- There are 2 nodes in the cluster: node1 (primary) and node2 (replica).
- I will shut down node1 and promote node2 to primary.
- I’ll add node1 back to the cluster (as a replica this time), so I am using the following procedure on the failed node1:
    systemctl stop postgresql-12
    rm -rf /var/lib/pgsql/12/data/*
    pg_basebackup -h node2 -D /var/lib/pgsql/12/data/ -U replicator -P -v -R -X stream -C -S pgstandby1
    systemctl start postgresql-12
Is there a better way? I don’t like that I’m deleting everything from /var/lib/pgsql/12/data/, and the restore takes a long time, especially when there’s a lot of data.
What are your thoughts, gentlemen?
That’s exactly what pg_rewind was written for. It allows you to undo any transactions that have happened on node1 after node2 was promoted. It can be seen as a fast version of pg_basebackup in this case.
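As a rough sketch of what that could look like on node1, assuming the same paths and service name as in the question: the binary directory /usr/pgsql-12/bin, the use of sudo, and a superuser login as postgres on node2 (with the password supplied by something like ~/.pgpass) are assumptions, not something stated above. pg_rewind also requires that wal_log_hints = on or data checksums were enabled on node1 before the failover. In PostgreSQL 12, pg_rewind has no --write-recovery-conf option (it was added in 13), so the script creates standby.signal and sets primary_conninfo itself. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: rejoin the old primary (node1) as a replica of node2 with pg_rewind.
# Run on node1 as root. Prints the commands by default; set DRY_RUN=0 to execute.
set -e

PGDATA=/var/lib/pgsql/12/data
PGBIN=/usr/pgsql-12/bin                              # assumption: PGDG package layout
SOURCE="host=node2 user=postgres dbname=postgres"    # assumption: superuser login on node2
CONNINFO="primary_conninfo = 'host=node2 user=replicator'"

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

rejoin_as_replica() {
    # pg_rewind needs the target cluster to be shut down cleanly.
    run systemctl stop postgresql-12
    # Rewind node1's data directory to the point where node2's timeline forked.
    # pg_rewind refuses to run as root, hence the switch to the postgres user.
    run sudo -u postgres "$PGBIN/pg_rewind" --target-pgdata="$PGDATA" --source-server="$SOURCE" --progress
    # PostgreSQL 12: standby.signal makes the server start as a standby.
    run sudo -u postgres touch "$PGDATA/standby.signal"
    # Point the new standby at node2 (this is what pg_basebackup -R would write).
    run sudo -u postgres sh -c "echo \"$CONNINFO\" >> \"$PGDATA/postgresql.auto.conf\""
    run systemctl start postgresql-12
}

rejoin_as_replica
```

If the replication slot pgstandby1 from the original procedure should be reused, primary_slot_name = 'pgstandby1' would have to be added next to primary_conninfo as well.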
There is no guarantee that pg_rewind will succeed. It depends on whether you still have all the WAL since the last common checkpoint of node1 and node2. If there is not enough, you have to resort to pg_basebackup.
You can make sure that old WAL is kept around for a while by setting wal_keep_size (wal_keep_segments in older releases) appropriately.
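To illustrate the difference between the two parameters, here is a small sketch that prints the matching ALTER SYSTEM statement for a given major version. The helper name wal_keep_sql and the 1 GB retention target are made up for the example; the right amount depends on how much WAL is generated while the old primary is down. wal_keep_segments counts WAL segments (16 MB each by default), while wal_keep_size, available from PostgreSQL 13, takes a size.

```shell
#!/bin/sh
# Print an ALTER SYSTEM statement that retains roughly 1 GB of WAL,
# choosing the parameter name by PostgreSQL major version.
# wal_keep_sql is a hypothetical helper, and 1 GB is an arbitrary example value.
wal_keep_sql() {
    major=$1
    if [ "$major" -ge 13 ]; then
        echo "ALTER SYSTEM SET wal_keep_size = '1GB';"
    else
        # 64 segments x 16 MB (default segment size) = 1 GB
        echo "ALTER SYSTEM SET wal_keep_segments = 64;"
    fi
}

wal_keep_sql 12
# To apply it on the current primary, something along these lines:
#   wal_keep_sql 12 | sudo -u postgres psql
#   sudo -u postgres psql -c "SELECT pg_reload_conf();"
```

Both parameters take effect on a configuration reload, so no restart should be needed.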