In my replication servers on a postgresql master-slave replication scheme on aws RDS I get the following error:
SQLSTATE: Serialization failure: 7 ERROR: canceling statement due to conflict with recovery
The cause, as far as I understood is that the replication happens simirary like a database migration. A sequence of queries is written something called WAL and then are being executed in a FIFO sequence.
Also, I understood that once you execute queries whilst wal is being executed it can cause conflicts because sometimes the query that is currently is being executed can cause fetching stale data.
So according to documentation there are delays that allow first the current queries to be executed and then apply wal changes. Theese are:
But setting these values as -1 on heavy queries (query execution time >30s) will cause the replicas to have stale data for extended period of times?
Yes, that is the idea. In the case of a replication conflict PostgreSQL has only two options:
cancel the query
delay the application of replicated changes.
max_standby_streaming_delay to -1 will delay replication indefinitely long.
There are ways to reduce replication conflicts:
hot_standby_feedback = onto remove replication conflicts caused by
VACUUM. The price you are paying is that long running queries on the standby can bloat your tables.
Don’t have any statements like
ALTER TABLEand similar in your workload that cause
In PostgreSQL v12, you can set
ALTER TABLE atable SET (vacuum_truncate = off);
to disable autovacuum truncation (which also causes a short
ACCESS EXCLUSIVElock). In older versions, there is only the crude and undocumented workaround of setting
old_snapshot_thresholdto something else than the default value.
This is all complicated and may have undesired effects, so the best advice is this: If you want a standby that does not lag more than necessary, but you also want to run long running queries on a standby server, you should use two standby servers, one for each of these conflicting purposes.