Question:
Lots of PostgreSQL ALTER TABLE commands, like adding a new column with a default value, have clever optimisations in the latest versions of PostgreSQL that allow them to execute basically instantly, even on large tables, once Postgres has briefly acquired a lock on the table.
Unfortunately, that final caveat is important. A command like this one from the linked blog post
ALTER TABLE users ADD COLUMN credits bigint NOT NULL DEFAULT 0;
still needs to wait for an exclusive lock on the users table before it can run, even though it’ll execute instantly once the lock is acquired. Worse, while it’s waiting for that lock, it blocks all writes and reads involving the table.
Some simple steps to reproduce this (tested in Postgres 13.3):
- In one psql shell, create a table, then start a transaction, do a read from the table, and don’t commit:

  CREATE TABLE users (id SERIAL, name TEXT);
  INSERT INTO users (name) VALUES ('bob'), ('fred');
  START TRANSACTION;
  SELECT * FROM users WHERE id = 1;
- Leave the first shell open, then open a second one and try to alter the table:

  ALTER TABLE users ADD COLUMN credits bigint NOT NULL DEFAULT 0;

  Observe that this query hangs, waiting for the transaction in the first shell to be committed.
- Open a third terminal, and try to run:

  SELECT * FROM users WHERE id = 2;

  Observe that this also hangs; it’s now blocked waiting for the ALTER TABLE command to complete, which is in turn blocked waiting for the first transaction to complete.
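While the second and third shells are hanging, you can watch the blocking chain from yet another shell. This is just an illustrative query (it assumes Postgres 9.6 or later, where pg_blocking_pids is available):

```sql
-- List every backend that is waiting on a lock, together with
-- the PIDs of the sessions blocking it.
SELECT pid, pg_blocking_pids(pid) AS blocked_by, wait_event_type, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```

The ALTER TABLE backend should show up as blocked by the idle-in-transaction session, and the SELECT as blocked by the ALTER TABLE.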
It seems that most or all ALTER TABLE commands behave like this. Even if the operation itself is very fast or can run without holding a lock for the whole operation, ALTER TABLE still needs to briefly acquire an exclusive lock on the table before it can start its work, and while it’s waiting for that lock, all other statements that touch the table – even reads! – are blocked.
Needless to say, this behaviour is pretty problematic if you want to make changes to a table that is occasionally involved in long-running transactions. If the ALTER TABLE statement gets blocked by a long-running transaction that happens to be holding any kind of lock on the table at the moment the ALTER TABLE statement is run, all interactions with that table are blocked until that long-running transaction ends, and anything depending on the table probably experiences downtime.
Is there a canonical solution to this problem?
A crude solution that I’ve tried is to use a wrapper script that repeatedly attempts to run the ALTER TABLE statement via a connection with lock_timeout set to a small value (e.g. 5 seconds). If the ALTER TABLE fails by hitting the lock timeout, the transaction aborts; the script catches the error, waits for a minute or two, and tries the whole process again. This avoids outright downtime, but it still has performance implications, since every failed attempt to run the ALTER TABLE statement still blocks queries for a few seconds.
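For reference, the retry loop can also live inside the database itself. This is only a sketch, reusing the users table from above; it relies on the lock timeout raising the lock_not_available condition (SQLSTATE 55P03), and each failed attempt can still block other queries for up to the 100ms timeout:

```sql
DO $$
BEGIN
  FOR i IN 1..100 LOOP
    BEGIN
      -- SET LOCAL rolls back with the subtransaction on failure,
      -- so it must be re-issued on every attempt.
      SET LOCAL lock_timeout = '100ms';
      EXECUTE 'ALTER TABLE users ADD COLUMN credits bigint NOT NULL DEFAULT 0';
      RETURN;  -- success
    EXCEPTION WHEN lock_not_available THEN
      PERFORM pg_sleep(1);  -- back off, then retry
    END;
  END LOOP;
  RAISE 'could not acquire lock after 100 attempts';
END
$$;
```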
What I’d really like to do is somehow tell Postgres that I want the ALTER TABLE statement to wait for a moment when it can acquire the lock on the table without blocking other queries in the meantime. (I don’t mind if that means it waits hours until it finally reaches a moment where no other queries are touching the table; if it avoids blocking other queries, that’s absolutely an acceptable tradeoff.) Is there some way to do this – perhaps some incantation I can include in the ALTER TABLE statement, or some configuration parameter I can set to change this behaviour?
Answer:
Unfortunately, there is no great alternative to just retrying in a loop. But you can perhaps make the retrying more clever. When I need to do this and can run it in a transaction block, I take the lock explicitly, using the NOWAIT option.
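Concretely, that looks something like this (a sketch; table and column names are taken from the question):

```sql
BEGIN;
-- NOWAIT makes this fail immediately (SQLSTATE 55P03, lock_not_available)
-- instead of joining the wait queue if any conflicting lock is held.
LOCK TABLE users IN ACCESS EXCLUSIVE MODE NOWAIT;
ALTER TABLE users ADD COLUMN credits bigint NOT NULL DEFAULT 0;
COMMIT;
```

The key property is that if the LOCK fails, the statement never entered the wait queue at all, so no other queries were blocked behind it; the wrapper can simply roll back and retry later.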
but still has performance implications, since every failed attempt to run the ALTER TABLE statement still blocks queries for a few seconds.
You can set the timeout value to be lower (much lower) than a few seconds. Or you could use NOWAIT, which should be about the same thing as setting lock_timeout to its lowest possible value, but with the setting automatically going away once the lock is acquired (relevant in the case of multi-statement transactions).
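For example, with a much smaller timeout (50ms is just an illustration; pick whatever suits your workload):

```sql
-- A failed attempt now holds up other queries for at most ~50ms
-- before the ALTER TABLE aborts and leaves the wait queue.
SET lock_timeout = '50ms';
ALTER TABLE users ADD COLUMN credits bigint NOT NULL DEFAULT 0;
RESET lock_timeout;
```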
What I’d really like to do is somehow tell Postgres that I want the ALTER TABLE statement to wait for a moment when it can acquire the lock on the table without blocking other queries in the meantime.
Yeah, having some better options here would be nice. It might be contentious to figure out exactly what that would look like, though. Maybe something like MySQL’s low-priority lock, which keeps itself in the wait queue but lets other waiters jump ahead of it if they could immediately get the lock in the mode they want.