This is pretty simple, but I’m baffled by what PG does (v9.0).
We start with a simple table:
CREATE TABLE test (id INT PRIMARY KEY);
and a few rows:
INSERT INTO test VALUES (1); INSERT INTO test VALUES (2);
Using my favorite JDBC query tool (ExecuteQuery), I connect two session windows to the db where this table lives. Both of them are transactional (i.e., auto-commit = false). Let’s call them S1 and S2.
The same bit of code for each:
1: DELETE FROM test WHERE id = 1;
2: INSERT INTO test VALUES (1);
3: COMMIT;
Now, run this in slow motion, executing one at a time in the windows.
S1-1 runs (1 row deleted)
S2-1 runs (but is blocked since S1 has a write lock)
S1-2 runs (1 row inserted)
S1-3 runs, releasing the write lock
S2-1 runs, now that it can get the lock. But reports 0 rows deleted. HUH???
S2-2 runs, reports a unique key constraint violation
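For anyone who wants to reproduce this, here is the same interleaving written out as two side-by-side sessions (a sketch; any two connections with auto-commit off will do):

```sql
-- Setup (run once, in either session):
CREATE TABLE test (id INT PRIMARY KEY);
INSERT INTO test VALUES (1);
INSERT INTO test VALUES (2);

-- Session S1                      -- Session S2
BEGIN;
DELETE FROM test WHERE id = 1;     -- 1 row deleted
                                   BEGIN;
                                   DELETE FROM test WHERE id = 1;  -- blocks on S1's row lock
INSERT INTO test VALUES (1);       -- 1 row inserted
COMMIT;
                                   -- S2 unblocks and reports: DELETE 0
                                   INSERT INTO test VALUES (1);
                                   -- ERROR: duplicate key value violates
                                   -- unique constraint "test_pkey"
```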
Now, this works fine in SQLServer. When S2 does the delete, it reports 1 row deleted. And then S2’s insert works fine.
I suspect that PostgreSQL is locking the index in the table where that row exists, whereas SQLServer locks the actual key value.
Am I right? Can this be made to work?
Mat and Erwin are both right; I’m only adding another answer to expand on what they said in a way that won’t fit in a comment. Since their answers don’t seem to satisfy everyone, and there was a suggestion that the PostgreSQL developers should be consulted, and I am one, I will elaborate.
The important point here is that, under the SQL standard, within a transaction running at the READ COMMITTED transaction isolation level, the restriction is that the work of uncommitted transactions must not be visible. When the work of committed transactions becomes visible is implementation-dependent. What you are pointing out is a difference in how two products have chosen to implement that. Neither implementation is violating the requirements of the standard.
Here’s what happens within PostgreSQL, in detail:
S1-1 runs (1 row deleted)
The old row is left in place, because S1 might still roll back, but S1 now holds a lock on the row, so any other session attempting to modify the row will wait to see whether S1 commits or rolls back. Any reads of the table can still see the old row, unless they attempt to lock it with SELECT FOR UPDATE or SELECT FOR SHARE.
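You can watch this from a third session while S1’s transaction is still open after its DELETE (a sketch):

```sql
-- In a third session, while S1 has deleted id = 1 but not yet committed:
SELECT * FROM test WHERE id = 1;             -- still returns the old row
SELECT * FROM test WHERE id = 1 FOR UPDATE;  -- blocks until S1 commits or rolls back
```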
S2-1 runs (but is blocked since S1 has a write lock)
S2 now has to wait to see the outcome of S1. If S1 were to roll back rather than commit, S2 would delete the row. Note that if S1 inserted a new version before rolling back, the new version would never have been there from the perspective of any other transaction, nor would the old version have been deleted from the perspective of any other transaction.
S1-2 runs (1 row inserted)
This row is independent of the old one. If there had been an update of the row with id = 1, the old and new versions would be related, and S2 could delete the updated version of the row when it became unblocked. That a new row happens to have the same values as some row that existed in the past doesn’t make it the same as an updated version of that row.
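To see the difference, replace S1’s DELETE + INSERT with a single UPDATE; the new version is then linked to the old one, and S2’s DELETE follows the chain and succeeds. A sketch of the contrasting interleaving:

```sql
-- Session S1                      -- Session S2
BEGIN;
UPDATE test SET id = 1
  WHERE id = 1;                    -- 1 row updated: a new version, linked to the old
                                   BEGIN;
                                   DELETE FROM test WHERE id = 1;  -- blocks on S1's lock
COMMIT;
                                   -- S2 unblocks, follows the update chain to the
                                   -- new version, and reports: DELETE 1
                                   COMMIT;
```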
S1-3 runs, releasing the write lock
So S1’s changes are persisted. One row is gone. One row has been added.
S2-1 runs, now that it can get the lock. But reports 0 rows deleted. HUH???
What happens internally is that there is a pointer from one version of a row to the next version of that same row, if it is updated. If the row is deleted, there is no next version. When a READ COMMITTED transaction awakens from a block on a write conflict, it follows that update chain to the end; if the row has not been deleted and it still meets the selection criteria of the query, it will be processed. This row has been deleted, so S2’s query moves on.
S2 may or may not get to the new row during its scan of the table. If it does, it will see that the new row was created after S2’s DELETE statement started, and so it is not part of the set of rows visible to it.
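Row versions are observable from SQL through the system column ctid, which gives the physical location of the version your snapshot sees (a sketch; the literal ctid values below are just examples and will vary):

```sql
BEGIN;
SELECT ctid, id FROM test WHERE id = 2;  -- e.g. (0,2): the current version
UPDATE test SET id = 2 WHERE id = 2;     -- writes a new version at a new location
SELECT ctid, id FROM test WHERE id = 2;  -- e.g. (0,3): the old version now points here
COMMIT;
```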
If PostgreSQL were to restart S2’s entire DELETE statement from the beginning with a new snapshot, it would behave the same as SQL Server. The PostgreSQL community has not chosen to do that, for performance reasons. In this simple case you would never notice the difference in performance, but if you were ten million rows into a DELETE when you got blocked, you certainly would. There’s a trade-off here where PostgreSQL has chosen performance, since the faster version still complies with the requirements of the standard.
S2-2 runs, reports a unique key constraint violation
Of course, the row already exists. This is the least surprising part of the picture.
While there is some surprising behavior here, everything is in conformance with the SQL standard and within the bounds of what is “implementation-specific” according to the standard. It certainly can be surprising if you are assuming that some other implementation’s behavior will be present in all implementations, but PostgreSQL tries very hard to avoid serialization failures in the READ COMMITTED isolation level, and allows some behaviors that differ from other products in order to achieve that.
Now, personally I’m not a big fan of the READ COMMITTED transaction isolation level in any product’s implementation. They all allow race conditions to create surprising behaviors from a transactional point of view. Once someone becomes accustomed to the weird behaviors allowed by one product, they tend to consider those “normal” and the trade-offs chosen by another product odd. But every product has to make some sort of trade-off for any mode not actually implemented as SERIALIZABLE. Where PostgreSQL developers have chosen to draw the line in READ COMMITTED is to minimize blocking (reads don’t block writes and writes don’t block reads) and to minimize the chance of serialization failures.
The standard requires that SERIALIZABLE transactions be the default, but most products don’t do that because it causes a performance hit compared to the more lax transaction isolation levels. Some products don’t even provide truly serializable transactions when SERIALIZABLE is chosen, most notably Oracle and versions of PostgreSQL prior to 9.1. But using truly SERIALIZABLE transactions is the only way to avoid surprising effects from race conditions, and SERIALIZABLE transactions must always either block to avoid the race conditions or roll back some transactions to avoid a developing race condition. The most common implementation of SERIALIZABLE transactions is Strict Two-Phase Locking (S2PL), which has both blocking and serialization failures (in the form of deadlocks).
Full disclosure: I worked with Dan Ports of MIT to add truly serializable transactions to PostgreSQL version 9.1 using a new technique called Serializable Snapshot Isolation.
I believe this is by design, according to the description of the read-committed isolation level for PostgreSQL 9.2:
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time (1). However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it (2), otherwise it will attempt to apply its operation to the updated version of the row.
The row you insert in S1 didn’t exist yet when S2’s DELETE started, so it won’t be seen by the delete in S2, as per (1) above. The one that S1 deleted is ignored by S2’s DELETE according to (2). So in S2, the delete does nothing. When the insert comes along, though, that one does see S1’s insert:
Because Read Committed mode starts each command with a new snapshot that includes all transactions committed up to that instant, subsequent commands in the same transaction will see the effects of the committed concurrent transaction in any case. The point at issue above is whether or not a single command sees an absolutely consistent view of the database.
So the attempted insert by S2 fails with the constraint violation.
Continuing to read that doc: using repeatable read or even serializable would not solve your problem completely, because the second session would fail with a serialization error on the delete.
This would allow you to retry the transaction though.
I completely agree with @Mat’s excellent answer. I am only writing another answer because it wouldn’t fit into a comment.
In reply to your comment: the DELETE in S2 is already hooked on a particular row version. Since this version is killed by S1 in the meantime, S2 considers itself successful. Though not obvious from a quick glance, the series of events is effectively this:

1. S1 DELETE successful
2. S2 DELETE (successful by proxy: the DELETE from S1)
3. S1 re-INSERTs the deleted value, virtually in the meantime
4. S2 INSERT fails with unique key constraint violation
It’s all by design. You really need to use SERIALIZABLE transactions for your requirements, and make sure you retry on serialization failure.
Use a DEFERRABLE primary key and try again.
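For reference, the syntax looks like this (a sketch; deferring moves the uniqueness check to COMMIT time, which matters for delete-and-reinsert ordering within your own transaction — whether it resolves your exact cross-session interleaving still needs testing):

```sql
CREATE TABLE test (
    id INT PRIMARY KEY DEFERRABLE INITIALLY IMMEDIATE
);

BEGIN;
SET CONSTRAINTS test_pkey DEFERRED;  -- check uniqueness at COMMIT, not per statement
DELETE FROM test WHERE id = 1;
INSERT INTO test VALUES (1);
COMMIT;
```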
We also faced this issue. Our solution was to add a SELECT ... FOR UPDATE before the DELETE FROM ... WHERE. The isolation level must be Read Committed.
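Sketched out, the pattern for each session becomes:

```sql
BEGIN;  -- READ COMMITTED
SELECT id FROM test WHERE id = 1 FOR UPDATE;  -- take the row lock before modifying
DELETE FROM test WHERE id = 1;
INSERT INTO test VALUES (1);
COMMIT;
```

The FOR UPDATE makes the second session block on the lock up front; when it wakes, its SELECT returns 0 rows for a deleted row, so the session can detect that the row is gone before attempting its own DELETE and INSERT.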