SELECT … FOR UPDATE returns old value after waiting for lock

Question :

It seems that if SELECT ... FOR UPDATE has to wait for a lock, and meanwhile another thread changes what would be the result, and commits, then the initial query returns the old result, i.e. from before the change.

Is this expected behavior or a bug? It certainly does not seem useful.

It is easy to demonstrate. Let’s take the following table:

CREATE TABLE `orders`(  
  `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `description` VARCHAR(50),
  PRIMARY KEY (`id`)
) ENGINE=INNODB;

And this sequence of statements:

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;

SELECT MAX(id)
FROM orders
FOR UPDATE;

# PAUSE HERE

INSERT INTO orders (description)
VALUES ('bla');

COMMIT;

Now we do the following:

  1. Open two connections.
  2. On connection 1, execute up to the pause. This succeeds.
  3. On connection 2, execute up to the pause. This gets stuck waiting for a lock.
  4. On connection 1, execute the rest. This succeeds.
  5. Connection 2 has now returned the previous id!
  6. On connection 2, execute the SELECT once more. Now it returns the updated value.

So to get behavior that is sensible and desirable, it seems we are forced to execute the SELECT twice.
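
For reference, here is a minimal sketch of that double-SELECT workaround, as run on connection 2 (the first SELECT may block and return the stale value; the second returns whatever connection 1 committed):

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;

SELECT MAX(id)
FROM orders
FOR UPDATE;   # may block, then return the value from before connection 1's commit

SELECT MAX(id)
FROM orders
FOR UPDATE;   # returns the value committed by connection 1

INSERT INTO orders (description)
VALUES ('bla');

COMMIT;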

Is this expected behavior or a bug?

(Tested on MySQL 5.7.25)

Answer :

I have figured out the explanation for the behavior by doing some experiments, in which I used connection 1 to perform various other changes instead of inserting a new row. The combined results reveal how the process works.

  • If connection 1 updates the description for the id, connection 2 will see the updated description. (The index scan has halted before the locked row.)
  • If connection 1 updates the id to a smaller value, e.g. from 10 to 9, connection 2 will return the new value, 9; this case is reproduced in the sketch after this list. (The index scan has halted before the locked row, sees it disappear, and continues in a descending manner, thus arriving at 9.)
  • If connection 1 updates the id to a greater value, e.g. from 10 to 11, connection 2 will skip that value and instead return the greatest id smaller than the original one; in this example, id 9, if such a row exists. (The index scan has halted before the locked row, sees it disappear, and continues in a descending manner, thus never passing 11, but rather arriving at 9.)
  • If connection 1 updates the id to a greater value, e.g. from 10 to 11, and then inserts a new row in place of the old one, e.g. with id 10, connection 2 will select the newly inserted row. (The index scan has halted before the locked row, and will see something inserted in its place.)
  • TODO: If connection 1 updates the id to a greater value, e.g. from 10 to 20, and then inserts a new row in place of the old one, but with a greater value than the original, e.g. 11, will connection 2 see it? Most likely not, but this should be tested.
  • If connection 1 locks a MAX(id) of 10, and connection 2 selects a hard-coded id of 9, then the locks do not overlap, and both can proceed independently.
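
Here is a sketch of how the second experiment above can be reproduced, assuming the current maximum id is 10 and no row exists with id 9:

# Connection 1:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
SELECT MAX(id) FROM orders FOR UPDATE;   # returns 10 and locks that index record

# Connection 2 (blocks on the lock held by connection 1):
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
SELECT MAX(id) FROM orders FOR UPDATE;   # waits...

# Connection 1:
UPDATE orders SET id = 9 WHERE id = 10;
COMMIT;

# Connection 2 now returns 9: its reverse scan resumed at id 10,
# found no row there, and continued down to 9.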

We can conclude that MAX(id) is determined using a reverse index scan, and that the locks are taken on actual (index) rows. This makes sense. Our isolation level of READ COMMITTED does not acquire gap locks. REPEATABLE READ might, but it causes the transaction of connection 2 to fail entirely, so that scenario cannot be explored any further.
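
While connection 2 is blocked, the lock wait can be inspected on MySQL 5.7 through the information_schema tables (a sketch; MySQL 8.0 exposes the same information in performance_schema instead):

SELECT w.requesting_trx_id, w.blocking_trx_id,
       l.lock_table, l.lock_index, l.lock_mode, l.lock_data
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_LOCKS l
  ON l.lock_id = w.requested_lock_id;

The lock_data column shows the value of the index record being waited on, and lock_mode shows whether a gap lock is involved.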

Conclusion: the exact workings

Connection 2 starts reverse scanning until it hits a lock on a certain row.

It will wait for that lock to be released, and continue its scan from (and including) whatever id it left off at. Not whatever row, but whatever id.

  • As a result, if there is no longer a row at that id, the scan continues beyond it, to smaller ids.
  • As a result, if the row has been updated, the updated values are visible (if selecting more than just the id).
  • As a result, if the row has been moved and a new one inserted for its id, that new row is selected.
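
As a sketch of that last point, assume again that the current maximum id is 10 and that connection 2 is already waiting inside its SELECT MAX(id) ... FOR UPDATE:

# Connection 1:
UPDATE orders SET id = 11 WHERE id = 10;                  # the old row moves up
INSERT INTO orders (id, description) VALUES (10, 'new');  # a new row takes its place
COMMIT;

# Connection 2 resumes its scan at id 10, finds the newly inserted row there,
# and returns 10; it never sees id 11.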

This is expected behaviour. If you've started a select query, you want its results to be consistent as of the start of that query (or transaction). Because you're specifying FOR UPDATE, you're indicating the start of a transaction. When you commit, that's the end of your transaction, which is why step 6 returns the updated value.

If another session updates a row that is involved in your select and commits it, your session uses rollback data to reconstruct the data as of the start time of your transaction.

You don’t need any of this complexity; simply do:

INSERT ...;
SELECT LAST_INSERT_ID();

The value of LAST_INSERT_ID() is held in the connection, so it is impervious to any other actions going on.
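
A quick sketch of that per-connection behavior:

# Connection 1:
INSERT INTO orders (description) VALUES ('first');

# Connection 2:
INSERT INTO orders (description) VALUES ('second');

# Connection 1:
SELECT LAST_INSERT_ID();   # still returns the id generated for 'first', not 'second'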

Why do you need MAX(id)?

Case 1: You will use it in some other insert. Then the transaction is perfectly fine this way:

BEGIN;
INSERT ...;
SELECT @id := LAST_INSERT_ID();  -- and put into local variable or @variable
INSERT something else ... VALUES (..., @id, ...);
COMMIT;
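
If the value is only needed for that one follow-up insert, LAST_INSERT_ID() can also be used directly in the second statement; the order_items table below is just a hypothetical example:

BEGIN;
INSERT INTO orders (description) VALUES ('bla');
INSERT INTO order_items (order_id, item) VALUES (LAST_INSERT_ID(), 'widget');
COMMIT;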

Case 2: Well, explain your case.
