Two updates cause a deadlock

Posted on

Question :

I have a mysql queue with the following sheme:

CREATE TABLE IF NOT EXISTS `media` (
  `mid` int(11) NOT NULL AUTO_INCREMENT,
  `order_type` int(11) NOT NULL,
  `media_id` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
  `package` int(11) NOT NULL,
  `sent` int(11) NOT NULL,
  `timestamp` datetime NOT NULL,
  `t_check` int(11) NOT NULL,
  `performed_by` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  `last_run` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `flag` int(11) NOT NULL,
  `package_sent_diff` int(11) AS (package-sent) PERSISTENT,
  PRIMARY KEY (`mid`),
  KEY `last_run` (`last_run`),
  KEY `performed_by` (`performed_by`),
  KEY `sent` (`sent`),
  KEY `package` (`package`),
  KEY `sent_package` (`sent`,`package`),
  KEY `package_sent_diff` (`package_sent_diff`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=238343961;

package is the amount of work and sent is how often a task has been performed. if sent < package a job is open for workers to take it. package_sent_diff is used to index the amount that has to be done for query performance. performed_by is used to assign a job to a worker.

MariaDB version 10.0.20, the table has over 2 million rows.

Auto commit is on

Several workers run based on this queue.

To simply outline their queries work:

A worker performs an update to claim 10 unused jobs

UPDATE media 
SET performed_by = '71602155f18ac6001eb', 
  last_run = NOW() 
WHERE flag = 0 
  AND t_check < 75 
  AND sent < 20 
  AND last_run < NOW() 
  AND performed_by IS NULL 
  AND sent < 4377 
  AND package_sent_diff > 0 
LIMIT 10;

The worker then selects all the info from the table

SELECT * FROM media WHERE performed_by = '71602155f18ac6001eb'

Afterwards it performs the job and increments the sent value of the job by one and frees the row for later execution

UPDATE media SET sent = sent + 1, performed_by = NULL WHERE mid = 238323961

At this point I often see a deadlock occur on the package_sent_diff key

The simplified code:

$unique = uniqid();
$mysqli->query("UPDATE media SET performed_by = '".$unique."', last_run = NOW() WHERE flag = 0 AND t_check < 75 AND sent < 20 AND last_run < NOW() AND performed_by IS NULL AND sent < 4377 AND package_sent_diff > 0 LIMIT 10");
$query = $mysqli->query("SELECT * FROM media WHERE performed_by = '".$unique."'");

while($job = $query->fetc_assoc()) {
    if(doJob($job)) {
        $mysqli->query("UPDATE media SET sent = sent + 1, performed_by = NULL WHERE mid = ".$job['mid']);
    }
}

Here’s the result from SHOW ENGINE INNODB STATUS

------------------------
LATEST DETECTED DEADLOCK
------------------------
2015-09-10 14:09:22 7f7f27fb6700
*** (1) TRANSACTION:
TRANSACTION 2544765863, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1184, 3 row lock(s), undo log entries 1
MySQL thread id 725683, OS thread handle 0x7f7f50fb6700, query id 989929718 localhost media_queue updating
UPDATE media SET sent = sent + 1, performed_by = NULL WHERE mid = 238323961
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2565 page no 59448 n bits 768 index `package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 13  trx id 2544765863 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 
*** (2) TRANSACTION:
TRANSACTION 2544765389, ACTIVE 1 sec fetching rows, thread declared inside InnoDB 2350
mysql tables in use 1, locked 1
3029 lock struct(s), heap size 357928, 25327 row lock(s)
MySQL thread id 725748, OS thread handle 0x7f7f27fb6700, query id 989929439 localhost media_queue updating
UPDATE media SET performed_by = '71602155f18ac6001eb', last_run = NOW() WHERE flag = 0 AND t_check < 75 AND sent < 20 AND last_run < NOW() AND performed_by IS NULL AND sent < 4377 AND package_sent_diff > 0 LIMIT 10
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2565 page no 59448 n bits 768 index `package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 13  trx id 2544765389 lock_mode X lock hold time 0 wait time before grant 0 
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2565 page no 55655 n bits 1192 index `package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 13  trx id 2544765389 lock_mode X waiting lock hold time 0 wait time before grant 0 
*** WE ROLL BACK TRANSACTION (1)

It seems the query to claim jobs and the query to update the sent column by one conflict each other causing a deadlock.

The more workers are running the more often I see a deadlock

I often read an application has to expect deadlocks and should simply re-issue queries that cause a deadlock but this didn’t bring any results for me.

No matter how often I repeated the UPDATE media SET sent = sent + 1, performed_by = NULL WHERE mid = 238323961 query, 3 times or 20 times, it always resulted in a deadlock if a deadlock was already encountered for that query.

How could I change things to get rid of that deadlock? It’s important that the queries stay fast so I can run 100+ workers on that table without making it slow-down too much.

//EDIT:

After adding the composite index

ALTER TABLE  `media_queue`.`media` ADD INDEX  `performed_by_package_sent_diff` (  `performed_by` ,  `package_sent_diff` ) COMMENT  '';

and modifying the second query to

UPDATE media SET sent = sent + 3, performed_by = NULL WHERE mid = 243674295 AND performed_by = '92817855f2e21978b76'

I’m seeing 10 times less deadlocks, but deadlocks still do occur.

Here’s the SHOW ENGINE INNODB STATUS output:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2015-09-11 15:48:29 7f7f487ff700
*** (1) TRANSACTION:
TRANSACTION 2585082648, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1184, 3 row lock(s), undo log entries 1
MySQL thread id 748336, OS thread handle 0x7f7f255b7700, query id 1018467234 localhost media_queue updating
UPDATE media SET sent = sent + 1, performed_by = NULL WHERE mid = 244045072 AND performed_by = '11514555f2f24e094e7'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2587 page no 61522 n bits 784 index `performed_by_package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 41  trx id 2585082648 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 
*** (2) TRANSACTION:
TRANSACTION 2585081924, ACTIVE 2 sec fetching rows, thread declared inside InnoDB 1852
mysql tables in use 1, locked 1
4559 lock struct(s), heap size 521768, 66353 row lock(s)
MySQL thread id 748490, OS thread handle 0x7f7f487ff700, query id 1018466800 localhost media_queue Searching rows for update
UPDATE media SET performed_by = '35359855f2f4a5a7eaf', last_run = NOW() WHERE flag = 0 AND t_check < 75 AND sent < 20 AND last_run < NOW() AND performed_by IS NULL AND sent < 4370 AND package_sent_diff > 0 LIMIT 10
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 2587 page no 61522 n bits 784 index `performed_by_package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 41  trx id 2585081924 lock_mode X lock hold time 1 wait time before grant 0 
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2587 page no 73537 n bits 520 index `performed_by_package_sent_diff` of table `media_queue`.`media` trx table locks 1 total table locks 41  trx id 2585081924 lock_mode X waiting lock hold time 0 wait time before grant 0 
*** WE ROLL BACK TRANSACTION (1)

Also in 90% of the cases I can force the query through if the query gets repeated after a deadlock occurs.

Is this as far as optimization can go or is it possible to completely get rid of the deadlocks?

Answer :

Adding ORDER BY mid might prevent deadlocks.

But my real answer is “live with it”. That is, recognize that you got a deadlock and simply rerun the UPDATE again. You were slowed down a little, but otherwise no harm done.

Unrelated:

KEY `sent` (`sent`),
KEY `sent_package` (`sent`,`package`),

The first of those is redundant and can be DROPped.

The answer should have been revise the update where you select read the primary keys related to the updates where clause and then individually update every row using its primary key.

I work in an environment of around 100 mariadb databases, approximately 32-40 cpu each, 384gbram, and each node has 6tb to 33tb mariadb databases.
Literally doing 300,000 to 500,000’s transactions per second depending on volumes.

The only way many services can play in a SUPER HIGH VOLUME environment is for 100% all Updates and Deletes SIMPLY MUST BE BY PRIMARY KEY meaning you only take individual row level locks. That is the finest grain set of locking available on todays mysql databases.

Leave a Reply

Your email address will not be published.