Question :
A while ago, I spent a nightmarish number of weeks dealing with “deadlock detected” errors and trying to figure out how to handle them. I ended up handling it in such a manner that my code detects when a deadlock happens and then retries the same query indefinitely, waiting 50,000 microseconds between retries, until it succeeds.
Maybe that’s bad practice, but so far (months), it has not caused any problems except for logging the “deadlock detected” so-called “errors”.
Is it “OK” for me to now suppress the “deadlock detected” errors by marking them as “unimportant” and thus not showing up to me, even if they are still logged into my error log table?
Please don’t tell me to “avoid them in the first place”. That is simply not possible. They apparently happen whenever you have concurrent work (multiple processes/script instances) on the same table/thing. I spent forever trying to “code them away”, but it just doesn’t seem possible.
Obviously, since I’m asking this instead of just adding the ignore rule and being done with it, I do care about the answers/responses. Still, I don’t think I can be convinced at this point that they can be avoided completely. I’m not saying that I get thousands of them logged every hour or anything, but a few each day, seemingly always at the beginning, when I have lots of concurrent processes working on the same table/query.
Answer :
A deadlock is a kind of serialization error: you didn’t do anything forbidden, it just so happened that there was an interaction with other active transactions that prevented your transaction from being completed. Your reaction is the correct one: retry the transaction. There is absolutely no need to wait before retrying.
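The retry pattern can be sketched as follows. This is a minimal, driver-agnostic sketch: the `DeadlockDetected` class here is a stand-in for your database driver’s actual deadlock error (in PostgreSQL, SQLSTATE 40P01), and `run_with_retry` and `flaky_transaction` are hypothetical names for illustration.

```python
class DeadlockDetected(Exception):
    """Stand-in for the driver's 'deadlock detected' error (SQLSTATE 40P01)."""

def run_with_retry(transaction, max_attempts=10):
    """Run `transaction` again from the start whenever it deadlocks.

    No sleep between attempts: the deadlock was already resolved by
    aborting this transaction, so retrying immediately is fine.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise  # give up after being the victim too many times

# Demo: a "transaction" that deadlocks twice, then succeeds.
attempts = {"n": 0}

def flaky_transaction():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockDetected()
    return "committed"

print(run_with_retry(flaky_transaction))  # prints "committed"
```

The important detail is that the *whole* transaction is rerun from the start, not just the statement that failed, because everything the aborted transaction did was rolled back.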
I concur that with a sufficiently complicated workload of bigger transactions it can be nigh impossible to rule out deadlocks completely. As long as they happen only rarely, they are not a real problem, provided you handle them correctly.
Deadlocks start becoming a problem if they happen too frequently: it means that you have to redo much work, which is bad for performance and puts extra load on your database. Also, the one-second wait (the default deadlock_timeout) before the deadlock is detected and resolved means that locks are held for a long time (a second is long), which is not great for concurrency.
Even if you cannot completely get rid of deadlocks, you can take measures to reduce them:
- keep your transactions short
- try to reduce the number of data modifications per transaction
Both will reduce the likelihood of running into a deadlock.
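One concrete way to apply both measures is to split one huge update into several small transactions, each touching only a batch of rows. A hypothetical sketch, where `run_batch` stands in for code that opens a transaction, updates the given rows, and commits:

```python
def chunked(ids, batch_size):
    """Yield successive slices of `ids` with at most `batch_size` items each."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]

def update_in_batches(ids, batch_size, run_batch):
    """Run one short transaction per batch instead of one huge transaction.

    `run_batch` is a placeholder: in real code it would BEGIN,
    UPDATE the rows in the batch, and COMMIT.
    """
    for batch in chunked(ids, batch_size):
        run_batch(batch)

# Demo with a recording stub instead of a real database.
seen = []
update_in_batches(list(range(10)), 4, seen.append)
print(seen)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each batch holds its locks only briefly and on far fewer rows, which shrinks the window in which two transactions can lock rows in conflicting orders. The trade-off is that the overall operation is no longer atomic.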
It is safe to ignore deadlock errors in your log file, but then you should monitor pg_stat_database.deadlocks
and take action if the deadlock count per hour increases beyond an acceptable number.
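Since pg_stat_database.deadlocks is a cumulative counter, monitoring means sampling it periodically and computing the rate between samples. A minimal sketch, assuming the counts come from something like `SELECT deadlocks FROM pg_stat_database WHERE datname = current_database()`; the threshold of 10 per hour is an arbitrary example, not a recommendation:

```python
def deadlocks_per_hour(prev_count, curr_count, interval_seconds):
    """Hourly rate of new deadlocks between two samples of the counter."""
    return (curr_count - prev_count) * 3600.0 / interval_seconds

def should_alert(prev_count, curr_count, interval_seconds, threshold_per_hour=10):
    # threshold_per_hour is an example value; pick whatever is acceptable for you.
    return deadlocks_per_hour(prev_count, curr_count, interval_seconds) > threshold_per_hour

# 5 new deadlocks in 10 minutes -> 30 per hour, above a threshold of 10.
print(should_alert(100, 105, 600))  # True
```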
You seem to object to calling a deadlock an error. By definition, an error is a condition that aborts the execution of an SQL statement, so a deadlock clearly is an error.