Question :
The quote below from Wikipedia (here) says that READ COMMITTED
doesn’t cause serialization failures! Can you please explain the reason? Which isolation levels can cause serialization failures?
Under multiversion concurrency control, at the SERIALIZABLE isolation
level, both SELECT queries see a snapshot of the database taken at the
start of Transaction 1. Therefore, they return the same data. However,
if Transaction 2 then attempted to UPDATE that row as well, a
serialization failure would occur and Transaction 1 would be forced to
roll back.

At the READ COMMITTED isolation level, each query sees a snapshot of
the database taken at the start of each query. Therefore, they each
see different data for the updated row. No serialization failure is
possible in this mode (because no promise of serializability is made),
and Transaction 1 will not have to be retried.
Answer :
It’s impossible to give a brief answer to this.
I suggest that you start reading here:
- Think twice before dropping ACID and throw your CAP away – follow the links, particularly to jepsen.io
- What developers don’t know about transactions will hurt them!
- Martin Kleppmann’s take on isolation levels (quoted in the 2nd link – and his explanation)
- Recommended further reading: Kleppmann’s book (which I’ve purchased) is highly regarded! Of 1525 reviews, 94% are 4 (8%) or 5 (86%) star rated – that’s the best I’ve ever seen!
If you can read and grasp all of that material (and the links within – the book is the cherry on the cake but not strictly required just to answer this question), you’ll be well on the way to understanding transactions and isolation levels!
Edit:
In response to the OP’s comments – by serialization failure, what I believe the author of the wiki article (which is here – please quote references in future if possible) means is that in the first case T1 will fail because the isolation level is set to SERIALIZABLE, whereas in the second case a level of READ COMMITTED means that T1 will not fail.
The important thing to realise about SERIALIZABLE transactions is that (from here):
a transaction schedule is serializable if its outcome (e.g., the
resulting database state) is equal to the outcome of its transactions
executed serially, i.e. without overlapping in time.
So, T1 will fail because its view of the data can’t be kept consistent over the entire duration of the transaction (i.e. across all of its queries) due to the modification made by T2.
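To make that concrete, here is a minimal two-session sketch of the failing case (PostgreSQL-flavoured syntax; the accounts table, its values and the exact error text are illustrative assumptions, not something taken from the wiki article):

```sql
-- Session 1 (T1):
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;      -- returns 100

-- Session 2 (T2), meanwhile:
BEGIN;
UPDATE accounts SET balance = 50 WHERE id = 1;
COMMIT;

-- Back in Session 1 (T1):
SELECT balance FROM accounts WHERE id = 1;      -- still 100: T1 sees its snapshot
UPDATE accounts SET balance = 200 WHERE id = 1;
-- ERROR: could not serialize access due to concurrent update
-- T1 is rolled back and has to be retried.
```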
Something that I have found (and this confused me for a while until I teased it through) is that there is a BIG DIFFERENCE between a query and a transaction – a query may or may not be a proper subset of a transaction. On most modern RDBMSs, if you don’t explicitly specify a BEGIN TRANSACTION or an isolation level, the defaults are that:

- a) a transaction is started for you automatically beneath the hood and,
- b) that transaction runs at READ COMMITTED – which normally doesn’t matter much since it’s only one query (see the sketch just below).
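As a rough illustration of that difference (syntax varies between servers; this is PostgreSQL-flavoured and the accounts table is hypothetical):

```sql
-- A bare statement: the server wraps it in its own implicit transaction
-- and commits it immediately (autocommit), at the server's default
-- isolation level (READ COMMITTED on many systems).
UPDATE accounts SET balance = balance - 10 WHERE id = 1;

-- An explicit multi-statement transaction where the developer chooses
-- the isolation level:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
UPDATE accounts SET balance = balance + 10 WHERE id = 2;
COMMIT;
```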
The distinction really becomes important when you start running multi-query transactions – that’s when the wheat is sorted from the chaff, and the developer has to understand what’s going on in the server regarding transaction isolation levels.
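By way of contrast with the SERIALIZABLE sketch above, here is what the same interleaving looks like when T1 runs at READ COMMITTED (again PostgreSQL-flavoured, same hypothetical table):

```sql
-- Session 1 (T1):
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT balance FROM accounts WHERE id = 1;      -- returns 100

-- Session 2 (T2), meanwhile:
BEGIN;
UPDATE accounts SET balance = 50 WHERE id = 1;
COMMIT;

-- Back in Session 1 (T1):
SELECT balance FROM accounts WHERE id = 1;      -- now returns 50: each query gets a fresh snapshot
UPDATE accounts SET balance = 200 WHERE id = 1; -- succeeds: no serialization failure
COMMIT;
```

No promise of serializability is made, so T1 never has to be retried – but it also has to live with its two SELECTs returning different answers.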
As with all things in IT, and databases in particular, there are balances, trade-offs and compromises. Why, you might ask, is SERIALIZABLE not the default and/or not always specified? The answer is simple: performance – there’s your trade-off!
I would urge you now to read the references I’ve provided – do some experiments with two CLI windows on your server of choice (or multiple servers if you really want to get into it!) running some of the code referenced!