Question:
We have a table,
CREATE TABLE [dbo].[MyTable](
[MasterKey] [uniqueidentifier] NOT NULL,
[DetailKey] [varchar](100) NULL,
[JSON] [nvarchar](max) NOT NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [ix_MyTable_details] ON [dbo].[MyTable]
(
[MasterKey] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
And we have the query below, which is causing too many deadlocks/blocks.
IF EXISTS(SELECT 1 from [MyTable](nolock) where [MasterKey]= @MasterKey AND [DetailKey] = @DetailKey)
BEGIN
UPDATE [MyTable]
SET [JSON] = @JSON
WHERE [MasterKey]= @MasterKey AND [DetailKey] = @DetailKey
END
ELSE
BEGIN
INSERT INTO [MyTable]
([MasterKey]
,[DetailKey]
,[JSON])
VALUES
(@MasterKey
,@DetailKey
,@JSON)
END
Note that different users use the MasterKey at the same time, but at peak times we see the application become unresponsive. When we check the logs we find a lot of blocks/deadlocks. We also see Sch-S locks on this table.
Will adding MasterKey and DetailKey as the primary key fix this issue, or what else can we do?
Note: sometimes the JSON is huge because it includes images as base64.
Answer:
In general, you should always change this approach:
IF EXISTS ( /* scan the table with a where clause */ )
UPDATE ( /* scan the table AGAIN with a where clause */ )
ELSE
INSERT
To this:
UPDATE ( /* scan the table JUST ONCE with a where clause */ )
IF @@ROWCOUNT = 0
INSERT
Forget you ever learned the first approach. It’s like going to the grocery store to check if they have any eggs, then going back home to get your wallet, and back to the grocery store again to buy your eggs.
Also, yes, a clustered index can help in many scenarios. Is MasterKey really a key (e.g. is it unique)? Or is the combination of MasterKey + DetailKey the candidate key? In the latter case, an index (clustered or not) with both columns as key columns will help reduce any residual searching the query has to do to locate a specific row after it finds the rows with that MasterKey value (which prolongs blocking and can lead to deadlocks depending on other factors).
Actual use case. First, change the indexes:
DROP INDEX [ix_MyTable_details] ON [dbo].[MyTable];
CREATE UNIQUE CLUSTERED INDEX cix_MyTable_details
ON dbo.MyTable(MasterKey, DetailKey);
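If creating the unique index fails because duplicate (MasterKey, DetailKey) pairs already exist, a quick sketch like this can locate them first so you can decide how to clean them up:
-- Find any (MasterKey, DetailKey) pairs that would violate the unique index
SELECT MasterKey, DetailKey, COUNT(*) AS DuplicateCount
FROM dbo.MyTable
GROUP BY MasterKey, DetailKey
HAVING COUNT(*) > 1;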
Then change the query to:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
UPDATE dbo.[MyTable]
SET [JSON] = @JSON
WHERE MasterKey = @MasterKey AND DetailKey = @DetailKey;
IF @@ROWCOUNT = 0
BEGIN
INSERT dbo.[MyTable]
([MasterKey]
,[DetailKey]
,[JSON])
VALUES
(@MasterKey
,@DetailKey
,@JSON);
END
COMMIT TRANSACTION;
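If you would rather not change the session's isolation level, the same effect can be scoped to just this statement with table hints; here is a sketch of that variant:
BEGIN TRANSACTION;
-- UPDLOCK + SERIALIZABLE takes and holds a key-range lock on the target key,
-- so a concurrent session working on the same key waits here instead of deadlocking later
UPDATE dbo.MyTable WITH (UPDLOCK, SERIALIZABLE)
SET [JSON] = @JSON
WHERE MasterKey = @MasterKey AND DetailKey = @DetailKey;
IF @@ROWCOUNT = 0
BEGIN
INSERT dbo.MyTable (MasterKey, DetailKey, [JSON])
VALUES (@MasterKey, @DetailKey, @JSON);
END
COMMIT TRANSACTION;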
Will adding MasterKey and DetailKey as the primary key fix this issue, or what else can we do?
Almost. The current deadlocks are probably being caused by excessive scanning due to not having a proper key on the table. In addition, you should lock the target row/range with the first query to ensure you have the locks necessary to perform the insert.
That is, the way to avoid deadlocks is to lock more, and earlier. Deadlocks only occur when two sessions first acquire compatible locks, and then later try to acquire incompatible locks. If you make them acquire the incompatible locks up front, the deadlock disappears.
For example:
begin transaction
IF EXISTS(SELECT 1 from [MyTable] WITH (UPDLOCK, HOLDLOCK) where [MasterKey]= @MasterKey AND [DetailKey] = @DetailKey)
BEGIN
UPDATE [MyTable]
SET [JSON] = @JSON
WHERE [MasterKey]= @MasterKey AND [DetailKey] = @DetailKey
END
ELSE
BEGIN
INSERT INTO [MyTable]
([MasterKey]
,[DetailKey]
,[JSON])
VALUES
(@MasterKey
,@DetailKey
,@JSON)
END
commit transaction
Forcing JSON into a TEXT or BLOB column can often cause blocking, especially with bigger images, simply because they are large objects, with all the issues associated with that. The most common are concurrency problems, runaway disk space, and transaction log growth.
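As a quick way to see how large those payloads actually are, an illustrative query against the table from the question (just a sketch):
-- Show the largest JSON payloads currently stored
SELECT TOP (10)
    MasterKey,
    DetailKey,
    DATALENGTH([JSON]) AS JsonBytes
FROM dbo.MyTable
ORDER BY DATALENGTH([JSON]) DESC;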
Depending on your isolation level on that table, a reader has to wait for the value to be written before it can be read. If you can serve the previously written value while the row is being updated, you can use RCSI or a different isolation level, which will help with reads. Otherwise, break up the app a bit so you can get more concurrency. This can be done by moving the offending data to a different schema, going with a NoSQL engine for your JSON and image files, using FILESTREAM for image files (though that has some other considerations), or anything else that isolates this workload.
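For example, RCSI is enabled at the database level; a sketch (where [YourDatabase] is a placeholder for your database name, and WITH ROLLBACK IMMEDIATE rolls back open transactions, so run it in a quiet window):
-- Readers see the last committed version of a row instead of blocking on writers
ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;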
I’ve had developers create zip files as binary data in a blob column to hold massive, redundant JSONs. They caused blocking, and in several cases isolation levels had to be looked at to fix it.
I really like Aaron’s answer and he’s gone through the details. From my experience, we had to handle very large JSON or BLOBs (such as image or zip files stored in a column) through isolation levels for numerous reasons, including cases where the keys couldn’t be changed or the sizes were still too large even when optimized.