Question :
How do we improve this Update query Performance as currently it is taking 1 Minute to update 17643 Rows.
Query ->
Declare @date datetime = '09-Dec-2013'
update #tmp_sp_abc
set test = rtrim(xyz_test)
from #tmp_sp_abc t1,
t_h_bg_pc_it t2
where (t2.id_i = t1.i or t2.id_s1 = t1.s)
and t1.r_type = 1
and t2.[date] = @date
Tables Row Count: –
#tmp_sp_abc -> 125352
t_h_bg_pc_it -> 14798 Rows
t_h_bg_pc_it
table has 300 columns with primary key on id_i column
and #tmp_sp_abc
has 11 columns with no primary key and no indexes.
found that “OR” condition is the root cause of this much time consumption but, can’t change it.
tried to add indexes on: –
Table: - t_h_bg_pc_it
Columns: - [xyz_test], [id_i], [id_s1], [date]
Table: - #tmp_sp_abc
Columns: - [i], [s], [r_type] include [test]
but, by doing this saved only 5 seconds.
Attaching the Execution Plan Snaps (Without above indexes and with indexes).
Please advice.
Answer :
The best option if you really want to squeeze out the most is to parallelise the query.
Instead of a single UPDATE, run multiple copies. Each of them operate on a subset of the data in the temporary table. Keep adding more copies and more subsets until you run out of cores of CPU (whatever happens first).
If you don’t have the ability to parallelise. Run two UPDATES, one that works on the first part of the OR, another on the second. If possible, add indexes on t_h_bg_pc_it
to support both queries. Like this:
Declare @date datetime = '09-Dec-2013'
update #tmp_sp_abc
set test = rtrim(xyz_test)
from #tmp_sp_abc t1
join t_h_bg_pc_it t2
on t2.id_i = t1.i
where t1.r_type = 1
and t2.[date] = @date
update #tmp_sp_abc
set test = rtrim(xyz_test)
from #tmp_sp_abc t1
join t_h_bg_pc_it t2
on tt2.id_s1 = t1.s
where t1.r_type = 1
and t2.[date] = @date
You can now index t_h_bg_pc_it
on both s, date, xyz_test
and id_id, date, xyz_test
(two indexes). This should speed things up. It may also be useful to index #tmp_sp_abc
on r_type
. If possible, do both the parallelism and the indexing/rewrite.
Finally: Fire whoever it is that comes up with naming conventions for your tables. Life is too short for this type of cryptic naming, SQL Server does auto-completion.
Another answer because it actually is another approach. Have you just tried removing the or in the JOIN?
where (t2.id_i = t1.i or t2.id_s1 = t1.s)
and t1.r_type = 1
and t2.[date] = @date
You can:
Reformulate it without aan OR in the top line
WHERE
(t2.id_i = t1.i and t1.r_type = 1 and t2.[date] = @date)
OR
(t2.id_s1 = t1.s and t1.r_type = 1 and t2.[date] = @date)
and give that a try.
Alternatively you could try a double join – t2, t3 instead of t2 – and split the conditions there and see what happens.
Both have a chance to allow the query optimizer to work out different paths.
Theoreitcally they should be identically, but I am not sure how much they optimize joins for 2 ORed conditions.
That said, also try to formulate that in the more moden ANSI syntax (JOIN ON).