Question :
I am running the execution plan for the following query:
select m_uid from EmpTaxAudit
where clientid = 91682
and empuid = 42100176452603
and newvalue in('Deleted','DB-Deleted','Added')
Here is the execution plan:
I have a non clustered index on EmpTaxAudit Table on ClientId and NewValue columns which shows above as 14.9% of the execution:
CREATE NONCLUSTERED INDEX [idx_EmpTaxAudit_clientid_newvalue] ON [dbo].
[EmpTaxAudit]
(
[ClientID] ASC,
[NewValue] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
I also have a non clustered unique index PK as follows:
ALTER TABLE [dbo].[EmpTaxAudit] ADD CONSTRAINT [PK_EmpTaxAudit] PRIMARY KEY NONCLUSTERED
(
[ClientID] ASC,
[EmpUID] ASC,
[m_uid] ASC,
[m_eff_start_date] ASC,
[ReplacedOn] ASC,
[ColumnName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
Trigger code in source table EmpTax:
CREATE trigger [dbo].[trins_EmpTax]
on [dbo].[emptax]
for insert
as
begin
declare
@intRowCount int,
@user varchar(30)
select @intRowCount = @@RowCount
IF @intRowCount > 0
begin
select @user = suser_sname()
insert EmpTaxAudit (Clientid, empuid,m_uid,m_eff_start_date, ColumnName, ReplacedOn, ReplacedBy, OldValue,dblogin,newvalue)
select Clientid, empuid,m_uid,m_eff_start_date,'taxcode', getdate(),IsNull(userid,@user), '', Left(@user,15),'Added'
from inserted i
where m_uid not in (select m_uid from EmpTaxAudit
where clientid = i.clientid and (newvalue = 'Deleted'
or newvalue = 'DB-Deleted'
or newvalue = 'Added') and empuid = i.empuid)
and i.m_eff_end_date is null
insert EmpTaxAudit (Clientid, empuid,m_uid,m_eff_start_date, ColumnName, ReplacedOn, ReplacedBy, OldValue,dblogin,newvalue)
select Clientid, empuid,m_uid,m_eff_start_date,'taxcode', getdate(),IsNull(userid,@user), '', Left(@user,15),'Deleted'
from inserted i
where m_uid not in (select m_uid from EmpTaxAudit
where clientid = i.clientid and (newvalue = 'Deleted'
or newvalue = 'DB-Deleted'
or newvalue = 'Added') and empuid = i.empuid)
and i.m_eff_end_date is not null
end
end
What can I do to avoid the high cost of Hash Match (Inner Join)?
Thanks!
Answer :
For the 1st query, an index that uses all three columns from the WHERE
clause and includes the column from the SELECT
list would be much more useful:
-- index suggestion A
(clientid, empuid, newvalue) INCLUDE (m_uid)
or an index targeted specifically for this query:
-- index suggestion B
(clientid, empuid, m_uid)
WHERE newvalue in ('Deleted', 'DB-Deleted', 'Added')
Regarding the trigger, some comments:
- The first query you show does not appear in the trigger. What appears is a join from that table to the
inserted
rows to another table (which has the trigger). - My suggestion B above seems better suited to be used by the trigger.
-
The trigger has 2 almost identical
insert
statements. Why? I think they could be combined in one – and simpler – insert and usingNOT EXISTS
instead ofNOT IN
:insert EmpTaxAudit ( Clientid, empuid, m_uid, m_eff_start_date, ColumnName, ReplacedOn, ReplacedBy, OldValue, dblogin, newvalue ) select Clientid, empuid, m_uid, m_eff_start_date, 'taxcode', getdate(), IsNull(userid,@user), '', Left(@user,15), case when m_eff_end_date is null then 'Added' else 'Deleted' end from inserted i where not exists ( select 1 from EmpTaxAudit where m_uid = i.m_uid and clientid = i.clientid and empuid = i.empuid and newvalue in ('Deleted', 'DB-Deleted', 'Added') ) ;
Reason for Hash Match (inner join) :
“a Hash Match join occurs when SQL Server joins two tables by hashing
the rows from the smaller of the two tables to be joined, and then
inserting them into a hash table, then processing the larger table one
row at a time against the smaller hashed table, looking for matches
where rows need to be joined. Because the smaller of the tables
provides the values in the hash table, the table size is kept at a
minimum, and because hashed values instead of real values are used,
comparisons can be made very quickly. As long as the table that is
hashed is relatively small, this can be a quick process. On the other
hand, if both tables are very large, a Hash Match join can be very
inefficient as compared to other types of joins.”
So clearly your smaller table return 16 rows and larger one return 1,41,000 rows.Here smaller table is hash which is quite good, now it will join to 1,41,000 rows so cost has to be more and execution time has to be slow.
you have to minimize number of rows return in large table.you are getting index seek in both case so index is being utilize.
I think you only need to rewrite trigger query.
CREATE trigger [dbo].[trins_EmpTax]
on [dbo].[emptax]
for insert
as
begin
if @@rowcount = 0
return
set nocount on
declare
@intRowCount int,
@user varchar(30)
--select @intRowCount = @@RowCount
if exists(select * from inserted)
begin
select @user = suser_sname()
insert EmpTaxAudit (Clientid, empuid,m_uid,m_eff_start_date, ColumnName, ReplacedOn, ReplacedBy, OldValue,dblogin,newvalue)
select Clientid, empuid,m_uid,m_eff_start_date,'taxcode', getdate(),IsNull(userid,@user), '', Left(@user,15)
,case when i.m_eff_end_date is null then 'Added' else 'Deleted' END
from inserted i
where not exists (select m_uid from EmpTaxAudit ETA
where i.m_uid=ETA.m_uid and empuid = i.empuid and clientid = i.clientid
and newvalue in('Deleted','DB-Deleted' , 'Added')
)
--insert EmpTaxAudit (Clientid, empuid,m_uid,m_eff_start_date, ColumnName, ReplacedOn, ReplacedBy, OldValue,dblogin,newvalue)
--select Clientid, empuid,m_uid,m_eff_start_date,'taxcode', getdate(),IsNull(userid,@user), '', Left(@user,15),'Deleted'
-- from inserted i
-- where m_uid not in (select m_uid from EmpTaxAudit
-- where clientid = i.clientid and (newvalue = 'Deleted'
-- or newvalue = 'DB-Deleted'
-- or newvalue = 'Added') and empuid = i.empuid)
-- and i.m_eff_end_date is not null
end
end
Notice that you can correct condition inside NOT EXISTS
clause which is minor,rest it will work perfect and perform better.
Also care to explain that what is clustered index on this table and why so many composite primary key.?
First critique: I don’t think a PRIMARY KEY
should consist of so many fields. A primary key is just meant to be a simple identifier of uniqueness. Arguably you could have two fields in your PK for many-to-many intersection tables, though I personally always have a single-field primary key, even for these. So I’d say your PK PK_EmpTaxAudit
really should be redefined as a unique index.
Then I would suggest adding newvalue
to the INCLUDE
fields of that index:
create unique index IX_EmpTaxAudit(
[ClientID] ASC,
[EmpUID] ASC,
[m_uid] ASC,
[m_eff_start_date] ASC,
[ReplacedOn] ASC,
[ColumnName] ASC
)
include (newvalue)
I suspect that will give you a neater execution plan.
First of all you should have a clustered index (with FILLFACTOR<=95) on the tables to avoid high IO. Because HEAP cause to more fragmentation. So, you can replace PK Non Clustered Index with Clustered.
Apart from that to avoid HASH_MATCH you should have all the columns being used in SELECT statement and WHERE clause should be within a index. So either you replace the non clustered index idx_EmpTaxAudit_clientid_newvalue with bellow one ore create new non clustered index.
CREATE NONCLUSTERED INDEX [idx_EmpTaxAudit_clientid_newvalue] ON [dbo].
[EmpTaxAudit]
(
[ClientID] ASC,
[NewValue] ASC,
[newvalue]
)
INCLUDE (m_uid)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
Reason of HASH_MATCH
SQL Server Optimizer prefer to use index instead of HEAP. So, in your case requested data and columns in where condition lies on two different index. That’s why SQL Server Optimizer used both indexes and performed HASH_MATCH.
Thanks