Hash Match inner join in simple query with in statement

Posted on

Question :

I am running the execution plan for the following query:

select m_uid from EmpTaxAudit 
where clientid = 91682 
and  empuid = 42100176452603
and newvalue in('Deleted','DB-Deleted','Added')

Here is the execution plan:

Execution Plan

I have a non clustered index on EmpTaxAudit Table on ClientId and NewValue columns which shows above as 14.9% of the execution:

CREATE NONCLUSTERED INDEX [idx_EmpTaxAudit_clientid_newvalue] ON [dbo].

[EmpTaxAudit]
(
    [ClientID] ASC,
    [NewValue] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

I also have a non clustered unique index PK as follows:

ALTER TABLE [dbo].[EmpTaxAudit] ADD  CONSTRAINT [PK_EmpTaxAudit] PRIMARY KEY NONCLUSTERED 
(
    [ClientID] ASC,
    [EmpUID] ASC,
    [m_uid] ASC,
    [m_eff_start_date] ASC,
    [ReplacedOn] ASC,
    [ColumnName] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

Trigger code in source table EmpTax:

CREATE                trigger [dbo].[trins_EmpTax]
on [dbo].[emptax]
for insert
as
begin
  declare 
    @intRowCount  int,
      @user          varchar(30)

  select @intRowCount = @@RowCount
  IF @intRowCount > 0 
  begin
     select @user = suser_sname()

     insert EmpTaxAudit (Clientid, empuid,m_uid,m_eff_start_date, ColumnName, ReplacedOn, ReplacedBy, OldValue,dblogin,newvalue)
     select Clientid, empuid,m_uid,m_eff_start_date,'taxcode', getdate(),IsNull(userid,@user), '', Left(@user,15),'Added'
        from inserted i
         where m_uid not in (select m_uid from EmpTaxAudit 
                    where clientid = i.clientid  and (newvalue = 'Deleted'
            or newvalue = 'DB-Deleted'
                        or newvalue = 'Added') and  empuid = i.empuid)
       and i.m_eff_end_date is null 

     insert EmpTaxAudit (Clientid, empuid,m_uid,m_eff_start_date, ColumnName, ReplacedOn, ReplacedBy, OldValue,dblogin,newvalue)
     select Clientid, empuid,m_uid,m_eff_start_date,'taxcode', getdate(),IsNull(userid,@user), '', Left(@user,15),'Deleted'
        from inserted i
         where m_uid not in (select m_uid from EmpTaxAudit 
                    where clientid = i.clientid  and (newvalue = 'Deleted'
            or newvalue = 'DB-Deleted'
                        or newvalue = 'Added') and  empuid = i.empuid)
       and i.m_eff_end_date is not null 


  end
end

What can I do to avoid the high cost of Hash Match (Inner Join)?

Thanks!

Answer :

For the 1st query, an index that uses all three columns from the WHERE clause and includes the column from the SELECT list would be much more useful:

-- index suggestion A
(clientid, empuid, newvalue) INCLUDE (m_uid)

or an index targeted specifically for this query:

-- index suggestion B
(clientid, empuid, m_uid)
WHERE newvalue in ('Deleted', 'DB-Deleted', 'Added')

Regarding the trigger, some comments:

  • The first query you show does not appear in the trigger. What appears is a join from that table to the inserted rows to another table (which has the trigger).
  • My suggestion B above seems better suited to be used by the trigger.
  • The trigger has 2 almost identical insert statements. Why? I think they could be combined in one – and simpler – insert and using NOT EXISTS instead of NOT IN:

    insert EmpTaxAudit 
      ( Clientid, empuid, m_uid, m_eff_start_date, ColumnName, 
        ReplacedOn, ReplacedBy, OldValue, dblogin, 
        newvalue
      )
    select 
        Clientid, empuid, m_uid, m_eff_start_date, 'taxcode',
        getdate(), IsNull(userid,@user), '', Left(@user,15),
        case when m_eff_end_date is null 
            then 'Added' else 'Deleted'
        end
    from inserted i
    where not exists 
          ( select 1 
            from EmpTaxAudit 
            where m_uid = i.m_uid 
              and clientid = i.clientid  
              and empuid = i.empuid
              and newvalue in ('Deleted', 'DB-Deleted', 'Added')
          ) ;
    

Reason for Hash Match (inner join) :

“a Hash Match join occurs when SQL Server joins two tables by hashing
the rows from the smaller of the two tables to be joined, and then
inserting them into a hash table, then processing the larger table one
row at a time against the smaller hashed table, looking for matches
where rows need to be joined. Because the smaller of the tables
provides the values in the hash table, the table size is kept at a
minimum, and because hashed values instead of real values are used,
comparisons can be made very quickly. As long as the table that is
hashed is relatively small, this can be a quick process. On the other
hand, if both tables are very large, a Hash Match join can be very
inefficient as compared to other types of joins.”

So clearly your smaller table return 16 rows and larger one return 1,41,000 rows.Here smaller table is hash which is quite good, now it will join to 1,41,000 rows so cost has to be more and execution time has to be slow.

you have to minimize number of rows return in large table.you are getting index seek in both case so index is being utilize.

I think you only need to rewrite trigger query.

CREATE  trigger [dbo].[trins_EmpTax]
on [dbo].[emptax]
for insert
as
begin

if @@rowcount = 0 
        return
    set nocount on

  declare 
    @intRowCount  int,
      @user          varchar(30)

  --select @intRowCount = @@RowCount
  if exists(select * from inserted)
  begin
     select @user = suser_sname()

     insert EmpTaxAudit (Clientid, empuid,m_uid,m_eff_start_date, ColumnName, ReplacedOn, ReplacedBy, OldValue,dblogin,newvalue)
     select Clientid, empuid,m_uid,m_eff_start_date,'taxcode', getdate(),IsNull(userid,@user), '', Left(@user,15)
     ,case when i.m_eff_end_date is null then 'Added' else 'Deleted' END
        from inserted i
         where not exists (select m_uid from EmpTaxAudit ETA
                    where i.m_uid=ETA.m_uid and  empuid = i.empuid and clientid = i.clientid  
                    and newvalue in('Deleted','DB-Deleted' , 'Added') 
                    )


     --insert EmpTaxAudit (Clientid, empuid,m_uid,m_eff_start_date, ColumnName, ReplacedOn, ReplacedBy, OldValue,dblogin,newvalue)
     --select Clientid, empuid,m_uid,m_eff_start_date,'taxcode', getdate(),IsNull(userid,@user), '', Left(@user,15),'Deleted'
     --   from inserted i
     --    where m_uid not in (select m_uid from EmpTaxAudit 
     --               where clientid = i.clientid  and (newvalue = 'Deleted'
     --       or newvalue = 'DB-Deleted'
     --                   or newvalue = 'Added') and  empuid = i.empuid)
     --  and i.m_eff_end_date is not null 


  end
end

Notice that you can correct condition inside NOT EXISTS clause which is minor,rest it will work perfect and perform better.

Also care to explain that what is clustered index on this table and why so many composite primary key.?

First critique: I don’t think a PRIMARY KEY should consist of so many fields. A primary key is just meant to be a simple identifier of uniqueness. Arguably you could have two fields in your PK for many-to-many intersection tables, though I personally always have a single-field primary key, even for these. So I’d say your PK PK_EmpTaxAudit really should be redefined as a unique index.

Then I would suggest adding newvalue to the INCLUDE fields of that index:

create unique index IX_EmpTaxAudit(
  [ClientID] ASC,
  [EmpUID] ASC,
  [m_uid] ASC,
  [m_eff_start_date] ASC,
  [ReplacedOn] ASC,
  [ColumnName] ASC
)
include (newvalue)

I suspect that will give you a neater execution plan.

First of all you should have a clustered index (with FILLFACTOR<=95) on the tables to avoid high IO. Because HEAP cause to more fragmentation. So, you can replace PK Non Clustered Index with Clustered.

Apart from that to avoid HASH_MATCH you should have all the columns being used in SELECT statement and WHERE clause should be within a index. So either you replace the non clustered index idx_EmpTaxAudit_clientid_newvalue with bellow one ore create new non clustered index.

    CREATE NONCLUSTERED INDEX [idx_EmpTaxAudit_clientid_newvalue] ON [dbo].

    [EmpTaxAudit]
    (
        [ClientID] ASC,
        [NewValue] ASC,
        [newvalue]
    )
    INCLUDE (m_uid)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

Reason of HASH_MATCH

SQL Server Optimizer prefer to use index instead of HEAP. So, in your case requested data and columns in where condition lies on two different index. That’s why SQL Server Optimizer used both indexes and performed HASH_MATCH.

Thanks

Leave a Reply

Your email address will not be published. Required fields are marked *