Why is Index causing bad estimation and execution plan

Posted on

Question :

SELECT ISNULL(SUM(CASE WHEN T4.f_in =12 THEN T4.f_money END),0) - 
       ISNULL(SUM(CASE WHEN T4.f_out=12 THEN T4.f_money END),0) 
 FROM [Credit] M WITH(NOLOCK) 
INNER JOIN [Transaction] T4 WITH(NOLOCK) ON M.f_accounts=T4.f_accounts
 AND (T4.f_time>=M.f_time OR M.f_time IS NULL)
 AND T4.f_other IS NULL 
 AND (T4.f_in=12 OR T4.f_out=12)

After adding an index, one of the queries that originally finishes within 1s now takes around 12s. It seems that the index is now seeked by the query with bad estimation of only 1 row to be read. No other indexes are present in both Credit and Transaction tables.

CREATE NONCLUSTERED INDEX IX_Credit_f_time 
    ON Credit (f_time) INCLUDE (f_accounts,f_credit)

Above is the index added to the table.

enter image description here

Plan without index, takes 1s
enter image description here

Plan with the index, takes 12s. The highlighted parts are number of rows read, estimated number of rows and estimated number of rows to be read respectively.

Statistics are updated with fullscan but results are still the same.

enter image description hereenter image description here
From my understanding the f_time column is pretty concentrated, hash joining the table once would be much more efficient than seeking multiple times.

Full Execution Plan: https://www.brentozar.com/pastetheplan/?id=rJjwvXBA_

Backup Execution Plan: https://pastebin.com/XLwAGXSf

Answer :

I believe this index is confusing SQL Server. The join conditions include

T4.f_time>=M.f_time

This type of join predicate is generally hard for SQL Server to reason about (greater or smaller, with columns on both sides). The index makes things worse, because it’s first key column is f_time and other columns are only included. So for every row from Transaction table it seeks all rows where T4.f_time>=M.f_time and only then applies other predicates. And here it goes wrong, because optimizer guesses that for every seek there will be one row read. Where actually it’s 57237 on average (89290306 rows read divided by 1560 executions). This is easily visible in the execution plan:

enter image description here

In my opinion the index should be dropped and following one created (as suggested by others in comments):

CREATE NONCLUSTERED INDEX IX_Credit
ON Credit
(f_accounts,f_time)

That way a seek can be done on both of these columns and it should be much more selective. Do note the order of the columns is important here.

As a general note I try to put columns where equality predicate is used first and columns with inequality as last in indexes.

Leave a Reply

Your email address will not be published. Required fields are marked *