Which indexes for joining a table onto itself on two columns

Posted on

Question :

I am using SQL Server 2016, and trying to join a table onto itself, but the query is very slow, and it seems its because SQL Server is refusing to use the indexes.

If you’ll excuse the crudity of this example, I have a table like this…

TABLE [Balances] (
    account_id NCHAR(20),
    balance_date DATE,
    balance INT,
    next_balance_date DATE
)

… and I essentially want to do this query:

SELECT [t1].*, [t2].[balance] [next_balance]
FROM [Balances] [t1]
LEFT JOIN [Balances] [t2]
ON [t1].[next_balance_date] = [t2].[balance_date]
AND [t1].[account_id] = [t2].[account_id]

The problem is that no matter what index I try to put onto this table, SQL Server insists on doing a hash match. Which is slow on this large database.

And I currently have the following indexes:

NONCLUSTERED INDEX [idx_acc_date] ON [Balances]​ ([account_id] ASC, [balance_date] ASC)
NONCLUSTERED INDEX [idx_acc_nextdate] ON [Balances]​ ([account_id] ASC, [next_balance_date] ASC)​

What indexes should I be creating to speed-up this query?

(As a small note on the data – all of the balance_dates are as at month end, and therefore the same goes for the next_balance_date. Also, each account_id only has one entry at each balance_date.)

Answer :

There are 2 options:
1. Create clustered index by account_id. Delete all other indexes.
2. Add new column Balance_ID INT IDENTITY(1,1) and create clustered index on that column. Create another non-clustered index on ONLY one column account_id.

The second option is more preferable because it is much faster. You join INT only by 4 bytes. While your current queries join by 40 bytes. That means your current operation is 10 times more expensive.

Also, ask yourself couple of questions:
– Do you really need have Account_ID as Unicode?
– Can you convert Account_ID to INT or BIGINT?
Hopefully, you’ve got my point.

ADDITION:

  1. You create clustered index on only one column. Technically, you can do it on multiple, but the main goal to have it as short as possible.
  2. If you created clustered index on account_id you DO NOT NEED any other indexes on that column.
  3. I can guess that your performance improved because before there was no Clustered index and indexes were so bad that SQL decided to do full table scan instead.
  4. Suggestion: read a book about Indexes, their differences and how they work.

Leave a Reply

Your email address will not be published. Required fields are marked *