Question :
I am using SQL Server 2016, and trying to join a table onto itself, but the query is very slow, and it seems its because SQL Server is refusing to use the indexes.
If you’ll excuse the crudity of this example, I have a table like this…
TABLE [Balances] (
account_id NCHAR(20),
balance_date DATE,
balance INT,
next_balance_date DATE
)
… and I essentially want to do this query:
SELECT [t1].*, [t2].[balance] [next_balance]
FROM [Balances] [t1]
LEFT JOIN [Balances] [t2]
ON [t1].[next_balance_date] = [t2].[balance_date]
AND [t1].[account_id] = [t2].[account_id]
The problem is that no matter what index I try to put onto this table, SQL Server insists on doing a hash match. Which is slow on this large database.
And I currently have the following indexes:
NONCLUSTERED INDEX [idx_acc_date] ON [Balances] ([account_id] ASC, [balance_date] ASC)
NONCLUSTERED INDEX [idx_acc_nextdate] ON [Balances] ([account_id] ASC, [next_balance_date] ASC)
What indexes should I be creating to speed-up this query?
(As a small note on the data – all of the balance_dates are as at month end, and therefore the same goes for the next_balance_date. Also, each account_id only has one entry at each balance_date.)
Answer :
There are 2 options:
1. Create clustered index by account_id. Delete all other indexes.
2. Add new column Balance_ID INT IDENTITY(1,1)
and create clustered index on that column. Create another non-clustered index on ONLY one column account_id
.
The second option is more preferable because it is much faster. You join INT
only by 4 bytes. While your current queries join by 40 bytes. That means your current operation is 10 times more expensive.
Also, ask yourself couple of questions:
– Do you really need have Account_ID as Unicode?
– Can you convert Account_ID to INT or BIGINT?
Hopefully, you’ve got my point.
ADDITION:
- You create clustered index on only one column. Technically, you can do it on multiple, but the main goal to have it as short as possible.
- If you created clustered index on
account_id
you DO NOT NEED any other indexes on that column. - I can guess that your performance improved because before there was no Clustered index and indexes were so bad that SQL decided to do full table scan instead.
- Suggestion: read a book about Indexes, their differences and how they work.