Question:
Good day, I have the following SQL Server database table, [dbo].[SensorDataRaw].
Please note the compound primary key. This was done for 3 reasons:
- Prevent duplicate entries
- Improve query performance, as all queries will filter on all three of those key columns.
- We needed an index, and I did not want to introduce a random ID.
Please also note that this table was designed with its size in mind; it is going to store millions and millions of rows of data.
OK, now for my actual question. I am using Azure SQL to host this database, and I have enabled automatic tuning. Strangely enough, I see that it then went and created a new index (see below).
In my mind this seems to be a duplicate index, as the same columns are being indexed.
So I now have two indexes on my table:
Original (My PK):
ALTER TABLE [dbo].[SensorDataRaw] ADD CONSTRAINT [PK_SensorDataRaw] PRIMARY KEY CLUSTERED
(
[DateTime] ASC,
[SensorId] ASC,
[Key] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
Newly Added (auto-created by Azure automatic tuning):
CREATE NONCLUSTERED INDEX [nci_wi_SensorDataRaw_DC9789077DA75B4440AC8BFE3E2AA198] ON [dbo].[SensorDataRaw]
(
[Key] ASC,
[SensorId] ASC,
[DateTime] ASC
)
INCLUDE ( [Value]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
Observations:
- The order of the columns has been reversed in the new index.
- The new index is NOT unique
- The new index includes the Value column.
Please note that my knowledge of indexes is not advanced, hence my asking this.
So my questions are:
- Can someone explain why the newly added index is better than my initially created one?
- How can I remove the two indexes and just create one that covers both cases? With this being such a massive table, I cannot afford the space that both of these indexes will take up.
- Is there maybe a better design alternative?
Additional Info:
I'm assuming the type of queries becomes important here, so I have listed some examples.
All queries include DateTime, SensorId, and Key.
Simple Queries:
Select SensorId where the average Value for Key w is greater than x and DateTime is between (y, z).
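In T-SQL that query would look roughly like this (the parameter names @Key, @Threshold, @DateFrom and @DateTo are just placeholders):

-- Sensors whose average Value for a given Key exceeds a threshold within a time window.
SELECT [SensorId]
FROM [dbo].[SensorDataRaw]
WHERE [Key] = @Key
  AND [DateTime] BETWEEN @DateFrom AND @DateTo
  AND [Value] IS NOT NULL
GROUP BY [SensorId]
HAVING AVG([Value]) > @Threshold;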
Graphing Data:
SELECT AVG([Value]) AS 'AvgValue',
       DATEADD(MINUTE,
               (DATEDIFF(MINUTE, '1990-01-01T00:00:00', [dbo].[SensorDataRaw].[DateTime]) / @IntervalInMinutes) * @IntervalInMinutes,
               '1990-01-01T00:00:00'
       ) AS 'TimeGroup'
FROM [dbo].[SensorDataRaw]
WHERE [dbo].[SensorDataRaw].[SensorId] = @SensorId
  AND [dbo].[SensorDataRaw].[Key] = @KeyValue
  AND [dbo].[SensorDataRaw].[DateTime] BETWEEN @DateFrom AND @DateTo
  AND [dbo].[SensorDataRaw].[Value] IS NOT NULL
GROUP BY (DATEDIFF(MINUTE, '1990-01-01T00:00:00', [dbo].[SensorDataRaw].[DateTime]) / @IntervalInMinutes)
Answer:
The index suggested by the system is a much better fit for the query you have shown. You should aim to have columns with equality predicates as the leading columns.
Consider a phone book ordered by lastname, firstname. If your requirement is to find all people with surnames between "Brown" and "Yates" and a first name of "John", then you need to read most of the phone book. If the phone book was instead ordered by firstname, lastname, you could easily find the "John" section and the first "Brown" in that section; then all you need to do is read the names until the lastname is after "Yates" or a new firstname is encountered.
It might not be the ideal index, though. Potentially you should just change the key columns of the clustered index to this order rather than keeping the newly created one. You need to evaluate this based on knowledge of your workload.
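For illustration, a minimal sketch of that option, assuming nothing else (such as a foreign key) depends on PK_SensorDataRaw, and keeping in mind that rebuilding the clustered index on a table this size rewrites every row, so plan for a maintenance window:

-- Drop the auto-created index first, so it is not needlessly rebuilt when the
-- clustered index is dropped.
DROP INDEX [nci_wi_SensorDataRaw_DC9789077DA75B4440AC8BFE3E2AA198] ON [dbo].[SensorDataRaw];
GO
-- Re-create the primary key with the suggested key order (equality columns first).
ALTER TABLE [dbo].[SensorDataRaw] DROP CONSTRAINT [PK_SensorDataRaw];
GO
ALTER TABLE [dbo].[SensorDataRaw] ADD CONSTRAINT [PK_SensorDataRaw] PRIMARY KEY CLUSTERED
(
    [Key] ASC,
    [SensorId] ASC,
    [DateTime] ASC
);
GO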
Can someone explain why the newly added index is better than my initially created one?
Your index is a clustered one, which means it contains all of the table's data at its leaf level. The index offered by the server is narrower than the clustered index, as its leaf level contains only the key columns plus the included column. Your query needs only the key columns plus the Value column, so the non-clustered index the server offered you is sufficient, and it was chosen by the server as the narrowest possible covering index.
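If you want to see how much narrower it actually is, a quick sketch against the standard catalog views shows the on-disk size of every index on the table:

-- Approximate size of each index on SensorDataRaw (8 KB pages).
SELECT i.name AS index_name,
       i.type_desc,
       SUM(ps.used_page_count) * 8 / 1024.0 AS used_mb,
       SUM(ps.row_count) AS row_count
FROM sys.indexes AS i
JOIN sys.dm_db_partition_stats AS ps
    ON ps.object_id = i.object_id
   AND ps.index_id = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.SensorDataRaw')
GROUP BY i.name, i.type_desc;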
How can I remove the two indexes and just create one that covers both cases?
You can remove the non-clustered index; your clustered index is of course a covering one, and if it carries only two additional fields (a datetime2(2) and a decimal(19,4)) at the leaf level compared to the non-clustered index, there will be no huge overhead in using it instead of the non-clustered one.
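To check that "no huge overhead" assumption before dropping anything, you can compare the average leaf row size of the two indexes; a sketch is below (SAMPLED mode still reads roughly 1% of the pages, so run it off-peak on a table this large):

-- Compare leaf-level row width and page count of the clustered PK
-- versus the auto-created non-clustered index.
SELECT i.name AS index_name,
       ips.avg_record_size_in_bytes,
       ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.SensorDataRaw'), NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id;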
The suggested index is narrower, which will make it more efficient when the query only touches columns covered by that index.
The column order is also different which MAY make it more efficient for that particular query (I can’t tell for sure without seeing an execution plan). Column order DOES make a difference.
As for which is better: the non-clustered index may be better for that particular query, but it won't cover everything. Besides, the clustered index can't be removed anyway, and I don't see any problem with keeping both.
Try experimenting with the two: force the clustered index and check the execution plan and IO stats, then force the non-clustered index and compare.
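Something along these lines (run it off-peak or against a copy; the index hints are only for the comparison, not for production code):

-- Declare @SensorId, @KeyValue, @DateFrom, @DateTo as in your graphing query,
-- then compare the 'logical reads' reported in the Messages tab for each run.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Force the clustered primary key:
SELECT AVG([Value])
FROM [dbo].[SensorDataRaw] WITH (INDEX([PK_SensorDataRaw]))
WHERE [SensorId] = @SensorId
  AND [Key] = @KeyValue
  AND [DateTime] BETWEEN @DateFrom AND @DateTo
  AND [Value] IS NOT NULL;

-- Force the auto-created non-clustered index:
SELECT AVG([Value])
FROM [dbo].[SensorDataRaw] WITH (INDEX([nci_wi_SensorDataRaw_DC9789077DA75B4440AC8BFE3E2AA198]))
WHERE [SensorId] = @SensorId
  AND [Key] = @KeyValue
  AND [DateTime] BETWEEN @DateFrom AND @DateTo
  AND [Value] IS NOT NULL;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;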