Question :
I have a SQL Server 2008 R2 table whose schema structure looks as follows:
CREATE TABLE [dbo].[CDSIM_BE]
(
[ID] [bigint] NOT NULL,
[EquipmentID] [varchar](50) NOT NULL,
[SerialNumber] [varchar](50) NULL,
[PyrID] [varchar](50) NULL,
[MeasMode] [varchar](50) NULL,
[ReadTime] [datetime] NOT NULL,
[SubID] [varchar](15) NULL,
[ProbePosition] [float] NULL,
[DataPoint] [int] NULL,
CONSTRAINT [PK_CDSIM_BE]
PRIMARY KEY CLUSTERED ([ID] ASC, [EquipmentID] ASC, [ReadTime] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [MonthlyArchiveScheme9]([ReadTime])
) ON [MonthlyArchiveScheme9]([ReadTime])
CREATE NONCLUSTERED INDEX [idx_CDSIM_BE__SubID_ProbePosition]
ON [dbo].[CDSIM_BE] ([SubID] ASC, [ProbePosition] ASC)
INCLUDE ([EquipmentID], [ReadTime], [BECorr])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [MonthlyArchiveScheme9]([ReadTime])
CREATE NONCLUSTERED INDEX [IX_CDSIM_BE_ProbePosition]
ON [dbo].[CDSIM_BE] ([ProbePosition] ASC)
INCLUDE ([SerialNumber], [SubID])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [MonthlyArchiveScheme9]([ReadTime])
CREATE NONCLUSTERED INDEX [IX_CSDIM_Readtime]
ON [dbo].[CDSIM_BE]([ReadTime] ASC)
INCLUDE ([EquipmentID])
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [MonthlyArchiveScheme9]([ReadTime])
And I’m executing this simple query:
Select Max(Id)
From dbo.CDSIM_BE
There are ~2.5B rows in the table.
The query plan shows an index scan being done on the IX_CdSIM_BE_ProbePosition
index. I’m wondering why SQL Server simply wouldn’t use the clustered (and primary) index and immediately go to the last row in the table and retrieve the Id value, since that must be the maximum.
Answer :
The clustered index is partitioned on ReadTime
so it couldn’t use the PK as you describe. It would need to find the Max(Id)
for each partition and then find the max of those. It is possible to rewrite the query to get such a plan however.
Using an example based on the article here a possible rewrite might be
SELECT MAX(ID) AS ID
FROM sys.partitions AS P
CROSS APPLY (SELECT MAX(ID) AS ID
FROM [dbo].[CDSIM_BE]
WHERE $PARTITION.MonthlyArchiveFunction9(ReadTime)
= P.partition_number) AS A
WHERE P.object_id = OBJECT_ID('dbo.CDSIM_BE')
AND P.index_id <= 1;
To process each partition in turn.
Note the plan still has a scan (with a seek predicate to select the partition) but this is not a full scan of the partition.
The scan is in index order with direction “BACKWARD”. The TOP
iterator can stop requesting rows from the scan after the first one is received.