Question :
I would like to tune this query: it is reading more rows than I would like. I have added an index as suggested by the Database Engine Tuning Advisor.
The query goes like this:
SET STATISTICS TIME, IO ON;
SELECT TOP 1000 project,
Max(timestamp) timestamp
FROM the_table
WHERE deviceid = 581
GROUP BY project
The IO and time statistics are as follows:
Table 'the_table'. Scan count 1, logical reads 2373, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row affected)
SQL Server Execution Times:
CPU time = 390 ms, elapsed time = 829 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
The execution plan can be found here.
The dta_index this query seeks on consists of projectnumber and timestamp. The table has roughly 4,500,000 rows, consuming approximately 350 MB, and each deviceid has roughly the same number of records (i.e. a uniform distribution, with each device having roughly 700K records).
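For illustration, the DTA-suggested index might have looked something like this; this is a guess, since the exact definition is not given in the question, only that the plan seeks on deviceid and the index carries projectnumber and timestamp:
CREATE NONCLUSTERED INDEX dta_index        -- hypothetical name and shape
    ON dbo.the_table (DeviceID)
    INCLUDE (Project, [TimeStamp]);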
My problem: even though I am requesting only the first 1,000 rows from the table, no matter what index I have created or how I have tried to re-write the query, I always end up reading +/- 700K rows (i.e. all the rows for the deviceid selected in the WHERE clause). I would be grateful for any tips on how to tune this query.
I don't think there is a way of narrowing this query down faster than touching the approximately 700K rows for a particular deviceid with the first operator of the execution plan.
Here’s a minimal reproducible example, including the table definition and insertion of two sample rows:
CREATE TABLE [dbo].[the_table]
(
[primary_key_col] [int] IDENTITY(1,1) NOT NULL,
[Project] [int] NULL,
[DeviceID] [int] NULL,
[TimeStamp] [datetime] NULL,
[Quantity] [int] NOT NULL,
PRIMARY KEY CLUSTERED
(
[primary_key_col] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];
SET IDENTITY_INSERT [dbo].[the_table] ON;
INSERT [dbo].[the_table] ([primary_key_col], [Project], [DeviceID], [TimeStamp], [Quantity]) VALUES (4825107, 1232, 587, CAST(N'2021-10-23T05:48:48.000' AS DateTime), 1);
INSERT [dbo].[the_table] ([primary_key_col], [Project], [DeviceID], [TimeStamp], [Quantity]) VALUES (4825108, 6458, 581, CAST(N'2021-10-23T05:49:17.000' AS DateTime), 1);
SET IDENTITY_INSERT [dbo].[the_table] OFF;
| primary_key_col | Project | DeviceID | TimeStamp | Quantity |
|---|---|---|---|---|
| 4825107 | 1232 | 587 | 2021-10-23 05:48:48.000 | 1 |
| 4825108 | 6458 | 581 | 2021-10-23 05:49:17.000 | 1 |
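To reproduce the row counts discussed above, one could bulk-load uniform test data along these lines; this is only a sketch, and the device list, project spread, and timestamps are made up:
-- Load ~700K rows per device for six devices (~4.2M rows total),
-- approximating the uniform distribution described in the question.
WITH n AS (
    SELECT TOP (700000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT dbo.the_table (Project, DeviceID, [TimeStamp], Quantity)
SELECT n.i % 5000,                          -- arbitrary project spread
       d.DeviceID,
       DATEADD(SECOND, n.i, '20210101'),    -- synthetic timestamps
       1
FROM n
CROSS JOIN (VALUES (581), (587), (600), (601), (602), (603)) AS d(DeviceID);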
The indexes are deliberately not part of the question: no matter which index I tried, I always hit ~700K rows for a particular deviceID.
Thanks.
Answer :
“My problem: even though I am requesting only the first 1,000 rows from the table”
You are not. You are reading the first 1000 results (without any ordering) of the grouped query; TOP 1000 comes last in the logical processing order.
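For reference, here is a simplified view of how your query is logically processed, following standard SQL semantics:
-- 1. FROM the_table
-- 2. WHERE deviceid = 581      -- every row for the device qualifies (~700K)
-- 3. GROUP BY project          -- MAX(timestamp) is computed per project
-- 4. SELECT project, MAX(timestamp)
-- 5. TOP 1000                  -- only now are 1000 (arbitrary) groups kept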
If you want to take a “random” 1000 rows that match the deviceid predicate, then you could use a subquery (note that MAX(timestamp) is then computed over just that arbitrary sample, so it is not necessarily each project's true maximum):
select project, max(timestamp) as timestamp
from (select top 1000 project, timestamp
      from the_table
      where deviceid = 581) sq
group by project;
Sorry, but from my point of view you should re-think your query logic, because it gets all rows matching the deviceid and takes the top 1000 of those matching rows, not the TOP 1000 from the table. You can see this in your execution plan as well: on the right-hand side there is a seek on deviceid, and TOP is the last operator, on the left. That is why you read this many rows.
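To make that re-think concrete, here is one direction it could take, shown purely as an illustrative sketch rather than as the answer's recommendation: with a supporting index keyed on (DeviceID, Project, TimeStamp), each per-project MAX can collapse to a single seek instead of contributing to a ~700K-row range scan. The index name is made up, and whether the optimizer chooses seeks here would need to be verified against the real plan:
-- Hypothetical supporting index for per-project MAX lookups.
CREATE NONCLUSTERED INDEX IX_the_table_device_project_ts
    ON dbo.the_table (DeviceID, Project, [TimeStamp]);

-- Enumerate the device's projects, then seek the latest timestamp per project.
SELECT TOP 1000
       p.Project,
       x.ts AS [timestamp]
FROM (SELECT DISTINCT Project
      FROM dbo.the_table
      WHERE DeviceID = 581) AS p
CROSS APPLY (SELECT MAX(t.[TimeStamp]) AS ts
             FROM dbo.the_table AS t
             WHERE t.DeviceID = 581
               AND t.Project = p.Project) AS x;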