Question :
Suppose I have a table:
CREATE TABLE [dbo].[TestTable]
(
[Id] INT NOT NULL PRIMARY KEY CLUSTERED IDENTITY(1,1),
[Timestamp] DATETIME2(7) NOT NULL DEFAULT (SYSUTCDATETIME()),
[Data] INT NOT NULL
);
And suppose I need to look up rows by ID
(e.g. for joins) and by Timestamp
(e.g. for date-range search).
The usual way to do this would be to add an index on Timestamp
, but this seems like a massive waste of resources (almost a complete copy of the table) when Timestamp
should be in order in the table in the first place. So two questions:
- Is there a way to enforce
Timestamp
being strictly ascending along withId
? (i.e. to make sure pathological cases like changing the server’s system clock, leap seconds,Id
‘s Identity-ness being overruled). - Is there a way for SQL Server to make use of the above enforcement and treat the table as indexed on both
Id
andTimestamp
?
I suspect the answer to (1) may be yes, but that (2) is probably a no, rendering (1) a bit useless, but I’m happy to be proven wrong.
Any version of SQL Server acceptable
Answer :
You can create a clustered index on both id and timestamp columns and even define different sorting order for each column if you want to:
create unique clustered index idx on TestTable(id asc, [timestamp] asc)
But it won’t make much sense in this case for these reasons:
- The engine does not know that timestamp increases along with id, so if you run a query with condition on timestamp, it’ll still perform a full scan.
- All additional columns that are included in clustered index will also be included in all non-clustered indexes.
- An older timestamp still could be inserted into the table with newest id.
So you are gaining nothing and only wasting space when you create a new index and slowing down inserts as timestamp order still will have to be maintained.
Creating an index only on timestamp column seems like a waste of resources (2 out of three columns of entire table will be included in an index), but actually it’s very common for size of all indexes be greater than the size of the table. Slow queries are also wasting a lot of resources, so just create it if it’ll boost select queries and if slower inserts will not be a problem.
To finally answer your question: as you can’t implement this logic using indexes and constraints, the only way I see is to create an INSTEAD OF INSERT Trigger. It’ll hurt insert performance, but prior to insert you can check if older timestamp already exists in the table and then modify the timestamp, not insert the data, raise an error – up to you.
You can’t apply two diffrent sort orders (by diffrent column) using only one index. An index is a structure which store a data on a disk in logical order (B-tree approach) based on columns from index declaration.
It is worth pointing out, that a storage isn’t usually problem nowdays and you should focus on CPU, network and engine (IO operations) efforts. And even in this case, you didn’t say that, you just said that it would be waste of resources, but it isn’t. In many cases, use of additional space for indexes gives your CPU, network and SQL Engine relief when quering data. Sometimes it is good solution for deal with concurrency or deadlock issues too.
In your case, if you want to ensure ability to identify unique row, you have to create Id
column. You shouldn’t rely on Timestamp
in this matter.
Let’s consider following approaches.
Solution 1
- Create clustered index on
Timestamp
column. - Create Primary Key as unique nonclustered index only on
Id
column (Timestamp
will be included as clustered index key).
You said about range scans with Timestamp
column, so better to have Data
content in the same index (range means lot of data which should be availabe without lookup). This requires for example clustered index on Timestamp
column (which is WHERE predicate). This will be most efficient approach to query data this way.
For other queries which use Id
column, engine will use second nonclustered index. But if Data
column is required, it will cause Key Lookup
(additional effort for CPU, engine, IO).
It’s quite good resolution if your queries mainly use Timestamp
as WHERE predicate and require Data
column. Still there is some effort when querying by Id
column.
In terms of storage, one row have 16 bytes for clustered index and 12 bytes for nonclustered index (28 bytes), so let’s think about something similar.
Solution 2
- Create Primary Key as unique clustered index only on
Id
column. Most narrow option. - Create nonclustered index on
Timestamp
column and includeData
column (Id
will be included as clustered index key).
Best way in terms of query performance would be to create two covering indexes. In this approach there will be no need to lookup when quering data by Id
or Timestamp
column, just smooth read from appropriate index (seek or scan).
Compared to Solution 1 there is additional space used (4 bytes), but we gain benefits when query data (no key lookups).
Remarks
Always create clustered index firstly, creating it as last one enforce rebouild all nonclustered indexes (replace RID with Clustered Key).
It’s also worth noting, that it will be some impact on insert/update/delete operations in both solutions presented above. But it would be acceptable looking at the structure and row size, and still benefits should compensate it.
And one more thing.
It looks like you want implement time series database in SQL Server.
There are complex solution to achive this like Azure Time Series Insights, InfluxDB database or others similar database engine. If you want to stick with SQL Server, you can look for articels about it.