Given some table with a primary key, e.g.:
CREATE TABLE Customers ( CustomerID int NOT NULL PRIMARY KEY, FirstName nvarchar(50), LastName nvarchar(50), Address nvarchar(200), Email nvarchar(260) --... )
we have a unique primary key on
Traditionally i might then need some additional covering indexes; for example to quickly find a user by either
CREATE INDEX IX_Customers_CustomerIDEmail ON Customers ( CustomerID, Email )
And these are the kinds of indexes i’ve created for decades.
It’s not required to be unique, but it actually is
The index itself exists to avoid a table scan; it is a covering index in order to aid performance (the index is not there as a constraint to enforce uniqueness).
Today i remembered a tid-bit of information – SQL Server can use the fact that:
- a column has a foreign key constraint
- a column has a unique index
- a constraint is trusted
in order to help it optimize its query execution. In fact, from SQL Server Index Design Guide:
If the data is unique and you want uniqueness enforced, creating a unique index instead of a nonunique index on the same combination of columns provides additional information for the query optimizer that can produce more efficient execution plans. Creating a unique index (preferably by creating a UNIQUE constraint) is recommended in this case.
Given that my multi-column index contains the primary key, this composite index will de facto be unique. It’s not a constraint that i particularly need SQL Server to enforce during every insert or update; but the fact is that this non-clustered index is unique.
Is there any advantage in marking this de facto unique index as actually unique?
CREATE UNIQUE INDEX IX_Customers_CustomerIDEmail ON Customers ( CustomerID, Email )
It seems to me that SQL Server could be smart enough to realize that my index already is unique by virtue of the fact that it contains the primary key.
- But perhaps it doesn’t know this, and there’s an advantage for the optimizer if i declare the index as unique anyway.
- Except perhaps that might now lead to slowdowns during inserts and updates, where it must perform uniqueness checks – where before it never had to before.
- Unless it knows the index is guaranteed to already be unique, because it contains the primary key.
I cannot find any guidance from Microsoft about what to do when a composite index contains the primary key.
The benefits of unique indexes include the following:
- Data integrity of the defined columns is ensured.
- Additional information helpful to the query optimizer is provided.
Should i mark a composite index as unique if it already contains the primary key? Or can SQL Server figure out this for itself?
Should I mark a composite index as unique if it already contains the primary key?
Probably not. The optimizer can generally use information about the uniqueness of the contained key column anyway, so there’s no real advantage.
There is also an important consequence of marking an index unique on update plans that modify keys of that index to consider:
CREATE TABLE dbo.Customers ( CustomerID int NOT NULL PRIMARY KEY, FirstName nvarchar(50), LastName nvarchar(50), [Address] nvarchar(200), Email nvarchar(260) ); CREATE NONCLUSTERED INDEX IX_Customers_CustomerIDEmail ON dbo.Customers ( CustomerID, Email ); -- Pretend we have some rows UPDATE STATISTICS dbo.Customers WITH ROWCOUNT = 100000, PAGECOUNT = 20000;
Per-index update plan (non-unique index)
UPDATE dbo.Customers SET Email = N'New', [Address] = 'New Address' WHERE Email = N'Old' OPTION (QUERYTRACEON 8790); -- Per-index update plan
The optimizer often makes a cost-based decision between updating nonclustered indexes per-row (a ‘narrow’ plan) or per-index (a ‘wide’ plan). The default strategy (except for in-memory OLTP tables) is a wide plan.
Narrow plans (where nonclustered indexes are maintained at the same time as the heap/clustered index) are a performance optimization for small updates. This optimization is not implemented for all cases – using certain features (like indexed views) means that the associated index(es) will be maintained in a wide plan.
More information: Optimizing T-SQL Queries that Change Data
In this case, I have used undocumented trace flag 8790 to force a wide update plan: The plan therefore shows the clustered and nonclustered indexes being maintained separately.
The Split turns each update into a separate delete & insert pair; the Filter filters out any rows that would not result in a change to the index.
More information: (Non-updating updates) by the SQL Server QO Team.
Per-index update plan (unique index)
-- Same index, but unique CREATE UNIQUE INDEX IX_Customers_CustomerIDEmail ON Customers ( CustomerID, Email ) WITH (DROP_EXISTING = ON); UPDATE dbo.Customers SET Email = N'New', [Address] = 'New Address' WHERE Email = N'Old' OPTION (QUERYTRACEON 8790); -- Per-index update plan
Notice the extra Sort and Collapse operators when the index is marked unique.
This Split-Sort-Collapse pattern is required when updating the keys of a unique index, to prevent intermediate unique key violations.
More information: Maintaining Unique Indexes by Craig Freedman
The Sort in particular can be a problem. Not only is it an unnecessary extra cost, it may spill to disk if estimates are inaccurate.
About nonclustered keys
Another factor to consider is that nonclustered index structures are always unique, at every level of the index, even if
UNIQUE is not specified. The clustering key(s) – and possibly a uniquifier if the clustered index is not marked unique – are added to a non-unique nonclustered index at all levels.
As a consequence, the following index definiton:
CREATE INDEX IX_Customers_CustomerIDEmail ON Customers ( Email ) WITH (DROP_EXISTING = ON);
…actually contains the keys (Email, CustomerID) at all levels. It is therefore ‘seekable’ on both columns:
SELECT * FROM dbo.Customers AS C WITH (INDEX(IX_Customers_CustomerIDEmail)) WHERE C.Email = N'Email' AND C.CustomerID = 1;
More information: More About Nonclustered Index Keys by Kalen Delaney
SQL knows it’s unique already (if it includes the PK, it can’t get any more unique), regardless of whether you explicitly tell it.
The big difference between a non-unique index and a unique index is that non-unique indexes require the clustered index key (with uniquifier value if the CIX is not declared as unique) at the higher levels of the index, not just at the leaf level.
In your case, you already have the CIX in the key, which means it’s going to be at every level of the index already.
But you could create a table which has a separate PK (unique) and CIX (doesn’t matter). Then create a non-unique index which includes the PK in its key. Put some rows into the table, including some easily-findable varchar values for your CIX column. Put enough rows in to cause multiple levels of your indexes. Then you can use DBCC IND to find the pages in your NCIX, and DBCC PAGE to open some up to look at the data, to see if the CIX key values are at the higher levels.