Indexing – Uniqueidentifier Foreign Key or Intermediary mapping table?

Posted on

Question :

Looking for some expert views on this guys – I’m not a DBA by trade so would appreciate any advice. Comments on the rest of the schema aren’t required as it is purely fabricated to illustrate my question.

Basically I’m working on a new feature for an existing system and am powerless to change the existing schema. So let’s say that there currently is a Users table which has a primary key UserID that is of type uniqueidentifier. I want to create a new table called Alerts – related to the Users table in a one-to-many relationship (one User can have many Alerts). Alerts will be regularly queried by this foreign key. I will expect to index this field for performance.

My question is – from what I understand about Indexing strategies, indexing a uniqueidentifier column is (delicately put) not a good idea – so would it be a better idea to have an intermediary table to map these uniqueidentifier keys into a better indexing candidate like an integer?

Users

  • UserID uniqueidentifier {PK}
  • Username varchar(50)

Alerts

  • AlertID int {PK}
  • UserID uniqueidentifier {FK}
  • Message varchar(250)
  • Timestamp datetime

OR

Users

  • UserID uniqueidentifier {PK}
  • Username varchar(50)

UserMap

  • UserKey int identity(1,1) {PK}
  • UserID uniqueidentifier {FK}

Alerts

  • AlertID int identity(1,1) {PK}
  • UserKey int {FK}
  • Message varchar(250)
  • Timestamp datetime

Answer :

Ok, I am making a lot of assumptions (INT instead of VARCHAR(50) being one of them) with this answer, so feel free to correct me if needed. The problem with option B is that it introduces a new join to relate Users to Alerts without any real added benefit. If joining on the UserID, it is best to index the UserID, so you can utilize seeks for your joins.

For Option A, UserID will be the clustering key (index key for the clustered index) on the Users table. UserID will be a nonclustered index key on Alerts table. This will cost 16 bytes per Alert.

For Option B, UserID will be the clustering key on the Users table. UserId will probably be the clustering key in UserMap too, to make joining more efficient. UserKey (assuming this is an INT) would then be a nonclustered index key on the Alerts table. This will cost 4 bytes per Alert. And 20 bytes per UserMap.

Looking at the big picture, one relationship, for Option A, costs 16 bytes of storage, and involves 1 join operation. Whereas, one relationship, for Option B, costs 24 bytes of storage, and involves 2 join operations.

Furthermore, there are a possibility of 340,282,366,920,938,000,000,000,000,000,000,000,000 uniqueidentifiers and only 4,294,967,296 INTs. Implementing a uniqueidentifier to INT map for a this type of relationship could cause unexpected results when you start reusing INTs.

The only reason for creating this type map table, is if you plan on creating a Many to Many relationship between Users and Alerts.

Taking all of this into consideration, I would recommend Option A.

I hope this helps,

Matt

Assuming your intended option B is to have UserKey in Alerts and for UserKey to be an INT rather than VARCHAR(50)… you might save some space and improve performance marginally if you expected there to be a very large number of alerts per user. But it’s going to be minimal so I’d probably stick with option A.

The inefficiencies of GUIDs are due to their size (16 byte) compared to an INT (4 byte) and the fragmentation they inevitably lead too, unless sequential. This is magnified where a GUID is used as a clustered index key (as is the case for your User table?) because all non-clustered indexes contain the clustered index key. Kimberley Tripps series of articles on the topic are good grounding for understanding this.

Option B

  • Requires 20 bytes per user in UserMap
  • Saves 12 bytes per user in Alerts

Option A

  • Costs 12 extra bytes per alert in Alerts, compared to an INT UserKey in Option B

Caveat: Also assuming that the typical usage pattern would be to find alerts for a specified user, not results for a group of users.

Could you simply add an identity column to the Users table (not making it the primary key or clustering key) and put an unique non-clustered index on it, then you can join to it from the Alerts table (and put a foreign key in place)?

Leave a Reply

Your email address will not be published. Required fields are marked *