Proper table design for sparse primary key

Posted on

Question :

In my system I have temporary entities that are created based on rules stored in my database, and the entities are not persisted.

Now, I need is to store information about these entities, and because they are created based on rules and are not stored, they have no ID.

I came up with a formula to generate an ID for these temp entities based on the rule that was used to generate them: id = rule id + "-" + entity index in the rule.
This formula generates unique strings of the form 164-3, 123-0, 432-2, etc...

My question is how should I build my table (regarding primary key and clustered index) when my keys have no relation or order?
Keep in mind that I will only (99.9% of the time) query the table using the id mentioned above.

Options I thought about after much reading, but don’t have the knowledge to determine which is better:

1) primary key on a varchar column with clustered index.
-According to various sources, this would be bad because of fragmentation and the wideness of the key. Also their format is pretty weird for sorting.

2) primary key on varchar column without clustered index (heap table).
-Also a bad idea according to various sources due to indexing and fragmentation issues.

3) identity int column with clustered index, and a varchar column as primary key with unique index.
-Can’t really see the benefit of the surogate key here since it would mainly help with range queries and ordering and I would never query the table based on this key because it would be unknown at all times.

4) 2 columns composite key: rule id + rule index columns.
Now I don’t have strings but I have two columns that will be copied to FKs and non clustered indexes. Also I’m not sure what indexes I would use in this case.

Can anybody shine a light here? Any help is appreciated.

–Edit

Here is what I think I’ll be using after reading the comments, and the rationale:

clustered identity int Id as PK
non-clustered unique index on (rule id, index)

Rationale:
1 – I will often query the table by rule id only, which is why it is the leftmost index column;
2 – The Id column can be used for subsequent operations after the first query by rule id (the id will be stored client side);
3 – I will never query by index only;
4 – The unique index guarantees insert consistency, though it still slower than a single int;

Answer :

I’m thinking clustered pk (rule_id, index)

Do that. This isn’t really a hard case. (rule_id, index) uniquely identifies a row, so that should be your clustered PK, unless there’s some compelling reason to use a different design. and because

I will often query the table by rule id only

rule_id should be the leading column in the index. This supports efficient lookup by rule_id and ensures locality for inserting multiple rows for the same rule_id.

Design in mind for your database. It does not matter if the database side makes sense to the app side. This theory follows the Star Schema design and includes concepts such as Durable Keys. However, you can certainly mix and match as you need. 🙂

[VERSION: 1.1]

  • May be Community Edited
  • Restructured format for easier understanding
  • provided examples for design

Rules:

  • Heaps will not be feared, but mastered by design.
  • Natural keys are to be shunned. For the database keys need not make any sense to the app/user side.
  • Any primary/unique keys based on a GUID or string are to be considered short-term at best.
    1. They will only result in wasted resources and wasting time reinventing what could have been solved beforehand.
    2. They furthermore are likely to be sneaky, despicable natural keys which weaken security and defeat the purpose of relational databases.
  • You must have defined your current scope of concurrent and proportional collections of Rules and Indexes within reason of a viable, scalable design.

The State of Design:
Towards this aim, we know we have numerous entity_index entities that appear and disappear. We assume in this design they have some relation to Rule_ID and may even have a categorical comparison that can be exploited now or later. Whether or not they are technically in “numeric” order is irrelevant since this is the database.

Furthermore, we know conceptually of three concepts: Rules, Entity_Indexes, and the mapping mechanism that makes the design more durable to change. ESPECIALLY TO THE OPTIMIZER. It cannot be stated enough this design should server the database first.

Example of Concept:

CREATE TABLE Rules_DIM
(
    Rule_ID         INT
  , Rule_Name       VARCHAR(255)
  , <Details>
)
GO
/*
EntityIndex Key is not related to GUID that identifies this key
It helps identifiy like RuleID's and their EntityIndexes
It can and SHOULD be limited in number to an INT
  Since this is the database key
  Unless you have determined Rule_ID can be mapped to more than
  5 billion Entity_Indexes CONCURRENTLY
*/
CREATE TABLE EntityIndex_DIM
(
    EntityIndex_ID  INT -- database_key.
    EntityIndex_GUID    VARCHAR(<some_length>)
     /*Can be an actual GUID or what maps to the APP*/
  , <Details>
)
/*Our Mapping Table*/
CREATE TABLE Rule_Mapping
(
    RuleID
  , EntityIndex_ID
  , Create_Date
  , <some other Mapping column that is not related to the EntityIndex_DIM table
)

Leave a Reply

Your email address will not be published. Required fields are marked *