Data Warehouse – Slowly Changing Dimensions with Many to Many Relationships

Posted on

Question :

As an example, let’s say I have a fact table with two dimensions and one measure

FactMoney table


ProjectKey int

PersonKey int

CashAmount money


The two dimensions are defined like this:

DimProject (a type 0 dimension – i.e. static)


ProjectKey int

ProjectName varchar(50)


DimPerson (a type 2 slowly changing dimension)


PersonKey int

PersonNaturalKey int

PersonName varchar(50)

EffectiveStartDate datetime

EffectiveEndDate datetime

IsCurrent bit


Pretty straightforward so far. Now I’ll introduce a Person Category concept.

DimCategory


CategoryKey int

CategoryName varchar(50)


And build an M2M relationship between DimPerson and DimCategory

BridgePersonCategory


PersonKey int

CategoryKey int


So – people can have 1..n categories.

My problem is – as Person is a slowly changing dimension, when a person’s name changes, we add a new person row and update our effective dates and is current flags, no big deal.

But what do we do with the person’s categories? Do we need to add more rows to the bridge table every time a new person version pops up?

And as a corollary, if a person’s categories change, does that mean we need to create a new row in the person table?

(please note I’ve also asked this question at stack-O and as directed by the first commenter, now posted this question here.)

Answer :

Does DimCategory have any Type 2 attributes that you are tracking?

If it is not, then you would only need to add a new record to your bridge table based on whenever a Type 2 attribute in DimPerson changes since you will be inserting a new record in DimPerson with a new surrogate key. You would then need to add or update your bridge table with this new surrogate key for whichever categories are assigned.

However, if an attribute on DimCategory changes, you would just update that record and there would be no change in the surrogate key so the bridge table would be unchanged.

Now if you want to track things like when a Category assignment to a Person changes as a Type 2 attribute, then you would need to build the Effective Start and End Dates in your bridge table to be able to tell at what point in time a Person’s Categories change.

Why new Surrogate Key for SCD 2 .

Soft RI .

MVs , Join Indexes , Cubes etc r nearly relic except projected MVS ,Join Indexes etc with the advent of columnar DBs.

Kimbal design is Relic for many reasons

  •    With redundancy denormalization it can never be DW . 
    
       It is not DW but  a physical semantic layer
    
       MDM
    
       bad use of surrogate Keys 
    
       Columnar DBs at PHYSICAL SEMANTIC LAYER for performance i.e. denorm without Row Oriented cost
    
       business user Reporting Layers r now in Memory as well as huge capability .
    
       Basic aggregation on contexts if volume can be reduced will work wonder .
    

Still All will depend upon data Hierarchy , data changes , user requirement of history .

Regards

Leave a Reply

Your email address will not be published.