Does joining to the same table twice have a predictable performance impact

Posted on

Question :

I have a simple dimension, for example a date dimension, which is static and unchanging over time. In the fact table, I have several dates.

Is it different from a performance perspective to join to one date dimension twice vs. having a second copy of that date dimension and joining to it for the second date and so on?

If I have a fact table that looks like this:

FactId   FactNumber  FactDate  FactOtherDate
1        10          1/1/2016  12/31/2015

…and I want to produce a query like this:

Select Sum(FactNumber), DateDim.Month, OtherMonth = DateDim2.Month
From Fact 
Inner Join DateDim On DateDim.Date = FactDate
Inner Join DateDim DateDim2 ON DateDim2.Date = FactOtherDate
Group By DateDim.Month, DateDim2.Month

…would it be better to make a second actual table and join to that, rather than joining to the same table twice?

I ask because indexed views won’t allow such a query as a view, and that made me think that SQL Server handles it differently and possibly less efficiently. I’m not looking to use an indexed view, just wondering if that restriction provides any insight into this question.

Answer :

There are numerous restrictions to what can be an indexed view. In a data warehouse setting those restrictions will make it impossible for just about every standard star join view to be an indexed view. Which is really a shame, since despite what the comments say, [materialized] indexed views are magic!

As for whether joining to the same table twice is an issue, it is not. In a data warehouse you’ll often find yourself joining to time and date dimensions dozens of times.

Also, I was interested to see that you made the primary key on the date dimension the date itself. In my last data warehouse we ended up needing to use an INT column as the primary key, because we needed to save indeterminate dates (like 2016/1/??) where all of the date parts were not known.

Also, when writing your views, you should always check to verify whether a LEFT JOIN will have better performance than an INNER JOIN, even if they produce the same results. Back when our data warehouse was closer to a snowflake schema, the joins were into the hundreds, and many of the views did better with all LEFT JOINs as opposed to INNER JOINs where INNER JOINs were appropriate.

Leave a Reply

Your email address will not be published. Required fields are marked *