Question :
Consider the following two queries.
SELECT Col1, Col2
FROM TblA
UNION
SELECT Col1, Col2
FROM TblB
and
SELECT DISTINCT Col1, Col2
FROM TblA
UNION
SELECT DISTINCT Col1, Col2
FROM TblB
These are logically equivalent. My question is if they are treated the same by the database engine. Does SQL Server recognize the redundancy and ignore the DISTINCT
operator?
Answer :
No, they are not necessarily “treated the same by the database engine.” A test below shows that you might get different query plans.
In many cases, the difference between query plans may not matter for you. But in some (likely rare) cases it could matter significantly. For example, if SQL Server has a very poor cardinality estimate for one branch of your UNION, calculating the unique rows within that branch could result in a spill that would not have happened if the query only calculated the final set of unique rows.
SELECT @@version
-- Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) - 14.0.3006.16 (X64)
SELECT create_date FROM sys.databases
UNION
SELECT create_date FROM sys.databases
SELECT DISTINCT create_date FROM sys.databases
UNION
SELECT DISTINCT create_date FROM sys.databases