Is a UNION query equivalent with or without DISTINCT in subqueries?

Posted on

Question :

Consider the following two queries.

SELECT Col1, Col2
FROM TblA

UNION

SELECT Col1, Col2
FROM TblB

and

SELECT DISTINCT Col1, Col2
FROM TblA

UNION

SELECT DISTINCT Col1, Col2
FROM TblB

These are logically equivalent. My question is if they are treated the same by the database engine. Does SQL Server recognize the redundancy and ignore the DISTINCT operator?

Answer :

No, they are not necessarily “treated the same by the database engine.” A test below shows that you might get different query plans.

In many cases, the difference between query plans may not matter for you. But in some (likely rare) cases it could matter significantly. For example, if SQL Server has a very poor cardinality estimate for one branch of your UNION, calculating the unique rows within that branch could result in a spill that would not have happened if the query only calculated the final set of unique rows.

SELECT @@version
-- Microsoft SQL Server 2017 (RTM-CU1) (KB4038634) - 14.0.3006.16 (X64) 

SELECT create_date FROM sys.databases
UNION 
SELECT create_date FROM sys.databases

SELECT DISTINCT create_date FROM sys.databases
UNION 
SELECT DISTINCT create_date FROM sys.databases

enter image description here

Leave a Reply

Your email address will not be published. Required fields are marked *