Question:
Imagine you have two different tables/queries that are supposed to have/return identical data. You want to verify this. What’s an easy way to show any unmatched rows from each table just like the example below, comparing every column? Assume there are 30 columns in the tables, many of which are NULLable.
When there is no PK or there could be duplicates per PK, joining on just PK columns isn’t enough, and it would be a disaster to have to do a FULL JOIN with 30 join conditions that properly handle NULLs, plus a nasty WHERE condition to exclude the matched rows.
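To make the NULL problem concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for SQL Server, which is an assumption made only so the example can be run anywhere. It shows that an equality test between two NULLs does not evaluate to true, while a set operator such as INTERSECT treats the two NULLs as matching.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL (unknown), so an equality join condition
# never matches two rows whose compared column is NULL on both sides.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Set operators compare whole rows and treat two NULLs as the same value.
intersect_rows = conn.execute(
    "SELECT 'Tapir', NULL INTERSECT SELECT 'Tapir', NULL"
).fetchall()

print(null_equals_null)  # None
print(intersect_rows)    # [('Tapir', None)]
```

This is why each of the 30 join conditions would need its own NULL handling, whereas a row-level set comparison would not.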
Usually it is when I’m writing a new query against unscrubbed or not-fully-understood data that the problem is worst and the likelihood of a PK being logically available is extremely low. I cook up two different ways to solve the problem and then compare their results, the differences highlighting special cases in the data that I was unaware of.
The result needs to look like this:
Which   Col1    Col2  Col3  ...  Col30
------  ------  ----  ----  ---  -----
TableA  Cat     27    86              -- mismatch
TableB  Cat     27    105             -- mismatch
TableB  Cat     27    87              -- mismatch 2
TableA  Cat     128   92              -- no corresponding row
TableB  Lizard  83    NULL            -- no corresponding row
If [Col1, Col2] do happen to be a composite key and we order by them in our final result, then we can easily see that A and B have one row that differs but should be the same, and each has one row that is not in the other.
In the above example, seeing the first row twice is not desirable.
Here’s DDL and DML to set up sample tables and data:
CREATE TABLE dbo.TableA (
Col1 varchar(10),
Col2 int,
Col3 int,
Col4 varchar(10),
Col5 varchar(10),
Col6 varchar(10),
Col7 varchar(10),
Col8 varchar(10),
Col9 varchar(10),
Col10 varchar(10),
Col11 varchar(10),
Col12 varchar(10),
Col13 varchar(10),
Col14 varchar(10),
Col15 varchar(10),
Col16 varchar(10),
Col17 varchar(10),
Col18 varchar(10),
Col19 varchar(10),
Col20 varchar(10),
Col21 varchar(10),
Col22 varchar(10),
Col23 varchar(10),
Col24 varchar(10),
Col25 varchar(10),
Col26 varchar(10),
Col27 varchar(10),
Col28 varchar(10),
Col29 varchar(10),
Col30 varchar(10)
);
CREATE TABLE dbo.TableB (
Col1 varchar(10),
Col2 int,
Col3 int,
Col4 varchar(10),
Col5 varchar(10),
Col6 varchar(10),
Col7 varchar(10),
Col8 varchar(10),
Col9 varchar(10),
Col10 varchar(10),
Col11 varchar(10),
Col12 varchar(10),
Col13 varchar(10),
Col14 varchar(10),
Col15 varchar(10),
Col16 varchar(10),
Col17 varchar(10),
Col18 varchar(10),
Col19 varchar(10),
Col20 varchar(10),
Col21 varchar(10),
Col22 varchar(10),
Col23 varchar(10),
Col24 varchar(10),
Col25 varchar(10),
Col26 varchar(10),
Col27 varchar(10),
Col28 varchar(10),
Col29 varchar(10),
Col30 varchar(10)
);
INSERT dbo.TableA (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14, Col15, Col16, Col17, Col18, Col19, Col20, Col21, Col22, Col23, Col24, Col25, Col26, Col27, Col28, Col29, Col30)
VALUES
('Cat', 27, 86, 'a', 'b', 'c', 'd', 'e', 'f', 'g',' h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0'),
('Cat', 128, 92, 'a', 'b', 'c', 'd', 'e', 'f', 'g',' h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0'),
('Porcupine', NULL, 42, 'a', 'b', 'c', 'd', 'e', 'f', 'g',' h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0'),
('Tapir', NULL, NULL, 'a', 'b', 'c', 'd', 'e', 'f', 'g',' h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0')
;
INSERT dbo.TableB (Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14, Col15, Col16, Col17, Col18, Col19, Col20, Col21, Col22, Col23, Col24, Col25, Col26, Col27, Col28, Col29, Col30)
VALUES
('Cat', 27, 105, 'a', 'b', 'c', 'd', 'e', 'f', 'g',' h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0'),
('Cat', 27, 87, 'a', 'b', 'c', 'd', 'e', 'f', 'g',' h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0'),
('Lizard', 83, NULL, 'a', 'b', 'c', 'd', 'e', 'f', 'g',' h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0'),
('Porcupine', NULL, 42, 'a', 'b', 'c', 'd', 'e', 'f', 'g',' h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0'),
('Tapir', NULL, NULL, 'a', 'b', 'c', 'd', 'e', 'f', 'g',' h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0');
Answer:
You don’t need 30 join conditions for a FULL OUTER JOIN here.
You can just FULL OUTER JOIN on the PK, preserve rows with at least one difference with WHERE EXISTS (SELECT A.* EXCEPT SELECT B.*), and use CROSS APPLY (SELECT A.* UNION ALL SELECT B.*) to unpivot both sides of the JOINed rows into individual rows.
WITH TableA(Col1, Col2, Col3)
AS (SELECT 'Dog',1,1 UNION ALL
SELECT 'Cat',27,86 UNION ALL
SELECT 'Cat',128,92),
TableB(Col1, Col2, Col3)
AS (SELECT 'Dog',1,1 UNION ALL
SELECT 'Cat',27,105 UNION ALL
SELECT 'Lizard',83,NULL)
SELECT CA.*
FROM TableA A
FULL OUTER JOIN TableB B
ON A.Col1 = B.Col1
AND A.Col2 = B.Col2
/*Unpivot the joined rows*/
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
SELECT 'TableB' AS what, B.*) AS CA
/*Exclude identical rows*/
WHERE EXISTS (SELECT A.*
EXCEPT
SELECT B.*)
/*Discard NULL extended row*/
AND CA.Col1 IS NOT NULL
ORDER BY CA.Col1, CA.Col2
Gives
what   Col1   Col2        Col3
------ ------ ----------- -----------
TableA Cat    27          86
TableB Cat    27          105
TableA Cat    128         92
TableB Lizard 83          NULL
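The EXISTS (SELECT A.* EXCEPT SELECT B.*) filter in the query above is what makes the row comparison NULL-safe. The sketch below illustrates that one piece in Python with sqlite3 as a stand-in engine (an assumption; SQLite has no CROSS APPLY, so only the filter is shown, and the columns are listed explicitly because SQLite does not accept A.* in a SELECT without FROM). The 'Emu' rows are hypothetical data added to show a NULL on one side only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    CREATE TABLE TableB (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    INSERT INTO TableA VALUES ('Dog', 1, 1), ('Cat', 27, 86), ('Emu', 5, NULL);
    INSERT INTO TableB VALUES ('Dog', 1, 1), ('Cat', 27, 105), ('Emu', 5, 7);
""")

# Pair rows on the key columns, as in the answer's join.
join = """
    SELECT a.Col1, a.Col2, a.Col3, b.Col3
    FROM TableA a
    JOIN TableB b ON a.Col1 = b.Col1 AND a.Col2 = b.Col2
"""

# A plain inequality misses the Emu pair: NULL <> 7 is unknown, not true.
naive = conn.execute(join + " WHERE a.Col3 <> b.Col3 ORDER BY a.Col1").fetchall()

# EXCEPT compares the two rows as wholes and treats NULLs as comparable.
null_safe = conn.execute(join + """
    WHERE EXISTS (SELECT a.Col1, a.Col2, a.Col3
                  EXCEPT
                  SELECT b.Col1, b.Col2, b.Col3)
    ORDER BY a.Col1
""").fetchall()

print(naive)      # [('Cat', 27, 86, 105)]
print(null_safe)  # [('Cat', 27, 86, 105), ('Emu', 5, None, 7)]
```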
Or a version dealing with the moved goalposts.
SELECT DISTINCT CA.*
FROM TableA A
FULL OUTER JOIN TableB B
ON EXISTS (SELECT A.* INTERSECT SELECT B.*)
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
SELECT 'TableB' AS what, B.*) AS CA
WHERE NOT EXISTS (SELECT A.* INTERSECT SELECT B.*)
AND CA.Col1 IS NOT NULL
ORDER BY CA.Col1, CA.Col2
For tables with many columns it can still be difficult to identify the specific column(s) that differ. For that you can potentially use the query below (though only on relatively small tables, as otherwise this method likely won’t have adequate performance).
SELECT t1.primary_key,
y1.c,
y1.v,
y2.v
FROM t1
JOIN t2
ON t1.primary_key = t2.primary_key
CROSS APPLY (SELECT t1.*
FOR xml path('row'), elements xsinil, type) x1(x)
CROSS APPLY (SELECT t2.*
FOR xml path('row'), elements xsinil, type) x2(x)
CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
n.n.value('.', 'nvarchar(max)')
FROM x1.x.nodes('row/*') AS n(n)) y1(c, v)
CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
n.n.value('.', 'nvarchar(max)')
FROM x2.x.nodes('row/*') AS n(n)) y2(c, v)
WHERE y1.c = y2.c
AND EXISTS(SELECT y1.v
EXCEPT
SELECT y2.v)
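The same per-column idea can also be sketched on the client side. The following is a minimal Python illustration, assuming the two rows have already been fetched and paired by key as dictionaries with identical column names; the function name differing_columns is hypothetical and not part of any library.

```python
def differing_columns(row_a, row_b):
    """Return (column, value_a, value_b) for each column whose values differ.

    None stands in for SQL NULL; two None values count as equal,
    which mirrors the NULL-safe behaviour of EXCEPT.
    Assumes both rows have exactly the same column names.
    """
    return [
        (col, row_a[col], row_b[col])
        for col in row_a
        if row_a[col] != row_b[col]
    ]


a = {"Col1": "Cat", "Col2": 27, "Col3": 86, "Col4": None}
b = {"Col1": "Cat", "Col2": 27, "Col3": 105, "Col4": None}
diffs = differing_columns(a, b)
print(diffs)  # [('Col3', 86, 105)]
```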
This can be handled using EXCEPT and/or INTERSECT.
http://msdn.microsoft.com/en-us/library/ms188055.aspx
First find all records that are in table1 but not in table2, then find all records that are in table2 but not in table1.
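-- The parentheses matter: EXCEPT and UNION are evaluated left to right, so the
-- unparenthesized form returns only the rows of table2 that are missing from table1.
(SELECT * FROM table1
EXCEPT
SELECT * FROM table2)
UNION
(SELECT * FROM table2
EXCEPT
SELECT * FROM table1);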
There is undoubtedly a more efficient way to do this, but it is the first “quick and dirty” solution off the top of my head. Also, I do not recommend using a * wildcard, but it suits here for brevity.
Alternately, you could use an INTERSECT operator and exclude all the results from it.
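As a rough check of both variants, here is a sketch in Python using sqlite3 as a stand-in engine (an assumption; the three-column tables and their rows are a cut-down, hypothetical version of the question's sample data). SQLite does not allow bare parentheses around the members of a compound query, so each half is wrapped in a subquery instead. The last query shows what happens when the grouping is left out: the operators are applied left to right and the rows unique to table1 are lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    CREATE TABLE table2 (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    INSERT INTO table1 VALUES ('Cat', 27, 86), ('Cat', 128, 92), ('Tapir', NULL, NULL);
    INSERT INTO table2 VALUES ('Cat', 27, 105), ('Lizard', 83, NULL), ('Tapir', NULL, NULL);
""")

# Rows only in table1, plus rows only in table2.
except_based = conn.execute("""
    SELECT * FROM (SELECT * FROM table1 EXCEPT SELECT * FROM table2)
    UNION
    SELECT * FROM (SELECT * FROM table2 EXCEPT SELECT * FROM table1)
""").fetchall()

# Alternative: everything in either table, minus what is in both.
intersect_based = conn.execute("""
    SELECT * FROM (SELECT * FROM table1 UNION SELECT * FROM table2)
    EXCEPT
    SELECT * FROM (SELECT * FROM table1 INTERSECT SELECT * FROM table2)
""").fetchall()

# Without grouping: ((table1 EXCEPT table2) UNION table2) EXCEPT table1.
ungrouped = conn.execute("""
    SELECT * FROM table1 EXCEPT SELECT * FROM table2
    UNION
    SELECT * FROM table2 EXCEPT SELECT * FROM table1
""").fetchall()

print(sorted(except_based, key=repr))
print(sorted(ungrouped, key=repr))
```

The first two queries return the same four rows; the ungrouped one returns only the two rows that exist solely in table2.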
It is easy to accomplish with a third-party tool like Data Compare, or you can just do it on the client. In the context of unit testing stored procedures, we just wrote some C# code.
Here is the C# code we are using, quoted from an old article, "Close those Loopholes – Testing Stored Procedures":
using System;
using System.Data;

internal static class DataSetComparer
{
internal static bool Compare(DataSet one, DataSet two)
{
if(one.Tables.Count != two.Tables.Count)
return false;
for(int i = 0; i < one.Tables.Count; i++)
if(!CompareTables(one.Tables[i], two.Tables[i]))
return false;
return true;
}
private static bool CompareTables(DataTable one, DataTable two)
{
if(one.Rows.Count != two.Rows.Count)
return false;
for(int i = 0; i < one.Rows.Count; i++)
if(!CompareRows(one.Rows[i], two.Rows[i]))
return false;
return true;
}
private static bool CompareRows(DataRow one, DataRow two)
{
if(one.ItemArray.Length != two.ItemArray.Length)
return false;
for(int i = 0; i < one.ItemArray.Length; i++)
if(!CompareItems(one.ItemArray[i], two.ItemArray[i]))
return false;
return true;
}
private static bool CompareItems(object value1, object value2)
{
if(value1.GetType() != value2.GetType())
return false;
if(value1 is DBNull)
return true;
if(value1 is DateTime)
return ((DateTime) value1).CompareTo((DateTime) value2)
== 0;
if(value1 is byte[])
{
if(((byte[]) value1).Length != ((byte[]) value2).Length)
return false;
for(int i = 0; i < ((byte[]) value1).Length; i++)
if(((byte[]) value1)[i] != ((byte[]) value2)[i])
return false;
return true;
}
return value1.ToString().Equals(value2.ToString());
}
}
Here’s a way to show what was asked for:
SELECT
Which = 'TableA',
*
FROM (
SELECT * FROM dbo.TableA
EXCEPT
SELECT * FROM dbo.TableB
) X
UNION ALL
SELECT
'TableB',
*
FROM (
SELECT * FROM dbo.TableB
EXCEPT
SELECT * FROM dbo.TableA
) X
ORDER BY
Col1, Col2, Col3, Col4, Col5, Col6, Col7, Col8, Col9, Col10, Col11, Col12, Col13, Col14, Col15, Col16, Col17, Col18, Col19, Col20, Col21, Col22, Col23, Col24, Col25, Col26, Col27, Col28, Col29, Col30
;
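For a quick sanity check of the shape of this result, the sketch below runs the same pattern in Python against sqlite3 (a stand-in engine, which is an assumption) using only the first three columns of the question's sample data. The 'Porcupine' and 'Tapir' rows contain NULLs and are identical in both tables, so they should not appear in the output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    CREATE TABLE TableB (Col1 TEXT, Col2 INTEGER, Col3 INTEGER);
    INSERT INTO TableA VALUES
        ('Cat', 27, 86), ('Cat', 128, 92),
        ('Porcupine', NULL, 42), ('Tapir', NULL, NULL);
    INSERT INTO TableB VALUES
        ('Cat', 27, 105), ('Cat', 27, 87), ('Lizard', 83, NULL),
        ('Porcupine', NULL, 42), ('Tapir', NULL, NULL);
""")

# Tag each unmatched row with the table it came from, then sort so that
# rows sharing the same leading columns end up next to each other.
rows = conn.execute("""
    SELECT 'TableA' AS Which, *
    FROM (SELECT * FROM TableA EXCEPT SELECT * FROM TableB)
    UNION ALL
    SELECT 'TableB', *
    FROM (SELECT * FROM TableB EXCEPT SELECT * FROM TableA)
    ORDER BY Col1, Col2, Col3
""").fetchall()

for row in rows:
    print(row)
```

Note that EXCEPT works on distinct rows, so a row that appears twice in one table and once in the other is not reported by this pattern.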