Question :
I’m trying to figure out an easy query I can do to test if a large table has a list of entries that has at least ONE blank (NULL / empty) value in ANY column.
I need something like
SELECT * FROM table AS t WHERE ANY(t.* IS NULL)
I don’t want to have to do
SELECT * FROM table AS t WHERE t.c1 = NULL OR t.c2 = NULL OR t.c3 = NULL
This would be a HUGE query.
Answer :
An extension to @db2’s answer with less (read:zero) hand-wrangling:
DECLARE @tb nvarchar(512) = N'dbo.[table]';
DECLARE @sql nvarchar(max) = N'SELECT * FROM ' + @tb
+ N' WHERE 1 = 0';
SELECT @sql += N' OR ' + QUOTENAME(name) + N' IS NULL'
FROM sys.columns
WHERE [object_id] = OBJECT_ID(@tb)
AND is_nullable = 1;
EXEC sys.sp_executesql @sql;
You should list out all the columns as per JNK’s comment.
WHERE c1 IS NULL OR c2 IS NULL OR c3 IS NULL
A somewhat less efficient approach that avoids this is below though.
;WITH xmlnamespaces('http://www.w3.org/2001/XMLSchema-instance' AS ns)
SELECT *
FROM YourTable AS T1
WHERE (
SELECT T1.*
FOR XML PATH('row'), ELEMENTS XSINIL, TYPE
).exist('//*/@ns:nil') = 1
There’s no nice built-in syntax, but Management Studio has a couple convenient features to generate the query quickly.
In Object Explorer, drill down to the table you want, expand it, then drag the whole “Columns” folder into a blank query editor. This will add a comma-separated list of columns to the query.
Next, open up Find And Replace. Set “Find What” to ,
and set “Replace With” to IS NULL OR
(with a leading space) then hit Replace All. You’ll have to clean up the last one in the sequence by hand.
It’s still ugly, but it’s less labor intensive ugly.
Multiple Solutions for: some nulls, all nulls, single & multiple columns plus making it QUICK using Top 1
If you need to test multiple columns, you could use the following:
Column_1 Column_2 Column_3
-------- -------- --------
1 2 NULL
1 NULL NULL
5 6 NULL
First, test for NULLs and count them:
select
sum(case when Column_1 is null then 1 else 0 end) as Column_1,
sum(case when Column_2 is null then 1 else 0 end) as Column_2,
sum(case when Column_3 is null then 1 else 0 end) as Column_3,
from TestTable
Yields a count of NULLs:
Column_1 Column_2 Column_3
0 1 3
Where the result is 0, there are no NULLs.
Second, let’s count the non-NULLs:
select
sum(case when Column_1 is null then 0 else 1 end) as Column_1,
sum(case when Column_2 is null then 0 else 1 end) as Column_2,
sum(case when Column_3 is null then 0 else 1 end) as Column_3,
from TestTable
…But because we’re counting non-NULLs here, this can be simplified to:
select
count(Column_1) as Column_1,
count(Column_2) as Column_2,
count(Column_3) as Column_3,
from TestTable
Either one yields:
Column_1 Column_2 Column_3
3 2 0
Where the result is 0, the column is entirely made up of NULLs.
Lastly, if you only need to check a specific column, then TOP 1 is quicker because it should stop at the first hit. You can then optionally use count(*) to give a boolean-style result:
select top 1 'There is at least one NULL' from TestTable where Column_3 is NULL
select count(*) from (select top 1 'There is at least one NULL' AS note from TestTable where Column_3 is NULL) a
0 = There are no NULLs, 1 = There is at least one NULL
or
select top 1 'There is at least one non-NULL' AS note from TestTable where Column_3 is not NULL
select count(*) from (select top 1 'There is at least one non-NULL' AS note from TestTable where Column_3 is not NULL) a
0 = They are all NULL, 1 = There is at least one non-NULL
I hope this helps.
UNPIVOT translates columns into rows. In the process it eliminates NULL values (reference).
Given the input
create table #t
(
ID int primary key,
c1 int null,
c2 int null
);
insert #t(id, c1, c2)
values
(1, 12, 13),
(2, null, 14),
(3, 15, null),
(4, null, null);
the UNPIVOT query
select
ID, ColName, ColValue
from
(
select *
from #t
) as p
unpivot
(
ColValue for ColName in
(c1, c2) -- explicit source column names required
) as unpvt;
will produce the output
| ID | ColName | ColValue |
|----|---------|----------|
| 1 | c1 | 12 |
| 1 | c2 | 13 |
| 2 | c2 | 14 |
| 3 | c1 | 15 |
Sadly row 4 has been eliminated entirely since it has only NULLs! It can be conveniently re-introduced by injecting a dummy value into the source query:
select
ID, ColName, ColValue
from
(
select
-5 as dummy, -- injected here, -5 is arbitrary
*
from #t
) as p
unpivot
(
ColValue for ColName in
(dummy, c1, c2) -- referenced here
) as unpvt;
By aggregating the rows on ID we can count the non-null values. A comparison to the total number of columns in the source table will identify rows containing one or more NULL.
select
ID
from
(
select -5 as dummy, *
from #t
) as p
unpivot
(
ColValue for ColName in
(dummy, c1, c2)
) as unpvt
group by ID
having COUNT(*) <> 3;
I calculate 3 as
number of columns in source table #t
+ 1 for the injected dummy column
– 1 for ID, which is not UNPIVOTED
This value could be obtained at runtime by examining the catalog tables.
The original rows can be retrieved by joining to the results.
If values other than NULL are to be investigated they can be included in a where clause:
...
) as unpvt
where ColValue <> '' -- will eliminate empty strings
Discussion
This requires an identifier that is carried through the UNPIVOT. A key would be best. If none exists one can be injected by the ROW_NUMBER() window function, though this may be expensive to execute.
All columns must be explicitly listed inside the UNPIVOT clause. They can be dragged in using SSMS, as @db2 suggested. It will not be dynamic when the table definition chagnes, as Aaron Bertrand’s suggestion would be. This is the case for almost all SQL, however.
For my rather limited data set the execution plan is a clustered index scan and a stream aggregate. This will be more expensive of memory than a straight scan of the table and lots of OR clauses.