SQL Server 2017: I have the EMP_DEPT (SCOTT schema) database with a few additional records inserted. Two of them have ZERO base salary (PORTMAN, RENO) and one employee (MCQUEEN) has NULL in the salary column.
It looks like this:
Then I’m learning SQL on these records and I want to
SELECT the employees with the highest and lowest salaries.
To do this I write 3 queries:
/* Return employee record with max/min salary. */ SELECT * FROM EMP WHERE SAL = (SELECT MAX(SAL) FROM EMP); SELECT * FROM EMP WHERE SAL = (SELECT MIN(SAL) FROM EMP); SELECT * FROM EMP WHERE SAL = (SELECT MIN(ISNULL(SAL, 0)) FROM EMP); --ISNULL(source column, replacement_value) function transforms NULL value into --any other value specified in the second ISNULL attribute.
The results are as follows:
The first query results are not a surprise – I understand them.
The second query results are not a surprise – I understand them.
But in the third query I would have expected
MCQUEEN returned as well, but he’s not!
The question is why
MCQUEEN is not returned in the third query results??
I’ve treated SAL column with
ISNULL function to change this NULL value to ZERO.
So why isn’t
RENO employees with zero salary??
So why isn’t MCQUEEN among PORTMAN and RENO employees with zero
The issue is this predicate:
WHERE SAL = (SELECT MIN(ISNULL(SAL, 0)) FROM EMP);
At run time, the above comparison for the MCQUEEN row with NULL salary is:
WHERE NULL = 0;
The result of any comparison to
NULL is UNKNOWN per the ISO SQL standard. The UNKNOWN result is neither TRUE nor FALSE so the row is eliminated from the result regardless of the value returned by the subquery.
Even though the right side subquery value will never be
NULL due the the subquery
ISNULL(SAL, 0) expression, the left side SAL value may be
NULL so you need to similarly consider
NULL values for SAL:
WHERE ISNULL(SAL, 0) = (SELECT MIN(ISNULL(SAL, 0)) FROM EMP);
NULL or zero SAL values will then be returned unless you have employees with negative salaries. Similarly, one could use
CASE WHEN SAL IS NULL THEN 0 ELSE SAL END), etc. to coerce a non-NULL for NULL values.
A better approach to include
NULL may be a
UNION ALL query:
SELECT * FROM EMP WHERE SAL = (SELECT MIN(SAL) FROM EMP) UNION ALL SELECT * FROM EMP WHERE SAL IS NULL;
The advantage of this technique is the queries can leverage an index on the SAL column (if one exists) to execute more efficiently.
I hope this helps you understand the technical reason why the row isn’t returned. In the real world, the semantics of a
NULL value need to be considered. For example, does the company really have employees with unknown salaries? If not, it’s a flaw in the data model. If unknown salaries are allowed, why should a query for minimum salaries return unknown salaries since the actual employee salary could be higher?
You have an NULL value in your table, so your last query for given row is like this:
Select * from EMP WHERE NULL = 0.
Where NULL is the value in the table and the 0 is returned as the minsalary value.
Correct would be:
Select * from EMP where coalesce(SAL,0) = coalesce((select min(SAL) from EMP),0)