How can it be that count(distinct x, y) < # records distinct x, y

Posted on

Question :

I’d like to learn why the query

select distinct m.id, m.sex 
from   member m
join   household h on m.household_id = h.id
join   contact c on c.household_id = h.id
where  c.contact_date between '2018-04-01' and '2018-04-30';

returns 923 rows, yet will return the value 918 with the query

select count(distinct m.id, m.sex) 
from   member m
join   household h on m.household_id = h.id
join   contact c on c.household_id = h.id
where  c.contact_date between '2018-04-01' and '2018-04-30';

The query

select count(distinct m.id) 
from   member m
join   household h on m.household_id = h.id
join   contact c on c.household_id = h.id
where  c.contact_date between '2018-04-01' and '2018-04-30';

also returns 923.

The statement select * from member m where sex not in ('Male', 'Female') returns no rows.

Answer :

You have 5 records with m.sex is null I think.

Any operator applied to NULL (except IS [NOT] NULL and <=>) returns NULL treated as False.

Leave a Reply

Your email address will not be published. Required fields are marked *