Select rows from table using conditions across multiple rows in left joined table

Posted on

Question :

I have 3 (or more) tables as follows in MySQL

persons

id name gender
1 nim male
2 nam female

visits

id person_id date
1 1 2020-12-10
2 1 2020-12-16
3 1 2020-12-17
4 2 2020-12-20
5 2 2020-12-21

selected_options

id visit_id option_a option_b option_c option_d
1 1 yes no yes yes
2 2 no no no no
3 3 yes yes no no
4 4 yes yes yes yes
5 5 yes no yes yes

Given the above table structure, I am looking to write a query that selects people who have selected “yes” to a set of options at during their visits.

E.g If the options to be queried are option_a = ‘yes’, option_b = ‘no’, option_c = ‘no’; The query should return person with id 1 since he selected option_a as ‘yes’ at visit 1, option_b as ‘no’ in visits 1 and 2 and option_c as ‘no’ in visits 2 and 3. It should not return person 2 because person 2 selected option_a as ‘yes’ in visit 4, option_b as ‘no’ in visit 5 but never selected option c as no in any of their visits.

I have tried the following:

SELECT  *
    FROM  persons
    LEFT JOIN  visits  ON person.id = visit.person_id
    LEFT JOIN  selected_options  ON visit.id = selected_options.visit_id
    WHERE  (option_a, option_b, option_c) IN (('yes', 'no', 'no'))

However it does not match any values.

Answer :

As you have to check every single column for validity, you can use such a construct.

CREATE TABLE persons
    (`id` int, `name` varchar(3), `gender` varchar(6))
;
    
INSERT INTO persons
    (`id`, `name`, `gender`)
VALUES
    (1, 'nim', 'male'),
    (2, 'nam', 'female')
;
CREATE TABLE visits
    (`id` int, `person_id` int, `date` varchar(10))
;
    
INSERT INTO visits
    (`id`, `person_id`, `date`)
VALUES
    (1, 1, '2020-12-10'),
    (2, 1, '2020-12-16'),
    (3, 1, '2020-12-17'),
    (4, 2, '2020-12-20'),
    (5, 2, '2020-12-21')
;
CREATE TABLE selected_options
    (`id` int, `visit_id` int, `option_a` varchar(3), `option_b` varchar(3), `option_c` varchar(3), `option_d` varchar(3))
;
    
INSERT INTO selected_options
    (`id`, `visit_id`, `option_a`, `option_b`, `option_c`, `option_d`)
VALUES
    (1, 1, 'yes', 'no', 'yes', 'yes'),
    (2, 2, 'no', 'no', 'no', 'no'),
    (3, 3, 'yes', 'yes', 'no', 'no'),
    (4, 4, 'yes', 'yes', 'yes', 'yes'),
    (5, 5, 'yes', 'no', 'yes', 'yes')
SELECT *FROm persons p 
WHERe EXISTS(SELECT 1 FROM visits v INNER JOIN selected_options s ON v.`id` = s.`visit_id`
             WHERE v.`person_id` = p.`id` AND s.`option_a` = 'yes')
AND EXISTS(SELECT 1 FROM visits v INNER JOIN selected_options s ON v.`id` = s.`visit_id`
             WHERE v.`person_id` = p.`id` AND s.`option_b` = 'no')
AND EXISTS(SELECT 1 FROM visits v INNER JOIN selected_options s ON v.`id` = s.`visit_id`
             WHERE v.`person_id` = p.`id` AND s.`option_c` = 'no')
id | name | gender
-: | :--- | :-----
 1 | nim  | male  

db<>fiddle here

Another approach (Bits and Booleans)

If the “options” are exactly yes/no (not “unknown” or “maybe”), then you could use a “bit” and put those bits into a SET or INT. This would lead to a much more compact dataset, though the WHERE code would be messier.

INT would allow Boolean operations with which you could compute things that you seem to need:

  • OptionA was “yes” for any visit by person-1.
  • OptionA was “yes” for all visits by person-1.

options BIGINT UNSIGNED lets you handle 64 options, no more. You could (with added complexity) use more than one column if you need more than 64 options.

The gist of the code is to GROUP BY person and do BIT_AND(options) to combine every column in order to test for “yes” at all visits. BIT_OR would give you yes at any visit. Other expressions would handle any/all “no”.

Masking would let you select the, say, 3 options that the query is interested in.

To find visits with a particular set of yes/no values, just masking and equality suffices. So (option_a, option_b, option_c) IN (('yes', 'no', 'no')) maps to (assuming a is bit 0, b is bit 1, etc):

WHERE options
    & ((1<<0) | (1<<1) | (1<<2))  -- check bits 0,1,2
    =  (1<<0)                     -- for 0 being YES; the rest being NO

Back to what might be the desired query:

WHERE BIT_OR(options) & (1<<0)  -- option_a is YES for _any_ visit
  AND BIT_AND(options) & (1<<1) = 0  -- b was NO for _any_ visit
  AND BIT_AND(options) & (1<<2) = 0  -- c was NO for _any_ visit

Tips

LEFT seems to be ‘wrong’, use a regular JOIN.

“Row constructors”, in the past (pre 5.7.3), were quite inefficient. Rewrite the WHERE if you are running an old version.

Please qualify column names so we know which table(s) they are in.

You show 4 options now. Might there be 5 tomorrow? And 6 the day after? This leads to a maintenance question for the schema.

Leave a Reply

Your email address will not be published.