Question :
I have 3 (or more) tables as follows in MySQL
persons
id | name | gender |
---|---|---|
1 | nim | male |
2 | nam | female |
visits
id | person_id | date |
---|---|---|
1 | 1 | 2020-12-10 |
2 | 1 | 2020-12-16 |
3 | 1 | 2020-12-17 |
4 | 2 | 2020-12-20 |
5 | 2 | 2020-12-21 |
selected_options
id | visit_id | option_a | option_b | option_c | option_d |
---|---|---|---|---|---|
1 | 1 | yes | no | yes | yes |
2 | 2 | no | no | no | no |
3 | 3 | yes | yes | no | no |
4 | 4 | yes | yes | yes | yes |
5 | 5 | yes | no | yes | yes |
Given the above table structure, I am looking to write a query that selects people who have selected “yes” to a set of options at during their visits.
E.g If the options to be queried are option_a = ‘yes’, option_b = ‘no’, option_c = ‘no’; The query should return person with id 1 since he selected option_a as ‘yes’ at visit 1, option_b as ‘no’ in visits 1 and 2 and option_c as ‘no’ in visits 2 and 3. It should not return person 2 because person 2 selected option_a as ‘yes’ in visit 4, option_b as ‘no’ in visit 5 but never selected option c as no in any of their visits.
I have tried the following:
SELECT *
FROM persons
LEFT JOIN visits ON person.id = visit.person_id
LEFT JOIN selected_options ON visit.id = selected_options.visit_id
WHERE (option_a, option_b, option_c) IN (('yes', 'no', 'no'))
However it does not match any values.
Answer :
As you have to check every single column for validity, you can use such a construct.
CREATE TABLE persons
(`id` int, `name` varchar(3), `gender` varchar(6))
;
INSERT INTO persons
(`id`, `name`, `gender`)
VALUES
(1, 'nim', 'male'),
(2, 'nam', 'female')
;
CREATE TABLE visits
(`id` int, `person_id` int, `date` varchar(10))
;
INSERT INTO visits
(`id`, `person_id`, `date`)
VALUES
(1, 1, '2020-12-10'),
(2, 1, '2020-12-16'),
(3, 1, '2020-12-17'),
(4, 2, '2020-12-20'),
(5, 2, '2020-12-21')
;
CREATE TABLE selected_options
(`id` int, `visit_id` int, `option_a` varchar(3), `option_b` varchar(3), `option_c` varchar(3), `option_d` varchar(3))
;
INSERT INTO selected_options
(`id`, `visit_id`, `option_a`, `option_b`, `option_c`, `option_d`)
VALUES
(1, 1, 'yes', 'no', 'yes', 'yes'),
(2, 2, 'no', 'no', 'no', 'no'),
(3, 3, 'yes', 'yes', 'no', 'no'),
(4, 4, 'yes', 'yes', 'yes', 'yes'),
(5, 5, 'yes', 'no', 'yes', 'yes')
SELECT *FROm persons p WHERe EXISTS(SELECT 1 FROM visits v INNER JOIN selected_options s ON v.`id` = s.`visit_id` WHERE v.`person_id` = p.`id` AND s.`option_a` = 'yes') AND EXISTS(SELECT 1 FROM visits v INNER JOIN selected_options s ON v.`id` = s.`visit_id` WHERE v.`person_id` = p.`id` AND s.`option_b` = 'no') AND EXISTS(SELECT 1 FROM visits v INNER JOIN selected_options s ON v.`id` = s.`visit_id` WHERE v.`person_id` = p.`id` AND s.`option_c` = 'no')
id | name | gender -: | :--- | :----- 1 | nim | male
db<>fiddle here
Another approach (Bits and Booleans)
If the “options” are exactly yes/no (not “unknown” or “maybe”), then you could use a “bit” and put those bits into a SET
or INT
. This would lead to a much more compact dataset, though the WHERE
code would be messier.
INT
would allow Boolean operations with which you could compute things that you seem to need:
- OptionA was “yes” for any visit by person-1.
- OptionA was “yes” for all visits by person-1.
options BIGINT UNSIGNED
lets you handle 64 options, no more. You could (with added complexity) use more than one column if you need more than 64 options.
The gist of the code is to GROUP BY person
and do BIT_AND(options)
to combine every column in order to test for “yes” at all visits. BIT_OR
would give you yes
at any visit. Other expressions would handle any/all “no”.
Masking would let you select the, say, 3 options that the query is interested in.
To find visits with a particular set of yes/no values, just masking and equality suffices. So (option_a, option_b, option_c) IN (('yes', 'no', 'no'))
maps to (assuming a is bit 0, b is bit 1, etc):
WHERE options
& ((1<<0) | (1<<1) | (1<<2)) -- check bits 0,1,2
= (1<<0) -- for 0 being YES; the rest being NO
Back to what might be the desired query:
WHERE BIT_OR(options) & (1<<0) -- option_a is YES for _any_ visit
AND BIT_AND(options) & (1<<1) = 0 -- b was NO for _any_ visit
AND BIT_AND(options) & (1<<2) = 0 -- c was NO for _any_ visit
Tips
LEFT
seems to be ‘wrong’, use a regular JOIN
.
“Row constructors”, in the past (pre 5.7.3), were quite inefficient. Rewrite the WHERE
if you are running an old version.
Please qualify column names so we know which table(s) they are in.
You show 4 options now. Might there be 5 tomorrow? And 6 the day after? This leads to a maintenance question for the schema.