Question :
Essentially, I want to get intersection between 2 tables based on matching IDs with the option of wildcard ‘*’ to return all IDs:
Select * from A a inner join B b on b.id = a.id or b.id = ‘*’
Table A is fairly large (10 M rows), and I have setup the id
to be index already.
Because of OR
, index id
is not used, and it does a full scan (Takes 20 secs).
If I don’t allow the wildcard:
Select * from A a inner join B b on b.id = a.id
It uses id
index and takes only 300 ms
Edit:
By separating into 2 select with Union helped. Detail here: https://stackoverflow.com/questions/5901791/is-having-an-or-in-an-inner-join-condition-a-bad-idea
Answer :
A common trick for optimizing OR
is to turn it into UNION
:
( SELECT * FROM A JOIN B USING(id) WHERE condition_1 )
UNION DISTINCT
( SELECT * FROM A JOIN B USING(id) WHERE condition_2 )
(I’m with Jack on being confused on what you really wanted, so I avoided spelling out the details of the conditions.)
Be sure to include index(es) that let the queries start with the conditions.
Reformatting your query slightly we have:
select * from A a
inner join B b on
b.id = a.id -- condition 1
or
b.id = '*' -- condition 2
Consider how this will match rows from B with rows from A:
- If there is a row in B with an
id
that matches theid
in A, include that row along with the row from A. - If there is any row in B with an
id
that matches ‘*’, include that row along with the row from A.
Condition #2 means that if there are any rows in B with an id
of ‘*’, then all of the rows in A must be included, because they all match those rows in B.
This also means that every row in B that has an id
of ‘*’ matches every row in A.
Is that really what you want? If so then there is no way to avoid a full scan because every row in A is included.