Question :
I need to make an OUTER JOIN on two tables, based on 2 columns, first id an INTEGER id, which is pretty straightforward to use in the query, but the second one is a DATETIME, which doesn’t actually matches a value on the other table, but needs to “match” the closest date.
As an added bonus to the problem, on the ref table data is not as DATETIME, but on two separate columns: one DATE and another TIME.
This question is a derivate of other two I’ve asked before, but as I need to narrow down the problems, I preferred to make separate questions.
The question is how to choose the closest date on a best performance approach. I’ve tried using between (1 minute tolerance) and things like that, but the query takes too long and the explain plan gives too many rows on the involved tables.
Perhaps something like using subqueries might work faster, but I just don’t know how to build them for the JOIN syntax.
References:
BD: MySQL 5.5.43-0+deb8u1-log
Answer :
Let’s jump “out of the box”…
Have a column in the first table with the datetime rounded to the minute. (One minute being your resolution.)
Have two columns in the second table, both datetime, one rounded up to the next minute, the other down.
Index both columns in second table: (id, lower)
and (id, upper)
.
Now run a UNION ALL
of two SELECTs
, one JOINing
the first table to the second via id and lower, with equality; the other via id and upper.
This could catch pairs that are almost 2 minutes apart, so in the outer part of the UNION
, do a WHERE
to filter using the to-the-second columns — checking which ones match within one minute.
This technique is likely to be much faster than any other.