Question :
I have 3 tables:
users
movements
unique_ids
I want to get the records gotten from march until now of the users registered in march, counting the unique id of their movements with a distinct and the total of it at the end:
SELECT MONTH(mo.ctime) AS month, COUNT(DISTINCT(unique_id))
FROM users us, movements mo,unique_ids id
WHERE us.time_field > '2014-03-01' AND us.time_field < '2014-03-31 23:59:59'
AND mo.time_field > '2014-03-01'
AND us.room_name LIKE 'name'
AND us.room_name=mo.room_name AND us.user=mo.user AND id.id_log=mo.id_log
GROUP BY month WITH ROLLUP
The thing is that few month ago in other query I realise that adding DATE() to the time_field into WHERE clause makes faster the query, but adding this into the query shown, gives me different results than the initial one.
What’s the difference between time_field > ‘x’ AND DATE(time_field) > ‘x’ inside WHERE clause?
Answer :
Just addressing this question
What’s the difference between time_field > ‘x’ AND DATE(time_field) > ‘x’ inside WHERE clause?
The difference is how the Query Optimizer treats them.
- If the
WHERE
clause hastime_field > 'x'
, this signals the Query Optimizer to try looking for any index so as to take advantage of it doing a range scan. - If the
WHERE
clause hasDATE(time_field) > 'x'
, this signals the Query Optimizer to throw all indexes under the bus because the DATE function has to be called across the whole table (in the form of a full table scan) or the join result.
Depending on the key distribution of time_field
, DATE(time_field) > 'x'
triggering a full table scan just happens to be better than a range scan on an index if the values make up a significant percentage of the index. This is even more true using InnoDB because a full table scan passes through the clustered index (where PRIMARY KEY
and row data coexist), while a secondary index on time_field
would cause a lookup of the secondary index in addition to the clustered index. If such a secondary index had a lopsided key distribution, such an index would be ignored in favor of a full table scan anyway.
This conjuecture is only indicative of your current dataset. Someone else’s dataset may have a better (evenly distributed, more balanced) key distribution, resulting in time_field > 'x'
working better than DATE(time_field) > 'x'
.
To see such differences, run the EXPLAIN
on both SELECT
queries. The results may be different.