Question :
I am trying to optimize a query that using IN clause in WHERE to avoid file sorting. To make it easy , I created the following sample which shows the problem. Here is my query:
SELECT *
FROM `test`
WHERE user_id = 9898
AND status IN (1,3,4)
order by id
limit 30;
Here is the result of explain, as you can see the query is filesort
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE test range user_id user_id 8 NULL 3 Using where; Using index; Using filesort
Here is my table structure
CREATE TABLE IF NOT EXISTS `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL,
`status` int(3) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`,`status`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ;
--
-- Dumping data for table `test`
--
INSERT INTO `test` (`id`, `user_id`, `status`) VALUES
(5, 9797, 2),
(6, 9797, 3),
(4, 9898, 0),
(1, 9898, 2),
(2, 9898, 3),
(3, 9898, 4);
How can I optimize the query? In my real table I can see the following information in error log:
# Query_time: 26.498180 Lock_time: 0.000175 Rows_sent: 100 Rows_examined: 4926
Answer :
I was looking at a very similar problem today. After doing a ton of searching online, I found this great article by Percona.
Using “UNION ALL”, you can join together a list of queries, like so:
(SELECT * FROM `test` WHERE user_id = 9898 AND status = 1 ORDER BY id LIMIT 30)
UNION ALL
(SELECT * FROM `test` WHERE user_id = 9898 AND status = 3 ORDER BY id LIMIT 30)
UNION ALL
(SELECT * FROM `test` WHERE user_id = 9898 AND status = 4 ORDER BY id LIMIT 30)
ORDER BY id LIMIT 30
It looks gnarly compared to select ... status IN (1,3,4)
, but it’s effective in avoiding the filesort.
As long as the inner “SELECT…” statements are efficient, the performance is good.
Put the list of values into a temporary table, and then perform an INNER JOIN to it. Most SQL optimizers and engines can handle joins much better than they can handle an IN operation. If the list is long, this also allows you to define an index on the temporary table to further assist the optimizer