Question:
I have a problem with a UNION query in MySQL. We have 10 million players on our website and we would like to select players with a multi-criteria system. For example: selecting people from the US who are men and more than 35 years old.
We are using "vertical partitioning": one table per criterion. For example:
* user_country
- id_user
- id_country
We would like to do this kind of query:
SELECT id_inscri FROM userdata_langue
WHERE id_langue='43'
UNION
SELECT id_inscri FROM userdata_sexe
WHERE sexe='2'
UNION
SELECT id_inscri FROM userdata_nb_jour
WHERE nb_jour>='31'
UNION
SELECT id_inscri FROM userdata_last
WHERE last<='2013-04-07'
AND last>='2013-04-03' ;
How do I optimize that?
More details:
EXPLAIN output of the query:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | userdata_langue | ref | id_langue | id_langue | 1 | const | 398846 | Using index |
| 2 | UNION | userdata_sexe | ref | sexe | sexe | 1 | const | 1667137 | Using index |
| 3 | UNION | userdata_nb_jour | range | nb_jour | nb_jour | 2 | NULL | 5830 | Using where; Using index |
| 4 | UNION | userdata_last | range | last | last | 3 | NULL | 371614 | Using where; Using index |
| NULL | UNION RESULT | <union1,2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | |
SHOW CREATE TABLE output for userdata_langue:
CREATE TABLE `userdata_langue` (
`id_inscri` bigint(20) NOT NULL,
`id_langue` tinyint(3) unsigned NOT NULL,
PRIMARY KEY (`id_inscri`),
KEY `id_langue` (`id_langue`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Answer:
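From the EXPLAIN output, it looks like the first table, userdata_langue, would benefit from an index that contains both id_langue and id_inscri. If there is already an index on that combination, try forcing it. Note that InnoDB secondary indexes implicitly carry the primary key columns, so the existing id_langue index may already cover this branch (the EXPLAIN shows "Using index").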
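As a rough sketch of that suggestion, the composite index could be created and forced as shown below. The index name idx_langue_inscri is made up for illustration, and whether it changes the plan at all is something to verify with EXPLAIN rather than assume.

```sql
-- Hypothetical index name; the column pair is the one suggested above.
ALTER TABLE userdata_langue
  ADD INDEX idx_langue_inscri (id_langue, id_inscri);

-- Force the index for the first branch of the UNION and inspect the plan.
EXPLAIN
SELECT id_inscri
FROM userdata_langue FORCE INDEX (idx_langue_inscri)
WHERE id_langue = 43;
```

Adding an index to a table of this size rebuilds or extends it, so it is worth trying on a copy first.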
The second table, userdata_sexe, could be partitioned on sexe (I assume there are only two options here?). This would speed up the scan required to fetch all matching rows.
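A minimal sketch of such partitioning follows. The definition of userdata_sexe is not shown in the question, so this assumes it mirrors userdata_langue (primary key on id_inscri) and that sexe is an integer column holding only the values 1 and 2. The partition names are invented. MySQL requires the partitioning column to be part of every unique key, including the primary key, which is why the key is widened first.

```sql
-- Assumption: the current primary key is (id_inscri) alone.
-- Widening it means id_inscri by itself is no longer enforced as unique.
ALTER TABLE userdata_sexe
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id_inscri, sexe);

-- Assumption: sexe only ever holds 1 or 2; any other value would be
-- rejected on insert because no partition accepts it.
ALTER TABLE userdata_sexe
  PARTITION BY LIST (sexe) (
    PARTITION p_sexe_1 VALUES IN (1),
    PARTITION p_sexe_2 VALUES IN (2)
  );

-- With the filter on sexe, only the matching partition should be read.
SELECT id_inscri FROM userdata_sexe WHERE sexe = 2;
```

Both ALTER statements rebuild the table, which can take a long time with millions of rows.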
I am not sure whether MySQL would benefit from reordering the UNION DISTINCT components. It might be worth trying to move components 3 and 4 to the top of the query.
Finally, consider whether you can afford to increase MySQL's sort buffer size (sort_buffer_size; see: http://www.mysqlperformanceblog.com/2010/10/25/impact-of-the-sort-buffer-size-in-mysql/) so that everything fits in memory while MySQL calculates the UNION DISTINCT.
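For illustration, the buffer can be inspected and raised for the current session only, which keeps the experiment away from other connections. The 4 MiB value is an arbitrary placeholder, not a recommendation. The last two variables are included on the assumption that the temporary table built for the UNION is also bounded by them, which is worth confirming for your MySQL version.

```sql
-- Current setting, in bytes.
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Raise it for this session only; 4194304 bytes (4 MiB) is an arbitrary test value.
SET SESSION sort_buffer_size = 4194304;

-- Limits that apply to in-memory internal temporary tables.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
```

Re-run the UNION query in the same session afterwards and compare timings before changing anything globally.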
Try the query below, which joins the four tables on id_inscri (it only returns players that have a row in every table):
SELECT DISTINCT l.id_inscri
FROM userdata_langue l
JOIN userdata_sexe s ON s.id_inscri = l.id_inscri
JOIN userdata_nb_jour n ON n.id_inscri = l.id_inscri
JOIN userdata_last t ON t.id_inscri = l.id_inscri
WHERE l.id_langue = 43
OR s.sexe = 2
OR n.nb_jour >= 31
OR (t.last >= '2013-04-03' AND t.last <= '2013-04-07');
Consider using UNION ALL. It should be significantly faster than UNION, but it will return duplicate rows for players that match multiple criteria.
If you can deal with duplicates in your application code, it can be a good way to optimize your query.
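Applied to the query from the question, the UNION ALL variant would look roughly like this. The tables, columns, and filter values are taken from the question. Only the set operator changes, so MySQL no longer has to deduplicate the combined result.

```sql
SELECT id_inscri FROM userdata_langue
WHERE id_langue = 43
UNION ALL
SELECT id_inscri FROM userdata_sexe
WHERE sexe = 2
UNION ALL
SELECT id_inscri FROM userdata_nb_jour
WHERE nb_jour >= 31
UNION ALL
SELECT id_inscri FROM userdata_last
WHERE last >= '2013-04-03'
  AND last <= '2013-04-07';
```

A player who matches several criteria appears once per matching branch, so the application would need to collapse repeated id_inscri values, for instance by collecting them into a set.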