Question :

I have a problem with a UNION query in MySQL. We have 10 million players on our website and would like to select players using a multi-criteria system. For example: people from the US, male, more than 35 years old.

We are using “vertical partitioning”: one table per criterion. For example:

* user_country
- id_user
- id_country

We would like to do this kind of query:

SELECT id_inscri FROM userdata_langue
WHERE id_langue='43'
UNION
SELECT id_inscri FROM userdata_sexe
WHERE sexe='2'
UNION
SELECT id_inscri FROM userdata_nb_jour
WHERE nb_jour>='31'
UNION
SELECT id_inscri FROM userdata_last
WHERE last<='2013-04-07'
  AND last>='2013-04-03' ;

How do I optimize that?

More details:

Explain output of the query:

id    select_type   table             type   possible_keys  key        key_len  ref    rows     Extra
1     PRIMARY       userdata_langue   ref    id_langue      id_langue  1        const  398846   Using index
2     UNION         userdata_sexe     ref    sexe           sexe       1        const  1667137  Using index
3     UNION         userdata_nb_jour  range  nb_jour        nb_jour    2        NULL   5830     Using where; Using index
4     UNION         userdata_last     range  last           last       3        NULL   371614   Using where; Using index
NULL  UNION RESULT  <union1,2,3,4>    ALL    NULL           NULL       NULL     NULL   NULL


Table   Create Table
userdata_langue CREATE TABLE `userdata_langue` (
 `id_inscri` bigint(20) NOT NULL,
 `id_langue` tinyint(3) unsigned NOT NULL,
 PRIMARY KEY (`id_inscri`),
 KEY `id_langue` (`id_langue`)

Answer :

From the EXPLAIN output, it looks like the first table, userdata_langue, would benefit from an index that contains both id_langue and id_inscri. If there is already an index on that combination, try forcing it.
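
A minimal sketch of that suggestion follows. The index name idx_langue_inscri is made up for illustration; the columns come from the CREATE TABLE in the question. One caveat: if the table uses InnoDB, the existing id_langue index already carries the primary key id_inscri implicitly, which would explain the "Using index" in the EXPLAIN output, so the gain may be small. Compare EXPLAIN before and after.

```sql
-- Hypothetical index name; columns are taken from the question's CREATE TABLE.
ALTER TABLE userdata_langue
  ADD INDEX idx_langue_inscri (id_langue, id_inscri);

-- Force the optimizer to use the composite index for the first UNION component.
SELECT id_inscri
FROM userdata_langue FORCE INDEX (idx_langue_inscri)
WHERE id_langue = '43';
```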

The second table, userdata_sexe, could be partitioned on sexe (I assume there are only two options here?). This would speed up the scan required to fetch all matching rows.
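
As a rough sketch only: the statements below assume that userdata_sexe has the same shape as userdata_langue (primary key on id_inscri), that sexe is an integer column, and that it only holds the values 1 and 2. None of this is confirmed by the question. MySQL requires every unique key, including the primary key, to contain the partitioning column, so the primary key has to be widened first.

```sql
-- Assumption: the current primary key is (id_inscri) alone.
-- The partitioning column must be part of every unique key,
-- so sexe is added to the primary key before partitioning.
-- Side effect: uniqueness is then enforced on the pair, not on id_inscri alone.
ALTER TABLE userdata_sexe
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id_inscri, sexe);

-- Assumption: sexe only takes the values 1 and 2.
-- Partition names are illustrative.
ALTER TABLE userdata_sexe
  PARTITION BY LIST (sexe) (
    PARTITION p_sexe_1 VALUES IN (1),
    PARTITION p_sexe_2 VALUES IN (2)
  );
```

With this layout, a filter such as sexe = 2 should only have to read one partition; EXPLAIN PARTITIONS (or plain EXPLAIN on recent versions) shows which partitions are touched.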

I am not sure whether MySQL would benefit from reordering the UNION DISTINCT components. It might be worth trying to move components 3 and 4 to the top of the query.
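
For reference, the reordered query would look something like this. It is the same four components as in the question, with the two that have the lowest row estimates in the EXPLAIN output (userdata_nb_jour and userdata_last) placed first. Whether this helps is something to measure, not a given.

```sql
-- Components 3 and 4 from the EXPLAIN output moved to the top.
SELECT id_inscri FROM userdata_nb_jour
WHERE nb_jour >= '31'
UNION
SELECT id_inscri FROM userdata_last
WHERE last >= '2013-04-03'
  AND last <= '2013-04-07'
UNION
SELECT id_inscri FROM userdata_langue
WHERE id_langue = '43'
UNION
SELECT id_inscri FROM userdata_sexe
WHERE sexe = '2';
```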

Finally, consider whether you can afford to increase the sort buffer size in MySQL (see: http://www.mysqlperformanceblog.com/2010/10/25/impact-of-the-sort-buffer-size-in-mysql/) so that MySQL can hold everything in memory while it calculates the UNION DISTINCT.
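
A small sketch of how to inspect and raise the setting for one session. The 8 MB value is arbitrary and only for illustration; pick a value based on your own memory budget and measurements.

```sql
-- Show the current value (in bytes).
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Raise it for the current session only; 8388608 bytes = 8 MB (illustrative value).
SET SESSION sort_buffer_size = 8388608;
```

Setting it per session keeps the change limited to the connection that runs the heavy query, since the buffer is allocated per connection.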

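Try the query below, which joins the four tables on id_inscri and applies the criteria with OR. It assumes every player has a row in each of the four tables; players missing from any table are dropped by the join.

SELECT DISTINCT userdata_langue.id_inscri
FROM userdata_langue, userdata_sexe, userdata_nb_jour, userdata_last
WHERE userdata_langue.id_inscri = userdata_sexe.id_inscri
  AND userdata_langue.id_inscri = userdata_nb_jour.id_inscri
  AND userdata_langue.id_inscri = userdata_last.id_inscri
  AND ( (id_langue='43') OR (sexe='2') OR
        (nb_jour>='31') OR (last<='2013-04-07' AND last>='2013-04-03') );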

Consider using UNION ALL. It should be significantly faster than UNION, but it will return duplicate rows for players who match multiple criteria.

If you can deal with duplicates in your application code, it can be a good way to optimize your query.
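
For illustration, this is the query from the question with UNION ALL in place of UNION. Each component is simply appended to the result, so MySQL skips the deduplication step and a player matching several criteria appears once per match.

```sql
-- Same components as the original query; duplicates are NOT removed.
SELECT id_inscri FROM userdata_langue
WHERE id_langue = '43'
UNION ALL
SELECT id_inscri FROM userdata_sexe
WHERE sexe = '2'
UNION ALL
SELECT id_inscri FROM userdata_nb_jour
WHERE nb_jour >= '31'
UNION ALL
SELECT id_inscri FROM userdata_last
WHERE last >= '2013-04-03'
  AND last <= '2013-04-07';
```

On the application side, the duplicates can be removed by loading the returned ids into a set or hash keyed on id_inscri.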

