I have a problem with a UNION query in MySQL. We have 10 million players on our website and we would like to select players with a multi-criteria system. For example, selecting US people, men, more than 35 years of age.
We are using "vertical partitioning": one table per criterion. For example:
* user_country: id_user, id_country
We would like to do this kind of query:
SELECT id_inscri FROM userdata_langue WHERE id_langue='43'
UNION
SELECT id_inscri FROM userdata_sexe WHERE sexe='2'
UNION
SELECT id_inscri FROM userdata_nb_jour WHERE nb_jour>='31'
UNION
SELECT id_inscri FROM userdata_last WHERE last<='2013-04-07' AND last>='2013-04-03';
How do I optimize that?
More details

EXPLAIN output of the query:

id    select_type   table             type   possible_keys  key        key_len  ref    rows     Extra
1     PRIMARY       userdata_langue   ref    id_langue      id_langue  1        const  398846   Using index
2     UNION         userdata_sexe     ref    sexe           sexe       1        const  1667137  Using index
3     UNION         userdata_nb_jour  range  nb_jour        nb_jour    2        NULL   5830     Using where; Using index
4     UNION         userdata_last     range  last           last       3        NULL   371614   Using where; Using index
NULL  UNION RESULT  <union1,2,3,4>    ALL    NULL           NULL       NULL     NULL   NULL
SHOW CREATE TABLE output:

Table: userdata_langue
Create Table:
CREATE TABLE `userdata_langue` (
  `id_inscri` bigint(20) NOT NULL,
  `id_langue` tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (`id_inscri`),
  KEY `id_langue` (`id_langue`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
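From the EXPLAIN output, it looks like the first table, userdata_langue, would benefit from an index that contains both id_langue and id_inscri. If there is already an index on that combination, try forcing it. Keep in mind that an InnoDB secondary index such as id_langue already carries the primary key id_inscri, which is why EXPLAIN reports "Using index" for this table, so the gain may be small.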
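As a rough sketch of that suggestion, the statements below add an explicit composite index on (id_langue, id_inscri) and then force it for the first branch of the UNION. The index name idx_langue_inscri is made up for illustration; the table and column names come from the SHOW CREATE TABLE output in the question.

```sql
-- Hypothetical index name; columns are taken from the question's table definition.
ALTER TABLE userdata_langue
  ADD INDEX idx_langue_inscri (id_langue, id_inscri);

-- Ask the optimizer to use that index for the first branch of the UNION.
SELECT id_inscri
FROM userdata_langue FORCE INDEX (idx_langue_inscri)
WHERE id_langue = 43;
```

Compare the EXPLAIN output before and after: if the plan and timing do not change, the extra index only costs disk space and write time and can be dropped again.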
The second table, userdata_sexe, could be partitioned on sexe (I assume there are only two options here?). This would speed up the scan required to fetch all those rows.
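A possible way to set that up is sketched here. It rests on several assumptions that the question does not confirm: that userdata_sexe has PRIMARY KEY (id_inscri) like userdata_langue, that sexe is an integer column, and that its only values are 1 and 2. MySQL requires every unique key, including the primary key, to contain the partitioning column, hence the first statement. The partition names are made up.

```sql
-- Assumption: the current primary key is (id_inscri) only.
-- The partitioning column must be part of every unique key.
ALTER TABLE userdata_sexe
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id_inscri, sexe);

-- Assumption: sexe is an integer column holding only the values 1 and 2.
-- The ALTER fails if a row holds a value that no partition lists.
ALTER TABLE userdata_sexe
  PARTITION BY LIST (sexe) (
    PARTITION p_sexe_1 VALUES IN (1),
    PARTITION p_sexe_2 VALUES IN (2)
  );
```

Both statements rebuild the table, so on a table with millions of rows it is worth trying them on a copy first.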
I am not sure whether MySQL would benefit from reordering the components of the UNION DISTINCT. It might be worth trying to move components 3 and 4 up to the top of the query.
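For reference, this is the query from the question with components 3 and 4 moved to the top. It returns the same set of ids; whether it runs any faster is something only a measurement on the real data can tell.

```sql
-- Same four branches as in the question, smallest estimated row counts first.
SELECT id_inscri FROM userdata_nb_jour WHERE nb_jour >= 31
UNION
SELECT id_inscri FROM userdata_last WHERE last >= '2013-04-03' AND last <= '2013-04-07'
UNION
SELECT id_inscri FROM userdata_langue WHERE id_langue = 43
UNION
SELECT id_inscri FROM userdata_sexe WHERE sexe = 2;
```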
Finally, consider whether you can afford to increase MySQL's sort_buffer_size (see: http://www.mysqlperformanceblog.com/2010/10/25/impact-of-the-sort-buffer-size-in-mysql/) to make sure you can hold everything in memory while MySQL calculates the UNION DISTINCT.
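A minimal sketch of raising the buffer for the current session only, so that other connections are not affected. The sizes are illustrative values, not recommendations. The two temporary-table settings are an addition beyond what is suggested above: MySQL may deduplicate a UNION through an internal temporary table, so they could matter as well, but treat that as an assumption to test.

```sql
-- Illustrative value (8 MB); applies to this connection only.
SET SESSION sort_buffer_size = 8388608;

-- Assumption: the UNION is deduplicated in an internal temporary table,
-- whose in-memory limit is the smaller of these two settings (256 MB here).
SET SESSION tmp_table_size = 268435456;
SET SESSION max_heap_table_size = 268435456;

-- Check what the session is actually using.
SHOW SESSION VARIABLES LIKE 'sort_buffer_size';
```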
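Try the query below. It joins the four tables on id_inscri and applies the criteria with OR:

SELECT DISTINCT l.id_inscri
FROM userdata_langue l
JOIN userdata_sexe s ON s.id_inscri = l.id_inscri
JOIN userdata_nb_jour n ON n.id_inscri = l.id_inscri
JOIN userdata_last d ON d.id_inscri = l.id_inscri
WHERE l.id_langue = 43
   OR s.sexe = 2
   OR n.nb_jour >= 31
   OR (d.last >= '2013-04-03' AND d.last <= '2013-04-07');

Each join condition is written out separately because a chained comparison such as a = b = c is evaluated left to right in MySQL and does not test that all values are equal. Note that this returns the same ids as the UNION only if every player has a row in all four tables.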
Try UNION ALL. It should be significantly faster than UNION, but it will return duplicate rows for players who match several criteria.
If you can deal with duplicates in your application code, it can be a good way to optimize your query.
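For illustration, here is the query from the question with each UNION swapped for UNION ALL. Nothing else changes; a player who matches, say, both the language and the date criteria now appears twice in the result.

```sql
-- UNION ALL skips the duplicate-elimination step, so the same
-- id_inscri can be returned once per branch it matches.
SELECT id_inscri FROM userdata_langue WHERE id_langue = 43
UNION ALL
SELECT id_inscri FROM userdata_sexe WHERE sexe = 2
UNION ALL
SELECT id_inscri FROM userdata_nb_jour WHERE nb_jour >= 31
UNION ALL
SELECT id_inscri FROM userdata_last WHERE last >= '2013-04-03' AND last <= '2013-04-07';
```

On the application side, collecting the returned ids into a set (or any structure with unique keys) is enough to remove the duplicates.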