Complicated query running orders of magnitude faster than simple query

Posted on

Question :

Query A takes 3 seconds or more to produce 294 results:

SELECT user_id, id
FROM web_useradvisory
WHERE emailed IS NULL
ORDER BY user_id ASC, id ASC;

However, Query B takes about 10 milliseconds (!!) to produce the same 294 results:

SELECT user_id, id
FROM web_useradvisory
WHERE user_id IN (SELECT id FROM auth_user) AND emailed IS NULL
ORDER BY user_id ASC, id ASC;

EXPLAIN says both queries are using the same index, and web_useradvisory.user_id has a non-nullable foreign key index to auth_user.id. FYI there are only 5 distinct user IDs among the 294 records.

Query A:

mysql> EXPLAIN SELECT user_id, id FROM web_useradvisory WHERE emailed IS NULL ORDER BY user_id ASC, id ASC;
+----+-------------+------------------+------------+-------+---------------+------------------+---------+------+---------+----------+--------------------------+
| id | select_type | table            | partitions | type  | possible_keys | key              | key_len | ref  | rows    | filtered | Extra                    |
+----+-------------+------------------+------------+-------+---------------+------------------+---------+------+---------+----------+--------------------------+
|  1 | SIMPLE      | web_useradvisory | NULL       | index | NULL          | user_emailed_idx | 13      | NULL | 5075623 |    10.00 | Using where; Using index |
+----+-------------+------------------+------------+-------+---------------+------------------+---------+------+---------+----------+--------------------------+

Query B:

mysql> EXPLAIN SELECT user_id, id FROM web_useradvisory WHERE user_id IN (SELECT id FROM auth_user) AND emailed IS NULL ORDER BY user_id ASC, id ASC;
+----+-------------+------------------+------------+-------+--------------------------------------------------------------------------------------------------------+------------------+---------+------------------------------+------+----------+----------------------------------------------+
| id | select_type | table            | partitions | type  | possible_keys                                                                                          | key              | key_len | ref                          | rows | filtered | Extra                                        |
+----+-------------+------------------+------------+-------+--------------------------------------------------------------------------------------------------------+------------------+---------+------------------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | auth_user        | NULL       | index | PRIMARY                                                                                                | username         | 152     | NULL                         |  170 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | web_useradvisory | NULL       | ref   | web_useradvisory_user_id_a9c47ffb_fk_auth_user_id,user_emailed_idx,user_localtime_idx,user_done_lt_idx | user_emailed_idx | 13      | TMVRemote.auth_user.id,const |    2 |   100.00 | Using where; Using index                     |
+----+-------------+------------------+------------+-------+--------------------------------------------------------------------------------------------------------+------------------+---------+------------------------------+------+----------+----------------------------------------------+

SHOW INDEXES shows the following:

SHOW INDEXES FROM web_useradvisory;
+------------------+------------+----------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table            | Non_unique | Key_name                                                 | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------------+------------+----------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| web_useradvisory |          0 | PRIMARY                                                  |            1 | id          | A         |     5001672 |     NULL | NULL   |      | BTREE      |         |               |
| web_useradvisory |          1 | web_useradvisory_advisory_id_a8bea1ea_fk_web_advisory_id |            1 | advisory_id | A         |      135733 |     NULL | NULL   |      | BTREE      |         |               |
| web_useradvisory |          1 | web_useradvisory_user_id_a9c47ffb_fk_auth_user_id        |            1 | user_id     | A         |          69 |     NULL | NULL   |      | BTREE      |         |               |
| web_useradvisory |          1 | user_emailed_idx                                         |            1 | user_id     | A         |          72 |     NULL | NULL   |      | BTREE      |         |               |
| web_useradvisory |          1 | user_emailed_idx                                         |            2 | emailed     | A         |     2361395 |     NULL | NULL   | YES  | BTREE      |         |               |
| web_useradvisory |          1 | user_localtime_idx                                       |            1 | user_id     | A         |          72 |     NULL | NULL   |      | BTREE      |         |               |
| web_useradvisory |          1 | user_localtime_idx                                       |            2 | localtime   | A         |     4674121 |     NULL | NULL   |      | BTREE      |         |               |
| web_useradvisory |          1 | user_done_lt_idx                                         |            1 | user_id     | A         |        7409 |     NULL | NULL   |      | BTREE      |         |               |
| web_useradvisory |          1 | user_done_lt_idx                                         |            2 | done        | A         |        9261 |     NULL | NULL   |      | BTREE      |         |               |
| web_useradvisory |          1 | user_done_lt_idx                                         |            3 | localtime   | A         |     4897349 |     NULL | NULL   |      | BTREE      |         |               |
+------------------+------------+----------------------------------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Why doesn’t Query A run as quickly as Query B? Query B’s additional user_id WHERE condition doesn’t actually further constrain the results, since the foreign key constraint on web_useradvisory.user_id already requires it to be a value from the list auth_user.id. It seems like my subquery selecting all the auth_user.id values in Query B is giving mysql a hint that the database engine fails to infer from Query A.

Answer :

Try this order:

INDEX(emailed,       -- for the `WHERE`
      user_id, id)   -- for the `ORDER BY

Note: In this context, IS NULL optimizes similar to = constant. So, I put that column first, thereby handing all the WHERE. After that, all the ORDER BY columns.

Hence, it would be performed entirely in the Index’s BTree. (Cf “covering” or “Using index”).

Suppose I task you with finding all the people named Isaac in the Phonebook which is ordered by surname. How would you go about that?

For each surname, find all the Isaacs, or just run through all the pages in the phonebook scanning for Issacs? Well, if there are as many surnames as pages in the book, it’s the same, but if each surname spans tens or maybe hundreds of pages with only a few Isaacs per surname…

That’s what’s happening here. In query A, the database is tasked with finding all the emailed IS NULL rows, but the index it can use, is ordered by user_id first, so it has to scan the entire index (5M rows).

In query B, it finds 170 unique user_ids. Each user_id corresponds on average to 29,000 rows in web_useradvisory, but since only about 2 in each match emailed IS NULL, it’ll be done in 170 seeks and change. Ie. seek to the first user_id, then read the 2 rows where emailed IS NULL. In stead of reading the next 28.998 rows, seek in the index for the next user_id and repeat.

If you take Rick James’ advice and index by emailed in stead, the 294 rows will be at the very start of your index making the query lightening fast. On the other hand, looking up web_useradvisory for a specific user_id will require a full scan, so if the number of emailed IS NULL rows remain low I would just go with query B as it uses an existing index – and quite effectively so.

Leave a Reply

Your email address will not be published. Required fields are marked *