Slow Query not using index in one of the tables

Posted on

Question :

My Rails table structure is like this: a User has_many Contacts, which have one Reservation each, which have many ReservationNotes each.

I’m trying to get all the ReservationNotes for the contacts belonging to a User (in the example, user.id is 1).

The query works, but is very slow and doesn’t use indexes (possible_keys is empty for the table reservations), but I don’t have enough experience with EXPLAIN and even tough I added all the possible indexes (indicated below) the query is still not fast enough:

=> EXPLAIN for: SELECT `reservation_notes`.* FROM `reservation_notes` INNER JOIN `reservations` ON `reservations`.`id_hash` = `reservation_notes`.`reservation_id_hash` INNER JOIN `contacts` ON `contacts`.`id_hash` = `reservations`.`contact_id_hash` WHERE `contacts`.`user_id` = 1 */
+----+-------------+-------------------+------------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+---------+------+------+----------+------------------------------------+
| id | select_type | table             | partitions | type | possible_keys                                                                                                                                                                               | key                                            | key_len | ref  | rows | filtered | Extra                              |
+----+-------------+-------------------+------------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+---------+------+------+----------+------------------------------------+
|  1 | SIMPLE      | reservations      | NULL       | ALL  | NULL                                                                                                                                                                                        | NULL                                           | NULL    | NULL | 5994 |    100.0 | NULL                               |
|  1 | SIMPLE      | contacts          | NULL       | ref  | index_contacts_on_id_hash,index_contacts_on_user_id,index_contacts_on_user_id_and_read_by_user,index_contacts_on_user_id_and_blocked,index_contacts_on_user_id_and_read_by_user_and_blocked | index_contacts_on_id_hash                      | 767     | func |    1 |      5.0 | Using index condition; Using where |
|  1 | SIMPLE      | reservation_notes | NULL       | ref  | index_reservation_notes_on_reservation_id_hash                                                                                                                                              | index_reservation_notes_on_reservation_id_hash | 576     | func |    1 |    100.0 | Using index condition              |
+----+-------------+-------------------+------------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------+---------+------+------+----------+------------------------------------+
3 rows in set (0.00 sec)

Here are my indexes:

On table contacts (4 million records):

  • index_contacts_on_user_id
  • index_contacts_on_id_hash UNIQUE

On table reservations (6000 records):

  • index_reservations_on_contact_id_hash
  • index_reservations_on_id_hash UNIQUE

On table reservation_notes (2000 records):

  • index_reservation_notes_on_id_hash UNIQUE
  • index_reservation_notes_on_reservation_id_hash

So have I reached the limit of performance on Mysql, not being able to return a user’s contacts reservation_notes any faster than 350ms?

Answer :

SELECT  n.*
    FROM  `reservation_notes` AS n
    INNER JOIN  `reservations` AS r  ON r.`id_hash` = n.`reservation_id_hash`
    INNER JOIN  `contacts` AS c  ON c.`id_hash` = r.`contact_id_hash`
    WHERE  c.`user_id` = 1 */ 

A guess: user_id is VARCHAR?? It should be INT (or some other numeric datatype). And c (Contacts) needs INDEX(user_id). (Please don’t use those awfully long names for indexes; it clutters things unnecessarily.)

That would let the Optimizer start with c and do the filtering (user_id=1) first.

Also, provide INDEX(user_id, id_hash), in this order. That will make the index “covering”. This means that the processing can be done entirely in the index’s BTree, without touching the Data BTree. This is an extra optimizaiton.

Then r (Reservations) needs INDEX(contact_id_hash, id_hash). The first column is for reaching into the table. The extra column is to make it “covering”.

Then n (Notes) needs INDEX(reservation_id_hash), which you apparently have.

With these changes, EXPLAIN will list the table in a more optimal order: c, r, n. (I prefer to write JOINs in the order that the Optimizer will perform the query. However, the Optimizer will rearrange the tables as it sees fit.)

When you get millions of rows, you may find that “hashes” slow things down. But that is another topic.

Leave a Reply

Your email address will not be published. Required fields are marked *