Question :
I have this very slow, simple query that joins a large table (~180M rows) with a smaller table (~60k rows) with a foreign key, filtering an indexed column on the smaller table, ordering by the primary key in the larger table, and then taking the 25 latest rows.
The EXPLAIN
shows Using index; Using temporary; Using filesort
on the smaller table. Why?
Engine: MySQL 5.7.
Query:
SELECT
order.id,
order.company_id,
order.total
FROM
order
INNER JOIN
company ON company.id = order.company_id
WHERE
company.company_headquarter_id = 23133
ORDER BY order.id DESC
LIMIT 25;
+----+-------------+------------+------------+------+---------------------------------------+----------------------------+---------+-----------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------------------------------+----------------------------+---------+-----------------------+------+----------+----------------------------------------------+
| 1 | SIMPLE | company | NULL | ref | PRIMARY,company_headquarter_id_idx | company_headquarter_id_idx | 8 | const | 6 | 100.00 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | order | NULL | ref | company_id_idx | company_id_idx | 8 | company.id | 381 | 100.00 | NULL |
+----+-------------+------------+------------+------+---------------------------------------+----------------------------+---------+-----------------------+------+----------+----------------------------------------------+
CREATE TABLE `order` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`company_id` bigint(20) NOT NULL,
`total` double(18,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `company_id_idx` (`company_id`),
CONSTRAINT `company_id_fk` FOREIGN KEY (`company_id`) REFERENCES `company` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=186518644 DEFAULT CHARSET=latin1
CREATE TABLE `company` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`company_headquarter_id` bigint(20) NOT NULL,
`name` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
KEY `company_headquarter_id_idx` (`company_headquarter_id`),
CONSTRAINT `company_headquarter_id_fk` FOREIGN KEY (`company_headquarter_id`) REFERENCES `company_headquarter` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=60825 DEFAULT CHARSET=latin1
CREATE TABLE `company_headquarter` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`name` varchar(100) NOT NULL,
`phone` varchar(10) DEFAULT NULL,
`address_id` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name` (`name`),
KEY `address_id_idx` (`address_id`),
CONSTRAINT `address_id_fk` FOREIGN KEY (`address_id`) REFERENCES `address` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=43862 DEFAULT CHARSET=latin1
CREATE TABLE `address` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`street_address` varchar(100) DEFAULT NULL,
`zip` varchar(7) DEFAULT NULL,
`state` varchar(2) DEFAULT NULL,
`city` varchar(50) DEFAULT NULL,
`country` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=147360955 DEFAULT CHARSET=latin1
The query becomes faster when I:
- Remove the
ORDER BY
clause. - Filter
company.company_headquarter_id
with acompany_headquarter_id
that has a smaller number of orders. (company_headquarter_id = 23133
has ~3M rows in theorder
table) - Split it into two separate queries:
First:
SELECT
company.id
FROM
company
WHERE
company.company_headquarter_id = 23133;
Second:
SELECT
order.id,
order.company_id,
order.total
FROM
order
WHERE
order.company_id IN (20122, 50729, 50730, 50731, 50732, 50733) /* From first query */
ORDER BY order.id DESC
LIMIT 25;
Any ideas?
Thank you.
EDIT:
When I do:
SELECT STRAIGHT_JOIN
order.id,
order.company_id,
order.total
FROM
order
INNER JOIN
company ON company.id = order.company_id
WHERE
company.company_headquarter_id = 23133
ORDER BY order.id DESC
LIMIT 25;
The query is much faster and EXPLAIN shows a temporary table is not created.
Answer :
If you were to upgrade to at least MySQL 8.0, you’d be able to take advantage of lateral joins:
create index order_ordered on order (company_id, id desc);
SELECT
lateral_o.id,
lateral_o.company_id,
lateral_o.total
FROM
company c
LATERAL (select o.id
,o.company_id
,o.total
from order o
where c.id = o.company_id
order by o.company_id, o.id desc
limit 25
) lateral_o
WHERE
c.company_headquarter_id = 23133
ORDER BY lateral_o.id DESC
LIMIT 25;
This will do what it looks like, get the top 25 rows for each company_id
value you find using an index, then do a final sort at the end.
The current plans you have will either be: going through the order
table in reverse id
order and checking if the company
belongs to your filter until it succeeds 25 times. OR, it will be going to every single matching order
row, sorting and returning the 25 desired. The first choice is fine if the chance of an order being one you care about is high, but awful if it’s not. The second choice is great if there’s only a few matching order
rows, but will get worse when there is lots. My suggestion’s performance will be related to how many company
rows you identify, and it sounds like there won’t often be many.