Question :
For a few days now, i’m trying to optimize this query.
But still I haven’t found the right solution in how to fix this.
The contents table contains about 20 million records. As the connect table holds about 60 million.
The query runs in about 20 seconds:
SELECT *
FROM `contents` AS `n`
INNER JOIN `connect` AS `a` ON `a`.`mcded_primary_id_mcdno_id` = `n`.`mcdno_id`
INNER JOIN `connect` AS `b` ON `b`.`mcded_primary_id_mcdno_id` = `n`.`mcdno_id`
INNER JOIN `connect` AS `c` ON `c`.`mcded_primary_id_mcdno_id` = `n`.`mcdno_id`
WHERE `a`.`mcded_child_id_mcdno_id` = '1375'
AND `a`.`mcded_structure_mcdes_id` = '85'
AND `b`.`mcded_child_id_mcdno_id` = '13'
AND `b`.`mcded_structure_mcdes_id` = '187'
AND `c`.`mcded_child_id_mcdno_id` IN ('500065',
'500066',
'500067',
'500068')
AND `c`.`mcded_structure_mcdes_id` = '211'
AND `n`.`mcdno_structure_mcdns_id` IN ('1')
GROUP BY `n`.`mcdno_id`
ORDER BY `n`.`mcdno_id` DESC LIMIT 14
OFFSET 42
But it uses temp table every time.
This is the EXPLAIN:
+----+-------------+------+---------------+---------------------------------------------------------------------------------------------------+--------------------------------------------------+-----+-----------------------------------------+-------+-----------------------------------------------------------------------------------------------------------------+ | id | select_type | type | possible_keys | key | key_len | ref | rows | Extra | | +----+-------------+------+---------------+---------------------------------------------------------------------------------------------------+--------------------------------------------------+-----+-----------------------------------------+-------+-----------------------------------------------------------------------------------------------------------------+ | 1 | SIMPLE | b | index_merge | UN_set,FK_primary_id_id,FK_structure_mcdns_id_id,FK_child_id_mcdno_id_id,IND_primary_id_structure | FK_structure_mcdns_id_id,FK_child_id_mcdno_id_id | 4,8 | NULL | 37928 | Using intersect(FK_structure_mcdns_id_id,FK_child_id_mcdno_id_id); Using where; Using temporary; Using filesort | | 1 | SIMPLE | c | ref | UN_set,FK_primary_id_id,FK_structure_mcdns_id_id,FK_child_id_mcdno_id_id,IND_primary_id_structure | IND_primary_id_structure | 12 | b.mcded_primary_id_mcdno_id,const | 1 | Using where | | 1 | SIMPLE | n | eq_ref | PRIMARY,FK_mcdno_structure_mcdns_id | PRIMARY | 8 | c.mcded_primary_id_mcdno_id | 1 | Using where | | 1 | SIMPLE | a | ref | UN_set,FK_primary_id_id,FK_structure_mcdns_id_id,FK_child_id_mcdno_id_id,IND_primary_id_structure | UN_set | 20 | b.mcded_primary_id_mcdno_id,const,const | 1 | | +----+-------------+------+---------------+---------------------------------------------------------------------------------------------------+--------------------------------------------------+-----+-----------------------------------------+-------+-----------------------------------------------------------------------------------------------------------------+
I tried using connect (a) as primary table, but that is even worse.
Any suggestions?
—
CONTENTS
CREATE TABLE `contents` (
`mcdno_id` bigint(20) NOT NULL AUTO_INCREMENT,
`mcdno_created_on` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`mcdno_changed_on` timestamp NULL DEFAULT NULL,
`mcdno_structure_mcdns_id` int(11) NOT NULL,
PRIMARY KEY (`mcdno_id`),
KEY `FK_mcdno_structure_mcdns_id` (`mcdno_structure_mcdns_id`),
KEY `IND_created_on` (`mcdno_created_on`),
KEY `IND_changed_on` (`mcdno_changed_on`),
KEY `IND_published_on` (`mcdno_published_on`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
—
CONNECT
CREATE TABLE `connect` (
`mcded_id` bigint(11) NOT NULL AUTO_INCREMENT,
`mcded_primary_id_mcdno_id` bigint(20) NOT NULL,
`mcded_child_id_mcdno_id` bigint(20) NOT NULL,
`mcded_changed_on` datetime DEFAULT NULL,
`mcded_structure_mcdes_id` int(11) NOT NULL
PRIMARY KEY (`mcded_id`),
UNIQUE KEY `UN_set` (`mcded_primary_id_mcdno_id`,`mcded_child_id_mcdno_id`,`mcded_structure_mcdes_id`),
KEY `IND_created_on` (`mcded_created_on`),
KEY `FK_primary_id_id` (`mcded_primary_id_mcdno_id`),
KEY `FK_structure_mcdns_id_id` (`mcded_structure_mcdes_id`),
KEY `FK_child_id_mcdno_id_id` (`mcded_child_id_mcdno_id`),
KEY `IND_primary_id_structure` (`mcded_primary_id_mcdno_id`,`mcded_structure_mcdes_id`)
), ENGINE=InnoDB DEFAULT CHARSET=utf8
Answer :
“Using intersect” usually implies that a composite index would be better:
INDEX(mcded_child_id_mcdno_id, mcded_structure_mcdes_id) -- or the opposite order
Even better, rearrange UN_set
so it starts with these two.
Don’t use SELECT *
unless you really need all the columns from all the joins. It inhibits use of “covering” indexes.
(Don’t you find these long names hard to distinguish? And the similarity (mcded/mcdno/mcdes/mcdns; connect/content) makes it difficult for me to read. simply tossing “mcd” everywhere would help.)
SELECT * makes it difficult to guess what output you need, but here is a guess:
SELECT ...
FROM contents AS n
JOIN connect AS a
ON a.mcded_primary_id_mcdno_id = n.mcdno_id
AND n.mcdno_structure_mcdns_id = '1'
JOIN (
SELECT '1375' as x, '85' as y
UNION ALL
SELECT '13', '187'
UNION ALL
SELECT '500065', '211'
UNION ALL
SELECT '500066', '211'
UNION ALL
SELECT '500067', '211'
UNION ALL
SELECT '500068', '211'
) AS t
ON a.mcded_child_id_mcdno_id = t.x
AND a.mcded_structure_mcdes_id = t.y
ORDER BY n.mcdno_id DESC
LIMIT 14 OFFSET 42;
You may want to use a temporary table instead of the derived table I used. The result will be spread over several rows, so you will have to fix that in the application.
Curious to see if this speeds up the query; tried to utilize row goals by doing joins with a few where clauses first, then using top to count the rows for accurate estimates moving forward, then apply the rest of the where clauses. Also replaced the IN statement with a BETWEEN statement.
SELECT * FROM
( SELECT TOP (2000000000) FROM `contents` AS `n`
INNER JOIN `connect` AS `a` ON `a`.`mcded_primary_id_mcdno_id` =
`n`.`mcdno_id`
INNER JOIN `connect` AS `b` ON `b`.`mcded_primary_id_mcdno_id` =
`n`.`mcdno_id`
INNER JOIN `connect` AS `c` ON `c`.`mcded_primary_id_mcdno_id` =
`n`.`mcdno_id`
WHERE `a`.`mcded_child_id_mcdno_id` = '1375' AND
`a`.`mcded_structure_mcdes_id` = '85'
AND `c`.`mcded_child_id_mcdno_id` BETWEEN 500065 AND 500068)
WHERE `b`.`mcded_child_id_mcdno_id` = '13'
AND `b`.`mcded_structure_mcdes_id` = '187'
AND `c`.`mcded_structure_mcdes_id` = '211'
AND `n`.`mcdno_structure_mcdns_id` IN ('1')
GROUP BY `n`.`mcdno_id`
ORDER BY `n`.`mcdno_id` DESC
LIMIT 14
OFFSET 42
Have you already added an Clustered Index and a Non-clustered Index? Make sure also that your non-clustered index is not more than 5 because some of the non-clustered index that your query is not using might slower the query even more. Be specific with your query also, if possible don’t use the ‘*’. imagine, you have 4 tables joined together, some of those field are not usable anymore.