Question :
My machine disk is SSD and memory is 32GB.
My executed SQL:
SELECT `country_detail`.`country_name` AS `country_name`
FROM `statistic_detail`
INNER JOIN `country_detail` ON (`statistic_detail`.`country` = `country_detail`.`country_code`)
GROUP BY 1;
The SQL need cost 30 second. So slowly! My explain is:
I don’t understand why it need so much time?
I know how optimize it, I want to know why it is so slow.
My table information is:
CREATE TABLE `statistic_detail` (
`date` varchar(10) NOT NULL,
`os` varchar(10) NOT NULL,
`ver` varchar(16) NOT NULL,
`country` varchar(16) NOT NULL,
`utype` varchar(8) NOT NULL,
`stype` varchar(32) NOT NULL,
`language` varchar(16) NOT NULL,
`num0` bigint DEFAULT NULL,
`num1` double DEFAULT NULL,
`num2` double DEFAULT NULL,
`num3` double DEFAULT NULL,
`num4` double DEFAULT NULL,
`num5` double DEFAULT NULL,
`num6` double DEFAULT NULL,
`num7` double DEFAULT NULL,
`num8` double DEFAULT NULL,
`num9` double DEFAULT NULL,
`num10` double DEFAULT NULL,
`num11` double DEFAULT NULL,
`num12` double DEFAULT NULL,
`num13` double NOT NULL,
PRIMARY KEY (`country`, `ver`, `language`, `utype`, `stype`, `os`, `date`),
KEY `ver_idx` (`ver`),
KEY `language_idx` (`language`),
KEY `utype_idx` (`utype`),
KEY `stype_idx` (`stype`),
KEY `os_idx` (`os`),
KEY `date_idx` (`date`),
KEY `country_idx` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
13731253 rows.
CREATE TABLE `country_detail` (
`country_code` varchar(20) NOT NULL,
`country_name` varchar(16),
`region_code` varchar(16),
`region_name` varchar(16),
PRIMARY KEY (`country_code`),
KEY `region_code_idx` (`region_code`),
KEY `region_name_idx` (`region_name`),
KEY `country_name_idx` (`country_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
334 rows.
Answer :
To understand the reason behind slowness of your current query, you need to understand the Nested Loop Join algorithm used by MySQL. Basically, for every country_code
value in the country table, it is looking for all the matching rows in the statistic table. So if you look at your explain, it is roughly accessing 334*306 rows (a large number). Unfortunately, in this case, MySQL optimizer is not smart enough to stop searching for the row as soon as a match is found (because you are using Group By
, and no aggregation function is being used).
Now, JOIN
is unnecessary here, as it seems that you are just looking to check that the “country” has atleast one corresponding row in the statistic_detail
table. This query can be written in a more performant way using EXISTS()
on a Correlated Subquery. Exists()
stops as soon as a matching row is found in the statistic table.
Try the following query:
SELECT cd.country_name
FROM country_detail AS cd
WHERE EXISTS (SELECT 1 FROM statistic_detail AS sd
WHERE sd.country = cd.country_code)
If country_name
values are not UNIQUE in the country_detail
table, you can additionally utilize DISTINCT
clause to get rid of duplicates:
SELECT DISTINCT cd.country_name
FROM country_detail AS cd
WHERE EXISTS (SELECT 1 FROM statistic_detail AS sd
WHERE sd.country = cd.country_code)