MySQL Inner Join too slow

Posted on

Question :

My machine disk is SSD and memory is 32GB.
My executed SQL:

SELECT `country_detail`.`country_name` AS `country_name`
FROM `statistic_detail`
  INNER JOIN `country_detail` ON (`statistic_detail`.`country` = `country_detail`.`country_code`)
GROUP BY 1;

The SQL need cost 30 second. So slowly! My explain is:
enter image description here
I don’t understand why it need so much time?
I know how optimize it, I want to know why it is so slow.

My table information is:

CREATE TABLE `statistic_detail` (
  `date` varchar(10) NOT NULL,
  `os` varchar(10) NOT NULL,
  `ver` varchar(16) NOT NULL,
  `country` varchar(16) NOT NULL,
  `utype` varchar(8) NOT NULL,
  `stype` varchar(32) NOT NULL,
  `language` varchar(16) NOT NULL,
  `num0` bigint DEFAULT NULL,
  `num1` double DEFAULT NULL,
  `num2` double DEFAULT NULL,
  `num3` double DEFAULT NULL,
  `num4` double DEFAULT NULL,
  `num5` double DEFAULT NULL,
  `num6` double DEFAULT NULL,
  `num7` double DEFAULT NULL,
  `num8` double DEFAULT NULL,
  `num9` double DEFAULT NULL,
  `num10` double DEFAULT NULL,
  `num11` double DEFAULT NULL,
  `num12` double DEFAULT NULL,
  `num13` double NOT NULL,
  PRIMARY KEY (`country`, `ver`, `language`, `utype`, `stype`, `os`, `date`),  
  KEY `ver_idx` (`ver`),
  KEY `language_idx` (`language`),
  KEY `utype_idx` (`utype`),
  KEY `stype_idx` (`stype`),
  KEY `os_idx` (`os`),
  KEY `date_idx` (`date`),
  KEY `country_idx` (`country`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

13731253 rows.

CREATE TABLE `country_detail` (
  `country_code` varchar(20) NOT NULL,
  `country_name` varchar(16),
  `region_code` varchar(16),
  `region_name` varchar(16),
  PRIMARY KEY (`country_code`),
  KEY `region_code_idx` (`region_code`),
  KEY `region_name_idx` (`region_name`),
  KEY `country_name_idx` (`country_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

334 rows.

Answer :

To understand the reason behind slowness of your current query, you need to understand the Nested Loop Join algorithm used by MySQL. Basically, for every country_code value in the country table, it is looking for all the matching rows in the statistic table. So if you look at your explain, it is roughly accessing 334*306 rows (a large number). Unfortunately, in this case, MySQL optimizer is not smart enough to stop searching for the row as soon as a match is found (because you are using Group By, and no aggregation function is being used).

Now, JOIN is unnecessary here, as it seems that you are just looking to check that the “country” has atleast one corresponding row in the statistic_detail table. This query can be written in a more performant way using EXISTS() on a Correlated Subquery. Exists() stops as soon as a matching row is found in the statistic table.

Try the following query:

SELECT cd.country_name
FROM country_detail AS cd 
WHERE EXISTS (SELECT 1 FROM statistic_detail AS sd 
              WHERE sd.country = cd.country_code)

If country_name values are not UNIQUE in the country_detail table, you can additionally utilize DISTINCT clause to get rid of duplicates:

SELECT DISTINCT cd.country_name
FROM country_detail AS cd 
WHERE EXISTS (SELECT 1 FROM statistic_detail AS sd 
              WHERE sd.country = cd.country_code)

Leave a Reply

Your email address will not be published. Required fields are marked *