Why does it use temporary? (MySQL)

Posted on

Question :

MySQL 5.

I have the following table (filled with many rows):

CREATE TABLE `SectorGraphs2` (
  `Kind` tinyint(3) UNSIGNED NOT NULL COMMENT '1 - продюсер, 2 - жанр, 3 - регион',
  `Criterion` tinyint(3) UNSIGNED NOT NULL,
  `Period` tinyint(3) UNSIGNED NOT NULL,
  `PeriodStart` date NOT NULL,
  `SectorID` int(10) UNSIGNED NOT NULL,
  `Value` float NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

ALTER TABLE `SectorGraphs2`
  ADD UNIQUE KEY `Producer2` (`Kind`,`Criterion`,`Period`,`PeriodStart`,`SectorID`) USING BTREE,
  ADD KEY `SectorID` (`SectorID`);

then I run:

EXPLAIN 
    SELECT SectorID, SUM(Value)
    FROM SectorGraphs2
    WHERE Kind = 1 AND Criterion = 7
      AND Period = 1
      AND PeriodStart >= ? AND PeriodStart < ? + INTERVAL 1 WEEK
    GROUP BY SectorID

and it produces:

+---+---+---+---+---+---+---+---+---+---+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+---+---+---+---+---+---+---+---+---+---+
| 1 | SIMPLE | SectorGraphs2 | range | Producer2,SectorID | Producer2 | 6 | NULL | 1 | Using index condition; Using temporary; Using filesort |
+---+---+---+---+---+---+---+---+---+---+

I wonder why it uses a temporary while all keys Kind, Criterion, Period and PeriodStart are in the key Producer2 and it can be calculate by just enumerating the last field PeriodStart of the key. (Accordingly to what I learned, MySQL can enumerate the last field of a key without consulting data, if all previous fields have a fixed value.) Please help if possible to rewrite this without using a temporary.

Note that the above code is intended to store and calculate some statistics.

I need to ensure we do not run heavy queries which may overload our MySQL server. Please help to make proper use of indexes and avoid big temporary tables.

Answer :

Step 1: Filter (WHERE) to get just the desired rows. Producer2 is excellent for that.

Step 2: Now do the GROUP BY. Well, the rows from Step 1 are not ordered by SectorID (because of the “range” on PeriodStart left things shuffled. This could be done in one of two ways: Via some hash in memory, or by sorting (probably in RAM) and doing a simple pass through the data. It is hard to say which is faster.

You should move away from MyISAM. This particular query will probably run faster with InnoDB because of the way indexes are handled between the two engines.

Your query is an example of one thing I talk about in my index cookbook — “an index cannot handle both a range (PeriodStart...) and a GROUP BY.

I am suspicious of “Using index condition”. That is a synonym for “ICP” or “Index Condition Pushdown”, wherein the high level handler lets the Engine do some of the index work (more efficiently). However, I thought that MyISAM was not updated to do any ICP work. If I am correct, then this is another reason for moving to InnoDB; it does benefit from ICP.

If this is a big table, we can talk about another way to speed up the query (perhaps 10x) — Summary Tables .

Leave a Reply

Your email address will not be published. Required fields are marked *