MyISAM around 5x slower than InnoDB

Posted on

Question :

I thought, MyISAM is for simple tables and selects due to its architecture faster than InnoDB. Therefore I changed the engine for this table from InnoDB to MyISAM:

CREATE TABLE `table1` (
  `DateTime` datetime NOT NULL,
  `BidHigh` decimal(11,5) NOT NULL,
  `BidLow` decimal(11,5) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin COMMENT='1 minute rates';

ALTER TABLE `table1` ADD PRIMARY KEY (`DateTime`);

These conditions apply:

  1. I am testing it on a SLES 15.1 VM with 5 GB RAM and 8 CPU cores on a host which is mine and it doesn’t have to serve other VM [just an information that there is no other VM which may influence the result].
  2. A PHP script is running thousands of the below SELECT statements.
  3. There are 24 of the above tables in the database which are accessed in the PHP script.
  4. Each table has ~800k rows.
  5. Between each test I restart the server to assure there are always the same conditions. Due to the fact the difference is so huge I am not running multiple test to get an average…

The execution time:

  • With InnoDB engine the script runs 199 seconds.
  • With MyISAM engine the script runs 1’026 seconds. More than 5 times longer.

I am running these SELECT statements:

SELECT `DateTime` FROM table1
     WHERE `DateTime` BETWEEN '2018-12-27 07:50:00' AND '2199-12-31 23:59:00'
        AND BidHigh > 0.96604
     ORDER BY `DateTime` LIMIT 1;

— and —

SELECT MIN( BidLow ) FROM table1
    WHERE `DateTime` BETWEEN '2018-12-27 07:45:00' AND '2199-12-31 23:59:00';

OK, I figured out it is an index problem. Adding these two indexes

ALTER TABLE `table1` ADD UNIQUE `BidHigh` (`DateTime`, `BidHigh`);

— and —

ALTER TABLE `table1` ADD UNIQUE `BidLow` (`DateTime`, `BidLow`);

fixes the performance issue and now the script needs 245 seconds, still slower than InnoDB – this is not really what I expected…

Adding these indexes to the InnoDB version doesn’t improve the performance.


My questions:

  1. Why does InnoDB not need these indexes and is still faster?
  2. Is there a better solution?
  3. And, what did I apparently totally misunderstood as the change to MyISAM caused such a horrible performance.

Answer :

1) InnoDB will go faster with indexes, too.

2) InnoDB with appropriate indexing is the best solution.

3) MyISAM has been slower than InnoDB for most workloads for well over a decade. There are fundamental difference in how memory and caching work between the two.

In this case InnoDB was picking the first row by primary key. Since InnoDB tables are clustered by primary key, this was very fast, and PK was likely already in memory from when you created the table.

Create an index in BidHigh and it’ll go even faster.

Unless you have an overwhelmingly good reason to use MyISAM – you shouldn’t. And if you think you have an overwhelmingly good reason, you should re-scrutinize it because they are vanishingly few in 2020.

“MyISAM is better …” is an old “wive’s tale” that is very much out of date. Use InnoDB.

The two Engines use indexes quite differently.

PRIMARY KEY (DateTime) — I hope you don’t try to store two records with the same second. PKs are unique.

Query 1

SELECT `DateTime` FROM table1
     WHERE `DateTime` BETWEEN '2018-12-27 07:50:00' AND '2199-12-31 23:59:00'
        AND BidHigh > 0.96604
     ORDER BY `DateTime` LIMIT 1;

Since this involves two ranges, it is essentially impossible to build an good index for either MyISAM or InnoDB. The Optimizer will use an index starting with DateTime, then test all the rows for the other column. Let’s study the possible indexes:

PRIMARY KEY(DateTime)

For MyISAM, there is a BTree based on DateTime, plus a pointer to the data row. It will look at the data row to get the BidHigh to check its value.

For InnoDB, the data is ordered by Date time. Thus there is not the extra to get the BidHigh. Winner: InnoDB.

In either engine, the Optimizer might be smart enough to avoid the sort and get to the LIMIT. But that is risky because it depends on how many rows need to be tested. Due to this variation in the data, you could easily see 5x (or even 500x) slowdown due to the query plan chosen.
INDEX(DateTime, BidHigh)

This solves the inefficiency of MyISAM by making it a “covering” index. For InnoDB, it is mostly a waste; the PK is essentially an INDEX(DateTime, BidHigh, BidLow), which is only slightly worse than that 2-column index.

INDEX(BidHigh, DateTime)

This is probably faster if there are a lot fewer rows matching the range test on BidHign than the range test on DateTime. But there would be a sort before getting to the LIMIT.

Use EXPLAIN SELECT ... to see what it did.

Maybe SPATIAL

The first query needs a 2D index, which is not what INDEX gives you. I discuss 5 options for such, couched in “latitude/longitude” terms: http://mysql.rjweb.org/doc.php/find_nearest_in_mysql

Using SPATIAL might be viable for the first query, but probably not for the second.

Query 2

SELECT MIN( BidLow ) FROM table1
    WHERE `DateTime` BETWEEN '2018-12-27 07:45:00' AND '2199-12-31 23:59:00';

For InnoDB: PRIMARY KEY(DateTime) leads to a scan of about a year’s worth of data.

For MyISAM, I suspect it will punt on using an index unless it is exactly INDEX(DateTime, BidLow), which is “covering”.

23:59:00

You are assuming no Bids in the last minute of the day?

Consider using

WHERE DateTime >= '...
  AND DateTime  < '...-01-01'

I am adding my comments as an answer because the length of the comment fields isn’t long enough.

I accepted Gordan Bobic answer as he is a new member and I think the reputation he earns for accepting his answer is pushing him.

As some of his statements were confirmed in the answer of Rick James it shows me Gordan Bobic understands what is going on.

Gordan’s statement:

Unless you have an overwhelmingly good reason to use MyISAM – you shouldn’t. And if you think you have an overwhelmingly good reason,
you should re-scrutinize it because they are vanishingly few in 2020.

was a very important trigger to me because I realized my understanding was wrong and I had to reset myself regarding the used engines.

I didn’t mention the whole issue as it wouldn’t fit into one question. After reading both answers I realized I have to rebuild the databases. So, for example, I had a JSON field [maximum length of ~6k bytes and an average length of ~2k bytes] in some InnoDB tables .

Due to the high row count the size of these InnoDB tables was also high and grew daily. This was the reason why I started to convert these InnoDB tables into MyISAM. But, as mentioned, this decreased the performance considerably, and, therefore I asked this question.

I extracted these JSONfields into a very simple MyISAM table which has only two fields [the primary key and the JSON field]. This reduced the size [~30%] and doesn’t impact the performance.

Maybe I sound now a bit confusing but in the whole context it helped me a lot and solved the problem!

Leave a Reply

Your email address will not be published. Required fields are marked *