Query ‘optimizing’ state is taking too long

Question :

I have a MySQL 5.6 instance on Amazon RDS that I'm using to test some data archiving scripts. I'm removing the oldest data based on an `updated_date` column, which is indexed. Curiously, after removing a few million rows, my script gets stuck on the initial query it runs to determine the data bounds.

I run a query like this:

SELECT min(updated_date) as oldest, max(updated_date) AS newest FROM `order`;

The explain command on this query shows:

id: 1
select_type: SIMPLE
table: NULL
type: NULL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra: Select tables optimized away

So it's supposed to hit the index and run almost instantly, and it did when testing started; but now, after millions of rows have been removed, it gets stuck in the 'optimizing' state for several minutes.

The script is the only thing running on the database.

Any ideas on what's wrong? Am I supposed to do something after removing lots of rows like that? Do I have to run OPTIMIZE TABLE, even though I'm not using DELETE QUICK?

Update #1

The result from show table status like 'order':

Name: order
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 568037197
Avg_row_length: 280
Data_length: 159252496384
Max_data_length: 0
Index_length: 180806041600
Data_free: 37692112896
Auto_increment: 4052226884
Create_time: 2015-01-26 17:27:20
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:

The result from select count(*) from order is 618376777 rows.

Unfortunately, I can’t post the whole schema here, but where it bears on the issue, the result from show create table order is:

CREATE TABLE `order` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  -- 31 data columns here
  `updated_date` timestamp NULL DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_order_updated_date` (`updated_date`),
  -- 9 indexes here
) ENGINE=InnoDB AUTO_INCREMENT=4052226884 DEFAULT CHARSET=utf8;

Update #2

By separating the min() and max() calls into two queries, I noticed only the min() query is affected. The max() returns almost immediately, so it looks like min() is walking over all the index entries that were deleted but not yet purged. Is there any way to prevent that, other than rebuilding the index?

Update #3

RickJames nailed the problem with the hint about change buffering, but disabling it entirely compromises performance for all inserts, deletes, and updates. Eventually I found that the time it took to flush the change buffer was reasonable on the production server, so the problem is solved for me; but if you run into the same issue on a low-end server with magnetic storage, good luck.
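
For anyone else hitting this: one way to watch the change buffer drain is the insert-buffer section of the InnoDB status output (the Ibuf size is in pages; the sample line below is only an example):

SHOW ENGINE INNODB STATUS\G
-- In the output, look at the "INSERT BUFFER AND ADAPTIVE HASH INDEX"
-- section, e.g.: Ibuf: size 1, free list len 0, seg size 2, 0 merges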

Answer :

SELECT min(updated_date) as oldest, max(updated_date) AS newest FROM `order`;

is well optimized if you have INDEX(updated_date) (which you do). It will do two probes, one at each end of the index, so there is no need for two separate queries, etc.

The EXPLAIN said ‘optimized away’ because it could see that it did not need to do any work.

I checked with a similar table, then looked at the Handler_% STATUS values: Handler_read_first and Handler_read_last each incremented by 1. This is indicative of the MIN/MAX optimization.
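
Here is the check, if you want to reproduce it (the counters are per-session, so run all three statements from the same connection; FLUSH STATUS needs the RELOAD privilege):

FLUSH STATUS;  -- zero out the session Handler_% counters
SELECT MIN(updated_date) AS oldest, MAX(updated_date) AS newest FROM `order`;
SHOW SESSION STATUS LIKE 'Handler_read%';
-- Expect Handler_read_first = 1 and Handler_read_last = 1:
-- a single probe at each end of ix_order_updated_date.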

Do not use OPTIMIZE TABLE; it is (usually) a long waste of time.

It is a huge table.

DELETEing a million rows will queue up a lot of work, especially because of the 10 secondary indexes. I'll bet that when you asked for MAX and MIN, it had to finish all the pending index updates first.

If that is the case, none of the suggestions will really solve the problem. Chunking the deletes (which is a good idea anyway) might have “hidden” the problem by slowing down the delete task. As @eroomydna says, the undo logs must have been huge.

On what basis are you DELETEing? If it is "purge all records older than X", then this is a cure: PARTITION BY RANGE(...) on the date (to use for purging) and DROP PARTITION to jettison old rows. This is also much cleaner (in the sense of OPTIMIZEing to defragment) than DELETEing.
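
A sketch of what that could look like; the partition names and boundary dates are made up for illustration, and note the caveats in the comments:

-- Illustration only. In MySQL 5.6, every unique key (including the
-- PRIMARY KEY) must contain the partitioning column, so the PK would
-- have to become (id, updated_date) and the column made NOT NULL.
-- Also, a TIMESTAMP column can only be range-partitioned via UNIX_TIMESTAMP().
ALTER TABLE `order`
PARTITION BY RANGE (UNIX_TIMESTAMP(updated_date)) (
    PARTITION p2015_01 VALUES LESS THAN (UNIX_TIMESTAMP('2015-02-01 00:00:00')),
    PARTITION p2015_02 VALUES LESS THAN (UNIX_TIMESTAMP('2015-03-01 00:00:00')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Purging a month then becomes a near-instant metadata operation:
ALTER TABLE `order` DROP PARTITION p2015_01;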

Rolando, ANALYZE TABLE will recalculate the stats "instantly". OPTIMIZE TABLE rebuilds (and ANALYZEs) the table, but takes forever on a table this size.

Potentially valid is to copy over the useful rows, a la Rolando’s #4. But only if you are jettisoning “most” of the table. I suggest that is not the case here — “deleting a few million rows” out of 618M.

10 secondary keys is a lot. Would you care to show them? We may have suggestions for pruning the list. For a table this size, it is costly to maintain that many.

Edit — “Change buffering”

I believe what I have described is called change buffering for DELETEs. More discussion:
http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-performance-change_buffering.html
http://dev.mysql.com/doc/refman/5.6/en/innodb-performance-change_buffering.html

You could try innodb_change_buffering = none.
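
The setting is dynamic in 5.6, so on a self-managed server a sketch would be (it only affects changes buffered from then on; on RDS you would set it via the DB parameter group instead, since SET GLOBAL needs the SUPER privilege):

SET GLOBAL innodb_change_buffering = 'none';
SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';  -- verify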

The best-performing way to delete a lot of data is to chunk your activity. There are tools for this, such as pt-archiver and oak-chunk-update. This avoids the build-up of undo and the resulting performance issues.
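
If you want to chunk by hand instead of using a tool, a minimal sketch (the 90-day cutoff and the 10,000-row batch size are arbitrary assumptions; have the script repeat the statement until it deletes 0 rows):

DELETE FROM `order`
WHERE updated_date < NOW() - INTERVAL 90 DAY
ORDER BY updated_date
LIMIT 10000;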

SUGGESTION #1

Perhaps you can retrieve the min and max dates as separate queries:

SELECT updated_date INTO @oldest_updated_date
FROM `order` ORDER BY updated_date LIMIT 1;
SELECT updated_date INTO @newest_updated_date
FROM `order` ORDER BY updated_date DESC LIMIT 1;

This way, each query reads only one index entry.

SUGGESTION #2

You should shrink the table after mass deletes to reclaim the free space:

OPTIMIZE TABLE `order`;

SUGGESTION #3

After performing the mass delete, the index statistics need to be recalculated. OPTIMIZE TABLE does this for you, but you can also do it as a separate step, like this:

ANALYZE TABLE `order`;

SUGGESTION #4

Perhaps recreate the table, keeping only rows from a cutoff date onward. For example, to keep the last 30 days, do this:

-- Cutoff: keep the last 30 days
SET @TimeGapToKeep = NOW() - INTERVAL 30 DAY;
-- Move the full table aside and create an empty copy of its structure
ALTER TABLE `order` RENAME `old_order`;
CREATE TABLE `new_order` LIKE `old_order`;
-- Copy only the rows worth keeping
INSERT INTO `new_order` SELECT * FROM `old_order`
WHERE updated_date >= @TimeGapToKeep;
ALTER TABLE `new_order` RENAME `order`;
ANALYZE TABLE `order`;
-- Once the new table checks out, reclaim the space:
-- DROP TABLE `old_order`;
