Range Extraction Algorithm for Muti-Part aka Composite Indexes in MySQL

Posted on

Question :

I went through Range Extraction Documentation for MySQL Indexes. They have explained for single column indices. What is the procedure to reduce complex multi column (at least two) criteria?

What I am essentially asking is, is there an algorithm or any literature(s) that explains how to simplify/reduce range intervals from complex predicates? Assuming B-TREE index or any index with sequential arrangement of key values.

For example:-

Single Column Examples

From

(duration > 5 and duration < 10) or (duration < 100)

To

NULL < duration < 100

From

(duration > 5 and duration < 10) or (duration < 100)

To

5 < duration < 10
100 < duration

Two Column Examples

From

(duration in (9,10) and service_id > 500)

To

9 <= duration <= 9 AND 500 < service_id
10 <= duration <= 10 AND 500 < service_id

From

(duration in (9,10) and service_id = 500) or (duration = 19 and service_id=570)

To

9 <= duration <= 9 AND 500 <= service_id <= 500
10 <= duration <= 10 AND 500 <= service_id <= 500
19 <= duration <= 19 AND 570 <= service_id <= 570

Answer :

Historically, MySQL has kept the relevant algorithms simple, even if less than optimal. In particular, OR has been poorly optimized. 5.7 and 8.0 have new code, but I don’t know if your examples have improved any.

EXPLAIN FORMAT=JSON sometimes gives clues. Also, see the “Optimizer Trace”.

If you have sample data, use the Handler counts to deduce whether they are well optimized: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#handler_counts.

But, be aware that MySQL will often do a table scan instead of using an obvious index. This is legitimate because of the cost of bouncing between the BTree for the secondary index and the BTree that contains the data. Typically, if more than 20% of the index needs to be touched, it will prefer to do a table scan. (The “20%” is no a hard-coded number, but is a gross simplification of the “cost-based” model that is used.)

Search for mysql cost based optimization. There have been a few articles written on the topic.

Leave a Reply

Your email address will not be published. Required fields are marked *