How does the optimizer decide whether the number of rows matched by a SELECT is high or low?

Question :

What is the threshold between using an index and doing a direct access to the table (a full table scan)?
I assume the decision depends on the number of index accesses (and the subsequent accesses to the table). If this number is really low, the index should be used; if it is high, I guess a direct access would perform better. But how is "high" defined? Is there, for example, some variable that specifies this, such as a percentage of the table size?

Answer :

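There is no fixed percentage threshold. The optimizer is cost-based: it estimates the cost of using the index and the cost of a full table scan, then picks the cheaper plan. The MySQL 5.7 Reference Manual describes when a table scan is chosen: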

The output from EXPLAIN shows ALL in the type column when MySQL uses a full table scan to resolve a query. This usually happens under the following conditions:

The table is so small that it is faster to perform a table scan than to bother with a key lookup. This is common for tables with fewer than 10 rows and a short row length.

There are no usable restrictions in the ON or WHERE clause for indexed columns.

You are comparing indexed columns with constant values and MySQL has calculated (based on the index tree) that the constants cover too large a part of the table and that a table scan would be faster. See Section 8.2.1.2, “How MySQL Optimizes WHERE Clauses”.

You are using a key with low cardinality (many rows match the key value) through another column. In this case, MySQL assumes that by using the key it probably will do many key lookups and that a table scan would be faster.
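To see which of these conditions applies to a particular query, you can inspect the plan with EXPLAIN. The following is a minimal sketch; the table `orders`, its column `status`, and the value 'shipped' are hypothetical and only serve as an illustration.

```sql
-- Hypothetical table and column names, for illustration only.
-- In the result, type = ALL indicates a full table scan, while values such
-- as ref or range indicate index access. possible_keys lists the candidate
-- indexes, key shows the one actually chosen, and rows is the optimizer's
-- estimate of how many rows it expects to examine.
EXPLAIN SELECT * FROM orders WHERE status = 'shipped';

-- The JSON format additionally exposes the optimizer's cost estimates
-- (cost_info / query_cost), which is what the decision is based on.
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE status = 'shipped';
```

If possible_keys lists an index but key is NULL and type is ALL, the optimizer considered the index and estimated the scan to be cheaper.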

For small tables, a table scan often is appropriate and the performance impact is negligible. For large tables, try the following techniques to avoid having the optimizer incorrectly choose a table scan:

Use ANALYZE TABLE tbl_name to update the key distributions for the scanned table. See Section 13.7.2.1, “ANALYZE TABLE Syntax”.
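As a rough sketch of that step, assuming a hypothetical table named `orders`, you could refresh the statistics and then look at the cardinality values the optimizer works with:

```sql
-- Hypothetical table name, for illustration only.
-- Recompute the key distribution statistics for the table.
ANALYZE TABLE orders;

-- The Cardinality column is an estimate of the number of distinct values
-- in each index. A value that is low relative to the table's row count
-- means many rows match each key value.
SHOW INDEX FROM orders;
```

After this, re-running EXPLAIN on the query shows whether the refreshed statistics changed the chosen plan.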

Use FORCE INDEX for the scanned table to tell MySQL that table scans are very expensive compared to using the given index:

SELECT * 
FROM t1, t2 FORCE INDEX (index_for_column)
WHERE t1.col_name = t2.col_name;
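One way to judge whether the hint is worth keeping is to compare the plans with and without it. This sketch reuses the placeholder names from the manual's example (`t1`, `t2`, `col_name`, `index_for_column`), written with explicit JOIN syntax:

```sql
-- Plan chosen by the optimizer on its own.
EXPLAIN SELECT *
FROM t1 JOIN t2 ON t1.col_name = t2.col_name;

-- Plan when the index on t2 is forced.
EXPLAIN SELECT *
FROM t1 JOIN t2 FORCE INDEX (index_for_column)
  ON t1.col_name = t2.col_name;

-- Plan when the same index is excluded, for comparison.
EXPLAIN SELECT *
FROM t1 JOIN t2 IGNORE INDEX (index_for_column)
  ON t1.col_name = t2.col_name;
```

EXPLAIN only shows estimates, so it is worth timing the real queries on representative data before leaving a hint in place.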

Start mysqld with the --max-seeks-for-key=1000 option or use SET max_seeks_for_key=1000 to tell the optimizer to assume that no key scan causes more than 1,000 key seeks.
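A short sketch of trying this for the current connection only, so that other sessions are not affected (the value 1000 is taken from the manual's example and is not a tuned recommendation):

```sql
-- Check the current value.
SHOW VARIABLES LIKE 'max_seeks_for_key';

-- Lower the limit for this session only, which makes index access look
-- cheaper to the optimizer relative to a table scan.
SET SESSION max_seeks_for_key = 1000;

-- Re-run EXPLAIN on the query in question here to see if the plan changes.

-- Restore the session value to the global default afterwards.
SET SESSION max_seeks_for_key = DEFAULT;
```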

See the original page here: https://dev.mysql.com/doc/refman/5.7/en/how-to-avoid-table-scan.html
