Slow queries on indexed columns (large datasets)

Posted on

Question :

I’ve got some large MySQL (5.5.60) tables where queries take 60 seconds+ to complete.

product_base (1.5GB, 26 million rows)

CREATE TABLE `product_base` (
  `EAN13` varchar(13) NOT NULL DEFAULT '',
  `company_tk` int(11) DEFAULT NULL,
  `category` int(11) DEFAULT NULL,
  `modified` varchar(30) DEFAULT NULL,
  KEY `EAN13` (`EAN13`),
  KEY `company_tk` (`company_tk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

attribute (11.8GB, 126 million rows)

CREATE TABLE `attribute` (
  `EAN13` varchar(13) DEFAULT NULL,
  `attribute_type` int(11) DEFAULT NULL,
  `val_t` varchar(1500) DEFAULT NULL,
  `val_n` decimal(15,4) DEFAULT NULL,
  `val_d` datetime DEFAULT NULL,
  `modified` datetime DEFAULT NULL,
  KEY `EAN13` (`EAN13`),
  KEY `attribute_type` (`attribute_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

attribute_type (16KB, 59 rows)

CREATE TABLE `attribute_type` (
  `attribute_type` int(11) DEFAULT NULL,
  `title` varchar(90) DEFAULT NULL,
  `field_name` varchar(90) DEFAULT NULL,
  `data_type` char(3) DEFAULT NULL,
  `seq` int(11) NOT NULL,
  `html` int(11) DEFAULT NULL,
  KEY `attribute_type` (`attribute_type`),
  KEY `seq` (`seq`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Example slow query:

Query_time: 61.629684  Lock_time: 0.000138 Rows_sent: 1  Rows_examined: 28141979
select p.EAN13, p.company_tk, a.attribute_type, a.val_t, t.title, t.field_name from product_base p
inner join attribute a on p.EAN13=a.EAN13 
inner join attribute_type t on a.attribute_type=t.attribute_type 
where p.EAN13=705632109762 and (a.attribute_type=1 or a.attribute_type=3)
order by t.seq;

Running explain on that query:

+----+-------------+-------+-------+----------------------+----------------+---------+--------------------------+---------+-----------------------------+
| id | select_type | table | type  | possible_keys        | key            | key_len | ref                      | rows    | Extra                       |
+----+-------------+-------+-------+----------------------+----------------+---------+--------------------------+---------+-----------------------------+
|  1 | SIMPLE      | t     | range | attribute_type       | attribute_type | 5       | NULL                     |       2 | Using where; Using filesort |
|  1 | SIMPLE      | a     | ref   | EAN13,attribute_type | attribute_type | 5       | eandata.t.attribute_type | 6299937 | Using where                 |
|  1 | SIMPLE      | p     | ref   | EAN13                | EAN13          | 54      | eandata.a.EAN13          |       1 |                             |
+----+-------------+-------+-------+----------------------+----------------+---------+--------------------------+---------+-----------------------------+

Server/OS:
Debian 8 64bit | 24GB RAM (MySQL is using 70% of this) | 60GB SSD | 2xIntel Xeon CPU E5506 @ 2.13GHz | MySQL 5.5.60

Innodb settings here

MySQL my.cnf

# * Fine Tuning
key_buffer      = 16M
max_allowed_packet  = 16M
thread_stack        = 192K
thread_cache_size       = 8
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover-options  = BACKUP
max_connections        = 20
table_cache            = 9552
thread_cache = 10
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 16M
query_cache_size = 512M
join_buffer_size = 2G
innodb_buffer_pool_size = 15G
innodb_io_capacity = 2000
innodb_read_io_threads = 64
innodb_thread_concurrency = 0
innodb_write_io_threads = 64
#innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_buffer_pool_instances = 8

max_heap_table_size = 2G
tmp_table_size = 2G
wait_timeout = 120
interactive_timeout = 120

Any suggestions for the slow queries?

Answer :

I’m answering my own question here. The other answers have some great optimisation tips, but the key to getting my queries down from 60s to 100ms was to treat the EAN13 field as a string in the where statement.

As an example:
where p.EAN13=705632109762 became where p.EAN13='705632109762'

I would try using UNION except for OR condition.
You could try:

( select p.EAN13, p.company_tk, a.attribute_type, a.val_t, t.title, t.field_name, t.seq from product_base p
inner join attribute a on p.EAN13=a.EAN13 
inner join attribute_type t on a.attribute_type=t.attribute_type 
where p.EAN13=705632109762 and a.attribute_type=1 )
union all
( select p.EAN13, p.company_tk, a.attribute_type, a.val_t, t.title, t.field_name, t.seq from product_base p
inner join attribute a on p.EAN13=a.EAN13 
inner join attribute_type t on a.attribute_type=t.attribute_type 
where p.EAN13=705632109762 and a.attribute_type=3 )
order by 7;

Maybe MySQL optimizer will choose better execution plan.

  • You do InnoDB a disservice by not providing an explicit PRIMARY KEY. Look at the data to see if an existing column or combination of columns is unique.

  • Perhaps EAN13 could be the PK of product_base?

  • Perhaps (EAN13, attribute_type) could be the PK for attribute?

  • In one table modified is a string type, in another it is DATETIME. ?? Anyway, do you use the column? If not removing it would make the table smaller, thereby possibly leading to a faster query.

  • As already noted, the type inconsistency of EAN13 led to a serious inefficiency. This is because VARCHAR = int-constant does not optimize well. Either change the column to be BIGINT (if appropriate) or add quotes around the constant (to make it a string).

  • These 3 tables smell like an “Entity-Attribute-Value” schema, which can usually be implemented with fewer tables — hence more efficiently. Is it EAV?

  • You “can’t tune your way out of a performance problem”. But if you also provided SHOW GLOBAL STATUS; after being up at least a day, I will review the VARIABLES you already provided.

  • These are dangerously high; don’t make them more than 1% of RAM: max_heap_table_size = 2G and tmp_table_size = 2G.

To understand the ways to optimize let me to transform Your query to cartesian form (all joins are inner one, so it’s correct) by replacing JOINs with commas and moving ON conditions to WHERE section:

SELECT p.EAN13, p.company_tk, a.attribute_type, a.val_t, t.title, t.field_name
FROM product_base p, attribute a, attribute_type t
WHERE p.EAN13=705632109762 
  AND p.EAN13=a.EAN13 
  AND a.attribute_type IN (1,3)
  AND a.attribute_type=t.attribute_type 
ORDER BY t.seq

Do You see? No conditional join of product_base needed, the condition p.EAN13=a.EAN13 can be replaced with constant one a.EAN13=705632109762. And we obtain:

SELECT p.EAN13, p.company_tk, a.attribute_type, a.val_t, t.title, t.field_name
FROM product_base p, attribute a, attribute_type t
WHERE p.EAN13=705632109762 
  AND a.EAN13=705632109762
  AND a.attribute_type IN (1,3)
  AND a.attribute_type=t.attribute_type 
ORDER BY t.seq

There is the only condition on product_base table, and EAN13 is already indexed, so nothing on that table can be optimised.

There is 2 conditions on attribute table. To optimize, You need to create composite index

CREATE INDEX idx_EAN_at ON attribute (EAN13, attribute_type)

If nevertheless the attribute (attribute_type) index will be used, add IGNORE INDEX hint. Moreover, check if You really needs it, maybe it can be deleted without consequences. And creation of attribute (EAN13, attribute_type) index makes the attribute (EAN13) index unnecessary.

There is only one joining condition plus ordering on attribute_type table. But this table is small, so I think no optimization needed, existent indices are sufficient for optimization.

Suggestion for your my.cnf-ini [mysqld] section

innodb_log_buffer_size=8M

SHOULD NEVER BE larger than

innodb_log_file_size=5M

takes special handling to correct with 5.5 and after you get to 5.6 you may

innodb_lru_scan_depth=128  # from 1024 to conserve CPU seconds every SECOND.

because you have innodb_buffer_pool_instances at 8.

Good luck.

For additional assistance, check my profile, clk Network Profile for contact info.

Leave a Reply

Your email address will not be published. Required fields are marked *