600 writes/sec with a flush per commit is probably hitting the limit of your current spinning disks. Switching to SSDs would relieve the pressure.
The quick fix (before getting SSDs) is probably to change to this setting:
innodb_flush_log_at_trx_commit = 2
But read the caveats on making that change.
Having that setting and SSDs would let you grow further.
Another possible fix is to combine some writes into a single COMMIT (where the logic is not violated).
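The batching idea can be sketched with Python's stdlib sqlite3 standing in for MySQL (the table name and row counts here are made up for illustration): grouping rows under one COMMIT means the per-commit flush cost is paid once per batch instead of once per row.

```python
import sqlite3

def insert_one_commit_each(conn, rows):
    # One transaction (and thus one log flush) per row -- the slow pattern.
    for r in rows:
        with conn:  # implicit COMMIT at the end of each block
            conn.execute("INSERT INTO t(v) VALUES (?)", (r,))

def insert_batched(conn, rows):
    # Many rows under one COMMIT -- one flush amortized over the whole batch.
    with conn:
        conn.executemany("INSERT INTO t(v) VALUES (?)", ((r,) for r in rows))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(v INTEGER)")
insert_one_commit_each(conn, range(100))
insert_batched(conn, range(100))
print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 200
```

Both functions store the same data; the batched form simply does far fewer flushes, which is where the I/O pressure in this question comes from.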
Almost always, high CPU and/or I/O is due to poor indexes and/or poorly formulated queries. Turn on the slowlog with long_query_time=1, wait a while, then see what turns up. With queries in hand, provide EXPLAIN SELECT ... and SHOW CREATE TABLE. Ditto for the write queries. From those, we can probably tame the CPU and/or I/O. Even with your current long_query_time setting, pt-query-digest might find some interesting things.
Note that with 50 “running” threads, there is a lot of contention; this may be causing the context switching, etc., that you noted. We need to get queries to finish faster. With 5.7, the system may keel over at 100 running threads. Beyond about 64 running threads, the context switches, mutexes, locks, etc., conspire to slow down every thread, so throughput stops improving while latency goes through the roof.
For a different approach to analyzing the problem, please provide SHOW VARIABLES and SHOW GLOBAL STATUS.
Analysis of VARIABLES & STATUS
(Sorry, nothing jumps out as addressing your Question.)
- Version: 5.7.20-log
- 700 GB of RAM
- Uptime = 36d 13:21:34
- You are not running on Windows.
- Running 64-bit version
- You appear to be running entirely (or mostly) InnoDB.
The More Important Issues:
A lot of temp tables, especially disk-based, are created for complex queries. Let’s hope that the slow log will identify some queries that can be improved (via indexing / reformulating / etc.).
Other indicators are joins without indexes and sort_merge_passes; however, neither of these is conclusive; we need to see the queries.
Max_used_connections = 701 is >= max_connections = 700, so some connections were probably refused. Also, if that indicates more than, say, 64 threads running, then system performance probably suffered at that time.
Consider throttling the number of connections by throttling the clients. Are you using Apache, Tomcat, or something else?
Threads_running indicates that, at the moment of doing this SHOW, the system was in trouble.
Increasing the number of statements in each COMMIT (when reasonable) may help performance some.
innodb_log_file_size, at 15GB, is bigger than necessary, but I see no need to change it.
Thousands of tables is usually not a good design.
eq_range_index_dive_limit = 200 concerns me, but I don’t know how to advise. Was it a deliberate choice?
Why so many CREATE+DROP PROCEDURE ?
Why so many SHOW commands?
Details and other observations:
( Innodb_buffer_pool_pages_flushed ) = 523,716,598 / 3158494 = 165 /sec — Writes (flushes)
— check innodb_buffer_pool_size
( table_open_cache ) = 10,000 — Number of table descriptors to cache
— Several hundred is usually good.
( (Innodb_buffer_pool_reads + Innodb_buffer_pool_pages_flushed) ) = ((61,040,718 + 523,716,598) ) / 3158494 = 185 /sec — InnoDB I/O
( Innodb_dblwr_pages_written/Innodb_pages_written ) = 459,782,684/523,717,889 = 87.8% — Seems like these values should be equal?
( Innodb_os_log_written ) = 1,071,443,919,360 / 3158494 = 339226 /sec — This is an indicator of how busy InnoDB is.
— Very busy InnoDB.
( Innodb_log_writes ) = 110,905,716 / 3158494 = 35 /sec
( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 1,071,443,919,360 / (3158494 / 3600) / 2 / 15360M = 0.0379 — Ratio
— (see minutes)
( Uptime / 60 * innodb_log_file_size / Innodb_os_log_written ) = 3,158,494 / 60 * 15360M / 1071443919360 = 791 — Minutes between InnoDB log rotations. Beginning with 5.6.8, this can be changed dynamically; be sure to also change my.cnf.
— (The recommendation of 60 minutes between rotations is somewhat arbitrary.) Adjust innodb_log_file_size. (Cannot change in AWS.)
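The rotation-minutes figure above can be reproduced with a small Python sketch (numbers copied from this answer; the helper name is mine):

```python
def minutes_between_log_rotations(uptime_s, log_file_bytes, os_log_written):
    # Uptime in minutes, scaled by how many times the redo written so far
    # would fill one log file.
    return uptime_s / 60 * log_file_bytes / os_log_written

uptime = 3_158_494                  # Uptime, seconds
log_file = 15_360 * 1024 * 1024     # innodb_log_file_size = 15360M, in bytes
written = 1_071_443_919_360         # Innodb_os_log_written

print(round(minutes_between_log_rotations(uptime, log_file, written)))  # 791
```

At 791 minutes between rotations versus the ~60-minute rule of thumb, the log files are comfortably oversized, which matches the note above that shrinking them is optional.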
( Com_rollback ) = 770,457 / 3158494 = 0.24 /sec — ROLLBACKs in InnoDB.
— An excessive frequency of rollbacks may indicate inefficient app logic.
( Innodb_row_lock_waits ) = 632,292 / 3158494 = 0.2 /sec — How often there is a delay in getting a row lock.
— May be caused by complex queries that could be optimized.
( Innodb_dblwr_writes ) = 97,725,876 / 3158494 = 31 /sec — “Doublewrite buffer” writes to disk. “Doublewrites” are a reliability feature. Some newer versions / configurations don’t need them.
— (Symptom of other issues)
( Innodb_row_lock_current_waits ) = 13 — The number of row locks currently being waited for by operations on InnoDB tables. Zero is pretty normal.
— Something big is going on?
( innodb_print_all_deadlocks ) = OFF — Whether to log all Deadlocks.
— If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.
( local_infile ) = ON
— local_infile = ON is a potential security issue
( bulk_insert_buffer_size / _ram ) = 8M / 716800M = 0.00% — Buffer for multi-row INSERTs and LOAD DATA
— Too big could threaten RAM size. Too small could hinder such operations.
( Questions ) = 9,658,430,713 / 3158494 = 3057 /sec — Queries (outside SP) — “qps”
— >2000 may be stressing server
( Queries ) = 9,678,805,194 / 3158494 = 3064 /sec — Queries (including inside SP)
— >3000 may be stressing server
( Created_tmp_tables ) = 1,107,271,497 / 3158494 = 350 /sec — Frequency of creating “temp” tables as part of complex SELECTs.
( Created_tmp_disk_tables ) = 297,023,373 / 3158494 = 94 /sec — Frequency of creating disk “temp” tables as part of complex SELECTs
— increase tmp_table_size and max_heap_table_size.
Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM.
Better indexes and reformulation of queries are more likely to help.
( (Com_insert + Com_update + Com_delete + Com_replace) / Com_commit ) = (693300264 + 214511608 + 37537668 + 0) / 1672382928 = 0.565 — Statements per Commit (assuming all InnoDB)
— Low: Might help to group queries together in transactions; High: long transactions strain various things.
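The statements-per-commit ratio is a straight computation over the Com_% counters; here is a sketch using the values from this answer (the function name is mine):

```python
def statements_per_commit(inserts, updates, deletes, replaces, commits):
    # Write statements per COMMIT, assuming all traffic is InnoDB.
    return (inserts + updates + deletes + replaces) / commits

ratio = statements_per_commit(693_300_264, 214_511_608, 37_537_668, 0,
                              1_672_382_928)
print(round(ratio, 3))  # 0.565
```

A value well below 1 suggests many commits carry no write at all (e.g., autocommit SELECT-heavy transactions), which is why grouping writes per COMMIT is suggested above.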
( Select_full_join ) = 338,957,314 / 3158494 = 107 /sec — joins without index
— Add suitable index(es) to tables used in JOINs.
( Select_full_join / Com_select ) = 338,957,314 / 6763083714 = 5.0% — % of selects that are indexless join
— Add suitable index(es) to tables used in JOINs.
( Select_scan ) = 124,606,973 / 3158494 = 39 /sec — full table scans
— Add indexes / optimize queries (unless they are tiny tables)
( Sort_merge_passes ) = 1,136,548 / 3158494 = 0.36 /sec — Hefty sorts
— Increase sort_buffer_size and/or optimize complex queries.
( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (693300264 + 37537668 + 198418338 + 0 + 214511608 + 79274476) / 3158494 = 387 /sec — writes/sec
— More than about 50 writes/sec plus log flushes will probably max out the I/O write capacity of normal drives; you are well beyond that.
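The writes/sec figure is just the sum of the write-statement counters divided by uptime; a Python sketch with the counters from this answer (the helper name is mine):

```python
def writes_per_sec(counters, uptime_s):
    # Sum the write-statement counters and normalize by uptime.
    return sum(counters.values()) / uptime_s

counters = {
    "Com_insert": 693_300_264,
    "Com_delete": 37_537_668,
    "Com_delete_multi": 198_418_338,
    "Com_replace": 0,
    "Com_update": 214_511_608,
    "Com_update_multi": 79_274_476,
}
print(round(writes_per_sec(counters, 3_158_494)))  # 387
```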
( ( Com_stmt_prepare - Com_stmt_close ) / ( Com_stmt_prepare + Com_stmt_close ) ) = ( 39 - 38 ) / ( 39 + 38 ) = 1.3% — Are you closing your prepared statements?
— Add Closes.
( Com_stmt_close / Com_stmt_prepare ) = 38 / 39 = 97.4% — Prepared statements should be Closed.
— Check whether all Prepared statements are “Closed”.
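A quick way to watch for a prepared-statement leak is to compare the two counters over time; a minimal sketch (the function name is mine):

```python
def prepared_statement_leak_ratio(prepares, closes):
    # Fraction of prepare/close traffic that was never matched by a close;
    # near 0 is healthy, a growing value suggests statements left open.
    if prepares + closes == 0:
        return 0.0
    return (prepares - closes) / (prepares + closes)

# Com_stmt_prepare = 39, Com_stmt_close = 38 from this answer:
print(round(100 * prepared_statement_leak_ratio(39, 38), 1))  # 1.3
```

With counts this small the 1.3% here is noise; the check matters when the counters are in the millions and the gap keeps widening.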
( innodb_autoinc_lock_mode ) = 1 — Galera: desires 2 — 2 = “interleaved”; 1 = “consecutive” is typical; 0 = “traditional”.
( Max_used_connections / max_connections ) = 701 / 700 = 100.1% — Peak % of connections
— increase max_connections and/or decrease wait_timeout
( Threads_running - 1 ) = 71 - 1 = 70 — Active threads (concurrency when data collected)
— Optimize queries and/or schema
Abnormally large: (Most of these stem from being a very busy system.)
Com_commit = 529 /sec
Com_create_procedure = 0.01 /HR
Com_drop_procedure = 0.01 /HR
Com_delete = 12 /sec
Com_delete_multi = 63 /sec
Com_insert = 219 /sec
Com_kill = 0.69 /HR
Com_reset = 0.0011 /HR
Com_revoke = 0.0023 /HR
Com_select = 2141 /sec
Com_show_binlogs = 12 /HR
Com_show_create_func = 0.011 /HR
Com_show_privileges = 0.0034 /HR
Com_show_profile = 0.027 /HR
Com_show_profiles = 0.028 /HR
Com_show_slave_status = 0.037 /sec
Com_show_storage_engines = 12 /HR
Com_show_warnings = 0.14 /sec
Com_slave_stop = 0.0011 /HR
Com_update_multi = 25 /sec
Created_tmp_files = 0.3 /sec
Handler_commit = 3251 /sec
Handler_external_lock = 18787 /sec
Handler_prepare = 615 /sec
Handler_read_first = 239 /sec
Handler_read_key = 173669 /sec
Handler_read_next = 1291439 /sec
Handler_read_prev = 28535 /sec
Handler_read_rnd = 32789 /sec
Innodb_buffer_pool_bytes_dirty = 7.03e+10
Innodb_buffer_pool_pages_data = 3.41e+7
Innodb_buffer_pool_pages_dirty = 4.29e+6
Innodb_buffer_pool_pages_misc = 2.15e+6
Innodb_buffer_pool_pages_total = 3.62e+7
Innodb_data_fsyncs = 132 /sec
Innodb_data_writes = 232 /sec
Innodb_data_written = 5440151 /sec
Innodb_dblwr_pages_written = 145 /sec
Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group = 582.3MB
Innodb_pages_written = 165 /sec
Innodb_row_lock_time = 5.97e+7
Innodb_rows_deleted + Innodb_rows_inserted = 2180 /sec
Innodb_rows_inserted = 2155 /sec
Innodb_rows_read = 1398531 /sec
Max_used_connections = 701
Open_tables = 10,000
Select_full_range_join = 2.57e+7
Select_range = 130 /sec
Sort_range = 30 /sec
Sort_scan = 332 /sec
Table_open_cache_hits = 9354 /sec
Threads_running = 71
eq_range_index_dive_limit = 200
innodb_purge_threads = 4
innodb_thread_sleep_delay = 16,925
We never figured out the exact cause of this issue, but to offer some closure I’m going to tell what I can.
Our team ran some load tests and concluded that MySQL had trouble allocating memory. So they tried using jemalloc instead of glibc’s malloc, and the problem disappeared. We have been using jemalloc in production for more than 6 months now, without ever seeing this issue again.
I’m not saying that jemalloc is better, or that everyone should use it with MySQL. But it seems like, for our specific case, glibc’s malloc just wasn’t working properly.
Run “iostat -xk 5” to try to see if the disk is still a problem. Also, high system CPU relates to kernel code; double-check the new disk/drivers/config.
Suggestions for the my.cnf/ini [mysqld] section for your very busy server:
max_heap_table_size=128M  # from 16M to match tmp_table_size
innodb_lru_scan_depth=100  # from 256 to reduce depth scanned every second
innodb_change_buffer_max_size=15  # from 25; max used misc is 6%
innodb_flush_neighbors=0  # from 1; with SSD there is no rotational delay
innodb_read_ahead_threshold=8  # from 56 to expedite reading the next extent
read_rnd_buffer_size=128K  # from 256K to reduce RD RPS
My expectation is a gradual decrease in the results of
SHOW GLOBAL STATUS LIKE 'innodb_buffer_pool_pages_dirty';
with these suggestions applied. On 1/13/18 you had over 4M dirty pages.
I hope these help. These may be dynamically modified.
Many more opportunities exist; if you want them, let me know.
Your SHOW GLOBAL STATUS indicates innodb_buffer_pool_pages_dirty were 4,291,574.
To monitor current count,
SHOW GLOBAL STATUS LIKE '%pages_dirty';
To encourage reducing this count,
SET GLOBAL innodb_flushing_avg_loops=5;
In one hour, run the monitoring query again to see where you stand with dirty pages.
Please let me know your counts at the beginning and an hour later.
Apply the change to your my.cnf as well, for a long-term reduction in dirty pages.
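As a rough sketch of what to look at when comparing the two samples, here is a bit of Python using the pages-dirty and pages-total figures from the SHOW GLOBAL STATUS above (the helper name is mine):

```python
def dirty_page_report(pages_dirty, pages_total):
    # Percent of the buffer pool that is dirty; the goal of the
    # innodb_flushing_avg_loops change is to drive this down over time.
    return 100.0 * pages_dirty / pages_total

# pages_dirty = 4,291,574 and pages_total = 3.62e7 from this answer:
print(round(dirty_page_report(4_291_574, 36_200_000), 1))  # 11.9
```

Sample again an hour later and recompute; a falling percentage means flushing is keeping up.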
With 30K IOPS tested (we still need an IOPS number for random writes), consider this suggestion for the my.cnf/ini [mysqld] section:
innodb_io_capacity_max=20000  # from 2000; keep the top 10000 free for now
innodb_io_capacity=10000  # from 200 to ensure we stay away from the limits
Both may be changed dynamically with SET GLOBAL, and should reduce innodb_buffer_pool_pages_dirty rapidly.
The cause of Com_rollback averaging one every 4 seconds will remain a performance problem until resolved.
@chimmi (Apr 9, 2018): Pick up this MySQL script from https://pastebin.com/aZAu2zZ0 for a quick check on Global Status resources used or released over nn seconds (you set nn in the SLEEP). This will let you see whether anything has helped reduce your Com_rollback frequency.