We have a int/test/staging/production Mysql database setup. A Java application(using Hibernate) that periodically generates up to 200,000 updates/inserts (mainly updates by 20 to 1) completes in 10 mins on the int and test environments but can take up to 90mins on the staging and production envs (occasionally it achieves 10 mins but usually 90).
They are identical schemas and datasets, the only difference is that stage and production have master/slave setups (int and test just the one DB) where the write master is taken out of the pool so there is no load on it (there is no load on stage anyway). The stage and production servers are also much higher spec’d machines.
After reviewing the documentation I can’t find any leads as to why this might be.
Any ideas where I might look for the answer?
Someone asked for the servers configurations (The production box is a dedicated 24 core machine) :
| innodb_adaptive_flushing | ON | | innodb_adaptive_hash_index | ON | | innodb_additional_mem_pool_size | 20971520 | | innodb_autoextend_increment | 8 | | innodb_autoinc_lock_mode | 1 | | innodb_buffer_pool_instances | 1 | | innodb_buffer_pool_size | 17179869184 | | innodb_change_buffering | all | | innodb_checksums | ON | | innodb_commit_concurrency | 0 | | innodb_concurrency_tickets | 500 | | innodb_data_file_path | ibdata1:10M:autoextend | | innodb_data_home_dir | | | innodb_doublewrite | ON | | innodb_fast_shutdown | 1 | | innodb_file_format | Barracuda | | innodb_file_format_check | ON | | innodb_file_format_max | Antelope | | innodb_file_per_table | ON | | innodb_flush_log_at_trx_commit | 2 | | innodb_flush_method | | | innodb_force_recovery | 0 | | innodb_io_capacity | 400 | | innodb_large_prefix | OFF | | innodb_lock_wait_timeout | 50 | | innodb_locks_unsafe_for_binlog | OFF | | innodb_log_buffer_size | 8388608 | | innodb_log_file_size | 262144000 | | innodb_log_files_in_group | 2 | | innodb_log_group_home_dir | /sqllogs1/mysql-3306/innodb_logs | | innodb_max_dirty_pages_pct | 50 | | innodb_max_purge_lag | 0 | | innodb_mirrored_log_groups | 1 | | innodb_old_blocks_pct | 37 | | innodb_old_blocks_time | 0 | | innodb_open_files | 300 | | innodb_purge_batch_size | 20 | | innodb_purge_threads | 0 | | innodb_read_ahead_threshold | 56 | | innodb_read_io_threads | 10 | | innodb_replication_delay | 0 | | innodb_rollback_on_timeout | ON | | innodb_rollback_segments | 128 | | innodb_spin_wait_delay | 6 | | innodb_stats_method | nulls_equal | | innodb_stats_on_metadata | ON | | innodb_stats_sample_pages | 8 | | innodb_strict_mode | ON | | innodb_support_xa | ON | | innodb_sync_spin_loops | 30 | | innodb_table_locks | ON | | innodb_thread_concurrency | 0 | | innodb_thread_sleep_delay | 10000 | | innodb_use_native_aio | ON | | innodb_use_sys_malloc | ON | | innodb_version | 1.1.8 | | innodb_write_io_threads | 10
FahimParkar is partly correct in that the exact reasons for this are perhaps the domain of the DBA, but you should be aware of the general performance issues around D/B replication. In essence this is giving the production environment a level of resilience and for some configurations performance scaling, but at a price. And that price is that insert/updates to one database must be correctly journalled and replicated to any slaves. This involve both increased logging and a significant remote RPC overhead between the master and slave machines. This is both volume and transaction related. Whilst there are techniques to optimise insert intensive batch processes, you have less freedom with update intensive batch processes and these should should be expected to experience a material performance hit.
You mention that the stage and production servers are higher spec’d machines, but the perhaps a key issue here is the network architecture and fabric for their interconnection.
From an applications viewpoint, perhaps all you can do is to review your design and ensure that you are only updating when actual change have occurred, or consider why this is a batch function.
Also there is a so-what? Q: It takes 90 mins. So what? It’s a batch process; what is the downside of this?