Question :
I am running 30 scripts (PHP CLI) on Linux , each script are updating (loop) the data in the MySQL database.
When I typed ‘mysqladmin proc’ in terminal, I can see a lot of rows has been locked for 10-30 seconds. Mostly are Update queues. How to improve the performance faster? I am using InnoDB engine.
top
command:
top - 10:48:54 up 17 days, 10:30, 2 users, load average: 1.06, 1.05, 1.01
Tasks: 188 total, 1 running, 187 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.8%us, 0.1%sy, 0.0%ni, 74.1%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 4138464k total, 1908724k used, 2229740k free, 316224k buffers
Swap: 2096440k total, 16k used, 2096424k free, 592384k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32183 mysql 15 0 903m 459m 4800 S 101.8 11.4 876:53.66 mysqld
–
/etc/my.cnf
[mysqld]
set-variable = max_connections=500
safe-show-database
max_user_connections=200
key_buffer_size = 16M
query_cache_size = 350M
tmp_table_size = 200M
max_heap_table_size = 200M
thread_cache_size = 4
table_cache = 800
thread_concurrency = 8
innodb_buffer_pool_size = 400M
innodb_log_file_size = 128M
query_cache_limit = 500M
innodb_flush_log_at_trx_commit = 2
Server Spec: Intel Core 2 Quad Q8300, 2.5 GHz, 4GB ram.
‘mysqladmin proc’:
+------+-----------------+-----------+----------------+---------+------+----------+-------------------------------------------------------------------------------
| Id | User | Host | db | Command | Time | State | Info
+------+-----------------+-----------+----------------+---------+------+----------+--------------------------------------------------------------------------------
| 265 | user | localhost | xxxxxxxxxxxxxx | Query | 15 | Updating | UPDATE data SET status = '2', error = 'Unknown error' WHERE number= 0xxxxx
| 269 | user | localhost | xxxxxxxxxxxxxx | Query | 17 | Updating | UPDATE data SET status = '2', error = 'Invalid ....' WHERE number= 0xxx
| 280 | user | localhost | xxxxxxxxxxxxxx | Query | 7 | Updating | UPDATE data SET status = 1 WHERE f = 0xxxx
| 300 | user | localhost | xxxxxxxxxxxxxx | Query | 1 | Updating | UPDATE data SET status = '2', error = 'Unknown ....' WHERE number= 0xx
| 314 | user | localhost | xxxxxxxxxxxxxx | Query | 13 | Updating | UPDATE data SET status = '2', error = 'Invalid....' WHERE number= 0xxxx
| 327 | user | localhost | xxxxxxxxxxxxxx | Query | 11 | Updating | UPDATE data SET status = '2', error = 'Unknown ....' WHERE number= 0xxxx
| 341 | user | localhost | xxxxxxxxxxxxxx | Sleep | 2 | | NULL
| 350 | user | localhost | xxxxxxxxxxxxxx | Query | 7 | Updating | UPDATE data SET status = '2', error = 'Unknown ....' WHERE number= 0xxx
| 360 | user | localhost | xxxxxxxxxxxxxx | Query | 5 | Updating | UPDATE data SET status = 1 WHERE number = 0xxxx
Explain:
+----+-------------+-------+-------------+----------------+----------------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+----------------+----------------+---------+------+-------+----------------------------------------------+
| 1 | SIMPLE | data | index_merge | process,status | process,status | 52,1 | NULL | 16439 | Using intersect(process,status); Using where |
+----+-------------+-------+-------------+----------------+----------------+---------+------+-------+----------------------------------------------+
Is MySQL server using multiple cores?
Edit, more information:
data table:
CREATE TABLE data (
number varchar(50) NOT NULL,
dob varchar(50) NOT NULL,
other varchar(50) NOT NULL,
status tinyint(1) unsigned NOT NULL,
error varchar(150) NOT NULL,
process varchar(50) NOT NULL,
KEY process (process),
KEY status (status),
KEY number (number)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Number of rows in the data table: 929335
Indexed: process
, status
and number
data table Size: 137.3 MiB (according to phpMyAdmin)
Hard drive: 500 GB SATA, 7200 rpm (no raid)
LOOP
$offset = xx
$limit = xx
$data = SELECT * FROM data WHERE status = 0 AND process = 1 LIMIT $limit OFFSET $offset
LOOP ROWS
$error = errorCheck($data['number']);
if ($error) {
UPDATE data SET status = 2, error='$error' WHERE number = $data['number']
} else {
UPDATE data SET status = 1 WHERE number = $data['number']
}
END LOOP ROWS
END LOOP
- limit select query to return only a limited amount of rows and it will select the remaining rows during the next iteration. Using offset and limit
- read all the rows from the select query into an array of variables
- Iterate your array of numbers, update every row
- continue with the first step again
mysql> EXPLAIN SELECT * FROM data WHERE number = 0xxxxxxxxxx;
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | data | ALL | number | NULL | NULL | NULL | 934712 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
iostat
root@host [~]# iostat
Linux 2.6.18-194.17.1.el5PAE (xxxxxxx.xxxxxxxxxxxx.com) 21/12/11
avg-cpu: %user %nice %system %iowait %steal %idle
9.99 0.17 0.49 0.27 0.00 89.08
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 10.06 243.59 556.69 393848781 900082808
sda1 0.56 1.41 20.75 2284442 33551840
sda2 5.30 9.55 271.30 15442276 438656832
sda3 1.67 12.01 38.33 19412396 61973872
sda4 0.00 0.00 0.00 8 0
sda5 0.28 0.09 6.83 145372 11035184
sda6 0.00 0.00 0.00 4514 3072
sda7 2.25 220.53 219.48 356559333 354862008
Answer :
First, each time you UPDATE
the status column, you are having to update the index as well (source). Evaluate your indexing to see if you really need the index on the status column. My guess is no, since it has an extremely low cardinality and MySQL probably won’t use it anyway.
If you ignore me and think you do need it, follow the advice in the article to drop the index before your loop and re-add it after you’re done.
Here are some other things you might do if that doesn’t help:
-
You are taking all the columns from the
data
but only usingnumber
. Don’t do aSELECT *
, but instead aSELECT number
. That won’t help your writes, but it is a good performance practice. Only select the columns you’re using. -
Your
number
index isn’t getting used at all. This means it is not unique enough to be useful for updating. (Slight tangent: how many rows does a singleUPDATE
affect?) I would drop it, or at least add it toprocess
index. -
It looks like
process
is unique enough for MySQL to whittle the amount of rows down to 16k, instead of 1 million. In light of this, I would addAND process=x
to your update statement (I’m assuming you know process from the originalSELECT
statement):-- FAILED-- UPDATE data SET status = 2, error='$error' WHERE process=X AND number = $data['number'] -- SUCCESS -- UPDATE data SET status = 1 WHERE process=X AND number = $data['number']
A hint about unnecessary indexes in InnoDB. InnoDB is using a hidden ‘primary key’ (since you don’t have one defined) and is using that when it writes the indexes. So for each Index you’re using, you add the size of the index + the size of the hidden primary key to the data file. If you’re not using the index (or MySQL can’t use it), you are wasting space and adding overhead each time you insert a new number
(same for status
, as discussed earlier)
OBSERVATION #1
I noticed you have this query
SELECT * FROM data WHERE status = 0 AND process = 1 LIMIT $limit OFFSET $offset
You need to examine the cardinality of the columns individually and collectively by running these queries:
SELECT COUNT(1) status_count,status FROM data GROUP BY status;
SELECT COUNT(1) process_count,process FROM data GROUP BY process;
SELECT COUNT(1) statprocess_count,status,process FROM data GROUP BY status,process;
You mentioned that you have 929335 rows in the data table. Now, notice the following:
- FLOOR(5% of the table count) is 46466
- Full Table Scan occurs under these conditions
- Any status whose status_count > 46466
- Any process whose process_count > 46466
- Any (status,process) combinaiton whose statproc_count > 46466
- If all the distinct status values have status_count > 46466, drop the status index
- If all the distinct process values have process_count > 46466, drop the process index
- If all (status,process) combinaitons have statproc_count > 46466, you do not need a compound index in (status,process). Otherwise, you should make one with
ALTER TABLE data ADD INDEX statproc (status,process);
Removing unnecessary indexes reduces overall UPDATE
and SELECT
performance.
Index merging does not help the query’s performance if the cardinality of the indexes are low and the number of rows are high.
OBSERVATION #2
The process
field is varchar. If all values of process
is totally numeric, your should convert it to INT. You can ask MySQL to suggest a datatype as follows:
SELECT process FROM data PROCEDURE ANALYSE();
This could reduce the number of bytes that the index and table rows take up and potentially speed up queries in so doing.
I don’t think you need an index on status
– it’s very likely not selective enough to improve read performance, but definitely degrades write operations. Also, is it any reason you don’t have primary key on your table?