Improve `Update` performance (rows locking issue)

Posted on

Question :

I am running 30 scripts (PHP CLI) on Linux , each script are updating (loop) the data in the MySQL database.

When I typed ‘mysqladmin proc’ in terminal, I can see a lot of rows has been locked for 10-30 seconds. Mostly are Update queues. How to improve the performance faster? I am using InnoDB engine.

top command:

top - 10:48:54 up 17 days, 10:30,  2 users,  load average: 1.06, 1.05, 1.01
Tasks: 188 total,   1 running, 187 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.8%us,  0.1%sy,  0.0%ni, 74.1%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4138464k total,  1908724k used,  2229740k free,   316224k buffers
Swap:  2096440k total,       16k used,  2096424k free,   592384k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32183 mysql     15   0  903m 459m 4800 S 101.8 11.4 876:53.66 mysqld

/etc/my.cnf

[mysqld]
set-variable = max_connections=500
safe-show-database
max_user_connections=200
key_buffer_size = 16M
query_cache_size = 350M
tmp_table_size = 200M
max_heap_table_size  = 200M
thread_cache_size = 4
table_cache = 800
thread_concurrency = 8
innodb_buffer_pool_size = 400M
innodb_log_file_size = 128M
query_cache_limit = 500M
innodb_flush_log_at_trx_commit = 2

Server Spec: Intel Core 2 Quad Q8300, 2.5 GHz, 4GB ram.

‘mysqladmin proc’:

+------+-----------------+-----------+----------------+---------+------+----------+-------------------------------------------------------------------------------
| Id   | User            | Host      | db             | Command | Time | State    | Info                                                                          
+------+-----------------+-----------+----------------+---------+------+----------+--------------------------------------------------------------------------------
|  265 | user            | localhost | xxxxxxxxxxxxxx | Query   |   15 | Updating | UPDATE data SET status = '2', error = 'Unknown error'  WHERE number= 0xxxxx    
|  269 | user            | localhost | xxxxxxxxxxxxxx | Query   |   17 | Updating | UPDATE data SET status = '2', error = 'Invalid ....'  WHERE number= 0xxx 
|  280 | user            | localhost | xxxxxxxxxxxxxx | Query   |    7 | Updating | UPDATE data SET status = 1  WHERE f = 0xxxx                                           
|  300 | user            | localhost | xxxxxxxxxxxxxx | Query   |    1 | Updating | UPDATE data SET status = '2', error = 'Unknown ....'  WHERE number= 0xx             
|  314 | user            | localhost | xxxxxxxxxxxxxx | Query   |   13 | Updating | UPDATE data SET status = '2', error = 'Invalid....'  WHERE number= 0xxxx
|  327 | user            | localhost | xxxxxxxxxxxxxx | Query   |   11 | Updating | UPDATE data SET status = '2', error = 'Unknown ....'  WHERE number= 0xxxx               
|  341 | user            | localhost | xxxxxxxxxxxxxx | Sleep   |    2 |          | NULL                                                                                      
|  350 | user            | localhost | xxxxxxxxxxxxxx | Query   |    7 | Updating | UPDATE data SET status = '2', error = 'Unknown ....'  WHERE number= 0xxx                
|  360 | user            | localhost | xxxxxxxxxxxxxx | Query   |    5 | Updating | UPDATE data SET status = 1  WHERE number = 0xxxx     

Explain:

+----+-------------+-------+-------------+----------------+----------------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type        | possible_keys  | key            | key_len | ref  | rows  | Extra                                        |
+----+-------------+-------+-------------+----------------+----------------+---------+------+-------+----------------------------------------------+
|  1 | SIMPLE      | data  | index_merge | process,status | process,status | 52,1    | NULL | 16439 | Using intersect(process,status); Using where |
+----+-------------+-------+-------------+----------------+----------------+---------+------+-------+----------------------------------------------+

Is MySQL server using multiple cores?

Edit, more information:

data table:

CREATE TABLE data (
  number varchar(50) NOT NULL,
  dob varchar(50) NOT NULL,
  other varchar(50) NOT NULL,
  status tinyint(1) unsigned NOT NULL,
  error varchar(150) NOT NULL,
  process varchar(50) NOT NULL,
  KEY process (process),
  KEY status (status),
  KEY number (number)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Number of rows in the data table: 929335

Indexed: process, status and number

data table Size: 137.3 MiB (according to phpMyAdmin)

Hard drive: 500 GB SATA, 7200 rpm (no raid)

LOOP
  $offset = xx
  $limit  = xx
  $data = SELECT * FROM data WHERE status = 0 AND process = 1 LIMIT $limit OFFSET $offset

  LOOP ROWS
     $error = errorCheck($data['number']);
     if ($error) {
         UPDATE data SET status = 2, error='$error' WHERE number = $data['number']
     } else {
         UPDATE data SET status = 1 WHERE number = $data['number']
     }
  END LOOP ROWS

END LOOP
  • limit select query to return only a limited amount of rows and it will select the remaining rows during the next iteration. Using offset and limit
  • read all the rows from the select query into an array of variables
  • Iterate your array of numbers, update every row
  • continue with the first step again

mysql> EXPLAIN SELECT * FROM data WHERE number = 0xxxxxxxxxx;

+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | data  | ALL  | number        | NULL | NULL    | NULL | 934712 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+

iostat

root@host [~]# iostat
Linux 2.6.18-194.17.1.el5PAE (xxxxxxx.xxxxxxxxxxxx.com)        21/12/11

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.99    0.17    0.49    0.27    0.00   89.08

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              10.06       243.59       556.69  393848781  900082808
sda1              0.56         1.41        20.75    2284442   33551840
sda2              5.30         9.55       271.30   15442276  438656832
sda3              1.67        12.01        38.33   19412396   61973872
sda4              0.00         0.00         0.00          8          0
sda5              0.28         0.09         6.83     145372   11035184
sda6              0.00         0.00         0.00       4514       3072
sda7              2.25       220.53       219.48  356559333  354862008

Answer :

First, each time you UPDATE the status column, you are having to update the index as well (source). Evaluate your indexing to see if you really need the index on the status column. My guess is no, since it has an extremely low cardinality and MySQL probably won’t use it anyway.

If you ignore me and think you do need it, follow the advice in the article to drop the index before your loop and re-add it after you’re done.

Here are some other things you might do if that doesn’t help:

  • You are taking all the columns from the data but only using number. Don’t do a SELECT *, but instead a SELECT number. That won’t help your writes, but it is a good performance practice. Only select the columns you’re using.

  • Your number index isn’t getting used at all. This means it is not unique enough to be useful for updating. (Slight tangent: how many rows does a single UPDATE affect?) I would drop it, or at least add it to process index.

  • It looks like process is unique enough for MySQL to whittle the amount of rows down to 16k, instead of 1 million. In light of this, I would add AND process=x to your update statement (I’m assuming you know process from the original SELECT statement):

    -- FAILED--
    UPDATE data SET status = 2, error='$error' WHERE process=X AND number = $data['number']
    
    -- SUCCESS --
    UPDATE data SET status = 1 WHERE process=X AND number = $data['number']
    

A hint about unnecessary indexes in InnoDB. InnoDB is using a hidden ‘primary key’ (since you don’t have one defined) and is using that when it writes the indexes. So for each Index you’re using, you add the size of the index + the size of the hidden primary key to the data file. If you’re not using the index (or MySQL can’t use it), you are wasting space and adding overhead each time you insert a new number (same for status, as discussed earlier)

OBSERVATION #1

I noticed you have this query

SELECT * FROM data WHERE status = 0 AND process = 1 LIMIT $limit OFFSET $offset  

You need to examine the cardinality of the columns individually and collectively by running these queries:

SELECT COUNT(1) status_count,status FROM data GROUP BY status;
SELECT COUNT(1) process_count,process FROM data GROUP BY process;
SELECT COUNT(1) statprocess_count,status,process FROM data GROUP BY status,process;

You mentioned that you have 929335 rows in the data table. Now, notice the following:

  • FLOOR(5% of the table count) is 46466
  • Full Table Scan occurs under these conditions
    • Any status whose status_count > 46466
    • Any process whose process_count > 46466
    • Any (status,process) combinaiton whose statproc_count > 46466
  • If all the distinct status values have status_count > 46466, drop the status index
  • If all the distinct process values have process_count > 46466, drop the process index
  • If all (status,process) combinaitons have statproc_count > 46466, you do not need a compound index in (status,process). Otherwise, you should make one with ALTER TABLE data ADD INDEX statproc (status,process);

Removing unnecessary indexes reduces overall UPDATE and SELECT performance.

Index merging does not help the query’s performance if the cardinality of the indexes are low and the number of rows are high.

OBSERVATION #2

The process field is varchar. If all values of process is totally numeric, your should convert it to INT. You can ask MySQL to suggest a datatype as follows:

SELECT process FROM data PROCEDURE ANALYSE();

This could reduce the number of bytes that the index and table rows take up and potentially speed up queries in so doing.

I don’t think you need an index on status – it’s very likely not selective enough to improve read performance, but definitely degrades write operations. Also, is it any reason you don’t have primary key on your table?

Leave a Reply

Your email address will not be published. Required fields are marked *