MySQL – Export two columns from large table without causing a lock

Question :

I am a sysadmin who doesn't do a ton of DBA work. For a project, I have access to a production server (which I am not the sysadmin for) with an enormous 40,000,000-row, >10 GB MySQL InnoDB table. I want to export two small columns from that table, one an INT(11) and the other a VARCHAR(20), to a CSV or .SQL file (either is fine; I'll write a crosswalk for the next step, no problem).

We can call the columns ColumnA and ColumnB and the table SampleTable.

mysqldump is not the right tool for this because I can't specify which columns to export, and I don't need to dump a massive table just to get two tiny columns.

I know I can do a SELECT INTO statement (either to create a new table with just those columns, or SELECT ... INTO OUTFILE to skip the intermediate step), but I am concerned that this will cause a table-level lock on a production server. The table is InnoDB.
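For concreteness, here is a minimal sketch of the two variants I have in mind (the export table name Sample_Export and the /tmp path are just placeholders):

-- Variant 1: copy the two columns into a new intermediate table
CREATE TABLE Sample_Export AS
SELECT ColumnA, ColumnB FROM SampleTable;

-- Variant 2: write the two columns straight to a file on the DB server
SELECT ColumnA, ColumnB
FROM SampleTable
INTO OUTFILE '/tmp/sample_export.csv';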

What’s my best bet to avoid inconveniencing any live traffic on the server or locking anything?

Thanks

Answer :

Many experts have shared many ways to overcome this problem; these are my suggestions for playing it safe.

Try running the commands below in a separate session.

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
USE db;
SELECT COLA, COLB INTO OUTFILE '/tmp/data.csv' FROM TABLE_NAME;
COMMIT;
exit;
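If you need genuinely comma-separated output (the INTO OUTFILE default is tab-separated), you can add field and line terminators; note that the server itself writes the file, so the target directory must be permitted by secure_file_priv and writable by the mysqld user. A sketch with the same placeholder names:

-- Where is the server allowed to write files? (empty = anywhere, NULL = OUTFILE disabled)
SELECT @@secure_file_priv;

SELECT COLA, COLB
FROM TABLE_NAME
INTO OUTFILE '/tmp/data.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';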

With READ UNCOMMITTED set, the SELECT statements are performed in a nonlocking fashion, but a possible earlier version of a row might be used, so reads at this isolation level are not consistent. "Not consistent" means you may see changes from DML transactions that are still in flight and have not committed yet, which I assume is acceptable in your case. This is also called a "dirty read." Otherwise, this isolation level works like READ COMMITTED.
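A quick way to confirm the isolation level actually took effect in your session (the variable is @@transaction_isolation on MySQL 5.7.20+ and 8.0; older versions use @@tx_isolation):

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- Should return READ-UNCOMMITTED for this session only
SELECT @@transaction_isolation;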

If I were you, this is the order I would follow.

  • Check for the DB server's non-peak hours, and also for the table you plan to SELECT from, so the OUTFILE reads/writes and extra I/O don't cost the server much.

  • Run it manually in the background on the server, and keep monitoring the server's health.

$ cat bg_loaddata.sh
#!/bin/bash
# Dump the two columns to /tmp/data.csv using a non-locking (dirty) read.
mysql -uroot -p*** << EOF
USE db;
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COLA, COLB INTO OUTFILE '/tmp/data.csv' FROM TABLE_NAME;
COMMIT;
exit;
EOF

$ chmod +x bg_loaddata.sh
$ nohup ./bg_loaddata.sh > bg_loaddata.log 2>&1 &
  • If you see any INSERTs or UPDATEs taking a long time to complete in SHOW PROCESSLIST, you may wait a while; but if your export is causing severe turbulence, go ahead and kill the connection id of your SELECT ... INTO OUTFILE in the processlist (see the sketch below). [I'm 99% sure this will not happen, but it's always good to be prepared for any backfires.]
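A minimal sketch of that monitoring step, assuming a user with the PROCESS privilege; the connection id 12345 is just an example:

-- Watch for long-running statements while the export is going
SHOW FULL PROCESSLIST;

-- Rough load indicator: how many threads are actively executing
SHOW GLOBAL STATUS LIKE 'Threads_running';

-- If your export itself is the troublemaker, kill its connection id
KILL 12345;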

Done this way, you can roughly expect the run time to scale with the number of records retrieved, and you shouldn't run into issues. And whenever you need to run this again, put the script in cron and have a sound sleep. 🙂 [Of course you have monitoring metrics to wake you up], just so you're aware before someone surprises you with the NEWS.
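A sketch of the cron entry, assuming the script lives at /root/bg_loaddata.sh and that 02:30 on Sunday is an off-peak window for your server (adjust both to your environment):

# m  h  dom mon dow  command
30 2 * * 0 /root/bg_loaddata.sh > /root/bg_loaddata.log 2>&1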

All the best !!
