Table size reduce with help changing collation charset [MySQL]

Posted on

Question :

I have table which is big enough, phpmyadmin shows that it takes more than 40Gb space. I can’t remove data from it, but I’ve figure out one thing: there is one field which is described as

varchar(3) utf8_general_ci

There are always latin symbols.

I thought that changing collation to latin1_general_ci would reduce the Table size.

I’ve tried this out on small version of it with 200 Mb but I did not get expected result 🙁

Why so ?

Answer :

MySQL’s documentation for utf8 shows that it will use 1 byte for Latin characters, and only use more if the situation requires them.

Therefore, if you’re only using normal latin characters, both utf8_general_ci and latin1_general_ci will use between 1 and 4 bytes: one byte to store the length (0-3 characters), and then up to three bytes for the actual text.

If the value of this field will always be exactly three characters, you could save one byte by changing the column from a varchar(3) to a char(3) (as long as you also use latin1_general_ci).

If it’s not exactly three characters, it’s likely to be more efficient to leave it as a varchar – there’s not much more you can do to reduce the size of this field. However, with a 40GB table you can hopefully find something else to optimise!

You may have fragmentation in the table meaning that you could reclaim space by running the optimize command on the table. Check the free_data column from show table status like 'table_name'.
Further reading;
http://dev.mysql.com/doc/refman/5.5/en/optimize-table.html

If you are using the innodb storage engine (and you generally should with narrow exceptions), then you could consider using innodb row compression. Caveats here are that it’s not recommended on a CPU-bound workload and you need to ensure that you’re working with innodb_file_per_table enabled.
Further reading;
http://dev.mysql.com/doc/refman/5.5/en/innodb-compression-usage.html

Leave a Reply

Your email address will not be published. Required fields are marked *