Changing from utf8 to utf8mb4

Posted on

Question :

MySql database at the moment has:

  • Character Set: utf8
  • Default Collation Name: utf8_unicode_ci

I want to change it to:

  • utf8mb4
  • utf8mb4_unicode_ci

I would also do the same thing for all tables in the database.

My question is the following:

  • By doing that change is there a possibility that the content of my columns gets messed up in any way? (Cause I am really scared to do it!!!)
  • I have millions of rows in the tables, is there ANY way to do a check after I make the switch that the content of all tables is the OK?
    • Some crazy ideas I am thinking of include:
      • Running some scripts to compare original DB to changed DB
      • Running some scripts to calculate column sizes of original DB to changed DB

Any help or ideas which would guide me to the right direction would be appreciated.

THANK YOU

Answer :

Main thing to watch altering tables is the index length will increase for utf8 indexed columns which might put them over limit. Adjust URL for your mysql version as some differences apply. These index length exceeded errors will show up immediately on ALTER TABLE so this will be a final bit of the migration.

On validation, generally won’t be a problem. Try this on a few tables to get confidence in the idea confidence on smaller tables:

   CREATE TABLE tbl_new LIKE tbl
   ALTER TABLE tbl_new ..  {change to utf8mb4 fields}
   INSERT INTO tbl_new SELECT * FROM tbl
   RENAME TABLE tbl_new TO tbl, tbl TO tbl_old

A SELECT ... tbl JOIN tbl_old ON tbl.pk = tbl_old.pk WHERE tbl.utf_field != tbl_old.utf_field can be used to validate values.

Application behavior can be checked, and you can RENAME back if their are problems. (taking to account updates).

For each table:

ALTER TABLE t CONVERT TO CHARACTER SET utf8mb4 COLLATION utf8mb4_unicode_ci;

That changes all string columns in one table. If you have any columns that should not be utf8mb4, then evasive measures need to be taken.

If you blindly use VARCHAR(255) and are using MySQL 5.5 or 5.6, there could be index problems. Here are some workarounds: http://mysql.rjweb.org/doc.php/limits#767_limit_in_innodb_indexes

You might consider this newer collation: utf8mb4_unicode_520_ci

Leave a Reply

Your email address will not be published. Required fields are marked *