Question :
MySql database at the moment has:
- Character Set: utf8
- Default Collation Name: utf8_unicode_ci
I want to change it to:
- utf8mb4
- utf8mb4_unicode_ci
I would also do the same thing for all tables in the database.
My question is the following:
- By doing that change is there a possibility that the content of my columns gets messed up in any way? (Cause I am really scared to do it!!!)
- I have millions of rows in the tables, is there ANY way to do a check after I make the switch that the content of all tables is the OK?
- Some crazy ideas I am thinking of include:
- Running some scripts to compare original DB to changed DB
- Running some scripts to calculate column sizes of original DB to changed DB
- Some crazy ideas I am thinking of include:
Any help or ideas which would guide me to the right direction would be appreciated.
THANK YOU
Answer :
Main thing to watch altering tables is the index length will increase for utf8 indexed columns which might put them over limit. Adjust URL for your mysql version as some differences apply. These index length exceeded errors will show up immediately on ALTER TABLE
so this will be a final bit of the migration.
On validation, generally won’t be a problem. Try this on a few tables to get confidence in the idea confidence on smaller tables:
CREATE TABLE tbl_new LIKE tbl
ALTER TABLE tbl_new .. {change to utf8mb4 fields}
INSERT INTO tbl_new SELECT * FROM tbl
RENAME TABLE tbl_new TO tbl, tbl TO tbl_old
A SELECT ... tbl JOIN tbl_old ON tbl.pk = tbl_old.pk WHERE tbl.utf_field != tbl_old.utf_field
can be used to validate values.
Application behavior can be checked, and you can RENAME back if their are problems. (taking to account updates).
For each table:
ALTER TABLE t CONVERT TO CHARACTER SET utf8mb4 COLLATION utf8mb4_unicode_ci;
That changes all string columns in one table. If you have any columns that should not be utf8mb4, then evasive measures need to be taken.
If you blindly use VARCHAR(255)
and are using MySQL 5.5 or 5.6, there could be index problems. Here are some workarounds: http://mysql.rjweb.org/doc.php/limits#767_limit_in_innodb_indexes
You might consider this newer collation: utf8mb4_unicode_520_ci