MySQL data enconding conversion latin1 utf8

I was about to post a different story but things turned out differently, for good. Now I’ll post tips that may save the day in the event of encoding issues when dealing with MySQL table data. Hopefully you find it useful.

Say you have a MySQL table that has collation set to latin1 but you have utf8-encoded data stored. This happened when importing a dump, or because of Murphy’s law. When querying the data from your application or from mysql’s tools you get awful results for non-ascii characters such tildes. Changing the table’s collation definition will only work for new columns and records. Altering the collation for the said column would make things worse. So what to do?

Fortunately MySQL offers a nice way to save the day. You can use a transition conversion and then set the table encoding back to utf8, so it pairs accordingly with the encoding type for the data it has stored.

ALTER TABLE table_name CHANGE column_name column_name BLOB;

Issuing the previous command will make the table change it’s type to BLOB. Why? Because by doing it the stored data is kept untouched. This is important. If you use the CONVERT or MODIFY commands the data will be converted by MySQL. We don’t want that since the data is already in the encoding type we want.

You can note that column_name is repeated twice, this is because the CHANGE command copies the data and we want the resulting column to have the same name.

Now, we have to put the data back on the encoding we want, so everything is under control again. This is done by issuing the following command.

ALTER TABLE table_name CHANGE column_name column_name COLUMN_TYPE CHARACTER SET utf8;

COLUMN_TYPE depends on the content’s column type. For instance, for a TEXT column it should be.

ALTER TABLE table_name CHANGE column_name column_name TEXT CHARACTER SET utf8;

It works for VARCHAR, LONGTEXT and other data types that are used to store characters.