September 29 2011
Converting MySQL from LATIN1 to UTF8
MySQL defaults to "latin1" as its character set, but at some point, most people want to migrate to "utf8". I realize that there are dozens of posts about how people handled this, and yet, not a single one of those worked completely for me.
I wanted MySQL to use "utf8" for the character set and "utf8_unicode_ci" (not "utf8_general_ci") for the collation, and I wanted it all to work on RDS with ActiveRecord on Rails and Rack.
Here are my steps:
- use mysqldump to extract the old data as "latin1"
- use sed to replace "latin1" with "utf8" in the dump file
- create the new database with the right parameters: character set utf8 collate utf8_unicode_ci
- use mysql --default-character-set=utf8 to pipe the converted dump into the new database
Here is my code:
To verify the character set and collation, you can always query the MySQL variables:
Amazon RDS
For those using Amazon's AWS RDS for their MySQL instance, you have to create a parameter group with "utf8" values. I'd guess you could just modify the current parameter group then apply it, but I haven't verified that.
ActiveRecord in Rails and Rack
ActiveRecord supports an :encoding option in its parameters for ActiveRecord::Base.establish_connection. The option tells the connection to execute SET NAMES as soon as the connection is established, thereby telling the server what the character set the client wants.
However, I also wanted to specify the collation. When I added :encoding to ActiveRecord::Base.establish_connection, collation_connection (from the MySQL variables, not connection.collation) remained as "utf8_general_ci". Some people have indicated that you can specify :collation in database.yml for the establish_connection call, but that never worked for me. I think MySQL bug #34980 prevented it. Others indicated that you can simply add ActiveRecord::Base.connection.execute("set collation_connection='utf8_unicode_ci'") at the bottom of environment.rb for Rails; that also never worked for me. To specify the collation in Rails, I used a before_filter in application_controller.rb. See my code below.
Pointers
I pieced my steps together from the following helpful links: