Now With More UTF (Converting Your WordPress Database to UTF-8)

Unsatisfied with the previous 7 Unicode Transformation Formats, I recently converted the database to UTF-8. Before I get into the gory details, here’s all you need to know:

Back up your database, install this plugin, then read and follow its instructions. The whole conversion runs at the push of one button. There is also a sanitizer plugin in case something goes wrong, but I have not tried it.

Edit: According to the WordPress Codex, the above plugin does not work properly on newer versions of WordPress. You will want to back up your database tables and take the long route very carefully. The sanitizer will still work as a “last resort”.

The real motivator to convert came from JP Meyer’s post about how annoying the conversion is. It comes down to this: UTF-8 makes your blog easily portable, and old versions of WordPress did not use it. Converting by hand sounds really bad, but since the process can be automated, someone finally came up with a plugin to do everything for you.

Conversion to UTF-8 is definitely an issue for anime bloggers, because many of us use foreign characters in our posts, titles, categories, etc. Everything will seem to work until you try to move your database (as JP did when he moved his databases to a new webhost), at which point all the old Latin1 stuff will look like garbled crap. After you have converted all your data once, you should never have to do it again, since new versions of WordPress use UTF-8 by default. I’m not sure exactly how you can tell that everything was converted correctly, but I did notice that my blogroll order (alphabetical) changed when I refreshed the main page after the conversion finished. So for JP’s hard work, he is now listed last on the blogroll orz.
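To see where the garbling comes from, here is a minimal Python sketch (my own illustration, not anything the plugin runs). It assumes the text was stored as UTF-8 bytes but is read back as Latin1. Python's cp1252 codec stands in for MySQL's latin1 here, which is close but not identical, and the helper names are made up.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo the misreading by reversing the two steps (hypothetical helper)."""
    return garbled.encode("cp1252").decode("utf-8")


original = "It’s"  # typographic apostrophe, U+2019, three bytes in UTF-8
garbled = misread_as_cp1252(original)

print(garbled)          # Itâ€™s  (one apostrophe became three junk characters)
print(repair(garbled))  # It’s

# Caveat: Python's cp1252 codec leaves a few byte values undefined, so some
# characters raise UnicodeDecodeError here instead of turning into junk.
```

The round trip only works while no bytes have been lost. Once garbled text has been re-saved or truncated, it may not be recoverable this way.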

I remember hearing about the idea first on Jason’s update post. I went to the link he included, started reading, and immediately thought, “OMG this sounds really painful, I’ll do it later.”

Beginning with Version 2.2, WordPress allows the user to define both the database character set and the collation in their wp-config.php file. Setting the DB_CHARSET and DB_COLLATE values in wp-config.php causes WordPress to create the database with the appropriate settings. But, the setting can only be designated for new installations, not for ‘already installed’ copies of WordPress.
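If you are not sure whether your existing wp-config.php already has those two settings, you can scan it for them. This is a rough Python sketch, assuming the file uses the plain define('NAME', 'value'); form. The function name and the sample config text are made up for illustration, and the regex does not skip commented-out lines.

```python
import re

# Matches define('DB_CHARSET', '...') or define('DB_COLLATE', '...'),
# with either single or double quotes around the name and value.
DEFINE_RE = re.compile(
    r"define\(\s*['\"](DB_CHARSET|DB_COLLATE)['\"]\s*,\s*['\"]([^'\"]*)['\"]\s*\)"
)


def charset_settings(config_text: str) -> dict:
    """Return whichever of DB_CHARSET / DB_COLLATE the config text defines."""
    return {name: value for name, value in DEFINE_RE.findall(config_text)}


# Illustrative sample only, not copied from a real site.
sample_config = """
define('DB_NAME', 'myblog');
define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');
"""

settings = charset_settings(sample_config)
if "DB_CHARSET" in settings:
    print("DB_CHARSET is set to", repr(settings["DB_CHARSET"]))
else:
    print("No DB_CHARSET found; this looks like a pre-2.2 config file")
```

To check a real file, read it with open(path, encoding="utf-8") and pass the text to charset_settings.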

I had been using an old wp-config.php file, so I had to download the newest WordPress and use its version of wp-config.php, which tells WP to use UTF-8. Then I used WP-DBManager to back up the database. After this step, you can do things either the hard way or the easy way. I picked the easy way; if you want some background info about this plugin, here’s the forum thread. Keep in mind that the converter plugin only works for versions 2.1.x and 2.2.x of WordPress. The conversion takes a little time, depending on the size of your database, so I first cleaned up my database (dropping unneeded information in various plugin-created tables) to cut down on the amount of data the plugin needed to convert.

Edit: The conversion plugin has not been updated since then, so anyone converting from a version of WordPress newer than 2.2.x will get database corruption errors from it. Unfortunately, you are stuck with the long way of converting your database until someone comes up with a working plugin.


10 Comments

  1. (Power Level: 66)
    Posted September 4, 2007 at 12:44 am | Permalink

    Those lolicious pictures actually gave me (an IT-challenged tanuki) the courage to want to try this potentially catastrophic conversion.

    I need to lay off Moetan. XD

  2. (Power Level: 2351)
    Posted September 4, 2007 at 12:49 am | Permalink

    Look into those sad loli eyes and know that she can’t convert that database without you.

    While it sounds like a lot can go wrong, the actual process was pretty easy with the plugin. I don’t want to think about what it was like before the plugin, or I’ll start crying too.

    If your database blows up though, I want Zyl to know I had nothing to do with it.

  3. (Power Level: 83)
    Posted September 4, 2007 at 2:51 am | Permalink

    tl;dr…but those pictures were great.

    Just kidding, good luck with that, hope your site does not asplode.

  4. (Power Level: 68)
    Posted September 4, 2007 at 6:23 am | Permalink

    Thanks for the information ^^ I’ll let JP know, he’s been tinkering with this crap all weekend.

  5. (Power Level: 2351)
    Posted September 4, 2007 at 7:58 am | Permalink

    Hinano, I’m not sure of how JP originally backed up the data, so he may want to read some of the gory details of how the conversion works.

    First, make a backup of your database. This doesn’t mean do a mysql dump (which will get you a bunch of unusable rubbish), it means make a PHYSICAL COPY of the database folder, which you can find on your server at:

    $ cd /some/where/mysql/data/mydatabasename

    Otherwise it sounds like he will be unable to simply run the plugin to convert the text. I have heard bad things about Dreamhost support, but if he does not have a physical copy, perhaps Dreamhost has a backed up version on their servers. It may be worth asking them.

  6. (Power Level: 68)
    Posted September 4, 2007 at 9:03 am | Permalink

    He told me he backed everything up XD I got an email from him this morning telling me he was using the same file, and he said that thanks to your post he’ll be able to get everything rolling by tonight – which is good cause I wanna do a sleazy School Days post! ^_^

  7. (Power Level: 2351)
    Posted September 4, 2007 at 9:11 am | Permalink

    Tell him to hurry up, because I want to read your sleazy School Days post, lol =D. When stripped of cruft, our database was about 7 MB, and took maybe 30 seconds to convert.

  8. (Power Level: 2351)
    Posted September 4, 2007 at 12:34 pm | Permalink

    I should mention that so far, Chris has helpfully pointed out one comment that got messed up in the conversion. It was an apostrophe that got converted into three characters of gibberish.

    So far, in a few cases where commenters used apostrophes, ellipses, or quotes, the symbol got changed to junk. So I guess the plugin is not perfect. However, from what I can tell, the plugin worked correctly in most cases (in posts, in links, etc.).

  9. Jesus159159159
    (Power Level: 735)
    Posted September 4, 2007 at 3:10 pm | Permalink

    *stares at loli pictures* Well, Mr. Kabitzin, looks like you made me turn my PC into a MAC again! *BA DUM CHING!*…

    Anyways, if I even make a blog/website, I’ll make sure to visit these helpful guides again! :)

  10. (Power Level: 18)
    Posted September 4, 2007 at 8:40 pm | Permalink

    You know, last night I was juuuuuuuuust about to use that plugin before my DNS got screwed up and I couldn’t access my original host. Seeing this post was helpful since it let me know that it does all the work for me. The DNS issue straightened itself up this morning and I ran the plugin tonight with zero problems. Yippee!

2 Trackbacks

  1. By Sea Slugs! Anime Blog » Find And Replace on September 4, 2007 at 1:29 pm

    [...] of course the UTF-8 conversion could not go off without a hitch. It seems that comments are sometimes not converted correctly, [...]

  2. [...] then decided to take a risk with the plugin that Kabitzin used last year even though the plugin author only maintained it up to WP 2.2 and the relevant WP Forum [...]
