Weird character encoding in WordPress
I recently had to migrate an older WordPress site to a new server while building a replacement. As soon as I got it running – with an up-to-date version of WordPress – I noticed strange, stray characters in the content. I certainly didn’t add them, so where did they come from?
Character encoding and why it matters
I won’t go into too much depth about character encoding, but essentially all the content of your WordPress site is stored in a MySQL database, and WordPress expects that content to be encoded as UTF-8. UTF-8 can represent every character in Unicode, from letters and numbers to punctuation and symbols in practically any language.
Early versions of WordPress (before 2.2, which introduced the DB_CHARSET setting) didn’t declare which character set to use at installation, so the database default – usually Latin1 – was used instead. WordPress, however, would still attempt to store your content as UTF-8.
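The result? When the database and WordPress disagree about the encoding, the front end of your site shows all kinds of unexpected characters, typically in place of curly quotes, dashes and accented letters.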
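To make that concrete, here is a minimal sketch in Python of how such a mismatch garbles text. It isn't WordPress code: the function name misread_as_latin1 is made up for illustration, and it assumes Python's cp1252 codec is a fair stand-in for what MySQL calls Latin1.

```python
# Illustration only: UTF-8 bytes interpreted as if they were Latin1.
# Assumption: Python's "cp1252" codec approximates MySQL's latin1 charset.

def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode those same bytes as cp1252."""
    utf8_bytes = text.encode("utf-8")   # what gets written
    return utf8_bytes.decode("cp1252")  # what gets read back with the wrong charset


if __name__ == "__main__":
    for original in ["don’t", "café", "2007–2008"]:
        print(f"{original} -> {misread_as_latin1(original)}")
    # don’t -> donâ€™t
    # café -> cafÃ©
    # 2007–2008 -> 2007â€“2008
```

Plain ASCII text passes through unchanged, which is why only "fancy" punctuation and accented letters tend to look broken.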
Fixing the problem
Once I knew what the problem was, I found a simple fix. No charset had been defined, so I added the following two lines to my wp-config.php file, above the line that loads wp-settings.php:
define('DB_CHARSET', 'utf8'); /* Database Charset. */
define('DB_COLLATE', ''); /* Database Collate type. */
I saved the file, crossed my fingers and refreshed… and the problem was immediately fixed. I can’t say whether this will work every time, but I’m noting it here as a reminder to myself and, hopefully, a help to others!