5 Dec, 2008 in mysql by khelll

Changing database encoding from latin1 to UTF8

Now a days, UTF-8 is the most used data encoding format, and the fact that your database is not using UTF8 encoding is really annoying, specially additionally when it comes to integrating different systems, that has no one unified encoding format.
So if you think it’s time to change your data encoding to utf8 format, then here what this post is all about.
I’ll list here the steps to do so, i just have to clarify that the main data encoding here is windows-1256 (which is the main Arabic encoding used in web applications), but it’s saved in latin1 encoding in the database (mydata ->windows-1256 -> latin1) ,also note that i’m using Mysql database.
Here are the steps:

  1. Export (only) the schema of the db,without “set Names …” phrase in the outputted sql file, this will bring you back the data in the original encoding (windows-1256)
    mysqldump --default-character-set=latin1 --skip-set-charset -d -uusername -ppassword db_name > db_name_schema.sql
  2. Export the data of the db without “set Names …” phrase in the outputted sql file, this will bring you back the data in the original encoding (windows-1256):
    mysqldump --default-character-set=latin1 --skip-set-charset -t -uusername -ppassword  db_name > db_name_data.sql
  3. Change the encoding of both files from arabic to utf8 -check the notes if you r using windows
    iconv -f windows-1256 -t utf8 db_name_schema.sql > db_name_schema_utf8.sql
    iconv -f windows-1256 -t utf8 db_name_data.sql > db_name_data_utf8.sql
  4. Open the file ‘db_name_schema.sql’ with any editor and replace each “DEFAULT CHARSET=latin1” phrase with “DEFAULT CHARSET=utf8” one
  5. Make a new db ,encoded in utf8:
     mysql -uusername -ppassword -e 'CREATE DATABASE new_db DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci'
  6. Import the schema and data in utf8 encoding:
    mysql --default-character-set=utf8 -uusername -ppassword new_db < db_name_schema_utf8.sql
    mysql --default-character-set=utf8  -uusername -ppassword new_db < db_name_data_utf8.sql

notes

  • If you are wondering why to separate schema from data upon exporting , then the answer is that the operation of replacing “DEFAULT CHARSET=latin1” phrase with “DEFAULT CHARSET=utf8” one , is taking place only on schema files, so it’s recommended to separate them so that you dont stuck when loading the big data files.
  • If you are a windows user and can’t use iconv , then u can use any editor to do the job for u, try scite or note++ or even dreamweaver

enjoy!!!

3 Responses so far | Have Your Say!

  1. hairy mary - Gravatar

    hairy mary  |  March 20th, 2009 at 7:09 pm #

    nice work. useful. thx.

  2. Geanciede - Gravatar

    Geanciede  |  May 20th, 2009 at 9:51 am #

    Hello Sir:
    Thanks for posting these useful infirmation. Keep them coming

  3. Fixing mixed-encoding MySQL dumpfiles with WordPress - Gravatar

    Fixing mixed-encoding MySQL dumpfiles with WordPress  |  May 26th, 2009 at 8:20 pm #

    [...] your data has a ‘latin1′ somewhere in it, either edit the dumpfile by hand or read this and dump your schema separate from your data. My data was clean so I just used Sed to replace the [...]

Leave a Feedback

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="">