I believe, Ghost does not set the connection to utf8 when connecting to the MySQL server, which results in UTF-8 characters being displayed as "?" or similar characters, if the MySQL server itself is not configured to use UTF-8 as default. I am not 100% sure if this is an issue with Ghost or something else, but I've got the feeling that it is a Ghost problem.
I have converted my Ghost database to MySQL locally. Everything was fine there. Then I moved the database to a different server, where the characters where then interpreted wrongly.
I tripple-checked that the database and the tables used for Ghost were set to UTF-8, that the data in the tables were correctly encoded, that the output HTML page is served with UTF-8 encoding etc. Everything was as it should be.
The only difference between my local MySQL server and the production server was that in the server configuration itself the local server had all its defaults set to utf8 and the production server was set to Latin-1. Only when I changed the production MySQL server config and restarted it, the content was displayed correctly. I did not have to change anything else or re-import the tables.
Because the only way to fix this was to change the MySQL server config and any other MySQL client could read the UTF-8 encoded data correctly without the defaults being changed, I believe that Ghost doesn't set the connection to UTF-8, something like SET NAMES utf8.
My database object in the config.js file:
database: {
client: 'mysql',
connection: {
host: '...',
port: '...',
user: '...',
password: '...',
database: '...',
charset: 'utf8'
}
}
I could solve this issue, so that's no personal problem for me, but I think it should always work for anybody, no matter what the MySQL default settings are. Some users may not have access to the config or must use a shared MySQL server where they can't change the defaults.
Since the first version of Ghost we have included charset: 'utf8' in the configuration. This is used by bookshelf/knex to handle encoding data sent to the database. However, unfortunately the version of the mysql binding library that was available at that time simply ignored this setting because it was lowercase and used whatever the database was set to.
This has put us in a position where, unless your database was set to utf8, all of your data will have been double encoded, as data is sent in utf8 but stored however your MySQL configuration says to store it.
This is only an issue for anyone using MySQL who does not have their DB set to utf8. We are aware of the problem, and at some point need to write a migration path so that we can upgrade the version of the mysql bindings to a version which does use the utf8 setting provided by Ghost. However, that's not a trivial task to fix something which doesn't cause problems with running Ghost itself.
In this case, you're hitting the problem because you're trying to migrate data between two databases with different settings. Short term you have two options: 1) make sure the databases have the same configuration or 2) migrate using Ghost's import/export tools instead of moving the DB
This issue is related to #5856 - the main difficulty this causes is that non-ascii chars in the database will appear unreadable if navigating the DB directly, because of the double encoding.
Wow, this was really hard to solve! I think this should be included in the installation manual somewhere.
Anyone else encountering this problem, this is how I was able to solve it, following the steps described here: http://www.whitesmith.co/blog/latin1-to-utf8/
Basically the steps are:
mysqldump with --default-character-set=latin1 to create a backup.sqlbackup.sql with a text editor and replace every occurrence of latin1 with utf8 (there will be two: after SET NAMES and after creating the database)[mysqld]
character-set-server = utf8
collation-server = utf8_unicode_ci
backup.sqlEverything should be fine now.
I was just re-reading this and figured I should point out, the steps above only work if your database was set to latin1 - there are other possibilities so you should check what your DB was set to before running anything like this as you could end up triple-encoding 馃
I have the issue after migrate to Ghost 0.9. Now my latin characters looks like ?.
It's so strange because with the same database data and configuration, in local works fine.
Notes also that the database data is correct, just is a problem retrieving from the client.
Hey @Kikobeats Sorry to hear you are having trouble.
Did you try https://github.com/TryGhost/Ghost/issues/5945#issuecomment-184031511?
@kirrg001 yes, but in my case I did:
npm shrinkwrap.And now works fine!
Still got bit by this bug, the correct way (afaik) to declare utf8 is:
charset: 'UTF8_GENERAL_CI'
In part this bug is related to https://github.com/tgriesser/knex/issues/168#issue-26175740
It would be good if someone could edit the documentation in http://support.ghost.org/config/ if it's official.
@riveraja this bug is still open because the problem still exists. All Ghost blogs running on MySQL currently have double encoding. This is annoying, yes, but doesn't actually break anything.
The issue will be resolved in Ghost 1.0 and as everyone will have to go through a migration process, we'll have the perfect opportunity to resolve the issue for every blog running on MySQL, which we've never had previously.
Encoding can be a pain, same as timezones. As @ErisDS pointed out, this is only interesting for people who didn't configure their database to utf8.
So MySQL has fixed their charset issue in 2.3.0. In any previous version the utf8 charset option was not accepted and that has caused the connection to be set what your database is configured to. See https://github.com/mysqljs/mysql/blob/master/Changes.md#v230-2014-05-16
Actually if your database is configured to e.g latin1, it's not possible to store russian characters, as this is not supported. But with Ghost LTS you can insert any utf8 character and it's displayed correctly. I assume this is why the tables are created correctly, because the utf8 charset option was not ignored in Knex (see https://github.com/tgriesser/knex/blob/master/src/dialects/mysql/schema/tablecompiler.js#L38). You can see that by looking at the collation and charset option in the table schema.
If your database is configured to e.g latin1 and you change your database configuration to utf8, you won't solve the problem, because you would see that everything is double encoded. Updating then to the latest MySQL version, won't help either.
So here is my suggestion
I only found one true solution to this problem. Dumping your database with --skip-set-charset option, reimport with utf8 encoding and then you are able to use the new MySQL version.
http://dev.mysql.com/doc/refman/5.7/en/mysqldump.html#option_mysqldump_set-charset
Ghost 1.0.0 already uses the MySQL version >= 2.3.0 and as described, LTS users, which are affected, cannot simply update to the newer MySQL version. So the solution i am suggesting is, to not update the MySQL version for LTS users, but care about this problem and make it part of the migration path to 1.0.0.
mysqldump -u root -p --skip-set-charset DATABASE_NAME > ghost-dump.sql
mysql -u root -p --default-character-set=utf8 DATABASE_NAME < ghost-dump.sql
No matter if your database is configured to utf8 or latin1 or something else, you will get a dump of your database in the configured encoding and then import as utf8.
Ghost 1.0.0 uses utf8mb4 encoding, but that shouldn't make any difference.
This is great thanks 馃憤
When considering the migration path from LTS to 1.0, if we use an import -> export mechanism, this won't be a problem.
However, if we create a direct migration script we will need to include a check to see if a user was using MySQL and their DB was not utf8 and in that case ensure we provide a tool or step to handle this, recreating the MySQL database with the correct setting.
Anyone who starts on 1.0 won't encounter this.
The only TODOs here for Ghost 1.0 / Ghost-CLI is to:
However, if we create a direct migration script we will need to include a check to see if a user was using MySQL and their DB was not utf8 and in that case ensure we provide a tool or step to handle this, recreating the MySQL database with the correct setting.
The commands (mysqldump and reimport) should work for every user with MySQL. So there is the option to make this part of a migration script, if we decide to go with a script.
I will think and care about the TODO's you have listed 馃憤
Determine, is there some sort of check we should build into Ghost-CLI to ensure the DB is configured correctly?
That is not needed in my opinion. It doesn't matter what your MySQL database is configured to. As long as we
2.3.0 MySQL)Closing. We have opened a migration path issue, which keeps track of the notes here, see #8141.
Most helpful comment
I was just re-reading this and figured I should point out, the steps above only work if your database was set to latin1 - there are other possibilities so you should check what your DB was set to before running anything like this as you could end up triple-encoding 馃