I recently completed a project that populated the database with UTF8 encoded text. My script was able to create the database using the correct character-set, but the text data could not be properly imported. Apparently, the initdb scripts are assumed to be ascii encoded, so I had to make a minor modification to docker-entrypoint.sh.
mysql=( mysql --protocol=socket --default_character_set=utf8 -uroot )
This change was not a big deal for me, but I am raising the question as to whether this would be a good default. I can't think of any way it breaks compatibility with existing scripts, and the new default would provide a real benefit to anyone pre-populating with UTF8 encoded data.
Of course, an even better solution may be to use a variable.
Please note, this is different from the discussion on the initial database's character-set (#14). This is a discussion about the encoding of the docker-entrypoint-initdb.d script files.
I'm having a similar issue. I can source files in utf-8 and they work fine, but I can't use utf-8 when I type in SQL commands, even after setting:
set character_set_client='utf8';
set character_set_connection='utf8';
set character_set_database='utf8';
set character_set_results='utf8';
set character_set_server='utf8';
show variables like 'char%';
Based on some reading, I think the solution is right here:
https://github.com/docker-library/postgres/blob/8e867c8ba0fc8fd347e43ae53ddeba8e67242a53/9.5/Dockerfile#L23
The container operating system is not set, or not set to utf8. This line sets it. If that works and you're willing to merge it, I can try it out and submit a pull request.
Any updates on this? I would prefer if I don't have to build my MariaDB container locally with a patched Dockerfile.
I'd recommend adding --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci to your container command, as noted in the image description under "Configuration without a cnf file" (https://github.com/docker-library/docs/blob/93eaf4f47c52c4b65d1cd31a5dcaa257848e5f13/mysql/README.md#configuration-without-a-cnf-file). That should set the default for the entire daemon.
See also https://github.com/docker-library/mysql/issues/32#issuecomment-197026311 and https://github.com/docker-library/docs/pull/502.
This does not solve the problem.
With using the two flags, I get:
-- init.sql
CREATE TABLE `pages` (
`title` varchar(75) NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `pages` (`title`) VALUES
('SÃ¥ fungerar det');
> select title from pages limit 1;
+--------------------+
| title |
+--------------------+
| SÃ¥ fungerar det |
+--------------------+
1 row in set (0.00 sec)
I believe the problem is not the default database character encoding setting, but as mentioned above, that the moment of initial import which happens from the shell towards the database imports the database with the wrong character encoding. As one can see I already set specifically for the table the utf8 character set.
I would like to rather see the same change as for Postgres: https://github.com/docker-library/postgres/blob/8e867c8ba0fc8fd347e43ae53ddeba8e67242a53/9.5/Dockerfile#L23
@djui - nice simple test, though you have an extra comma after NULL,.
On Ubuntu 16.04 running his test I get
select title from pages;
+----------------+
| title |
+----------------+
| S fungerar det |
+----------------+
The issue may be with the terminal in the docker container and not in MySQL directly because I don't seem to have problems if I source a sql file that contains non-ascii characters. It's not a docker issue because non-ascii characters work out of the box with the Postgres docker container. It's just something about how the MySQL docker container is (was?) set up.
Correct, the issue is not MySQL, but the moment the terminal (shell) imports the SQL statements. One simple proof (in this case for MariaDB, but same for MySQL):
# docker-compose.yml
:
db:
#image: mariadb:latest
build:
context: .
dockerfile: Dockerfile.mariadb
ports:
- '3306:3306'
environment:
- MYSQL_ROOT_PASSWORD=secret
- MYSQL_DATABASE=foo
- MYSQL_USER=bar
- MYSQL_PASSWORD=baz
volumes:
- data:/var/lib/mysql
- ./init.sql.gz:/docker-entrypoint-initdb.d/init.sql.gz
:
# Dockerfile.mariadb
FROM mariadb:latest
# make the "en_US.UTF-8" locale so ~postgres~ mysql/mariadb will be utf-8 enabled by default
RUN apt-get update && apt-get install -y locales && rm -rf /var/lib/apt/lists/* $
&& localedef -i en_US -c -f UTF-8 -A /usr/share/locale/locale.alias en_US.UTF-8
ENV LANG en_US.utf8
-- init.sql
CREATE TABLE `pages` (
`title` varchar(75) NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `pages` (`title`) VALUES
('SÃ¥ fungerar det');
MariaDB [foo]> select title from pages limit 1;
+------------------+
| title |
+------------------+
| SÃ¥ fungerar det |
+------------------+
1 row in set (0.00 sec)
Can you try again adding -e LANG=C.UTF-8 to your docker run line and see if that's sufficient?
I tried -e LANG=en_US.utf8 which didn't work. -e LANG=C.UTF-8 seems to work. That would be a good enough fix! Thanks.
@tianon -e LANG=C.UTF-8 worked for me from the docker run command line! Yay! Thank you!
Here comes the no-brainer one liner fix for dummies like me. Place it to your Dockerfile.
ENV LANG=C.UTF-8
Closing, since this is solved! :+1:
Can't we have this mostly harmless setting built into the official image? I don't believe it will impact existing scripts since UTF-8 is ASCII compatible.
I will do a pull request during the New Years holiday if nobody beats me to it. I have to still build it and test it.
I guess it depends on what the impact of setting LANG would be on the default behavior of mysqld (since this issue is only regarding the client, which is going to apply regardless of which image is used for the client-side, and will be unaffected by the daemon itself including LANG).
If LANG does have a negative impact on mysqld (or even just changes the default behavior) then we might need to instead just document that the mysql client relies on LANG and that C.UTF-8 is a good value that's available/working out-of-the-box for folks who need UTF-8.
Alternatively, make the change that I noted at the top and have no potential side effects beyond the mysql command. Hmm.
As I understand setting LANG to C.UTF-8 makes mysql client to use utf-8 by default. But I think that better option would be set up this option directly in the mysql config file alongside with the server options. Like this:
; /etc/mysql/conf.d/utf8.cnf
[mysqld]
init_connect=‘SET collation_connection = utf8_unicode_ci’
character-set-server = utf8
collation-server = utf8_unicode_ci
[client]
default-character-set = utf8
client -> default-character-set = utf8 makes the same thing but it is more obvious and transparent then relaying on the LANG env.
Just in case someone is interested in writing a proper docker-compose.yml that supports Unicode in a MySQL-compatible database, they may use something like
services:
db:
image: mariadb:10.4.8
command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
environment:
LANG: C.UTF-8
MYSQL_ROOT_PASSWORD: root
volumes:
- ./db:/docker-entrypoint-initdb.d:ro
Setting LANG=C.UTF-8 causes the mysql client to assume the utf8 encoding. This does not support characters outside the BMP, for which the utf8mb4 encoding must be used.
The easiest way I have found to support non BMP characters in the sql files in /docker-entrypoint-initdb.d/ is to prefix the files with SET NAMES utf8mb4;. There is then no need to alter any configuration files, or set any environment variables.
For anyone lost in this thread having problems with cyrillic chars or special chars being changed to special non sense characters at init, using docker-compose:
The only thing you need to fix it is :
environment:
LANG: C.UTF-8
In your docker-compose.yml
This fixed my problem with init.sql in /docker-entrypoint-initdb.d with cyrillic texts
I follow your discuss , but mysql prints out like this.
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
It doesn't work for database and server. So I add a configuaration in docker-compose.yml as the following.
command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
The whole yaml file is:
version: "3"
services:
mysql:
container_name: mysql
image: mysql:5.7
environment:
MYSQL_ROOT_PASSWORD: root
LANG: C.UTF-8
volumes:
- ./mysql:/var/lib/mysql
ports:
- "3306:3306"
command: --character-set-server=utf8mb4 --collation-server=utf8mb4_unicode_ci
Here is the final output.
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
In case you also want to autoimport .sql file (for example ./db-import/db.sql) into your fresh database then all you need is another volume mounted in /docker-entrypoint-initdb.d directory.
From the docs:
When a container is started for the first time, a new database with the specified name will be created and initialized with the provided configuration variables. Furthermore, it will execute files with extensions .sh, .sql and .sql.gz that are found in /docker-entrypoint-initdb.d. Files will be executed in alphabetical order. You can easily populate your mysql services by mounting a SQL dump into that directory and provide custom images with contributed data. SQL files will be imported by default to the database specified by the MYSQL_DATABASE variable.
If file is big with very long values to handle then the import may fail with _'MySql server has gone away'_ error. To avoid this just set bigger max-allowed-packet parameter:
version: "3"
services:
mysql:
container_name: mysql
image: mysql:5.7
environment:
MYSQL_ROOT_PASSWORD: root
LANG: C.UTF-8
volumes:
- ./mysql:/var/lib/mysql
- ./db-import:/docker-entrypoint-initdb.d
ports:
- "3306:3306"
command: [ "--character-set-server=utf8mb4", "--collation-server=utf8mb4_unicode_ci", "--max-allowed-packet=67108864" ]
I tried
-e LANG=en_US.utf8which didn't work.-e LANG=C.UTF-8seems to work. That would be a good enough fix! Thanks.
Solved my problem. thank you.
Most helpful comment
I tried
-e LANG=en_US.utf8which didn't work.-e LANG=C.UTF-8seems to work. That would be a good enough fix! Thanks.