Icinga2: Status of UTF8 support and MYSQL

Created on 13 Jul 2017 · 13Comments · Source: Icinga/icinga2

So I've set up "standard" install (debian9, packages from packages.icinga.org, mariadb) and it seems impossible to get UTF8 working:

icingaweb2 breaks on some collate errors when filtering by custom attributes ( Icinga/icingaweb2#2508 ). Removing every occurence of COLLATE from icingaweb2 code seems to fix it (find . -type f -exec perl -pi -e 's/COLLATE latin1_general_ci//g' {} \;) with no side effect that I've found (after i dropped latin1 from database)
DB is created in utf8 with collation in utf8_general_ci yet a bunch of fields have its character set or collation switched to latin1. but that is only for mysql. Is that some legacy code ?
why is collation to latin1 even used ? that basically makes it wrong for anything but english/ascii
icinga2 seems to just insert data in wrong charset, everything utf8 is garbled, whether it is comment, service name or service check plugin output. My guess is that connector specifies wrong encoding as everything is 100% correct when using pgsql

But then switching to postgresql "magically" fixes every problem. I remember that i had some "fun" in the past with mysql's utf8 but as far as I know most installs default to "utf8 everything everywhere" and as long as clients use utf8 everything is fine.

Is there still a need for that collate/encoding circus ? IMO just saying "you have to use mysql configured to utf8" and setting everything to utf8 (with maybe giving user option to select the collation for their language at db creation) would be fine,

regardless of that I think there should be a warning in docs if ido-mysql have some problem preventing utf8 from working correctly and just recommend pgsql in that cases.

aredb-ido no-issue wontfix

Source

XANi

Most helpful comment

So is there correct way how to properly display utf8 characters from Icinga Director utf8 database in Icinga2 latin1 database? I have user names with accented characters which are not displayed correctly in Icingaweb2 despite they are diplayed correcty both in Icinga director database and Icinga Director Web interface.

majales on 25 Jul 2017

👍3

All 13 comments

3613 is related to this. @dnsmichi thoughts?

Crunsher on 14 Jul 2017

Side effects you will recognise are object names. Consider two services:

PING
ping

The case sensitive latin1 collation ensures that a select query on config dump will find 2 result rows, one for each service. If you remove that, you'll highly likely get just one object, and wrong configuration and status updates.

For reference: https://github.com/Icinga/icinga-core/commit/bb1ff18f489fc2495d84cc52db2028fa7efcbaed

That's a problem inherited from the 1.x DB IDO schema where this issue is all about. I don't see any possibility with a reasonable amount of time and resources to add changes here. Basically, the IDO 1.x schema is a dead end and I personally have given up on it. Only a new backend may ensure full utf8 support.

dnsmichi on 20 Jul 2017

@dnsmichi: don't worry, there are case sensitive utf8 collations. As more and more distributions switch(ed) to newer MySQL/MariaDB versions with utf8 or utf8mb4 as their default encoding, this problem will get worse and worse unless we specify a specific encoding in our schema and on for our clients (read: IDO) at connection time.

Thomas-Gelf on 20 Jul 2017

The case sensitive latin1 collation

latin1_general_ci is case i*ndepent, Vast majority of schema uses, or defaults to that. Only parts that are not AFAIK are:

cat  schema/mysql.sql |grep latin1_general_cs
  varname varchar(255) character set latin1 collate latin1_general_cs default NULL,
  varname varchar(255) character set latin1 collate latin1_general_cs default NULL,
  display_name varchar(255) character set latin1 collate latin1_general_cs  default '',
  name1 varchar(128) character set latin1 collate latin1_general_cs  default '',
  name2 varchar(128) character set latin1 collate latin1_general_cs default NULL,
  program_version varchar(64) character set latin1 collate latin1_general_cs default NULL,
  endpoint_name varchar(255) character set latin1 collate latin1_general_cs default NULL,
  display_name varchar(255) character set latin1 collate latin1_general_cs  default '',

from schema. And none in icingaweb2 code (there are a lot of unnecessary collations to _ci tho). And _ci collation is default so I don't think they actually do anything most of the time (except few that used display name). Kinda looks it was coded either "just in case" or " old mysql did something weird and required it", but as it is now it is not really useful. There is also utf8_bin collate for those case that it is really needed to have case-dependent group-by,

I wanted to use mysql backend because mysql have really nice and easy to configure master/master replication which makes making whole setup HA much easier but it seems there is no way to make it work with UTF8 ;/

If you remove that, you'll highly likely get just one object, and wrong configuration and status updates.

aren't services identifies by ID in DB tho ? only obvious one seems to be having to group by name on display_name but honestly so far only time when I had "same word, different case" there was "someone made a typo then fixed it", I don't think there is very many valid cases of having "ping", "Ping" and "PING" service, all on same host

XANi on 20 Jul 2017

@XANi: drop your schema, re-create it while using a latin1 client collation and you should be fine. Still, homework has to be done at our side - as far too many people installing Icinga on a recent system fall into this issue.

@dnsmichi: solving this would mean either hard-coding latin1 to the schema files or to become collation-aware.

Thomas-Gelf on 20 Jul 2017

@XANi: even with the schema created as latin1, Icinga 2 would store utf8-encoded data to it and completely ignore the collation. Icinga Web 2 is aware of using a latin1 connection BUT also expects to get utf8 characters over it - as that's what the core always shipped. All historic attempts to get encoding rules specified failed, sorry. Won't happen with our upcoming IDO successor.

Thomas-Gelf on 20 Jul 2017

@XANi: drop your schema, re-create it while using a latin1 client collation and you should be fine. Still, homework has to be done at our side - as far too many people installing Icinga on a recent system fall into this issue.

At least warning should be provided, "if you need utf8 support easiest way to do it is to use postgresql" as just that will probably save people some hours of debugging.

The default encoding in mysql should probably be utf8m4 (because mysql half-assed utf8 support, and someone someday will probably also complain that putting emoji crap in checks/commets doesn't work...) + utf8mb4_unicode_ci (because it actually tries to do decent language-aware sorting as opposed to *_general_ci, altho I guess it depends if UI actually needs sorted data). But then it has to be recent enough version of server to actually have that features and that can be a mess too ;/

XANi on 20 Jul 2017

@XANi: no need to use PostgreSQL. Use MySQL or MariaDB, but let the schema be latin1. Icinga 2 does not care and will happily store UTF8 data nonetheless. IcingaDB will (once available) happily deal with all that emoji crap, as it already uses utf8mb4 everywhere.

Thomas-Gelf on 20 Jul 2017

aren't services identifies by ID in DB tho ?

No, the IDs are only available inside the database table icinga_objects. Each time the configuration is dumped/updated, the ID cache is filled with basically this query:

For services (hosts put name2=NULL)

select object_id from icinga_objects where name1='hostname' and name2='servicename'

If you would change cs to ci (which is the default), you cannot deterministically fetch the unique object id by the name tuple.

That's a historical one, and attempts to solve this i.e. with using BINARY selects heavily failed. The main point is - if schema changes happen, this must be taken into account as application dependency.

dnsmichi on 20 Jul 2017

majales on 25 Jul 2017

👍3

This won't be fixed for the IDO schema. Just to provide an update, closing this.

dnsmichi on 3 Jul 2018

I found a workaround for this issue, and because this was one of three hits while googling the error message, I'm going to post it here.

In my case, this is what happened:

Icinga2 and icingaweb2 worked without issues immediately after installing them on Ubuntu 18.04 according to the official instructions.
Upgrading to Ubuntu 19.10 for unrelated reasons left icinga2 and icingaweb2 mostly functional.
As part of the upgrade, mysql 5.x was replaced with mysql 8.0.
After the upgrade, clicking a service in icingaweb2 and then clicking the History tab resulted in a very long error message starting with "General error: 1267 Illegal mix of collations (latin1_general_ci,EXPLICIT) and (utf8mb4_0900_ai_ci,COERCIBLE) for operation" instead of the history I was expecting.

The solution was:

Make a backup of /etc/mysql/mysql.conf.d/mysqld.cnf.
Open mysqld.cnf in an editor.
In the [mysqld] section, add the line character-set-server=latin1
Save the file and close the editor.
Run sudo systemctl restart mysql

After performing those steps, clicking the History tab on a service showed the history I was expecting. However, I recommend exercising caution before using this workaround for the following reasons:

It was the result of a lucky guess, and I don't claim to have a full understanding of the effect it will have on mysql or icinga2. It may have resolved that error while causing other problems that are not as evident. Because I'm just running icinga2 for fun, this was not a problem for me, but it may be a problem for people who depend on icinga2 for something serious.
Setting this variable affects all clients connecting to mysql. In my case, there were no other applications using the same database server to worry about, but people running multiple applications may want to think about how this configuration would affect them.
There may be better workarounds that I did not try, such as downgrading mysql or doing a fresh install on Ubuntu 19.10.
Unicode is great, so this is kind of throwing the baby out with the bath water.

erpo41 on 2 May 2020

@erpo41 I just ended up using PostgreSQL. Kinda shame as the original reason for using MySQL for me was wanting to use Galera replication but oh well, I'd rather not have hacks like treating utf strings as binary, that just makes everything else more complicated

XANi on 3 May 2020

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Icinga 2.11.0 RC1/Icingaweb 2.6.3: Icinga Web 2 does not properly detect Icinga 2

peteeckel · 4Comments

Acknowledgement comments with newlines break emails using default notify scripts.

BuildTheRobots · 6Comments

[dev.icinga.com #13215] Newly configured already running ScheduledDowntime not put into effect

icinga-migration · 6Comments

icinga2 node setup: Permission denied: "/etc/icinga2/zones.conf", "/etc/icinga2/zones.conf.orig"

Al2Klimov · 6Comments

Documentation about dictionaries and assignements

ghost · 4Comments