So I've set up "standard" install (debian9, packages from packages.icinga.org, mariadb) and it seems impossible to get UTF8 working:
icingaweb2 breaks on some collate errors when filtering by custom attributes ( Icinga/icingaweb2#2508 ). Removing every occurence of COLLATE from icingaweb2 code seems to fix it (find . -type f -exec perl -pi -e 's/COLLATE latin1_general_ci//g' {} \;) with no side effect that I've found (after i dropped latin1 from database)
DB is created in utf8 with collation in utf8_general_ci yet a bunch of fields have its character set or collation switched to latin1. but that is only for mysql. Is that some legacy code ?
But then switching to postgresql "magically" fixes every problem. I remember that i had some "fun" in the past with mysql's utf8 but as far as I know most installs default to "utf8 everything everywhere" and as long as clients use utf8 everything is fine.
Is there still a need for that collate/encoding circus ? IMO just saying "you have to use mysql configured to utf8" and setting everything to utf8 (with maybe giving user option to select the collation for their language at db creation) would be fine,
regardless of that I think there should be a warning in docs if ido-mysql have some problem preventing utf8 from working correctly and just recommend pgsql in that cases.
Side effects you will recognise are object names. Consider two services:
PINGpingThe case sensitive latin1 collation ensures that a select query on config dump will find 2 result rows, one for each service. If you remove that, you'll highly likely get just one object, and wrong configuration and status updates.
For reference: https://github.com/Icinga/icinga-core/commit/bb1ff18f489fc2495d84cc52db2028fa7efcbaed
That's a problem inherited from the 1.x DB IDO schema where this issue is all about. I don't see any possibility with a reasonable amount of time and resources to add changes here. Basically, the IDO 1.x schema is a dead end and I personally have given up on it. Only a new backend may ensure full utf8 support.
@dnsmichi: don't worry, there are case sensitive utf8 collations. As more and more distributions switch(ed) to newer MySQL/MariaDB versions with utf8 or utf8mb4 as their default encoding, this problem will get worse and worse unless we specify a specific encoding in our schema and on for our clients (read: IDO) at connection time.
The case sensitive latin1 collation
latin1_general_ci is case i*ndepent, Vast majority of schema uses, or defaults to that. Only parts that are not AFAIK are:
cat schema/mysql.sql |grep latin1_general_cs
varname varchar(255) character set latin1 collate latin1_general_cs default NULL,
varname varchar(255) character set latin1 collate latin1_general_cs default NULL,
display_name varchar(255) character set latin1 collate latin1_general_cs default '',
name1 varchar(128) character set latin1 collate latin1_general_cs default '',
name2 varchar(128) character set latin1 collate latin1_general_cs default NULL,
program_version varchar(64) character set latin1 collate latin1_general_cs default NULL,
endpoint_name varchar(255) character set latin1 collate latin1_general_cs default NULL,
display_name varchar(255) character set latin1 collate latin1_general_cs default '',
from schema. And none in icingaweb2 code (there are a lot of unnecessary collations to _ci tho). And _ci collation is default so I don't think they actually do anything most of the time (except few that used display name). Kinda looks it was coded either "just in case" or " old mysql did something weird and required it", but as it is now it is not really useful. There is also utf8_bin collate for those case that it is really needed to have case-dependent group-by,
I wanted to use mysql backend because mysql have really nice and easy to configure master/master replication which makes making whole setup HA much easier but it seems there is no way to make it work with UTF8 ;/
If you remove that, you'll highly likely get just one object, and wrong configuration and status updates.
aren't services identifies by ID in DB tho ? only obvious one seems to be having to group by name on display_name but honestly so far only time when I had "same word, different case" there was "someone made a typo then fixed it", I don't think there is very many valid cases of having "ping", "Ping" and "PING" service, all on same host
@XANi: drop your schema, re-create it while using a latin1 client collation and you should be fine. Still, homework has to be done at our side - as far too many people installing Icinga on a recent system fall into this issue.
@dnsmichi: solving this would mean either hard-coding latin1 to the schema files or to become collation-aware.
@XANi: even with the schema created as latin1, Icinga 2 would store utf8-encoded data to it and completely ignore the collation. Icinga Web 2 is aware of using a latin1 connection BUT also expects to get utf8 characters over it - as that's what the core always shipped. All historic attempts to get encoding rules specified failed, sorry. Won't happen with our upcoming IDO successor.
@XANi: drop your schema, re-create it while using a latin1 client collation and you should be fine. Still, homework has to be done at our side - as far too many people installing Icinga on a recent system fall into this issue.
At least warning should be provided, "if you need utf8 support easiest way to do it is to use postgresql" as just that will probably save people some hours of debugging.
The default encoding in mysql should probably be utf8m4 (because mysql half-assed utf8 support, and someone someday will probably also complain that putting emoji crap in checks/commets doesn't work...) + utf8mb4_unicode_ci (because it actually tries to do decent language-aware sorting as opposed to *_general_ci, altho I guess it depends if UI actually needs sorted data). But then it has to be recent enough version of server to actually have that features and that can be a mess too ;/
@XANi: no need to use PostgreSQL. Use MySQL or MariaDB, but let the schema be latin1. Icinga 2 does not care and will happily store UTF8 data nonetheless. IcingaDB will (once available) happily deal with all that emoji crap, as it already uses utf8mb4 everywhere.
aren't services identifies by ID in DB tho ?
No, the IDs are only available inside the database table icinga_objects. Each time the configuration is dumped/updated, the ID cache is filled with basically this query:
For services (hosts put name2=NULL)
select object_id from icinga_objects where name1='hostname' and name2='servicename'
If you would change cs to ci (which is the default), you cannot deterministically fetch the unique object id by the name tuple.
That's a historical one, and attempts to solve this i.e. with using BINARY selects heavily failed. The main point is - if schema changes happen, this must be taken into account as application dependency.
So is there correct way how to properly display utf8 characters from Icinga Director utf8 database in Icinga2 latin1 database? I have user names with accented characters which are not displayed correctly in Icingaweb2 despite they are diplayed correcty both in Icinga director database and Icinga Director Web interface.
This won't be fixed for the IDO schema. Just to provide an update, closing this.
I found a workaround for this issue, and because this was one of three hits while googling the error message, I'm going to post it here.
In my case, this is what happened:
The solution was:
After performing those steps, clicking the History tab on a service showed the history I was expecting. However, I recommend exercising caution before using this workaround for the following reasons:
It was the result of a lucky guess, and I don't claim to have a full understanding of the effect it will have on mysql or icinga2. It may have resolved that error while causing other problems that are not as evident. Because I'm just running icinga2 for fun, this was not a problem for me, but it may be a problem for people who depend on icinga2 for something serious.
Setting this variable affects all clients connecting to mysql. In my case, there were no other applications using the same database server to worry about, but people running multiple applications may want to think about how this configuration would affect them.
There may be better workarounds that I did not try, such as downgrading mysql or doing a fresh install on Ubuntu 19.10.
Unicode is great, so this is kind of throwing the baby out with the bath water.
@erpo41 I just ended up using PostgreSQL. Kinda shame as the original reason for using MySQL for me was wanting to use Galera replication but oh well, I'd rather not have hacks like treating utf strings as binary, that just makes everything else more complicated
Most helpful comment
So is there correct way how to properly display utf8 characters from Icinga Director utf8 database in Icinga2 latin1 database? I have user names with accented characters which are not displayed correctly in Icingaweb2 despite they are diplayed correcty both in Icinga director database and Icinga Director Web interface.