Icinga2: Problem with zones after upgrading to 2.11

Created on 20 Sep 2019 · 6 Comments · Source: Icinga/icinga2

Describe the bug

Hi, I have some problems after upgrading a 2.10 installation to 2.11.
I have one master, one satellite and a few agents; the configuration is top-down.

MASTER
The master defines the satellite endpoint and zone in the main zones.conf:

object Endpoint "srv06.foobar.local" { }
object Zone "srv06.foobar.local" {
    endpoints = [ "srv06.foobar.local" ]
    parent = "master"
}

The directory zones.d/srv06.foobar.local/ contains a hosts.conf file with the definition of the host srv06.foobar.local:

object Host "srv06.foobar.local" {
....

The same directory zones.d/srv06.foobar.local/ also contains a childs.conf file with the definitions of the agents' endpoints and zones:

object Endpoint "srv01.foobar.local" { }
object Zone "srv01.foobar.local" {
    endpoints = [ "srv01.foobar.local" ]
    parent = "srv06.foobar.local"
}

object Endpoint "srv03.foobar.local" { }
object Zone "srv03.foobar.local" {
    endpoints = [ "srv03.foobar.local" ]
    parent = "srv06.foobar.local"
}
...

Each agent then has its own directory, e.g. zones.d/srv01.foobar.local/, containing a hosts.conf file with the host definition:

object Host "srv01.foobar.local" {
...

After upgrading to Icinga 2.11, the configuration check emits some warnings:

[2019-09-20 08:42:16 +0200] information/cli: Icinga application loader (version: 2.11.0-1)
[2019-09-20 08:42:16 +0200] information/cli: Loading configuration file(s).
[2019-09-20 08:42:16 +0200] warning/config: Ignoring directory '/etc/icinga2/zones.d/srv01.foobar.local' for unknown zone 'srv01.foobar.local'.
[2019-09-20 08:42:16 +0200] warning/config: Ignoring directory '/etc/icinga2/zones.d/srv03.foobar.local' for unknown zone 'srv03.foobar.local'.

As the warnings imply, the directories containing the agents' host definitions are skipped, and I noticed that the agents failed the pre-reload config check.

I found this note in the "Upgrading to v2.11 - Config Sync" section:

Zone directories which are not configured in zones.conf, are not included anymore on secondary master/satellites/clients.

I then tried to move the endpoint and zone definitions on the master from zones.d/srv06.foobar.local/childs.conf to the main zones.conf.
The master now doesn't complain anymore (no warning about ignored directories), but these zones are not synced to the satellite anymore, as I can see in the satellite log:

[2019-09-20 08:44:55 +0200] warning/ApiListener: Ignoring config update for unknown zone 'srv01.foobar.local'.
[2019-09-20 08:44:55 +0200] warning/ApiListener: Ignoring config update for unknown zone 'srv03.foobar.local'.

Where should I put these zones in order for the master to correctly sync them to the satellite?

  • Version used (icinga2 --version): 2.11.0-1
  • Operating System and version: Amazon Linux AMI 2018.03
  • Enabled features (icinga2 feature list): api checker command ido-mysql mainlog notification
  • Icinga Web 2 version and modules (System - About): 2.7.1, monitoring 2.7.1
  • Config validation (icinga2 daemon -C): see above
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.

All 6 comments

Hi,

Thanks for all the details. I'm trying to follow, so let me sum up what I understand:

  • You're syncing the agent Endpoint/Zone objects to the satellite from zones.d/satellitezonename
  • Each agent has its dedicated zones.d directory with specific local configuration. This means the agent schedules its checks locally instead of using command_endpoint.
  • The master and satellite do not find the agent zone, and therefore refuse to include its directory.

It is a chicken-and-egg problem. Along with the cluster config sync stages, one more improvement was made: directories are no longer included automatically, only those for which Zone objects have been configured. Even this implementation required us to guess from config items (not objects) in advance, but tests have proven it reliable.

https://github.com/Icinga/icinga2/issues/6716

Your problem is that the agent Zone objects hide inside another zone. The config compiler only reads the satellite zone and all its content. At the stage where it decides which zone directories to include, it does not yet know about the config items/objects in there, so your agent Zone objects are excluded. They will never be synced.

Configuring and syncing Zone objects via the zones.d directory inside the cluster config sync was never supported nor intended in its design. As said, it is a chicken-and-egg problem. If someone says "fix this", I frankly and honestly have no idea how.

The easiest fix is to move the agent Zone objects out of zones.d into the zones.conf of both the masters and satellites. You can also do something like mkdir -p /etc/icinga2/agent.zones.d && echo 'include_recursive "agent.zones.d"' >> /etc/icinga2/icinga2.conf and put your agent config there.
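
As a minimal sketch of the first option, the agent Endpoint and Zone objects from childs.conf in the original report would move, unchanged, into zones.conf. This assumes the same fragment is added on the master and on the satellite srv06.foobar.local, next to the objects already defined there:

```
// /etc/icinga2/zones.conf (same additions on the master and on the satellite)
// Object names are taken from the report above; any attributes elided
// there are left out here as well.

object Endpoint "srv01.foobar.local" { }

object Zone "srv01.foobar.local" {
  endpoints = [ "srv01.foobar.local" ]
  parent = "srv06.foobar.local"   // the satellite zone remains the parent
}

object Endpoint "srv03.foobar.local" { }

object Zone "srv03.foobar.local" {
  endpoints = [ "srv03.foobar.local" ]
  parent = "srv06.foobar.local"
}
```

With the include_recursive variant, the same objects would presumably live in a file under /etc/icinga2/agent.zones.d/ instead (the file name is arbitrary). The per-agent directories such as zones.d/srv01.foobar.local/ stay where they are on the master.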

Disclaimer: That's only needed for agents which have their own zone for the cluster config sync. command_endpoint agents don't require this step.
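
For comparison, a command_endpoint setup might look roughly like the sketch below. The custom variable name vars.agent_endpoint, the address, and the choice of the "disk" check are illustrative assumptions and not part of the original setup. The idea is that these objects live in the satellite's zone directory, so the satellite schedules the checks and the agent only executes them:

```
// zones.d/srv06.foobar.local/hosts.conf on the master (hypothetical layout)

object Host "srv01.foobar.local" {
  check_command = "hostalive"
  address = "192.0.2.10"            // placeholder address, adjust as needed

  // Assumed convention: store the agent's endpoint name in a custom variable.
  // "name" resolves to the object name, here "srv01.foobar.local".
  vars.agent_endpoint = name
}

// Apply rule: run the check on the agent for every host that sets the variable.
apply Service "disk" {
  check_command = "disk"
  command_endpoint = host.vars.agent_endpoint

  assign where host.vars.agent_endpoint
}
```

In this layout the agent needs no zones.d directory of its own, which is why the step above does not apply to it.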

Cheers,
Michael

I supposed that my setup was somewhat non-standard and hackish, but until now I followed the golden rule "if-it-works-don't-touch-it" :)
I duplicated the agent Zone objects, adding them both in the master and in the satellite's zones.conf; after zapping the old contents of api/zones/ and restarting the service everything started working again.
For the future I'll investigate the usage of command_endpoint, but that will probably take some time/tests to create the specific apply rules for agents.
Thanks a lot for your help!

I have the same problem with icinga2 2.11. We use Director to configure the Zone and Endpoint objects for our master and satellite cluster, and I don't know how to fix this with Director.
So my only solution for now is downgrading to icinga2 2.10 as below:
yum downgrade icinga2-2.10.5-1.el7.icinga.x86_64 icinga2-bin-2.10.5-1.el7.icinga.x86_64 icinga2-common-2.10.5-1.el7.icinga.x86_64 icinga2-ido-mysql-2.10.5-1.el7.icinga.x86_64 -y

@ctrlaltca

I expected things to break with this change. Unfortunately, the "other" issue was more important, because it made troubleshooting tremendously hard with left-over zones and whatnot.

If someone comes up with a better solution, or algorithm, feel free to share. My design thoughts are illustrated in detail in #6716 which should help understand the root cause and motivation.

Thanks for your understanding, this helps with the always "bad" feedback after pushing a release.

@latuannetnam

Unfortunately, you are using an unsupported scenario. While it sounds "easy" to use the Director infrastructure tab and not care about zones.conf, this brings you into the chicken-and-egg problem again.

Here is some collected information on the matter: https://community.icinga.com/t/icinga-2-11-released/2255/2

Continue on the community forums please.

@ctrlaltca We're discussing how we can avoid such shortcomings; we simply were not aware that one could build it this way. With 2.11, we tried to be stricter in preparation for future features we plan to add.

One definite goal is to make Zone/Endpoint object handling easier. I'm not sure yet how, but rest assured we will be working on this in the future.

Meanwhile, we will discuss next week whether we can "fix the fix", but this is something of a Zone inception, and a really tough one. For now, I'd unfortunately suggest rolling back to 2.10.x, unless you have already adjusted and fixed your configuration and need the more stable cluster itself.

Have a nice weekend,
Michael
