When updating from version 2.10 to 2.11, my config broke and I got the message "Command endpoint must not be set". It worked before, but to get Icinga started I had to remove all command_endpoint config entries.
In the end I looked at how the error message got merged into the project and discovered that it actually tests for the Zone being set (which I forgot).
I would like to make the error message more helpful and maybe hint at the problem I got. I can make my first pull request in this project if that helps.
The documentation could also be adapted to hint in this direction. Or, if you'd rather not change anything, I would just write a short post on my blog.
Hi there,
thanks for creating this issue @kayssun. We experienced the same thing today after upgrading our Icinga2 installation to 2.11.
After fixing two or three specific Service objects by adding zone = host_name, we found Icinga2 would also complain about all apply Service objects our monitoring infrastructure is largely based upon.
To recover from that situation, we downloaded and installed all relevant packages for Icinga2 2.10.4 from https://packages.icinga.com/debian/pool/main/i/icinga2/ for Debian stretch and put the upgrade process on hold by invoking apt-mark hold icinga2 icinga2-doc.
Thanks in advance for looking into this, we will be happy to receive further guidelines about how we could make the upgrade work for us or what we might have missed with adjusting the configuration accordingly.
With kind regards,
Andreas.
This is the error message we are receiving from service icinga2 checkconfig when running our current configuration on Icinga2 2.11.0:
[2019-09-19 13:07:52 +0200] critical/config: Error: Validation failed for object 'nirvana.example.de!Package updates - detailed' of type 'Service'; Attribute 'command_endpoint': Command endpoint must not be set.
Location: in /etc/icinga2/organizations.d/acme/hosts/nirvana.example.de.conf: 72:1-72:43
/etc/icinga2/organizations.d/acme/hosts/nirvana.example.de.conf(70): */
/etc/icinga2/organizations.d/acme/hosts/nirvana.example.de.conf(71):
/etc/icinga2/organizations.d/acme/hosts/nirvana.example.de.conf(72): object Service "Package updates - detailed" {
The code portions are located in Checkable::OnAllConfigLoaded() in lib/icinga/checkable.cpp.
From a configuration perspective, your configuration is located in conf.d rather than zones.d and you are using command_endpoint. You can see that with object list, where the zone attribute is empty but command_endpoint is set. That's a scenario which "may work" but generally remains unsupported. In order to harden this, the config validation now throws errors. cc @Al2Klimov
The most simple solution is to use the master zone (naming depends on your setup), and put everything into zones.d/master instead of conf.d.
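To get a rough overview of which files are affected, you could search the directories outside zones.d for command_endpoint. This is only a minimal sketch: the function name is made up for illustration, it assumes GNU grep, and a plain text search will also match commented-out lines.

```shell
# Hypothetical helper: list *.conf files below a directory that mention
# command_endpoint. Plain text search, so commented-out lines match too.
list_command_endpoint_files() {
    dir="$1"
    grep -rl --include='*.conf' 'command_endpoint' "$dir" | sort
}
```

For example, `list_command_endpoint_files /etc/icinga2/conf.d` would print the files that need to move into a zone (or get a zone assigned) before the upgrade.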
Running into the same issue. Why does it suddenly matter where the config is located? We just have include_recursive "conf.d" in icinga2.conf; zones.d is unused.
It always mattered. We had seen bugs and issue reports where hosts/services with command_endpoint were not checked. Some were related to the API feature, others to object authority, others were simply hard to reproduce.
Since putting the host/service into a zone is the only supported way of using command_endpoint with regard to the zone trust hierarchy, the config check now raises an error whenever it detects that a host/service is not in a zone.
Interesting to see that so many users have been using this, though. I thought zones.d was commonly used by those following the distributed monitoring docs and scenarios.
I agree on the fact that the log message needs improvements, but the configuration is still wrong and should throw an error.
@dnsmichi we don't use any sync mechanics and just generate the correct config in place, maybe that is why we never had issues with endpoints.
If the service object exists on the master, but the check command is supposed to run on the slave, should the service object be part of the master zone or of the slave/endpoint's zone?
Take the "master with agents" scenario as example: https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/#master-with-agents
There you'll create the configuration bits inside /etc/icinga2/zones.d/master, similar to what you would do with conf.d or any other custom include of yours.
vim /etc/icinga2/zones.conf
object Endpoint "icinga2-master1.localdomain" {
}
object Zone "master" {
  endpoints = [ "icinga2-master1.localdomain" ]
}
The above should exist in your setup already.
Then you manage your hosts and services in the master zone, with the checks executed on the agent via command_endpoint.
mkdir -p /etc/icinga2/zones.d/master
cd /etc/icinga2/zones.d/master
vim icinga2-agent1.localdomain.conf
object Endpoint "icinga2-agent1.localdomain" {
  host = "..."
}
object Zone "icinga2-agent1.localdomain" {
  endpoints = [ "icinga2-agent1.localdomain" ]
}
object Host "icinga2-agent1.localdomain" {
  check_command = "hostalive"
  address = "..."
  vars.agent_endpoint = name
}
Depending on the service apply rules, put them also into the master zone if only needed there.
vim disk.conf
apply Service "disk" {
  check_command = "disk"
  command_endpoint = host.vars.agent_endpoint
  assign where host.vars.agent_endpoint
}
That being said, re-organize your main entry point from "deployed.config.objects.d" or whatever you use to zones.d/master. Likely just a single line of code to change in your deployment script.
Cheers,
Michael
OK, now I am confused. Until 2.11 we added hosts to the zones.conf:
object Endpoint "icinga2" {
}
object Zone "master" {
  endpoints = [ "icinga2" ]
}
object Endpoint "MyClient.domain" {
  host = "MyClient.domain"
}
object Zone "MyClient.domain" {
  parent = "master"
  endpoints = [ "MyClient.domain" ]
}
Then we added the host to conf.d/hosts.conf:
object Host "MyClient.domain" {
  import "generic-host"
  address = "MyClient.domain"
  vars.os = "Windows Server"
  vars.os_detail = "Windows Server 2012 R2 Standard"
  vars.remote_client = "MyClient.domain"
  vars.windows_disk = "1"
}
And had the service in conf.d/services.conf:
apply Service "WindowsDisk" {
  import "generic-service"
  check_command = "disk-windows"
  command_endpoint = host.vars.remote_client
  assign where host.vars.windows_disk == "1"
  check_interval = 5m
  vars.disk_win_warn = "10%"
  vars.disk_win_crit = "5%"
}
That worked perfectly until 2.11. Now it throws the error "Command endpoint must not be set". I followed your advice to put the services into zones.d. Now I don't get an error but icinga2 doesn't apply the services to the hosts anymore. What are we doing wrong?
That worked perfectly until 2.11.
Trust me, I am happy that it worked in your environment. For others it did not work so well, hence the change.
Now I don't get an error but icinga2 doesn't apply the services to the hosts anymore. What are we doing wrong?
Please be as verbose as possible and include the actual error. Also, please show the file tree from zones.d and a config validation output.
icinga2 daemon -C -x notice
ls -lR /etc/icinga2/zones.d
I think I have successfully worked around it by putting zone = "master" into every service object.
The structure we currently use is the following:
zones.conf includes the master endpoint object and the master zone.
zones.d is empty.
conf.d contains several structurally different files, putting them into directories wouldn't be a bad idea:
domain.tld.conf - file containing a Host object, an ApiUser object for that host, a zone object, an endpoint definition, some apply rules for notifications and all the service objects for that host.
dumm.conf - plenty of "false" checkcommands that only exist so that icinga's config validation won't fail. For every service that calls a checkcommand that doesn't exist/has no place on the host we generate a dummy false checkcommand.
From what I gather you suggest moving the domain.tld.conf file to zones.d/master/. We could do that, but I don't really see the difference. It feels wrong to have service object definitions in zones.d.
From what I gather you suggest moving the domain.tld.conf file to zones.d/master/. We could do that, but I don't really see the difference. It feels wrong to have service object definitions in zones.d.
The latter is the only suggested and supported way. Putting a host and service object into a zone is totally legit.
Setting zone as an attribute is an ugly hack in this regard and complicates debugging further. Either way, both approaches work - pick what fits you best.
PS: Next time, please test the RC. It is always better to discuss such things before a release, without the stress of having to fix things in a hurry.
Just to understand this correctly... having the object definition in zones.d/ZONENAME/ has the same effect as adding zone=ZONENAME in the object definition, thus the directory path is more or less part of the configuration?
I always thought that after apply rules are applied and templates are instantiated/multiplied by them, all the config objects would end up in one big internal pool. I didn't expect icinga2 to consider path segments of the config files as part of the config. Isn't there even a way to generate config objects at run time through the API? How does that work with regard to the importance of the config location?
Wasn't even aware that there was an RC to test. We are currently considering a separate test environment for Icinga, once we have that we can certainly test RC.
OK, now I have set up the structure as you suggested, @dnsmichi. I moved the agent zone definitions to /etc/icinga2/zones.d/master/zones.conf and left the master definition in /etc/icinga2/zones.conf. I moved the service definitions to /etc/icinga2/zones.d/master. Now the Endpoints are not found:
icinga2 daemon -C -x notice
[2019-09-20 15:03:10 +0200] critical/config: Error: Validation failed for object 'MyClient!LinuxLoad' of type 'Service'; Attribute 'command_endpoint': Object 'MyClient.domain' of type 'Endpoint' does not exist.
Location: in /etc/icinga2/zones.d/master/services/LinuxLoad.conf: 4:9-4:50
/etc/icinga2/zones.d/master/services/LinuxLoad.conf(2): import "generic-service"
/etc/icinga2/zones.d/master/services/LinuxLoad.conf(3): check_command = "load"
/etc/icinga2/zones.d/master/services/LinuxLoad.conf(4): command_endpoint = host.vars.remote_client
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
/etc/icinga2/zones.d/master/services/LinuxLoad.conf(5): assign where host.vars.linux_load == "1"
ls -lR /etc/icinga2/zones.d
/etc/icinga2/zones.d:
total 12
drwxrwxr-x 2 root icingaconfig 4096 Nov 9 2018 global-templates
drwxr-xr-x 3 root root 4096 Sep 20 15:09 master
-rw-rw-r-- 1 root icingaconfig 133 May 19 2016 README
/etc/icinga2/zones.d/global-templates:
total 4
-rw-rw-r-- 1 root icingaconfig 2491 Nov 9 2018 customcommands.conf
/etc/icinga2/zones.d/master:
total 12
drwxr-xr-x 2 root root 4096 Sep 20 14:54 services
-rw-r----- 1 root root 5756 Sep 20 14:58 zones.conf
/etc/icinga2/zones.d/master/services:
total 80
-rw-rw-r-- 1 nagios nagios 397 Sep 20 14:08 apt-remote.conf
-rw-rw-r-- 1 root root 210 Nov 9 2018 check_ifconfig.conf
-rw-rw-r-- 1 root root 148 Nov 9 2018 check_linux_memory.conf
-rw-rw-r-- 1 root root 662 Nov 9 2018 check_linux_traffic.conf
-rw-rw-r-- 1 root root 283 Nov 9 2018 dns2.conf
-rw-rw-r-- 1 root root 350 Nov 9 2018 dns.conf
-rw-rw-r-- 1 root root 289 Nov 9 2018 ge2_ppp_VDSL1_Heise.conf
-rw-rw-r-- 1 root root 263 Feb 12 2019 ge3_ppp_VDSL2_Golem.conf
-rw-rw-r-- 1 root root 685 Nov 9 2018 HW_Temp.conf.old
-rw-rw-r-- 1 root root 349 Nov 9 2018 LinuxDisk.conf
-rw-rw-r-- 1 root root 200 Nov 9 2018 LinuxLoad.conf
-rw-rw-r-- 1 root root 243 Nov 9 2018 load-windows.conf
-rw-rw-r-- 1 root root 181 Nov 9 2018 memory-windows.conf
-rw-rw-r-- 1 root root 188 Sep 20 08:26 network.conf
-rw-rw-r-- 1 root root 126 Nov 9 2018 Ping_Ge_PPP.conf
-rw-rw-r-- 1 root root 354 Nov 9 2018 snmp.conf
-rw-rw-r-- 1 root root 245 Nov 9 2018 ssl_cert.conf
-rw-rw-r-- 1 root root 542 Jul 19 15:16 WindowsDisk.conf
-rw-rw-r-- 1 root root 261 Jul 17 13:04 WindowsDisk.conf.old
-rw-rw-r-- 1 root root 248 Nov 9 2018 win_services.conf
The whole zones.conf, zones.d and conf.d is quite unintuitive, I don't know which part I don't get here...
Hi,
I am trying my best to answer your questions, but handling three different questions in one thread is really hard. May I suggest continuing over at https://community.icinga.com, each with their own topic?
@mgeerdsen
Just to understand this correctly... having the object definition in zones.d/ZONENAME/ has the same effect as adding zone=ZONENAME in the object definition, thus the directory path is more or less part of the configuration?
The whole config sync with zones.d was designed with the problem in mind that users would otherwise need to define the zone attribute manually. This really is cumbersome and complicates the configuration, unless you generate it.
When you put configuration objects into zones.d/<zonename> and define <zonename> in zones.conf, the config compiler knows to include this directory automatically.
When the object is constructed, the compiler knows about the origin and automatically adds the zone attribute in the background.
This is convenient and also allows the configuration files to be synced to the secondary master, a satellite zone below, and so on.
You can read more about this in the technical concepts chapter: https://icinga.com/docs/icinga2/latest/doc/19-technical-concepts/#config-sync
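As an illustration of the directory-to-zone mapping described above, here is a small sketch that derives the zone name from a file path below zones.d. It only mimics the naming rule and is not how the config compiler is implemented; the function name and the default path are assumptions.

```shell
# Hypothetical helper: print the zone a file would belong to, based on the
# first directory below zones.d. Prints nothing for files outside zones.d
# or for files placed directly in zones.d.
zone_for_path() {
    file="$1"
    zones_dir="${2:-/etc/icinga2/zones.d}"
    case "$file" in
        "$zones_dir"/*/*)
            rest="${file#"$zones_dir"/}"   # strip the zones.d prefix
            printf '%s\n' "${rest%%/*}"    # keep the first path segment
            ;;
    esac
}
```

With this, `zone_for_path /etc/icinga2/zones.d/master/services/LinuxLoad.conf` prints `master`, while a file in conf.d yields no zone at all, which is the situation the new validation rejects when command_endpoint is set.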
@mphilipps
Isn't there even a way to generate config objects at run time through the API? How does that work with regard to the importance of the config location?
This one works in the way that you manually specify the zone attribute inside the PUT request for creating the object. The transaction is atomic, so the zone attribute has to be part of that request.
Additionally, there are config packages which drop files into Icinga via the REST API. This is what the Icinga Director uses; it also allows deploying zones.d/<zonename>, taking care of the zone membership as well as automated syncs through the cluster.
You can read more about this here: https://icinga.com/docs/icinga2/latest/doc/12-icinga2-api/
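A rough sketch of such a PUT request with an explicit zone attribute, using curl. The function names, the environment variables for the credentials, the host name and the localhost URL are placeholders, and -k skips certificate verification, which you would not want outside of a test setup.

```shell
# Build the JSON body for a new host; the zone is set explicitly because
# there is no zones.d directory to derive it from.
host_payload() {
    # $1 = address, $2 = zone name
    printf '{ "attrs": { "check_command": "hostalive", "address": "%s", "zone": "%s" } }\n' "$1" "$2"
}

# Create the host object at runtime via the REST API.
create_host() {
    # $1 = host object name, $2 = address, $3 = zone name
    curl -k -s -u "$ICINGA_API_USER:$ICINGA_API_PASSWORD" \
        -H 'Accept: application/json' \
        -X PUT "https://localhost:5665/v1/objects/hosts/$1" \
        -d "$(host_payload "$2" "$3")"
}
```

Calling `create_host example.localdomain 192.0.2.10 master` would then create the host in the master zone, assuming the API feature is enabled and the ApiUser has permission to create objects.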
Wasn't even aware that there was an RC to test. We are currently considering a separate test environment for Icinga, once we have that we can certainly test RC.
I recommend following our blog at icinga.com/blog or subscribing to the newsletter, where these announcements are published. Or follow us on social media (Twitter, FB) if you prefer.
In terms of staging environments, even a small Vagrant or Docker deployment could help uncover problems before going "live". Times have changed, and monitoring has become more than just some simple ping checks. Icinga is complicated, but also powerful.
@hypermagicmountain
Please show the full output of icinga2 daemon -C -x notice so we can see which config files are actually included and loaded by the config compiler.
Also, please include the definition of your Endpoint object from zones.d/master/zones.conf.
The whole zones.conf, zones.d and conf.d is quite unintuitive, I don't know which part I don't get here...
Meaning to say, you have never worked with it up until now? I've invested quite some time into writing good documentation about it, could you clarify which parts you don't fully understand?
https://icinga.com/docs/icinga2/latest/doc/06-distributed-monitoring/
Cheers,
Michael
@dnsmichi
Oh, your last request to post the full output of the daemon got me to read all of it :/. In the end, it was a permissions problem, I had to run chown root:icingaconfig on the new zones.conf. It's working now!
I've invested quite some time into writing good documentation about it, could you clarify which parts you don't fully understand?
The documentation is very detailed, no problems there. I think it's just the terminology and the concept of configuring "zones" when you just want to get some agents running in a single master/no satellites setup. I understand that there may be reasons in terms of scalability / HA Cluster setups.
Thanks for your help and great active community work!
Dear Michael,
thanks a bunch for your valuable suggestions. We moved our custom configuration completely into zones.d/master, invoked
rm -rf /var/lib/icinga2/api/zones/*
rm -rf /var/lib/icinga2/api/zones-stage/*
systemctl restart icinga2
on some hosts as outlined over at #7516 and added
deb http://ftp.debian.org/debian stretch-backports main
to the apt configuration on two other hosts in order to make the more recent libboost-1.67.0 available to the system.
While I am writing this, our monitoring system is recovering. The master host as well as all satellite hosts are now running Icinga 2.11.
Thanks again for your help and have a good weekend!
wow what a mess, our system was completely broken :(
Still need to understand what we did wrong and what the best solution is.
The most simple solution is to use the master zone (naming depends on your setup), and put everything into zones.d/master instead of conf.d.
Is this the proper solution? Just move the whole structure? Or will we have other problems? Just moved it, and it works on 2.10, but I'm afraid what will happen when going to 2.11.
A pity this wasn't better documented or at least some transition period where we had clear warnings.
A pity this wasn't better documented or at least some transition period where we had clear warnings.
We are doing everything to make the transition smooth, we even had a release candidate testing phase of 7 weeks.
https://icinga.com/2019/07/25/icinga-2-11-release-candidate/
https://github.com/Icinga/icinga2/issues/7380
Everyone was invited to test it and provide their feedback. This change is so obvious, but no-one reported it unfortunately.
I do understand the frustration level when things break, but I also kindly ask you to avoid wording like "what a mess" or "a pity". This hurts feelings and, given all the hard work we've invested into 2.11, no-one deserves blame here. 2.11 is the best release in many years and everyone involved did a great job 👍
Hello,
I understand and I'm sorry. I woke up in the middle of the night because some clients auto updated and were incompatible with the master server (checksum failure I think).
Today I need to celebrate a birthday, so not the best moment to wake up because of alarms and being pretty tired :)
Anyway we're for sure very thankful for all the great work on Icinga!!! All for free, wow, very nice!! :D
Thanks for your kind words. In terms of the automated updater, maybe you'll find a way to pin this to major versions? Like, the agent stays on 2.10.x but every major hop to 2.11 is prohibited? Depends on the tool used though.
Enjoy your birthday - life with family and friends is important 😘
no-one deserves a blame here
We second that.
2.11 is the best release in many years and everyone involved did a great job.
Kudos to @dnsmichi and all the people from the Icinga2 community and the colleagues from Netways. Congratulations for the new release, you know who you are.
Cheers,
Andreas.
@dnsmichi I think one thing that adds to the confusion is the README file in zones.d:
This directory contains configuration files for cluster zones. If you're not
running a cluster you can safely ignore this directory.
Most people wouldn't call a server/agent infrastructure a cluster I guess.
@hypermagicmountain
Oh, I wasn't aware that this still exists. Thanks for noticing.
Thank you, @dnsmichi
With these changes I guess this issue can be closed.
You're welcome. Please leave the issue open in this case, once the linked PR is merged it will automatically be closed :-)
Docs:
https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/#agent-hosts-with-command-endpoint-require-a-zone
https://icinga.com/docs/icinga2/latest/doc/15-troubleshooting/#agent-hosts-with-command-endpoint-require-a-zone
Why do the instructions on the official Icinga2 page on Puppet Forge have all target values set to the conf.d directory if that is not the correct way to do it? Now all my configuration is broken? =S
I concur with HOSTED-POWER that it would have been nice to have a release that warned that this was not correct, before a release that just stopped everything from working.