Icinga2: Restarting Icinga service causes lots of alerts (systemd)

Created on 3 Jan 2019 · 16 Comments · Source: Icinga/icinga2

Expected Behavior

Running systemctl restart icinga2 should not trigger any alerts.

Current Behavior

After restarting the service, many alerts appear on the dashboard. Every check that was running during the restart returns CRITICAL.

Possible Solution

Do not alert on

Steps to Reproduce (for bugs)

  1. systemctl restart icinga2
  2. see alerts

Screenshot

[image: screen]

Context


Your Environment

  • Version used (icinga2 --version): version: r2.10.2-1
  • Operating System and version: CentOS Linux release 7.6.1810 (Core)
  • Enabled features (icinga2 feature list): api checker command compatlog graphite ido-pgsql influxdb livestatus mainlog notification
  • Icinga Web 2 version and modules (System - About): 2.6.2
  • Config validation (icinga2 daemon -C):

[2019-01-03 11:23:34 +1100] information/cli: Icinga application loader (version: r2.10.2-1)
[2019-01-03 11:23:34 +1100] information/cli: Loading configuration file(s).
[2019-01-03 11:23:34 +1100] information/ConfigItem: Committing config item(s).
[2019-01-03 11:23:34 +1100] warning/ApiListener: Attribute 'key_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-01-03 11:23:34 +1100] warning/ApiListener: Attribute 'ca_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-01-03 11:23:34 +1100] warning/ApiListener: Attribute 'cert_path' for object 'api' of type 'ApiListener' is deprecated and should not be used.
[2019-01-03 11:23:34 +1100] warning/ApiListener: Please read the upgrading documentation for v2.8: https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/
[2019-01-03 11:23:34 +1100] information/ApiListener: My API identity: HOST_FQDN_REDACTED
[2019-01-03 11:23:35 +1100] warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /etc/icinga2/conf.d/notifications.conf: 11:1-11:45) for type 'Notification' does not match anywhere!
[2019-01-03 11:23:35 +1100] warning/ApplyRule: Apply rule 'mail-icingaadmin' (in /etc/icinga2/conf.d/notifications.conf: 23:1-23:48) for type 'Notification' does not match anywhere!
[2019-01-03 11:23:35 +1100] warning/ApplyRule: Apply rule 'xmpp_host' (in /etc/icinga2/conf.d/notifications_xmpp.conf: 29:1-29:38) for type 'Notification' does not match anywhere!
[2019-01-03 11:23:35 +1100] warning/ApplyRule: Apply rule 'backup-downtime' (in /etc/icinga2/conf.d/downtimes.conf: 5:1-5:52) for type 'ScheduledDowntime' does not match anywhere!
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 4937 Services.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 InfluxdbWriter.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 LivestatusListener.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 367 Hosts.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 FileLogger.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 10 NotificationCommands.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 4003 Notifications.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 NotificationComponent.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 46 HostGroups.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 ApiListener.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 Downtime.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 GraphiteWriter.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 15 Comments.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 4 Zones.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 3 Endpoints.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 ApiUser.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 CompatLogger.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 15 Users.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 226 CheckCommands.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 1 IdoPgsqlConnection.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 9 UserGroups.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 28 ServiceGroups.
[2019-01-03 11:23:35 +1100] information/ConfigItem: Instantiated 4 TimePeriods.
[2019-01-03 11:23:36 +1100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2019-01-03 11:23:36 +1100] information/cli: Finished validating the configuration file(s).

  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.

```
icinga2 object list --type Endpoint
Object 'external_host_fqdn' of type 'Endpoint':
% declared in '/etc/icinga2/zones.conf', lines 3:1-3:45

  • __name = "external_host_fqdn"
  • host = "10.10.1.34"
    % = modified in '/etc/icinga2/zones.conf', lines 4:3-4:21
  • log_duration = 86400
  • name = "external_host_fqdn"
  • package = "_etc"
  • port = "5665"
    % = modified in '/etc/icinga2/zones.conf', lines 5:3-5:13
  • source_location
    • first_column = 1
    • first_line = 3
    • last_column = 45
    • last_line = 3
    • path = "/etc/icinga2/zones.conf"
  • templates = [ "external_host_fqdn" ]
    % = modified in '/etc/icinga2/zones.conf', lines 3:1-3:45
  • type = "Endpoint"
  • zone = ""

Object 'primary_host_fqdn' of type 'Endpoint':
% declared in '/etc/icinga2/zones.conf', lines 13:1-13:48

  • __name = "master_host_fqdn"
  • host = "localhost"
    % = modified in '/etc/icinga2/zones.conf', lines 14:3-14:20
  • log_duration = 86400
  • name = "master_host_fqdn"
  • package = "_etc"
  • port = "5665"
    % = modified in '/etc/icinga2/zones.conf', lines 15:3-15:13
  • source_location
    • first_column = 1
    • first_line = 13
    • last_column = 48
    • last_line = 13
    • path = "/etc/icinga2/zones.conf"
  • templates = [ "primary_host_fqdn" ]
    % = modified in '/etc/icinga2/zones.conf', lines 13:1-13:48
  • type = "Endpoint"
  • zone = ""

Object 'internal_host_fqdn' of type 'Endpoint':
% declared in '/etc/icinga2/zones.conf', lines 8:1-8:50

  • __name = "internal_host_fqdn"
  • host = "10.10.1.11"
    % = modified in '/etc/icinga2/zones.conf', lines 9:3-9:23
  • log_duration = 86400
  • name = "internal_host_fqdn"
  • package = "_etc"
  • port = "5665"
    % = modified in '/etc/icinga2/zones.conf', lines 10:3-10:13
  • source_location
    • first_column = 1
    • first_line = 8
    • last_column = 50
    • last_line = 8
    • path = "/etc/icinga2/zones.conf"
  • templates = [ "internal_host_fqdn" ]
    % = modified in '/etc/icinga2/zones.conf', lines 8:1-8:50
  • type = "Endpoint"
  • zone = ""
```

area/checks bug

All 16 comments

That sounds familiar. Could you provide the log output around a restart? It would be interesting to see why Icinga 2 kills these checks.

@sammcj can you please provide that while I'm away?

I received a report from a colleague who can reproduce this, but the logs are sadly of no help. We currently suspect systemd to be the culprit: it may be killing the child processes before we can terminate them.

The solution would be to ignore check results while we are restarting or quitting, and to reschedule those checks to run once icinga2 is running again.

Possible fix: #6908

I'd even say it's the fix, as it prevents bad check results from happening in the first place instead of ignoring and rescheduling them.

Any update on this? I see #6908 was merged - did that fix it?

@sammcj Please could you test our snapshot packages and report the result?

Will do that tomorrow as it's 23:26 in Melbourne, Australia

@Al2Klimov FYI, @alexizmailov works with/near me. He's pointed out that while the fix has been merged, it hasn't been released yet. Once it has been released, we'll install it, test it, and update this ticket.

Thanks :)

That's a bit tricky, as we're waiting for your feedback to release this. So we'd appreciate it if you can test the snapshot package in your environment prior to any release :)

No, unfortunately the problem still persists. I tested 3 times:

[image: failure]

root@dev-alex-02:~  # rpm -aq | grep icinga2
icinga2-common-2.10.2.219.ge555b2f-0.2019.02.09+1.el7.icinga.x86_64
icinga2-bin-2.10.2.219.ge555b2f-0.2019.02.09+1.el7.icinga.x86_64
icinga2-2.10.2.219.ge555b2f-0.2019.02.09+1.el7.icinga.x86_64
icinga2-ido-pgsql-2.10.2.219.ge555b2f-0.2019.02.09+1.el7.icinga.x86_64

I'm afraid that fix is only half the battle. I just installed a fresh Icinga 2 on a fresh CentOS 7, and our icinga2.service doesn't specify any KillMode. The default is control-group, so systemd kills all check plugins.
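
To see which KillMode a given unit file leaves in effect, a small helper along these lines might do. It is only a sketch and not part of Icinga 2: `kill_mode_of` is a hypothetical name, it reads a single file and ignores drop-ins, and it falls back to control-group, which is systemd's documented default.

```shell
#!/bin/sh
# Print the KillMode set in a single unit file, falling back to systemd's
# documented default (control-group) when the file does not set one.
# Hypothetical helper: it ignores drop-in overrides.
kill_mode_of() {
    unit_file=$1
    # Use the last KillMode= assignment in the file, if there is one.
    mode=$(sed -n 's/^[[:space:]]*KillMode[[:space:]]*=[[:space:]]*//p' "$unit_file" | tail -n 1)
    echo "${mode:-control-group}"
}

# On a live host, systemd itself gives the authoritative answer:
#   systemctl show -p KillMode icinga2.service
```

With control-group, systemd signals every process in the unit's control group on stop, which would include check plugins started by the daemon.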

The mentioned fix also causes other problems, namely restart delays and missing Stop() calls. I will partially revert it and re-evaluate a possible fix for the kill problem.

@alexizmailov Please try to change /usr/lib/systemd/system/icinga2.service as shown here and run systemctl daemon-reload. Does this help?
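
The exact change that was linked is not reproduced in this thread. As a rough sketch of one way to stop systemd from signalling the check plugins directly, the helper below writes a drop-in that sets KillMode. The value `mixed` is an assumption chosen for illustration: with it, systemd sends SIGTERM only to the main process and later SIGKILLs whatever is left in the control group. `write_killmode_dropin` and `override.conf` are hypothetical names. A drop-in is used here instead of editing /usr/lib/systemd/system/icinga2.service because package upgrades can overwrite that file.

```shell
#!/bin/sh
# Write a systemd drop-in that overrides KillMode for a unit.
# ASSUMPTION: KillMode=mixed is an illustrative value; the thread does not
# quote the change that was actually suggested.
write_killmode_dropin() {
    dropin_dir=$1    # e.g. /etc/systemd/system/icinga2.service.d
    kill_mode=$2     # e.g. mixed
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nKillMode=%s\n' "$kill_mode" > "$dropin_dir/override.conf"
}

# Example (as root), followed by the reload mentioned in the thread:
#   write_killmode_dropin /etc/systemd/system/icinga2.service.d mixed
#   systemctl daemon-reload
#   systemctl restart icinga2
```

After the reload, `systemctl show -p KillMode icinga2.service` should report the overridden value.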

I will try this tomorrow because it's 22:45 now in Melbourne.

Looks like it works, I restarted it 4 times and there were no alerts at all.
