Icinga2: /var/log/icinga2/icinga2.log is growing very fast on satellites

Created on 10 Aug 2018  路  26Comments  路  Source: Icinga/icinga2

We have a HA configuration with 2 master nodes, 2 satellites in the Network Zone and 2 satellites in the Server Zone. For configuration we use the director.

Since some time - we noticed it after upgrade to Version 2.9.1 - that icinga2.log ist growing and growing very fast from one point to another. But only on the satellites in the Network Zone. It seems it happend after deployment with the director.

For example: that are times when there was a deplolyment today:

deployment_director

That is the visualisation with grafana where you can see the growing of the log file:
grafana

and direct in the bash:
filesystem

I think the problem are the ServiceSets which wie generate with the director (see also https://github.com/Icinga/icingaweb2-module-director/issues/1589). Because it will deploy them at every node. So if the ServerSet is for example for the Server Zone, the satellite in the Network Zone won't find the devices to apply. And if I take a look into the icinga2.log at the time of the depolyment ist is full of Messages like this:
apply_rule

the only work arround is to empty the file with
_> icinga2.log_

Your Environment


鈥irector version (System - About): Git Master 71ad855
鈥cinga Web 2 version and modules (System - About): 2.6.1
鈥cinga 2 version (icinga2 --version): r2.9.1-1
鈥perating System and version: CentOS 7
鈥ebserver, PHP versions: Apache 2.4.6-80.el7, rh-php 7.1.8-1.el7

areconfiguration arelog bug reIP reNC

All 26 comments

Awesome, opening an issue for this has been on my agenda, too.
+1!

Same here:

Your Environment

  • Director version (System - About): 1.5.1
  • Icinga Web 2 version and modules (System - About): 2.6.1
  • Icinga 2 version (icinga2 --version): 2.9.1
  • Operating System and version: RHEL 7.5

Currently we are "handling" this problem by creating a dedicated check_disk service for the logging volume /var, which executes an EventCommand @ warning free space <15%. The EventCommand calls a shell script which executes: > icinga2.log

Rotating & zipping the log may be another approach to handle this, depending on the need to keep the logs.

More thoughts on this:

The commits are not exclusively triggered by Director's Deployments. I triggered 4 Deploys this morning:
image

The icinga2.log counts much more sets of Apply Rule mismatches (generally headed by the message 'Commiting config item(s)', so i grepped for it):

[root@satellite_z1n1 icinga2]# grep 'Committing config item' icinga2.log | wc -l
421

Each commit triggers 690 warnings of Apply Rule mismatches on one of our satellites in Zone A, which seems valid, since theses services need to be applied to hosts in a different Zone B. The log on a sattelite in Zone B looks vice versa: 294 warnings of Apply Rule mismatches, because these services need to be deployed at Zone A.

It may also be helpful to know the reason for the huge amount of commits.

@dbu1986 a script for deleting the log file would be a work-around, we thought also in our department. But it first it would be nice if this would be patched and like you wrote why this happend.

When apply rules are configured, and they do not match anywhere, the user should be made aware of them. This is a general principle with Icinga and the config compiler, and has been sitting there since 2.0. I'm not sure which change has had an impact on generating more of such non-matching apply rules.

In any case, I would analyse where these apply rules source from, and why they are deployed into the satellite node. Highly likely they do exist as global zone and are synced to any cluster node, or they are wrongly pushed to the satellite via cluster config sync and the corresponding zone.

Removing the configuration warning, or changing its verbosity level is not an option here, as this would remove a well known configuration validation feature used in many places.

@dnsmichi thanks for your answer.
I think the "problem" in our case is the director. Because he depolys the ServerSets-configuration into /var/lib/icinga2/api/zones/director-global/director/servicesets.conf on every node in the HA-Cluster.
But why the icinga produce in a couple of minutes so a big log file is the mystery. Maybe you can fix this in combination with the director. Like my Suggestion in the Issue here: https://github.com/Icinga/icingaweb2-module-director/issues/1589

I'm currently on vacation, just some quick thoughts on this:

  • when there are 4 Director-triggered deployments on a single day, but 420 restarts (if I correctly understood your numbers), then there is something weird going on
  • 690 warnings for every one of those 420 restarts may account for 30MB log entries, but never for 11 GB
  • the linked feature request for the Director is reasonable and can be implemented, it will however not solve your main issue here

Cheers,
Thomas

When apply rules are configured, and they do not match anywhere, the user should be made aware of them.

Yes, absolutely! I highly appreciate these messages and it's not my intention to get them suppressed.

The Servicesets are created via Director, and deployed via global zone "director-global". Here is an example:

[2018-08-22 13:16:40 +0200] warning/ApplyRule: Apply rule 'SERVICENAME' (in /var/lib/icinga2/api/zones/director-global/director/servicesets.conf: 13096:1-13096:58) for type 'Service' does not match anywhere!

This seems to be standard behaviour. Director currently does not offer an option to set the zone per Serviceset (https://github.com/Icinga/icingaweb2-module-director/issues/1589).

In my environment the global zone "director-global" is configured on both master nodes and all satellite nodes (2 satellites per zone). Aside of that I have a global zone "globalconfig" configured on all nodes (Masters, Satellites, and client hosts) for deploying Director's CheckCommands. The intention of using these two global zones was to avoid non-matching Apply Rule warnings at all client hosts.

However, I'm confident that the logs will get much smaller if we get the deploying of apply rules solved, I think that is a side effect. But I'm not sure if we have another issue here which causes the high rate of commits. Anyway there still would be a questionble high rate of "Committing config item(s)" mentions in the log.

We also have a problem of reload looping. Sorry for not mentioning earlier, it just came to my mind. At this moment the icinga2.log's look like this:

MasterA (configuration master):

...
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
...

MasterB:

...
[2018-08-22 14:14:21 +0200] information/ConfigItem: Committing config item(s).
[2018-08-22 14:14:21 +0200] information/ConfigItem: Instantiated 1 Comment.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Triggering Start signal for config items
[2018-08-22 14:14:21 +0200] information/ConfigItem: Activated all objects.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Committing config item(s).
[2018-08-22 14:14:21 +0200] information/ConfigItem: Instantiated 1 Comment.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Triggering Start signal for config items
[2018-08-22 14:14:21 +0200] information/ConfigItem: Activated all objects.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Committing config item(s).
[2018-08-22 14:14:21 +0200] information/ConfigItem: Instantiated 1 Comment.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Triggering Start signal for config items
[2018-08-22 14:14:21 +0200] information/ConfigItem: Activated all objects.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Committing config item(s).
[2018-08-22 14:14:21 +0200] information/ConfigItem: Instantiated 1 Comment.
[2018-08-22 14:14:21 +0200] information/ConfigItem: Triggering Start signal for config items
[2018-08-22 14:14:21 +0200] information/ConfigItem: Activated all objects.
...

satellite_z1n1 & satellite_z1n2:

same as on MasterB, but additionally with all the apply rules mismatch warnings. That may be the reason for their huge logs.

Any advice, how I can get this under control? I've been struggling for days trying to debug when this starts and when it ends. Currently I'm establishing another separate environment for testing our config, but for now I don't have any clue where this is coming from.

@Thomas-Gelf thanks for the answer. Maybe it is a mixture of more things? But I think it reduces the warnings in the logs, if the director deploy the servicesets in the correct zones. For example in our case if the serviceset will deploy it only for the server-zone and not for the network Zone and vice versa.

One information for you: in our configuration we import the hosts every 2 hours from our CMDB via CSV. If there are changes the directore will sync. Sometimes we have to do some manual configuration.

But why icinga produce so much warnings is the other point.

For now: enjoy your holidays ;-) you deserve it! You are doing a good job with the director!

@stevie-sy: glad to hear that, thank you! Just wanted to point you to that discrepancy in your numbers, you should try to dig deeper. Eventually you can track down a looping behavior similar to what @dbu1986 describes. Your log file is so huge, take an average log line length and do the math on your own. Regular deployments can hardly account for even 0.1% of that data.

I'm wondering why these messages are coming in from the satellite to the master. This should have been denied earlier.

[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.

https://github.com/Icinga/icinga2/blob/master/lib/remote/apilistener-configsync.cpp#L79

Please show the Endpoint and Zone configuration on your master, including all the satellites.

I want to give you a small update to this Issue:
After updating to the newest Version of Icinga (2.10.x) and the latest version of the director in the master branch we didn't got this behavior again. We also startet with a new and empty director database.

Maybe this [(https://github.com/Icinga/icinga2/pull/6663]) from the update 2.10.1 helped.

If @dbu1986 didn't get this Problem also again, maybe this could be closed.

Likely it helps a bit if you have many created objects and non-matching apply rules. It popped into my mind how to silence these logged lines, so I just did it.

And you did a good job (y).
if @Thomas-Gelf can also supress this with the director like my suggestion, that would also help I think. If it happen again than it need some more analysis and investigation like you wrote before.

The thing I'm still curious about is that log line

[2018-08-22 14:16:24 +0200] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.

This must never happen, and I want to know how to reproduce this. @dbu1986 Unfortunately the zones configuration doesn't tell me much, how's the configuration deployed and synced top down, what exactly causes this behaviour?

@dnsmichi

https://github.com/Icinga/icinga2/blob/master/lib/remote/apilistener-configsync.cpp#L79

I assume you are reffering to #L224 ? The log message might be related to the ConfigDeleteObjectAPIHandler, since here no further details on the discarded object are being written into the log (following origin->FromClient->GetIdentity() ).
Could you provide further details on what Icinga2 is attempting to do here?

, what exactly causes this behaviour?

I'd desperately love to reveal that, too 馃槃 Unfortunately I have no idea what triggers it and how to find a point to start from.

However, I'm not sure if I got your question well, but as mentioned above, we're using HA distributed monitoring infrastructure in top/down mode, containing 2 Master, 3 Zones with 2 Satellites each, 4 Zones with 1 Satellite each, and a load of clients per Zone. Most of the objects are managed by Director (~10.000 Services, ~800 Hosts), plus ~5.000 Services created as runtime objects via REST-API on the config master, plus <20 Services defined in a configfile on the config master.

Additionally we are using a ticket system, fed by Icinga2 Notifications (passed in a perl script via REST API), and as a return path the ticket system sends Acknowledgements and Comments back to Icinga2 asynchronously.

One thing coming up my mind is the discrepancy between the timestamps of these messages. Today's messages as an example:
MasterA:

[root@MasterA ~]# grep "Discarding.*from 'satellite_z1n1" /var/log/icinga2/icinga2.log | head -40
[2018-11-05 04:06:40 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:06:53 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:08:02 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:08:06 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:08:53 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:16 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:20 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:24 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:31 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:51 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:09:52 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:15:51 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:22:28 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:22:50 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:23:51 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:24:35 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:24:49 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:25:02 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:25:10 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:25:43 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:26:05 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:26:15 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:26:41 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:30:23 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:39:10 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:39:39 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:44:47 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:48:54 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:49:04 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:49:05 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:55:37 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:56:42 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 04:59:50 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:05:16 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:18:12 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:20:43 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:24:39 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:39:24 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:42:23 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[2018-11-05 05:55:07 +0100] warning/ApiListener: Discarding 'config update object' message from 'satellite_z1n1'.
[root@MasterA ~]#

satellite_z1n1:

[root@satellite_z1n1 ~]# grep "Sending config updates for endpoint 'MasterA" /var/log/icinga2/icinga2.log
[2018-11-05 08:53:44 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 08:54:05 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 08:58:30 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 08:58:48 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:01:16 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:01:35 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:14:51 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:15:10 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:22:30 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 09:22:52 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:06:06 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:06:23 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:29:28 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:37:54 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:37:54 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:45:04 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:48:34 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 10:51:28 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 11:22:48 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 11:23:05 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 11:29:26 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 11:29:45 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 12:20:54 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 12:21:12 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 12:26:17 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 12:26:36 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 12:55:54 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[2018-11-05 14:12:35 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.
[root@satellite_z1n1 ~]#

I somehow expected the message timestamps to nearly match to each other, but the 'discarding' messages at the master started around 04:06 and the 'sending config' messages at the satellite started much later, around 08:53. So there may be config updates on the satellite, that I am totally missing. Director's deployment log is listing today's first deployment at 07:52. And the icinga2 system processes are up and running since a couple of days, so I can also exclude a restart of Icinga2.

A few log lines from satellite_z1n1 before the first 'sending config' message to MasterA:

[2018-11-05 08:53:25 +0100] information/WorkQueue: #10 (JsonRpcConnection, #1) items: 1, rate: 111.45/s (6687/min 32724/5min 98056/15min);
[2018-11-05 08:53:26 +0100] information/ApiListener: New client connection for identity 'q4dee1syw4s' from [139.1.67.48]:44861 (no Endpoint object found for identity)
[2018-11-05 08:53:26 +0100] information/JsonRpcConnection: Received certificate request for CN 'q4dee1syw4s' signed by our CA.
[2018-11-05 08:53:26 +0100] information/JsonRpcConnection: The certificate for CN 'q4dee1syw4s' is valid and uptodate. Skipping automated renewal.
[2018-11-05 08:53:27 +0100] information/ApiListener: New client connection for identity 'linclutest01' from [139.1.67.126]:33682 (no Endpoint object found for identity)
[2018-11-05 08:53:31 +0100] information/ApiListener: New client connection for identity 'clientvm250509' from [139.1.66.123]:55057 (no Endpoint object found for identity)
[2018-11-05 08:53:38 +0100] information/WorkQueue: #6 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-11-05 08:53:40 +0100] information/ApiListener: New client connection for identity 'cldvm00004' from [172.30.14.11]:44500 (no Endpoint object found for identity)
[2018-11-05 08:53:40 +0100] information/JsonRpcConnection: Received certificate request for CN 'cldvm00004' signed by our CA.
[2018-11-05 08:53:40 +0100] information/JsonRpcConnection: The certificate for CN 'cldvm00004' is valid and uptodate. Skipping automated renewal.
[2018-11-05 08:53:40 +0100] warning/TlsStream: TLS stream was disconnected.
[2018-11-05 08:53:40 +0100] warning/JsonRpcConnection: API client disconnected for identity 'MasterA'
[2018-11-05 08:53:40 +0100] warning/ApiListener: Removing API client for endpoint 'MasterA'. 0 API clients left.
[2018-11-05 08:53:44 +0100] information/ApiListener: New client connection for identity 'MasterA' from [139.1.170.8]:39106
[2018-11-05 08:53:44 +0100] information/ApiListener: Requesting new certificate for this Icinga instance from endpoint 'MasterA'.
[2018-11-05 08:53:44 +0100] information/ApiListener: Sending config updates for endpoint 'MasterA' in zone 'Z_Master'.

Could the 'API client disconnected' message be related to this?

We have the same problem here.

Your Environment

  • Icinga 2 version (icinga2 --version): r2.10.2-1
  • Operating System and version: CentOS 7
  • Icinga Web 2 version: 2.6.1

[2018-11-16 06:55:00 +0100] information/ConfigObjectUtility: Created and activated object 'host.example.com!Service Name!8b488fc1-26a3-47cd-bfbe-97010e9a9ddd' of type 'Downtime'.

This is flooding our logs.

cat /var/log/icinga2/icinga2.log | grep "Created and activated object" | wc -l
277086

Are you syncing the ScheduledDowntime objects via global zone to that satellite? Those downtime creation messages can either originate from REST API creation calls (or the legacy command pipe) or they are created from those SD objects.

For further tests, please remove any ScheduledDowntime objects/apply rules from your configuration, and test again. I'd like to know whether this has an influence here or not.

@lippserd @dnsmichi After lots of other tasks, tech. problems and (uncached) Icinga 2 builds I've finally set up a playground for this and tried to reproduce the issue. Either I've overseen an elephant (again) or there's something really interesting (and creepy) with that ScheduledDowntimes.

@dnsmichi Let's talk offline about the tech. details on monday.

The ScheduledDowntimes by themselves purr like a cat. But the created downtimes disappear suddenly.

The web app shown here polls the Icinga 2 API all the time and appends new downtimes to its list and colors them green. Once a downtime disappear, it colors it red.

This seems to be the following case:

  • Downtime -> ExpiredHandler() kicks in after 60 seconds
  • GetObject in HasValidConfigOwner() returns are nullptr
  • Downtime is deleted
  • SD detects that there is downtime missing, so it again creates one

Both calls lead into lib/remote/configobjectutility.cpp where CreateObject() always logs that it has created a runtime object (a downtime in this case). When a downtime is deleted, nothing is logged.

Therefore a local loop exists which immediately stops on restart, but then begins again after the expired downtime handler kicks in.

It doesn't matter whether the zone is the master or the satellite, creation and deletion happens locally.
If the configuration activation is fast enough this might not happen.

@all

Can you please test the snapshot packages, it includes all referenced issues and PRs.

Thanks,
Michael

ref/NC/585559
ref/IP/9673

ref/IP/9673 reported this working as expected. Since there's no more feedback here, I consider this being resolved.

Was this page helpful?
0 / 5 - 0 ratings