Icinga2: Inconsistent behavior between notifications outside timeperiods

Created on 9 Apr 2019 · 10 Comments · Source: Icinga/icinga2

Expected Behavior


When a critical state recovers outside of a timeperiod, the recovery notification is sent once the timeperiod is reached.

Current Behavior


When an object enters a critical state outside of a timeperiod, the notification is sent when the timeperiod is reached.
When an object recovers outside of a timeperiod, no notification is sent when the timeperiod is reached.

Possible Solution


Apply the same behavior to recovery notifications as to critical notifications

Steps to Reproduce (for bugs)


  1. Create a timeperiod for an object
  2. Inside the timeperiod set the object status to critical
  3. Outside of the timeperiod let the object recover
  4. The timeperiod begins and no notification is sent to report that the state recovered

Context


In the current scenario we cannot tell whether the service already recovered outside the timeperiod. The only way to check is to look at the state itself.
People who depend on notifications are left blind to the current state when recovery notifications aren't sent.

Your Environment

  • Version used (icinga2 --version): 2.10.3
Labels: area/notifications, bug

All 10 comments

If a notification is triggered from a critical HARD state and the period doesn't match, the period filter won't let the notification through. So I'm not sure what you mean by:

Apply the same behavior to recovery notifications as to critical notifications

Period filters are for removing any type of notification. It would be unexpected to be notified of recoveries when I am no longer on call.

Period filters work well. As you said, notifications aren't sent if no period matches. This problem is about notifications being delayed until the next valid period. The case I noticed is:

When a critical hard state is triggered without a period matching, the notification is sent when a period matches again (so delayed to be sent later on)

Maybe a little bit more in detail with an example:

  1. Host with a period from 8am to 6pm
  2. Host gets into critical hard state at 3am
  3. If the host hasn't recovered by then, the notification is sent at 8am

But this isn't working for recovery notifications, only for critical ones.

I think I noticed behavior similar to what is described. Here is some more detail, with two examples: one works as expected while the other doesn't, both with the same configuration.

Most of our hosts and services only send notifications during the day, from 06:55 to 18:00. Outside of that timeperiod, we don't need a notification when the host/service recovers again. If a service reaches a critical state, e.g. at 03:00, and stays critical, we expect a notification at 06:55.

Object '11x5' of type 'TimePeriod':
  % declared in '/etc/icinga2/zones.d/global-templates/timeperiod.conf', lines 31:1-31:24
  * __name = "11x5"
  * display_name = "11x5 Timeperiod"
    % = modified in '/etc/icinga2/zones.d/global-templates/timeperiod.conf', lines 34:9-34:51
  * excludes = [ ]
  * includes = [ ]
  * name = "11x5"
  * package = "_etc"
  * prefer_includes = true
  * ranges
    % = modified in '/etc/icinga2/zones.d/global-templates/timeperiod.conf', lines 36:9-42:9
    * friday = "06:55-18:00"
    * monday = "06:55-18:00"
    * thursday = "06:55-18:00"
    * tuesday = "06:55-18:00"
    * wednesday = "06:55-18:00"
  * source_location
    * first_column = 1
    * first_line = 31
    * last_column = 24
    * last_line = 31
    * path = "/etc/icinga2/zones.d/global-templates/timeperiod.conf"
  * templates = [ "11x5", "legacy-timeperiod", "legacy-timeperiod" ]
    % = modified in '/etc/icinga2/zones.d/global-templates/timeperiod.conf', lines 31:1-31:24
    % = modified in 'icinga-itl.conf', lines 21:2-21:99
    % = modified in 'icinga-itl.conf', lines 21:2-21:99
  * type = "TimePeriod"
  * update
    % = modified in 'icinga-itl.conf', lines 22:3-22:27
    % = modified in 'icinga-itl.conf', lines 22:3-22:27
    * arguments = [ "tp", "begin", "end" ]
    * deprecated = false
    * name = "Internal#LegacyTimePeriod"
    * side_effect_free = false
    * type = "Function"
  * vars = null
  * zone = "global-templates"
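For readers who want to reproduce the setup, the dump above could correspond to a definition along the following lines. This is a minimal sketch reconstructed from the listed attributes (name, display name, weekday ranges, and the "legacy-timeperiod" template); the actual contents of timeperiod.conf are not shown in this thread, so treat the layout as an assumption.

```
// Sketch only: reconstructed from the `icinga2 object list` output above,
// not copied from the poster's timeperiod.conf.
object TimePeriod "11x5" {
  import "legacy-timeperiod"

  display_name = "11x5 Timeperiod"

  // Monday to Friday, 06:55-18:00; no ranges are defined for the weekend.
  ranges = {
    "monday"    = "06:55-18:00"
    "tuesday"   = "06:55-18:00"
    "wednesday" = "06:55-18:00"
    "thursday"  = "06:55-18:00"
    "friday"    = "06:55-18:00"
  }
}
```

With this period, a problem that starts at 03:00 falls outside every range, which is the situation both examples below are in.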

These two services have the same notification applied.

Object 'vsltrsim01 Simphony Server LTR!Partition E:!service_notification' of type 'Notification':
  % declared in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
  * __name = "vsltrsim01 Simphony Server LTR!Partition E:!service_notification"
  * command = "html-service-notification"
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 25:2-25:41
  * command_endpoint = ""
  * host_name = "vsltrsim01 Simphony Server LTR"
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
  * interval = 0
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 24:2-24:16
  * name = "service_notification"
  * package = "_etc"
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
  * period = "11x5"
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 26:2-26:19
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 31:3-31:19
  * service_name = "Partition E:"
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
  * source_location
    * first_column = 1
    * first_line = 24
    * last_column = 52
    * last_line = 24
    * path = "/etc/icinga2/zones.d/master/notification.conf"
  * states = [ "OK", "Critical", "Unknown" ]
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 28:2-28:39
  * templates = [ "service_notification", "service-notification" ]
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 22:1-22:44
  * times = null
  * type = "Notification"
  * types = [ "Problem", "Recovery" ]
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 29:4-29:36
  * user_groups = [ "Ticket_Arcade" ]
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 30:17-30:61
  * users = null
  * vars = null
  * zone = "sonde.ltr"
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
Object 'ewl-vssql11-b Applikationsserver ewl!SQL Engine Availability Group INST01\ewl-vsavg01-b!service_notification' of type 'Notification':
  % declared in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
  * __name = "ewl-vssql11-b Applikationsserver ewl!SQL Engine Availability Group INST01\ewl-vsavg01-b!service_notification"
  * command = "html-service-notification"
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 25:2-25:41
  * command_endpoint = ""
  * host_name = "ewl-vssql11-b Applikationsserver ewl"
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
  * interval = 0
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 24:2-24:16
  * name = "service_notification"
  * package = "_etc"
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
  * period = "11x5"
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 26:2-26:19
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 31:3-31:19
  * service_name = "SQL Engine Availability Group INST01\ewl-vsavg01-b"
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
  * source_location
    * first_column = 1
    * first_line = 24
    * last_column = 52
    * last_line = 24
    * path = "/etc/icinga2/zones.d/master/notification.conf"
  * states = [ "OK", "Critical", "Unknown" ]
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 28:2-28:39
  * templates = [ "service_notification", "service-notification" ]
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 22:1-22:44
  * times = null
  * type = "Notification"
  * types = [ "Problem", "Recovery" ]
    % = modified in '/etc/icinga2/zones.d/master/notification_template.conf', lines 29:4-29:36
  * user_groups = [ "Ticket_Arcade" ]
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 30:17-30:61
  * users = null
  * vars = null
  * zone = "sonde2.ewl"
    % = modified in '/etc/icinga2/zones.d/master/notification.conf', lines 24:1-24:52

The service with the partition didn't send a notification at 06:55 and is still critical.


While the second one did send the notification at 06:55.


I am trying to figure out whether there is any difference that may cause this, but have not found one so far.

icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.4-1)

Copyright (c) 2012-2019 Icinga GmbH (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 3.10.0-693.5.2.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: unknown

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

interval=0 doesn't delay the notification; it gets filtered out by the period attribute.

I'm aware of the interval=0 setting; we just want the notification to be triggered once, not recurring. But I don't understand why one example works and the other doesn't.

I'd expect that when a notification gets triggered outside of its timeperiod, it gets "held back" until the timeperiod range matches again. Or am I wrong here?

No, that's exactly what I am saying.

So how can a scenario like this be achieved, as it seems to work sometimes..?

interval=0 combined with period filters won't work; this is a missing feature. I would rather implement a new attribute called one_time_notifications (or similar) and drop the interval=0 hack (I call it a hack because it was implemented for 1.x compatibility reasons and lacks a proper design). Such specific notification types can also be greatly delayed without any further notice.

Oh okay, I didn't know that, thanks for pointing it out. I removed the interval=0 setting completely and replaced it with a higher interval of 7d (which has basically the same effect for us). Way too much overthinking for a pretty simple solution. I really appreciate the work you guys are doing.
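The workaround described here might look roughly like the following. It is a sketch pieced together from the Notification dumps earlier in the thread (template "service-notification", command "html-service-notification", period "11x5", user group "Ticket_Arcade"); the real files are not shown, and the `assign where` expression with its `notify_11x5` custom variable is purely hypothetical.

```
// Sketch only: attribute values come from the object dumps in this thread,
// the file layout and the assign rule are assumptions.
template Notification "service-notification" {
  command = "html-service-notification"

  // Previously `interval = 0`. A long re-notification interval keeps
  // repeats rare (the poster describes the effect as basically the same).
  interval = 7d

  period = "11x5"
  states = [ OK, Critical, Unknown ]
  types = [ Problem, Recovery ]
}

apply Notification "service_notification" to Service {
  import "service-notification"

  user_groups = [ "Ticket_Arcade" ]
  period = "11x5"

  // Hypothetical filter; replace with whatever rule selects your services.
  assign where service.vars.notify_11x5
}
```

The trade-off is that a problem which stays unresolved for more than seven days produces a repeat notification, which interval=0 would have suppressed.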

Interesting workaround, thanks for sharing. If anyone still thinks that interval=0 + notification periods must work, feel free to get in touch and fund the development time.
