I schedule a fixed downtime for a service.
The service goes CRITICAL within the downtime.
The service is still CRITICAL when the downtime ends.
I expect to be notified right when the downtime ends.
When the downtime ends, the notification for one contact fires right away, but the notification for a second contact is delayed.
Experiments show that the interval setting of the notification is the key: the one contact that gets the notification right after the downtime ends has a notification interval of 0 (zero).
The other contact has an interval setting of 600 seconds (10m), and he gets the notification 10 minutes after the hard state change happened (during the downtime).
Please take a look at the attached screenshot which shows the history of such a behaviour:

IMHO, the notification should happen immediately after the downtime has ended, no matter which interval was set.
I believe I have seen the same behaviour with timeperiods other than 24x7, e.g. with a notification period of 9to17 and an outage that happens before that time. The contact attached to the notification object with interval 0 is notified right when the notification period starts, while the contact attached to a notification object with a non-zero interval is delayed until the next regular interval after the outage.
The background: Our contact with interval 0 is a ticket system which should only receive one notification, while our staff should be re-informed every hour.
Contacts/Users don't have a notification interval, that's to be defined inside the notification object.
Can you share a sample configuration in order to reproduce the issue?
Cheers,
Michael
Ok, here we go:
I applied the following changes to a vanilla Icinga 2.8 installation:
conf.d/services.conf
apply Service "dummy" {
  import "generic-service"
  check_command = "dummy"
  max_check_attempts = 1
  assign where host.name == NodeName
}
conf.d/users.conf
object User "UserA" {
  import "generic-user"
  display_name = "nur eine Benachrichtigung" // German: "only one notification"
  email = "root@localhost"
}
object User "UserB" {
  import "generic-user"
  display_name = "Benachrichtigung jede Stunde" // German: "notification every hour"
  email = "root@localhost"
}
conf.d/notifications.conf
apply Notification "einmalige-Mail" to Service {
  import "mail-service-notification"
  users = [ "UserA" ]
  interval = 0
  assign where match(service.name, "dummy")
}
apply Notification "stuendliche-Mail" to Service {
  import "mail-service-notification"
  users = [ "UserB" ]
  interval = 1h
  assign where match(service.name, "dummy")
}
conf.d/templates.conf
template User "generic-user" {
  states = [ Up, Down, OK, Warning, Critical, Unknown ]
  types = [ Problem, Acknowledgement, Recovery ]
}
Here's how to reproduce the problem:
Result:
Conclusion:
IMHO, all users should be notified immediately after the downtime has ended.
In our production environment, I guess the same problem occurs when you use notification timeperiods other than 24x7 and the outage happens outside of the timeperiod. Here, the notification object with interval=0 fires immediately when the notification period has started, and the other notification objects with interval != 0 fire later. I will reproduce that in the test environment.

Ok, understood. The main request is to ignore the notification interval once a downtime has ended. Right now the next notification time is calculated like this (example with interval = 10m):
notification -> suppressed by downtime
+10m for next_notification
downtime ends after 5m
5m later, the next notification is sent for the problem
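The timing above can be sketched as a small model. This is only an illustration of the behaviour described in this thread, not Icinga's actual implementation: the function names `current_delivery` and `proposed_delivery` are made up here, times are plain minutes, and the retry-every-interval loop for downtimes longer than one interval is an assumption.

```python
def current_delivery(problem_at, downtime_end, interval):
    """Time of the first delivered notification under the observed behaviour.

    Simplified model (assumption): a suppressed notification is retried
    once per interval; with interval 0 it goes out as soon as the
    downtime is over.
    """
    if interval == 0:
        return max(problem_at, downtime_end)
    t = problem_at
    while t < downtime_end:
        # Still suppressed by the downtime: wait one more interval.
        t += interval
    return t


def proposed_delivery(problem_at, downtime_end, interval):
    """Requested behaviour: notify as soon as the downtime has ended,
    regardless of the configured interval."""
    return max(problem_at, downtime_end)


# Example from above: problem at minute 0, downtime ends at minute 5.
print(current_delivery(0, 5, 10))   # interval 10m
print(current_delivery(0, 5, 0))    # interval 0
print(proposed_delivery(0, 5, 10))  # requested change
```

With the numbers from the example, the interval=10m notification goes out at minute 10 under the current model, while the interval=0 notification and the proposed behaviour both deliver at minute 5.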
Changing this could break existing setups. I'd like to hear from others what they think. Or see a possible patch to adjust the behaviour and fully test it.
Just a quick addendum: I observed the same behaviour when an outage happens outside of a notification period: when the notification period starts, UserA with interval=0 gets a notification immediately, and the user with interval=60m gets the notification later, apparently following the same formula that dnsmichi has shown before.
I cannot imagine why someone would not want to be informed of an outage immediately when a downtime ends or a notification period starts, so count my vote for a change of that behaviour.
Sure, I hear you. I'm not sure how this can be implemented yet though.
I noticed the same behavior in my setup and I agree with @edpstiffel that a notification should be sent right after the downtime.
BUMP
Our intended setup relies heavily on what @edpstiffel is describing being the case. Consider the following scenario:
You monitor the software update state for ~500 hosts and Icinga notifications are sent directly to the ticket system. For this to work reliably, without spamming our ticket system every now and then, we have defined a downtime specific to the update checks, so that they only run once a week (a full day). With the current behaviour, if a host gets updates during said downtime, no notification will be sent when the downtime is over, since the check interval is 24h.
That being said, I understand people might be relying on the current behaviour for their setups, so maybe finding some middle ground (e.g. a setting to toggle this behaviour) would satisfy all of us.
The same issue, or rather feature request, here: every night our print servers are rebooted, and during that time they are in a downtime. When a service stops during the downtime (in this case because of the reboot) and does not come back up, we get no notification at all. Yes, it reached the critical state during the downtime, and yes, the state did not change when the downtime ended...
Maybe a workaround: we will reset the service to "ok" after the downtime via the API, triggered from our ticket system.
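A rough sketch of what such a reset could look like, assuming the Icinga 2 REST API action `process-check-result` is used to submit a passive OK result. The API URL (`https://localhost:5665`), the helper name `build_reset_request` and the host and service names are placeholders, and authentication is left out. The snippet only builds the request and does not send it.

```python
import json
import urllib.request

API_URL = "https://localhost:5665"  # assumption: API listener on the default port


def build_reset_request(host, service):
    """Build (but do not send) a POST that submits an OK check result."""
    payload = {
        "type": "Service",
        "filter": f'host.name=="{host}" && service.name=="{service}"',
        "exit_status": 0,  # 0 = OK for services
        "plugin_output": "Reset to OK after downtime",
    }
    return urllib.request.Request(
        f"{API_URL}/v1/actions/process-check-result",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder names; a real call would also need API user credentials.
req = build_reset_request("print01", "spooler")
print(req.get_method(), req.full_url)
```

If the service is still broken, the next active check should then produce a fresh state change to CRITICAL outside of the downtime, which triggers a regular problem notification.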
+1
Could this be solved by adding some sort of queue where all notifications that occurred during a downtime (or while outside of a notification timeperiod) are collected? After the downtime ends, they get deduplicated and checked if they still apply. If yes, then the notifications get sent immediately.
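To make the queue idea a bit more concrete, here is a minimal sketch. It is not how Icinga implements anything: `Suppressed` and `SuppressedQueue` are hypothetical names, and "still applies" is interpreted here as "the checkable is still in the state that was recorded when the notification was suppressed".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suppressed:
    checkable: str     # e.g. "host!service"
    notification: str  # name of the notification object
    state: str         # state at the time of suppression


class SuppressedQueue:
    def __init__(self):
        self._pending = {}

    def add(self, item):
        # Deduplicate: keep only the latest entry per checkable/notification.
        self._pending[(item.checkable, item.notification)] = item

    def flush(self, current_states):
        """Called when the downtime ends (or the timeperiod starts).

        Returns the notifications that still apply and empties the queue.
        """
        due = [item for item in self._pending.values()
               if current_states.get(item.checkable) == item.state]
        self._pending.clear()
        return due


queue = SuppressedQueue()
queue.add(Suppressed("host!dummy", "stuendliche-Mail", "CRITICAL"))
queue.add(Suppressed("host!dummy", "stuendliche-Mail", "CRITICAL"))  # duplicate
queue.add(Suppressed("host!other", "stuendliche-Mail", "CRITICAL"))

# "host!other" has recovered in the meantime, so only "host!dummy" is due.
for item in queue.flush({"host!dummy": "CRITICAL", "host!other": "OK"}):
    print("send now:", item.checkable, item.notification)
```

Everything returned by `flush` would be sent immediately, independent of the notification interval.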
This is a sponsored feature request, thanks for granting us the time to implement it.
ref/IP/14729