Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
The assumption being that updated_at is the last time a human updated the SO, compared to the code that now updates the execution status.
The possible solutions mentioned in #82658 are:
I'll add
Option 1 doesn't sound good - not clear what downstream effects might be if we do actually update the SO, but the updated_at doesn't reflect that. Maybe nothing today, perhaps something in the future?
Option 2 sounds like the most straight-forward thing to do. Add a new field in the alert SO to track this date, and update it on all the existing "update" call sites, but obviously don't update it when we update just the execution status.
Option 3 sounds intriguing. Since we're basically adding the execution status bits to the event log for the execute event action, presumably we could pull these out of a query. There's likely a lot more code involved in doing this, including multi-SO queries against the event log, which we don't currently support. Perhaps this could be a long-term goal, with whatever we do sooner being a tactical solution.
I'm interested to understand how the update_at value is being used today. For informative purposes in UIs to show customers when the alert was last changed? Or in the alert executor? Sounds like the first.
Keep in mind that the alert will also be updated for enable/disable/muteAll/unmuteAll/updateApiKey. Also, updating the throttle can only be done via update() anyway, so that's perhaps another cases of "not really updating the alert", but instead modifying some "meta data" about the alert.
If the intent is to show customers the last time a "non meta data" aspect of the alert was updated, that I think we'd certainly want something like a new secondary field that only dealt with the "non meta data" updates. Eg, say the new field is lastEditedAt (name to be debated, obviously), we could arrange to not update this value for cases like muteAll. Trying to not update it for throttling changes would involve diffing the update to see if it's JUST a throttle change, might not be worth it, or might be wrong to not update it anyway, given the throttle can only be updated via update() anyway.
Option 3 sounds intriguing. Since we're basically adding the execution status bits to the event log for the execute event action, presumably we could pull these out of a query. There's likely a lot more code involved in doing this, including multi-SO queries against the event log, which we don't currently support. Perhaps this could be a long-term goal, with whatever we do sooner being a tactical solution.
One downside with this approach is that we'd lose the ability to filter alerts by status or by error reason. Some teams are depending on this capability for UX purposes
One downside with this approach is that we'd lose the ability to filter alerts by status or by error reason. Some teams are depending on this capability for UX purposes
We could still do those, but it would require adding filtering on the event log query (not supported right now), and presumably this would be a secondary query; eg, in the alert list we'd need to do a query on the SOs themselves for filters associated with SO things, and a second one against the EL for filters associated with execution status, and "join" the results. Sounds complicated. :-)
I did want to mention the option of "woops, we shouldn't have added the execution status in the alert SO, let's get the data from somewhere else", since if there was a good answer there, we could go back to the way things used to be. Sounds like that's not going to be feasible short-term anyway.
The assumption being that
updated_atis the last time a human updated the SO, compared to the code that now updates the execution status.
This was the assumption when we delivered https://github.com/elastic/kibana/issues/52738#issuecomment-569234567, which is why I marked this as a bug. When we delivered _that_ original issue, the behaviour that we promised siem was that it would represent the last _user_ update.
That PR https://github.com/elastic/kibana/pull/53793 is confusing to me, as it mentions adding an updatedAt field, which I assumed meant to the SO, but it looks like it's just a RESTy API field copied from the SO's own updated_at. Also mention in that PR that we can't sort on that field like other alert SO fields, since it's not really a field (but you should be able to sort on the SO's updated_at some how, rite?)
Anyhoo, feels like we're going to need to add a real updatedAt field, which the execution status updates would specifically not update (they're already partial updates, so guessing no change to that code anyway).
It's worth keeping in mind that this fix will probably require a Saved Object migration.
As we're being cautious with migrations, it might be worth validating whether we need to backport this into the 7.10.1 patch version or not.
If we can avoid a migration at this point we probably want to, but if this causes broken behaviour in 7.10, then we might not have a choice.
Yeah, I'm curious whether we need a backport to 7.10, given the referenced issue https://github.com/elastic/kibana/issues/82658 doesn't seem to indicate this really "breaks" anything. Eg, if SIEM rules depended on this value in their alert type executors, I'd imagine this could be breaking things. It sounds like it's only visible in the UI though, so may be survivable through 7.10.x releases.
So for this issue it sounds like:
updated_at field because that belongs to the Saved Object and we can't control when that is updated (so we can't prevent it from being updated when the execution status is updated)createdAt date or the lastExecutionDate date)Will it be too confusing to use updatedAt for the field name? We would then have an updated_at field and an alert.updatedAt field, which might be a little confusing but would be consistent with existing the alert.createdAt, alert.createdBy and alert.updatedBy fields
The points above make sense to me 馃憤 I am indifferent on the naming, updatedAt makes sense to me for consistency 馃憤
we will need a migration to add this new field (default to either the createdAt date or the lastExecutionDate date)
Sadly, createdAt or lastExecutionDate aren't great, but we don't really have anything else to go on, do we? But lastExecutionDate seems better than createdAt, if both are set. For 7.10, I think lastExecutionDate is the same as the SO updated_at because of the new update on the execution status field. So we can probably use that existing date updated_at as the date to populate the new field. So, default value could be updated_at || createdAt?
Other than that, 馃憤