Kibana: [Task Manager / Alerting] Support for limited concurrency Task Types & Alert Types

Created on 15 Jan 2020 · 9 Comments · Source: elastic/kibana

Describe the feature:
Task Manager used to be able to limit how many concurrent instances of a specific task type run on a single Kibana instance.
We have also identified that there might be a need to limit the concurrency of specific tasks (or groups of tasks), as alert types also want to dynamically limit how many instances of a certain type can run concurrently.

Describe a specific use case for the feature:
We need to bring this feature back for Scheduled tasks and possibly others such as SIEM.

Alerting Task Manager Alerting Services enhancement

gmmorris

Most helpful comment

Having discussed the issue with Alerting Services and Reporting, we've decided to add limited support for concurrency that specifically supports Reporting. We won't allow other task types to utilise it for the time being, to avoid adding too many additional pollers.

We feel comfortable adding a second poller for Reporting as they'll be removing their use of ES queue in that same version, meaning that, in effect, there's the same number of polls running in parallel as before.

This work will follow the path spiked over here: https://github.com/elastic/kibana/pull/74883

gmmorris on 28 Oct 2020

👍2

All 9 comments

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

elasticmachine on 15 Jan 2020

For actions, and I think alerts, we create a new task type per action type. It may make sense to be able to apply max_workers to all actions, by being able to say "only run 10 tasks of type action:.*" or something; all the action taskTypes start with action:. It's another set of knobs and dials, but it's coarser than per actionType, and so would be easier for customers to configure than every single actionType.

Alternatively, we could probably also just have one taskType for all actions, and plumb more data into it - not sure what the pros/cons are to that.

pmuellr on 16 Jan 2020
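
A minimal sketch of the pattern-based limit suggested in the comment above, assuming a rule is just a regular expression plus a maximum. The names `ConcurrencyRule`, `exampleRules`, and `canRun` are hypothetical and are not part of Task Manager's API.

```typescript
// Hypothetical shape: one limit shared by every task type matching a pattern.
interface ConcurrencyRule {
  pattern: RegExp;
  maxConcurrent: number;
}

// The rule from the comment: at most 10 running tasks across all `action:*` types.
const exampleRules: ConcurrencyRule[] = [{ pattern: /^action:.*/, maxConcurrent: 10 }];

// Returns true if one more task of `taskType` may start, given the task types
// that are currently running on this instance.
function canRun(taskType: string, running: string[], rules: ConcurrencyRule[]): boolean {
  return rules
    .filter((rule) => rule.pattern.test(taskType))
    .every((rule) => running.filter((t) => rule.pattern.test(t)).length < rule.maxConcurrent);
}
```

A task type that matches no rule is treated as unlimited here; whether that is the right default is a design choice the thread does not settle.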

Part of the complication is in how TM claims tasks - we don't want to lose cycles where we claim 10 tasks and then drop them because we're at capacity for that specific type, but have capacity for others.
We need to see if we can find a solution that can be applied in the query within ES.

gmmorris on 17 Jan 2020
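
One way to read "applied in the query within ES" is to leave task types that are already at capacity out of the claim query, so they are never claimed and then dropped. The sketch below assumes that reading. The `TypeCapacity` shape, the function names, and the `task.taskType` field name are illustrative assumptions, not Task Manager's actual claiming code.

```typescript
// Hypothetical per-type bookkeeping on a single Kibana instance.
interface TypeCapacity {
  taskType: string;
  maxConcurrent: number;
  running: number;
}

// Task types that still have room for at least one more running task.
function claimableTypes(capacities: TypeCapacity[]): string[] {
  return capacities.filter((c) => c.running < c.maxConcurrent).map((c) => c.taskType);
}

// Builds a bool/terms filter restricting the claim to types with spare capacity.
// The field name "task.taskType" is an assumption for illustration.
function buildClaimFilter(capacities: TypeCapacity[]) {
  return {
    bool: {
      filter: [{ terms: { "task.taskType": claimableTypes(capacities) } }],
    },
  };
}
```

With this filter, a type whose limit is already reached simply never appears in the claimed batch, so the poller's capacity goes to other types.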

Would it also be possible to use these settings to configure TM to completely disable itself from claiming a certain task type?

Maybe that could be the same as setting the allowed concurrent tasks of a type to 0.

If Reporting uses Task Manager and I have an instance that I don't want to be able to execute Reports, this setting would give me what I need.

tsullivan on 8 Jul 2020

Maybe that could be the same as setting the allowed concurrent tasks of a type to 0.

That makes sense, but we will probably want an info message about this at startup, for diagnostic purposes. E.g., someone uses 0 on all instances, and then wonders why those tasks never run.

pmuellr on 9 Jul 2020

👍1
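
The two comments above (a limit of 0 disables a type, and that should be logged at startup) could be sketched roughly as follows. The `limits` map, the `Logger` type, and `reportDisabledTypes` are hypothetical names used only for illustration.

```typescript
// Hypothetical logger signature; a real logger would be passed in by the caller.
type Logger = (message: string) => void;

// Logs one info message per task type whose configured limit is 0 and
// returns the list of disabled types so callers can exclude them from claiming.
function reportDisabledTypes(limits: Record<string, number>, log: Logger): string[] {
  const disabled = Object.keys(limits).filter((taskType) => limits[taskType] === 0);
  for (const taskType of disabled) {
    log(
      `Task type "${taskType}" has a concurrency limit of 0 and will not be claimed by this Kibana instance.`
    );
  }
  return disabled;
}
```

Calling this once at startup would give an operator who set 0 on every instance a visible explanation for why those tasks never run.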

Came up with a possible direction, details over here:
https://github.com/elastic/kibana/issues/71441#issuecomment-661955231
https://github.com/elastic/kibana/issues/71441#issuecomment-662388606

If @tsullivan & @joelgriffith feel this adequately addresses their needs and @elastic/kibana-alerting-services likes the direction, then I think we can consider pulling this issue into the To Do list.

gmmorris on 22 Jul 2020

This work will follow the path spiked over here: https://github.com/elastic/kibana/pull/74883

gmmorris on 28 Oct 2020

👍2

I'm wondering if we would want to reframe this as a "one concurrent task poller", compared to just reporting. Would be for "large/expensive" tasks. Reporting today, probably more tomorrow ...

pmuellr on 28 Oct 2020

"one concurrent task poller"

That makes sense to me. Allow any app or service to register a "large/expensive" task definition, and the secondary poller could search for these tasks with a size of 1. Whichever large task has been waiting the longest would get singularly claimed with each poll interval. Scaling up with multiple instances of Kibana would help with keeping a backlog down. Perhaps the interval duration could be configurable if the machine has the hardware to do more work on the backlog.

tsullivan on 29 Oct 2020
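
A rough sketch of the "size of 1" claim described in the comment above: on each poll, the secondary poller picks the single due task of a registered "large/expensive" type that has been waiting the longest. The `PendingTask` shape and `claimOldestExpensive` are hypothetical, and an in-memory array stands in for the Elasticsearch search.

```typescript
// Hypothetical view of a scheduled task; `runAt` is a timestamp in milliseconds.
interface PendingTask {
  id: string;
  taskType: string;
  runAt: number;
}

// Returns the single due task of an "expensive" type that has waited longest,
// or undefined when nothing is due. Intended to be called once per poll interval.
function claimOldestExpensive(
  pending: PendingTask[],
  expensiveTypes: string[],
  now: number
): PendingTask | undefined {
  const due = pending.filter(
    (task) => expensiveTypes.indexOf(task.taskType) !== -1 && task.runAt <= now
  );
  // Oldest runAt first, so the longest-waiting task is claimed.
  due.sort((a, b) => a.runAt - b.runAt);
  return due[0];
}
```

Because each instance claims at most one such task per interval, adding Kibana instances or shortening the interval are the two levers for working through a backlog, as the comment notes.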
