Describe the feature:
Task Manager used to be able to limit how many concurrent instances of a specific task type run on a single Kibana instance.
We have also identified that there might be a need to limit the concurrency of specific tasks (or groups of tasks), as alert types also want to dynamically limit how many instances of a certain type can run concurrently.
Describe a specific use case for the feature:
We need to bring this feature back for Scheduled tasks and possibly others such as SIEM.
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
For actions, and I think alerts, we create a new task type per action type. It may make sense to be able to apply max_workers to all actions at once, by being able to say "only run 10 tasks of type action:.*" or something similar - all the action taskTypes start with action:. It's another set of knobs and dials, but it's coarser than a per-actionType setting, and so would be easier for customers to configure than having to configure every single actionType.
Alternatively, we could probably also just have one taskType for all actions, and plumb more data into it - not sure what the pros/cons are to that.
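To make the pattern idea concrete, here is a minimal sketch of how a coarse, pattern-based limit could be resolved for a given task type. The `ConcurrencyRule` shape and the `limitFor` helper are hypothetical names for illustration only, not part of Task Manager.

```typescript
// Hypothetical shape: a pattern and the max concurrent tasks for matching types.
interface ConcurrencyRule {
  pattern: RegExp;
  maxConcurrency: number;
}

// Returns the limit of the first rule whose pattern matches the task type,
// or undefined when no rule applies (i.e. the type is unrestricted).
function limitFor(taskType: string, rules: ConcurrencyRule[]): number | undefined {
  for (const rule of rules) {
    if (rule.pattern.test(taskType)) {
      return rule.maxConcurrency;
    }
  }
  return undefined;
}

// "only run 10 tasks of type action:.*"
const rules: ConcurrencyRule[] = [{ pattern: /^action:.*/, maxConcurrency: 10 }];
```

With this, `limitFor('action:.email', rules)` resolves to the shared limit of 10, while a type that matches no rule stays unrestricted.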
Part of the complication is in how TM claims tasks - we don't want to lose cycles where we claim 10 tasks and then drop them because we're at capacity for that specific type, but have capacity for others.
We need to see if we can find a solution that can be applied in the query within ES.
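One possible way to push this into the query, sketched under assumptions: work out which task types are already at capacity on this instance and exclude them from the claim with a `bool` / `must_not` / `terms` clause, so those tasks are never claimed and then dropped. The helper names and the `task.taskType` field name are assumptions for illustration; this is not how TM builds its claim query today.

```typescript
// Hypothetical inputs: per-type limits and counts of tasks currently running here.
type Counts = Record<string, number>;

// Task types whose running count has reached (or exceeded) their limit.
function typesAtCapacity(limits: Counts, running: Counts): string[] {
  return Object.keys(limits).filter((type) => (running[type] || 0) >= limits[type]);
}

// Builds an Elasticsearch filter that skips task types with no spare capacity.
// The 'task.taskType' field name is an assumption about the task document mapping.
function buildClaimFilter(limits: Counts, running: Counts): object {
  const excluded = typesAtCapacity(limits, running);
  if (excluded.length === 0) {
    return { match_all: {} };
  }
  return { bool: { must_not: [{ terms: { 'task.taskType': excluded } }] } };
}
```

Types without an entry in `limits` are never excluded, so the remaining capacity is still spent on them.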
Would it also be possible to use these settings to configure TM to completely disable itself from claiming a certain task type?
Maybe that could be the same as setting the allowed concurrent tasks of a type to 0.
If Reporting uses Task Manager and I have an instance that I don't want to be able to execute Reports, this setting would give me what I need.
Maybe that could be the same as setting the allowed concurrent tasks of a type to 0.
That makes sense, but we will probably want an info message about this at startup, for diagnostic purposes. E.g., someone uses 0 on all instances, and then wonders why those tasks never run.
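A small sketch of that startup diagnostic, assuming the limits arrive as a plain map of task type to max concurrency. `describeDisabledTypes` is a hypothetical helper, and the message wording is only a suggestion.

```typescript
// Returns one info message per task type whose concurrency limit is 0,
// i.e. a type this Kibana instance will never claim.
function describeDisabledTypes(limits: Record<string, number>): string[] {
  return Object.keys(limits)
    .filter((type) => limits[type] === 0)
    .map(
      (type) =>
        `Task type "${type}" has a concurrency limit of 0 and will not be claimed by this instance.`
    );
}

// At startup, each message would be passed to the info logger (console.info here).
describeDisabledTypes({ report: 0, 'action:.email': 10 }).forEach((message) =>
  console.info(message)
);
```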
Came up with a possible direction, details over here:
https://github.com/elastic/kibana/issues/71441#issuecomment-661955231
https://github.com/elastic/kibana/issues/71441#issuecomment-662388606
If @tsullivan & @joelgriffith feel this adequately addresses their needs and @elastic/kibana-alerting-services like the direction, then we can consider pulling this issue into the To Do list I think.
Having discussed the issue with Alerting Services and Reporting, we've decided to go the route of adding limited support for concurrency which will specifically support Reporting, but we won't allow other task types to utilise it for the time being to avoid adding too many additional pollers.
We feel comfortable adding a second poller for Reporting as they'll be removing their use of the ES queue in that same version, meaning that, in effect, there's the same number of polls running in parallel as before.
This work will follow the path spiked over here: https://github.com/elastic/kibana/pull/74883
I'm wondering if we would want to reframe this as a "one concurrent task poller", rather than one just for Reporting. It would be for "large/expensive" tasks. Reporting today, probably more tomorrow ...
"one concurrent task poller"
That makes sense to me. Allow any app or service to register a "large/expensive" task definition, and the secondary poller could search for these tasks with a size of 1. Whichever large task has been waiting the longest would get singularly claimed with each poll interval. Scaling up with multiple instances of Kibana would help with keeping a backlog down. Perhaps the interval duration could be configurable if the machine has the hardware to do more work on the backlog.
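Roughly, the selection step of such a secondary poller could look like the sketch below. It works on an in-memory list purely for illustration (the real claim would happen in ES), and `PendingTask` and `claimOneExpensiveTask` are hypothetical names.

```typescript
// Hypothetical, simplified view of a scheduled task.
interface PendingTask {
  id: string;
  taskType: string;
  runAt: number; // epoch millis at which the task became due
}

// Picks at most one due task among the registered "large/expensive" types:
// the one that has been waiting the longest (equivalent to a search of size 1).
function claimOneExpensiveTask(
  pending: PendingTask[],
  expensiveTypes: string[],
  now: number
): PendingTask | undefined {
  const due = pending
    .filter((task) => expensiveTypes.indexOf(task.taskType) !== -1 && task.runAt <= now)
    .sort((a, b) => a.runAt - b.runAt);
  return due[0];
}
```

Each poll interval would call this once, so a single instance works through the backlog one expensive task at a time; adding instances or shortening the interval raises throughput.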