Nomad: [question] Is it possible to limit the number of children of a periodic job?

Created on 22 Mar 2019 · 3 Comments · Source: hashicorp/nomad

I am trying to understand whether it is possible to limit the number of children a periodic job can have.

For example, I have a periodic job that runs every 15 minutes and normally exits within 5-30 minutes. It is fine to have concurrent runs. However, if a bug in the code causes these runs to hang, a single parent and its children will exhaust all cluster resources very quickly, because the number of concurrent runs will grow fast.

Is there any way to limit the maximum number of children?

theme/scheduling type/question

samm-git

Most helpful comment

Currently you cannot specify a run limit for periodic batch jobs. You may want to track #1782 as timeouts might be useful for your use case when implemented.

That being said, one workaround would be to use constraints, probably on node_class, to limit the number of nodes the periodic batch jobs can run on. For example, if you have 5 servers with node_class=periodic that each have 30 GB of memory, and each invocation of the periodic batch job requires 3 GB, then (5 servers * 30 GB) / 3 GB = 50, so at most 50 instances can run at once. Further invocations will be queued until resources are freed.
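
A minimal sketch of what that constraint could look like in a job file. The job name, datacenter, driver, and command are hypothetical placeholders; only the node_class value, the 15-minute schedule, and the 3 GB memory figure come from this thread.

```hcl
# Hypothetical periodic batch job pinned to nodes of class "periodic".
job "periodic-report" {
  datacenters = ["dc1"] # assumed datacenter name
  type        = "batch"

  periodic {
    cron = "*/15 * * * *" # every 15 minutes, as in the question
  }

  # Only place runs on nodes whose client config sets node_class = "periodic".
  constraint {
    attribute = "${node.class}"
    value     = "periodic"
  }

  group "report" {
    task "run" {
      driver = "exec" # assumed driver

      config {
        command = "/usr/local/bin/run-report" # hypothetical binary
      }

      resources {
        memory = 3072 # MB, i.e. 3 GB per invocation
      }
    }
  }
}
```

With a constraint like this, the total memory on the matching nodes bounds how many runs can be placed at once.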

I'm going to close this ticket, but please feel free to open a feature request with your ideal behavior if this doesn't meet your needs.

schmichael on 22 Mar 2019

👍2

All 3 comments

@schmichael thank you for the quick reply and explanation, that was very helpful.

For now we will probably just disable overlap (via prohibit_overlap) for all running batch jobs, as overlapping runs seem to be a very dangerous feature for our environment. However, it would be great to see a limit on the maximum number of children in the future. Thanks for the hint about node_class. However, it is not the best solution for us, because we would need to provision dedicated "batch worker" nodes for it, and that's not something we would like to do.
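
For reference, overlap between runs of a periodic job is controlled by prohibit_overlap in the periodic stanza. A minimal sketch, assuming the 15-minute schedule from the question (this fragment belongs inside a job block):

```hcl
periodic {
  cron = "*/15 * * * *" # every 15 minutes

  # When true, a new child is not started while a previous
  # instance of this job is still running.
  prohibit_overlap = true
}
```

This caps the job at one running child at a time, which is stricter than a configurable maximum but prevents hung runs from piling up.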

samm-git on 23 Mar 2019

Our enterprise product does have Namespaces and Quotas, which would allow you to use my node_class approach without provisioning new nodes. You would launch these batch jobs in their own namespace with a resource-constrained quota.
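
A rough sketch of a quota specification for that approach, assuming Nomad Enterprise. The quota name, region name, and CPU figure are hypothetical; the memory figure reuses the 50 x 3 GB arithmetic from the earlier example.

```hcl
# Hypothetical quota spec, e.g. saved as periodic-batch.quota.hcl
name        = "periodic-batch"
description = "Caps the resources available to periodic batch jobs"

limit {
  region = "global" # assumed region name

  region_limit {
    cpu    = 50000  # MHz, hypothetical figure
    memory = 153600 # MB, i.e. 50 invocations * 3 GB
  }
}
```

The spec would typically be applied with `nomad quota apply`, attached to a namespace with `nomad namespace apply -quota periodic-batch <namespace>`, and the batch jobs would then set `namespace` in their job block to run under that limit.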

(I promise we are not intentionally leaving out the batch job limit to sell more licenses! I just wanted to offer another workaround.)

schmichael on 25 Mar 2019

👍1
