Nomad: Implement deployments for system jobs

Created on 11 Dec 2017 · 11 Comments · Source: hashicorp/nomad

As of 0.7.0, deployments (update stanzas with auto_revert, canary, etc.) are only implemented for service type jobs. We need to implement auto reverts and other reconciliation features for system jobs. There should be a way to stop a rolling upgrade of a bad system job across the fleet.

stage/accepted theme/scheduling type/enhancement

preetapan

👍21 😕2

All 11 comments

/me clicks on issue linked by co-worker, reads issue, sees @schmichael in the activity feed, strokes beard while nodding with approval

👋

mihasya on 12 Dec 2017

😄1

I'm so tired of non-transparent updates for system jobs. Nomad really needs this feature.

alxark on 15 Feb 2018

@preetapan @schmichael @dadgar this is something I really want to see and am happy to have a crack at it, unless it's already being worked on internally? If you're not, any thoughts, ideas or tips would be greatly appreciated.

jrasell on 8 Jun 2018

@jrasell We want to make several improvements to the system scheduler, including implementing deployments, as well as bringing in other improvements that are in the reconciler. This is a fairly large project, and implementing it will involve a set of nontrivial changes. We are currently targeting this for a future release, likely after Nomad 0.9.0.

preetapan on 12 Jun 2018

👍1

@preetapan any update on a timeline for this? I specifically am looking for canary support for system jobs.

jpasichnyk on 19 Jan 2019

@preetapan we've just launched into the world of Nomad, and found this issue when deploying our first system-level job to our cluster. Any update on when we can expect healthchecking for system jobs?

mgeggie on 12 Jul 2019

@preetapan I see a new 0.10 nomad was recently released. Any updates on this feature?

taer on 22 Nov 2019

@jrasell will you take this in your hands now? ;)

burdandrei on 23 Feb 2020

We'd also love to see this functionality, so +1 from our end! 👍

dpn on 30 Apr 2020

👍6

Any updates about status for this functionality?
Or maybe it's already done but only for enterprise version?

xsikor on 16 Jun 2020

> Or maybe it's already done but only for enterprise version?

Nope, it will be OSS!

This _is_ roadmapped. As everyone can probably guess, there's _a lot_ going on in everyone's lives, so a timeline has been very tricky. We're very excited to see the initial PR #8841 from @dubadub and hope to have someone dig into it with them. There are some very tricky aspects to deployments for system jobs that we need to get right to maximize usability and minimize complexity.

For example, if we spun up canaries concurrently with the stable version's allocation on the same node, there would likely be resource conflicts (static ports, host volumes) that block placement or prevent proper functioning. Therefore it seems like system deployments should diverge from service deployments in that canaries should act as _replacements_ instead of _additional capacity._
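
To make the static port conflict concrete, here is a toy sketch in Python. It is not Nomad's scheduler code; `Alloc`, `Node`, `place_canary_additional`, and `place_canary_replacement` are hypothetical names invented for illustration, and the only rule modeled is that a static port can be held by one allocation per node.

```python
from dataclasses import dataclass, field


@dataclass
class Alloc:
    """Hypothetical allocation: one job version holding static ports on a node."""
    version: int
    static_ports: frozenset


@dataclass
class Node:
    """Hypothetical client node; a static port can be held by one allocation at a time."""
    name: str
    allocs: list = field(default_factory=list)

    def ports_in_use(self):
        used = set()
        for alloc in self.allocs:
            used |= alloc.static_ports
        return used

    def can_place(self, alloc):
        # Placement is blocked if any requested static port is already taken.
        return not (self.ports_in_use() & alloc.static_ports)


def place_canary_additional(node, canary):
    """Service-style canary: runs next to the stable allocation."""
    if not node.can_place(canary):
        return False  # blocked by a static port conflict
    node.allocs.append(canary)
    return True


def place_canary_replacement(node, canary):
    """Replacement-style canary: older versions on this node are stopped first."""
    node.allocs = [a for a in node.allocs if a.version >= canary.version]
    if not node.can_place(canary):
        return False
    node.allocs.append(canary)
    return True


stable = Alloc(version=1, static_ports=frozenset({9100}))
canary = Alloc(version=2, static_ports=frozenset({9100}))

node_a = Node("node-a", [stable])
node_b = Node("node-b", [stable])

additional_ok = place_canary_additional(node_a, canary)
replacement_ok = place_canary_replacement(node_b, canary)

print(additional_ok)   # False: port 9100 is still held by the stable allocation
print(replacement_ok)  # True: the stable allocation was stopped first
```

The trade-off this sketch makes visible is that a replacement-style canary leaves the node without the stable version while the canary is evaluated, which is why auto revert matters so much for system jobs.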

To further complicate matters: as @dubadub discovered in #8841, the code in question could use some refactoring. The layout of Nomad's scheduler has basically never changed, so as you can imagine there are some opportunities for cleanup.

So please keep the use cases coming! The more detailed you can be about the desired behaviors the better! I know it seems like we're silent sometimes, but we definitely parse and discuss and rehash every word of GitHub comments to ensure we're meeting the desired use case.