Nomad v0.4.0
When a rolling update is in place, Nomad only applies the stagger delay between the evaluations that stop existing allocations, and doesn't check whether the containers replacing the stopped ones actually started, which can lead to situations like this:
Evaluations
ID Priority Triggered By Status Placement Failures
cc22d922-637e-6502-0a27-70e7d45f0e59 50 rolling-update complete false
1728faa0-e0de-c3d0-75ca-87e98312c525 50 rolling-update complete false
d89f4563-6573-3083-8872-f28205ea3209 50 job-register complete false
Allocations
ID Eval ID Node ID Task Group Desired Status
06b7b662-7765-40de-47d5-913a9d218090 cc22d922-637e-6502-0a27-70e7d45f0e59 93c61971-69c4-30df-99af-21540ea6a909 admin run pending
374da636-08a5-81eb-b8ad-9527b6c2e514 cc22d922-637e-6502-0a27-70e7d45f0e59 93c61971-69c4-30df-99af-21540ea6a909 admin run pending
97a4c2ea-703d-7700-0e76-8112639b4cd3 1728faa0-e0de-c3d0-75ca-87e98312c525 45372a28-46f6-c0a4-01a1-1f5af00e056c admin run pending
702a570c-7ccd-69fb-1d85-559c084d0781 1728faa0-e0de-c3d0-75ca-87e98312c525 93c61971-69c4-30df-99af-21540ea6a909 admin run pending
92787c01-07f5-31db-e49a-3422ff249dc7 d89f4563-6573-3083-8872-f28205ea3209 45372a28-46f6-c0a4-01a1-1f5af00e056c admin run pending
c0c70990-4b69-adc4-4901-60f904e0b056 d89f4563-6573-3083-8872-f28205ea3209 45372a28-46f6-c0a4-01a1-1f5af00e056c admin run pending
5ea9aa76-a363-37a6-c7b7-9b9794919057 9f14f3f2-5719-2d06-e46c-b7875b7bb4ec 93c61971-69c4-30df-99af-21540ea6a909 admin stop complete
161b1b7b-22c0-6b57-85c9-e6021f2fdef3 9f14f3f2-5719-2d06-e46c-b7875b7bb4ec 45372a28-46f6-c0a4-01a1-1f5af00e056c admin stop complete
0ac7c080-4877-a42e-41ea-ae2f9d85bd82 bd2f1c88-422c-e4b1-fc85-bd0279ac034e 45372a28-46f6-c0a4-01a1-1f5af00e056c admin stop complete
b2f60278-fd21-f8ab-4716-3a9975cb9f5e bd2f1c88-422c-e4b1-fc85-bd0279ac034e 93c61971-69c4-30df-99af-21540ea6a909 admin stop complete
5103e096-9eeb-706b-4b08-d1f73fafe927 758af4a6-1e44-2abf-b830-7d4ddee06237 45372a28-46f6-c0a4-01a1-1f5af00e056c admin stop complete
e98f0f97-6d96-e569-74d5-8fc687eb8a83 758af4a6-1e44-2abf-b830-7d4ddee06237 93c61971-69c4-30df-99af-21540ea6a909 admin stop complete
In this case the stagger time was short, which led to service unavailability: Nomad stopped all containers while the new ones hadn't started yet (here they are pending because the nodes are fetching images from a remote registry).
While a longer stagger time would work around the issue, this is still a potential cause of downtime, as Nomad will eventually stop all containers regardless of whether the new ones started in their place.
I wonder if it would make sense to have an extra flag that waits for the replacement containers to start and aborts the evaluation if they don't start within the stagger interval. (It is better to have the old image running on a slightly smaller number of instances after a failed evaluation than to end up with no instances running at all.)
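For illustration only, the behaviour proposed above can be approximated today with an external wrapper around the job update: after submitting the job, poll its allocations and bail out if the replacements don't reach running within a stagger-sized window. A minimal sketch, assuming a local Nomad agent on 127.0.0.1:4646, the Python requests library, and a hypothetical job name my-service (all placeholders):

```python
import time

import requests  # assumes the requests library is installed

NOMAD = "http://127.0.0.1:4646"  # placeholder: local Nomad agent
JOB_ID = "my-service"            # placeholder: job being rolled
STAGGER = 30                     # seconds; mirrors the stagger interval

def replacements_running(job_id):
    """True once every allocation that is desired to run is actually running."""
    allocs = requests.get(f"{NOMAD}/v1/job/{job_id}/allocations").json()
    return all(a["ClientStatus"] == "running"
               for a in allocs if a["DesiredStatus"] == "run")

deadline = time.time() + STAGGER
while not replacements_running(JOB_ID):
    if time.time() > deadline:
        # Replacements didn't start within the stagger window: stop here rather
        # than letting the next batch of old containers be stopped.
        raise SystemExit("replacements not running in time; aborting rollout")
    time.sleep(2)
print("replacements are running; safe to continue")
```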
We will most likely solve this by waiting for the service to be healthy in Consul before moving on
I think that waiting for healthy is the most correct solution to this, as there is a combination of both Docker registry pull time and service startup time that needs to be waited on before the next service is restarted in order to avoid downtime. That being said, I feel like it would not hurt to also implement the more raw wait for the service to transition to running in the meantime, and for the case where people are not running Consul.
Agree that Consul service status would be the best solution, and I also think that there should be a "raw wait" at least for tasks that do not expose services.
I see a few issues with relying only on Consul for health checks. In a sense there are two phases in health checking a task: an initial check to determine that the task has started on the node, and after that continuous health checks to verify that the task is still running healthy.
If the initial check passes, then Nomad should continue the rolling update; a failed initial check should probably stop the update?
The initial health check should be done by Nomad, and it might be a really simple TCP, HTTP or script check. The continuous checks could be done by Consul.
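To make that split concrete (this is an illustration, not how Nomad implements checks): the "initial" check can be as crude as retrying a TCP connect until a startup deadline passes, after which ongoing health is left to the continuous checks. A minimal sketch with hypothetical host/port arguments:

```python
import socket
import time

def initial_tcp_check(host, port, startup_timeout=60):
    """Retry a TCP connect until it succeeds or the startup deadline passes.

    A pass only means the task has started listening; verifying that it stays
    healthy is left to the continuous (e.g. Consul) checks."""
    deadline = time.time() + startup_timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)
    return False
```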
+1 for making sure the task passes its health check before continuing on.
We ended up writing a push script that would deploy our task in a new job, wait for a successful Consul health check, and then update the main job for an upgrade. This wrapper would feel unnecessary if the rolling upgrade process worked a little differently. Without it we run the risk of rolling out an unhealthy task to 100% of our cluster, which isn't one we're willing to take.
Hey all,
Just wanted to update that this is slated for 0.6.0
Any estimates on when to expect v0.6.0? Or at least 0.5.0?
v0.6.0 is a bit far out to estimate but 0.5.0 will be out very soon!
@nathanpalmer would you like to share your script? It sounds like a nice workaround for the missing wait for a successful health check.
@venth The whole script is a bit too involved and tied to our setup to post here (it spans several files and utilities). However, the basics look like this.
1) Our jobs are set up using a prefix and a group, arranged in a blue/green deployment. For example, api-green and api-blue.
2) The job.hcl file is scripted using consul-template and environment variables to determine which group we're deploying at a given time (using SERVICE_NAME and SERVICE_GROUP).
{{$name := (env "SERVICE_NAME")}}{{$group := (env "SERVICE_GROUP")}}job "{{$name}}-{{$group}}" {
group "default" {
count = {{key (print "service/api/jobs/" $name "-" $group "/count")}}
meta {
revision = "{{key (print "service/api/jobs/" $name "-" $group "/revision")}}"
}
task "api" {
...
}
}
}
3) Since that job file is set up so we can deploy the app to any named group, we deploy first to a migration group called api-migrate.
4) Wait for the allocation to complete
We query /v1/allocation/ and look for essentially this (a rough sketch of steps 4 and 5 appears after the list):
allocation["ClientStatus"] == "failed" ||
(allocation["DesiredStatus"] == "run" && allocation["ClientStatus"] == "running") ||
(allocation["DesiredStatus"] == "stop" && allocation["ClientStatus"] == "complete")
5) Wait for the health check to pass
We query /v1/health/service/#{service} looking for a status of passing
6) We tear down the api-migrate job, and if the health check was successful we update the main group we're deploying (api-green, for example).
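For reference, steps 4 and 5 boil down to two polling loops: one against Nomad's allocation list using the predicate above, and one against Consul's service health endpoint. A minimal sketch, assuming local Nomad and Consul agents, the Python requests library, the job allocations listing endpoint rather than per-allocation lookups, and the placeholder names api-migrate/api:

```python
import time

import requests  # assumes the requests library is installed

NOMAD = "http://127.0.0.1:4646"   # placeholder: local Nomad agent
CONSUL = "http://127.0.0.1:8500"  # placeholder: local Consul agent

def allocation_settled(alloc):
    # The same predicate as in step 4.
    return (alloc["ClientStatus"] == "failed"
            or (alloc["DesiredStatus"] == "run" and alloc["ClientStatus"] == "running")
            or (alloc["DesiredStatus"] == "stop" and alloc["ClientStatus"] == "complete"))

def wait_for_allocations(job_id, timeout=300):
    """Wait until all of the job's allocations settle; True only if none failed."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        allocs = requests.get(f"{NOMAD}/v1/job/{job_id}/allocations").json()
        if allocs and all(allocation_settled(a) for a in allocs):
            return not any(a["ClientStatus"] == "failed" for a in allocs)
        time.sleep(5)
    return False

def wait_for_passing_checks(service, timeout=300):
    """Wait until every Consul check registered for the service is passing."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        entries = requests.get(f"{CONSUL}/v1/health/service/{service}").json()
        checks = [c for e in entries for c in e["Checks"]]
        if checks and all(c["Status"] == "passing" for c in checks):
            return True
        time.sleep(5)
    return False

# Hypothetical names for the migration deploy described above.
if wait_for_allocations("api-migrate") and wait_for_passing_checks("api"):
    print("migration job healthy; safe to update the main group")
else:
    raise SystemExit("migration job unhealthy; aborting")
```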
@nathanpalmer Thanks for the explanation and knowledge sharing ;)
Hey, this has been addressed by deployments in 0.6.0.