Nomad v0.4.1
CoreOS 1122.2.0
We use Nomad to run periodic jobs in our PaaS and provide our users with various statistics, e.g. about job allocations.
We have noticed that Nomad's job summary endpoint very often reports allocations as "queued" in addition to "complete" (the latter is definitely correct, as most jobs work just fine).
Here's an example job that runs every minute:
$ nomad status wlc-wonderland-cluster-autoscaler
ID = wlc-wonderland-cluster-autoscaler
Name = wonderland-cluster-autoscaler
Type = batch
Priority = 50
Datacenters = eu-west-1
Status = running
Periodic = true
Next Periodic Launch = 10/19/16 14:23:00 UTC (47s from now)
Using this command...
$ nomad status wlc-wonderland-cluster-autoscaler | grep dead | cut -d" " -f1 | xargs -n1 nomad status | grep ^default
...one can see many allocations where "Queued" and "Complete" are both set to 1:
Task Group Queued Starting Running Failed Complete Lost
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 1 0 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 1 0 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 1 0 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 1 0 0 1 0 0
default 0 0 0 0 1 0
default 1 0 0 0 1 0
default 0 0 0 0 1 0
default 0 0 0 0 1 0
This is very confusing, as we show our users a summary over all allocations, e.g.
Summary:
Queued Starting Running Failed Complete Lost
41 0 0 4 96 0
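For reference, a summary like this can be built by summing the per-job rows. The following is only a minimal sketch: the helper name `sum_summary` is made up for illustration, and it assumes the column order shown in the table header above (Task Group, Queued, Starting, Running, Failed, Complete, Lost).

```shell
# Hypothetical helper: sum the "default" task group rows read from stdin.
# Assumes columns: Task Group, Queued, Starting, Running, Failed, Complete, Lost.
sum_summary() {
  awk '$1 == "default" { q += $2; s += $3; r += $4; f += $5; c += $6; l += $7 }
       END {
         print "Queued Starting Running Failed Complete Lost"
         printf "%d %d %d %d %d %d\n", q, s, r, f, c, l
       }'
}
```

It could be fed from the pipeline shown earlier, e.g. by appending `| sum_summary` after `grep ^default`.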
My best guess is that there's a bug and the queued counter isn't decremented in all cases.
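To get a rough idea of how often this happens, one could count the rows where "Queued" is still non-zero although "Complete" is set. This is a sketch under the same column-order assumption as the table header above; `count_stale_queued` is a hypothetical name, not part of Nomad.

```shell
# Hypothetical helper: count "default" rows from stdin that report both
# Queued > 0 (column 2) and Complete > 0 (column 6).
count_stale_queued() {
  awk '$1 == "default" { total++; if ($2 > 0 && $6 > 0) stale++ }
       END { printf "%d of %d\n", stale, total }'
}
```

As before, it reads the `grep ^default` output from stdin and prints the number of suspicious rows out of the total.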
Thanks for reporting. Yeah, we noticed the summary isn't always accurate. Will look into it.
The summary is almost always out of sync in our cluster :)