Nomad v0.12.8 (b8501642d24a23c6788c267acfa2b38e50869cdf)
Seen also in 0.10.
Ubuntu 18.08
Allocations and Jobs APIs return different states for the same job.
The result from the allocation and jobs API is different:
$ curl -s 127.0.0.1:4646/v1/allocation/859e12ae-559d-72f4-8db9-0f0c8d2d088c | jq .Job > /tmp/alloc-job.json
$ curl -s 127.0.0.1:4646/v1/job/consul | jq . > /tmp/job.json
$ diff /tmp/job.json /tmp/alloc-job.json
146c146
< "Status": "running",
---
> "Status": "pending",
148c148
< "Stable": true,
---
> "Stable": false,
152c152
< "ModifyIndex": 17,
---
> "ModifyIndex": 10,
Using the alloc subcommand shows everything as running:
ID = 859e12ae-559d-72f4-8db9-0f0c8d2d088c
Eval ID = aab59136
Name = consul.server[0]
Node ID = c541f14e
Node Name = voyager
Job ID = consul
Job Version = 0
Client Status = running
Client Description = Tasks are running
Desired Status = run
Desired Description = <none>
Created = 36m48s ago
Modified = 36m34s ago
Deployment ID = 36ec8bce
Deployment Health = healthy
Task "consul-dev" is "running"
Task Resources
CPU          Memory          Disk     Addresses
315/100 MHz  99 MiB/300 MiB  300 MiB
Task Events:
Started At = 2020-11-18T11:53:05Z
Finished At = N/A
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time                       Type                   Description
2020-11-18T12:53:05+01:00  Started                Task started by client
2020-11-18T12:53:01+01:00  Downloading Artifacts  Client is downloading artifacts
2020-11-18T12:53:01+01:00  Task Setup             Building Task Directory
2020-11-18T12:53:01+01:00  Received               Task received by client
Reproduced with this job file:
job "consul" {
  datacenters = ["dc1"]

  group "server" {
    task "consul-dev" {
      driver = "raw_exec"

      config {
        command = "consul"
        args    = ["agent", "-dev"]
      }

      artifact {
        source = "https://releases.hashicorp.com/consul/1.7.1/consul_1.7.1_linux_amd64.zip"
      }
    }
  }
}
2020-11-18T13:32:00.502+0100 [DEBUG] http: request complete: method=GET path=/v1/job/consul duration=181.042µs
2020-11-18T13:32:01.416+0100 [DEBUG] http: request complete: method=GET path=/v1/allocation/859e12ae-559d-72f4-8db9-0f0c8d2d088c duration=607.667µs
thanks, @jsoriano, i have reproduced this ~and will push a fix.~ see below
the problem appears to be that updateJobStabilityImpl upserts a modified copy of the job, but the allocation still has a pointer to the previous version of the job:
https://github.com/hashicorp/nomad/blob/v0.12.8/nomad/state/state_store.go#L3805-L3807
okay, @jsoriano, the Job field on the allocation is a copy of the job, created at allocation time, intended for use by the Nomad clients. it is only modified when there are changes to the job that allow for an in-place update of the allocation. this behavior is intentional, although it is not documented.
and even if it were documented on the API, it would be buried pretty deep in here
@cgbaker thanks for the clarifications! I think that a description of the Job field in the Allocation API would help, even if buried pretty deep :slightly_smiling_face:
So, if we want to check the status of a job we should rely on the Jobs API and not on the Allocations one, right?
By the way, which client does the ClientStatus of an allocation refer to? Could it also be used to check the status of a job?
Yes, the job endpoint will have the best information (especially if there are allocations from overlapping versions of the job). The deployment endpoint may be useful as well.
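To make the difference between the two endpoints concrete, here is a minimal Python sketch that reads the job status from the Jobs API and reports which fields differ in the snapshot embedded in an allocation. The address and the three field names are taken from the curl and diff output in the report; the helper names (fetch_json, diff_fields, job_status) are hypothetical and not part of any Nomad client library.

```python
import json
from urllib.request import urlopen

# Same agent address as in the curl commands from the report.
NOMAD_ADDR = "http://127.0.0.1:4646"


def fetch_json(path):
    """GET a Nomad HTTP API path and decode the JSON body."""
    with urlopen(NOMAD_ADDR + path) as resp:
        return json.load(resp)


def diff_fields(live_job, alloc_job, fields=("Status", "Stable", "ModifyIndex")):
    """Return {field: (live value, snapshot value)} for fields that differ.

    live_job is the body of /v1/job/<id>; alloc_job is the "Job" object
    embedded in /v1/allocation/<id>, which is a copy made at allocation time.
    """
    return {
        f: (live_job.get(f), alloc_job.get(f))
        for f in fields
        if live_job.get(f) != alloc_job.get(f)
    }


def job_status(job_id):
    """Read the job status from the Jobs API, not from an allocation."""
    return fetch_json("/v1/job/" + job_id)["Status"]
```

Against a running agent, diff_fields(fetch_json("/v1/job/consul"), fetch_json("/v1/allocation/<alloc-id>")["Job"]) should report the same three fields as the diff above, while job_status("consul") returns the value to rely on.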
ClientStatus of the allocation refers to the actual status of the allocation on the client, as opposed to the DesiredStatus. ClientStatus will be one of the following:
https://github.com/hashicorp/nomad/blob/v0.12.8/nomad/structs/structs.go#L8499-L8503
You are correct; the Status of the job can be computed from the status of the allocations. In fact, that's how the job status is computed:
https://github.com/hashicorp/nomad/blob/v0.12.8/nomad/structs/structs.go#L3702-L3704
The only other thing to consider for job status is Stopped... a job will be "dead" if all of the allocations are terminal, but Stopped = true means that the operator set the desired state of the job to stopped, using the nomad job stop command.
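As a rough illustration of that rule, the sketch below derives a job status from a list of allocation ClientStatus values. It is a simplification, not Nomad's actual implementation: it assumes the terminal client statuses are "complete", "failed" and "lost", it ignores evaluations, DesiredStatus and special job types, and it assumes the job JSON exposes the stop flag as a boolean field named "Stop". The function names are hypothetical.

```python
# Assumed terminal values of an allocation's ClientStatus.
TERMINAL_CLIENT_STATUSES = {"complete", "failed", "lost"}


def derive_job_status(client_statuses):
    """Simplified job status from allocation client statuses.

    No allocations yet        -> "pending"
    Every allocation terminal -> "dead"
    Otherwise                 -> "running"
    """
    statuses = list(client_statuses)
    if not statuses:
        return "pending"
    if all(s in TERMINAL_CLIENT_STATUSES for s in statuses):
        return "dead"
    return "running"


def stopped_by_operator(job):
    """True if the operator asked for the job to stop (assumed "Stop" field)."""
    return bool(job.get("Stop", False))
```

A "dead" result together with stopped_by_operator(job) returning False would then point to allocations that finished or failed on their own rather than to a nomad job stop.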
Thanks @cgbaker!