Nomad v0.6.0-dev (c96995b3f56b787be7356ae3c639b451519b3b09+CHANGES)
it was the latest stable build on the CI-server for linux, at the time (https://travis-ci.org/hashicorp/nomad/builds/245519967)
Ubuntu 16.04.2 LTS
Allocation folder is not created.
After stopping and starting correctly for a few hours, tasks stay in "pending" and allocation does not exist. After that it is every time reproducibly.
Stop and start a system-type job (zookeeper) with raw-exec, continuously, for a few hours.
==> Monitoring evaluation "c25a1f6f"
Evaluation triggered by job "zookeeper"
Allocation "85cd2cdb" created: node "7dcfd48d", group "zk"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "c25a1f6f" finished with status "complete"
Allocations
ID Node ID Task Group Desired Status Created At
85cd2cdb-e46d-63d4-28c7-75c32ebb019b 7dcfd48d-b306-1771-2b91-b334f650e669 zk run pending 06/28/17 14:28:48 CEST
ID = 85cd2cdb-e46d-63d4-28c7-75c32ebb019b
Eval ID = c25a1f6f-c0a7-3d41-2806-1ffc610060ef
Name = zookeeper.zk[0]
Node ID = 7dcfd48d-b306-1771-2b91-b334f650e669
Job ID = zookeeper
Client Status = pending
Client Description =
Desired Status = run
Desired Description =
Created At = 06/28/17 14:28:48 CEST
Evaluated Nodes = 1
Filtered Nodes = 0
Exhausted Nodes = 0
Allocation Time = 32.143碌s
Failures = 0
Couldn't retrieve stats (HINT: ensure Client.Advertise.HTTP is set): Unexpected response code: 500 (unknown allocation ID "85cd2cdb-e46d-63d4-28c7-75c32ebb019b")
2017/06/28 14:28:48.191792 [DEBUG] http: Request /v1/node/7dcfd48d-b306-1771-2b91-b334f650e669/allocations (5.285766ms)
2017/06/28 14:28:48.218902 [DEBUG] http: Request /v1/jobs (184.575碌s)
2017/06/28 14:28:48.269848 [DEBUG] http: Request /v1/evaluation/63dd241d-2292-0d91-7bc0-8c144da0b2d3 (97.452碌s)
2017/06/28 14:28:48.270547 [DEBUG] http: Request /v1/evaluation/63dd241d-2292-0d91-7bc0-8c144da0b2d3/allocations (74.02碌s)
2017/06/28 14:28:48.289463 [DEBUG] http: Request /v1/jobs (4.823423ms)
2017/06/28 14:28:48.289480 [DEBUG] worker: dequeued evaluation c25a1f6f-c0a7-3d41-2806-1ffc610060ef
2017/06/28 14:28:48.289558 [DEBUG] sched: <Eval 'c25a1f6f-c0a7-3d41-2806-1ffc610060ef' JobID: 'zookeeper'>: allocs: (place 1) (update 0) (migrate 0) (stop 0) (ignore 0) (lost 0)
2017/06/28 14:28:48.291087 [DEBUG] worker: submitted plan for evaluation c25a1f6f-c0a7-3d41-2806-1ffc610060ef
2017/06/28 14:28:48.291106 [DEBUG] sched: <Eval 'c25a1f6f-c0a7-3d41-2806-1ffc610060ef' JobID: 'zookeeper'>: setting status to complete
2017/06/28 14:28:48.292026 [DEBUG] http: Request /v1/evaluation/c25a1f6f-c0a7-3d41-2806-1ffc610060ef (147.784碌s)
2017/06/28 14:28:48.292967 [DEBUG] worker: updated evaluation <Eval 'c25a1f6f-c0a7-3d41-2806-1ffc610060ef' JobID: 'zookeeper'>
2017/06/28 14:28:48.292994 [DEBUG] worker: ack for evaluation c25a1f6f-c0a7-3d41-2806-1ffc610060ef
2017/06/28 14:28:48.293086 [DEBUG] http: Request /v1/evaluation/c25a1f6f-c0a7-3d41-2806-1ffc610060ef/allocations (59.616碌s)
2017/06/28 14:28:49.295300 [DEBUG] http: Request /v1/evaluation/c25a1f6f-c0a7-3d41-2806-1ffc610060ef (93.644碌s)
2017/06/28 14:28:49.296089 [DEBUG] http: Request /v1/evaluation/c25a1f6f-c0a7-3d41-2806-1ffc610060ef/allocations (76.657碌s)
2017/06/28 14:28:51.120297 [DEBUG] http: Request /v1/agent/members (161.74碌s)
2017/06/28 14:28:51.120904 [DEBUG] http: Request /v1/status/leader?region=global (26.519碌s)
2017/06/28 14:28:51.134218 [DEBUG] http: Request /v1/nodes (88.538碌s)
2017/06/28 14:28:51.140131 [DEBUG] http: Request /v1/node/7dcfd48d-b306-1771-2b91-b334f650e669/allocations (5.196479ms)
2017/06/28 14:28:51.167762 [DEBUG] http: Request /v1/jobs (288.523碌s)
We've tested with build 5811, but in the pull requests, there's this merge about persistent allocations and new evaluations, which at least to us _sounds related:_ https://github.com/hashicorp/nomad/commit/5d9ca1b6840a259e2629ca5f5167a0767a91e8ec
Currently re-running tests with build 5839 at revision e9a55d9 to see.
nope, problem persists with
https://github.com/hashicorp/nomad/commit/e9a55d9e0ec8f3ac9df939a70c4946cb2ee7b55b
problem is not reproducible by stopping and starting jobs with version 0.5.6, at least so far.
@DapperTayra Can you share the client logs
Can you also share nomad alloc-status -json <id> of one of the pending allocs
Here is the json data.
json.txt
We have server and client running together.
We stopped the previous task. Its allocation is still available: Its allocation's executor's log is this:
2017/06/29 17:40:14.847440 [DEBUG] executor: launching command /opt/jacktime/scripts/start_zk.sh
The newly started task doesn't have an allocation folder though.
We are seeing the same issue.
Should be fixed by #2852 which will be released in 0.6rc2 soon.