Nomad: Changing count with constraint kills unrelated allocation

Created on 9 Dec 2016 · 12 comments · Source: hashicorp/nomad

Nomad version

0.5.1-rc1

Operating system and Environment details

golang Docker image
Landscape: 1 server and 3 clients in separate Docker containers

Issue

We use constraints to run jobs selectively on nodes. When trying to stop an allocation on a certain node, we observed that an allocation on an unrelated node was killed as well.

First run

ID          = test
Name        = test
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
test        0       0         3        0       0         0

Evaluations
ID                                    Priority  Triggered By  Status    Placement Failures
a4e9ce2d-54f1-c033-f337-cf989a26c3b8  50        job-register  complete  false

Allocations
ID                                    Eval ID                               Node ID                               Task Group  Desired  Status   Created At
5c8fdff9-6059-c8b4-90e2-0b0967e2b9c2  a4e9ce2d-54f1-c033-f337-cf989a26c3b8  77bff70c-6e19-14b0-2aba-2a111e74df64  test        run      running  12/09/16 16:46:41 +03
97623f70-0553-c0f4-2718-211f65a608c3  a4e9ce2d-54f1-c033-f337-cf989a26c3b8  300d9df9-c34a-2ca5-8f07-8d52e437d24c  test        run      running  12/09/16 16:46:41 +03
9e57df5a-2795-3b64-fe3c-31c2ae24acae  a4e9ce2d-54f1-c033-f337-cf989a26c3b8  b2da178c-e411-aa76-fa8b-b4884fd82ed7  test        run      running  12/09/16 16:46:41 +03

Decrease count by one

ID          = test
Name        = test
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
test        0       0         2        0       2         0

Evaluations
ID                                    Priority  Triggered By  Status    Placement Failures
85079a62-dbad-3aea-4d54-cb1384499a28  50        job-register  complete  false
a4e9ce2d-54f1-c033-f337-cf989a26c3b8  50        job-register  complete  false

Allocations
ID                                    Eval ID                               Node ID                               Task Group  Desired  Status    Created At
61531a3d-5536-d8fa-c465-2f2769293aae  85079a62-dbad-3aea-4d54-cb1384499a28  77bff70c-6e19-14b0-2aba-2a111e74df64  test        run      running   12/09/16 16:46:53 +03
5c8fdff9-6059-c8b4-90e2-0b0967e2b9c2  a4e9ce2d-54f1-c033-f337-cf989a26c3b8  77bff70c-6e19-14b0-2aba-2a111e74df64  test        stop     complete  12/09/16 16:46:41 +03
97623f70-0553-c0f4-2718-211f65a608c3  85079a62-dbad-3aea-4d54-cb1384499a28  300d9df9-c34a-2ca5-8f07-8d52e437d24c  test        run      running   12/09/16 16:46:41 +03
9e57df5a-2795-3b64-fe3c-31c2ae24acae  a4e9ce2d-54f1-c033-f337-cf989a26c3b8  b2da178c-e411-aa76-fa8b-b4884fd82ed7  test        stop     complete  12/09/16 16:46:41 +03

Reproduction steps

Run the job with count = 3 (placed across all three client nodes, as shown in "First run" above), then change it to the job file below (count = 2, constraint matching two of the nodes) and re-register. The "Decrease count by one" output above shows the result.

Nomad Server logs

node-master_1  |     2016/12/09 13:46:41.916441 [DEBUG] worker: dequeued evaluation a4e9ce2d-54f1-c033-f337-cf989a26c3b8
node-master_1  |     2016/12/09 13:46:41.916543 [DEBUG] http: Request /v1/jobs?region=global (74.583865ms)
node-master_1  |     2016/12/09 13:46:41.917500 [DEBUG] sched: <Eval 'a4e9ce2d-54f1-c033-f337-cf989a26c3b8' JobID: 'test'>: allocs: (place 3) (update 0) (migrate 0) (stop 0) (ignore 0) (lost 0)
node-master_1  |     2016/12/09 13:46:41.920733 [DEBUG] http: Request /v1/evaluation/a4e9ce2d-54f1-c033-f337-cf989a26c3b8?region=global (199.263µs)
node-master_1  |     2016/12/09 13:46:41.924095 [DEBUG] http: Request /v1/evaluation/a4e9ce2d-54f1-c033-f337-cf989a26c3b8/allocations?region=global (328.009µs)
node-master_1  |     2016/12/09 13:46:41.954725 [DEBUG] worker: submitted plan for evaluation a4e9ce2d-54f1-c033-f337-cf989a26c3b8
node-master_1  |     2016/12/09 13:46:41.954789 [DEBUG] sched: <Eval 'a4e9ce2d-54f1-c033-f337-cf989a26c3b8' JobID: 'test'>: setting status to complete
node-master_1  |     2016/12/09 13:46:41.988504 [DEBUG] worker: updated evaluation <Eval 'a4e9ce2d-54f1-c033-f337-cf989a26c3b8' JobID: 'test'>
node-master_1  |     2016/12/09 13:46:41.988771 [DEBUG] worker: ack for evaluation a4e9ce2d-54f1-c033-f337-cf989a26c3b8
node-master_1  |     2016/12/09 13:46:42.660747 [DEBUG] http: Request /v1/status/peers (302.84µs)
node-master_1  |     2016/12/09 13:46:42.935293 [DEBUG] http: Request /v1/evaluation/a4e9ce2d-54f1-c033-f337-cf989a26c3b8?region=global (606.678µs)
node-master_1  |     2016/12/09 13:46:42.939550 [DEBUG] http: Request /v1/evaluation/a4e9ce2d-54f1-c033-f337-cf989a26c3b8/allocations?region=global (922.802µs)
node-master_1  |     2016/12/09 13:46:44 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:46:44 [INFO] agent: Synced check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:46:44 [INFO] agent: Deregistered check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:46:46 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:46:46 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:46:46 [INFO] agent: Synced check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:46:46 [DEBUG] memberlist: TCP connection from=172.19.0.2:42278
node-master_1  |     2016/12/09 13:46:46 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:46:46 [INFO] agent: Synced check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:46:46 [INFO] agent: Deregistered check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:46:46 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:46:47.652559 [DEBUG] http: Request /v1/jobs?prefix=test (359.173µs)
node-master_1  |     2016/12/09 13:46:47.655421 [DEBUG] http: Request /v1/job/test (256.395µs)
node-master_1  |     2016/12/09 13:46:47.658849 [DEBUG] http: Request /v1/job/test/allocations (448.766µs)
node-master_1  |     2016/12/09 13:46:47.660902 [DEBUG] http: Request /v1/job/test/evaluations (165.403µs)
node-master_1  |     2016/12/09 13:46:47.663151 [DEBUG] http: Request /v1/job/test/summary (133.541µs)
node-master_1  |     2016/12/09 13:46:49 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:46:49 [INFO] agent: Synced check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:46:49 [INFO] agent: Deregistered check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:46:51 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:46:51 [INFO] agent: Synced check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:46:51 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:46:51 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:46:51 [INFO] agent: Synced check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:46:51 [INFO] agent: Deregistered check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:46:52.664246 [DEBUG] http: Request /v1/status/peers (261.888µs)
node-master_1  |     2016/12/09 13:46:53.488955 [DEBUG] worker: dequeued evaluation 85079a62-dbad-3aea-4d54-cb1384499a28
node-master_1  |     2016/12/09 13:46:53.489154 [DEBUG] sched: <Eval '85079a62-dbad-3aea-4d54-cb1384499a28' JobID: 'test'>: allocs: (place 0) (update 2) (migrate 0) (stop 1) (ignore 0) (lost 0)
node-master_1  |     2016/12/09 13:46:53.488971 [DEBUG] http: Request /v1/jobs?region=global (46.215815ms)
node-master_1  |     2016/12/09 13:46:53.489551 [DEBUG] sched: <Eval '85079a62-dbad-3aea-4d54-cb1384499a28' JobID: 'test'>: 1 in-place updates of 2
node-master_1  |     2016/12/09 13:46:53.493578 [DEBUG] http: Request /v1/evaluation/85079a62-dbad-3aea-4d54-cb1384499a28?region=global (598.687µs)
node-master_1  |     2016/12/09 13:46:53.497262 [DEBUG] http: Request /v1/evaluation/85079a62-dbad-3aea-4d54-cb1384499a28/allocations?region=global (1.208662ms)
node-master_1  |     2016/12/09 13:46:53.524795 [DEBUG] worker: submitted plan for evaluation 85079a62-dbad-3aea-4d54-cb1384499a28
node-master_1  |     2016/12/09 13:46:53.524837 [DEBUG] sched: <Eval '85079a62-dbad-3aea-4d54-cb1384499a28' JobID: 'test'>: setting status to complete
node-master_1  |     2016/12/09 13:46:53.550077 [DEBUG] worker: updated evaluation <Eval '85079a62-dbad-3aea-4d54-cb1384499a28' JobID: 'test'>
node-master_1  |     2016/12/09 13:46:53.550150 [DEBUG] worker: ack for evaluation 85079a62-dbad-3aea-4d54-cb1384499a28
node-master_1  |     2016/12/09 13:46:54.500506 [DEBUG] http: Request /v1/evaluation/85079a62-dbad-3aea-4d54-cb1384499a28?region=global (218.34µs)
node-master_1  |     2016/12/09 13:46:54.502787 [DEBUG] http: Request /v1/evaluation/85079a62-dbad-3aea-4d54-cb1384499a28/allocations?region=global (280.865µs)
node-master_1  |     2016/12/09 13:46:54 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:46:54 [INFO] agent: Synced check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:46:54 [INFO] agent: Deregistered check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:46:55.676551 [DEBUG] http: Request /v1/jobs?prefix=test (333.004µs)
node-master_1  |     2016/12/09 13:46:55.678658 [DEBUG] http: Request /v1/job/test (179.087µs)
node-master_1  |     2016/12/09 13:46:55.684467 [DEBUG] http: Request /v1/job/test/allocations (223.334µs)
node-master_1  |     2016/12/09 13:46:55.687675 [DEBUG] http: Request /v1/job/test/evaluations (1.343601ms)
node-master_1  |     2016/12/09 13:46:55.689539 [DEBUG] http: Request /v1/job/test/summary (132.042µs)
node-master_1  |     2016/12/09 13:46:56 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:46:56 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:46:56 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:46:56 [INFO] agent: Synced check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:46:56 [DEBUG] memberlist: TCP connection from=172.19.0.2:42322
node-master_1  |     2016/12/09 13:46:56 [INFO] agent: Deregistered check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:46:56 [INFO] agent: Synced check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:46:59 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:46:59 [INFO] agent: Synced check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:46:59 [INFO] agent: Deregistered check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:01 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:01 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:01 [INFO] agent: Synced check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:47:01 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:01 [INFO] agent: Synced check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:01 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:01 [INFO] agent: Deregistered check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:47:02.666590 [DEBUG] http: Request /v1/status/peers (463.249µs)
node-master_1  |     2016/12/09 13:47:04 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:04 [INFO] agent: Synced check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:04 [INFO] agent: Deregistered check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:06 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:06 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:06 [INFO] agent: Synced check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:47:06 [INFO] agent: Synced check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:06 [DEBUG] memberlist: TCP connection from=172.19.0.2:42338
node-master_1  |     2016/12/09 13:47:06 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:06 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:06 [INFO] agent: Synced check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:09 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:09 [INFO] agent: Synced check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:09 [INFO] agent: Deregistered check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:47:09 [INFO] agent: Deregistered check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:11 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:11 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:11 [INFO] agent: Synced check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:47:11 [INFO] agent: Synced check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:11 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:11 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:11 [INFO] agent: Deregistered check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:47:12.669439 [DEBUG] http: Request /v1/status/peers (480.728µs)
node-master_1  |     2016/12/09 13:47:13 [INFO] agent: Synced check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:14 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:14 [INFO] agent: Synced check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:14 [INFO] agent: Deregistered check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:16 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:16 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:16 [INFO] agent: Synced check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:47:16 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:16 [INFO] agent: Synced check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:16 [INFO] agent: Deregistered check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:16 [DEBUG] memberlist: TCP connection from=172.19.0.2:42354
node-master_1  |     2016/12/09 13:47:16 [INFO] agent: Deregistered check '2e5e445cb4e40a9aa01af2141487873d2b52ec5f'
node-master_1  |     2016/12/09 13:47:19 [INFO] agent: Synced service '_nomad-client-nomad-client-http'
node-master_1  |     2016/12/09 13:47:19 [INFO] agent: Synced check 'f78cd714d79a0632f32d1680b09cb19650591072'
node-master_1  |     2016/12/09 13:47:19 [INFO] agent: Deregistered check 'd2274318992953a57b186e3b571bff4fe17e6b02'
node-master_1  |     2016/12/09 13:47:20 [INFO] agent: Synced check 'f78cd714d79a0632f32d1680b09cb19650591072'

Nomad Client logs

node-worker_2  |     2016/12/09 13:46:41.966382 [DEBUG] client: starting task context for 'test' (alloc '97623f70-0553-c0f4-2718-211f65a608c3')
node-worker_3  |     2016/12/09 13:46:16.108946 [DEBUG] driver.exec: exec driver is enabled
node-worker_1  |     2016/12/09 13:46:14.551683 [DEBUG] driver.exec: exec driver is enabled
node-worker_2  |     2016/12/09 13:46:41 [DEBUG] plugin: starting plugin: /usr/local/bin/nomad []string{"/usr/local/bin/nomad", "executor", "/tmp/nomad/alloc/97623f70-0553-c0f4-2718-211f65a608c3/test/test-executor.out"}
node-worker_3  |     2016/12/09 13:46:16.109117 [DEBUG] client: available drivers [exec raw_exec]
node-worker_1  |     2016/12/09 13:46:14.551703 [DEBUG] client: available drivers [raw_exec exec]
node-worker_2  |     2016/12/09 13:46:41 [DEBUG] plugin: waiting for RPC address for: /usr/local/bin/nomad
node-worker_3  |     2016/12/09 13:46:16.109479 [DEBUG] client: fingerprinting exec every 15s
node-worker_1  |     2016/12/09 13:46:14.551811 [DEBUG] client: fingerprinting docker every 15s
node-worker_2  |     2016/12/09 13:46:41 [DEBUG] plugin: nomad: 2016/12/09 13:46:41 [DEBUG] plugin: plugin address: unix /tmp/plugin393590857
node-worker_3  |     2016/12/09 13:46:16.109511 [DEBUG] client: fingerprinting docker every 15s
node-worker_1  |     2016/12/09 13:46:14.551863 [DEBUG] client: fingerprinting exec every 15s
node-worker_2  |     2016/12/09 13:46:42.008656 [DEBUG] driver.raw_exec: started process with pid: 53
node-worker_3  |     2016/12/09 13:46:16.109533 [DEBUG] client: fingerprinting rkt every 15s
node-worker_1  |     2016/12/09 13:46:14.555113 [INFO] client: Node ID "b2da178c-e411-aa76-fa8b-b4884fd82ed7"
node-worker_2  |     2016/12/09 13:46:42.222906 [DEBUG] client: updated allocations at index 14 (pulled 0) (filtered 1)
node-worker_3  |     2016/12/09 13:46:16.111698 [INFO] client: Node ID "77bff70c-6e19-14b0-2aba-2a111e74df64"
node-worker_1  |     2016/12/09 13:46:14.558544 [DEBUG] client: updated allocations at index 1 (pulled 0) (filtered 0)
node-worker_2  |     2016/12/09 13:46:42.223185 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 1)
node-worker_3  |     2016/12/09 13:46:16.115786 [DEBUG] client: updated allocations at index 1 (pulled 0) (filtered 0)
node-worker_1  |     2016/12/09 13:46:14.559167 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 0)
node-worker_2  |     2016/12/09 13:46:53.525526 [DEBUG] client: updated allocations at index 17 (pulled 1) (filtered 0)
node-worker_3  |     2016/12/09 13:46:16.115975 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 0)
node-worker_2  |     2016/12/09 13:46:53.529408 [DEBUG] client: allocs: (added 0) (removed 0) (updated 1) (ignore 0)
node-worker_2  |     2016/12/09 13:46:56.496575 [DEBUG] http: Request /v1/agent/servers (385.242µs)
node-worker_1  |     2016/12/09 13:46:14.602584 [INFO] client: node registration complete
node-worker_3  |     2016/12/09 13:46:16.225712 [INFO] client: node registration complete
node-worker_1  |     2016/12/09 13:46:14.602675 [DEBUG] client: periodically checking for node changes at duration 5s
node-worker_3  |     2016/12/09 13:46:16.225891 [DEBUG] client: periodically checking for node changes at duration 5s
node-worker_1  |     2016/12/09 13:46:23.962825 [DEBUG] client: state updated to ready
node-worker_3  |     2016/12/09 13:46:23.557982 [DEBUG] client: state updated to ready
node-worker_1  |     2016/12/09 13:46:26.237898 [DEBUG] http: Request /v1/agent/servers (656.818µs)
node-worker_3  |     2016/12/09 13:46:31.359478 [DEBUG] http: Request /v1/agent/servers (1.998523ms)
node-worker_1  |     2016/12/09 13:46:40.940483 [DEBUG] http: Request /v1/agent/servers (601.585µs)
node-worker_3  |     2016/12/09 13:46:41.955034 [DEBUG] client: updated allocations at index 12 (pulled 1) (filtered 0)
node-worker_1  |     2016/12/09 13:46:41.956297 [DEBUG] client: updated allocations at index 12 (pulled 1) (filtered 0)
node-worker_3  |     2016/12/09 13:46:41.959128 [DEBUG] client: allocs: (added 1) (removed 0) (updated 0) (ignore 0)
node-worker_1  |     2016/12/09 13:46:41.962678 [DEBUG] client: allocs: (added 1) (removed 0) (updated 0) (ignore 0)
node-worker_3  |     2016/12/09 13:46:41.964146 [DEBUG] client: starting task runners for alloc '5c8fdff9-6059-c8b4-90e2-0b0967e2b9c2'
node-worker_1  |     2016/12/09 13:46:41.965021 [DEBUG] client: starting task runners for alloc '9e57df5a-2795-3b64-fe3c-31c2ae24acae'
node-worker_3  |     2016/12/09 13:46:41.964363 [DEBUG] client: starting task context for 'test' (alloc '5c8fdff9-6059-c8b4-90e2-0b0967e2b9c2')
node-worker_1  |     2016/12/09 13:46:41.965158 [DEBUG] client: starting task context for 'test' (alloc '9e57df5a-2795-3b64-fe3c-31c2ae24acae')
node-worker_3  |     2016/12/09 13:46:41 [DEBUG] plugin: starting plugin: /usr/local/bin/nomad []string{"/usr/local/bin/nomad", "executor", "/tmp/nomad/alloc/5c8fdff9-6059-c8b4-90e2-0b0967e2b9c2/test/test-executor.out"}
node-worker_1  |     2016/12/09 13:46:41 [DEBUG] plugin: starting plugin: /usr/local/bin/nomad []string{"/usr/local/bin/nomad", "executor", "/tmp/nomad/alloc/9e57df5a-2795-3b64-fe3c-31c2ae24acae/test/test-executor.out"}
node-worker_3  |     2016/12/09 13:46:41 [DEBUG] plugin: waiting for RPC address for: /usr/local/bin/nomad
node-worker_1  |     2016/12/09 13:46:41 [DEBUG] plugin: waiting for RPC address for: /usr/local/bin/nomad
node-worker_3  |     2016/12/09 13:46:41 [DEBUG] plugin: nomad: 2016/12/09 13:46:41 [DEBUG] plugin: plugin address: unix /tmp/plugin828672124
node-worker_1  |     2016/12/09 13:46:41 [DEBUG] plugin: nomad: 2016/12/09 13:46:41 [DEBUG] plugin: plugin address: unix /tmp/plugin077826786
node-worker_3  |     2016/12/09 13:46:41.995077 [DEBUG] driver.raw_exec: started process with pid: 55
node-worker_1  |     2016/12/09 13:46:42.000928 [DEBUG] driver.raw_exec: started process with pid: 55
node-worker_3  |     2016/12/09 13:46:42.224788 [DEBUG] client: updated allocations at index 14 (pulled 0) (filtered 1)
node-worker_1  |     2016/12/09 13:46:42.223212 [DEBUG] client: updated allocations at index 14 (pulled 0) (filtered 1)
node-worker_3  |     2016/12/09 13:46:42.225486 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 1)
node-worker_1  |     2016/12/09 13:46:42.223610 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 1)
node-worker_3  |     2016/12/09 13:46:53.525947 [DEBUG] client: updated allocations at index 17 (pulled 2) (filtered 0)
node-worker_1  |     2016/12/09 13:46:53.525085 [DEBUG] client: updated allocations at index 17 (pulled 1) (filtered 0)
node-worker_3  |     2016/12/09 13:46:53.527484 [DEBUG] client: allocs: (added 1) (removed 0) (updated 1) (ignore 0)
node-worker_1  |     2016/12/09 13:46:53.530257 [DEBUG] client: allocs: (added 0) (removed 0) (updated 1) (ignore 0)
node-worker_1  |     2016/12/09 13:46:53 [DEBUG] plugin: /usr/local/bin/nomad: plugin process exited
node-worker_1  |     2016/12/09 13:46:53.629011 [DEBUG] client: updated allocations at index 19 (pulled 0) (filtered 1)
node-worker_3  |     2016/12/09 13:46:53.528743 [DEBUG] client: starting task runners for alloc '61531a3d-5536-d8fa-c465-2f2769293aae'
node-worker_1  |     2016/12/09 13:46:53.629197 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 1)
node-worker_3  |     2016/12/09 13:46:53.528979 [DEBUG] client: starting task context for 'test' (alloc '61531a3d-5536-d8fa-c465-2f2769293aae')
node-worker_3  |     2016/12/09 13:46:53 [DEBUG] plugin: starting plugin: /usr/local/bin/nomad []string{"/usr/local/bin/nomad", "executor", "/tmp/nomad/alloc/61531a3d-5536-d8fa-c465-2f2769293aae/test/test-executor.out"}
node-worker_3  |     2016/12/09 13:46:53 [DEBUG] plugin: waiting for RPC address for: /usr/local/bin/nomad
node-worker_3  |     2016/12/09 13:46:53 [DEBUG] plugin: /usr/local/bin/nomad: plugin process exited
node-worker_3  |     2016/12/09 13:46:53 [DEBUG] plugin: nomad: 2016/12/09 13:46:53 [DEBUG] plugin: plugin address: unix /tmp/plugin785637411
node-worker_3  |     2016/12/09 13:46:53.551602 [DEBUG] driver.raw_exec: started process with pid: 76
node-worker_3  |     2016/12/09 13:46:53.802529 [DEBUG] client: updated allocations at index 20 (pulled 0) (filtered 2)
node-worker_3  |     2016/12/09 13:46:53.802634 [DEBUG] client: allocs: (added 0) (removed 0) (updated 0) (ignore 2)

Job file (if appropriate)

job "test" {
    datacenters = ["dc1"]

    constraint {
      attribute = "${node.unique.id}"
      value     = "300d9df9|77bff70c"
      operator  = "regexp"
    }

    type = "service"

    group "test" {
        count = 2
        restart {
            interval = "5m"
            attempts = 20
            delay = "10s"
            mode = "delay"
        }

        task "test" {
            driver = "raw_exec"
            config {
command = "/bin/sleep"
                args = ["1000"]
            }

            resources {
                cpu = 100
                memory = 100
                // disk = 110
                network {
                    mbits = 1
                }
            }
        }
    }
}

All 12 comments

@kaskavalci I couldn't reproduce this.

I created two clients and a server on the same node, used your job file, set the count to 3, and then decremented it to 2; I saw only one alloc move to the stopped state.

Can you try a couple of times? It is somewhat sporadic but happens fairly often. I will also share my setup with you through Gitter.

@kaskavalci Hey, I understand why this is happening now, and it isn't really a bug but a side effect of the system's design. When you scale down, the allocation with the highest alloc index is destroyed. So even though you are targeting a particular set of nodes to keep, it may still cause a destroy on one of them.

To be clear, the naming is job.taskgroup[0..count-1]; this ensures that the set of allocations that exists is always indexed between 0 and count - 1.
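
In other words, with this job the allocations are named test.test[0], test.test[1], and test.test[2], and scaling down removes the highest index, whichever node it happens to be on (the index-to-node mapping below is illustrative only):

count = 3:  test.test[0]  test.test[1]  test.test[2]
count = 2:  test.test[0]  test.test[1]               <- test.test[2] is stopped,
                                                        regardless of its node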

Hi @dadgar, thanks for looking into it. Is it possible to respect the new constraint without disturbing the other allocations? A follow-up question: will only one alloc be restarted in such a case, or could almost all allocs end up being restarted?

@kaskavalci It is very much possible for all allocs to be affected. It all depends on which nodes get to stay and their alloc indexes.

It is something we have been thinking about, as we want to bring life-cycle hooks to allocations, but it is a pretty core part of the scheduler's design and it has nice side effects for users: they can use the alloc index for leader election, sharding, coordination, etc. So for now there isn't really a way to kill particular allocations like that.
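
As an aside on that point, the alloc index is exposed to tasks at runtime, which is what makes it usable for sharding; a minimal sketch using the NOMAD_ALLOC_INDEX environment variable (the worker binary and its --shard flag are hypothetical):

task "shard" {
    driver = "raw_exec"
    config {
        command = "/bin/sh"
        # NOMAD_ALLOC_INDEX is set in the task environment, so each
        # allocation picks up its own shard number at startup
        args = ["-c", "exec /usr/local/bin/worker --shard $NOMAD_ALLOC_INDEX"]
    }
}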

Can you describe your use case?

@dadgar I see. We want users to be able to kill all allocations on a given node, or add/remove nodes from jobs, without a complete stop -> start cycle. When dealing with stateful jobs that hold large amounts of data in memory, you don't want to restart them for no reason. Users should be able to scale down for maintenance, or turn off a node to save energy, without disrupting the cluster.

@kaskavalci Have you seen nomad node-drain? We support that case. Scaling down while killing a particular allocation is still a WIP.
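
For reference, a minimal CLI sketch of draining a node, using a node ID from the allocation listing above:

# mark the node as draining; the scheduler migrates its allocations elsewhere
nomad node-drain -enable 300d9df9-c34a-2ca5-8f07-8d52e437d24c
# re-enable scheduling on the node once maintenance is done
nomad node-drain -disable 300d9df9-c34a-2ca5-8f07-8d52e437d24c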

Is it possible to do it via HTTP API?
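
For reference, drain can also be toggled over HTTP; a sketch assuming the 0.5-era endpoint (PUT /v1/node/<node-id>/drain with an enable query parameter):

curl -X PUT "http://localhost:4646/v1/node/300d9df9-c34a-2ca5-8f07-8d52e437d24c/drain?enable=true"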

I am going to close this issue since it is not a "bug" and the reason has been explained. This is not to say we aren't interested in the follow-up discussion that was had; it is part of the greater life-cycle control story and is important to us.

@dadgar What about node-drain's effect on the other allocations? Could we expect restarts as with scaling down? Assume count = 2 and the job is running on Node1 and Node2 as shown below:

  1. Initial configuration
Node1 [ x ]
Node2 [ x ]
Node3 [   ]
  2. Drain Node2; the job migrates to Node3
Node1 [ x ] <-- can we guarantee that this alloc will not be affected?
Node2 [   ]
Node3 [ x ]

@kaskavalci You can't guarantee that the new alloc doesn't get placed on Node1 unless you use the distinct_hosts constraint, but the original allocation(s) on Node1 will not be affected.
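
For completeness, a minimal sketch of the distinct_hosts constraint mentioned above, placed at the job or group level:

constraint {
    operator = "distinct_hosts"
    value    = "true"
}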
