I have both consul cluster and nomad cluster running in the same 3 nodes(node-0, node-1, node-2) and a service call goddess-of-harvest running with count of 2, deployed by nomad.
After releasing several versions of that service, I noticed a zombie service occured.
After checking the consul log file, I noticed the service was synced and deregistered infinitely. Below is the consul log.
2017/06/07 17:24:57 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:01 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:02 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:06 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:07 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:11 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:12 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:16 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:17 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:21 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:22 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:26 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:27 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:31 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:32 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:36 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:37 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
I tried to call consul deregister api curl -v --request PUT "http://127.0.0.1:8500/v1/agent/service/deregister/goddess-of-harvest-fe". Nothing happened and the service can still be seen in v1/catalog/service api response.
What was strange, however, the consul log said service does not exist.
2017/06/07 16:59:47 [WARN] agent: Failed to deregister service "goddess-of-harvest-fe": Service does not exist
I also tried to call deregister api curl --request PUT --data @payload.json http://127.0.0.1:8500/v1/catalog/deregister with payload file to deregister the service.
{
"Datacenter": "dc1",
"Node": "node-2",
"ServiceID": "_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service"
}
Althought that PUT request returned true, consul catalog api told me the service was still available.
curl -s localhost:8500/v1/catalog/service/goddess-of-harvest-fe | jq .
[
{
"ModifyIndex": 28094261,
"CreateIndex": 28094261,
"ServiceEnableTagOverride": false,
"Node": "node-2",
"Address": "10.0.1.10",
"TaggedAddresses": {
"wan": "10.0.1.10",
"lan": "10.0.1.10"
},
"ServiceID": "_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service",
"ServiceName": "demeter-fe",
"ServiceTags": [
"demeter-fe",
"http-service"
],
"ServiceAddress": "10.0.1.10",
"ServicePort": 38846
}
]
consul info for both Client and ServerClient:
agent:
check_monitors = 0
check_ttls = 0
checks = 3
services = 4
build:
prerelease =
revision = '21f2d5a
version = 0.7.5
consul:
known_servers = 3
server = false
runtime:
arch = amd64
cpu_count = 1
goroutines = 43
max_procs = 1
os = linux
version = go1.7.5
serf_lan:
encrypted = false
event_queue = 0
event_time = 39
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 2039
members = 35
query_queue = 0
Server:
agent:
check_monitors = 0
check_ttls = 0
checks = 3
services = 5
build:
prerelease =
revision = 'a9455cd
version = 0.7.1
consul:
bootstrap = false
known_datacenters = 1
leader = false
leader_addr = 10.0.1.3:8300
server = true
raft:
applied_index = 28168266
commit_index = 28168266
fsm_pending = 0
last_contact = 50.741295ms
last_log_index = 28168266
last_log_term = 43
last_snapshot_index = 28167119
last_snapshot_term = 43
latest_configuration = [{Suffrage:Voter ID:10.0.1.2:8300 Address:10.0.1.2:8300} {Suffrage:Voter ID:10.0.1.3:8300 Address:10.0.1.3:8300} {Suffrage:Voter ID:10.0.1.4:8300 Address:10.0.1.4:8300}]
latest_configuration_index = 4247902
num_peers = 2
protocol_version = 1
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Follower
term = 43
runtime:
arch = amd64
cpu_count = 2
goroutines = 209
max_procs = 2
os = linux
version = go1.7.3
serf_lan:
encrypted = false
event_queue = 0
event_time = 39
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 2039
members = 35
query_queue = 0
query_time = 2459
serf_wan:
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1
members = 1
query_queue = 0
query_time = 1
My consul version is Consul v0.7.1 and noamd version is Nomad v0.5.4.
Thanks
Hi @waterdudu thanks for opening an issue. Are there any messages in the logs on your Consul servers when this is happening on the agent (the dereg/sync loop and when you try to remove it from the catalog manually)?
Hi @slackpad
Since log level is INFO, there are only a few synced node lines except for the Deregistered service/Synced servcie lines.
2017/06/07 11:36:27 [INFO] agent: Synced node info
Ran into the same issue, only way for me to get rid of the zombies was to stop the consul agent, nuke its data directory and then start it again.
Looks like this is still an issue - at least on the 1.4.3 version. Also, deregistering a service makes a rendom number of other services turn critical for some seconds until they have been run through the next health check. Deleting the data directory fixes the problem.
Same issue on the version 1.8.3
Most helpful comment
Ran into the same issue, only way for me to get rid of the zombies was to stop the consul agent, nuke its data directory and then start it again.