Consul: service did not deregistered properly

Created on 8 Jun 2017  路  5Comments  路  Source: hashicorp/consul

I have both consul cluster and nomad cluster running in the same 3 nodes(node-0, node-1, node-2) and a service call goddess-of-harvest running with count of 2, deployed by nomad.

After releasing several versions of that service, I noticed a zombie service occured.

After checking the consul log file, I noticed the service was synced and deregistered infinitely. Below is the consul log.

2017/06/07 17:24:57 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:01 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:02 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:06 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:07 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:11 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:12 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:16 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:17 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:21 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:22 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:26 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:27 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:31 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:32 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:36 [INFO] agent: Deregistered service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'
2017/06/07 17:25:37 [INFO] agent: Synced service '_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service'

I tried to call consul deregister api curl -v --request PUT "http://127.0.0.1:8500/v1/agent/service/deregister/goddess-of-harvest-fe". Nothing happened and the service can still be seen in v1/catalog/service api response.

What was strange, however, the consul log said service does not exist.

2017/06/07 16:59:47 [WARN] agent: Failed to deregister service "goddess-of-harvest-fe": Service does not exist

I also tried to call deregister api curl --request PUT --data @payload.json http://127.0.0.1:8500/v1/catalog/deregister with payload file to deregister the service.

{
  "Datacenter": "dc1",
  "Node": "node-2",
  "ServiceID": "_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service"
}

Althought that PUT request returned true, consul catalog api told me the service was still available.

curl -s localhost:8500/v1/catalog/service/goddess-of-harvest-fe | jq .

[
  {
    "ModifyIndex": 28094261,
    "CreateIndex": 28094261,
    "ServiceEnableTagOverride": false,
    "Node": "node-2",
    "Address": "10.0.1.10",
    "TaggedAddresses": {
      "wan": "10.0.1.10",
      "lan": "10.0.1.10"
    },
    "ServiceID": "_nomad-executor-4f37671b-4e02-88cd-e445-94fc626f1f8c-goddess-of-harvest-fe-goddess-of-harvest-fe-goddess-of-harvest-fe-http-service",
    "ServiceName": "demeter-fe",
    "ServiceTags": [
      "demeter-fe",
      "http-service"
    ],
    "ServiceAddress": "10.0.1.10",
    "ServicePort": 38846
  }
]

consul info for both Client and Server

Client:

agent:
    check_monitors = 0
    check_ttls = 0
    checks = 3
    services = 4
build:
    prerelease =
    revision = '21f2d5a
    version = 0.7.5
consul:
    known_servers = 3
    server = false
runtime:
    arch = amd64
    cpu_count = 1
    goroutines = 43
    max_procs = 1
    os = linux
    version = go1.7.5
serf_lan:
    encrypted = false
    event_queue = 0
    event_time = 39
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 2039
    members = 35
    query_queue = 0

Server:

agent:
    check_monitors = 0
    check_ttls = 0
    checks = 3
    services = 5
build:
    prerelease =
    revision = 'a9455cd
    version = 0.7.1
consul:
    bootstrap = false
    known_datacenters = 1
    leader = false
    leader_addr = 10.0.1.3:8300
    server = true
raft:
    applied_index = 28168266
    commit_index = 28168266
    fsm_pending = 0
    last_contact = 50.741295ms
    last_log_index = 28168266
    last_log_term = 43
    last_snapshot_index = 28167119
    last_snapshot_term = 43
    latest_configuration = [{Suffrage:Voter ID:10.0.1.2:8300 Address:10.0.1.2:8300} {Suffrage:Voter ID:10.0.1.3:8300 Address:10.0.1.3:8300} {Suffrage:Voter ID:10.0.1.4:8300 Address:10.0.1.4:8300}]
    latest_configuration_index = 4247902
    num_peers = 2
    protocol_version = 1
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Follower
    term = 43
runtime:
    arch = amd64
    cpu_count = 2
    goroutines = 209
    max_procs = 2
    os = linux
    version = go1.7.3
serf_lan:
    encrypted = false
    event_queue = 0
    event_time = 39
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 2039
    members = 35
    query_queue = 0
    query_time = 2459
serf_wan:
    encrypted = false
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1
    members = 1
    query_queue = 0
    query_time = 1

My consul version is Consul v0.7.1 and noamd version is Nomad v0.5.4.

Thanks

themconsul-nomad theminternal-cleanup typbug

Most helpful comment

Ran into the same issue, only way for me to get rid of the zombies was to stop the consul agent, nuke its data directory and then start it again.

All 5 comments

Hi @waterdudu thanks for opening an issue. Are there any messages in the logs on your Consul servers when this is happening on the agent (the dereg/sync loop and when you try to remove it from the catalog manually)?

Hi @slackpad

Since log level is INFO, there are only a few synced node lines except for the Deregistered service/Synced servcie lines.

2017/06/07 11:36:27 [INFO] agent: Synced node info

Ran into the same issue, only way for me to get rid of the zombies was to stop the consul agent, nuke its data directory and then start it again.

Looks like this is still an issue - at least on the 1.4.3 version. Also, deregistering a service makes a rendom number of other services turn critical for some seconds until they have been run through the next health check. Deleting the data directory fixes the problem.

Same issue on the version 1.8.3

Was this page helpful?
0 / 5 - 0 ratings

Related issues

satheeshCharles picture satheeshCharles  路  3Comments

darron picture darron  路  4Comments

eshujiushiwo picture eshujiushiwo  路  3Comments

hooksie1 picture hooksie1  路  3Comments

runswithd6s picture runswithd6s  路  3Comments