After a Consul client performs consul leave, a new one is brought up with the same name and a different IP. We then receive these errors:
2019/12/06 18:32:40 [ERR] memberlist: Conflicting address for ip-10-0-103-217.ec2.internal. Mine: 10.0.103.165:8301 Theirs: 10.0.103.226:8301 Old state: 2
2019/12/06 18:32:40 [WARN] serf: Name conflict for 'ip-10-0-103-217.ec2.internal' both 10.0.103.165:8301 and 10.0.103.226:8301 are claiming
This goes on for about 42 seconds, and the node remains in the left state until it rejoins.
The pod is deleted at 10:32:32 (and performs consul leave), comes back almost immediately, and is fully alive at 10:33:17.
consul members output over time
Fri 6 Dec 2019 10:32:32 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 alive client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:35 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:37 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:39 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:42 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:44 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:46 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:49 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:51 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:53 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:56 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:32:58 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:33:00 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:33:03 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:33:05 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:33:08 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:33:10 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:33:12 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:33:15 PST
ip-10-0-103-217.ec2.internal 10.0.103.165:8301 left client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:33:17 PST
ip-10-0-103-217.ec2.internal 10.0.103.226:8301 alive client 1.6.2 2 dc1 <default>
Fri 6 Dec 2019 10:33:19 PST
ip-10-0-103-217.ec2.internal 10.0.103.226:8301 alive client 1.6.2 2 dc1 <default>
server 0 logs
2019/12/06 18:32:34 [INFO] serf: EventMemberLeave: ip-10-0-103-217.ec2.internal 10.0.103.165
2019/12/06 18:32:34 [INFO] consul: member 'ip-10-0-103-217.ec2.internal' left, deregistering
2019/12/06 18:32:40 [ERR] memberlist: Conflicting address for ip-10-0-103-217.ec2.internal. Mine: 10.0.103.165:8301 Theirs: 10.0.103.226:8301 Old state: 2
2019/12/06 18:32:40 [WARN] serf: Name conflict for 'ip-10-0-103-217.ec2.internal' both 10.0.103.165:8301 and 10.0.103.226:8301 are claiming
2019/12/06 18:32:41 [ERR] memberlist: Conflicting address for ip-10-0-103-217.ec2.internal. Mine: 10.0.103.165:8301 Theirs: 10.0.103.226:8301 Old state: 2
2019/12/06 18:32:41 [WARN] serf: Name conflict for 'ip-10-0-103-217.ec2.internal' both 10.0.103.165:8301 and 10.0.103.226:8301 are claiming
2019/12/06 18:32:41 [ERR] memberlist: Conflicting address for ip-10-0-103-217.ec2.internal. Mine: 10.0.103.165:8301 Theirs: 10.0.103.226:8301 Old state: 2
2019/12/06 18:32:41 [WARN] serf: Name conflict for 'ip-10-0-103-217.ec2.internal' both 10.0.103.165:8301 and 10.0.103.226:8301 are claiming
2019/12/06 18:32:49 [INFO] consul: member 'ip-10-0-103-217.ec2.internal' left, deregistering
2019/12/06 18:33:17 [INFO] serf: EventMemberJoin: ip-10-0-103-217.ec2.internal 10.0.103.226
2019/12/06 18:33:17 [INFO] consul: member 'ip-10-0-103-217.ec2.internal' joined, marking health alive
server 1 logs
2019/12/06 18:32:34 [INFO] serf: EventMemberLeave: ip-10-0-103-217.ec2.internal 10.0.103.165
2019/12/06 18:32:40 [ERR] memberlist: Conflicting address for ip-10-0-103-217.ec2.internal. Mine: 10.0.103.165:8301 Theirs: 10.0.103.226:8301 Old state: 2
2019/12/06 18:32:40 [WARN] serf: Name conflict for 'ip-10-0-103-217.ec2.internal' both 10.0.103.165:8301 and 10.0.103.226:8301 are claiming
2019/12/06 18:33:17 [INFO] serf: EventMemberJoin: ip-10-0-103-217.ec2.internal 10.0.103.226
server 2 logs
2019/12/06 18:32:35 [INFO] serf: EventMemberLeave: ip-10-0-103-217.ec2.internal 10.0.103.165
2019/12/06 18:32:40 [ERR] memberlist: Conflicting address for ip-10-0-103-217.ec2.internal. Mine: 10.0.103.165:8301 Theirs: 10.0.103.226:8301 Old state: 2
2019/12/06 18:32:40 [WARN] serf: Name conflict for 'ip-10-0-103-217.ec2.internal' both 10.0.103.165:8301 and 10.0.103.226:8301 are claiming
2019/12/06 18:33:17 [INFO] serf: EventMemberJoin: ip-10-0-103-217.ec2.internal 10.0.103.226
client logs
2019/12/06 18:32:40 [WARN] agent: Node name "ip-10-0-103-217.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2019/12/06 18:32:40 [INFO] serf: EventMemberJoin: ip-10-0-103-217.ec2.internal 10.0.103.226
2019/12/06 18:32:40 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
2019/12/06 18:32:40 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
2019/12/06 18:32:40 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
2019/12/06 18:32:40 [INFO] agent: started state syncer
==> Consul agent running!
2019/12/06 18:32:40 [INFO] agent: Started gRPC server on [::]:8502 (tcp)
2019/12/06 18:32:40 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s mdns os packet scaleway softlayer triton vsphere
2019/12/06 18:32:40 [INFO] agent: Joining LAN cluster...
2019/12/06 18:32:40 [INFO] agent: (LAN) joining: [hashicorp-consul-server-0.hashicorp-consul-server.dec2.svc hashicorp-consul-server-1.hashicorp-consul-server.dec2.svc hashicorp-consul-server-2.hashicorp-consul-server.dec2.svc]
2019/12/06 18:32:40 [WARN] manager: No servers available
2019/12/06 18:32:40 [ERR] agent: failed to sync remote state: No known Consul servers
2019/12/06 18:32:40 [INFO] serf: EventMemberJoin: hashicorp-consul-server-1 10.0.2.195
2019/12/06 18:32:40 [INFO] serf: EventMemberJoin: hashicorp-consul-server-0 10.0.2.85
2019/12/06 18:32:40 [WARN] memberlist: Refuting a suspect message (from: ip-10-0-103-217.ec2.internal)
2019/12/06 18:32:40 [INFO] serf: EventMemberJoin: ip-10-0-2-46.ec2.internal 10.0.2.188
2019/12/06 18:32:40 [INFO] serf: EventMemberJoin: ip-10-0-2-184.ec2.internal 10.0.2.215
2019/12/06 18:32:40 [INFO] serf: EventMemberJoin: hashicorp-consul-server-2 10.0.103.5
2019/12/06 18:32:40 [INFO] consul: adding server hashicorp-consul-server-1 (Addr: tcp/10.0.2.195:8300) (DC: dc1)
2019/12/06 18:32:40 [INFO] consul: adding server hashicorp-consul-server-0 (Addr: tcp/10.0.2.85:8300) (DC: dc1)
2019/12/06 18:32:40 [INFO] consul: adding server hashicorp-consul-server-2 (Addr: tcp/10.0.103.5:8300) (DC: dc1)
2019/12/06 18:32:40 [INFO] agent: (LAN) joined: 3
2019/12/06 18:32:40 [INFO] agent: Join LAN completed. Synced with 3 initial agents
2019/12/06 18:32:41 [INFO] agent: Synced service "static-server-static-server"
2019/12/06 18:34:21 [INFO] agent: Synced service "static-server-static-server"
My initial thought is that the most suspicious part of the logs is that server 0 sees the original IP as left and deregisters it, but then still considers the new IP to be a conflict for ~30 seconds (it's actually a bit more, but I think that's just because things wait a few seconds between retries to sync state). After ~30 seconds, the server logs for a second time that the member left and is being deregistered, and the next attempt by the new IP to join after that succeeds.
30s happens to be the default value for both the PushPullInterval and the GossipToTheDeadTime in memberlist: https://github.com/hashicorp/memberlist/blob/bbfeba327f20b6edce65a2353b64aad44804f829/config.go#L245-L253
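For reference, both defaults are easy to confirm against the memberlist package directly (a minimal sketch; both fields exist on memberlist.Config):

package main

import (
	"fmt"

	"github.com/hashicorp/memberlist"
)

func main() {
	// DefaultLANConfig carries the defaults Consul inherits when it
	// doesn't override them; both of these print 30s.
	cfg := memberlist.DefaultLANConfig()
	fmt.Println("PushPullInterval:   ", cfg.PushPullInterval)
	fmt.Println("GossipToTheDeadTime:", cfg.GossipToTheDeadTime)
}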
I suspect it's the second one causing the issue here. At the memberlist layer, nodes are either Alive, Suspect, or Dead - it makes no distinction between failed nodes and ones that voluntarily left the cluster (join/leave are Serf-layer concerns) - so it keeps the "dead" node in the member list for 30 seconds until that GossipToTheDeadTime has passed. Only once that's happened, and the node has been removed from the leader's memberlist, will the new IP be allowed to join and so be recognised as healthy by Consul. It's confusing that memberlist uses the terms "leave" and "dead" inconsistently: "dead" nodes are notified to the Serf layer via NotifyLeave, and you call Leave on termination, but internally that is the same as reporting yourself as dead, so other nodes don't know whether it was intentional (at the memberlist layer - Serf knows).
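To make that concrete, here's a minimal sketch against memberlist's real EventDelegate interface: there is only a single NotifyLeave callback, so a crashed node and a node that called Leave() look identical to whoever is listening.

package main

import (
	"fmt"

	"github.com/hashicorp/memberlist"
)

// leaveLogger implements memberlist.EventDelegate. Note the single
// NotifyLeave hook: memberlist fires it for failed nodes and for nodes
// that departed on purpose alike.
type leaveLogger struct{}

func (leaveLogger) NotifyJoin(n *memberlist.Node)   { fmt.Println("join:", n.Name) }
func (leaveLogger) NotifyLeave(n *memberlist.Node)  { fmt.Println("dead or left:", n.Name) }
func (leaveLogger) NotifyUpdate(n *memberlist.Node) { fmt.Println("update:", n.Name) }

func main() {
	cfg := memberlist.DefaultLANConfig()
	cfg.Events = leaveLogger{}
	if _, err := memberlist.Create(cfg); err != nil {
		panic(err)
	}
}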
The fix, it would seem, would be to somehow make memberlist aware of intentional leaves so that we don't keep gossiping to those nodes to allow refutation. Intuitively this makes a lot of sense, but it could be a significant change to that library, which carries a decent amount of risk.
As a workaround, and to confirm my understanding, @lkysow we could try this repro with a custom build of Consul that has GossipToTheDeadTime set to 1 second right here rather than left at the default value: https://github.com/hashicorp/consul/blob/dfa8d919f76c2ed9839973ea62e991fb2ab1abba/agent/agent.go#L1106
I don't think just making that configurable is a fix, though, since the delay is important to avoid flapping in actual failure cases.
Okay I'll try that and report back.
To be clear, I linked the line for the RetransmitMult setting, which is not directly relevant in itself; I meant to indicate that any memberlist config we don't currently set would need to be added to that block to make it through to memberlist. Here is the complete block if that's clearer. None of these lines need to change; a new one would be added in the same way to set GossipToTheDeadTime as well.
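Something like this standalone sketch of the idea (the config fields are real serf/memberlist fields; the values and the final assignment are illustrative additions for the test build, not shipped Consul code):

package main

import (
	"time"

	"github.com/hashicorp/serf/serf"
)

func main() {
	// Consul's agent code copies its gossip tuning onto the embedded
	// memberlist config one assignment at a time; the experiment just
	// adds one more assignment in the same style.
	base := serf.DefaultConfig()
	base.MemberlistConfig.RetransmitMult = 4 // existing assignments look like this (value illustrative)
	base.MemberlistConfig.SuspicionMult = 4  // (value illustrative)

	// The hypothetical new line for the custom test build only:
	base.MemberlistConfig.GossipToTheDeadTime = 1 * time.Second

	_ = base // Consul would go on to build its LAN Serf from this config
}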
This seemed to work! There were no name conflict errors.
consul members output over time
ip-10-0-103-217.ec2.internal 10.0.103.145:8301 alive client 1.7.0dev 2 dc1 <default>
Thu 12 Dec 2019 10:45:58 PST
ip-10-0-103-217.ec2.internal 10.0.103.145:8301 left client 1.7.0dev 2 dc1 <default>
Thu 12 Dec 2019 10:46:00 PST
ip-10-0-103-217.ec2.internal 10.0.103.145:8301 left client 1.7.0dev 2 dc1 <default>
Thu 12 Dec 2019 10:46:03 PST
ip-10-0-103-217.ec2.internal 10.0.103.145:8301 left client 1.7.0dev 2 dc1 <default>
Thu 12 Dec 2019 10:46:05 PST
ip-10-0-103-217.ec2.internal 10.0.103.226:8301 alive client 1.7.0dev 2 dc1 <default>
Thu 12 Dec 2019 10:46:07 PST
ip-10-0-103-217.ec2.internal 10.0.103.226:8301 alive client 1.7.0dev 2 dc1 <default>
Thu 12 Dec 2019 10:46:10 PST
ip-10-0-103-217.ec2.internal 10.0.103.226:8301 alive client 1.7.0dev 2 dc1 <default>
Thu 12 Dec 2019 10:46:12 PST
ip-10-0-103-217.ec2.internal 10.0.103.226:8301 alive client 1.7.0dev 2 dc1 <default>
server 0 logs
2019/12/12 18:45:57 [INFO] serf: EventMemberLeave: ip-10-0-103-217.ec2.internal 10.0.103.145
2019/12/12 18:46:04 [INFO] serf: EventMemberJoin: ip-10-0-103-217.ec2.internal 10.0.103.226
server 1 logs
2019/12/12 18:45:57 [INFO] serf: EventMemberLeave: ip-10-0-103-217.ec2.internal 10.0.103.145
2019/12/12 18:45:57 [INFO] consul: member 'ip-10-0-103-217.ec2.internal' left, deregistering
2019/12/12 18:46:04 [INFO] serf: EventMemberJoin: ip-10-0-103-217.ec2.internal 10.0.103.226
2019/12/12 18:46:04 [INFO] consul: member 'ip-10-0-103-217.ec2.internal' joined, marking health alive
server 2 logs
2019/12/12 18:45:57 [INFO] serf: EventMemberLeave: ip-10-0-103-217.ec2.internal 10.0.103.145
2019/12/12 18:46:04 [INFO] serf: EventMemberJoin: ip-10-0-103-217.ec2.internal 10.0.103.226
Thanks @lkysow
So it looks like it is what I suggested, and the real fix would involve memberlist and Serf changes to make intentional Serf leaves behave differently in memberlist so it doesn't keep old IPs around for that interval.
@s-christoff would this be possible using the force-leave/purge functionality you worked on last year?
Howdy!
So the neato thing about force-leave -prune is that it bypasses the timeouts and rips out any knowledge of the $DEADNODE that any node may have. It does this by sending out an EventMemberReap event whenever it is called. So in this case you could use it and should get the desired result here.
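For anyone wanting to drive that from code rather than the CLI, a minimal sketch using the Go API client (this assumes the api package's ForceLeavePrune helper, which should map to PUT /v1/agent/force-leave/<node>?prune - double-check the name against your api version):

package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	// Equivalent of `consul force-leave -prune <node>`: force the node out
	// and broadcast a reap so every agent forgets it immediately.
	if err := client.Agent().ForceLeavePrune("ip-10-0-103-217.ec2.internal"); err != nil {
		log.Fatal(err)
	}
}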
Regarding Consul Connect on Kubernetes: these conflict errors cause the services on that node to be deregistered. Here we see what happens when we delete the Consul client pod on the node that's running static-server while curling static-server from static-client (which has static-server as its upstream). Note that if there were more than one instance of static-server (which is expected for a Kubernetes deployment), there would be no downtime.
18:45:49 the consul client pod is deleted.
18:45:58 the new consul client pod starts and tries to join the cluster.
18:46:07 the static-server service is re-registered with the new client and we see the requests succeeding again.
18:46:31 the requests start to fail. This coincides with the server log 2020/01/16 18:46:31 [INFO] consul: member 'aks-agentpool-24451568-vmss000001' left, deregistering.
18:46:39 the requests succeed again, slightly ahead of the server logs about that node being re-registered.
curl from static-client to static-server
Thu 16 Jan 2020 12:45:41 CST
"hello world"
Thu 16 Jan 2020 12:45:43 CST
"hello world"
Thu 16 Jan 2020 12:45:46 CST
"hello world"
Thu 16 Jan 2020 12:45:49 CST
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
Thu 16 Jan 2020 12:45:51 CST
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
Thu 16 Jan 2020 12:45:54 CST
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
Thu 16 Jan 2020 12:45:56 CST
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
Thu 16 Jan 2020 12:45:59 CST
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
Thu 16 Jan 2020 12:46:02 CST
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
Thu 16 Jan 2020 12:46:04 CST
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
Thu 16 Jan 2020 12:46:07 CST
"hello world"
Thu 16 Jan 2020 12:46:09 CST
"hello world"
Thu 16 Jan 2020 12:46:12 CST
"hello world"
Thu 16 Jan 2020 12:46:15 CST
"hello world"
Thu 16 Jan 2020 12:46:17 CST
"hello world"
Thu 16 Jan 2020 12:46:20 CST
"hello world"
Thu 16 Jan 2020 12:46:23 CST
"hello world"
Thu 16 Jan 2020 12:46:25 CST
"hello world"
Thu 16 Jan 2020 12:46:28 CST
"hello world"
Thu 16 Jan 2020 12:46:31 CST
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
Thu 16 Jan 2020 12:46:33 CST
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
Thu 16 Jan 2020 12:46:36 CST
curl: (56) Recv failure: Connection reset by peer
command terminated with exit code 56
Thu 16 Jan 2020 12:46:39 CST
"hello world"
Thu 16 Jan 2020 12:46:41 CST
"hello world"
Thu 16 Jan 2020 12:46:44 CST
"hello world"
consul server logs
2020/01/16 18:45:49 [INFO] serf: EventMemberLeave: aks-agentpool-24451568-vmss000001 10.244.2.18
2020/01/16 18:45:49 [INFO] consul: member 'aks-agentpool-24451568-vmss000001' left, deregistering
2020/01/16 18:45:58 [ERR] memberlist: Conflicting address for aks-agentpool-24451568-vmss000001. Mine: 10.244.2.18:8301 Theirs: 10.244.2.24:8301 Old state: 2
2020/01/16 18:45:58 [WARN] serf: Name conflict for 'aks-agentpool-24451568-vmss000001' both 10.244.2.18:8301 and 10.244.2.24:8301 are claiming
2020/01/16 18:45:58 [ERR] memberlist: Conflicting address for aks-agentpool-24451568-vmss000001. Mine: 10.244.2.18:8301 Theirs: 10.244.2.24:8301 Old state: 2
2020/01/16 18:45:58 [WARN] serf: Name conflict for 'aks-agentpool-24451568-vmss000001' both 10.244.2.18:8301 and 10.244.2.24:8301 are claiming
2020/01/16 18:45:58 [ERR] memberlist: Conflicting address for aks-agentpool-24451568-vmss000001. Mine: 10.244.2.18:8301 Theirs: 10.244.2.24:8301 Old state: 2
2020/01/16 18:45:58 [WARN] serf: Name conflict for 'aks-agentpool-24451568-vmss000001' both 10.244.2.18:8301 and 10.244.2.24:8301 are claiming
2020/01/16 18:46:31 [INFO] consul: member 'aks-agentpool-24451568-vmss000001' left, deregistering
2020/01/16 18:46:43 [INFO] serf: EventMemberJoin: aks-agentpool-24451568-vmss000001 10.244.2.24
2020/01/16 18:46:43 [INFO] consul: member 'aks-agentpool-24451568-vmss000001' joined, marking health alive
consul client logs
==> Starting Consul agent...
Version: 'v1.6.2+ent'
Node ID: 'c7161480-25b6-9c7b-9ba8-1abbc53cec9e'
Node name: 'aks-agentpool-24451568-vmss000001'
Datacenter: 'dc1' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: 8502, DNS: 8600)
Cluster Addr: 10.244.2.24 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2020/01/16 18:45:58 [INFO] serf: EventMemberJoin: aks-agentpool-24451568-vmss000001 10.244.2.24
2020/01/16 18:45:58 [INFO] agent: Started DNS server 0.0.0.0:8600 (udp)
2020/01/16 18:45:58 [INFO] agent: Started DNS server 0.0.0.0:8600 (tcp)
2020/01/16 18:45:58 [INFO] agent: Started HTTP server on [::]:8500 (tcp)
2020/01/16 18:45:58 [INFO] agent: started state syncer
==> Consul agent running!
2020/01/16 18:45:58 [INFO] agent: Started gRPC server on [::]:8502 (tcp)
2020/01/16 18:45:58 [INFO] agent: Retry join LAN is supported for: aliyun aws azure digitalocean gce k8s mdns os packet scaleway softlayer triton vsphere
2020/01/16 18:45:58 [INFO] agent: Joining LAN cluster...
2020/01/16 18:45:58 [INFO] agent: (LAN) joining: [consul-server-0.consul-server.consul.svc consul-server-1.consul-server.consul.svc consul-server-2.consul-server.consul.svc]
2020/01/16 18:45:58 [WARN] manager: No servers available
2020/01/16 18:45:58 [ERR] agent: failed to sync remote state: No known Consul servers
2020/01/16 18:45:58 [WARN] memberlist: Refuting a suspect message (from: aks-agentpool-24451568-vmss000001)
2020/01/16 18:45:58 [INFO] serf: EventMemberJoin: consul-server-0 10.244.4.2
2020/01/16 18:45:58 [INFO] serf: EventMemberJoin: consul-server-2 10.244.3.2
2020/01/16 18:45:58 [INFO] serf: EventMemberJoin: aks-meshgateways-24451568-vmss000000 10.244.6.3
2020/01/16 18:45:58 [INFO] serf: EventMemberJoin: consul-server-1 10.244.5.2
2020/01/16 18:45:58 [INFO] serf: EventMemberJoin: aks-agentpool-24451568-vmss000000 10.244.0.10
2020/01/16 18:45:58 [INFO] serf: EventMemberJoin: aks-meshgateways-24451568-vmss000001 10.244.7.2
2020/01/16 18:45:58 [INFO] serf: EventMemberJoin: aks-agentpool-24451568-vmss000002 10.244.1.10
2020/01/16 18:45:58 [INFO] consul: adding server consul-server-0 (Addr: tcp/10.244.4.2:8300) (DC: dc1)
2020/01/16 18:45:58 [INFO] consul: adding server consul-server-2 (Addr: tcp/10.244.3.2:8300) (DC: dc1)
2020/01/16 18:45:58 [INFO] consul: adding server consul-server-1 (Addr: tcp/10.244.5.2:8300) (DC: dc1)
2020/01/16 18:45:58 [INFO] agent: (LAN) joined: 3
2020/01/16 18:45:58 [INFO] agent: Join LAN completed. Synced with 3 initial agents
2020/01/16 18:46:00 [INFO] agent: Synced node info
2020/01/16 18:46:07 [ERR] service-http-checks:static-server-static-server watch error: invalid type for service checks response: cache.FetchResult, want: []structs.CheckType
2020/01/16 18:46:07 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:07 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:07 [INFO] agent: Synced service "static-server-static-server"
2020/01/16 18:46:12 [INFO] agent: Synced check "service:static-server-static-server-sidecar-proxy:1"
2020/01/16 18:46:17 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:17 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:17 [INFO] agent: Synced service "static-server-static-server"
2020/01/16 18:46:27 [INFO] license: Consul license updated
2020/01/16 18:46:27 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:27 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:27 [INFO] agent: Synced service "static-server-static-server"
2020/01/16 18:46:37 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:38 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:38 [INFO] agent: Synced service "static-server-static-server"
2020/01/16 18:46:48 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:48 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:48 [INFO] agent: Synced service "static-server-static-server"
2020/01/16 18:46:58 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:58 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
2020/01/16 18:46:58 [INFO] agent: Synced service "static-server-static-server"
2020/01/16 18:47:08 [INFO] agent: Synced service "static-server-static-server-sidecar-proxy"
I linked a PR for memberlist. The PR makes memberlist track whether a node left on its own or failed. In the former case it will not hold onto the name, but will allow another node to claim it.
Another default value that is set to 30s is DeadNodeReclaimTime: https://github.com/hashicorp/consul/blob/cf27dff62ff7c07eb34044f198347414a30abf8c/agent/consul/config.go#L580-L582
My PR changes this so that it does not apply when a node left intentionally.
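Roughly, the shape of the change is this (illustrative names, not the PR's actual identifiers): memberlist gains a fourth node state for intentional departures, which it can infer because a leaving node broadcasts its own death.

package main

import "fmt"

// Illustrative node states: memberlist currently has only the first
// three, and the idea of the PR is the fourth.
type nodeState int

const (
	stateAlive nodeState = iota
	stateSuspect
	stateDead // failed: keep gossiping for GossipToTheDeadTime so it can refute
	stateLeft // left on purpose: stop gossiping, free the name/IP right away
)

// A voluntary leave is encoded as a node declaring itself dead, so the
// dead-message handler can tell the two cases apart by comparing the
// sender with the subject of the message.
func classifyDead(from, node string) nodeState {
	if from == node {
		return stateLeft
	}
	return stateDead
}

func main() {
	fmt.Println(classifyDead("nodeA", "nodeA") == stateLeft) // true: self-reported => intentional
	fmt.Println(classifyDead("nodeB", "nodeA") == stateDead) // true: reported by a peer => failure
}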