Consul: recurring log "serf: attempting reconnect" to left server

Created on 4 Aug 2017 · 15 Comments · Source: hashicorp/consul

consul version

Server: 0.9.0

consul info

Server:

agent:
    check_monitors = 0
    check_ttls = 0
    checks = 0
    services = 0
build:
    prerelease = 
    revision = b79d951
    version = 0.9.0
consul:
    bootstrap = false
    known_datacenters = 1
    leader = true
    leader_addr = 172.31.1.83:8300
    server = true
raft:
    applied_index = 182734
    commit_index = 182734
    fsm_pending = 0
    last_contact = 0
    last_log_index = 182734
    last_log_term = 4
    last_snapshot_index = 180231
    last_snapshot_term = 4
    latest_configuration = [{Suffrage:Voter ID:61567a07-7122-5ebd-677b-e5f437e9558c Address:172.31.3.20:8300} {Suffrage:Voter ID:c95cdd96-f493-3c9e-35d7-c9b0370ccbf9 Address:172.31.1.83:8300} {Suffrage:Voter ID:bbcbad42-a097-0fd0-f813-7e68c46d7178 Address:172.31.7.163:8300}]
    latest_configuration_index = 174747
    num_peers = 2
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 4
runtime:
    arch = amd64
    cpu_count = 1
    goroutines = 102
    max_procs = 1
    os = linux
    version = go1.8.3
serf_lan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 4
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 4
    member_time = 34
    members = 10
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = true
    event_queue = 0
    event_time = 1
    failed = 1
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 12
    members = 4
    query_queue = 0
    query_time = 1

Operating system and Environment details

Amazon AMI 2017.03.1

Description of the Issue (and unexpected/desired result)

consul members:

[ec2-user@ip-172-31-1-83 ~]$ consul members
Node              Address             Status  Type    Build  Protocol  DC
ip-172-31-1-151   172.31.1.151:8301   left    server  0.9.0  2         eu-west-1
ip-172-31-1-83    172.31.1.83:8301    alive   server  0.9.0  2         eu-west-1
ip-172-31-10-77   172.31.10.77:8301   alive   client  0.9.0  2         eu-west-1
ip-172-31-11-106  172.31.11.106:8301  left    client  0.9.0  2         eu-west-1
ip-172-31-11-9    172.31.11.9:8301    alive   client  0.9.0  2         eu-west-1
ip-172-31-2-237   172.31.2.237:8301   left    client  0.9.0  2         eu-west-1
ip-172-31-3-20    172.31.3.20:8301    alive   server  0.9.0  2         eu-west-1
ip-172-31-4-153   172.31.4.153:8301   alive   client  0.9.0  2         eu-west-1
ip-172-31-7-163   172.31.7.163:8301   alive   server  0.9.0  2         eu-west-1
ip-172-31-7-176   172.31.7.176:8301   left    client  0.9.0  2         eu-west-1

I am seeing recurring logs of:

2017/08/04 12:00:41 [INFO] serf: attempting reconnect to ip-172-31-1-151.eu-west-1 172.31.1.151:8302
2017/08/04 12:02:44 [INFO] serf: attempting reconnect to ip-172-31-1-151.eu-west-1 172.31.1.151:8302
2017/08/04 12:03:17 [INFO] serf: attempting reconnect to ip-172-31-1-151.eu-west-1 172.31.1.151:8302
2017/08/04 12:03:50 [INFO] serf: attempting reconnect to ip-172-31-1-151.eu-west-1 172.31.1.151:8302
2017/08/04 12:06:23 [INFO] serf: attempting reconnect to ip-172-31-1-151.eu-west-1 172.31.1.151:8302

However, the server has left.
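One detail in the output above may be worth checking: the reconnect attempts target port 8302, which is Consul's default WAN Serf port (8301 is the LAN default), and `consul info` reports `failed = 1` under `serf_wan` while `serf_lan` reports `failed = 0`. The following is a minimal sketch for summarizing such log lines by target and gossip pool. It assumes the log format shown above and the default ports; `summarize_reconnects` is a hypothetical helper name, not part of Consul.

```shell
# Sketch: count "attempting reconnect" log lines per target and gossip pool.
# Assumes the log format shown in this issue and Consul's default Serf ports
# (8301 = LAN, 8302 = WAN). Reads the files given as arguments, or stdin.
summarize_reconnects() {
  awk '
    /serf: attempting reconnect to/ {
      addr = $NF                  # last field, e.g. 172.31.1.151:8302
      name = $(NF - 1)            # node name as Serf knows it
      n = split(addr, parts, ":")
      port = parts[n]             # text after the last colon
      pool = "unknown"
      if (port == "8301") pool = "lan"
      if (port == "8302") pool = "wan"
      count[name " " addr " " pool]++
    }
    END {
      for (key in count) print count[key], key
    }
  ' "$@" | sort -k2
}
```

Fed the five log lines above, this prints `5 ip-172-31-1-151.eu-west-1 172.31.1.151:8302 wan`, which suggests the stale member lives in the WAN pool rather than the LAN pool.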

Reproduction steps

I have an AWS Auto Scaling group (desired count 3) with the following Consul config:

{
  "server": true,
  "datacenter": "${AWS::Region}",
  "data_dir": "/var/consul",
  "encrypt": "${EncryptionKey}",
  "bootstrap_expect": 3,
  "retry_join_ec2": {
    "region": "${AWS::Region}",
    "tag_key": "consul",
    "tag_value": "server"
  },
  "raft_protocol": 3,
  "disable_update_check": true
}

I then increase the desired count to 4 and back to 3. The terminated Consul server then leaves the cluster, but the "serf: attempting reconnect" log lines keep coming.

needs-investigation type/bug

All 15 comments

I'm having the same issue. We're also using an ASG, and the nodes keep attempting to connect to all the servers that have "left" the cluster.

@danilobuerger Hey, as a side note, I'm curious about the bootstrap_expect option: is it needed if one uses the retry_join_ec2 option?

I don't know if it's needed.

consul force-leave 172.31.1.151 should fix it.

@webengineer it does not.

@slackpad Nope. I just tried with consul 1.0.2, same problem. Logs keep on coming, force-leaving them as suggested by the CHANGELOG doesn't work either.

Same issue with 1.0.3.

1.0.3: after force-leave, the reconnect attempts were still there for many hours; after an OS-level restart of the server's process, they were gone immediately.

force-leave doesn't work for me either (v1.0.3):

# consul members
...
ip-10-28-11-230.ec2.internal  10.28.11.230:8301  left    server  1.0.3  2         aws  <all>
...
# consul monitor
2018/03/22 16:54:29 [INFO] Force leaving node: ip-10-28-11-230.ec2.internal
2018/03/22 16:55:23 [INFO] serf: attempting reconnect to ip-10-28-11-230.ec2.internal.aws 10.28.11.230:8302
2018/03/22 16:56:23 [INFO] serf: attempting reconnect to ip-10-28-11-230.ec2.internal.aws 10.28.11.230:8302
2018/03/22 16:57:23 [INFO] serf: attempting reconnect to ip-10-28-11-230.ec2.internal.aws 10.28.11.230:8302

# Force-leave every member that `consul members` currently reports as failed
for member in $(consul members -status=failed | awk 'NR>1{print $1;}'); do
  consul force-leave "$member"
done
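Note that the loop selects members with status `failed`, while the node in the listing above is reported as `left`, so the loop would not pick it up. The following is a dry-run sketch that prints the force-leave commands for a chosen status instead of running them, so the selection can be reviewed first. It assumes the `consul members` column layout shown in this thread (Node, Address, Status, ...); `print_force_leave` is a hypothetical helper name.

```shell
# Sketch: print (do not run) a force-leave command for every member whose
# Status column equals the requested status (default: failed).
# Reads `consul members` output on stdin; the first line is the header.
print_force_leave() {
  status="${1:-failed}"
  awk -v want="$status" 'NR > 1 && $3 == want { print "consul force-leave " $1 }'
}
```

A possible use would be `consul members | print_force_leave left`, piping the result to `sh` only once the printed commands look right. Whether force-leaving a member that is already `left` changes anything is not established by this thread.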

I have the same issue here with version 1.2.2.

Still seeing this issue in 1.2.3


edit:

running consul force-leave <node name>.<dc>
seemed to clean up the issue, as after doing this it mentions:

consul: Handled member-leave event for server "<node name>.<dc>" in area "wan"

and now life is happy again.
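The `<node name>.<dc>` form is the name the node carries in the WAN pool, and it is also the name that appears in the "attempting reconnect" log lines. The following is a minimal sketch that derives the force-leave commands from those log lines. It assumes the log format quoted in this thread and that WAN reconnects use the default port 8302; `wan_force_leave_cmds` is a hypothetical helper name, and it only prints commands rather than running them.

```shell
# Sketch: from "attempting reconnect" log lines, print one force-leave
# command per distinct WAN target (address ending in :8302, the default
# WAN Serf port). Reads the files given as arguments, or stdin.
wan_force_leave_cmds() {
  awk '
    /serf: attempting reconnect to/ && $NF ~ /:8302$/ { seen[$(NF - 1)] = 1 }
    END { for (name in seen) print "consul force-leave " name }
  ' "$@" | sort
}
```

For the 1.0.3 log excerpt earlier in the thread, this prints `consul force-leave ip-10-28-11-230.ec2.internal.aws`. Reports in this thread differ on whether force-leave alone is enough, so treat the output as a starting point.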

We are seeing same issue on version 1.2.2.

Seeing this on 1.5.1 when I use -retry-join with ASG. Stopping and starting consul on the server fixes the attempting reconnect error.
