consul version:
Server: 0.9.0

consul info:
Server:
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = b79d951
version = 0.9.0
consul:
bootstrap = false
known_datacenters = 1
leader = true
leader_addr = 172.31.1.83:8300
server = true
raft:
applied_index = 182734
commit_index = 182734
fsm_pending = 0
last_contact = 0
last_log_index = 182734
last_log_term = 4
last_snapshot_index = 180231
last_snapshot_term = 4
latest_configuration = [{Suffrage:Voter ID:61567a07-7122-5ebd-677b-e5f437e9558c Address:172.31.3.20:8300} {Suffrage:Voter ID:c95cdd96-f493-3c9e-35d7-c9b0370ccbf9 Address:172.31.1.83:8300} {Suffrage:Voter ID:bbcbad42-a097-0fd0-f813-7e68c46d7178 Address:172.31.7.163:8300}]
latest_configuration_index = 174747
num_peers = 2
protocol_version = 3
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 4
runtime:
arch = amd64
cpu_count = 1
goroutines = 102
max_procs = 1
os = linux
version = go1.8.3
serf_lan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 4
failed = 0
health_score = 0
intent_queue = 0
left = 4
member_time = 34
members = 10
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = true
event_queue = 0
event_time = 1
failed = 1
health_score = 0
intent_queue = 0
left = 0
member_time = 12
members = 4
query_queue = 0
query_time = 1
Operating system: Amazon Linux AMI 2017.03.1
consul members:
[ec2-user@ip-172-31-1-83 ~]$ consul members
Node Address Status Type Build Protocol DC
ip-172-31-1-151 172.31.1.151:8301 left server 0.9.0 2 eu-west-1
ip-172-31-1-83 172.31.1.83:8301 alive server 0.9.0 2 eu-west-1
ip-172-31-10-77 172.31.10.77:8301 alive client 0.9.0 2 eu-west-1
ip-172-31-11-106 172.31.11.106:8301 left client 0.9.0 2 eu-west-1
ip-172-31-11-9 172.31.11.9:8301 alive client 0.9.0 2 eu-west-1
ip-172-31-2-237 172.31.2.237:8301 left client 0.9.0 2 eu-west-1
ip-172-31-3-20 172.31.3.20:8301 alive server 0.9.0 2 eu-west-1
ip-172-31-4-153 172.31.4.153:8301 alive client 0.9.0 2 eu-west-1
ip-172-31-7-163 172.31.7.163:8301 alive server 0.9.0 2 eu-west-1
ip-172-31-7-176 172.31.7.176:8301 left client 0.9.0 2 eu-west-1
I am seeing recurring logs of:
2017/08/04 12:00:41 [INFO] serf: attempting reconnect to ip-172-31-1-151.eu-west-1 172.31.1.151:8302
2017/08/04 12:02:44 [INFO] serf: attempting reconnect to ip-172-31-1-151.eu-west-1 172.31.1.151:8302
2017/08/04 12:03:17 [INFO] serf: attempting reconnect to ip-172-31-1-151.eu-west-1 172.31.1.151:8302
2017/08/04 12:03:50 [INFO] serf: attempting reconnect to ip-172-31-1-151.eu-west-1 172.31.1.151:8302
2017/08/04 12:06:23 [INFO] serf: attempting reconnect to ip-172-31-1-151.eu-west-1 172.31.1.151:8302
However, the server has already left.
I have an AWS Auto Scaling Group (desired count 3) with the following Consul config:
{
"server": true,
"datacenter": "${AWS::Region}",
"data_dir": "/var/consul",
"encrypt": "${EncryptionKey}",
"bootstrap_expect": 3,
"retry_join_ec2": {
"region": "${AWS::Region}",
"tag_key": "consul",
"tag_value": "server"
},
"raft_protocol": 3,
"disable_update_check": true
}
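One possible mitigation, assuming the instance actually receives SIGTERM before it dies during scale-in: setting `leave_on_terminate` (which defaults to false for servers) so the agent gracefully leaves the LAN and WAN pools instead of just disappearing. A sketch of the config above with that option added:

```json
{
  "server": true,
  "datacenter": "${AWS::Region}",
  "data_dir": "/var/consul",
  "encrypt": "${EncryptionKey}",
  "bootstrap_expect": 3,
  "retry_join_ec2": {
    "region": "${AWS::Region}",
    "tag_key": "consul",
    "tag_value": "server"
  },
  "raft_protocol": 3,
  "disable_update_check": true,
  "leave_on_terminate": true
}
```

Whether this helps here depends on the ASG delivering a clean shutdown signal (e.g. via a lifecycle hook) rather than hard-terminating the instance.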
I then increase the desired count to 4 and then back to 3. After that, the terminated Consul server shows as having left the cluster, but the "serf: attempting reconnect" logs keep coming.
I'm having the same issue. We're also using an ASG, and the nodes are attempting to reconnect to all the servers that have "left" the cluster.
@danilobuerger hey, as a side note, I'm curious about the bootstrap_expect option: is that needed if one uses the retry_join_ec2 option?
I don't know if it's needed.
consul force-leave 172.31.1.151 should fix it.
@webengineer it does not.
This should be fixed by https://github.com/hashicorp/consul/issues/3611.
@slackpad Nope. I just tried with consul 1.0.2, same problem. Logs keep on coming, force-leaving them as suggested by the CHANGELOG doesn't work either.
Same issue with 1.0.3.
1.0.3 - reconnect attempts went away after force-leave (though they persisted for many hours) and after an OS-level restart of the server's process (immediately gone).
force-leave doesn't work for me either => v1.0.3
# consul members
...
ip-10-28-11-230.ec2.internal 10.28.11.230:8301 left server 1.0.3 2 aws <all>
...
# consul monitor
2018/03/22 16:54:29 [INFO] Force leaving node: ip-10-28-11-230.ec2.internal
2018/03/22 16:55:23 [INFO] serf: attempting reconnect to ip-10-28-11-230.ec2.internal.aws 10.28.11.230:8302
2018/03/22 16:56:23 [INFO] serf: attempting reconnect to ip-10-28-11-230.ec2.internal.aws 10.28.11.230:8302
2018/03/22 16:57:23 [INFO] serf: attempting reconnect to ip-10-28-11-230.ec2.internal.aws 10.28.11.230:8302
for member in $(consul members -status=failed | awk 'NR>1{print $1}'); do
  consul force-leave "$member"
done
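A variant of that loop targeting "left" members instead of failed ones. `list_left_members` is a hypothetical helper name, and the `-prune` flag on `consul force-leave` (which removes the entry entirely rather than leaving it in the "left" state) is only available on newer releases, I believe 1.6.2+:

```shell
#!/bin/sh
# Hypothetical helper: print the node names of members whose Status column
# is "left", reading `consul members` output on stdin (header row skipped).
list_left_members() {
  awk 'NR > 1 && $3 == "left" { print $1 }'
}

# Usage against a live agent (assumes Consul 1.6.2+ for -prune):
#   consul members | list_left_members | xargs -r -n1 consul force-leave -prune
```

On versions without `-prune`, a plain `consul force-leave` on each name is the closest equivalent, with the caveats others have reported in this thread.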
I have the same issue here: version 1.2.2
Still seeing this issue in 1.2.3
edit:
running consul force-leave <node name>.<dc>
seemed to clean up the issue, as after doing this it mentions:
consul: Handled member-leave event for server "<node name>.<dc>" in area "wan"
and now life is happy again.
We are seeing same issue on version 1.2.2.
Seeing this on 1.5.1 when I use -retry-join with an ASG. Stopping and starting Consul on the server fixes the "attempting reconnect" error.
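That a restart clears it suggests the stale entries live only in the running agent's in-memory member list. On the terminating-instance side, one complementary approach is to make the service stop gracefully so peers record a clean departure rather than a failure. A hypothetical systemd unit fragment (binary path and config dir are assumptions, not from this thread):

```ini
# /etc/systemd/system/consul.service (fragment; paths are assumptions)
[Service]
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d
# Ask the agent to leave the cluster cleanly when the service stops.
ExecStop=/usr/local/bin/consul leave
```

This only helps if instance termination actually stops the service before the host goes away.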