I'm trying to find out what caused this problem, but I haven't succeeded.
The logs:
2019/01/07 17:25:30 [DEBUG] memberlist: Failed ping: hkam30008 (timeout reached)
2019/01/07 17:25:31 [DEBUG] memberlist: Stream connection from=100.69.28.7:63057
2019/01/07 17:25:31 [DEBUG] memberlist: Failed ping: hkam30162 (timeout reached)
2019/01/07 17:25:32 [WARN] memberlist: Was able to connect to hkam30162 but other probes failed, network may be misconfigured
2019/01/07 17:25:32 [DEBUG] memberlist: Failed ping: hkam30009 (timeout reached)
2019/01/07 17:25:32 [DEBUG] agent: Node info in sync
2019/01/07 17:25:33 [WARN] memberlist: Was able to connect to hkam30009 but other probes failed, network may be misconfigured
2019/01/07 17:25:33 [DEBUG] memberlist: Stream connection from=100.69.28.1:42918
2019/01/07 17:25:33 [DEBUG] memberlist: Stream connection from=100.69.28.8:63323
2019/01/07 17:25:34 [DEBUG] memberlist: Stream connection from=100.69.28.16:47760
2019/01/07 17:25:37 [DEBUG] memberlist: Failed ping: hkam30059 (timeout reached)
2019/01/07 17:25:39 [DEBUG] memberlist: Stream connection from=100.69.31.156:24380
2019/01/07 17:25:39 [DEBUG] memberlist: Failed ping: hkam30020 (timeout reached)
2019/01/07 17:25:40 [WARN] memberlist: Was able to connect to hkam30020 but other probes failed, network may be misconfigured
2019/01/07 17:25:41 [DEBUG] memberlist: Failed ping: hkam30163 (timeout reached)
2019/01/07 17:25:42 [WARN] memberlist: Was able to connect to hkam30163 but other probes failed, network may be misconfigured
The consul info:
agent:
check_monitors = 0
check_ttls = 0
checks = 0
services = 0
build:
prerelease =
revision = 75ca2ca
version = 0.9.2
consul:
bootstrap = false
known_datacenters = 4
leader = false
leader_addr = 100.69.88.1:8300
server = true
raft:
applied_index = 6271351
commit_index = 6271351
fsm_pending = 0
last_contact = 6.756206ms
last_log_index = 6271351
last_log_term = 16
last_snapshot_index = 6265248
last_snapshot_term = 14
latest_configuration = [{Suffrage:Voter ID:100.69.24.1:8300 Address:100.69.24.1:8300} {Suffrage:Voter ID:100.69.24.2:8300 Address:100.69.24.2:8300} {Suffrage:Voter ID:100.69.24.3:8300 Address:100.69.24.3:8300} {Suffrage:Voter ID:100.69.88.2:8300 Address:100.69.88.2:8300} {Suffrage:Voter ID:100.69.88.1:8300 Address:100.69.88.1:8300} {Suffrage:Voter ID:100.69.88.3:8300 Address:100.69.88.3:8300}]
latest_configuration_index = 5935917
num_peers = 5
protocol_version = 2
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Follower
term = 16
runtime:
arch = amd64
cpu_count = 40
goroutines = 141
max_procs = 40
os = linux
version = go1.8.3
serf_lan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 6
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1208
members = 41
query_queue = 0
query_time = 1
serf_wan:
coordinate_resets = 0
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 45
members = 18
query_queue = 0
query_time = 1
The members:
Node Address Status Type Build Protocol DC
hkam30005 100.69.28.1:8301 alive client 0.9.2 2 hk1
hkam30006 100.69.28.2:8301 alive client 0.9.2 2 hk1
hkam30007 100.69.28.3:8301 alive client 0.9.2 2 hk1
hkam30008 100.69.28.4:8301 alive client 0.9.2 2 hk1
hkam30009 100.69.28.5:8301 alive client 0.9.2 2 hk1
hkam30010 100.69.28.6:8301 alive client 0.9.2 2 hk1
hkam30011 100.69.28.7:8301 alive client 0.9.2 2 hk1
hkam30012 100.69.28.8:8301 alive client 0.9.2 2 hk1
hkam30013 100.69.24.1:8301 alive server 0.9.2 2 hk1
hkam30014 100.69.24.2:8301 alive server 0.9.2 2 hk1
hkam30015 100.69.24.3:8301 alive server 0.9.2 2 hk1
hkam30019 100.69.28.12:8301 alive client 0.9.2 2 hk1
hkam30020 100.69.28.13:8301 alive client 0.9.2 2 hk1
hkam30021 100.69.28.14:8301 alive client 0.9.2 2 hk1
hkam30022 100.69.28.15:8301 alive client 0.9.2 2 hk1
hkam30023 100.69.28.16:8301 alive client 0.9.2 2 hk1
hkam30058 100.69.31.150:8301 alive client 0.9.2 2 hk1
hkam30059 100.69.31.151:8301 alive client 0.9.2 2 hk1
hkam30162 100.69.31.156:8301 alive client 0.9.2 2 hk1
hkam30163 100.69.31.157:8301 alive client 0.9.2 2 hk1
hkbm30029 100.69.88.1:8301 alive server 0.9.2 2 hk1
hkbm30030 100.69.88.2:8301 alive server 0.9.2 2 hk1
hkbm30031 100.69.88.3:8301 alive server 0.9.2 2 hk1
hkbm30034 100.69.92.150:8301 alive client 0.9.2 2 hk1
hkbm30035 100.69.92.151:8301 alive client 0.9.2 2 hk1
hkbm30036 100.69.92.152:8301 alive client 0.9.2 2 hk1
hkbm30037 100.69.92.153:8301 alive client 0.9.2 2 hk1
hkbm30038 100.69.92.154:8301 alive client 0.9.2 2 hk1
hkbm30039 100.69.92.155:8301 alive client 0.9.2 2 hk1
hkbm30040 100.69.92.156:8301 alive client 0.9.2 2 hk1
hkbm30041 100.69.92.157:8301 alive client 0.9.2 2 hk1
hkbm30042 100.69.92.158:8301 alive client 0.9.2 2 hk1
hkbm30043 100.69.92.159:8301 alive client 0.9.2 2 hk1
hkbm30044 100.69.92.160:8301 alive client 0.9.2 2 hk1
hkbm30045 100.69.92.161:8301 alive client 0.9.2 2 hk1
hkbm30046 100.69.92.162:8301 alive client 0.9.2 2 hk1
hkbm30048 100.69.92.163:8301 alive client 0.9.2 2 hk1
hkbm30061 100.69.92.164:8301 alive client 0.9.2 2 hk1
hkbm30062 100.69.92.165:8301 alive client 0.9.2 2 hk1
hkbm30063 100.69.92.166:8301 alive client 0.9.2 2 hk1
hkbm30064 100.69.92.167:8301 alive client 0.9.2 2 hk1
The client config:
{
"datacenter": "HK1",
"data_dir": "/pacloud/ccmp/base/ccmp/data",
"log_level": "DEBUG",
"node_name": "hkbm30064",
"server": false,
"client_addr":"0.0.0.0",
"bind_addr":"100.69.92.167",
"retry_join":["100.69.24.1","100.69.24.2","100.69.24.3","100.69.88.1","100.69.88.2","100.69.88.3"]
}
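For anyone comparing firewall rules against this config, here is a quick reference of the default Consul ports involved. This is a sketch based on the Consul docs for the 0.9.x era; the dictionary keys are my own informal labels, not Consul configuration keys:

```python
# Default Consul ports and the protocol(s) each one needs.
# Labels are informal, not Consul configuration keys.
CONSUL_PORTS = {
    "server_rpc": (8300, ("tcp",)),        # server RPC between agents and servers
    "serf_lan":   (8301, ("tcp", "udp")),  # LAN gossip -- the port failing in the logs above
    "serf_wan":   (8302, ("tcp", "udp")),  # WAN gossip between servers
    "http_api":   (8500, ("tcp",)),        # HTTP API
    "dns":        (8600, ("tcp", "udp")),  # DNS interface
}

# The gossip ports need BOTH protocols; a TCP-only firewall rule still
# triggers the "other probes failed" warning because the UDP ping times out.
udp_ports = sorted(p for p, protos in CONSUL_PORTS.values() if "udp" in protos)
print(udp_ports)  # -> [8301, 8302, 8600]
```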
@TwitchChen : It appears that all the nodes on which you are seeing those errors happen to be client nodes. Can you confirm that the network settings of those nodes are correct and that all required ports are open?
@ChipV223 Thank you for your reply
I just did some testing using nc and telnet.
server to client:
8300 tcp: refused
8301 tcp/udp: pass
8302 tcp: refused; udp: pass
8500 tcp: pass
8600 tcp/udp: pass
What have I missed?
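For what it's worth, the TCP side of checks like the above can be scripted. Below is a minimal Python sketch (the host/port values are whatever you want to test; note that an "open" UDP port cannot be confirmed this way, since UDP has no handshake, which is also why nc -vzu is only a weak signal):

```python
import socket

def tcp_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port completes the handshake."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the Serf LAN port on one of the agents above.
# print(tcp_open("100.69.28.1", 8301))
```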
8301 is the port in question here, I believe. I'm confused that you're seeing that message, since it is generated by probing with a TCP fallback, yet your nc/telnet tests show the port is reachable over both protocols.
https://github.com/hashicorp/memberlist/blob/2072f3a3ff4b7b3d830be77678d5d4b978362bc4/state.go#L363-L386
This is pretty clear in the code, which has a comment explaining the behavior. I'm quite confident this is a reachability issue in the network, so I'd double-check protocols and direct node-to-node communication, not just rules that allow access from a bastion or similar.
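For readers who don't want to follow the link, the logic at that spot can be paraphrased as below. This is a simplified Python sketch of memberlist's probe path, not the actual Go code; udp_ping and tcp_ping stand in for the real network calls:

```python
def probe(node, udp_ping, tcp_ping):
    """Simplified memberlist probe: UDP ping first, TCP fallback on timeout."""
    if udp_ping(node):
        return "alive"
    # The UDP probe timed out; memberlist falls back to a TCP ping.
    if tcp_ping(node):
        # TCP works but UDP does not -- exactly the situation behind the
        # "network may be misconfigured" warning in the logs above.
        print(f"[WARN] memberlist: Was able to connect to {node} "
              f"but other probes failed, network may be misconfigured")
        return "alive"
    return "suspect"
```

In other words, seeing that warning means the TCP fallback succeeded while the UDP probe did not, which is why a TCP-only firewall rule on 8301 produces exactly these logs.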
Hope that helps a bit. I'm going to proactively close this but please feel free to open a new issue if you believe this is a bug.
@TwitchChen I faced this issue too, until I fixed all my firewall issues. Specifically, 8301 UDP was not open. You can use nc <ip> 8301 -vzu to check; the -u flag forces nc to use UDP.
@madsonic thank you, I'll try it
I am also seeing the same behavior on clients in my cluster: failed to ping xxx (timeout reached), printed from the code linked below. From what I can tell with nc and tcpdump, all the ports that need to be open are open. My server and client instances each have multiple interfaces. How can I determine which interface or port failed to ping?
https://github.com/hashicorp/memberlist/blob/2072f3a3ff4b7b3d830be77678d5d4b978362bc4/state.go#L336
Hi all,
Any news on this?
The syslog file (/var/log/messages) is flooded with these errors...
Server/Agent runs in OpenVZ environment:
Agent: bind_addr = "{{ GetInterfaceIP \"venet0\" }}"
Server: bind_addr = "{{ GetInterfaceIP \"br0\" }}"
Does IPv6 interfere?
There is no selinux or FW between nodes...
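As a side note on those bind_addr templates: GetInterfaceIP resolves the named interface's address when the agent starts. If you want to check what it would pick up, here's a rough Linux-only Python equivalent using the SIOCGIFADDR ioctl (a sketch; venet0 and br0 are the interface names from the configs above):

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl: get interface address

def interface_ip(ifname):
    """Return the IPv4 address bound to the named interface (Linux only)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Pack the interface name into a struct ifreq and ask the kernel.
        packed = fcntl.ioctl(
            s.fileno(), SIOCGIFADDR,
            struct.pack("256s", ifname[:15].encode()))
    finally:
        s.close()
    # The sockaddr_in's address field sits at bytes 20..24 of the ifreq.
    return socket.inet_ntoa(packed[20:24])

# Example: interface_ip("venet0") or interface_ip("br0")
```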
Details:
:; consul version
Consul v1.5.3
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Regards
Hey there,
This issue has been automatically locked because it is closed and there hasn't been any activity for at least _30_ days.
If you are still experiencing problems, or still have questions, feel free to open a new one :+1:.