Consul: memberlist: Was able to connect to xxx but other probes failed, network may be misconfigured

Created on 7 Jan 2019 · 8 comments · Source: hashicorp/consul

I'm trying to find out what caused this problem, but I haven't been able to.

The logs:

    2019/01/07 17:25:30 [DEBUG] memberlist: Failed ping: hkam30008 (timeout reached)
    2019/01/07 17:25:31 [DEBUG] memberlist: Stream connection from=100.69.28.7:63057
    2019/01/07 17:25:31 [DEBUG] memberlist: Failed ping: hkam30162 (timeout reached)
    2019/01/07 17:25:32 [WARN] memberlist: Was able to connect to hkam30162 but other probes failed, network may be misconfigured
    2019/01/07 17:25:32 [DEBUG] memberlist: Failed ping: hkam30009 (timeout reached)
    2019/01/07 17:25:32 [DEBUG] agent: Node info in sync
    2019/01/07 17:25:33 [WARN] memberlist: Was able to connect to hkam30009 but other probes failed, network may be misconfigured
    2019/01/07 17:25:33 [DEBUG] memberlist: Stream connection from=100.69.28.1:42918
    2019/01/07 17:25:33 [DEBUG] memberlist: Stream connection from=100.69.28.8:63323
    2019/01/07 17:25:34 [DEBUG] memberlist: Stream connection from=100.69.28.16:47760
    2019/01/07 17:25:37 [DEBUG] memberlist: Failed ping: hkam30059 (timeout reached)
    2019/01/07 17:25:39 [DEBUG] memberlist: Stream connection from=100.69.31.156:24380
    2019/01/07 17:25:39 [DEBUG] memberlist: Failed ping: hkam30020 (timeout reached)
    2019/01/07 17:25:40 [WARN] memberlist: Was able to connect to hkam30020 but other probes failed, network may be misconfigured
    2019/01/07 17:25:41 [DEBUG] memberlist: Failed ping: hkam30163 (timeout reached)
    2019/01/07 17:25:42 [WARN] memberlist: Was able to connect to hkam30163 but other probes failed, network may be misconfigured

The consul info output:

agent:
        check_monitors = 0
        check_ttls = 0
        checks = 0
        services = 0
build:
        prerelease = 
        revision = 75ca2ca
        version = 0.9.2
consul:
        bootstrap = false
        known_datacenters = 4
        leader = false
        leader_addr = 100.69.88.1:8300
        server = true
raft:
        applied_index = 6271351
        commit_index = 6271351
        fsm_pending = 0
        last_contact = 6.756206ms
        last_log_index = 6271351
        last_log_term = 16
        last_snapshot_index = 6265248
        last_snapshot_term = 14
        latest_configuration = [{Suffrage:Voter ID:100.69.24.1:8300 Address:100.69.24.1:8300} {Suffrage:Voter ID:100.69.24.2:8300 Address:100.69.24.2:8300} {Suffrage:Voter ID:100.69.24.3:8300 Address:100.69.24.3:8300} {Suffrage:Voter ID:100.69.88.2:8300 Address:100.69.88.2:8300} {Suffrage:Voter ID:100.69.88.1:8300 Address:100.69.88.1:8300} {Suffrage:Voter ID:100.69.88.3:8300 Address:100.69.88.3:8300}]
        latest_configuration_index = 5935917
        num_peers = 5
        protocol_version = 2
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 16
runtime:
        arch = amd64
        cpu_count = 40
        goroutines = 141
        max_procs = 40
        os = linux
        version = go1.8.3
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 6
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 1208
        members = 41
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 45
        members = 18
        query_queue = 0
        query_time = 1

The consul members output:

Node       Address             Status  Type    Build  Protocol  DC
hkam30005  100.69.28.1:8301    alive   client  0.9.2  2         hk1
hkam30006  100.69.28.2:8301    alive   client  0.9.2  2         hk1
hkam30007  100.69.28.3:8301    alive   client  0.9.2  2         hk1
hkam30008  100.69.28.4:8301    alive   client  0.9.2  2         hk1
hkam30009  100.69.28.5:8301    alive   client  0.9.2  2         hk1
hkam30010  100.69.28.6:8301    alive   client  0.9.2  2         hk1
hkam30011  100.69.28.7:8301    alive   client  0.9.2  2         hk1
hkam30012  100.69.28.8:8301    alive   client  0.9.2  2         hk1
hkam30013  100.69.24.1:8301    alive   server  0.9.2  2         hk1
hkam30014  100.69.24.2:8301    alive   server  0.9.2  2         hk1
hkam30015  100.69.24.3:8301    alive   server  0.9.2  2         hk1
hkam30019  100.69.28.12:8301   alive   client  0.9.2  2         hk1
hkam30020  100.69.28.13:8301   alive   client  0.9.2  2         hk1
hkam30021  100.69.28.14:8301   alive   client  0.9.2  2         hk1
hkam30022  100.69.28.15:8301   alive   client  0.9.2  2         hk1
hkam30023  100.69.28.16:8301   alive   client  0.9.2  2         hk1
hkam30058  100.69.31.150:8301  alive   client  0.9.2  2         hk1
hkam30059  100.69.31.151:8301  alive   client  0.9.2  2         hk1
hkam30162  100.69.31.156:8301  alive   client  0.9.2  2         hk1
hkam30163  100.69.31.157:8301  alive   client  0.9.2  2         hk1
hkbm30029  100.69.88.1:8301    alive   server  0.9.2  2         hk1
hkbm30030  100.69.88.2:8301    alive   server  0.9.2  2         hk1
hkbm30031  100.69.88.3:8301    alive   server  0.9.2  2         hk1
hkbm30034  100.69.92.150:8301  alive   client  0.9.2  2         hk1
hkbm30035  100.69.92.151:8301  alive   client  0.9.2  2         hk1
hkbm30036  100.69.92.152:8301  alive   client  0.9.2  2         hk1
hkbm30037  100.69.92.153:8301  alive   client  0.9.2  2         hk1
hkbm30038  100.69.92.154:8301  alive   client  0.9.2  2         hk1
hkbm30039  100.69.92.155:8301  alive   client  0.9.2  2         hk1
hkbm30040  100.69.92.156:8301  alive   client  0.9.2  2         hk1
hkbm30041  100.69.92.157:8301  alive   client  0.9.2  2         hk1
hkbm30042  100.69.92.158:8301  alive   client  0.9.2  2         hk1
hkbm30043  100.69.92.159:8301  alive   client  0.9.2  2         hk1
hkbm30044  100.69.92.160:8301  alive   client  0.9.2  2         hk1
hkbm30045  100.69.92.161:8301  alive   client  0.9.2  2         hk1
hkbm30046  100.69.92.162:8301  alive   client  0.9.2  2         hk1
hkbm30048  100.69.92.163:8301  alive   client  0.9.2  2         hk1
hkbm30061  100.69.92.164:8301  alive   client  0.9.2  2         hk1
hkbm30062  100.69.92.165:8301  alive   client  0.9.2  2         hk1
hkbm30063  100.69.92.166:8301  alive   client  0.9.2  2         hk1
hkbm30064  100.69.92.167:8301  alive   client  0.9.2  2         hk1

The client config:

{
  "datacenter": "HK1",
  "data_dir": "/pacloud/ccmp/base/ccmp/data",
  "log_level": "DEBUG",
  "node_name": "hkbm30064",
  "server": false,
  "client_addr":"0.0.0.0",
  "bind_addr":"100.69.92.167",
  "retry_join":["100.69.24.1","100.69.24.2","100.69.24.3","100.69.88.1","100.69.88.2","100.69.88.3"]
}


All 8 comments

@TwitchChen: It appears that all the nodes for which you are seeing those errors happen to be client nodes. Can you confirm that the network settings of those nodes are correct and that all required ports are open?

@ChipV223 Thank you for your reply.
I just did some testing with nc and telnet.

server to client:

8300 tcp: refused
8301 tcp and udp: pass
8302 tcp: refused; udp: pass
8500 tcp: pass
8600 tcp and udp: pass

What have I missed?

8301 is the port in question here, I believe. I'm confused that you're seeing that message, as it is generated when probing falls back to TCP, yet your nc/telnet tests show that port is reachable over both protocols.

https://github.com/hashicorp/memberlist/blob/2072f3a3ff4b7b3d830be77678d5d4b978362bc4/state.go#L363-L386

This is pretty clear in the code, which you can see has a comment explaining the behavior. I'm quite confident this is a reachability issue in the network, so I'd double-check protocols and actual node-to-node communication, not just rules that allow access from a bastion or similar.
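The memberlist code linked above is Go, but the decision it makes can be summarized in a few lines. The sketch below is a simplified Python model of that behavior (not the actual implementation): a peer is probed over UDP first, and only when the UDP ping times out does memberlist fall back to a TCP connection. A peer that answers over TCP while its UDP probes fail is exactly what triggers the warning seen in the logs.

```python
def classify_probe(udp_ack: bool, tcp_ack: bool) -> str:
    """Return the log outcome for one round of probing a peer.

    Simplified model of memberlist's probe logic: UDP ping first,
    TCP connection only as a fallback when UDP times out.
    """
    if udp_ack:
        return "alive"  # normal case: the UDP ping was acknowledged
    if tcp_ack:
        # The TCP fallback worked, so the host is up, but UDP traffic is
        # being blocked or dropped somewhere along the path.
        return "warn: network may be misconfigured"
    return "suspect"  # neither protocol answered within the timeout


for udp, tcp in [(True, True), (False, True), (False, False)]:
    print(udp, tcp, "->", classify_probe(udp, tcp))
```

In other words, the warning does not mean the node is unreachable; it means it is reachable over TCP but not over UDP, which points at a firewall or NAT rule rather than a dead host.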

Hope that helps a bit. I'm going to proactively close this but please feel free to open a new issue if you believe this is a bug.

@TwitchChen I faced this issue too, until I fixed all my firewall issues. Specifically, 8301/UDP was not open. You can use nc <ip> 8301 -vzu to check; the -u flag makes nc use UDP.
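When nc isn't installed on a host, a rough equivalent can be scripted. This is a sketch using Python's standard socket module, not part of Consul's tooling; note that a TCP check is reliable, but UDP is connectionless, so the absence of an error does not prove a UDP port is open (only an ICMP port-unreachable proves it is closed):

```python
import socket

def tcp_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def udp_send(host: str, port: int) -> None:
    """Fire a UDP datagram at host:port.

    sendto() succeeding tells you nothing by itself; pair this with
    tcpdump on the receiving side to confirm the datagram arrives.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(b"ping", (host, port))

# Example (address taken from the member list above):
# print(tcp_open("100.69.28.13", 8301))
```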

@madsonic thank you, I'll try it.

I am also seeing the same behavior on clients in my cluster: "Failed ping: xxx (timeout reached)", from the log line in the code linked below. From what I can tell with nc and tcpdump, all the ports that need to be open are open. My server and client instances each have multiple interfaces. How can I determine which interface or port failed the ping?

https://github.com/hashicorp/memberlist/blob/2072f3a3ff4b7b3d830be77678d5d4b978362bc4/state.go#L336

Hi all,
Any news on this?
The syslog file (/var/log/messages) is flooded with these errors...

The server and agent run in an OpenVZ environment:
Agent: bind_addr = "{{ GetInterfaceIP \"venet0\" }}"
Server: bind_addr = "{{ GetInterfaceIP \"br0\" }}"

Does IPv6 interfere?
There is no SELinux or firewall between the nodes...

Details:

:; consul version
Consul v1.5.3
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

Regards

Hey there,

This issue has been automatically locked because it is closed and there hasn't been any activity for at least 30 days.

If you are still experiencing problems, or still have questions, feel free to open a new issue.

