consul version for both Client and Server
Client: Consul v0.8.3
Server: Consul v0.8.3
Tried with all v0.8.x versions; the behavior is the same.
consul info for both Client and Server
Client:
agent:
check_monitors = 0
check_ttls = 0
checks = 3
services = 10
build:
prerelease =
revision = ea2a82b
version = 0.8.3
consul:
bootstrap = true
known_datacenters = 9
leader = true
leader_addr = SERVER_IP:8300
server = true
raft:
applied_index = 970663
commit_index = 970663
fsm_pending = 0
last_contact = 0
last_log_index = 970663
last_log_term = 8
last_snapshot_index = 967154
last_snapshot_term = 7
latest_configuration = [{Suffrage:Voter ID:SERVER_IP:8300 Address:SERVER_IP:8300}]
latest_configuration_index = 1
num_peers = 0
protocol_version = 2
protocol_version_max = 3
protocol_version_min = 0
snapshot_version_max = 1
snapshot_version_min = 0
state = Leader
term = 8
runtime:
arch = amd64
cpu_count = 8
goroutines = 101
max_procs = 8
os = linux
version = go1.8.1
serf_lan:
encrypted = false
event_queue = 1
event_time = 8
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 1
members = 1
query_queue = 0
query_time = 1
serf_wan:
encrypted = false
event_queue = 0
event_time = 1
failed = 0
health_score = 0
intent_queue = 0
left = 0
member_time = 687
members = 15
query_queue = 0
query_time = 1
Server:
Same as client
Ubuntu 16.04.02
A lot of log lines show this; it should be fixed:
[WARN] Service name " consul-http" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
yamux keepalive shows an ERROR even with the client and server on the same subnet; maybe the timeout should be increased:
[ERR] yamux: keepalive failed: session shutdown
Hi @mnuic
[WARN] Service name " consul-http" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
That's not a built-in registration that Consul adds - it looks like something in your cluster is configured with a space in front of the name.
[ERR] yamux: keepalive failed: session shutdown
The timeout on that one is 30 seconds, which is pretty long. This often is the result of firewalls that track connections and close them when they are quiet, or other network connectivity issues.
Hope that helps!
Hi @slackpad
You were right, there was a space in service definition in one of my clusters.
I understand the part about firewalls, but I have 2 hosts on the same subnet with no firewalls between them, and iptables is fine. I disabled ufw on the servers and the behavior is the same: Consul reports "yamux: keepalive failed". Tcpdump shows that both servers see each other, all ports are open, and I don't see anything that could block traffic or produce a timeout. If it's hitting the 30-second timeout, that is totally weird.
That message could also come from a connection that failed from one of the Consul clients. I think if an agent died or dropped off the network you might also see that. Do you have agents coming and going?
No, nothing in the logs for yesterday or for the last week. I found one connection drop for one agent in a different DC, and that's it. Is it possible to lower the log level for this? It doesn't suggest any real problem that I can see so far.
Yeah, we get a lot of people concerned about these, and they can occur for a number of reasons that aren't really important. I'll open this to track tweaking the log level.
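As a side note for anyone landing here: the agent's overall verbosity is set with the `log_level` config option (or the `-log-level` CLI flag), e.g.:

```json
{
  "log_level": "warn"
}
```

This filters out lower-severity lines, but since the yamux messages in question are emitted at `[ERR]`, they survive any such filter — which is why this issue asks for the message itself to be re-leveled upstream rather than filtered locally.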
We are getting a lot of excessive logging too.
For example:
Today 10:25:06 PM <redacted> consul [err] ==> Newer Consul version available: 0.8.4 (currently running: 0.8.3)
I don't think this should be err.
Randomly, we see lots of this:
[ERR] yamux: keepalive failed: i/o deadline reached
WARN 2017/06/08 16:24:40 [WARN] memberlist: Refuting a suspect message (from: node3)
ERR 2017/06/08 16:26:00 [ERR] memberlist: Failed fallback ping: write tcp 10.0.4.102:40256->10.0.4.123:8301: i/o timeout
It was not the case with Consul 0.7.4.
Just installed Consul version 0.9.2, and the yamux messages are still showing (randomly, with no obvious reason, on multiple servers/clients):
x.y.z.q 2017/08/21 13:06:19 [ERR] yamux: keepalive failed: session shutdown
@slackpad could you please lower the log level for this message in the next release?
@slackpad Same here, lots of these messages across 3 Consul servers. Could you please add more logging info so it's at least more obvious what it's actually trying to do, so we can debug properly instead of having to dig through tcpdumps? Or please change the logging so it doesn't show up in the logs. Thanks.
2017/10/12 15:20:35 [ERR] yamux: keepalive failed: session shutdown
2017/10/12 16:16:02 [ERR] yamux: keepalive failed: session shutdown
2017/10/12 17:17:42 [ERR] yamux: keepalive failed: session shutdown
2017/10/12 18:23:12 [ERR] yamux: keepalive failed: session shutdown
2017/10/12 18:23:40 [ERR] yamux: keepalive failed: session shutdown
2017/10/12 18:37:50 [ERR] yamux: keepalive failed: session shutdown
Is there any chance of seeing this resolved in the next release? We have a lot of Consul nodes, and our logs still show these yamux messages with no obvious cause. Or could you just lower the log level to info, maybe?
Still happens on Consul 1.1.0:
2018/05/16 09:42:50 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 09:45:56 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 09:54:33 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 10:04:09 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 10:17:34 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 10:22:42 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 10:35:52 [ERR] yamux: keepalive failed: session shutdown
2018/05/16 10:40:52 [ERR] yamux: keepalive failed: session shutdown
We have a lot of nodes. Can you change the log level to warning or debug?
We are getting these too on version 1.0.7+ent:
2018/06/03 13:24:45 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 14:06:13 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 14:20:46 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 14:32:23 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 14:55:41 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 15:01:01 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 16:03:18 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 16:13:08 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 17:48:55 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 22:15:15 [ERR] yamux: keepalive failed: session shutdown
2018/06/03 23:37:13 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 00:18:51 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 01:41:23 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 01:44:54 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 03:46:37 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 04:02:29 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 05:33:16 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 05:49:39 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 06:31:31 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 06:59:07 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 08:25:20 [ERR] yamux: keepalive failed: session shutdown
2018/06/04 09:00:30 [ERR] yamux: keepalive failed: session shutdown