Consul: Consul 0.8.x - logging and yamux keepalive errors

Created on 13 May 2017 · 11 comments · Source: hashicorp/consul

consul version for both Client and Server

Client: Consul v0.8.3
Server: Consul v0.8.3
Tried all v0.8.x versions; the behavior is the same.

consul info for both Client and Server

Client:

agent:
    check_monitors = 0
    check_ttls = 0
    checks = 3
    services = 10
build:
    prerelease =
    revision = ea2a82b
    version = 0.8.3
consul:
    bootstrap = true
    known_datacenters = 9
    leader = true
    leader_addr = SERVER_IP:8300
    server = true
raft:
    applied_index = 970663
    commit_index = 970663
    fsm_pending = 0
    last_contact = 0
    last_log_index = 970663
    last_log_term = 8
    last_snapshot_index = 967154
    last_snapshot_term = 7
    latest_configuration = [{Suffrage:Voter ID:SERVER_IP:8300 Address:SERVER_IP:8300}]
    latest_configuration_index = 1
    num_peers = 0
    protocol_version = 2
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 8
runtime:
    arch = amd64
    cpu_count = 8
    goroutines = 101
    max_procs = 8
    os = linux
    version = go1.8.1
serf_lan:
    encrypted = false
    event_queue = 1
    event_time = 8
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1
    members = 1
    query_queue = 0
    query_time = 1
serf_wan:
    encrypted = false
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 687
    members = 15
    query_queue = 0
    query_time = 1

Server:

Same as client

Operating system and Environment details

Ubuntu 16.04.02

Description of the Issue (and unexpected/desired result)

Lots of log lines show this; it should be fixed:
[WARN] Service name " consul-http" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.

yamux keepalive shows an ERROR even with client and server on the same subnet; maybe the timeout should be increased:
[ERR] yamux: keepalive failed: session shutdown

theme/operator-usability type/enhancement


All 11 comments

Hi @mnuic

[WARN] Service name " consul-http" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.

That's not a built-in registration that Consul adds - it looks like something in your cluster is configured with a space in front of the name.
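The fix is simply to trim the name in the service definition, but a small sketch may make the check concrete. The regexp and helper below are illustrative only, not Consul's actual implementation; they just mirror what the warning describes (only alphanumerics and dashes are DNS-discoverable):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dnsSafe mirrors the rule stated in the [WARN] message: service names
// containing anything other than alphanumerics and dashes will not be
// discoverable via DNS. This is an illustration, not Consul's own code.
var dnsSafe = regexp.MustCompile(`^[a-zA-Z0-9-]+$`)

func checkServiceName(name string) {
	if !dnsSafe.MatchString(name) {
		fmt.Printf("%q will NOT be discoverable via DNS\n", name)
		return
	}
	fmt.Printf("%q is DNS-safe\n", name)
}

func main() {
	// The leading space below is exactly the kind of typo that
	// triggered the warning in this issue.
	checkServiceName(" consul-http")
	checkServiceName(strings.TrimSpace(" consul-http")) // the fix
}
```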

[ERR] yamux: keepalive failed: session shutdown

The timeout on that one is 30 seconds, which is pretty long. This is often the result of firewalls that track connections and close them when they go quiet, or of other network connectivity issues.

Hope that helps!

Hi @slackpad

You were right, there was a space in service definition in one of my clusters.

I understand the part about firewalls, but I have 2 hosts in the same subnet, no firewalls between them, and iptables is OK. I disabled ufw on the servers and the behavior is the same: Consul reports "yamux keepalive failed". Tcpdump shows that both servers see each other, all ports are open, and I don't see anything that could block traffic or cause a timeout. If it still hits the 30-second timeout, that is totally weird.

I understand the part about firewalls, but I have 2 hosts in the same subnet, no firewalls between them, and iptables is OK. I disabled ufw on the servers and the behavior is the same: Consul reports "yamux keepalive failed". Tcpdump shows that both servers see each other, all ports are open, and I don't see anything that could block traffic or cause a timeout. If it still hits the 30-second timeout, that is totally weird.

That message could also come from a connection that failed from one of the Consul clients. I think if an agent died or dropped off the network you might also see that. Do you have agents coming and going?

No, nothing in the logs for yesterday or for the last week. I found one connection drop for one agent in a different dc, and that's it. Is it possible to lower the log level for this? It is not suggesting any real problem that I can see for now.

Yeah, we get a lot of people concerned by these, and they can occur for a number of reasons that aren't really important. I'll open this to track tweaking the log level.

We are getting a lot of excessive logging too.

For example:

Today 10:25:06 PM <redacted> consul [err] ==> Newer Consul version available: 0.8.4 (currently running: 0.8.3)

I don't think this should be logged at err level.

Randomly lots of this:

    [ERR] yamux: keepalive failed: i/o deadline reached
    2017/06/08 16:24:40 [WARN] memberlist: Refuting a suspect message (from: node3)
    2017/06/08 16:26:00 [ERR] memberlist: Failed fallback ping: write tcp 10.0.4.102:40256->10.0.4.123:8301: i/o timeout

It was not the case with Consul 0.7.4.

Just installed consul version 0.9.2 and yamux messages still showing (randomly with no obvious reason on multiple server/clients):

    x.y.z.q     2017/08/21 13:06:19 [ERR] yamux: keepalive failed: session shutdown

@slackpad could you please lower the log level of this message for the next release.

@slackpad Same here, lots of these messages across 3 Consul servers, could you please add more logging info so it's at least more obvious what it's actually trying to do so we can debug properly instead of having to dig through tcpdumps? Or please change the logging for it so it doesn't show up in the logs, thanks.

    2017/10/12 15:20:35 [ERR] yamux: keepalive failed: session shutdown
    2017/10/12 16:16:02 [ERR] yamux: keepalive failed: session shutdown
    2017/10/12 17:17:42 [ERR] yamux: keepalive failed: session shutdown
    2017/10/12 18:23:12 [ERR] yamux: keepalive failed: session shutdown
    2017/10/12 18:23:40 [ERR] yamux: keepalive failed: session shutdown
    2017/10/12 18:37:50 [ERR] yamux: keepalive failed: session shutdown

Is there any chance of seeing this resolved in the next release? We have a lot of Consul nodes and our logs still show these yamux messages with no obvious cause. Or could you just lower the log level to info, maybe?

Still happens on Consul 1.1.0:

    2018/05/16 09:42:50 [ERR] yamux: keepalive failed: session shutdown
    2018/05/16 09:45:56 [ERR] yamux: keepalive failed: session shutdown
    2018/05/16 09:54:33 [ERR] yamux: keepalive failed: session shutdown
    2018/05/16 10:04:09 [ERR] yamux: keepalive failed: session shutdown
    2018/05/16 10:17:34 [ERR] yamux: keepalive failed: session shutdown
    2018/05/16 10:22:42 [ERR] yamux: keepalive failed: session shutdown
    2018/05/16 10:35:52 [ERR] yamux: keepalive failed: session shutdown
    2018/05/16 10:40:52 [ERR] yamux: keepalive failed: session shutdown

We have a lot of nodes. Can you change the log level to warning or debug?

We are getting these too on version 1.0.7+ent:

    2018/06/03 13:24:45 [ERR] yamux: keepalive failed: session shutdown
    2018/06/03 14:06:13 [ERR] yamux: keepalive failed: session shutdown
    2018/06/03 14:20:46 [ERR] yamux: keepalive failed: session shutdown
    2018/06/03 14:32:23 [ERR] yamux: keepalive failed: session shutdown
    2018/06/03 14:55:41 [ERR] yamux: keepalive failed: session shutdown
    2018/06/03 15:01:01 [ERR] yamux: keepalive failed: session shutdown
    2018/06/03 16:03:18 [ERR] yamux: keepalive failed: session shutdown
    2018/06/03 16:13:08 [ERR] yamux: keepalive failed: session shutdown
    2018/06/03 17:48:55 [ERR] yamux: keepalive failed: session shutdown
    2018/06/03 22:15:15 [ERR] yamux: keepalive failed: session shutdown
    2018/06/03 23:37:13 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 00:18:51 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 01:41:23 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 01:44:54 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 03:46:37 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 04:02:29 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 05:33:16 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 05:49:39 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 06:31:31 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 06:59:07 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 08:25:20 [ERR] yamux: keepalive failed: session shutdown
    2018/06/04 09:00:30 [ERR] yamux: keepalive failed: session shutdown