Consul: [ERR] serf: Rejected coordinate from HOST: round trip time not in valid range, duration -99.611868ms is not a positive value less than 10s

Created on 21 Nov 2017 · 7 Comments · Source: hashicorp/consul

consul version for both Client and Server

Client: consul 1.0.1
Server: consul 1.0.1

consul info for both Client and Server

Client:

same as server

Server:

agent:
    check_monitors = 0
    check_ttls = 0
    checks = 32
    services = 45
build:
    prerelease =
    revision = 9564c29
    version = 1.0.1
consul:
    bootstrap = true
    known_datacenters = 7
    leader = true
    leader_addr = 10.0.66.150:8300
    server = true
raft:
    applied_index = 16074526
    commit_index = 16074526
    fsm_pending = 0
    last_contact = 0
    last_log_index = 16074526
    last_log_term = 15
    last_snapshot_index = 16070464
    last_snapshot_term = 15
    latest_configuration = [{Suffrage:Voter ID:386b24e2-c793-cd40-49dd-4116232b96bd Address:10.0.66.150:8300}]
    latest_configuration_index = 1
    num_peers = 0
    protocol_version = 3
    protocol_version_max = 3
    protocol_version_min = 0
    snapshot_version_max = 1
    snapshot_version_min = 0
    state = Leader
    term = 15
runtime:
    arch = amd64
    cpu_count = 8
    goroutines = 472
    max_procs = 8
    os = linux
    version = go1.9.2
serf_lan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 1
    event_time = 15
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 1
    members = 1
    query_queue = 0
    query_time = 1
serf_wan:
    coordinate_resets = 0
    encrypted = false
    event_queue = 0
    event_time = 1
    failed = 0
    health_score = 0
    intent_queue = 0
    left = 0
    member_time = 886
    members = 11
    query_queue = 0
    query_time = 1

Operating system and Environment details

Ubuntu 16.04.3 LTS, Docker 17.09

Description of the Issue (and unexpected/desired result)

After upgrading Consul to version 1.0.1, the logs started to fill with messages like these:

a.b.c.d     2017/11/21 09:01:38 [ERR] serf: Rejected coordinate from HOST1: round trip time not in valid range, duration -206.486µs is not a positive value less than 10s
a.b.c.d     2017/11/21 09:02:14 [ERR] serf: Rejected coordinate from HOST2: round trip time not in valid range, duration -99.611868ms is not a positive value less than 10s
a.b.c.d     2017/11/21 09:04:28 [ERR] serf: Rejected coordinate from HOST3: round trip time not in valid range, duration -765.777µs is not a positive value less than 10s
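
The wording of these messages suggests that a measured round trip time is accepted only if it is positive and less than 10s. The sketch below illustrates that rule using the durations from the log lines. It is inferred from the message text alone and is not serf's actual code; the names `MAX_RTT_SECONDS` and `is_valid_rtt` are hypothetical, and the strict upper bound is an assumption.

```python
# Upper bound quoted in the log message ("less than 10s").
MAX_RTT_SECONDS = 10.0


def is_valid_rtt(rtt_seconds: float) -> bool:
    """Return True if the round trip time is positive and below the bound.

    Assumption: the bound is exclusive, as the message wording implies.
    """
    return 0.0 < rtt_seconds < MAX_RTT_SECONDS


# Durations taken from the three log lines above, converted to seconds.
rejected_samples = [-206.486e-6, -99.611868e-3, -765.777e-6]

for rtt in rejected_samples:
    # Every sample is negative, so each one fails the check.
    print(f"{rtt:+.9f}s valid={is_valid_rtt(rtt)}")
```

All three reported durations are negative, so each fails this check regardless of how the upper bound is treated.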

Logs

  • no logs other than those mentioned above

All 7 comments

Hi @mnuic, we tracked that down, but the fix didn't make it into this release cycle. We will pick it up in the next minor release of Consul via https://github.com/hashicorp/memberlist/pull/139. Sorry for the log noise; these messages can be safely ignored.

@slackpad thank you for the info! Will wait for the next release for production use.

I'm afraid this is more than just log noise. Consul 1.0.1 breaks our test environment, whereas v0.9.3 works flawlessly. The error messages mentioned above are the only ones we see.

@sofax can you provide more details about what is broken for you?

@slackpad:
It may or may not be related to this issue - all I can say is that we don't see any other error messages.

Here is the scenario:
We have some integration tests for service health checks, e.g. one with two instances of service A, where initially both instances return an unhealthy state. Then service instance #2 is set to "healthy" (i.e. its health check resource returns a healthy state), which - as expected - makes it available via Consul. However, service instance #1 is suddenly available too, even though its health check resource still returns "unhealthy".

This does not happen with Consul 0.9.3.

@sofax thanks, that's definitely not related to this error. Can you please open a new issue with more details about how your test works? We will take a look.

@slackpad:
Thanks. I think the problem turned out to lie in our configuration (and in a misinterpretation of the documentation, or in a configuration example we found on the Internet that was based on Consul > 0.9.3). We had added the field id to the check definition in both instances with the same value. v0.9.3 apparently did not interpret that property at all, so it simply ignored it and assigned an automatic ID to the checks instead. v1.0.1 does interpret it, but instead of treating the ID as local to the service instance (which IMO makes more sense), it seems to give it global scope, so assigning the same ID to health checks for different service instances (of the same service) won't work.
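
The collision described in this comment can be illustrated with a toy model. The sketch assumes, as the comment suggests, that all checks on an agent share one ID namespace, so a second registration with the same ID replaces the first. The class `AgentChecks`, its methods, and the service and check names are hypothetical and only mimic the described behavior; this is not Consul's implementation.

```python
class AgentChecks:
    """Toy registry: checks live in a single namespace keyed by check ID."""

    def __init__(self):
        self._checks = {}

    def register(self, check_id, service_id, status):
        # Assumption: a repeated check ID overwrites the earlier entry.
        self._checks[check_id] = {"service_id": service_id, "status": status}

    def checks_for(self, service_id):
        """Return the checks currently attached to one service instance."""
        return {
            cid: check
            for cid, check in self._checks.items()
            if check["service_id"] == service_id
        }


# Same check ID on both instances: only the last registration survives.
shared = AgentChecks()
shared.register("health", "service-a-1", "critical")
shared.register("health", "service-a-2", "passing")

# Distinct check IDs: each instance keeps its own check.
distinct = AgentChecks()
distinct.register("health-1", "service-a-1", "critical")
distinct.register("health-2", "service-a-2", "passing")

print(shared.checks_for("service-a-1"))    # instance #1 has lost its check
print(distinct.checks_for("service-a-1"))  # instance #1 keeps its check
```

In this model, giving each check its own ID (for example, by including the service instance ID in it) avoids the overwrite.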
