Consul: Consul servers won't elect a leader

Created on 2 Jun 2015 · 74 Comments · Source: hashicorp/consul

I have 3 consul servers running (+ a handful of other nodes), and they can all speak to each other - or so I think; at least they're sending UDP messages between themselves.
The logs still show [ERR] agent: failed to sync remote state: No cluster leader, so even if the servers know about each other, it looks like they fail to perform an actual leader election...
Is there a way to trigger a leader election manually?

I'm running consul 0.5.2 on all nodes.


All 74 comments

This is our consul.json config:

{
  "datacenter": "sit0",
  "server": true,
  "leave_on_terminate": true,
  "retry_join": [
    "mod02.finn.no",
    "mod04.finn.no",
    "mod06.finn.no"
  ],
  "bootstrap_expect": 3,
  "retry_join_wan": [
    "mod01.finn.no",
    "mod03.finn.no",
    "mod05.finn.no"
  ]
}

Even numbers belong to the sit0 datacenter, odd numbers belong to the sit1 datacenter. So it should already be in bootstrap mode? (We always start the consul servers in bootstrap mode; I guess it cannot hurt?)

I've had this same issue. I have brought up a cluster of 3 servers with -bootstrap-expect 3 and they come up and elect a leader just fine once I issue a join. If I then kill -TERM them all, wait for them all to shut down, then start them back up, they still elect a leader fine. However, if I kill -INT them all, when I bring them back up they never elect a leader.

After some fiddling, I got it to work now. There are 2 suspects:

1) All the consul servers were started with the bootstrap_expect setting - does that lead to a kind of split-brain scenario, where they're all trying to initiate an election, and none of them succeed?

2) The WAN pool - it looks like the servers join the WAN pool before they have elected a leader. Could that be the problem?

@eirslett what did you change to get it to work?

I removed the "retry_join_wan" setting from all servers, and removed the "bootstrap_expect" setting from all servers except one of them. Then, I restarted the servers, and the leader election worked.

The servers should all be started with the same bootstrap_expect setting
so they all know how many to wait for before running the first leader
election, so you shouldn't have to configure them differently.

Could you link to some logs in your original configuration so we can see
what's going on?
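For illustration, a minimal uniform server config of the kind described would look something like this; the hostnames are placeholders, and the same file would be deployed to all three servers:

```json
{
  "server": true,
  "bootstrap_expect": 3,
  "retry_join": [
    "server1.example.com",
    "server2.example.com",
    "server3.example.com"
  ]
}
```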


My configuration didn't have any WAN connections at all, so I doubt it had anything to do with it (unless it's automatic).

Here's the log from the 3 servers in my cluster when this happens:

==> Caught signal: interrupt
==> Gracefully shutting down agent...
    2015/06/09 21:30:39 [INFO] consul: server starting leave
    2015/06/09 21:30:39 [INFO] serf: EventMemberLeave: devt-crate00.dc1 172.16.204.152
    2015/06/09 21:30:39 [INFO] consul: removing server devt-crate00.dc1 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 21:30:40 [INFO] serf: EventMemberLeave: devt-crate00 172.16.204.152
    2015/06/09 21:30:40 [INFO] consul: removing server devt-crate00 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 21:30:40 [INFO] serf: EventMemberLeave: devt-crate01 172.16.204.153
    2015/06/09 21:30:40 [INFO] consul: removing server devt-crate01 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 21:30:40 [WARN] raft: Heartbeat timeout reached, starting election
    2015/06/09 21:30:40 [INFO] raft: Node at 172.16.204.152:8300 [Candidate] entering Candidate state
    2015/06/09 21:30:41 [INFO] raft: Duplicate RequestVote for same term: 10
    2015/06/09 21:30:42 [WARN] raft: Election timeout reached, restarting election
    2015/06/09 21:30:42 [INFO] raft: Node at 172.16.204.152:8300 [Candidate] entering Candidate state
    2015/06/09 21:30:42 [INFO] raft: Election won. Tally: 2
    2015/06/09 21:30:42 [INFO] raft: Node at 172.16.204.152:8300 [Leader] entering Leader state
    2015/06/09 21:30:42 [INFO] consul: cluster leadership acquired
    2015/06/09 21:30:42 [INFO] consul: New leader elected: devt-crate00
    2015/06/09 21:30:42 [INFO] raft: pipelining replication to peer 172.16.204.154:8300
    2015/06/09 21:30:42 [WARN] consul: deregistering self (devt-crate00) should be done by follower
    2015/06/09 21:30:42 [INFO] consul: member 'devt-crate01' left, deregistering
    2015/06/09 21:30:43 [INFO] serf: EventMemberLeave: devt-crate02 172.16.204.154
    2015/06/09 21:30:43 [INFO] consul: removing server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 21:30:43 [INFO] raft: Removed peer 172.16.204.154:8300, stopping replication (Index: 5825)
    2015/06/09 21:30:43 [INFO] consul: removed server 'devt-crate02' as peer
    2015/06/09 21:30:43 [INFO] consul: member 'devt-crate02' left, deregistering
    2015/06/09 21:30:43 [INFO] raft: aborting pipeline replication to peer 172.16.204.154:8300
    2015/06/09 21:30:43 [INFO] agent: requesting shutdown
    2015/06/09 21:30:43 [INFO] consul: shutting down server
    2015/06/09 21:30:43 [INFO] agent: shutdown complete
==> WARNING: Expect Mode enabled, expecting 3 servers
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'devt-crate00'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 172.16.204.152 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/06/09 21:31:22 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 21:31:22 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 21:31:22 [INFO] serf: EventMemberJoin: devt-crate00 172.16.204.152
    2015/06/09 21:31:22 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 21:31:22 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 21:31:22 [INFO] serf: EventMemberJoin: devt-crate00.dc1 172.16.204.152
    2015/06/09 21:31:22 [INFO] raft: Node at 172.16.204.152:8300 [Follower] entering Follower state
    2015/06/09 21:31:22 [INFO] serf: Attempting re-join to previously known node: devt-crate02: 172.16.204.154:8301
    2015/06/09 21:31:22 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 21:31:22 [INFO] serf: Attempting re-join to previously known node: devt-crate01: 172.16.204.153:8301
    2015/06/09 21:31:22 [INFO] consul: adding server devt-crate00 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 21:31:22 [INFO] consul: adding server devt-crate00.dc1 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 21:31:22 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 21:31:22 [INFO] serf: EventMemberJoin: devt-crate01 172.16.204.153
    2015/06/09 21:31:22 [INFO] serf: Re-joined to previously known node: devt-crate01: 172.16.204.153:8301
    2015/06/09 21:31:22 [INFO] consul: adding server devt-crate01 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 21:31:22 [INFO] serf: EventMemberJoin: devt-crate02 172.16.204.154
    2015/06/09 21:31:22 [INFO] consul: adding server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 21:31:22 [INFO] serf: EventMemberJoin: devt-crate03 172.16.204.155
    2015/06/09 21:31:22 [ERR] http: Request /v1/catalog/nodes, error: No cluster leader
    2015/06/09 21:31:22 [INFO] agent.rpc: Accepted client: 127.0.0.1:49409
    2015/06/09 21:31:22 [INFO] agent: (LAN) joining: [172.16.204.152 172.16.204.153 172.16.204.154]
    2015/06/09 21:31:22 [INFO] agent: (LAN) joined: 3 Err: <nil>
    2015/06/09 21:31:23 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
    2015/06/09 21:31:38 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 21:32:05 [ERR] agent: failed to sync remote state: No cluster leader
==> Caught signal: interrupt
==> Gracefully shutting down agent...
    2015/06/09 22:01:41 [INFO] consul: server starting leave
    2015/06/09 22:01:41 [INFO] serf: EventMemberLeave: devt-crate01.dc1 172.16.204.153
    2015/06/09 22:01:41 [INFO] consul: removing server devt-crate01.dc1 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 22:01:41 [INFO] serf: EventMemberLeave: devt-crate01 172.16.204.153
    2015/06/09 22:01:41 [INFO] consul: removing server devt-crate01 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 22:01:41 [INFO] serf: EventMemberFailed: devt-crate02 172.16.204.154
    2015/06/09 22:01:41 [INFO] consul: removing server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:01:42 [INFO] agent: requesting shutdown
    2015/06/09 22:01:42 [INFO] consul: shutting down server
    2015/06/09 22:01:42 [INFO] agent: shutdown complete
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'devt-crate01'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 172.16.204.153 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/06/09 22:05:17 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:05:17 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:05:17 [INFO] serf: EventMemberJoin: devt-crate01 172.16.204.153
    2015/06/09 22:05:17 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:05:17 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:05:17 [INFO] serf: EventMemberJoin: devt-crate01.dc1 172.16.204.153
    2015/06/09 22:05:17 [INFO] raft: Node at 172.16.204.153:8300 [Follower] entering Follower state
    2015/06/09 22:05:17 [INFO] serf: Attempting re-join to previously known node: devt-crate03: 172.16.204.155:8301
    2015/06/09 22:05:17 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 22:05:17 [INFO] serf: Attempting re-join to previously known node: devt-crate02: 172.16.204.154:8301
    2015/06/09 22:05:17 [INFO] consul: adding server devt-crate01 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 22:05:17 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:05:17 [INFO] serf: EventMemberJoin: devt-crate00 172.16.204.152
    2015/06/09 22:05:17 [INFO] serf: EventMemberJoin: devt-crate02 172.16.204.154
    2015/06/09 22:05:17 [INFO] serf: Re-joined to previously known node: devt-crate02: 172.16.204.154:8301
    2015/06/09 22:05:17 [INFO] consul: adding server devt-crate00 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 22:05:17 [INFO] consul: adding server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:05:17 [ERR] http: Request /v1/catalog/nodes, error: No cluster leader
    2015/06/09 22:05:17 [INFO] agent.rpc: Accepted client: 127.0.0.1:44608
    2015/06/09 22:05:17 [INFO] agent: (LAN) joining: [172.16.204.152 172.16.204.153 172.16.204.154]
    2015/06/09 22:05:17 [INFO] agent: (LAN) joined: 3 Err: <nil>
    2015/06/09 22:05:18 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
    2015/06/09 22:05:36 [ERR] agent: failed to sync remote state: No cluster leader
==> Caught signal: interrupt
==> Gracefully shutting down agent...
    2015/06/09 22:01:41 [INFO] consul: server starting leave
    2015/06/09 22:01:41 [INFO] serf: EventMemberLeave: devt-crate02.dc1 172.16.204.154
    2015/06/09 22:01:41 [INFO] serf: EventMemberLeave: devt-crate02 172.16.204.154
    2015/06/09 22:01:41 [INFO] consul: removing server devt-crate02.dc1 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:01:41 [INFO] consul: removing server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:01:41 [INFO] agent: requesting shutdown
    2015/06/09 22:01:41 [INFO] consul: shutting down server
    2015/06/09 22:01:41 [INFO] agent: shutdown complete
==> WARNING: It is highly recommended to set GOMAXPROCS higher than 1
==> Starting raft data migration...
==> Starting Consul agent...
==> Starting Consul agent RPC...
==> Consul agent running!
         Node name: 'devt-crate02'
        Datacenter: 'dc1'
            Server: true (bootstrap: false)
       Client Addr: 127.0.0.1 (HTTP: 8500, HTTPS: -1, DNS: 8600, RPC: 8400)
      Cluster Addr: 172.16.204.154 (LAN: 8301, WAN: 8302)
    Gossip encrypt: false, RPC-TLS: false, TLS-Incoming: false
             Atlas: <disabled>

==> Log data will now stream in as it occurs:

    2015/06/09 22:04:21 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:04:21 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:04:21 [INFO] serf: EventMemberJoin: devt-crate02 172.16.204.154
    2015/06/09 22:04:21 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:04:21 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:04:21 [INFO] serf: EventMemberJoin: devt-crate02.dc1 172.16.204.154
    2015/06/09 22:04:21 [INFO] raft: Node at 172.16.204.154:8300 [Follower] entering Follower state
    2015/06/09 22:04:21 [INFO] serf: Attempting re-join to previously known node: devt-crate03: 172.16.204.155:8301
    2015/06/09 22:04:21 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 22:04:21 [INFO] serf: Attempting re-join to previously known node: devt-crate01: 172.16.204.153:8301
    2015/06/09 22:04:21 [INFO] consul: adding server devt-crate02 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:04:21 [INFO] consul: adding server devt-crate02.dc1 (Addr: 172.16.204.154:8300) (DC: dc1)
    2015/06/09 22:04:21 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:04:21 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 22:04:21 [ERR] http: Request /v1/catalog/nodes, error: No cluster leader
    2015/06/09 22:04:21 [INFO] agent.rpc: Accepted client: 127.0.0.1:36990
    2015/06/09 22:04:21 [INFO] agent: (LAN) joining: [172.16.204.152 172.16.204.153 172.16.204.154]
    2015/06/09 22:04:21 [INFO] serf: EventMemberJoin: devt-crate00 172.16.204.152
    2015/06/09 22:04:21 [INFO] consul: adding server devt-crate00 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 22:04:21 [INFO] agent: (LAN) joined: 2 Err: <nil>
    2015/06/09 22:04:22 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
    2015/06/09 22:04:42 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:05:01 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:05:17 [INFO] serf: EventMemberJoin: devt-crate01 172.16.204.153
    2015/06/09 22:05:17 [INFO] consul: adding server devt-crate01 (Addr: 172.16.204.153:8300) (DC: dc1)
    2015/06/09 22:05:19 [ERR] agent: failed to sync remote state: No cluster leader

Also, when this happens, killing the servers with either QUIT or INT still leaves them in this state when they come back up: they won't elect a leader.

@reversefold Those look like the same node's log got pasted three times - can you grab the logs for the other two nodes?

Command-line I'm using:

consul agent -server -data-dir consul.data -config-dir etc -pid-file $consul.pid -bootstrap-expect 3 -rejoin

If I try killing them all, replacing -bootstrap-expect 3 with -bootstrap, and starting just one node, it elects itself leader but then drops leadership and ends up in the same situation...

    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: EventMemberJoin: devt-crate00 172.16.204.152
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: Ignoring previous leave in snapshot
    2015/06/09 22:06:50 [INFO] serf: EventMemberJoin: devt-crate00.dc1 172.16.204.152
    2015/06/09 22:06:50 [INFO] raft: Node at 172.16.204.152:8300 [Follower] entering Follower state
    2015/06/09 22:06:50 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 22:06:50 [WARN] serf: Failed to re-join any previously known node
    2015/06/09 22:06:50 [INFO] consul: adding server devt-crate00 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 22:06:50 [INFO] consul: adding server devt-crate00.dc1 (Addr: 172.16.204.152:8300) (DC: dc1)
    2015/06/09 22:06:50 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:06:50 [ERR] http: Request /v1/catalog/nodes, error: No cluster leader
    2015/06/09 22:06:50 [INFO] agent.rpc: Accepted client: 127.0.0.1:50493
    2015/06/09 22:06:50 [INFO] agent: (LAN) joining: [172.16.204.152 172.16.204.153 172.16.204.154]
    2015/06/09 22:06:50 [INFO] agent: (LAN) joined: 1 Err: <nil>
    2015/06/09 22:06:51 [WARN] raft: Heartbeat timeout reached, starting election
    2015/06/09 22:06:51 [INFO] raft: Node at 172.16.204.152:8300 [Candidate] entering Candidate state
    2015/06/09 22:06:51 [INFO] raft: Election won. Tally: 1
    2015/06/09 22:06:51 [INFO] raft: Node at 172.16.204.152:8300 [Leader] entering Leader state
    2015/06/09 22:06:51 [INFO] consul: cluster leadership acquired
    2015/06/09 22:06:51 [INFO] consul: New leader elected: devt-crate00
    2015/06/09 22:06:51 [INFO] raft: Disabling EnableSingleNode (bootstrap)
    2015/06/09 22:06:51 [INFO] raft: Added peer 172.16.204.153:8300, starting replication
    2015/06/09 22:06:51 [INFO] raft: Added peer 172.16.204.154:8300, starting replication
    2015/06/09 22:06:51 [INFO] raft: Removed peer 172.16.204.153:8300, stopping replication (Index: 19)
    2015/06/09 22:06:51 [INFO] raft: Removed peer 172.16.204.154:8300, stopping replication (Index: 19)
    2015/06/09 22:06:51 [INFO] raft: Removed ourself, transitioning to follower
    2015/06/09 22:06:51 [ERR] raft: Failed to AppendEntries to 172.16.204.153:8300: dial tcp 172.16.204.153:8300: connection refused
    2015/06/09 22:06:51 [ERR] raft: Failed to AppendEntries to 172.16.204.154:8300: dial tcp 172.16.204.154:8300: connection refused
    2015/06/09 22:06:51 [ERR] raft: Failed to AppendEntries to 172.16.204.154:8300: dial tcp 172.16.204.154:8300: connection refused
    2015/06/09 22:06:52 [INFO] raft: Node at 172.16.204.152:8300 [Follower] entering Follower state
    2015/06/09 22:06:52 [INFO] consul: cluster leadership lost
    2015/06/09 22:06:52 [ERR] consul: failed to wait for barrier: node is not the leader
    2015/06/09 22:06:53 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
    2015/06/09 22:06:53 [ERR] agent: failed to sync remote state: No cluster leader
    2015/06/09 22:07:21 [ERR] agent: failed to sync remote state: No cluster leader

Note that no other consul servers are up at this point, so I don't understand why it's losing leadership...

$ ./bin/consul members
Node          Address              Status  Type    Build  Protocol  DC
devt-crate00  172.16.204.152:8301  alive   server  0.5.2  2         dc1

@slackpad Apologies, I've updated the comment above with the other 2 logs.

I've also tried removing the data directory on one node and starting it with bootstrap. It elects itself as leader but when I start up the other 2 servers they won't accept it as leader.

    2015/06/09 22:31:45 [WARN] raft: Rejecting vote from 172.16.204.152:8300 since our last term is greater (9, 1)

Also worth mentioning that my config dir only has service entries in it.

@reversefold - After digging more I think you are seeing the same thing as is being discussed on #750 and #454.

I was able to reproduce your situation locally and verify that the consul.data/raft/peers.json file ends up containing null on all three servers. A TERM signal doesn't lead to this because the agent doesn't formally leave the set of peers (see leave_on_terminate, which defaults to false).

I'll let @armon and/or @ryanuber weigh in on the best practice to follow to get into a working state. Maybe we should add a new "Failure of all Servers in a Multi-Server Cluster" section in the outage recovery docs since this seems to be confusing to people. I think it will end up being something similar to the manual bootstrapping procedure, picking one of the servers as the lead. In my local testing I was able to get them going again by manually editing the peers.json file, but I'm not sure if that's a safe thing to do.

@slackpad is correct here. @reversefold when you delete the entire data directory, you are also deleting the raft log entirely, which is why you get the error about the voting terms not matching up (servers with data joining a leader with no data). Editing the peers.json file and leaving the raft data intact is the current best practice for recovering in outage scenarios, and the null is described in #750, TL;DR being that it is purposely nulled out during a graceful leave for safety reasons.
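As a rough sketch of that manual recovery: with all three agents stopped, you would write the full peer list back into each server's raft/peers.json and then restart. The data-dir path and addresses below are assumptions taken from the logs in this thread (DATA_DIR defaults to a scratch path so the snippet is safe to dry-run; point it at your real data-dir, e.g. consul.data or /var/lib/consul):

```shell
#!/usr/bin/env bash
# Sketch of the manual peers.json recovery described above.
# DATA_DIR and the addresses are assumptions; adjust to your environment.
DATA_DIR="${DATA_DIR:-/tmp/consul-demo}"
mkdir -p "$DATA_DIR/raft"

# With all agents stopped, replace the "null" left behind by the graceful
# leave with the full list of server addresses (raft port 8300), the same
# on every server:
cat > "$DATA_DIR/raft/peers.json" <<'EOF'
[
  "172.16.204.152:8300",
  "172.16.204.153:8300",
  "172.16.204.154:8300"
]
EOF

# Then restart the agents; with an intact peer set they can hold an election.
```

Repeat the file edit on each server before starting any of the agents back up, so they all agree on the same peer set.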

I understand, I was just throwing things at the wall to try to make the cluster come back up. This does seem to be a bug in the system, though. In general if I am shutting down a consul agent I want to do it "nicely" so the consul cluster doesn't complain about failing checks. I know I shut it down and I don't need spurious errors in my logs or being reported by the consul cluster.

Regardless of that, though, this is a state that the consul cluster is putting itself in. If, when a multi-node consul server cluster is shut down with INT, the last server leaving is about to put itself into this state, it should throw a warning or error and not do that.

I've also tested this with a single node and the same problem does not happen, which I would not expect given your explanation. If I have a single server with -bootstrap-expect 1 and I kill it with INT, making it "leave", then when it comes back up it should end up in the same state as the 3-node cluster did when its last server member was killed with INT, but it happily comes back up and elects itself leader again. In fact, if I bring up two servers, each with -bootstrap-expect 1, they work fine; if I then kill -INT them both, they both leave, yet come back up and elect a leader again. However, if I change to -bootstrap-expect 2, they then fail to elect a leader.

I'm seeing this same issue. Can someone either provide a sample or else point me to the documentation of what the peers.json is supposed to look like (you know, before it gets corrupted with a null value)?

Hi @orclev - there's a sample in the Outage Recovery Guide. It's basically a list of IPs and port numbers for the peers:

[
  "10.0.1.8:8300",
  "10.0.1.6:8300",
  "10.0.1.7:8300"
]

@slackpad thanks, I somehow missed that. I figured it was probably somewhere in the docs and I was just missing it.

This bug is hitting me as well on 0.5.2

@slackpad
can you provide a patch build to fix this issue instead of requiring a manual fix? It is almost impossible to manually fix something in a production environment.

Hi @juaby. Unfortunately, there's not a good way we could automatically recover from the case where the servers have all left the cluster. I was just going to improve the documentation on how to recover from this case by manually editing the peers.json file per the comments above.

If this is a state that a human can recognize and fix, then it's a state that a computer can recognize and fix. At the very least there should be a command-line option that will fix this state.

Perhaps we should always start with --bootstrap-expect=1. That way we would work around the bring up problem. The remaining servers will join subsequently, so in the end we'll get our redundancy back. I believe joining servers need to nuke their raft data. I know this is not recommended but in our case, manual fiddling with descriptors is not an option. @slackpad do you see any issues with this approach?

I suppose --bootstrap-expect=1 would lead to split brain?

I hope not as the joining servers would run with --join whereas the first one would not

After a day of testing, this almost works. It starts with bootstrap-expect=1 and elects itself leader. The others join and I have my cluster back. Unfortunately, I am running into a case where it decides to give up leadership. For some reason it detects long-dead peers as active and wants to run an election which it cannot win because, well... the peers are really dead. Is this a bug or is there some reason for that?

http://pastebin.com/NR5RSvDq

I've read through #750, #454 and this issue, but I don't feel like I'm any closer to understanding the Right Way (tm) to do things.

Here's the workflow I need and expect:

1) An easy way to start an N (typically 3 or 5) node cluster. bootstrap-expect is trying to satisfy this need.
2) An easy way to _automatically_ recover when a server disappears and reappears, which could occur for multiple reasons: network outage, server outage, server upgrade, consul upgrade, etc. Bonus points if this works even when consul is running in a container.

2 is where everyone is getting stuck. Consul should Just Work even when all servers bounce due to a planned or unplanned outage - even if a split brain situation occurs temporarily (such as when node 2 / 3 disappears and the cluster loses quorum), the cluster should heal itself when any 2 (or all?) are back up, without requiring manual intervention.

If this is infeasible due to technical limitations in raft or Consul, I would like to see explicit documentation detailing which failure (and maintenance!) modes support automatic remediation, best practices around said maintenance, and a clear description of cases requiring manual remediation.

We are hitting this behavior as well on a rolling restart. We have 3 servers running and only stop/start one at a time via consul leave. We will wait up to 30 seconds for an election after restarting the node. It took 6 retries before we saw a leader election occur within 30 seconds. It happened to be at the 15 second mark which never happened with the other restarts.

Maybe the consul servers should be self-aware?

We have changed how we bootstrap to a simpler model, but continue to notice that it can take up to 2 minutes to get a successful election. We wait up to 1 minute for an election to occur before leaving and rejoining.
We have noticed some bad behavior in the election process. In one situation, we noticed that after joining, we continually got

2015/07/21 23:22:40 [WARN] raft: Rejecting vote from 10.244.0.62:8300 since our last index is greater (24579, 38)

for the entire minute. And once we left and rejoined, the election still took another 45 seconds, but there was no vote rejection, just the "Skipping application of old logs" message which I reported in issue #1126.

I hit the same issue as @reversefold in https://github.com/hashicorp/consul/issues/993#issuecomment-109661347

I ran multiple tests on my computer (here is a summary to help you ;)).

Config

I have 3 nodes, and the configuration is the same on all hosts:

{
    "datacenter": "sl",
    "data_dir": "/var/lib/consul",
    "client_addr": "0.0.0.0",
    "server": true,
    "start_join": [
        "ansible-monitoring-es-0.lxc",
        "ansible-monitoring-es-1.lxc",
        "ansible-monitoring-es-2.lxc"
    ],
    "bootstrap_expect": 3,
    "ui_dir": "/usr/local/share/consul/ui/dist"
}

And here is my upstart script:

description "Consul Service Discovery Platform"

emits consul-up

start on runlevel [2345]
stop on runlevel [!2345]

setuid root

script
  export GOMAXPROCS=`nproc`
  CMD="/usr/local/bin/consul agent -config-dir /etc/consul.d"
  LOGFILE="/var/log/consul/consul.log"
  exec $CMD >> $LOGFILE 2>&1
end script

post-start exec initctl emit consul-up

#kill signal INT

respawn
respawn limit 10 10

And finally, I use Ubuntu 10.04 in LXC.

Benchmark

Benchmark steps (run them many times, and make a mental average ;))

for i in `seq 0 2` ;do ssh ansible-monitoring-es-$i.lxc  'sudo start consul'  ; done
for i in `seq 0 2` ;do ssh ansible-monitoring-es-$i.lxc  'sudo consul info | grep leader'  ; done
# Repeat the last command many times, for about 15 seconds
for i in `seq 0 2` ;do ssh ansible-monitoring-es-$i.lxc  'sudo stop consul'  ; done
sleep  $((  RANDOM % 10 )) # Don't know why, but consul seems to fail more often after a "bigger" sleep.

Results

For my initial test, I uncommented #kill signal INT from upstart script.

  • if the stop/start is fast enough (<3 sec) => OK
  • if the stop/start is slow (>3 sec) => always broken.

For the second test, I commented #kill signal INT in the upstart script,
then I ran: for i in `seq 0 2`; do ssh ansible-monitoring-es-$i.lxc 'sudo rm -rf /var/lib/consul'; done

  • if the stop/start is fast enough (<3 sec) => OK
  • if the stop/start is slow (>3 sec) => OK.

_Note:_ I also ran the rm command several times during the same tests, in order to start again from a clean state.

I have this configuration working for me; note that I use SIGTERM to stop servers (clients still use SIGINT), so it may not cover everybody's use case:

{
  "server": true,
...
  "bootstrap_expect": 3,
  "retry_join": ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
  "rejoin_after_leave": true
}

The init script should use different signals for servers and clients.

Hi guys, I hit this problem today in my 3-server cluster; just posting my workaround here in case anybody meets the same issue:

Suspect cause:
Someone start a consul agent on an unknown node as a server agent by mistake and then gone, leave its status as failed.
consul monitor prints "no leader cluster" even I forcely remove the error node.

PS: my consul agent version is 0.5.2

Workaround:
The content of the raft/peers.json file on all 3 nodes was null.

I replaced the null with my 3 consul servers' addresses:

["10.0.40.122:8300","10.0.40.123:8300","10.0.40.124:8300"]

and then restarted the 3 consul server agents with the arg -bootstrap-expect=3; the leader was finally elected.
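The workaround above can be sketched as a small helper run on each stopped server before the restart (the data directory path in the usage comment is an assumption; use whatever data_dir your config sets):

```shell
# write_peers DIR ADDR... : write DIR/raft/peers.json listing the server
# raft addresses (ip:8300), replacing the "null" left behind.
write_peers() {
  dir=$1; shift
  mkdir -p "$dir/raft"
  printf '[' > "$dir/raft/peers.json"
  sep=''
  for addr in "$@"; do
    printf '%s"%s"' "$sep" "$addr" >> "$dir/raft/peers.json"
    sep=','
  done
  printf ']\n' >> "$dir/raft/peers.json"
}

# On each stopped server (data directory path is an assumption):
#   write_peers /var/lib/consul 10.0.40.122:8300 10.0.40.123:8300 10.0.40.124:8300
#   consul agent -config-dir /etc/consul.d -bootstrap-expect=3
```

All servers must be stopped while the file is rewritten, then restarted, so they agree on the same peer set.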

@kongchen same issue here.
I gracefully (SIGINT) restarted my 3-server consul cluster one by one; IPs stayed the same.
All nodes came back, same IPs, same instances, and all are shown as alive in the consul members output.
NO leader :(

Hrm, interesting. During your restarts, did the cluster lose quorum at any
point? After restarting the leader node, another consul process
should have taken over leadership. At all times quorum must be maintained.
If quorum is lost for any reason, the manual bootstrapping process needs to
be followed to restore quorum.

The process should be:

1) verify the cluster is healthy (at least has quorum + 1 nodes online)
2) restart a follower node
3) verify that it re-enters the follower state and that the commit and
applied index entries are basically caught up
4) verify a leader is still in the cluster
5) move to the next node. Rinse and repeat steps 2 through 5 until the
entire cluster has been updated.
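The steps above can be sketched as a serialized loop with a bounded wait for a leader between hosts (hostnames, the restart command, and the check command are all placeholders):

```shell
# rolling_restart RESTART_CMD CHECK_CMD HOST... : restart each host in turn,
# then poll CHECK_CMD (which must print the current leader as ip:8300)
# until the cluster has a leader again before moving on.
rolling_restart() {
  restart=$1; check=$2; shift 2
  for host in "$@"; do
    ssh "$host" "$restart"
    tries=0
    until ssh "$host" "$check" | grep -q ':8300'; do
      tries=$((tries + 1))
      [ "$tries" -ge 30 ] && { echo "no leader after restarting $host" >&2; return 1; }
      sleep 2
    done
  done
}

# Intended use (hostnames and restart command are placeholders):
#   rolling_restart 'sudo restart consul' \
#       'curl -sf http://127.0.0.1:8500/v1/status/leader' \
#       consul-1 consul-2 consul-3
```

Restart the followers first, as suggested above, so a known-good leader remains up while each follower catches back up.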

If you look through the logs, I'd wager quorum was lost at some point. -sc

Sean Chittenden
[email protected]

@sean, thanks for the explanation.
It appears I simply restarted the nodes out of order.
I maintain my configuration with ansible, and it executes tasks in parallel on all hosts...
Same thing applies to nomad, I guess (stop a follower, wait for leader election, do what you need, bring it back).
It would be super nice to have all of this documented somewhere for both consul and nomad

To be clear, you could restart the leader first; there is nothing wrong
with that. However, out of a "measure twice, cut once" kind of conservatism
re: ops, I bounce the followers first because I know I have the latest
binary, data directory, and a working config.

As for ansible executing jobs in parallel, that is almost certainly your
problem. You need to make the restarts sequential, and you should add a
pause or poll the status URL until a leader is found before proceeding.
Cheers. -sc
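In ansible terms, that means serializing the play and waiting on the status URL between hosts. A hypothetical play sketch, not taken from the thread (group name and retry counts are assumptions):

```yaml
# Restart consul servers one at a time, waiting for a leader between hosts.
- hosts: consul_servers
  serial: 1
  tasks:
    - name: restart consul
      service:
        name: consul
        state: restarted
    - name: wait for a cluster leader
      uri:
        url: http://127.0.0.1:8500/v1/status/leader
        return_content: yes
      register: leader
      until: "':8300' in leader.content"
      retries: 30
      delay: 2
```

serial: 1 forces the play to finish on each host before starting the next, which is exactly the sequential restart described above.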

Sean Chittenden
[email protected]

same problem here:
here is the output of "consul members":
db1 172.31.62.121:8301 alive server 0.6.3 2 prems
rabbit1 172.31.62.122:8301 alive server 0.6.3 2 prems
rabbit2 172.31.62.9:8301 alive server 0.6.3 2 prems

meaning all nodes are up
but the log gives me:
2016/03/03 15:40:31 [ERR] agent: coordinate update error: No cluster leader
2016/03/03 15:40:36 [ERR] agent: failed to sync remote state: No cluster leader

example of a config file is:
{
  "datacenter": "prems",
  "data_dir": "/tmp/consul",
  "log_level": "INFO",
  "node_name": "db1",
  "server": true,
  "bind_addr": "172.31.62.121",
  "start_join": ["172.31.62.9", "172.31.62.122"]
}

If a manual process needs to be taken, this is bad. Why not do it automatically?
Isn't this supposed to be "fire and forget" software?

In production you can't expect manual ordering of things...

same issue here, can't get the cluster to elect a leader..

Same problem here. I have set up a repo with vagrant to reproduce this easily.
Just clone my project, run vagrant up, then connect with vagrant ssh host-1:
https://github.com/casertap/consul-multi-examples

I'm seeing the same issue:

consul members
2016/04/21 09:05:35 [INFO] agent.rpc: Accepted client: 127.0.0.1:37353
Node Address Status Type Build Protocol DC
agent-one 10.64.188.38:8301 alive server 0.6.4 2 dc1
agent-three 10.64.188.4:8301 alive server 0.6.4 2 dc1
agent-two 10.64.188.39:8301 alive server 0.6.4 2 dc1

2016/04/21 09:05:38 [ERR] agent: failed to sync remote state: No cluster leader
2016/04/21 09:05:42 [ERR] http: Request PUT /v1/session/create, error: No cluster leader from=127.0.0.1:42404
2016/04/21 09:05:44 [ERR] http: Request GET /v1/kv/vault/core/upgrade/1, error: No cluster leader from=127.0.0.1:42404
2016/04/21 09:05:49 [ERR] agent: coordinate update error: No cluster leader
2016/04/21 09:05:52 [ERR] http: Request PUT /v1/session/create, error: No cluster leader from=127.0.0.1:42407
2016/04/21 09:06:01 [ERR] agent: failed to sync remote state: No cluster leader
2016/04/21 09:06:02 [ERR] http: Request PUT /v1/session/create, error: No cluster leader from=127.0.0.1:42407
2016/04/21 09:06:12 [ERR] http: Request PUT /v1/session/create, error: No cluster leader from=127.0.0.1:42407
2016/04/21 09:06:14 [ERR] http: Request GET /v1/kv/vault/core/upgrade/1, error: No cluster leader from=127.0.0.1:42407
2016/04/21 09:06:17 [ERR] agent: coordinate update error: No cluster leader

I also manually updated the peers.json to include the raft peer members to no avail.

My config for one of the servers:

"bootstrap_expect": 3,
"ca_file": "/etc/consul.d/ssl/ca.cert",
"cert_file": "/etc/consul.d/ssl/consul.cert",
"client_addr": "0.0.0.0",
"data_dir": "/var/consul",
"encrypt": "*",
"key_file": "/etc/consul.d/ssl/consul.key",
"ports": {
"dns": 8600,
"http": 8500,
"rpc": 8400,
"serf_lan": 8301,
"serf_wan": 8302,
"server": 8300,
"https": 8501
},
"server": true,
"ui": true,
"ui_dir": "/srv/consul-ui/c

The other two omit "bootstrap_expect": 3 from their config.

Any thoughts?

For what it's worth, I was having the election issue and with the info on this page I was able to get it straight:

Apr 30 10:26:10 swarm1 consul: 2016/04/30 10:26:10 [ERR] agent: failed to sync remote state: No cluster leader
Apr 30 10:26:31 swarm1 consul: 2016/04/30 10:26:31 [ERR] agent: coordinate update error: No cluster leader
Apr 30 10:26:39 swarm1 consul: 2016/04/30 10:26:39 [ERR] agent: failed to sync remote state: No cluster leader
Apr 30 10:26:49 swarm1 consul: 2016/04/30 10:26:49 [ERR] agent: coordinate update error: No cluster leader

Adding "rejoin_after_leave": true and putting "bootstrap_expect": 2 on only one server (I just have 2 nodes right now) was the first part. Frankly, the other part was deleting everything in /var/lib/consul ("rm -rf /var/lib/consul/*") before restarting consul; that got me straight, which I realize is only OK to do because I am not in production yet. I am using CentOS 7 with consul as a systemd service.

My configs, in case they help someone (I don't think having both the bind and advertise address lines in there is necessary, but it's working now so I am not messing with it):

[root@swarm1 ~]# cat /etc/consul/config.json
{
"retry_join": ["192.168.100.7"],
"server": true,
"bootstrap_expect": 2,
"data_dir": "/var/lib/consul",
"log_level": "INFO",
"enable_syslog": false,
"datacenter": "Morrisville",
"bind_addr": "192.168.100.6",
"advertise_addr": "192.168.100.6",
"rejoin_after_leave": true
}

[root@swarm2 ~]# cat /etc/consul/config.json
{
"retry_join": ["192.168.100.6"],
"server": true,
"data_dir": "/var/lib/consul",
"log_level": "INFO",
"enable_syslog": false,
"datacenter": "Morrisville",
"bind_addr": "192.168.100.7",
"advertise_addr": "192.168.100.7",
"rejoin_after_leave": true
}

Hi all,

Our production cluster experienced this issue Friday night. Unfortunately, because of my panicked state in the middle of the night and my desire to restore Consul services ASAP, I did not save any logs. I ended up killing the cluster, rebuilding a new one, and using consulate to restore our KV.

Bottom line: one node failed, which I believe was the leader, and the cluster was unable to enter reelection. The logs I was seeing were a little different, in that the remaining servers were getting timeouts from the leader, which was already gone. I spun up a new 5th server to take its place, but the other 4 nodes were in such a bad state that the 5th never entered as a server, even though you could run consul members from this new server and see it had joined the cluster.

Another clue: the leader that failed was running on an instance that was pegged at 100% CPU. SSH'ing into the box was impossible. That is when I terminated it.

Looking at the logs from the other servers, they constantly spammed timeout and reelection failures.

consul force-leave did not resolve the issue. Is there an API call to force stop and restart leader election?

Next time it happens, if it ever happens again, I'll remind myself to grab some logs.

Just as I posted I didn't have any logs, our test cluster failed for the same reason.

This is a 3 node cluster. One of the nodes died recently, probably the leader.

Here are the logs from one of the two remaining servers:

2016/05/01 11:38:30 [INFO] consul: New leader elected: ip-10-185-3-3
    2016/05/01 11:38:33 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/01 11:38:38 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
==> Newer Consul version available: 0.6.4
    2016/05/01 16:43:29 [INFO] consul.fsm: snapshot created in 30.769µs
    2016/05/01 16:43:29 [INFO] raft: Starting snapshot up to 764724
    2016/05/01 16:43:29 [INFO] snapshot: Creating new snapshot at /data/raft/snapshots/6191-764724-1462121009525.tmp
    2016/05/01 16:43:29 [INFO] snapshot: reaping snapshot /data/raft/snapshots/6123-748311-1462078059119
    2016/05/01 16:43:29 [INFO] raft: Compacting logs from 746293 to 754484
    2016/05/01 16:43:30 [INFO] raft: Snapshot to 764724 complete
    2016/05/01 17:08:02 [INFO] serf: EventMemberLeave: ip-10-185-15-177 10.185.15.177
    2016/05/01 17:14:17 [INFO] serf: EventMemberJoin: ip-10-185-2-219 10.185.2.219
    2016/05/01 18:31:34 [INFO] serf: EventMemberReap: ip-10-185-21-70
    2016/05/01 19:03:22 [INFO] serf: EventMemberLeave: ip-10-185-76-58 10.185.76.58
    2016/05/01 19:09:38 [INFO] serf: EventMemberJoin: ip-10-185-89-235 10.185.89.235
    2016/05/01 21:13:36 [INFO] serf: EventMemberLeave: ip-10-185-74-17 10.185.74.17
    2016/05/01 21:20:26 [INFO] serf: EventMemberJoin: ip-10-185-92-101 10.185.92.101
    2016/05/01 22:42:52 [INFO] consul.fsm: snapshot created in 23.06µs
    2016/05/01 22:42:52 [INFO] raft: Starting snapshot up to 772959
    2016/05/01 22:42:52 [INFO] snapshot: Creating new snapshot at /data/raft/snapshots/6191-772959-1462142572767.tmp
    2016/05/01 22:42:52 [INFO] snapshot: reaping snapshot /data/raft/snapshots/6186-756532-1462099576362
    2016/05/01 22:42:52 [INFO] raft: Compacting logs from 754485 to 762719
    2016/05/01 22:42:53 [INFO] raft: Snapshot to 772959 complete
    2016/05/01 23:51:04 [INFO] serf: EventMemberReap: ip-10-185-31-127
    2016/05/02 00:59:57 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 00:59:57 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 00:59:57 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 00:59:58 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 00:59:59 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 00:59:59 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:00 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 01:00:01 [INFO] memberlist: Suspect ip-10-185-80-227 has failed, no acks received
    2016/05/02 01:00:02 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:02 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:03 [WARN] raft: Election timeout reached, restarting election
    2016/05/02 01:00:03 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:04 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 01:00:05 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:05 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:07 [WARN] raft: Election timeout reached, restarting election
    2016/05/02 01:00:07 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:07 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 01:00:07 [INFO] memberlist: Marking ip-10-185-80-227 as failed, suspect timeout reached
    2016/05/02 01:00:07 [INFO] serf: EventMemberFailed: ip-10-185-80-227 10.185.80.227
    2016/05/02 01:00:07 [INFO] consul: removing LAN server ip-10-185-80-227 (Addr: 10.185.80.227:8300) (DC: test)
    2016/05/02 01:00:07 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:07 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: read tcp 10.185.3.2:48315->10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:08 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:08 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:09 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:09 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:10 [WARN] raft: Election timeout reached, restarting election
    2016/05/02 01:00:10 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:10 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 01:00:11 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:11 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:12 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:12 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:13 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 01:00:13 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:13 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:14 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:14 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:15 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 01:00:15 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:15 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:16 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:16 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:17 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:17 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:18 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 01:00:18 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:18 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:19 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:19 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:20 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:20 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:20 [WARN] raft: Election timeout reached, restarting election
    2016/05/02 01:00:20 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:21 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 01:00:21 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:21 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:23 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:23 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:24 [WARN] raft: Election timeout reached, restarting election
    2016/05/02 01:00:24 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:24 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 01:00:24 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:24 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:26 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:26 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:26 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 01:00:26 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:26 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:27 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state
    2016/05/02 01:00:28 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:28 [INFO] raft: Node at 10.185.3.2:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:29 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:29 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:29 [INFO] raft: Node at 10.185.3.2:8300 [Follower] entering Follower state

Consul members from the remaining server:

/ # consul members
Node              Address             Status  Type    Build  Protocol  DC
ip-10-185-12-100  10.185.12.100:8301  alive   client  0.6.4  2         test
ip-10-185-12-109  10.185.12.109:8301  left    client  0.6.4  2         test
ip-10-185-13-11   10.185.13.11:8301   alive   client  0.6.4  2         test
ip-10-185-2-219   10.185.2.219:8301   alive   client  0.6.4  2         test
ip-10-185-21-111  10.185.21.111:8301  alive   client  0.6.4  2         test
ip-10-185-22-57   10.185.22.57:8301   alive   client  0.6.4  2         test
ip-10-185-24-127  10.185.24.127:8301  left    client  0.6.4  2         test
ip-10-185-27-54   10.185.27.54:8301   alive   client  0.6.4  2         test
ip-10-185-3-2     10.185.3.2:8301     alive   server  0.6.3  2         test
ip-10-185-3-3     10.185.3.3:8301     alive   server  0.6.3  2         test
ip-10-185-68-252  10.185.68.252:8301  alive   client  0.6.4  2         test
ip-10-185-69-115  10.185.69.115:8301  alive   client  0.6.4  2         test
ip-10-185-7-78    10.185.7.78:8301    alive   client  0.6.4  2         test
ip-10-185-74-17   10.185.74.17:8301   left    client  0.6.4  2         test
ip-10-185-76-58   10.185.76.58:8301   left    client  0.6.4  2         test
ip-10-185-77-174  10.185.77.174:8301  alive   client  0.6.4  2         test
ip-10-185-80-183  10.185.80.183:8301  alive   client  0.6.4  2         test
ip-10-185-80-227  10.185.80.227:8301  failed  server  0.6.3  2         test
ip-10-185-81-146  10.185.81.146:8301  alive   client  0.6.4  2         test
ip-10-185-82-50   10.185.82.50:8301   alive   server  0.6.3  2         test
ip-10-185-85-184  10.185.85.184:8301  alive   client  0.6.4  2         test
ip-10-185-87-209  10.185.87.209:8301  alive   client  0.6.4  2         test
ip-10-185-89-235  10.185.89.235:8301  left    client  0.6.4  2         test
ip-10-185-90-168  10.185.90.168:8301  alive   client  0.6.4  2         test
ip-10-185-92-101  10.185.92.101:8301  left    client  0.6.4  2         test
/ #

Logs from the other remaining server:

2016/05/02 00:59:56 [WARN] raft: Failed to contact 10.185.80.227:8300 in 500.172734ms
    2016/05/02 00:59:56 [WARN] raft: Failed to contact quorum of nodes, stepping down
    2016/05/02 00:59:56 [INFO] raft: Node at 10.185.3.3:8300 [Follower] entering Follower state
    2016/05/02 00:59:56 [INFO] consul: cluster leadership lost
    2016/05/02 00:59:56 [INFO] raft: aborting pipeline replication to peer 10.185.80.227:8300
    2016/05/02 00:59:56 [INFO] raft: aborting pipeline replication to peer 10.185.3.2:8300
    2016/05/02 00:59:56 [WARN] consul.coordinate: Batch update failed: leadership lost while committing log
    2016/05/02 00:59:57 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 00:59:57 [INFO] raft: Node at 10.185.3.3:8300 [Candidate] entering Candidate state
    2016/05/02 00:59:59 [WARN] raft: Election timeout reached, restarting election
    2016/05/02 00:59:59 [INFO] raft: Node at 10.185.3.3:8300 [Candidate] entering Candidate state
    2016/05/02 00:59:59 [INFO] raft: Node at 10.185.3.3:8300 [Follower] entering Follower state
    2016/05/02 00:59:59 [ERR] http: Request GET /v1/health/state/any, error: No cluster leader from=127.0.0.1:37342
    2016/05/02 00:59:59 [ERR] http: Request GET /v1/catalog/services, error: No cluster leader from=127.0.0.1:37343
    2016/05/02 01:00:00 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:00 [INFO] raft: Node at 10.185.3.3:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:01 [ERR] raft: Failed to heartbeat to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:02 [ERR] raft: Failed to get log at index 757732: log not found
    2016/05/02 01:00:02 [INFO] raft: Node at 10.185.3.3:8300 [Follower] entering Follower state
    2016/05/02 01:00:04 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:04 [INFO] raft: Node at 10.185.3.3:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:05 [WARN] raft: Election timeout reached, restarting election
    2016/05/02 01:00:05 [INFO] raft: Node at 10.185.3.3:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:05 [INFO] raft: Node at 10.185.3.3:8300 [Follower] entering Follower state
    2016/05/02 01:00:06 [ERR] raft: Failed to heartbeat to 10.185.80.227:8300: read tcp 10.185.3.3:50824->10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:07 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:07 [INFO] raft: Node at 10.185.3.3:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:07 [INFO] serf: EventMemberFailed: ip-10-185-80-227 10.185.80.227
    2016/05/02 01:00:07 [INFO] consul: removing LAN server ip-10-185-80-227 (Addr: 10.185.80.227:8300) (DC: test)
    2016/05/02 01:00:07 [ERR] yamux: keepalive failed: i/o deadline reached
    2016/05/02 01:00:07 [ERR] consul.rpc: multiplex conn accept failed: keepalive timeout from=10.185.80.227:45647
    2016/05/02 01:00:07 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:07 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:08 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 01:00:08 [INFO] raft: Node at 10.185.3.3:8300 [Follower] entering Follower state
    2016/05/02 01:00:09 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:09 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:10 [WARN] raft: Heartbeat timeout reached, starting election
    2016/05/02 01:00:10 [INFO] raft: Node at 10.185.3.3:8300 [Candidate] entering Candidate state
    2016/05/02 01:00:10 [ERR] raft: Failed to make RequestVote RPC to 10.185.80.227:8300: dial tcp 10.185.80.227:8300: i/o timeout
    2016/05/02 01:00:10 [ERR] raft: Failed to make RequestVote RPC to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:11 [INFO] raft: Node at 10.185.3.3:8300 [Follower] entering Follower state
    2016/05/02 01:00:12 [ERR] raft: Failed to install snapshot 6191-772920-1462142461838: dial tcp 10.185.78.24:8300: i/o timeout
    2016/05/02 01:00:12 [ERR] raft: Failed to send snapshot to 10.185.78.24:8300: dial tcp 10.185.78.24:8300: i/o timeout

Consul members from the other remaining server:

/ # consul members
Node              Address             Status  Type    Build  Protocol  DC
ip-10-185-12-100  10.185.12.100:8301  alive   client  0.6.4  2         test
ip-10-185-12-109  10.185.12.109:8301  left    client  0.6.4  2         test
ip-10-185-13-11   10.185.13.11:8301   alive   client  0.6.4  2         test
ip-10-185-2-219   10.185.2.219:8301   alive   client  0.6.4  2         test
ip-10-185-21-111  10.185.21.111:8301  alive   client  0.6.4  2         test
ip-10-185-22-57   10.185.22.57:8301   alive   client  0.6.4  2         test
ip-10-185-24-127  10.185.24.127:8301  left    client  0.6.4  2         test
ip-10-185-27-54   10.185.27.54:8301   alive   client  0.6.4  2         test
ip-10-185-3-2     10.185.3.2:8301     alive   server  0.6.3  2         test
ip-10-185-3-3     10.185.3.3:8301     alive   server  0.6.3  2         test
ip-10-185-68-252  10.185.68.252:8301  alive   client  0.6.4  2         test
ip-10-185-69-115  10.185.69.115:8301  alive   client  0.6.4  2         test
ip-10-185-7-78    10.185.7.78:8301    alive   client  0.6.4  2         test
ip-10-185-74-17   10.185.74.17:8301   left    client  0.6.4  2         test
ip-10-185-76-58   10.185.76.58:8301   left    client  0.6.4  2         test
ip-10-185-77-174  10.185.77.174:8301  alive   client  0.6.4  2         test
ip-10-185-80-183  10.185.80.183:8301  alive   client  0.6.4  2         test
ip-10-185-80-227  10.185.80.227:8301  failed  server  0.6.3  2         test
ip-10-185-81-146  10.185.81.146:8301  alive   client  0.6.4  2         test
ip-10-185-82-50   10.185.82.50:8301   alive   server  0.6.3  2         test
ip-10-185-85-184  10.185.85.184:8301  alive   client  0.6.4  2         test
ip-10-185-87-209  10.185.87.209:8301  alive   client  0.6.4  2         test
ip-10-185-89-235  10.185.89.235:8301  left    client  0.6.4  2         test
ip-10-185-90-168  10.185.90.168:8301  alive   client  0.6.4  2         test
ip-10-185-92-101  10.185.92.101:8301  left    client  0.6.4  2         test
/ # 

Here are the logs from the new instance that joined the cluster but is not part of the leader election:

2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-82-50 10.185.82.50
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-82-50.test 10.185.82.50
    2016/05/02 01:13:25 [INFO] raft: Node at 10.185.82.50:8300 [Follower] entering Follower state
    2016/05/02 01:13:25 [INFO] consul: adding LAN server ip-10-185-82-50 (Addr: 10.185.82.50:8300) (DC: test)
    2016/05/02 01:13:25 [INFO] consul: adding WAN server ip-10-185-82-50.test (Addr: 10.185.82.50:8300) (DC: test)
    2016/05/02 01:13:25 [INFO] agent: (LAN) joining: [consul.test.inspcloud.com]
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-68-252 10.185.68.252
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-3-3 10.185.3.3
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-13-11 10.185.13.11
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-92-101 10.185.92.101
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-85-184 10.185.85.184
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-89-235 10.185.89.235
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-2-219 10.185.2.219
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-3-2 10.185.3.2
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-24-127 10.185.24.127
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-77-174 10.185.77.174
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-12-100 10.185.12.100
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-90-168 10.185.90.168
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-27-54 10.185.27.54
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-87-209 10.185.87.209
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-7-78 10.185.7.78
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-21-111 10.185.21.111
    2016/05/02 01:13:25 [INFO] serf: EventMemberJoin: ip-10-185-12-109 10.185.12.109
    2016/05/02 01:13:25 [INFO] consul: adding LAN server ip-10-185-3-3 (Addr: 10.185.3.3:8300) (DC: test)
    2016/05/02 01:13:25 [INFO] consul: adding LAN server ip-10-185-3-2 (Addr: 10.185.3.2:8300) (DC: test)
    2016/05/02 01:13:25 [INFO] agent: (LAN) joined: 2 Err: <nil>
    2016/05/02 01:13:25 [ERR] agent: failed to sync remote state: No cluster leader
    2016/05/02 01:13:26 [WARN] raft: EnableSingleNode disabled, and no known peers. Aborting election.
    2016/05/02 01:13:27 [WARN] memberlist: Was able to reach ip-10-185-89-235 via TCP but not UDP, network may be misconfigured and not allowing bidirectional UDP
==> Newer Consul version available: 0.6.4
    2016/05/02 01:13:44 [ERR] agent: failed to sync remote state: No cluster leader
    2016/05/02 01:13:46 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 01:14:08 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 01:14:12 [ERR] agent: failed to sync remote state: No cluster leader
    2016/05/02 01:14:30 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 01:14:42 [ERR] agent: failed to sync remote state: No cluster leader
    2016/05/02 01:14:46 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 01:15:04 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 01:15:04 [ERR] agent: failed to sync remote state: No cluster leader
    2016/05/02 01:15:29 [ERR] agent: failed to sync remote state: No cluster leader
    2016/05/02 01:15:34 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 01:15:46 [ERR] agent: failed to sync remote state: No cluster leader
    2016/05/02 01:15:50 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 01:16:03 [ERR] agent: failed to sync remote state: No cluster leader
    2016/05/02 01:16:11 [ERR] agent: coordinate update error: No cluster leader
    2016/05/02 01:16:29 [ERR] agent: coordinate update error: No cluster leader

Consul members from the new server:

/ # consul members
Node              Address             Status  Type    Build  Protocol  DC
ip-10-185-12-100  10.185.12.100:8301  alive   client  0.6.4  2         test
ip-10-185-12-109  10.185.12.109:8301  left    client  0.6.4  2         test
ip-10-185-13-11   10.185.13.11:8301   alive   client  0.6.4  2         test
ip-10-185-2-219   10.185.2.219:8301   alive   client  0.6.4  2         test
ip-10-185-21-111  10.185.21.111:8301  alive   client  0.6.4  2         test
ip-10-185-22-57   10.185.22.57:8301   alive   client  0.6.4  2         test
ip-10-185-24-127  10.185.24.127:8301  left    client  0.6.4  2         test
ip-10-185-27-54   10.185.27.54:8301   alive   client  0.6.4  2         test
ip-10-185-3-2     10.185.3.2:8301     alive   server  0.6.3  2         test
ip-10-185-3-3     10.185.3.3:8301     alive   server  0.6.3  2         test
ip-10-185-68-252  10.185.68.252:8301  alive   client  0.6.4  2         test
ip-10-185-69-115  10.185.69.115:8301  alive   client  0.6.4  2         test
ip-10-185-7-78    10.185.7.78:8301    alive   client  0.6.4  2         test
ip-10-185-77-174  10.185.77.174:8301  alive   client  0.6.4  2         test
ip-10-185-80-183  10.185.80.183:8301  alive   client  0.6.4  2         test
ip-10-185-81-146  10.185.81.146:8301  alive   client  0.6.4  2         test
ip-10-185-82-50   10.185.82.50:8301   alive   server  0.6.3  2         test
ip-10-185-85-184  10.185.85.184:8301  alive   client  0.6.4  2         test
ip-10-185-87-209  10.185.87.209:8301  alive   client  0.6.4  2         test
ip-10-185-89-235  10.185.89.235:8301  left    client  0.6.4  2         test
ip-10-185-90-168  10.185.90.168:8301  alive   client  0.6.4  2         test
ip-10-185-92-101  10.185.92.101:8301  left    client  0.6.4  2         test
/ # 

As you can see, the remaining servers can see that 10.185.80.227 has failed and don't even see 10.185.78.24, yet they continually try to make RequestVote RPCs to them. Why?

I believe this started to happen because we greatly increased the number of variables we are using in Consul-Template. We will roll that back for now, but it seems that with growth this will eventually become a problem.

Not sure if this helps, or is just more noise, but here it is anyway:

I have a 3 master, 4 agent setup.
One of the masters is in a different data center (physically), so I reckoned -retry-join-wan would be the "right" thing between the servers.
All servers started up with bootstrap-expect 3.

This setup refused to work (cluster simply would NOT elect a leader).

Eventually I (because I was out of ideas) killed the one remote server (docker container stop and rm), and started it up again with just retry-join.
Voila - success.

So I'm wondering if bootstrap-expect and join-wan might be having some unexpected interactions? Or was my experience just a massive coincidence?

I am having the same issue with Consul 0.6.4. After playing with it for a couple of days I found that the easiest way to fix this is:

  1. Log in to one of the server nodes.
  2. Edit consul.conf.
  3. Remove the bootstrap_expect key.
  4. Add bootstrap: true.
  5. Restart Consul.
  6. This should elect a leader in the cluster. Once a leader is elected, remove the bootstrap key and add the bootstrap_expect key back.

On my Ansible playbook, I have a shell task that deals with this problem:

```shell
result=$(curl http://localhost:8500/v1/status/leader?token={TOKEN})
if [ -z "$result" ] || [ "$result" = '""' ]; then
  jq 'del(.bootstrap_expect) | .bootstrap=true' /etc/consul.conf > /tmp/consul.conf
  mv -f /etc/consul.conf /etc/consul.conf.bak
  mv -f /tmp/consul.conf /etc/consul.conf
  service consul restart
  mv -f /etc/consul.conf.bak /etc/consul.conf
fi
```

Hope this helps.

I just resolved this issue in a 3 node cluster on 0.6.4 with this process (assuming your service is called 'consul-server') ...

On 2 of the 3 nodes

  1. $ consul leave
  2. $ rm -rf /var/consul/*
  3. $ sudo service consul-server restart
  4. $ consul join <3rd node IP address>

On the 3rd node

  1. $ consul leave
  2. $ rm -rf /var/consul/*
  3. $ sudo service consul-server restart
  4. $ consul join <1st node IP address>
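The same sequence runs on every node, so it can be wrapped in a small helper. This is only a sketch: the service name (consul-server), data dir (/var/consul), and the plan-only mode are assumptions layered on the steps above.

```shell
# Sketch of the per-node reset above. The service name and data dir are
# assumptions from the steps; pass "plan" as the second argument to print
# the commands instead of executing them.
reset_consul_node() {
  join_addr=$1
  mode=${2:-run}
  for cmd in \
      "consul leave" \
      "rm -rf /var/consul/*" \
      "sudo service consul-server restart" \
      "consul join $join_addr"; do
    if [ "$mode" = "plan" ]; then
      echo "$cmd"
    else
      sh -c "$cmd"
    fi
  done
}
```

For example, `reset_consul_node 10.0.0.5 plan` prints the four commands that would run against a join target of 10.0.0.5.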

The service recovered after the action on nodes 1 and 2, but the log on the 3rd was populated by an endless sequence of

Sep 16 11:33:55 ip-10-75-69-38 consul[5498]: raft: Skipping application of old log: 151718
Sep 16 11:33:55 ip-10-75-69-38 consul[5498]: raft: Skipping application of old log: 151719
Sep 16 11:33:55 ip-10-75-69-38 consul[5498]: raft: Skipping application of old log: 151723
Sep 16 11:33:55 ip-10-75-69-38 consul[5498]: raft: Skipping application of old log: 151724
Sep 16 11:33:55 ip-10-75-69-38 consul[5498]: raft: Skipping application of old log: 151725
Sep 16 11:33:55 ip-10-75-69-38 consul[5498]: raft: Skipping application of old log: 151728
Sep 16 11:33:55 ip-10-75-69-38 consul[5498]: raft: Skipping application of old log: 151730

The action on node 3 resolved this.

@michaelmcguinness

I was having to do that on all my nodes after un-graceful restarts. So I took it one step further and added this to my consul.service systemd file and I have had no trouble since:

ExecStartPre=/usr/bin/bash -c '/usr/bin/rm -rf /var/lib/consul/*'

the full contents of my systemd service file are:
```
[root@util-swarm-000 ~]# cat /etc/systemd/system/consul.service
[Unit]
Description=Consul is a tool for service discovery and configuration.
Documentation=https://consul.io
After=network-online.target
Wants=network-online.target
After=rsyslog.service
Wants=rsyslog.service

[Service]
User=consul
Group=consul
EnvironmentFile=/etc/sysconfig/consul
ExecStartPre=/usr/bin/bash -c '/usr/bin/rm -rf /var/lib/consul/*'
ExecStart=/usr/bin/consul agent -config-dir=${CONFIG_DIR} -data-dir=${DATA_DIR} $EXTRA_OPTS
ExecReload=-/bin/kill -HUP $MAINPID
KillSignal=SIGINT
LimitNOFILE=65536
Restart=on-success
IgnoreSIGPIPE=yes

[Install]
WantedBy=multi-user.target
```

Consul 0.7.0 provides a new diagnostic command:

consul operator raft --list-peers --stale=true
Operator "raft" subcommand failed: Unexpected response code: 500 (No cluster leader)
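For what it's worth, when the operator subcommand itself needs a leader, the local HTTP status endpoint still answers: GET /v1/status/leader returns a JSON-quoted string such as "10.0.0.1:8300", or "" when no leader exists. A tiny illustrative helper (the function name is mine; the endpoint is Consul's):

```shell
# has_leader: takes the raw body of /v1/status/leader (a JSON-quoted string)
# and succeeds only if it actually names a leader. Both an empty body and
# a bare "" mean no leader has been elected.
has_leader() {
  body=$(printf '%s' "$1" | tr -d '"')
  [ -n "$body" ]
}

# typical call (address assumed):
#   has_leader "$(curl -s http://localhost:8500/v1/status/leader)"
```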

@markhu

That's cool, but the issue is essentially that we want Consul to elect a cluster leader after a graceful or "un-graceful" restart of a server.

Yes. I am also hitting this issue.

 docker version
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

I used the commands below to bring up my cluster master node:

 "sudo docker run -d  --restart=unless-stopped -p 8500:8500 --name=consul progrium/consul -server -bootstrap"
 "sudo docker run -d  --restart=unless-stopped -p 4000:4000 swarm manage -H :4000 --replication --advertise ${aws_instance.swarm_master.0.private_ip}:4000 consul://${aws_instance.swarm_master.0.private_ip}:8500"

I rebooted the machine hosting the Swarm cluster master. I can see that the Swarm process is running in Docker, but it is not connectable.

root@XXXXXXX:~# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                                                            NAMES
32c227fbab03        swarm               "/swarm join --advert"   13 hours ago        Up 13 hours         2375/tcp                                                                         romantic_joliot
f8427a36e1f4        swarm               "/swarm manage -H :40"   13 hours ago        Up 13 hours         2375/tcp, 0.0.0.0:4000->4000/tcp                                                 backstabbing_hoover
44df7d59752d        progrium/consul     "/bin/start -server -"   13 hours ago        Up 13 hours         53/tcp, 53/udp, 8300-8302/tcp, 8400/tcp, 8301-8302/udp, 0.0.0.0:8500->8500/tcp   consul

I can't list members:

root@XXXXXXXX:~#  docker  run  swarm list consul://$IP:8500
time="2016-10-09T06:46:17Z" level=info msg="Initializing discovery without TLS"
2016/10/09 06:46:17 Unexpected response code: 500

Then I checked the Docker logs for the Swarm container:

root@XXXXXXXX:~# docker logs f8427a36e1f4
time="2016-10-09T06:35:34Z" level=info msg="Leader Election: Cluster leadership    lost"
time="2016-10-09T06:35:34Z" level=error msg="Unexpected response code: 500 (No cluster leader)"
time="2016-10-09T06:35:41Z" level=error msg="Discovery error: Unexpected response code: 500"
time="2016-10-09T06:35:41Z" level=error msg="Discovery error: Unexpected response code: 500 (No cluster leader)"
time="2016-10-09T06:35:41Z" level=error msg="Discovery error: Unexpected watch error"

Is there a workaround to overcome this issue? If I restart my VM for any reason, none of the containers hosted on this Swarm can run because the Swarm is down.

Thanks
Shankar KC

I also just hit this same issue, which caused downtime in our services, also using bootstrap_expect=3. What seems to have been the cause here is that a server was switched off overnight, so the cluster was degraded. I tried adding more server nodes, but that did not help at all (they all reported "no cluster leader"). Restarting the failed client fixed it.

The original cluster had 3 nodes and degraded to 2, which should be just fine (and was before). Now the cluster has 5 nodes, which will degrade to 4 tonight, and we shall see.

btw, the "new debug command" also did everything but work for me - first the "no cluster leader" error, now this: Operator "raft" subcommand failed: Unexpected response code: 500 (rpc error: rpc: can't find service Operator.RaftGetConfiguration) (consul v0.7.1)

> btw, the "new debug command" also did everything but work for me - first the "no cluster leader" error, now this: Operator "raft" subcommand failed: Unexpected response code: 500 (rpc error: rpc: can't find service Operator.RaftGetConfiguration) (consul v0.7.1)

Are you running Consul 0.7.0+ on all your servers or is it possible you have a mix of versions?

All Consul masters are 0.7.1; by now probably all agents are too. This morning it was only the masters.

Ok - that RPC error looks like it may have talked to an old server. I'll need to look at the stale issue as that seems to have been reported by you and another person.

If a single server of a 3-server cluster going down causes an outage, that's likely from a stale peer in the Raft configuration. You can use the consul force-leave <node name> command to kick it if it has recently been removed (but still shows in consul members), or consul operator raft -remove-peer -address="<ip:port>" if it's stale and no longer known in consul members. consul operator raft -list-peers should let you inspect the configuration to see if this is the case.
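To make that comparison mechanical, here is a rough sketch (my own helper, not a Consul command) that cross-references -list-peers output against consul members and prints the Raft address of every peer whose node is no longer alive, i.e. the candidates for -remove-peer. The column layout is assumed from current Consul output.

```shell
# stale_raft_peers: given the output of `consul operator raft -list-peers`
# (columns: Node, ID, Address, State, Voter) and of `consul members`
# (columns: Node, Address, Status, ...), print the raft address of every
# peer whose node is not listed as alive.
stale_raft_peers() {
  raft_out=$1
  members_out=$2
  printf '%s\n' "$raft_out" | tail -n +2 | while read -r node _id addr _rest; do
    if ! printf '%s\n' "$members_out" | grep -q "^$node .*alive"; then
      printf '%s\n' "$addr"
    fi
  done
}

# typical call:
#   stale_raft_peers "$(consul operator raft -list-peers)" "$(consul members)"
```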

Our staging environment went down in a similar fashion; here's my write-up https://gist.github.com/haf/1983206cf11846f6f3f291f78acee5cf

Raising my hand as another one hitting this issue. We had the same thing: the current leader AWS node died, a new one was spun up, and nothing converged. We ran into the same problem trying to fix it manually, with the "no leader" state making it difficult to find out which Raft node was dead and remove it. We also tried the peers.json recovery, and that failed because the server wouldn't even start with that file in the format as documented. ): Our ultimate fix was to blow away all 3 nodes and let the cluster bootstrap from scratch. This left it disconnected from all the agents, but doing a join to the agents, which were all still part of the old cluster, brought everything back into sync (services anyway; we didn't check KV data). Our cluster is all 0.7.2+. We're still in test mode, so no production impact, just some slowed development cycles and a yellow flag on the Consul rollout.

This is very easy to reproduce. Set up a new 3-node cluster with -bootstrap-expect 3, wait until it has all converged with a leader, then kill off the leader (terminate the instance). The cluster will never recover.

Isn't this the most basic feature consul should support? Unbelievable it's still not working. Any workarounds?

That is so good to hear :) . Our workaround is to scratch the Consul data dirs on EVERY master host and re-run Puppet, which then re-sets Consul. Our setup automation can handle that pretty well; without it we'd have been lost a couple of times.

Hi, we are also facing this issue (no leader elected after system restart). However, our Consul instance is running in a Docker container on multiple EC2 instances. Can anyone suggest a simple workaround for the Dockerized case?

Closing this out now that Autopilot is available in 0.8.x - https://www.consul.io/docs/guides/autopilot.html.

We've also (in 0.7.x):

  • Changed the default for leave_on_terminate and skip_leave_on_interrupt for servers to not make them leave when shut down, which is safer by default.
  • Removed the peer store, so there's no longer the confusing null peers.json.
  • Improved the outage documentation with more details and a peers.json example.

We also (in 0.7.x) made this change:

Servers will now abort bootstrapping if they detect an existing cluster with configured Raft peers. This will help prevent safe but spurious leader elections when introducing new nodes with bootstrap_expect enabled into an existing cluster. [GH-2319]

@slackpad In our situation, we have 3-member consul deployed on Kubernetes cluster. Each member is in its own pod. We've recently made changes into our cluster and did a rolling-update. After that the 3 consuls are running fine as per status in Kubernetes but looking at the logs on each member it says no cluster leader. I am able to list all members with consul members (pls see below)

Node             Address          Status  Type    Build  Protocol  DC
consul-consul-0  100.96.2.7:8301  alive   server  0.7.5  2         dc1
consul-consul-1  100.96.1.3:8301  alive   server  0.7.5  2         dc1
consul-consul-2  100.96.3.6:8301  alive   server  0.7.5  2         dc1

Should I try the peers.json file?

Hi @edbergavera if the servers are trying to elect a leader and there are dead servers in the quorum from the rolling update that's preventing it, then you would need to use peers.json per https://www.consul.io/docs/guides/outage.html#manual-recovery-using-peers-json.
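For reference, with the Raft protocol version 2 that 0.7.5 servers use, peers.json is just a JSON array of the servers' Raft addresses; a sketch using the pod IPs from the members listing above with the default server RPC port 8300 (on Raft protocol version 3 the format is an array of objects with id/address/non_voter instead):

```json
[
  "100.96.2.7:8300",
  "100.96.1.3:8300",
  "100.96.3.6:8300"
]
```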

Hello James,

I did follow the instructions described in the outage document, but to no avail. I think this is specific to running Consul in Kubernetes pods, so I ended up re-creating the cluster in Kubernetes and restored the KVs, and that worked.

Thank you for your suggestion and looking into this.


Having this exact same issue with 0.8.1. A new leader is elected and then leadership is lost, election is restarted. Was able to recover using peers.json, praise the lord.

Experienced this issue when I activated raft_protocol version 3; reverting to raft_protocol version 2 fixed it. Still investigating why the switch to v3 triggered the issue.

Cluster of 5 running 0.9.0 will not elect a leader w/raft_protocol = 3, but will elect with raft_protocol = 2.

working config:
consul.json:

```json
{
  "bootstrap_expect": 5,
  "retry_join": ["a.a.a.a", "b.b.b.b", "c.c.c.c", "d.d.d.d", "e.e.e.e"],
  "server": true,
  "rejoin_after_leave": true,
  "enable_syslog": true,
  "data_dir": "/var/consul/data",
  "datacenter": "us-east-1",
  "recursor": "10.0.0.2",
  "advertise_addrs": {
    "serf_lan": "a.a.a.a:8301",
    "serf_wan": "a.a.a.a:8302",
    "rpc": "a.a.a.a:8300"
  },
  "bind_addr": "z.z.z.z",
  "encrypt": "NANANANA",
  "ui": true,
  "encrypt_verify_incoming": true,
  "encrypt_verify_outgoing": true,
  "key_file": "/var/consul/data/pki/private/server.key",
  "cert_file": "/var/consul/data/pki/certs/server.crt",
  "ca_file": "/var/consul/data/pki/certs/ca.crt",
  "raft_protocol": 2,
  "protocol": 3
}
```

Hi @dgulinobw can you please open a new issue and include a gist with the server logs when you see this? Thanks!

I also ran into this on a new consul cluster running 1.6.0

As soon as I made sure all the consul servers had both a default token and an agent token, the cluster was able to select a leader. Just having a default _or_ agent token was insufficient.

```shell
token=11111111111
export CONSUL_HTTP_TOKEN=00000000000000
consul acl set-agent-token default $token
consul acl set-agent-token agent $token
cat /opt/consul/acl-tokens.json
```

@spuder Interesting. What was your token policy for $token that you set for both default and agent?
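For comparison while that question is open: the agent-token policy suggested in Consul's ACL guide is along these lines (the exact rules needed depend on the cluster; this is illustrative, not what @spuder used):

```hcl
# Let the agent register its own node and read service registrations.
node_prefix "" {
  policy = "write"
}
service_prefix "" {
  policy = "read"
}
```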
