In the Raft thesis, section 4.2.1 "Catching up new servers", it says:
When a server is added to the cluster, it typically will not store any log entries. If it is added to the
cluster in this state, its log could take quite a while to catch up to the leader’s, and during this time,
the cluster is more vulnerable to unavailability. For example, a three-server cluster can normally
tolerate one failure with no loss in availability. However, if a fourth server with an empty log is
added to the same cluster and one of the original three servers fails, the cluster will be temporarily
unable to commit new entries (see Figure 4.4(a)). Another availability issue can occur if many new
servers are added to a cluster in quick succession, where the new servers are needed to form a
majority of the cluster (see Figure 4.4(b)). In both cases, until the new servers’ logs were caught up
to the leader’s, the clusters would be unavailable.
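To make the arithmetic behind Figure 4.4(a) concrete, here is a toy sketch (a hypothetical helper, not etcd code) of why adding a fourth voter with an empty log raises the commit quorum past what the caught-up servers can satisfy:

```go
package main

import "fmt"

// quorum returns the number of servers whose acknowledgement is
// required to commit an entry in a cluster of n voting members.
func quorum(n int) int { return n/2 + 1 }

func main() {
	// Three-server cluster: quorum is 2, so one failure is tolerated.
	fmt.Println(quorum(3)) // 2

	// Add a fourth (empty-log) server as a full voter: quorum becomes 3.
	// If one of the original three fails, only two servers hold
	// up-to-date logs, which is below the new quorum of 3 -- commits
	// stall until the new server catches up.
	fmt.Println(quorum(4)) // 3
}
```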
etcd currently behaves like this. The thesis describes the following solution:
In order to avoid availability gaps, Raft introduces an additional phase before the configuration
change, in which a new server joins the cluster as a non-voting member. The leader replicates
log entries to it, but it is not yet counted towards majorities for voting or commitment purposes.
Once the new server has caught up with the rest of the cluster, the reconfiguration can proceed as
described above. (The mechanism to support non-voting servers can also be useful in other contexts;
for example, it can be used to replicate the state to a large number of servers, which can serve read-
only requests with relaxed consistency.)
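A minimal sketch of the commit rule this implies, using hypothetical `Member`/`committed` names (etcd's actual `Progress` bookkeeping differs):

```go
package main

import (
	"fmt"
	"sort"
)

// Member is a simplified, hypothetical view of a cluster member:
// its match index and whether it is a voting member.
type Member struct {
	Match int
	Voter bool
}

// committed returns the highest index replicated on a majority of
// *voting* members. Non-voting members receive entries but are
// skipped here, as the thesis describes. Assumes at least one voter.
func committed(members []Member) int {
	var matches []int
	for _, m := range members {
		if m.Voter {
			matches = append(matches, m.Match)
		}
	}
	sort.Sort(sort.Reverse(sort.IntSlice(matches)))
	// The (quorum-th)-highest match index is acked by a majority of voters.
	return matches[len(matches)/2]
}

func main() {
	members := []Member{
		{Match: 10, Voter: true},
		{Match: 10, Voter: true},
		{Match: 8, Voter: true},
		{Match: 0, Voter: false}, // catching-up learner: no effect on commit
	}
	fmt.Println(committed(members)) // 10
}
```

If the lagging server were a full voter instead, the same function would return 8, since the quorum of four voters is three.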
Both https://github.com/logcabin/logcabin and https://github.com/hashicorp/raft/blob/library-v2-stage-one/membership.md implement non-voting members. From the hashicorp/raft design doc:
We propose to re-define a configuration as a set of servers, where each server includes an address (as it does today) and a mode that is either:
- Voter: a server whose vote is counted in elections and whose match index is used in advancing the leader's commit index.
- Nonvoter: a server that receives log entries but is not considered for elections or commitment purposes.
- Staging: a server that acts like a nonvoter with one exception: once a staging server receives enough log entries to catch up sufficiently to the leader's log, the leader will invoke a membership change to change the staging server to a voter.
```
                    Start ->  +--------+
          ,------<------------|        |
         /                    | absent |
        /       RemovePeer--> |        | <---RemovePeer
       /          |           +--------+        \
      /           |              |               \
 AddNonvoter      |           AddVoter            \
     |      ,->---'              |                 \
     v     /                     v                  \
+----------+                 +----------+                 +----------+
|          |---AddVoter----->|          |-log caught up-->|          |
| nonvoter |                 | staging  |                 |  voter   |
|          |<--DemoteVoter---|          |              ,--|          |
+----------+ \               +----------+            /    +----------+
              \                                     /
               `---------------<-------------------'
```
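The diagram's legal transitions can also be written down as a table. A Go sketch with hypothetical names (`LogCaughtUp` stands in for the leader's internal catch-up check; this is not actual library code):

```go
package main

import "fmt"

// Suffrage enumerates the membership modes from the proposal, plus
// Absent for servers not in the configuration at all.
type Suffrage int

const (
	Absent Suffrage = iota
	Nonvoter
	Staging
	Voter
)

// transitions encodes the legal moves from the diagram: for each
// current mode, the membership commands that apply and the mode
// each one produces.
var transitions = map[Suffrage]map[string]Suffrage{
	Absent:   {"AddNonvoter": Nonvoter, "AddVoter": Staging},
	Nonvoter: {"AddVoter": Staging, "RemovePeer": Absent},
	Staging:  {"LogCaughtUp": Voter, "DemoteVoter": Nonvoter},
	Voter:    {"DemoteVoter": Nonvoter, "RemovePeer": Absent},
}

// apply validates and performs one membership command; ok is false
// when the command is illegal in the current mode.
func apply(s Suffrage, cmd string) (next Suffrage, ok bool) {
	next, ok = transitions[s][cmd]
	return next, ok
}

func main() {
	s := Absent
	for _, cmd := range []string{"AddVoter", "LogCaughtUp", "DemoteVoter"} {
		next, ok := apply(s, cmd)
		fmt.Println(cmd, "->", next, ok)
		s = next
	}
}
```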
```protobuf
enum SuffrageState {
  Nonvoter = 0;
  Staging  = 1;
  Voter    = 2;
}
```
Then add SuffrageState to Raft (for the node itself) and to Progress (for other nodes), and persist SuffrageState in the Snapshot, ConfState...
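A rough sketch of what carrying SuffrageState on a Progress-like struct could look like (hypothetical, cut-down types, not the actual raft package):

```go
package main

import "fmt"

// SuffrageState mirrors the proposed protobuf enum.
type SuffrageState int

const (
	Nonvoter SuffrageState = 0
	Staging  SuffrageState = 1
	Voter    SuffrageState = 2
)

func (s SuffrageState) String() string {
	return [...]string{"Nonvoter", "Staging", "Voter"}[s]
}

// Progress is a hypothetical stand-in for raft's per-follower
// Progress struct, extended with the proposed suffrage field.
type Progress struct {
	Match    uint64
	Suffrage SuffrageState
}

// countsForQuorum reports whether this follower's match index should
// be used when advancing the leader's commit index.
func (p Progress) countsForQuorum() bool { return p.Suffrage == Voter }

func main() {
	p := Progress{Match: 42, Suffrage: Staging}
	fmt.Println(p.Suffrage, p.countsForQuorum()) // Staging false
}
```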
/cc @bdarnell
It seems fine to add this feature.
AddNonVoter should be fairly easy to add. Probably adding a new node state is enough (so we won't count it in voting and it won't vote). Not sure if we want to implement DemoteVoter initially, though.
fixed by #8751
Does this solve the unavailability problem when the number of live nodes currently equals a bare majority, and a new node is added (or a node that was down for a while comes back up) while, at the same time, one of the current nodes goes down?