In a clustered setup it should be possible to remove an "unstarted" (i.e. uninitialised) member.
Also, one might argue that the cluster should not become "unhealthy" immediately after etcdctl member add, but only after the new member has actually connected/registered and then gone away...
At the moment 2.2.0 does not allow removing a new member even if it never connected.
Why do you think you cannot remove an unstarted member?
How?
I add a second member (etcdctl member add) and check its ID (etcdctl member list):
1ac2d85aedf46b74[unstarted]
Then etcdctl member remove 1ac2d85aedf46b74 returns:
Recieved an error trying to remove member 1ac2d85aedf46b74: client: etcd cluster is unavailable or misconfigured
As soon as I start the new member, etcdctl member remove works, but not while the newly created member is in the "unstarted" state...
@onlyjob After you added the second member, the quorum became two. You have to start the second member before the cluster can process any further operation.
etcd can remove a member whether or not it has ever started, but only while the cluster is healthy. You need to make sure your etcd cluster is healthy, and we cannot help with that. :(
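For anyone following along, the arithmetic behind this reply can be sketched roughly as below. It assumes the usual majority rule (quorum = floor(N/2) + 1, counted over all registered members, started or not). The function names are made up for illustration and are not part of etcd or etcdctl.

```shell
# Majority needed for a cluster with N registered members.
# Assumption: unstarted members still count towards N.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds if a cluster with $1 registered members and $2 running
# members can still commit changes (such as a member remove).
can_commit() {
  [ "$2" -ge "$(quorum "$1")" ]
}

quorum 1   # one member: quorum is 1
quorum 2   # after "member add": quorum is 2
can_commit 2 1 && echo "writable" || echo "unavailable"
```

With two registered members and only one running, the check fails, which matches the "cluster is unavailable or misconfigured" error above.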
How do you recommend recovering if the second member cannot join?
@onlyjob What do you mean by "it cannot join"? It cannot start? Or was the reconfiguration command you gave to etcd wrong?
As reported in #3491, I believe member add can register a member incorrectly because of wrong defaults, so the member fails to start with error validating peerURLs, unmatched member while checking PeerURLs...
@onlyjob See https://github.com/coreos/etcd/blob/master/Documentation/admin_guide.md#disaster-recovery.
We strongly encourage you not to start a 1 member cluster anyway.
> We strongly encourage you not to start a 1 member cluster anyway.
Why don't you get it that I'm trying to remove and re-add the second member?? Something went wrong (details in #3491) and I want to re-create the new member properly.
What do you mean by "reconfiguration command"?
@onlyjob You broke the cluster and it is not recoverable. I have already replied with a doc that describes how to recover from this.
See https://github.com/coreos/etcd/blob/master/Documentation/admin_guide.md#disaster-recovery.
> What do you mean by "reconfiguration command"?
Your add member command.
Thanks, --force-new-cluster works, but IMHO that's an overly difficult recovery for an unsuccessful add of a second member...
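For reference, the recovery from the linked disaster-recovery doc can be sketched roughly as follows. This is only an outline under assumptions: the two paths are hypothetical placeholders, the wrapper function name is made up, and the exact steps should be checked against the doc for your etcd version.

```shell
# Hypothetical paths; adjust to your installation.
DATA_DIR=/var/lib/etcd
BACKUP_DIR=/var/lib/etcd-backup

# Wrapped in a function so nothing runs until it is called explicitly.
recover_single_member() {
  # Copy the data directory of the surviving member.
  etcdctl backup --data-dir "$DATA_DIR" --backup-dir "$BACKUP_DIR"
  # Start etcd from the copy as a fresh one-member cluster,
  # discarding the previous membership (including the unstarted member).
  etcd --data-dir "$BACKUP_DIR" --force-new-cluster
}
```

Once the single member is healthy again, the second member can be added and started as usual.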
@onlyjob
Glad that works for you.
> that's overly difficult recovery for unsuccessful add of second member...
Your case falls into the quorum loss case. etcd will not try to solve a lot of special cases.
I totally agree that, even for quorum loss, this experience is not optimal.
But that is a hard decision we have made. If you want to learn more, you can read https://github.com/coreos/etcd/blob/master/Documentation/runtime-reconf-design.md#permanent-loss-of-quorum-requires-new-cluster.
We basically have to trade off some UX for consistency/safety. That is also why I do not suggest starting with a one-member cluster.
In the future, we might reduce the number of steps needed to recover a failed cluster.
> Your case falls into the quorum loss case.
That's not how I see it. To me it is a case of the initial setup of a cluster.
Naturally you start by deploying the first member and then add other members one by one.
How can we talk about quorum when the second member hasn't been seen yet? Quorum is not _lost_, it was never achieved in the first place.
> That is also why I do not suggest you to start a one member cluster at the beginning.
So is it wrong to bootstrap a cluster with several members by adding members one by one?
When etcd is first started it assumes single/standalone mode by default. Do you mean it is recommended to pass multiple values in ETCD_INITIAL_CLUSTER instead of adding members one by one?
> How can we talk about quorum when second member hasn't been seen yet? Quorum is not lost, it was never achieved in first place.
No. Once you submitted the reconfiguration command, the quorum changed.
That is how our Raft implementation works. I am not going to delve into details here.
> Do you mean it is recommended to pass multiple values in ETCD_INITIAL_CLUSTER instead of adding members one by one?
Yes. If you could.
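A minimal sketch of what that could look like for the first of three members, using environment variables. The member names (infra0..infra2), the 10.0.1.x addresses and the token value are hypothetical placeholders. Each of the other two members would use the same ETCD_INITIAL_CLUSTER value with its own name and URLs.

```shell
# Settings for member "infra0"; names, addresses and token are placeholders.
export ETCD_NAME=infra0
export ETCD_LISTEN_PEER_URLS=http://10.0.1.10:2380
export ETCD_INITIAL_ADVERTISE_PEER_URLS=http://10.0.1.10:2380

# All members are listed up front, so no "member add" is needed
# and the quorum is already 2 of 3 from the first start.
export ETCD_INITIAL_CLUSTER="infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380"
export ETCD_INITIAL_CLUSTER_STATE=new
export ETCD_INITIAL_CLUSTER_TOKEN=example-cluster-1

# Then start etcd on each machine with its own settings:
# etcd
```

The cluster becomes available once a majority of the listed members (two of three here) are running.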
It makes sense. Thank you.
This documentation has changed. This is now the new URL: https://github.com/coreos/etcd/blob/master/Documentation/op-guide/recovery.md