etcd: Nodes are not showing in etcd cluster

Created on 4 Oct 2019 · 5 comments · Source: etcd-io/etcd

Hello Team,

Currently we are using etcd with a single node, but we are now planning to move to a 3-node etcd cluster. While setting it up, we ran into a very strange issue.

We are using the following config in /etc/default/etcd:

ETCD_NAME="slave1"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.56.9:2380"
ETCD_LISTEN_PEER_URLS="http://192.168.56.9:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.56.9:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.56.9:2379"
ETCD_INITIAL_CLUSTER_TOKEN="cluster1"
ETCD_INITIAL_CLUSTER="slave1=http://192.168.56.9:2380,master1=http://192.168.56.19:2380,slave2=http://192.168.56.11:2380"
ETCD_INITIAL_CLUSTER_STATE="new"

We used a similar config on all 3 nodes, changing the parameter values according to each etcd host, and started the service on all 3 nodes with systemctl start etcd2.service. The service started successfully on all 3 nodes.
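For reference, here is a sketch of what the corresponding /etc/default/etcd on master1 would look like under the same addressing scheme; only ETCD_NAME and the node-local URLs change, while the token and cluster list stay identical on every node (addresses taken from the ETCD_INITIAL_CLUSTER above):

```shell
# /etc/default/etcd on master1 (192.168.56.19) -- sketch, mirroring the
# slave1 file above. Only the name and local listen/advertise addresses
# differ; ETCD_INITIAL_CLUSTER must be the same on all 3 nodes.
ETCD_NAME="master1"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.56.19:2380"
ETCD_LISTEN_PEER_URLS="http://192.168.56.19:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.56.19:2379,http://127.0.0.1:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.56.19:2379"
ETCD_INITIAL_CLUSTER_TOKEN="cluster1"
ETCD_INITIAL_CLUSTER="slave1=http://192.168.56.9:2380,master1=http://192.168.56.19:2380,slave2=http://192.168.56.11:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
```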

But when we check the member list and cluster health, only the single node on which we run the command is listed.

However, when we run the same parameters on the command line on all 3 nodes, all the nodes join the cluster and appear in the member list as well as in the cluster health.

etcd --name slave1 --initial-advertise-peer-urls http://192.168.56.9:2380 --listen-peer-urls http://192.168.56.9:2380 --listen-client-urls http://192.168.56.9:2379,http://127.0.0.1:2379 --advertise-client-urls http://192.168.56.9:2379 --initial-cluster-token cluster1 --initial-cluster slave1=http://192.168.56.9:2380,master1=http://192.168.56.19:2380,slave2=http://192.168.56.11:2380 --initial-cluster-state new

ETCD version 2.5.5

Can you please help me fix this strange issue?

Why do the same parameters work in CLI mode but not when put into the config file?

Please help; this is the last remaining roadblock in my project.

Any help will be appreciated.

Thanks.

All 5 comments

Hello Team,
Can you please help me with the above issue?
Thanks.

Could you provide the server log? There are a couple of possible reasons. For example, if the data directory exists, the server will not use the cluster information provided in the flags; instead it will continue to use its previous identity from the data store.

BTW, the officially supported versions are 3.2 and above; please try to use a newer version if possible.

@jingyih, thank you for your response.

> if the data directory exists, server will not use cluster information provided in the flags. Instead it will just continue to use its previous identity in the data store.

That seems to have been the issue.

I removed the data under /var/lib/etcd/default on all 3 nodes and then restarted the etcd service on each node. Now all the nodes are able to join the cluster, and everything works fine.
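The fix above can be sketched as the following steps, run on each of the 3 nodes. This assumes the etcd2.service unit and /var/lib/etcd/default data directory mentioned earlier, and it destroys the node's existing etcd data, so it is only appropriate when re-bootstrapping a cluster:

```shell
# Run on each of the 3 nodes. Clearing the data directory makes etcd
# re-read the bootstrap flags instead of reusing its previous identity.
sudo systemctl stop etcd2.service
sudo rm -rf /var/lib/etcd/default/*   # discards old member identity (destructive!)
sudo systemctl start etcd2.service
```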

Now I have one more question, and maybe the last:

When I stop the etcd service on the current master node, the remaining two nodes elect a new master and the cluster keeps working fine.

But when I stop the etcd service on the newly elected master node as well, only one node is left in the cluster, and it promotes itself to master; the logs show the same:

Oct  7 11:32:41 slave1 patroni[1966]: etcd.EtcdConnectionFailed: No more machines in the cluster
Oct  7 11:32:41 slave1 patroni[1966]: 2019-10-07 11:32:41,669 ERROR: failed to update leader lock
Oct  7 11:32:41 slave1 patroni[1966]: 2019-10-07 11:32:41,670 INFO: Selected new etcd server http://192.168.99.4:2379
Oct  7 11:32:43 slave1 patroni[1966]: 2019-10-07 11:32:43,348 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='192.168.99.4', port=2379): Read timed out. (read timeout=1.6666666666666667)",)': /v2/keys/db/tcd/members/slave1
Oct  7 11:32:45 slave1 patroni[1966]: 2019-10-07 11:32:45,024 ERROR: Request to server http://192.168.99.4:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'192.168.99.4\', port=2379): Max retries exceeded with url: /v2/keys/db/tcd/members/slave1 (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'192.168.99.4\', port=2379): Read timed out. (read timeout=1.6666666666666667)",))',)

But my PostgreSQL database (using Patroni + etcd) shows as down on the HAProxy dashboard. The database shows down when Patroni is unable to find any etcd node.

When we start the etcd service on either of the two nodes that are down, everything starts working fine again.

From the above, I am assuming that in a 3-node etcd cluster at least 2 etcd nodes must be up and running, and only one etcd node can be down at a time.

Am I right or not?
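As a sanity check on that assumption: etcd needs a majority (quorum) of members up, i.e. floor(n/2) + 1 of n, so an n-member cluster tolerates n minus quorum failures. A quick sketch of the arithmetic (helper names are made up for illustration):

```shell
# Quorum and failure tolerance for an n-member etcd cluster.
quorum()            { echo $(( $1 / 2 + 1 )); }          # majority of n members
failure_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }   # n - quorum

for n in 1 3 5; do
  echo "members=$n quorum=$(quorum "$n") tolerance=$(failure_tolerance "$n")"
done
```

So a 3-node cluster tolerates exactly one failure; with two nodes stopped, the surviving node cannot form a quorum, which matches Patroni losing the leader lock in the logs above.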

Thanks.

@jingyih,
I got the answer to my question from the link below:

https://github.com/etcd-io/etcd/blob/master/Documentation/faq.md#what-is-failure-tolerance

Thank you for your help.

I am closing the issue as it's now resolved. Thanks!
