Etcd: client is unaware of unhealthy server when getting response

Created on 12 Aug 2015 · 10 comments · Source: etcd-io/etcd

Because the client pins to one endpoint whenever possible, it always sends its requests to the same server. If that server is isolated from the etcd cluster and stays unhealthy for a long time, the client sees outdated data for just as long. We don't provide a way to handle this or warn about it today.

yichengq

All 10 comments

/cc @mx2323 You might want to follow on this.

xiang90 on 25 Aug 2015

Is this still an issue?

Raft says:

The leader handles all client requests (if a client contacts a follower, the follower redirects it to the leader).

Can we use the same mechanism to notify the client that, at the very least, the target server is not reachable? When using a proxy, I see an error message saying the proxy endpoint is not available when the proxy is misconfigured. Does etcd handle this case differently?

Thanks,

gyuho on 12 Oct 2015

@gyuho Watches and reads are served from the local store of the server the client contacts. It is allowed, and normal, for clients to get data from the local store. We just don't provide a way to handle or warn about an unhealthy server today.

yichengq on 13 Oct 2015

I see. Thanks,

gyuho on 14 Oct 2015

@yichengq @gyuho If a client turns on client.GetOption.Quorum, the read is forced to go through the leader's store, so the client can become aware of an unhealthy server.

For example,

1. Create a cluster with 3 nodes.
2. Create a key.
3. Kill 2 nodes.
4. Get the key using etcdctl get with the --quorum option (which turns on client.GetOption.Quorum).

In this sequence, the get request in step 4 fails because quorum has been lost. So I think current etcd already has a solution for this issue.
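The sequence above can be illustrated with a toy model. This is a minimal sketch, not etcd's implementation: the `member` type and the `newCluster`, `put`, `localGet`, and `quorumGet` helpers are hypothetical names invented for illustration, and the "quorum" check simply counts live members instead of going through Raft.

```go
package main

import (
	"errors"
	"fmt"
)

// member is a hypothetical stand-in for one etcd server: a local key-value
// store plus a flag saying whether the member is still running.
type member struct {
	store map[string]string
	alive bool
}

var errNoQuorum = errors.New("no quorum: a majority of members is unavailable")

// newCluster builds n live members with empty stores.
func newCluster(n int) []*member {
	cluster := make([]*member, n)
	for i := range cluster {
		cluster[i] = &member{store: map[string]string{}, alive: true}
	}
	return cluster
}

// put writes the key to every live member (a crude stand-in for replication).
func put(cluster []*member, key, value string) {
	for _, m := range cluster {
		if m.alive {
			m.store[key] = value
		}
	}
}

// localGet answers from the contacted member's own store and never checks
// the state of the rest of the cluster.
func localGet(m *member, key string) (string, bool) {
	v, ok := m.store[key]
	return v, ok
}

// quorumGet refuses to answer unless a strict majority of members is alive.
func quorumGet(cluster []*member, target int, key string) (string, error) {
	alive := 0
	for _, m := range cluster {
		if m.alive {
			alive++
		}
	}
	if alive <= len(cluster)/2 {
		return "", errNoQuorum
	}
	v, ok := cluster[target].store[key]
	if !ok {
		return "", errors.New("key not found")
	}
	return v, nil
}

func main() {
	cluster := newCluster(3)   // create a cluster with 3 nodes
	put(cluster, "foo", "bar") // create a key
	cluster[1].alive = false   // kill 2 nodes
	cluster[2].alive = false

	v, ok := localGet(cluster[0], "foo")
	fmt.Println("local read:", v, ok)

	_, err := quorumGet(cluster, 0, "foo") // get the key with quorum
	fmt.Println("quorum read error:", err)
}
```

In this model the local read still returns the old value from the surviving member, while the quorum read returns an error, which is the signal the client can act on.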

mitake on 8 Dec 2015

The issue isn't with reads; it's with watches. When a watch is registered on an etcd server that is partitioned from the rest of the cluster, the client sits with an untriggered watch when it could have gone to another server and received the new update.
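One way a client could work around this is to give up on a silent endpoint and retry the watch elsewhere. The following is only a sketch under stated assumptions: `endpoint`, `watchWithFailover`, and `defaultIdle` are hypothetical names, the watch streams are simulated with channels, and silence is treated as a sign of trouble, whereas a real client would need a heartbeat or progress signal to tell a quiet key from a partitioned server.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// endpoint is a hypothetical stand-in for a watch stream opened against one
// server; events carries the updates that server delivers.
type endpoint struct {
	name   string
	events chan string
}

// defaultIdle is an arbitrary illustrative limit on how long to wait on one
// endpoint before trying the next.
const defaultIdle = 50 * time.Millisecond

var errAllSilent = errors.New("no endpoint delivered an event")

// watchWithFailover waits on each endpoint in turn and moves on to the next
// one when an endpoint stays silent for longer than idle. It returns the name
// of the endpoint that answered and the event it delivered.
func watchWithFailover(endpoints []endpoint, idle time.Duration) (string, string, error) {
	for _, ep := range endpoints {
		timer := time.NewTimer(idle)
		select {
		case ev := <-ep.events:
			timer.Stop()
			return ep.name, ev, nil
		case <-timer.C:
			// Nothing arrived in time; fall through to the next endpoint.
		}
	}
	return "", "", errAllSilent
}

func main() {
	// The partitioned server never learns about the update, so its stream
	// stays empty. The healthy server already has the update queued.
	partitioned := endpoint{name: "partitioned", events: make(chan string)}
	healthy := endpoint{name: "healthy", events: make(chan string, 1)}
	healthy.events <- "foo=bar"

	name, ev, err := watchWithFailover([]endpoint{partitioned, healthy}, defaultIdle)
	fmt.Println(name, ev, err)
}
```

The trade-off is that a short idle limit causes needless reconnects on keys that rarely change, which is why a server-side health or progress signal would be preferable to a plain timeout.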

mx2323 on 8 Dec 2015

@mx2323 Thanks for pointing that out; I hadn't considered the watch case.

mitake on 14 Dec 2015

Closed by https://github.com/coreos/etcd/pull/5332

xiang90 on 13 May 2016

The readme still says reads can be satisfied by unhealthy member, and references this issue. Is that still the case? If not, perhaps amend the readme?

immesys on 26 Jul 2016

👍3

Readme still seems to be out of date.