Etcd: Etcd watch recovery behaves differently on network disconnect vs VPN connectivity disconnect

Created on 27 Sep 2017 · 4 Comments · Source: etcd-io/etcd

I am running into an odd issue with etcd watches and cannot explain the behavior we are seeing. We connect to etcd over a VPN, our etcd sits behind an HA proxy, and we observe the following two scenarios.

Setup: run etcdctl watch <path> in one window and loop etcdctl put <path> in another window. The watch receives the messages that are put, as expected.

  1. Scenario A: disconnect the VPN from the VPN client.
    The watch remains connected even after the VPN connection is dropped. After reconnecting the VPN and putting more messages, the watch does not report any of them, although get confirms that the messages are there.

  2. Scenario B: switch off internet connectivity by turning off WiFi.
    The watch remains connected even after the network is disconnected. After reconnecting the network and putting more messages, the watch reports all of the new messages.

All 4 comments

The etcd watch API is not meant for detecting connection issues.
You can use Session instead.

Disconnects should be handled in the client balancer layer.
We've added HTTP/2 keepalive and client balancer health checking.
Once stabilized, it will be released in v3.3.

@gyuho Fair enough. What about when we use the official Go client? I am broadly under the assumption that the watch blocks until the connection is re-established, based on this comment: https://github.com/coreos/etcd/issues/7860#issuecomment-317368084

We use the Go client, so I am assuming the following is already handled for us:
"Disconnect should be handled in client balancer layer. We've added HTTP/2 keepalive and client balancer health checking. Once stabilized, it will be released in v3.3."

Alternatively, can you provide examples or point me to some documentation?

My understanding is that with the Go client, the watch remains blocked on connection failure and then resumes when the connection comes back. Do you think this is a valid assumption?

@atinsood If the watch is issued with WithCreatedNotify (https://godoc.org/github.com/coreos/etcd/clientv3#WithCreatedNotify), you can wait for the watch create event, and thus wait for the initial connection. See also WithProgressNotify: https://godoc.org/github.com/coreos/etcd/clientv3#WithProgressNotify.

I suggest reading this thread https://github.com/coreos/etcd/issues/8495#issuecomment-327242932.

@gyuho I have spent some more time trying to understand this and I am still confused. Does the official Go etcd v3 client automatically reconnect a watch after a failure caused by connectivity issues, or does the client code need to handle that?

Sorry for being persistent with the question; I wasn't able to find any good guidance on this. I see this comment, which broadly talks about handling errors: https://github.com/coreos/etcd/issues/8495#issuecomment-327241587. Are there any examples of how to reconnect the watch gracefully?
