Etcd: Watch endpoint should have a timeout option

Created on 9 Mar 2015 · 28 Comments · Source: etcd-io/etcd

Currently, the ?wait=true option to GET (to create a watch) will wait indefinitely for the key to change. This interacts poorly with socket timeouts.

If you have any non-infinite socket timeout, you must be aware that a socket timeout exception can occur simply because no change happened before the timeout. You can't differentiate between this state and the server disappearing and being unable to reply.

If you have an infinite socket timeout, and the remote server crashes, you may wait forever for a connection which is long dead.
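For illustration, here is a minimal Go sketch of that ambiguity (the endpoint, key name, and 30-second timeout are only placeholders, not a recommendation):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Any finite client-side timeout makes "no change happened yet"
	// indistinguishable from "the server is gone".
	client := &http.Client{Timeout: 30 * time.Second}

	resp, err := client.Get("http://127.0.0.1:2379/v2/keys/foo?wait=true")
	if err != nil {
		// Either nothing changed within 30s, or the server disappeared.
		fmt.Println("watch failed or timed out:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("key changed, status:", resp.Status)
}
```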

Thoughts on how to improve this:

  • Add a "timeout" to watches which, on expiry, returns a unique HTTP code, say 204 No Content
  • Periodically write a small amount of (ignored) data out to the HTTP stream as a keepalive (whitespace, so the response is still valid JSON?)

All 28 comments

Hi, I am new to etcd and am currently studying it closely.
However, to improve this scenario, would it be possible to apply an event/delegate pattern, or publish/subscribe in part, so that remote servers don't have to wait on sockets?

It may be possible to add a pub/sub interface, but that would be a new feature and should probably be discussed separately.

We just built the timeout into our Java etcd client instead. When we watch, we only watch for 20 seconds max, then time out and retry, so we avoid this. Once we get events, we can remove this sort of thing.
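A minimal Go sketch of that client-side pattern (the comment above describes a Java client; the 20-second bound, endpoint, and key below are placeholders, and a real client would also track the waitIndex across retries):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"os"
	"time"
)

// watchOnce long-polls the key once, bounded by a 20s client-side timeout.
func watchOnce(ctx context.Context, url string) (changed bool, err error) {
	ctx, cancel := context.WithTimeout(ctx, 20*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	resp, err := http.DefaultClient.Do(req)
	if errors.Is(err, context.DeadlineExceeded) {
		return false, nil // our own timeout fired: treat as "no change" and retry
	}
	if err != nil {
		return false, err // some other transport error: the server may be gone
	}
	resp.Body.Close()
	return true, nil
}

func main() {
	for {
		changed, err := watchOnce(context.Background(),
			"http://127.0.0.1:2379/v2/keys/foo?wait=true")
		if err != nil {
			fmt.Fprintln(os.Stderr, "watch error:", err)
			time.Sleep(time.Second)
			continue
		}
		if changed {
			fmt.Println("key changed")
		}
		// loop and re-watch either way
	}
}
```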

@stevenschlansker @brentryan

We have some discussion about this internally:

  1. add TCP keepalive for the long-polling connections
  2. add a progress notification every N index increases, even if no events happen
  3. add a max_index flag to stop watching once that index is reached

But these are for the v3 API.

I think options 2 and 3 do not really solve the problem, because the etcd index is intentionally time-independent, but TCP connections are explicitly time-dependent. So in an idle cluster you may not receive any index changes for a period of time, and then still observe timeouts.

Why does this need to be a v3 feature? At least the simple version (adding a timeout query parameter) is not a breaking change, so it would be nice to consider it for a minor release rather than waiting all the way for v3?

@stevenschlansker

I think options 2 and 3 do not really solve the problem

They are related concerns. The client needs to detect the server's status, and the server also needs to detect the client's status. This might not help with your specific problem, but a lot of people want to control the timeout of their watchers themselves, and they need to know when and where to restart the watch.

If you have an infinite socket timeout, and the remote server crashes, you may wait forever for a connection which is long dead.

This is a client-side thing. It should not be handled on the server side. The etcd server detects dead clients via TCP keepalive.

cc @hchoudh

I'd like to see a timeout parameter and a dedicated "nothing happened" response such as a 204 at the end of it.

The current behaviour interacts _very_ poorly with HTTP connection pooling because the server never completes the HTTP transaction if no events occur.

Since HTTP 1.x is request-response based, there's no way to time out a request without also tearing down the underlying socket. Doing that is very expensive; it wastes file descriptors at both ends and adds an additional TCP or even a TLS handshake in order to reconnect.

Since HTTP 1.x is request-response based, there's no way to time out a request without also tearing down the underlying socket.

This is a good point. We are actually considering switching to stream-based communication.

Streaming is nice but it's hard to write a correct client for that.

I'd recommend doing something simple to allow connection reuse and then adding streaming later for advanced users via format=stream, for example.

I've just come across another corner case that is hard to handle without a response. I want my code to be able to detect if the cluster has been replaced wholesale by a backup, for example. In that case, I'll poll against the backup server but with an index from the pre-backup server; that index will be far in the future for the new server so I'll never get a response. Without a response, I can't check the X-Etcd-Cluster-Id header and spot I polled against the wrong one.

We also just ran into the "backup restore" edge case -- we had a storage outage and restored from backup, all clients that were watching ended up in an indefinite hang as the cluster index had moved far into the past.

@stevenschlansker Can you file a new bug? I am confused on what you mean by "backup restore" edge case.

This blog post reports some odd behavior: http://www.projectclearwater.org/adventures-in-debugging-etcd-http-pipelining-and-file-descriptor-leaks/

Eventually, Python exits, and closes TCP connection A, by sending a FIN packet – but etcd doesn’t send a FIN of its own, or send a zero-byte chunk to finish the 200 OK, as it does in the success case. This causes the socket to leak on etcd’s side – we think this is a bug in Go’s HTTP stack.

Adding a waitTimeout parameter seems like a workaround rather than fixing the real problem: the client here doesn't have any opinion on how long to wait; the real issue is how long the server is willing to let possibly-abandoned file descriptors hang around. I would suggest fixing this by setting defaultWatchTimeout to something reasonable (vs the current MaxInt64) instead of (or in addition to) a client-specified timeout.

Also, in the case described on projectclearwater.org, the connection was shut down cleanly, so using CloseNotify would have worked too. This is even better than the timeout since you can free up the file descriptor immediately, but it doesn't always detect unclean disconnects promptly so you need the timeout as well.
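For reference, a minimal sketch (not etcd's actual handler) of combining a server-side timeout with CloseNotify in a long-poll handler; the 60-second timeout, route, and handler shape are assumptions for illustration:

```go
package main

import (
	"net/http"
	"time"
)

func watchHandler(events <-chan []byte) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// http.CloseNotifier is deprecated in newer Go; r.Context().Done() is the modern equivalent.
		closed := w.(http.CloseNotifier).CloseNotify()

		select {
		case ev := <-events:
			w.Write(ev) // an event happened: complete the HTTP transaction normally
		case <-time.After(60 * time.Second):
			w.WriteHeader(http.StatusNoContent) // timeout: "nothing happened", connection stays reusable
		case <-closed:
			// client went away cleanly: just return and free the file descriptor
		}
	}
}

func main() {
	events := make(chan []byte)
	http.HandleFunc("/watch", watchHandler(events))
	http.ListenAndServe(":8080", nil)
}
```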

@bdarnell

Adding a waitTimeout parameter seems like a workaround rather than fixing the real problem

I agree. So I am not convinced that we should support it now.

I would suggest fixing this by setting defaultWatchTimeout to something reasonable (vs the current MaxInt64) instead of (or in addition to) a client-specified timeout.

We tried. But a timeout at the server side breaks the connections, and the Go client library seems to have a bug that prevents it from handling this well. See https://github.com/golang/go/issues/8946.

We set TCP-level keepalive and also use CloseNotify. But in his case, the connections are pooled in the client itself, so the TCP connection is not closed and keepalive does not help there. I am not entirely sure why CloseNotify does not work. We need to investigate this more.

@bdarnell Yicheng reminded me that you might be suggesting setting the timeout on the context, not on the connection, so the server will still finish the HTTP response. That probably works.

Sorry, I thought you weren't using CloseNotify because the only occurrence grep found was in client.go, but it turns out that file is a server-side implementation of API _for_ clients. So I'm surprised that didn't work. The CloseNotifier should have fired when the client closed its connection.

I don't follow how a timeout on the server side breaks things. When the timeout fires you should just send back an empty response, or whatever you want to use to signify to the client that the timeout elapsed without any news.

@bdarnell

I don't follow how a timeout on the server side breaks things.

I was wrong. Sorry. See https://github.com/coreos/etcd/issues/2468#issuecomment-154115630.
I was still thinking of the old world, where we set the timeout on the http.Server struct (https://golang.org/pkg/net/http/#Server), which breaks the client.

Just a note, keep in mind that TCP keep-alive timeout parameters are configured system-wide (at least on Linux), and generally are set such that timeout happens after 10+ minutes. It's a nice thing to have as a fallback, but may not meet the needs of finding dead connections within a reasonable time for some etcd use cases.

TCP keep-alive timeout parameters are configured system-wide

It can be configured per connection. (see 4.3 of http://www.tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/)

You are right, thanks.

That will work for the server to detect unreachable clients, but I doubt the client will be able to set it properly through grpc, where the socket is quite hidden. I don't see any support in https://github.com/grpc/grpc

grpc-java has an issue open for it here https://github.com/grpc/grpc-java/issues/1648

You can turn TCP keepalives on and off per socket, but the duration of the timeout is system-wide (and usually very large - hours in some cases). If you need to detect dropped connections faster than that you have to use your own timer and send some data over the connection. This is tricky to do cleanly with HTTP/1; it's easier in HTTP/2 or GRPC.

@bdarnell that was my understanding as well, but on careful re-reading of that doc, there are setsockopt options to override the system-wide parameters. TCP_KEEPCNT, TCP_KEEPIDLE, TCP_KEEPINTVL.
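As a concrete illustration, here is a sketch of tuning these per socket from Go on Linux (assuming the golang.org/x/sys/unix package; the 30s/10s/3-probe values are arbitrary examples):

```go
package main

import (
	"log"
	"net"

	"golang.org/x/sys/unix"
)

// tuneKeepalive enables keepalive on a single connection and overrides the
// system-wide idle time, probe interval, and probe count for that socket only.
func tuneKeepalive(conn *net.TCPConn) error {
	if err := conn.SetKeepAlive(true); err != nil { // SO_KEEPALIVE
		return err
	}
	raw, err := conn.SyscallConn()
	if err != nil {
		return err
	}
	var serr error
	err = raw.Control(func(fd uintptr) {
		serr = unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_KEEPIDLE, 30) // idle seconds before probing
		if serr == nil {
			serr = unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_KEEPINTVL, 10) // seconds between probes
		}
		if serr == nil {
			serr = unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_KEEPCNT, 3) // probes before declaring the peer dead
		}
	})
	if err != nil {
		return err
	}
	return serr
}

func main() {
	conn, err := net.Dial("tcp", "127.0.0.1:2379")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	if err := tuneKeepalive(conn.(*net.TCPConn)); err != nil {
		log.Fatal(err)
	}
}
```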

the duration of the timeout is system-wide

You can tune that too, although it is not very portable and is usually at minute granularity. For etcd, we do application-level keepalive on peer communication.

On the client side, the transport is the responsibility of gRPC. We do application-level keepalive on long-lived streams.

This is not relevant to the v3 API, which uses a stream to support watch; clients can request periodic keepalive notifications from the etcd layer. We do not plan to add a timeout feature as a workaround to v2 either.

Closing. v3 cancellation works well so far without a timeout option.

@xiang90 so what's the solution? Could you provide some examples of using the v3 watcher to distinguish between "no events" and a server crash where it is unable to reply?
I have run into this problem with the v3 watcher.

```go
...
wch := etcdClient.Watch(context.Background(), "/EtcdWatcherTest/TestWatchWithFailover/", opts...)

go func() {
   // shut down the etcd server
   // sleep 120s
   // restart the etcd server
   // KV.Put(xxxx)
}()

// wch receives nothing and no error; it just waits for a very long time
...
```
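For what it's worth, one possible sketch of distinguishing "no events" from an unreachable server with the v3 client (assuming go.etcd.io/etcd/clientv3; the endpoints, key, and intervals are placeholders, and this is not an answer from the maintainers):

```go
package main

import (
	"context"
	"log"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:            []string{"127.0.0.1:2379"},
		DialKeepAliveTime:    10 * time.Second, // gRPC-level keepalive pings on the watch stream
		DialKeepAliveTimeout: 5 * time.Second,  // fail fast if the server stops answering
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// WithRequireLeader makes the watch error out when the server loses its leader;
	// WithProgressNotify asks for periodic empty responses even when no events happen.
	ctx := clientv3.WithRequireLeader(context.Background())
	wch := cli.Watch(ctx, "/EtcdWatcherTest/TestWatchWithFailover/",
		clientv3.WithPrefix(), clientv3.WithProgressNotify())

	for resp := range wch {
		if err := resp.Err(); err != nil {
			log.Println("watch error, re-establish the watch:", err)
			return
		}
		if resp.IsProgressNotify() {
			log.Println("progress notification: server is alive, no events yet")
			continue
		}
		for _, ev := range resp.Events {
			log.Printf("%s %q -> %q", ev.Type, ev.Kv.Key, ev.Kv.Value)
		}
	}
	// the channel closing without an error can also mean the watch was canceled
}
```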