Etcd: grpc keepalive: test server-to-client HTTP/2 pings

Created on 5 Oct 2017 · 14 comments · Source: etcd-io/etcd

Labels: area/testing, stale

All 14 comments

Can you fill in more details here?

maybe @spzala can give this a try.

Similar to

https://github.com/coreos/etcd/blob/9deaee3ea1b1f0c4119aab865eceff38eb5d5ade/clientv3/integration/black_hole_test.go#L33-L49

By configuring server-side keepalive parameters in https://godoc.org/google.golang.org/grpc/keepalive#ServerParameters, we want to test (either manually or with integration tests) that:

  1. the server pings the client after ServerParameters.Time of inactivity
  2. the client shows no activity during ServerParameters.Timeout
  3. the server closes the connection to the client
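The three steps above can be wired into a gRPC server roughly like this (a sketch with illustrative durations; etcd passes the equivalent through its --grpc-keepalive-* flags):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func newKeepaliveServer() *grpc.Server {
	return grpc.NewServer(
		grpc.KeepaliveParams(keepalive.ServerParameters{
			// Step 1: server pings the client after 5s of inactivity.
			Time: 5 * time.Second,
			// Steps 2-3: if the ping is not acked within 1s, the server
			// closes the connection to the client.
			Timeout: 1 * time.Second,
		}),
		// Also reject client pings that arrive more often than MinTime apart
		// (the server-side enforcement counterpart).
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime: 5 * time.Second,
		}),
	)
}
```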

WIP.

@spzala We can test this manually first.

I would try

  1. Pass --grpc-keepalive-min-time, --grpc-keepalive-interval, and --grpc-keepalive-timeout to the etcd server (ref https://github.com/coreos/etcd/pull/8535 and the gRPC docs).
  2. Since this is a server-to-client HTTP/2 ping, we should disable client-to-server pings and discard incoming packets from the server on the client side (iptables, tc).
  3. Server closes the connection to the client (confirm by just looking at the logs maybe?).
  4. Client comes back (blackhole removed), but the connection was closed, so it cannot talk to the server.
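For step 1, if the server is started programmatically, the flags appear to map onto embed.Config fields (a sketch; the field names are assumed from the flag names in #8535, and the values are illustrative):

```go
package main

import (
	"time"

	"github.com/coreos/etcd/embed"
)

func keepaliveConfig() *embed.Config {
	cfg := embed.NewConfig()
	// --grpc-keepalive-min-time: reject client pings more frequent than this.
	cfg.GRPCKeepAliveMinTime = 5 * time.Second
	// --grpc-keepalive-interval: server pings the client after this much inactivity.
	cfg.GRPCKeepAliveInterval = 10 * time.Second
	// --grpc-keepalive-timeout: close the connection if the ping is not acked.
	cfg.GRPCKeepAliveTimeout = 5 * time.Second
	return cfg
}
```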

Then translate this to integration tests with wrapper (https://github.com/coreos/etcd/blob/master/clientv3/integration/black_hole_test.go or later https://github.com/coreos/etcd/pull/9081).

@gyuho thanks much!!

@gyuho hi, I am trying to see the behavior by setting --grpc-keepalive-min-time, --grpc-keepalive-interval, and --grpc-keepalive-timeout following the steps you suggested above, but I am probably not doing it right and not able to see errors like too many pings or even the server closing the connection. I have a single-node cluster, and I tried testing with the simple Go program I sent you earlier from another machine (i.e. the client pings the server every few seconds, set higher than the timeout or keepalive interval), but without any luck reproducing this behavior. Appreciate any help from you to go forward with it :) Thanks!

@spzala

probably not doing it right and not able to see errors like too many pings or even the server closing the connection.

Have you dropped packets? We want to simulate faulty networks with iptables.

@gyuho hi, thanks. So I ran iptables -A INPUT -s <serverip> -j DROP on the client machine, and I saw the "stopped server" message; then when I unblocked it, the connection came back and the client started receiving messages again. But there was no error message or disconnect from the server.

We expect

Server closes the connection to the client (confirm by just looking at the logs maybe?)

You may tune the gRPC keepalive timeout on the server side. The disconnect may have been too short for the server-side keepalive to kick in.

I would also add more debugging lines or adjust log levels on the gRPC side. The server may not display all logs.

@gyuho hi, I am getting back from some vacation and work travel :). I could run the tests manually, and I think we should try adding two integration tests - one for MinTime (i.e. the GOAWAY too-many-pings error) and a second for Timeout.
While testing manually, Timeout works as expected for me, with the connection actually being closed. With MinTime I see a couple of things:

  1. I could only see the log messages if I set the env variables export GRPC_GO_LOG_VERBOSITY_LEVEL=2 and export GRPC_GO_LOG_SEVERITY_LEVEL=info on both the server and client CLI. I also had to comment out this line: https://github.com/etcd-io/etcd/blob/7a759c18d294698f537f8be91927354818a71e51/clientv3/logger.go#L47 So to rely on log messages, we need to think more here.
  2. After enabling logging as in step one above, I see the following log messages:
    Server side: ERROR: 2018/10/15 22:35:18 transport: Got too many pings from the client, closing the connection.
    Client side: INFO: 2018/10/15 22:35:18 Client received GoAway with http2.ErrCodeEnhanceYourCalm.
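The too-many-pings GOAWAY in those logs can be provoked from the client side by pinging more aggressively than the server's --grpc-keepalive-min-time allows (a sketch; the durations are illustrative):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func dialAggressive(endpoint string) (*grpc.ClientConn, error) {
	return grpc.Dial(endpoint,
		grpc.WithInsecure(),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			// Pinging every 1s while the server's MinTime is, say, 5s makes
			// the server log "transport: Got too many pings from the client"
			// and send GOAWAY with http2.ErrCodeEnhanceYourCalm.
			Time:                1 * time.Second,
			Timeout:             500 * time.Millisecond,
			PermitWithoutStream: true, // keep pinging even with no active RPCs
		}),
	)
}
```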

I will be working on creating the integration test, but I have another related question: you mentioned using blackhole - can I use Blackhole() or something similar on the client (clientv3.New(ccfg)), something like https://github.com/etcd-io/etcd/blob/7a759c18d294698f537f8be91927354818a71e51/clientv3/integration/black_hole_test.go#L91? From what I see, the method is used for cluster members. Thanks!

@spzala Tests would be great. Thanks a lot!

@gyuho thanks, and qq: can you please help me understand how you are thinking of using blackhole on the client side? Any thoughts would be helpful. Thanks!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
