Etcd: grpc keepalive: test server-to-client HTTP/2 pings

Created on 5 Oct 2017 · 14 comments · Source: etcd-io/etcd

Labels: area/testing, stale

All 14 comments

Can you fill in more details here?

maybe @spzala can give this a try.

Similar to

https://github.com/coreos/etcd/blob/9deaee3ea1b1f0c4119aab865eceff38eb5d5ade/clientv3/integration/black_hole_test.go#L33-L49

By configuring server-side keepalive parameters in https://godoc.org/google.golang.org/grpc/keepalive#ServerParameters, we want to test (either manually or with integration tests) that:

  1. the server pings the client after ServerParameters.Time of inactivity
  2. the client shows no activity during ServerParameters.Timeout
  3. the server closes the connection to the client
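The three steps above can be wired into a gRPC server roughly like this (a sketch with illustrative durations; etcd passes the equivalent through its --grpc-keepalive-* flags):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func newKeepaliveServer() *grpc.Server {
	return grpc.NewServer(
		grpc.KeepaliveParams(keepalive.ServerParameters{
			// Step 1: server pings the client after 5s of inactivity.
			Time: 5 * time.Second,
			// Steps 2-3: if the ping is not acked within 1s, the server
			// closes the connection to the client.
			Timeout: 1 * time.Second,
		}),
		// Also reject client pings that arrive more often than MinTime apart
		// (the server-side enforcement counterpart).
		grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
			MinTime: 5 * time.Second,
		}),
	)
}
```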

WIP.

@spzala We can test this manually first.

I would try

  1. Pass --grpc-keepalive-min-time, --grpc-keepalive-interval, and --grpc-keepalive-timeout to the etcd server (ref https://github.com/coreos/etcd/pull/8535 and the gRPC docs).
  2. Since this is a server-to-client HTTP/2 ping, we should disable client-to-server pings and discard incoming packets from the server on the client side (iptables, tc).
  3. Server closes the connection to the client (confirm by just looking at the logs maybe?).
  4. Client comes back (blackhole removed), but the connection was closed, so it cannot talk to the server.
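For step 1, if the server is started programmatically, the flags appear to map onto embed.Config fields (a sketch; the field names are assumed from the flag names in #8535, and the values are illustrative):

```go
package main

import (
	"time"

	"github.com/coreos/etcd/embed"
)

func keepaliveConfig() *embed.Config {
	cfg := embed.NewConfig()
	// --grpc-keepalive-min-time: reject client pings more frequent than this.
	cfg.GRPCKeepAliveMinTime = 5 * time.Second
	// --grpc-keepalive-interval: server pings the client after this much inactivity.
	cfg.GRPCKeepAliveInterval = 10 * time.Second
	// --grpc-keepalive-timeout: close the connection if the ping is not acked.
	cfg.GRPCKeepAliveTimeout = 5 * time.Second
	return cfg
}
```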

Then translate this to integration tests with wrapper (https://github.com/coreos/etcd/blob/master/clientv3/integration/black_hole_test.go or later https://github.com/coreos/etcd/pull/9081).

@gyuho thanks much!!

@gyuho hi, I am trying to see the behavior by setting --grpc-keepalive-min-time, --grpc-keepalive-interval, and --grpc-keepalive-timeout following the steps you suggested above, but I am probably not doing it right and not able to see errors like too many pings or even the server closing the connection. I have a single-node cluster, and I tried testing with the simple Go program I sent you earlier from another machine (i.e. the client pings the server every few seconds, set higher than the timeout or keepalive interval), but without any luck reproducing this behavior. Appreciate any help from you to go forward with it :) Thanks!

@spzala

probably not doing it right and not able to see errors like too many pings or even the server closing the connection.

Have you dropped packets? We want to simulate faulty networks with iptables.

@gyuho hi, thanks. So I ran iptables -A INPUT -s <serverip> -j DROP on the client machine, and I saw the "stopped server" message; then when I unblocked it, the connection came back and the client started receiving messages again. But there was no error message or disconnect from the server.

We expect

Server closes the connection to the client (confirm by just looking at the logs maybe?)

You may tune the gRPC keepalive timeout on the server side. The disconnect may have been too short for the server-side keepalive to kick in.

I would also add more debugging lines or adjust log levels on the gRPC side. The server may not display all logs.

@gyuho hi, I am getting back from some vacation and work travel :). I could run the tests manually, and I think we should try adding two integration tests - one for MinTime (i.e. the GOAWAY too-many-pings error) and a second for Timeout.
While testing manually, Timeout works as expected for me, with the connection actually being closed. With MinTime I see a couple of things:

  1. I could only see the log messages if I set the env variables export GRPC_GO_LOG_VERBOSITY_LEVEL=2 and export GRPC_GO_LOG_SEVERITY_LEVEL=info on both the server and client CLI. I also had to comment out this line: https://github.com/etcd-io/etcd/blob/7a759c18d294698f537f8be91927354818a71e51/clientv3/logger.go#L47 So to rely on log messages, we need to think more here.
  2. After enabling logging as in step one above, I see the following log messages:
    Server side: ERROR: 2018/10/15 22:35:18 transport: Got too many pings from the client, closing the connection.
    Client side: INFO: 2018/10/15 22:35:18 Client received GoAway with http2.ErrCodeEnhanceYourCalm.
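The too-many-pings GOAWAY in those logs can be provoked from the client side by pinging more aggressively than the server's --grpc-keepalive-min-time allows (a sketch; the durations are illustrative):

```go
package main

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func dialAggressive(endpoint string) (*grpc.ClientConn, error) {
	return grpc.Dial(endpoint,
		grpc.WithInsecure(),
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			// Pinging every 1s while the server's MinTime is, say, 5s makes
			// the server log "transport: Got too many pings from the client"
			// and send GOAWAY with http2.ErrCodeEnhanceYourCalm.
			Time:                1 * time.Second,
			Timeout:             500 * time.Millisecond,
			PermitWithoutStream: true, // keep pinging even with no active RPCs
		}),
	)
}
```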

I will be working on creating the integration test, but I have another related question: you mentioned using blackhole - can I use Blackhole() or something similar on the client (clientv3.New(ccfg)), something like https://github.com/etcd-io/etcd/blob/7a759c18d294698f537f8be91927354818a71e51/clientv3/integration/black_hole_test.go#L91? From what I see, the method is used for cluster members. Thanks!

@spzala Tests would be great. Thanks a lot!

@gyuho thanks, and qq: can you please help me understand how you are thinking of using blackhole on the client side? Any thoughts would be helpful. Thanks!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
