Please answer these questions before submitting your issue.
Downstream etcd issue: https://github.com/coreos/etcd/pull/9623
v1.7.5, but behavior is the same on latest
go version)?go version go1.10 linux/amd64
Linux 4.15.17-300.fc27.x86_64
After mis-configuring a gRPC client with the wrong CA, I noticed the error is logged at a WARNING level here. After relaxing our logger to print WARNING messages, we started seeing routine disconnects, apparently caused by the internal gRPC load balancer:
$ etcdctl user list
WARNING: 2018/04/25 13:06:05 Failed to dial etcd-2.local:22379: grpc: the connection is closing; please retry.
WARNING: 2018/04/25 13:06:05 Failed to dial etcd-1.local:12379: grpc: the connection is closing; please retry.
$ etcdctl user list
WARNING: 2018/04/25 13:06:05 Failed to dial etcd-1.local:12379: grpc: the connection is closing; please retry.
$ etcdctl user list
WARNING: 2018/04/25 13:06:06 Failed to dial etcd-3.local:32379: grpc: the connection is closing; please retry.
WARNING: 2018/04/25 13:06:06 Failed to dial etcd-2.local:22379: grpc: the connection is closing; please retry.
I expected these kind of errors logged at different levels.
No difference between fatal errors (such as mis-configured TLS), and routine ones.
Maybe something like:
diff --git a/clientconn.go b/clientconn.go
index 9745484..6263b30 100644
--- a/clientconn.go
+++ b/clientconn.go
@@ -925,10 +925,12 @@ func (ac *addrConn) connect() error {
// Start a goroutine connecting to the server asynchronously.
go func() {
if err := ac.resetTransport(); err != nil {
- grpclog.Warningf("Failed to dial %s: %v; please retry.", ac.addrs[0].Addr, err)
if err != errConnClosing {
+ grpclog.Errorf("Failed to dial %s: %v; please retry.", ac.addrs[0].Addr, err)
// Keep this ac in cc.conns, to get the reason it's torn down.
ac.tearDown(err)
+ } else {
+ grpclog.Warningf("Failed to dial %s: %v; please retry.", ac.addrs[0].Addr, err)
}
return
}
@menghanl @ericchiang I believe this happens when etcd client manually triggers down(errConnClosing) on unused connections (from old balancer interface in v1.7). We manually close connections after pinning the first connection out of multiple endpoints. We are upgrading gRPC with balancer rewrite. In our new implementation, we are getting rid of such manual connection close.
@gyuho Thanks for clarifying!
@ericchiang
In the newer releases, we include the last connection error in RPC errors, (#1855, part of release 1.11). So if you have logs for RPC errors, you will see information about connections there, which could help you tell if the error was fatal or not.
The proposal about log levels in this issue sounds good. We have plan to define our logging policy (how to pick log levels), we will make sure this issue gets covered.
The logging doc was submitted in #2033, and sweeping through our logging is tracked in #2146.
Closing this issue now.
Most helpful comment
@gyuho Thanks for clarifying!
@ericchiang
In the newer releases, we include the last connection error in RPC errors, (#1855, part of release 1.11). So if you have logs for RPC errors, you will see information about connections there, which could help you tell if the error was fatal or not.
The proposal about log levels in this issue sounds good. We have plan to define our logging policy (how to pick log levels), we will make sure this issue gets covered.