Grpc-go: Transport is closing and transient failure errors since upgrading to 1.18

Created on 1 Mar 2019 · 18Comments · Source: grpc/grpc-go

Please answer these questions before submitting your issue.

What version of gRPC are you using?

1.18.0

What version of Go are you using (`go version`)?

1.11.5

What operating system (Linux, Windows, …) and version?

Alpine Linux 3.9 running on Google Kubernetes Engine

What did you do?

Since upgrading, I've noticed a large increase in grpc errors leading to failed requests and service disruptions.

We don't use any connection settings on the clients, nor do we have any special settings on the server (keepalives, fastfail, etc). The only thing we have on the connections is Mutual TLS. It's roughly configured like this:

Client:

creds := grpc.WithTransportCredentials(credentials.NewTLS(&tls.Config{
        RootCAs:      certPool,
        Certificates: []tls.Certificate{cert},
        MinVersion:   tls.VersionTLS12,
}))
grpc.Dial("foo.foo.svc.cluster.local:80", creds)

Server:

creds := grpc.Creds(credentials.NewTLS(&tls.Config{
    Certificates:             []tls.Certificate{cert},
    ClientCAs:                certPool,
    ClientAuth:               tls.RequireAndVerifyClientCert,
    MinVersion:               tls.VersionTLS12,
    NextProtos:               []string{"h2"},
    PreferServerCipherSuites: true,
    CurvePreferences: []tls.CurveID{
        tls.CurveP256,
        tls.X25519,
    },
    CipherSuites: []uint16{
        tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
        tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
        tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,
        tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,
        tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
        tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
    },
}))
grpc.NewServer(creds)

This may be related to one of these other two issues
https://github.com/grpc/grpc-go/issues/2653
https://github.com/grpc/grpc-go/issues/2636

Based on the feedback in one of the above linked issues, I set these ENV vars on one service that had a lot of failures: GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info

I added (__GRPC CALL HERE__) at the end of the log lines below where the error was logged as a failed grpc call that we made.

WARNING: 2019/03/01 04:41:42 grpc: addrConn.createTransport failed to connect to {foo.foo.svc.cluster.local:80 0  <nil>}. Err :connection error: desc = "transport: authentication handshake failed: context deadline exceeded". Reconnecting...
INFO: 2019/03/01 04:41:42 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 04:41:42 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, TRANSIENT_FAILURE
timestamp=2019-03-01T04:41:42.887562062Z level=error error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: authentication handshake failed: context deadline exceeded\"" (__GRPC CALL HERE))
INFO: 2019/03/01 04:41:43 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 04:41:43 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, CONNECTING
INFO: 2019/03/01 04:41:44 Subchannel Connectivity change to READY
INFO: 2019/03/01 04:41:44 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, READY
INFO: 2019/03/01 04:45:22 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 04:45:22 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 04:45:22 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 04:45:22 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 04:45:23 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 04:45:23 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 04:45:28 Subchannel Connectivity change to READY
INFO: 2019/03/01 04:45:28 Subchannel Connectivity change to READY

INFO: 2019/03/01 05:34:23 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:34:23 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:35:25 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:35:25 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:35:27 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:35:32 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:35:59 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:35:59 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:36:01 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:36:01 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:38:19 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:38:19 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
timestamp=2019-03-01T05:38:19.972966567Z level=error error="rpc error: code = Unavailable desc = transport is closing" (__GRPC CALL HERE__)
INFO: 2019/03/01 05:38:19 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 05:38:19 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:19 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, TRANSIENT_FAILURE
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:38:21 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, CONNECTING
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:38:21 pickfirstBalancer: HandleSubConnStateChange: 0xc000806290, CONNECTING
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 05:38:21 Subchannel Connectivity change to READY
INFO: 2019/03/01 05:38:21 transport: loopyWriter.run returning. connection error: desc = "transport is closing"

INFO: 2019/03/01 16:13:05 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 16:13:05 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, TRANSIENT_FAILURE
INFO: 2019/03/01 16:13:05 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
timestamp=2019-03-01T16:13:05.92908361Z level=error error="rpc error: code = Unavailable desc = transport is closing" (__GRPC CALL HERE__)
INFO: 2019/03/01 16:13:07 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 16:13:07 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, CONNECTING
INFO: 2019/03/01 16:13:07 Subchannel Connectivity change to READY
INFO: 2019/03/01 16:13:07 pickfirstBalancer: HandleSubConnStateChange: 0xc0008293d0, READY
INFO: 2019/03/01 16:20:17 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 16:20:17 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 16:20:18 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 16:20:23 Subchannel Connectivity change to READY
INFO: 2019/03/01 16:42:24 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 16:42:24 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 16:42:25 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 16:42:25 Subchannel Connectivity change to READY
INFO: 2019/03/01 17:20:47 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 17:20:47 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 17:20:49 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 17:20:54 Subchannel Connectivity change to READY
INFO: 2019/03/01 17:43:01 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2019/03/01 17:43:01 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
INFO: 2019/03/01 17:43:02 Subchannel Connectivity change to CONNECTING
INFO: 2019/03/01 17:43:07 Subchannel Connectivity change to READY

P1 Bug

Source

cristiangraz

All 18 comments

Can you also get server side logs to see if it contains more information?

The handshake deadline exceeded error looks very suspicious. It would help if we know why it happened.

menghanl on 1 Mar 2019

One more thing to try: can you set GRPC_GO_REQUIRE_HANDSHAKE=off on client side and see if that changes anything? (#2406)

menghanl on 1 Mar 2019

@menghanl I have experienced same issue here. Here is a reproduce procedure.

Example program

client

package main

import (
    "context"
    "log"
    "time"

    "github.com/golang/glog"
    "github.com/kazegusuri/grpc-test/pb"
    "google.golang.org/grpc"
    _ "google.golang.org/grpc/grpclog/glogger"
)

func main() {
    ctx := context.Background()
    conn1, err := grpc.DialContext(ctx, "127.0.0.1:9000", grpc.WithInsecure())
    if err != nil {
        log.Fatalf("grpc.DialContext failed: %v", err)
    }
    defer func() { _ = conn1.Close() }()

    cli1 := pb.NewEchoClient(conn1)
    for {
        time.Sleep(1 * time.Millisecond)
        _, err := cli1.Echo(ctx, &pb.EchoMessage{Message: "hello"})
        if err != nil {
            glog.Infof("goroutine: echo failed: %v", err)
        } else {
            glog.Infof("goroutine: echo succeeded")
        }
    }
}

server

func main() {
    server := newServer()
    s := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
        MaxConnectionAge: 10 * time.Second, // this kills connection 10s after connected
    }))
    pb.RegisterEchoServer(s, server)

    lis, err := net.Listen("tcp", ":9000")
    if err != nil {
        log.Fatalf("failed to listen: %v", err)
    }

    if err := s.Serve(lis); err != nil {
        log.Fatalf("err %v\n", err)
    }

    select {}
}

Results

With above programs, I run the server in v1.19.0 and the client with v1.17, 1.18, 1.19.
The server kills the connection from the client 10 second after connected.
The client should reconnect to the server but its behavior is different in client versions.
With 1.18 and 1.19 the client takes over 1 second to recconect the server.

v1.17.0 (default)

I0303 02:22:59.745529    6773 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00001e030, TRANSIENT_FAILURE
I0303 02:22:59.745566    6773 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00001e030, CONNECTING
I0303 02:22:59.745732    6773 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00001e030, READY

v1.18.0 (default)

I0303 02:20:58.820793    5169 clientconn.go:941] Subchannel Connectivity change to TRANSIENT_FAILURE
I0303 02:20:58.820892    5169 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00016e010, TRANSIENT_FAILURE
I0303 02:20:59.821028    5169 clientconn.go:941] Subchannel Connectivity change to CONNECTING
I0303 02:20:59.821189    5169 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00016e010, CONNECTING
I0303 02:20:59.822058    5169 clientconn.go:941] Subchannel Connectivity change to READY
I0303 02:20:59.822155    5169 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00016e010, READY

v1.18.0 (GRPC_GO_REQUIRE_HANDSHAKE=off)

I0303 02:25:59.393033    8568 clientconn.go:941] Subchannel Connectivity change to TRANSIENT_FAILURE
I0303 02:25:59.393096    8568 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00017c010, TRANSIENT_FAILURE
I0303 02:26:00.393385    8568 clientconn.go:941] Subchannel Connectivity change to CONNECTING
I0303 02:26:00.393510    8568 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00017c010, CONNECTING
I0303 02:26:00.393937    8568 clientconn.go:941] Subchannel Connectivity change to READY
I0303 02:26:00.393979    8568 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc00017c010, READY


v1.19.0 (default)

I0303 02:28:57.655932   11292 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000160010, TRANSIENT_FAILURE
I0303 02:28:58.656125   11292 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000160010, CONNECTING
I0303 02:28:58.656489   11292 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000160010, READY

v1.19.0 (GRPC_GO_REQUIRE_HANDSHAKE=off)

I0303 02:30:30.642447   11438 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000162010, TRANSIENT_FAILURE
I0303 02:30:31.642680   11438 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000162010, CONNECTING
I0303 02:30:31.643004   11438 pickfirst.go:73] pickfirstBalancer: HandleSubConnStateChange: 0xc000162010, READY

kazegusuri on 2 Mar 2019

👀5

This causes performance regression on the server which provides single subconn like Google Spanner.
Spanner seems to kill connections in the server-side in an hour, then the connection becomes transient failure. For a second in transient failure until becoming ready, the connection keeps unavaiable.

kazegusuri on 2 Mar 2019

If it's okay to try to create transport immediately after transient failure for first retry, how about adding code like this here?

if ac.backoffIdx == 0 {
    backoffFor = 0
}

kazegusuri on 3 Mar 2019

I have a question.
I found baseDelay = 1.0 * time.Second in here.
I think this value is little bit big. Why we use 1 sec by default?
1 sec is a too big from user-facing frontend side.

vvakame on 6 Mar 2019

I encountered the same issue using the gc bigtable go client (grpc based), bigtable grpc server seems to have a connection max age of one hour, which means that every hour, I litteraly stop handling my traffic for one second.

I agree with @vvakame this default value is too high for a lot of use cases and should be easily configurable in the other google clients based on this repo.

mpipet on 6 Mar 2019

There's a bug in the reconnect logic, the backoff time is not reset after creating a connection.

The fix is in #2669. Can you guys try it and see if it fixes the problems you are seeing? Thanks!

menghanl on 6 Mar 2019

👍1

Closing. Please comment if you are still experiencing problems.

dfawley on 14 Mar 2019

Does the fix in https://github.com/grpc/grpc-go/pull/2669 still require GRPC_GO_REQUIRE_HANDSHAKE=off to be set?

mikesun on 14 Jun 2019

I hope not, since I intend to remove that option soon (this or next week). Please let me know if you are experiencing any problems with it unset.

dfawley on 14 Jun 2019

@dfawley

It seems to still be an issue for us. I reported it to Google Cloud support which pointed me to this thread and suggested the workaround was to set GRPC_GO_REQUIRE_HANDSHAKE=off. It appears that is no longer the case.

We running a Golang Google App Engine flexible app with Go v1.11 and grpc-gio v1.20.0.

With Spanner, we see context timeouts:

spanner: code = “Canceled”, desc = “context canceled, lastErr is <<nil>>”

With Cloud Tasks, we see these RPC errors:

all SubConns are in TransientFailure, latest connection error:

These errors are not occuring consistently, only intermittently and in bursts.

mikesun on 14 Jun 2019

These errors are not occuring consistently, only intermittently and in bursts.

How long do these bursts last and how often do they occur?

Does setting GRPC_GO_REQUIRE_HANDSHAKE=off help for you?

dfawley on 14 Jun 2019

These errors are not occuring consistently, only intermittently and in bursts.

How long do these bursts last and how often do they occur?

Currently, we do not have a production environment with any meaningful load, but in a trial production run that we ran for 3 days, we had one burst of errors for ~20 minutes. Otherwise, we see it less than once a day, for a minute.

mikesun on 14 Jun 2019

I turned on GRPC logging on one of our deployments and it appears I get the TRANSIENT_FAILURE errors continuously regardless if GRPC_GO_REQUIRE_HANDSHAKE=off is set.

grpc v1.21.1 and GRPC_GO_REQUIRE_HANDSHAKE=off

2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239980, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239980, CONNECTING
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239980, READY
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239770, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239770, CONNECTING
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc000239770, READY
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
2019-06-14 20:50:14 default[20190614t163806]  INFO: 2019/06/14 20:50:14 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY

The reconnects seems to happening immediately and I haven't experienced any RPC failures with all SubConns are in TransientFailure, so I guess this is acceptable behavior?

We do configure http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 500, would that be a problem?

mikesun on 14 Jun 2019

@mikesun

I guess this is acceptable behavior?

It's not normal for your connection to be dying so frequently (several per second). Is this happening continuously or are these entries the only ones? Unfortunately, there isn't enough logging in place to see why the connection is being closed. But it does go into READY, which means at least TLS was successful.

We do configure http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 500, would that be a problem?

Go HTTP server settings should not affect grpc-go - we have our own server implementation.

dfawley on 15 Jun 2019

It's not normal for your connection to be dying so frequently (several per second). Is this happening continuously or are these entries the only ones? Unfortunately, there isn't enough logging in place to see why the connection is being closed. But it does go into READY, which means at least TLS was successful.

It seems to happen in bunches (~3/sec) every few minutes (1-4 min). This is on an GAE flexible instance with basically no load:

A  INFO: 2019/06/17 16:10:15 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239900, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239900, CONNECTING
A  INFO: 2019/06/17 16:10:15 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239870, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239870, CONNECTING
A  INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239900, READY
A  INFO: 2019/06/17 16:10:15 pickfirstBalancer: HandleSubConnStateChange: 0xc000239870, READY
A  INFO: 2019/06/17 16:10:37 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:10:37 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:10:37 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A  INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A  INFO: 2019/06/17 16:10:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A  INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A  INFO: 2019/06/17 16:10:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A  INFO: 2019/06/17 16:10:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
A  INFO: 2019/06/17 16:14:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A  INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A  INFO: 2019/06/17 16:14:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A  INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A  INFO: 2019/06/17 16:14:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A  INFO: 2019/06/17 16:14:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
A  INFO: 2019/06/17 16:18:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A  INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A  INFO: 2019/06/17 16:18:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A  INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A  INFO: 2019/06/17 16:18:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A  INFO: 2019/06/17 16:18:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY
A  INFO: 2019/06/17 16:22:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, CONNECTING
A  INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc000238ec0, READY
A  INFO: 2019/06/17 16:22:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, CONNECTING
A  INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00030c150, READY
A  INFO: 2019/06/17 16:22:38 transport: loopyWriter.run returning. connection error: desc = "transport is closing"
A  INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, TRANSIENT_FAILURE
A  INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, CONNECTING
A  INFO: 2019/06/17 16:22:38 pickfirstBalancer: HandleSubConnStateChange: 0xc00081c350, READY

mikesun on 17 Jun 2019

@mikesun - one last thing I can think of: make sure you aren't running a grpc-go version between #2740 (82fdf27 - 21d ago) and #2862 (a1d4c28 - 10d ago). We had a bug in that range that could potentially cause problems like this.

If it's not that, then most likely the problem is not on the client side. Logs from the server/proxy side would be needed for further debugging. Also, since your problems are not related to this issue, could you file a new issue if you suspect a problem with grpc-go?

dfawley on 20 Jun 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

DNS resolver has started looking for TXT records

arcana261 · 3Comments

Notice: Upcoming Experimental Balancer/Resolver API Changes

dfawley · 3Comments

Recv() after GracefulStop() getting rpc error "transport is closing"

23doors · 3Comments

Client not reconnecting via resetTransport on "connection refused" error

jbrook · 4Comments

serviceconfig: support specifying default service config using a native go struct

kazegusuri · 4Comments

Grpc-go: Transport is closing and transient failure errors since upgrading to 1.18

What version of gRPC are you using?

What version of Go are you using (go version)?

What operating system (Linux, Windows, …) and version?

What did you do?

All 18 comments

Example program

client

server

Results

Related issues

What version of Go are you using (`go version`)?