Grpc-go: gRPC call is stuck even with timeout set in the context

Created on 20 Feb 2017  Â·  6Comments  Â·  Source: grpc/grpc-go

My gRPC invocation is stuck for nearly 5 minutes, even I set a timeout of 300ms in my context. This phenomenon appears about 1~2 times a day.

Here is my code:

// client is myproto.MyServiceClient
ctx, cancel := context.WithTimeout(context.Background(), 300*time.Millisecond)
defer cancel()
resp, err := client.MyMethod(ctx, &myproto.Empty{})

And this is the stack trace collected from pprof:

1 @ 0x42e5da 0x428e9b 0x4285f9 0x4782d8 0x4783a4 0x47aa47 0x48a9c0 0x60c887 0x60cd23 0x60dd9b 0x64d385 0x64d2eb 0x6f434c 0x6e7d5d 0x5597b6 0x55a283 0x559b3c 0x74e612 0x597abd 0x5985aa 0x45f301
#   0x4285f8    net.runtime_pollWait+0x58                                   /usr/local/Cellar/go/1.7.5/libexec/src/runtime/netpoll.go:160
#   0x4782d7    net.(*pollDesc).wait+0x37                                   /usr/local/Cellar/go/1.7.5/libexec/src/net/fd_poll_runtime.go:73
#   0x4783a3    net.(*pollDesc).waitWrite+0x33                                  /usr/local/Cellar/go/1.7.5/libexec/src/net/fd_poll_runtime.go:82
#   0x47aa46    net.(*netFD).Write+0x346                                    /usr/local/Cellar/go/1.7.5/libexec/src/net/fd_unix.go:329
#   0x48a9bf    net.(*conn).Write+0x6f                                      /usr/local/Cellar/go/1.7.5/libexec/src/net/net.go:185
#   0x60c886    crypto/tls.(*Conn).write+0x196                                  /usr/local/Cellar/go/1.7.5/libexec/src/crypto/tls/conn.go:819
#   0x60cd22    crypto/tls.(*Conn).writeRecordLocked+0x342                          /usr/local/Cellar/go/1.7.5/libexec/src/crypto/tls/conn.go:895
#   0x60dd9a    crypto/tls.(*Conn).Write+0x1fa                                  /usr/local/Cellar/go/1.7.5/libexec/src/crypto/tls/conn.go:1047
#   0x64d384    bufio.(*Writer).flush+0x74                                  /usr/local/Cellar/go/1.7.5/libexec/src/bufio/bufio.go:563
#   0x64d2ea    bufio.(*Writer).Flush+0x2a                                  /usr/local/Cellar/go/1.7.5/libexec/src/bufio/bufio.go:552
#   0x6f434b    google.golang.org/grpc/transport.(*framer).writeData+0x8b                   /Users/chengyufeng/go/src/google.golang.org/grpc/transport/http_util.go:408
#   0x6e7d5c    google.golang.org/grpc/transport.(*http2Client).Write+0x3ec                 /Users/chengyufeng/go/src/google.golang.org/grpc/transport/http2_client.go:694
#   0x5597b5    google.golang.org/grpc.sendRequest+0x305                            /Users/chengyufeng/go/src/google.golang.org/grpc/call.go:121
#   0x55a282    google.golang.org/grpc.invoke+0x712                             /Users/chengyufeng/go/src/google.golang.org/grpc/call.go:244
#   0x559b3b    google.golang.org/grpc.Invoke+0x19b                             /Users/chengyufeng/go/src/google.golang.org/grpc/call.go:143
...

The stack trace of this goroutine stays like that for about 5 minutes. It seems get blocked on socket write.

Bug

Most helpful comment

@MakMukhi please take a look

All 6 comments

It seems that framer.writeData calls http2.Framer.WriteData, which will eventually flush some data into the framer's underlying net.Conn. net.Conn.Write cannot be canceled by the context and this gRPC call will block if the underlying net.Conn hangs.

One solution is to queue all frame writes into a channel, and use a goroutine to flush the frames from the channel into the underlying connection. This way, we can do select on the send channel and avoid blocking if the underlying connection hangs.

@MakMukhi please take a look

Hello, is there any other update or workaround to this issue?

This should've been fixed by #1498

On Sep 25, 2017 5:43 PM, "Ashwanth Kumar" notifications@github.com wrote:

Hello, is there any other update or workaround to this issue?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/grpc/grpc-go/issues/1078#issuecomment-332052892, or mute
the thread
https://github.com/notifications/unsubscribe-auth/ATtnR0pBdl2jd4gDaRRCSSwZgfNp2h_Pks5smEiwgaJpZM4MF4Ys
.

What about read connections? I've a stuck connection in readFrame

#       0x4297f6        internal/poll.runtime_pollWait+0x56                                                             /usr/lib/go-1.9/src/runtime/netpoll.go:173
#       0x498f8d        internal/poll.(*pollDesc).wait+0xad                                                             /usr/lib/go-1.9/src/internal/poll/fd_poll_runtime.go:85
#       0x49900c        internal/poll.(*pollDesc).waitRead+0x3c                                                         /usr/lib/go-1.9/src/internal/poll/fd_poll_runtime.go:90
#       0x499da9        internal/poll.(*FD).Read+0x189                                                                  /usr/lib/go-1.9/src/internal/poll/fd_unix.go:126
#       0x50c7f1        net.(*netFD).Read+0x51                                                                          /usr/lib/go-1.9/src/net/fd_unix.go:202
#       0x51ed8c        net.(*conn).Read+0x6c                                                                           /usr/lib/go-1.9/src/net/net.go:176
#       0x540f5a        bufio.(*Reader).Read+0x30a                                                                      /usr/lib/go-1.9/src/bufio/bufio.go:213
#       0x475335        io.ReadAtLeast+0x85                                                                             /usr/lib/go-1.9/src/io/io.go:309
#       0x4754a7        io.ReadFull+0x57                                                                                /usr/lib/go-1.9/src/io/io.go:327
#       0x78f6ca        .../vendor/golang.org/x/net/http2.readFrameHeader+0x7a                 /root/.go/src/.../vendor/golang.org/x/net/http2/frame.go:237
#       0x790033        .../vendor/golang.org/x/net/http2.(*Framer).ReadFrame+0xa3             /root/.go/src/.../vendor/golang.org/x/net/http2/frame.go:492
#       0x7c1542        .../vendor/google.golang.org/grpc/transport.(*http2Client).reader+0xc2 /root/.go/src/.../vendor/google.golang.org/grpc/transport/http2_client.go:1167

Are you saying that your rpc is stuck on a recv call even after the context has cancelled? This should never happen. Can you provide more details?
The read call that you're seeing in your stack trace is the reader go routine of a client's transport. It sits there waiting for data. But if an error occurred elsewhere and the transport was closed the underlying net.Conn will be closed and this goroutine will see that and return.
If this doesn't answer all your questions, please file a new issue and provide more information: What exactly are you doing? What behavior did you see? What did you expect?
Thanks.

Was this page helpful?
0 / 5 - 0 ratings