My gRPC invocation is stuck for nearly 5 minutes, even I set a timeout of 300ms in my context. This phenomenon appears about 1~2 times a day.
Here is my code:
// client is myproto.MyServiceClient
ctx, cancel := context.WithTimeout(context.Background(), 300*time.Millisecond)
defer cancel()
resp, err := client.MyMethod(ctx, &myproto.Empty{})
And this is the stack trace collected from pprof:
1 @ 0x42e5da 0x428e9b 0x4285f9 0x4782d8 0x4783a4 0x47aa47 0x48a9c0 0x60c887 0x60cd23 0x60dd9b 0x64d385 0x64d2eb 0x6f434c 0x6e7d5d 0x5597b6 0x55a283 0x559b3c 0x74e612 0x597abd 0x5985aa 0x45f301
# 0x4285f8 net.runtime_pollWait+0x58 /usr/local/Cellar/go/1.7.5/libexec/src/runtime/netpoll.go:160
# 0x4782d7 net.(*pollDesc).wait+0x37 /usr/local/Cellar/go/1.7.5/libexec/src/net/fd_poll_runtime.go:73
# 0x4783a3 net.(*pollDesc).waitWrite+0x33 /usr/local/Cellar/go/1.7.5/libexec/src/net/fd_poll_runtime.go:82
# 0x47aa46 net.(*netFD).Write+0x346 /usr/local/Cellar/go/1.7.5/libexec/src/net/fd_unix.go:329
# 0x48a9bf net.(*conn).Write+0x6f /usr/local/Cellar/go/1.7.5/libexec/src/net/net.go:185
# 0x60c886 crypto/tls.(*Conn).write+0x196 /usr/local/Cellar/go/1.7.5/libexec/src/crypto/tls/conn.go:819
# 0x60cd22 crypto/tls.(*Conn).writeRecordLocked+0x342 /usr/local/Cellar/go/1.7.5/libexec/src/crypto/tls/conn.go:895
# 0x60dd9a crypto/tls.(*Conn).Write+0x1fa /usr/local/Cellar/go/1.7.5/libexec/src/crypto/tls/conn.go:1047
# 0x64d384 bufio.(*Writer).flush+0x74 /usr/local/Cellar/go/1.7.5/libexec/src/bufio/bufio.go:563
# 0x64d2ea bufio.(*Writer).Flush+0x2a /usr/local/Cellar/go/1.7.5/libexec/src/bufio/bufio.go:552
# 0x6f434b google.golang.org/grpc/transport.(*framer).writeData+0x8b /Users/chengyufeng/go/src/google.golang.org/grpc/transport/http_util.go:408
# 0x6e7d5c google.golang.org/grpc/transport.(*http2Client).Write+0x3ec /Users/chengyufeng/go/src/google.golang.org/grpc/transport/http2_client.go:694
# 0x5597b5 google.golang.org/grpc.sendRequest+0x305 /Users/chengyufeng/go/src/google.golang.org/grpc/call.go:121
# 0x55a282 google.golang.org/grpc.invoke+0x712 /Users/chengyufeng/go/src/google.golang.org/grpc/call.go:244
# 0x559b3b google.golang.org/grpc.Invoke+0x19b /Users/chengyufeng/go/src/google.golang.org/grpc/call.go:143
...
The stack trace of this goroutine stays like that for about 5 minutes. It seems get blocked on socket write.
It seems that framer.writeData calls http2.Framer.WriteData, which will eventually flush some data into the framer's underlying net.Conn. net.Conn.Write cannot be canceled by the context and this gRPC call will block if the underlying net.Conn hangs.
One solution is to queue all frame writes into a channel, and use a goroutine to flush the frames from the channel into the underlying connection. This way, we can do select on the send channel and avoid blocking if the underlying connection hangs.
@MakMukhi please take a look
Hello, is there any other update or workaround to this issue?
This should've been fixed by #1498
On Sep 25, 2017 5:43 PM, "Ashwanth Kumar" notifications@github.com wrote:
Hello, is there any other update or workaround to this issue?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/grpc/grpc-go/issues/1078#issuecomment-332052892, or mute
the thread
https://github.com/notifications/unsubscribe-auth/ATtnR0pBdl2jd4gDaRRCSSwZgfNp2h_Pks5smEiwgaJpZM4MF4Ys
.
What about read connections? I've a stuck connection in readFrame
# 0x4297f6 internal/poll.runtime_pollWait+0x56 /usr/lib/go-1.9/src/runtime/netpoll.go:173
# 0x498f8d internal/poll.(*pollDesc).wait+0xad /usr/lib/go-1.9/src/internal/poll/fd_poll_runtime.go:85
# 0x49900c internal/poll.(*pollDesc).waitRead+0x3c /usr/lib/go-1.9/src/internal/poll/fd_poll_runtime.go:90
# 0x499da9 internal/poll.(*FD).Read+0x189 /usr/lib/go-1.9/src/internal/poll/fd_unix.go:126
# 0x50c7f1 net.(*netFD).Read+0x51 /usr/lib/go-1.9/src/net/fd_unix.go:202
# 0x51ed8c net.(*conn).Read+0x6c /usr/lib/go-1.9/src/net/net.go:176
# 0x540f5a bufio.(*Reader).Read+0x30a /usr/lib/go-1.9/src/bufio/bufio.go:213
# 0x475335 io.ReadAtLeast+0x85 /usr/lib/go-1.9/src/io/io.go:309
# 0x4754a7 io.ReadFull+0x57 /usr/lib/go-1.9/src/io/io.go:327
# 0x78f6ca .../vendor/golang.org/x/net/http2.readFrameHeader+0x7a /root/.go/src/.../vendor/golang.org/x/net/http2/frame.go:237
# 0x790033 .../vendor/golang.org/x/net/http2.(*Framer).ReadFrame+0xa3 /root/.go/src/.../vendor/golang.org/x/net/http2/frame.go:492
# 0x7c1542 .../vendor/google.golang.org/grpc/transport.(*http2Client).reader+0xc2 /root/.go/src/.../vendor/google.golang.org/grpc/transport/http2_client.go:1167
Are you saying that your rpc is stuck on a recv call even after the context has cancelled? This should never happen. Can you provide more details?
The read call that you're seeing in your stack trace is the reader go routine of a client's transport. It sits there waiting for data. But if an error occurred elsewhere and the transport was closed the underlying net.Conn will be closed and this goroutine will see that and return.
If this doesn't answer all your questions, please file a new issue and provide more information: What exactly are you doing? What behavior did you see? What did you expect?
Thanks.
Most helpful comment
@MakMukhi please take a look