go version)?$ go version go version go1.14.2 darwin/amd64
Yes
A hang was reported between a gRPC client (grpc-go v1.27.0) hitting a gRPC server in one of our production environments. The client and server are both running on the same host. I captured a core dump of the client and server code to analyze with delve. I noticed that the google.golang.org/grpc/internal/transport.loopyWriter.cbuf.sendQuota was 0 in the client code, which indicates that the client's connection-level send window had run out and was at 0. In the server's core dump, I tracked down the corresponding http2.serverConn and noticed that it's serverConn.inflow.n was set to 0 too. I then tracked down the two places in http2/server.go that call inflow.take and noticed what I believe is the issue in processData:
func (sc *serverConn) processData(f *DataFrame) error {
...
if f.Length > 0 {
// Check whether the client has flow control quota.
if st.inflow.available() < int32(f.Length) {
return streamError(id, ErrCodeFlowControl)
}
st.inflow.take(int32(f.Length))
if len(data) > 0 {
wrote, err := st.body.Write(data)
if err != nil {
return streamError(id, ErrCodeStreamClosed)
}
if wrote != len(data) {
panic("internal error: bad Writer")
}
st.bodyBytes += int64(len(data))
}
// Return any padded flow control now, since we won't
// refund it later on body reads.
if pad := int32(f.Length) - int32(len(data)); pad > 0 {
sc.sendWindowUpdate32(nil, pad)
sc.sendWindowUpdate32(st, pad)
}
}
...
In this code, st.inflow.take is called, but if st.body.Write returns an error then the flow control is not refunded to the client since the code bails and returns a streamError (nor is it added to the st.body's pipeBuffer since pipe.Write returns immediately if it has an error to return).
Side note: st.body.Write may return an error if st.body.Close is called. The server which had this issue is using grpc-go's serverHandlerTransport which does, in fact, call req.Body.Close (see here). A gRPC bi-directional streaming endpoint is running between the client and server, and what i suspect is happening is the client is sending the server data over the bi-di stream while an error happens in the gRPC server that causes the request to end, and therefore req.Body.Close to be called while data is in flight.
Here's what I think a possible fix to net/http2 could look like:
diff --git a/http2/server.go b/http2/server.go
index 01f4ecc..ba3ebd1 100644
--- a/http2/server.go
+++ b/http2/server.go
@@ -1650,6 +1650,7 @@ func (sc *serverConn) processData(f *DataFrame) error {
if len(data) > 0 {
wrote, err := st.body.Write(data)
if err != nil {
+ sc.sendWindowUpdate32(nil, int32(f.Length))
return streamError(id, ErrCodeStreamClosed)
}
if wrote != len(data) {
cc @fraenkel
Change https://golang.org/cl/245158 mentions this issue: net/http2: send WINDOW_UPDATE on a body's write failure
/cc @bradfitz @tombergan
Thanks for looking into this so quickly! Once the fix is merged, is there any way we could get this fix backported to 1.14 (or make it's way into a 1.15 release)?
Hello, i'm suspecting that I might have run into this issue when talking to GCS using the google-cloud-go package. It's a service that more or less proxies HTTP to GCS and under quite high load it usually stops talking to GCS after ~15min.
When running with GODEBUG=http2debug=2 this is what is see just before it stops doing requests:
...
2020/08/05 15:13:06 http2: Framer 0xc0000c8460: wrote HEADERS flags=END_STREAM|END_HEADERS stream=3067 len=121
2020/08/05 15:13:11 http2: Framer 0xc0000c8460: wrote WINDOW_UPDATE stream=3061 len=4 incr=5120
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=21 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=21 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=23 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=23 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=2725 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=2725 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=2971 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=2971 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=3037 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=3037 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=3053 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=3053 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=3067 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=3067 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=3061 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=3061 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=2717 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=2717 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Framer 0xc0000c8460: read RST_STREAM stream=547 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport received RST_STREAM stream=547 len=4 ErrCode=INTERNAL_ERROR
2020/08/05 15:19:06 http2: Transport encoding header ":authority" = "storage.googleapis.com"
2020/08/05 15:19:06 http2: Transport encoding header ":method" = "GET"
2020/08/05 15:19:06 RoundTrip failure: stream error: stream ID 21; INTERNAL_ERROR
...
Or is this something else or i'm doing something wrong?
cc @bcmills @neild
Heya, just checking in post-1.15 release to see if we can get the CR above merged? I wouldn't be surprised if @wadells' issue above is the result of this bug, and I suspect it wouldn't be surprised if this bug is affecting several other users (basically anyone that uses the gRPC http.Handler server).
cc @menghanl
Let me know if there something i can help with, try to create a small reproduction case etc?
Does anyone know if this is regression? is it an issue with specific combination of client and server implementation and version?
@wader it's super easy to test if the patch fixes your issue - you can just apply it to a fork of https://github.com/golang/net, and then use go module's replace statement to depend upon your fork. If you can no longer repro the issue then this is the root cause!
@jared2501 Thanks, good idea. Tried with patched net package (hope i patched it correctly, seems c is now sc) but get similar errors.
...
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=63 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=63 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=93 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=93 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=73 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=73 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=77 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=77 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=87 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=87 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=79 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=79 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=95 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=95 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=91 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=91 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3449 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=3449 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3525 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=3525 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=97 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=97 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=99 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=99 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 RoundTrip failure: stream error: stream ID 77; INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3437 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=3437 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3439 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=3439 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3519 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=3519 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3527 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 http2: Transport received RST_STREAM stream=3527 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:03:25 RoundTrip failure: stream error: stream ID 73; INTERNAL_ERROR
2020/08/25 17:03:25 RoundTrip failure: stream error: stream ID 93; INTERNAL_ERROR
...
Patch with HEAD at c89045814202410a2d67ec20ecf177ec77ceae7f
diff --git a/http2/server.go b/http2/server.go
index 345b7cd..8f4d17a 100644
--- a/http2/server.go
+++ b/http2/server.go
@@ -1694,6 +1694,7 @@ func (sc *serverConn) processData(f *DataFrame) error {
if len(data) > 0 {
wrote, err := st.body.Write(data)
if err != nil {
+ sc.sendWindowUpdate32(nil, int32(f.Length))
return streamError(id, ErrCodeStreamClosed)
}
if wrote != len(data) {
go.mod
module ...
go 1.14
require (
cloud.google.com/go v0.62.0 // indirect
gocloud.dev v0.20.0
)
replace golang.org/x/net => /Users/wader/src/net
Left it running for while after the error and it seems to be able to service some request after a while. Not sure why it's exactly 10 minutes between cancel and new errors. Could be some timeout duration in the client using the service also i guess?
2020/08/25 17:29:29 http2: Framer 0xc0002fe1c0: wrote RST_STREAM stream=3843 len=4 ErrCode=CANCEL
2020/08/25 17:35:29 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3901 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Transport received RST_STREAM stream=3901 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3905 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Transport received RST_STREAM stream=3905 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3903 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Transport received RST_STREAM stream=3903 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3915 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Transport received RST_STREAM stream=3915 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3907 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Transport received RST_STREAM stream=3907 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3913 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Transport received RST_STREAM stream=3913 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3911 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Transport received RST_STREAM stream=3911 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3909 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Transport received RST_STREAM stream=3909 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3917 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Transport received RST_STREAM stream=3917 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Framer 0xc0002fe1c0: read RST_STREAM stream=3867 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 http2: Transport received RST_STREAM stream=3867 len=4 ErrCode=INTERNAL_ERROR
2020/08/25 17:35:29 RoundTrip failure: stream error: stream ID 3903; INTERNAL_ERROR
2020/08/25 17:35:29 RoundTrip failure: stream error: stream ID 3901; INTERNAL_ERROR
2020/08/25 17:35:29 RoundTrip failure: stream error: stream ID 3913; INTERNAL_ERROR
Got ya, yeah sounds like it's not the issue then. FWIW, I've applied the linked patch to our production environments and have seen no further hangs. Before the patch, we were getting hangs once a day.
@networkimprov @cagedmantis @fraenkel - sorry for the ping here, but any way we could find a person to CR https://go-review.googlesource.com/c/net/+/245158/? The fix and test look simple enough to me, and I can confirm this fix works in our prod env.
cc @ianlancetaylor
@jared2501 Ok, i get those error after 30min-1h. Are these kind of error "normal" to get if you run into throttling/quotas etc?
@andybons @ianlancetaylor @rsc - just a quick ping on this. We've applied the above patch to our production environment and seen it fix the above issues. Would it be possible to get https://go-review.googlesource.com/c/net/+/245158/ merged and back ported onto 1.15 and/or 1.14?
@gopherbot Please open backport issues.
This problem can reportedly cause an HTTP/2 connection to hang. There doesn't seem to be any reasonable workaround.
Backport issue(s) opened: #41386 (for 1.14), #41387 (for 1.15).
Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
@ianlancetaylor - any chance we can find someone else to +1 the CR?
cc @odeke-em as possible reviewer
Thank you for the ping @networkimprov :) I've provided a review after Ian's and should be ready to roll out soon, and then we'll need to pull it into net/http/h2_bundle.go
Woo, thank you! Can't wait to get this in the next 1.14/1.15 hot fixes!
Btw @networkimprov - does anything have to be done to get the cherry picks made, or do those get picked up automatically?
If no changes are required for the 1.14 & 1.15 branches, I think the release folks can cherrypick the CL. You might want to ping the CL author on the two backport issues to ask about it.
Change https://golang.org/cl/258359 mentions this issue: src/go.mod, net/http: update bundled and latest golang.org/x/net
Thank you @jared2501 and @networkimprov for raising it!
So this is a tricky one to backport given that it is from x/net/http2 that then needs to be bundled into net/http/h2_bundle.go, so to begin with:
1.) I have mailed out CL 258359 firstly to net/http/h2_bundle.go
2.) After that merge, I'll make backports to Go1.14 and Go1.15
and after that merge, we'll be good to go.
However, please note that those issues are still "CherryPickCandidate" labelled, thus will only be released if approved and when the releases are being cut.
Change https://golang.org/cl/258478 mentions this issue: [release-branch.go1.15] net/http2: send WINDOW_UPDATE on a body's write failure
Change https://golang.org/cl/258497 mentions this issue: [release-branch.go1.14] net/http2: send WINDOW_UPDATE on a body's write failure
Change https://golang.org/cl/258537 mentions this issue: [release-branch.go1.14] net/http: backport HTTP2 send WINDOW_UPDATE on a body's write failure
Change https://golang.org/cl/258538 mentions this issue: [release-branch.go1.14] net/http: backport HTTP2 send WINDOW_UPDATE on a body's write failure
Change https://golang.org/cl/258540 mentions this issue: [release-branch.go1.15] net/http: backport HTTP2 send WINDOW_UPDATE on a body's write failure
Awesome, thank you @odeke-em & @networkimprov for your quick responses and effort pushing this through!
Change https://golang.org/cl/261919 mentions this issue: [release-branch.go1.15] src, net/http: update vendor, h2_bundle.go regeneration
Change https://golang.org/cl/264058 mentions this issue: [release-branch.go1.15-bundle] net/http2: send WINDOW_UPDATE on a body's write failure