Go: x/net/http2: misbehaved streams can cause connections to exhaust flow control

Created on 15 Oct 2018  Â·  16Comments  Â·  Source: golang/go

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go1.11 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jnewman/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/jnewman/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_t/hg9f_j4x1q743pqh00kdv6m00000gn/T/go-build803039253=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do? / What did you expect to see? / What did you see instead?

Use httputil.ReverseProxy with an http2.Transport. I'm still trying to find a minimal reproducible example.

Basically, the http2.Transport is closing an HTTP/2 stream and then sending a number of DATA frames on the connection. The http2.Server is detecting that the stream has been half-closed by the remote and is following RFC7540 and responding with an ErrCodeStreamClosed (see http2/server.go).

However, since the http2.Server does not send a window update the remote http2.Transport does not add back the sent DATA frames to its flow control. This causes a re-used http2.Transport to eventually expend all connection flow control and halt sending.

A proposed fix that I've applied to my fork is to make the following patch

diff --git a/http2/server.go b/http2/server.go
index 56859d1..7a39456 100644
--- a/http2/server.go
+++ b/http2/server.go
@@ -1586,6 +1586,8 @@ func (sc *serverConn) processData(f *DataFrame) error {
        // "open" or "half-closed (local)" state, the recipient MUST respond with a
        // stream error (Section 5.4.2) of type STREAM_CLOSED.
        if state == stateClosed {
+               sc.inflow.take(int32(f.Length))
+               sc.sendWindowUpdate(nil, int(f.Length))
                return streamError(id, ErrCodeStreamClosed)
        }
        if st == nil || state != stateOpen || st.gotTrailerHeader || st.resetQueued {

Although, one could argue that http2.Transport/httputil.Reverse proxy should be fixed to make sure that data frames aren't sent after a close.

NeedsInvestigation

All 16 comments

I think this might be related to https://github.com/golang/go/commit/8f38f28222abccc505b9a1992deecfe3e2cb85de#diff-d863507a61be206d112f6e00e6d812a2 since it appears that the http.Server marks the stream as stateClosed if the handler panics, even though the remote side hasn't sent a RESET_STREAM frame. Therefore, the stream should be in "half-closed (local)" and so should send back WINDOW_UPDATE frames but it's not.

/cc @bradfitz

Can you capture the logs (from setting GODEBUG=http2debug=2) while this happens?

I can capture some debug logs, I was using then to debug this and had to
add my own logging to capture the flow control values. Would you like me to
keep my log statements in there too?

On Tue, Oct 16, 2018, 7:58 PM Brad Fitzpatrick notifications@github.com
wrote:

Can you capture the logs (from setting GODEBUG=http2debug=2) while this
happens?

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
https://github.com/golang/go/issues/28204#issuecomment-430472009, or mute
the thread
https://github.com/notifications/unsubscribe-auth/ACu-MlsmdhrhqhK-x99LEqbcpPVLjVLHks5ulpzhgaJpZM4XbrWd
.

We ran into the same issue, and I've put together a small repro using the gRPC client, and a gRPC server hosted via the x/net HTTP/2 server:
https://github.com/prashantv/grpc-http2-repro

This example doesn't contain use the reverse proxy, and the issue is exactly as @jared2501 pointed out -- when the server sends RST_STREAM with stream closed, it's not updating flow control, and so the client stops sending data.

When this test is run, the client reports a "deadline exceeded" (since it's waiting to send data, but never gets to it since the window is full):

--- FAIL: TestGRPCLargePayload (1.02s)
        require.go:794:
                        Error Trace:    repro_test.go:58
                                                                repro_test.go:59
                        Error:          Received unexpected error:
                                        rpc error: code = DeadlineExceeded desc = context deadline exceeded
                        Test:           TestGRPCLargePayload
                        Messages:       Expect following requests to succeed

I have the GODEBUG=http2debug=2 logs:
https://gist.github.com/prashantv/e181206b39b030ffea44b05f16596ac1

The client doesn't stop sending frames, and so the server keeps responding with:
RST_STREAM stream=1 len=4 ErrCode=STREAM_CLOSED. However, there's no corresponding WINDOW_UPDATE (which it did before the stream was closed).

This seems like a regression introduced in https://github.com/golang/net/commit/039a4258aec0ad3c79b905677cceeab13b296a77 -- before that change, the code would send a window update.

I also have the "fix" (update flow-control) on a branch called "fix", where the test mostly passes. However, around 1-2% of the time, it still fails with the same error -- I think there might be other paths where the server sends RST_STREAM without taking flow control into account.

Hey @prashantv - glad to see you can repro my issue! I looked over your example and it's possible you're also being hit by https://github.com/golang/go/issues/28634. Although, even with patches for this issue and #28634, I was still seeing some flow control issues when using grpc with a context with a timeout...

@prashantv You are correct. I did introduce a bug, but there is another one there too. Once we receive a data frame that does not cause a connection error, we must account for the data. The take() logic should be before all the additional state checks regardless of the path taken.

I believe there may also be a similar bug on the client side.

Change https://golang.org/cl/153977 mentions this issue: http2: Revert a closed stream cannot receive data

@bcmills @fraenkel - quick ping on this issue, the http/2 flow control stalls are still biting us a fair bit. (Sorry, I'm sure you guys probably have a lot of other stuff going on with go 1.12 release date upcoming)

Change https://golang.org/cl/159179 mentions this issue: net/http: update bundled http2

You don't need a misbehaved stream to trigger this. If the server doesn't read the request body, flow control gets exhausted.

Here's a client/server where the server ignores the body on the first few POST (it just WriteHeader). The client blocks because it's available() flow is 0.

It's broken (server-side) from 1.11.0 through 1.12beta2, and fixed in 1.12rc1 by https://github.com/golang/net/commit/ed066c81e75eba56dd9bd2139ade88125b855585 . We see this in production when the client is Chrome, as well as Go, so I don't think it's a misbehaving client.

Repro unit test: https://play.golang.org/p/EOUShlw2SNk (replace cert.pem, cert.key, and myhost).

Client

package main

import (
    "bytes"
    "log"
    "net/http"
)

func main() {
    for {
        body := bytes.Repeat([]byte{0}, 128*1024*1024)
        _, err := http.Post("https://myhost:8443", "", bytes.NewReader(body))
        if err != nil {
            log.Fatal(err)
        }
    }
}

Server

package main

import (
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

func main() {
    var i int
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Println(i)
        i++
        if i <= 5 {
            w.WriteHeader(http.StatusAccepted)
            return
        }
        if _, err := ioutil.ReadAll(r.Body); err != nil {
            log.Fatal(err)
        }
        w.Write([]byte("OK"))
    })
    log.Fatal(http.ListenAndServeTLS(":8443","cert.pem","cert.key",nil))
}

If the above doesn't reproduce it for you try increasing if i <= 5 to 10 or 20.

To make this easier for others to find, could the issue's title be updated to something like "Not reading request body can cause connections to exhaust flow control"?

To make this easier for others to find, could the issue's title be updated to something like "Not reading request body can cause connections to exhaust flow control"?

The title should make it clear whether this is a Server or Transport issue. Which is it? I see discussion of both in here. (I haven't looked into this yet.)

I don't know enough to say if it's Server or Transport, but I guess Server. It's an accounting error in http2serverConn.flow, in the connection-level flow control.

What I think is happening is that when the HTTP handler returns without reading the body (say it returns a 404), any DATA frames in flight are counted towards the connection-level flow control by the client, but not by the server, so they get out of sync.

Once the HTTP handler returns the server closes the stream like this:

http2: Framer 0xc00013c1c0: wrote HEADERS flags=END_STREAM|END_HEADERS stream=5 len=26
http2: Framer 0xc00013c1c0: wrote RST_STREAM stream=5 len=4 ErrCode=NO_ERROR

The in-flight DATA frames are considered invalid:

http2: Framer 0xc00013c1c0: wrote RST_STREAM stream=5 len=4 ErrCode=STREAM_CLOSED

If I print out the flow control values (http2flow.n) at the connection level, and use the client/server from my previous comment, when it blocks the server has 1,073,807,359 bytes but the client has 0.

This is the same as @prashantv described above.

They added a clarification to the spec for this:

A receiver that receives a flow-controlled frame MUST always account
for its contribution against the connection flow-control window,
unless the receiver treats this as a connection error
(Section 5.4.1). This is necessary even if the frame is in error.
The sender counts the frame toward the flow-control window, but if
the receiver does not, the flow-control window at the sender and
receiver can become different.

In http2 the flow control values are always relative, which makes problems like this very difficult to catch (it survived all of Go 1.11.X). Personally I think the inability to re-sync flow control window sizes between client and server makes the protocol brittle.

I started a discussion about that on the IETF HTTP Working Group's mailing list, here https://lists.w3.org/Archives/Public/ietf-http-wg/2019JanMar/0155.html .

The emerging consensus seems to be that the protocol is fine but better debugging support would be welcome.

Heya, I was looking at the latest http2 server code and this looks to be fixed here. Is this issue worth closing?

The issue reported was fixed.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

natefinch picture natefinch  Â·  3Comments

stub42 picture stub42  Â·  3Comments

ashb picture ashb  Â·  3Comments

longzhizhi picture longzhizhi  Â·  3Comments

myitcv picture myitcv  Â·  3Comments