Google-cloud-go: Canceled GCS Multipart upload leaks HTTP2 streams and leads to REFUSED_STREAM errors

Created on 6 Nov 2018 · 7 comments · Source: googleapis/google-cloud-go

Client

Google Cloud Storage

Describe Your Environment

The SDK is used in HashiCorp Vault, running on Ubuntu 18.04 on GCE.

Expected Behavior

When the context passed to ObjectHandle.NewWriter(ctx) is canceled, all resources associated with the request should be cleaned up, and the cancellation should not affect subsequent operations.

Actual Behavior

I have a minimal reproduction test case here: https://github.com/KJTsanaktsidis/refused_stream_repro

The reproduction tries to upload lots of files to GCS with ObjectHandle.NewWriter(ctx), and cancels some of the contexts at random times. Eventually, after running for a few minutes, all uploads start returning the following error:

Post https://www.googleapis.com/upload/storage/v1/b/kjs_cool_vault_bucket/o?alt=json&prettyPrint=false&projection=full&uploadType=multipart: stream error: stream ID 45457; REFUSED_STREAM

I used a packet capture to debug the communication between Vault and GCS, and I found that the SDK would often:

  • Create a new HTTP2 stream to perform the multipart upload
  • Send a HEADERS frame for POST /upload/storage/v1/b/vaultgcs_backend_us1_staging/o?alt=json&prettyPrint=false&projection=full&uploadType=multipart
  • Never send a subsequent DATA frame for the request body, and
  • Never send a subsequent RST_STREAM frame to clean up the stream

So, eventually, the GCS server starts answering every new upload attempt with REFUSED_STREAM, because the number of abandoned upload streams has reached the server's HTTP/2 SETTINGS_MAX_CONCURRENT_STREAMS value of 100.

This issue seems to be the cause of https://github.com/hashicorp/vault/issues/5419


All 7 comments

This is likely related to #753, as well.

Another related issue: golang/go#20985.

Thanks for the reproduction @KJTsanaktsidis. It's been very helpful.

Meanwhile, here's another related issue: golang/go#27208.

Yup - https://github.com/golang/go/issues/27208 is exactly it!

I tried my reproduction using golang.org/x/net with the PR provided in that issue, https://github.com/golang/net/pull/18 (see the working_version branch in my repo). This seems to have fixed the problem: I no longer get REFUSED_STREAM errors!

So I think this isn't a bug in google-cloud-go at all, and it should go away in Go 1.12?

Well done!

Provided the patch lands in Go 1.12, yes, the problem should go away. I'll ping the Go issue.

Awesome - thanks heaps for your help in joining these dots!

Since this is an issue outside this repo, I'm going to close it.

Thanks again for all your help getting to the bottom of this.
