cli.Bucket(bucketName).Object(object).NewReader(ctx)
go version go1.11.4 linux/amd64
machines: Ubuntu 18.10(bare metal), CentOS Linux release 7.6.1810(GCE)
[[projects]]
name = "cloud.google.com/go"
...
revision = "458e1f376a2b44413160b5d301183b65debaa3f6"
version = "v0.37.2"
[[projects]]
name = "google.golang.org/api"
...
revision = "bce707a4d0ea3488942724b3bcc1c8338f38f991"
version = "v0.3.0"
Start several parallel goroutines which try to call NewReader on object and cancel the context afterwards, some of goroutines share the same context, so when one is to cancel the context another handles the context canceled error and process continues the execution.
Detailed code to this behavior is represented here
No errors to appear during the whole period of test execution
After several minutes of execution NewReader fails with write on closed buffer error
2019/04/04 17:13:09 send err: error on get reader: Get https://storage.googleapis.com/chew-bucket/0_0: write on closed buffer
2019/04/04 17:13:09 send err: error on get reader: Get https://storage.googleapis.com/chew-bucket/0_0: write on closed buffer
2019/04/04 17:13:09 send err: error on get reader: Get https://storage.googleapis.com/chew-bucket/0_0: write on closed buffer
2019/04/04 17:13:09 send err: error on get reader: Get https://storage.googleapis.com/chew-bucket/0_0: write on closed buffer
2019/04/04 17:13:09 send err: error on get reader: Get https://storage.googleapis.com/chew-bucket/0_0: write on closed buffer
2019/04/04 17:13:09 send err: error on get reader: Get https://storage.googleapis.com/chew-bucket/0_0: write on closed buffer
2019/04/04 17:13:09 send err: error on get reader: Get https://storage.googleapis.com/chew-bucket/0_0: write on closed buffer
2019/04/04 17:13:09 send err: error on get reader: Get https://storage.googleapis.com/chew-bucket/0_0: write on closed buffer
2019/04/04 17:13:09 send err: error on get reader: Get https://storage.googleapis.com/chew-bucket/0_0: write on closed buffer
--- FAIL: TestG (60.31s)
tst1_test.go:79: error on get reader: Get https://storage.googleapis.com/chew-bucket/0_0: write on closed buffer
FAIL
Hello @lwsanty, thank you for the report and welcome to the Googleapis Go cloud project!
So I am curious, how come you'd like a download that has been cancelled explicitly (the actual request has been cancelled by you invoking cancel) to not return a single error?
@odeke-em thank you for your reply!
I probably was a bit unclear in my description of the issue, because in code I am ignoring context canceled errors and I'm expecting them to appear. On the other hand write on closed buffer is an error that confuses me
update
Also, I commented the cancel function call on line 127 of my example and reproduced this issue after 13 minutes.
So basically, I suspect that NewReader starts to fail with write on closed buffer after several minutes of concurrent calls...
Gotcha! My apologies for the late reply, there was a work freeze for almost 2 weeks but now am back.
Running that repro for a while on my Macbook Pro doesn't reproduce that error. Also would be nice not to have it as a test since there is a limit to which tests are stopped after running for a long time.
The write on closed buffer is definitely an x/net/http2 error. We just have to dig up how to reproduce it and perhaps.
If you can reliably reproduce this error, we can do HTTP/2 debugging when running it
GODEBUG=http2debug=2 go run main.go
Hi @odeke-em !
Sorry for being silent in a while.
I tried to reproduce my case with GODEBUG=http2debug=2 but gave up after several hours of execution.
However I still can reproduce the issue without this env
Hello @odeke-em
I want to explain to you where did the issue come from. We run n downloads from google storages and wait for m fastest results. After receiving the results, we cancel other requests by closing context. I think there is nothing illegal in this approach. However, we came across strange behaviors when write on closed buffer error appears or requests hang. It seems there is a race that makes a client unusable. Trying to minimize the reproduction we came up with this simplified version. Just tell us if you need additional information.
Hello @lwsanty and @chessman, I now get what y'all meant.
In deed, it is allowed to cancel requests and the error returned is context.Canceled, which you are handling or can be handled with the direct comparison. Thanks for articulating that problem patiently.
I've spent this afternoon trying to run scenarios to try to reproduce that "write on closed buffer" error but still in vain on my Macbook Pro: I have ran your provided tests at least 3 times, with max test timeouts of 50minutes but no dice for hours so unfortunately I still can't reproduce that error
GOOGLE_APPLICATION_CREDENTIALS=creds.json GO111MODULE=off go test -timeout=50m -v
The error is definitely produced by a faulty condition and reported by the x/net/http2 stack https://github.com/golang/go/blob/8b058cfbcec5dc3002ddd6cedc7070ff946b2257/src/net/http/h2_bundle.go#L3528

A data race could be plausible, but the only similar bug I could find reported in https://github.com/golang/go/issues/16974 which was fixed way before Go.1.11 with https://github.com/golang/net/commit/6c4ac8bdbf06a105c4baf3dcda28bd9b0fb15588
Let's try to make this but not using a *testing.T test.
Perhaps this can help: below I have made a mock GCS HTTP/2 server that'll serve content back after you've created a storage.NewReader.
Perhaps y'all can tweak the requests sequence to help with a minimal repro of the error?
package main
import (
"context"
"fmt"
"io"
"log"
"net/http"
"net/http/httptest"
"net/http/httputil"
"net/url"
"os"
"sync"
"time"
"golang.org/x/net/http2"
"cloud.google.com/go/storage"
"google.golang.org/api/option"
)
type withFixedHost struct {
overrideHost string
tr http.RoundTripper
}
func (wf *withFixedHost) RoundTrip(req *http.Request) (*http.Response, error) {
// Override the host.
req.URL.Host = wf.overrideHost
return wf.tr.RoundTrip(req)
}
func main() {
cst := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
fl := w.(http.Flusher)
cln := w.(http.CloseNotifier)
// Simulate a slow read
<-time.After(111 * time.Millisecond)
blob, _ := httputil.DumpRequest(r, false)
log.Printf("\033[33mRequest in:\n%s\033[00m\n\n", blob)
path := r.URL.Path
switch {
case path == "/b": // Bucket creation/retrieval.
w.Write([]byte(`{"kind":"storage#bucket","id":"bucket","name":"bucket"}`))
return
default:
for {
fmt.Fprintf(w, "%q:: Time right now: %s\n", path, time.Now().UTC())
fl.Flush()
select {
case <-time.After(500 * time.Millisecond):
// Just the normal pause, till the next body-line send.
case <-cln.CloseNotify():
// They've gone away, goodbye!
return
}
}
return
}
}))
http2.ConfigureServer(cst.Config, new(http2.Server))
cst.TLS = cst.Config.TLSConfig
cst.StartTLS()
defer cst.Close()
ux, _ := url.Parse(cst.URL)
mockServerHost := ux.Host
tr := &http.Transport{TLSClientConfig: cst.Config.TLSConfig}
if err := http2.ConfigureTransport(tr); err != nil {
log.Fatalf("Failed to configure http2 transport: %v", err)
}
tr.TLSClientConfig.InsecureSkipVerify = true
client := &http.Client{
Transport: &withFixedHost{
overrideHost: mockServerHost,
tr: tr,
},
}
sc, err := storage.NewClient(context.Background(), option.WithHTTPClient(client))
if err != nil {
log.Fatalf("Failed to create gcs client: %v", err)
}
defer sc.Close()
ctx, cancel := context.WithCancel(context.Background())
go func() {
<-time.After(1800 * time.Millisecond)
cancel()
defer println("Invoked cancel()")
}()
var wg sync.WaitGroup
defer wg.Wait()
nWorkers := 20
semaphore := make(chan struct{}, nWorkers)
for i := 0; i < 1e7; i++ {
semaphore <- struct{}{}
wg.Add(1)
go func() {
defer func() {
time.Sleep(313 * time.Millisecond)
<-semaphore
wg.Done()
}()
r, err := sc.Bucket("bucket").Object("object").NewReader(ctx)
if err != nil {
log.Printf("Failed to create GCS reader: %v", err)
return
}
n, err := io.Copy(os.Stdout, r)
log.Printf("n: %d err: %T %v\n\n", n, err, err)
defer r.Close()
}()
}
return
}
Thank you!
Kindly pinging y'all @lwsanty @chessman!
@odeke-em Sorry, I'm not working on this project anymore so I don't have access to Compute Engine. But the idea was to run it on the default Linux configuration (CentOS or Ubuntu). Maybe @tnamoradze can help you.
I believe this error is most likely to to the fact you are not reading the entire body of the request, or the Reader in this case. If you are able to provide a reproducer that can more consistently trigger this issue we might be able to help pinpoint the issue. Closing as I believe this is the cause and there seems to be no more follow-up from the reporters. Let me know though if you are still experiencing this issue and can provide a more consistent reproducer.