We are sending gzip compressed files to GCS.
We are looking to confirm that our comprehension of the cancelation of an object upload is correct.
We are also open to eventually contribute to the project to add some examples to the docs.
I think the documentation regarding cancellation is very limited, and it is also very hard to find the information. This is the case in the godocs and the web documentation.
First things first, here comes an extracted snippet of what we have:
var resolver GzipGCSWriterResolver
// Store stores gzip compressed file into GCS.
func Store(ctx context.Context, r io.Reader, bucket string, path string) (written int64, err error) {
ctx, cancel := context.WithCancel(ctx)
defer cancel()
w := resolver.Resolve(ctx, bucket, path)
written, err = io.Copy(w, r)
if err != nil {
cancel()
w.Close()
return
}
err = w.Close()
return
}
// GzipGCSWriterResolver resolves a writer that writes an object in Google Cloud Storage.
type GzipGCSWriterResolver struct {
client *storage.Client
}
// NewGzipGCSWriterResolver creates a new GzipGCSWriterResolver.
// Returns an error if no GCS client can be created.
func NewGzipGCSWriterResolver(ctx context.Context) (*GzipGCSWriterResolver, error) {
c, err := storage.NewClient(ctx)
if err != nil {
return nil, err
}
return &GzipGCSWriterResolver{
client: c,
}, nil
}
// Resolve resolves a writer that writes an object in Google Cloud Storage.
func (w *GzipGCSWriterResolver) Resolve(ctx context.Context, bucket string, name string) io.WriteCloser {
b := w.client.Bucket(bucket)
o := b.Object(name)
ctx, cancel := context.WithCancel(ctx)
writer := o.NewWriter(ctx)
writer.ContentType = "text/plain"
writer.ContentEncoding = "gzip"
return &gzipGCSWriter{
ctx: ctx,
cancelGCSWrite: cancel,
gcsWriter: writer,
gzipWriter: gzip.NewWriter(writer),
}
}
// Close closes the underlying GCS client.
func (w *GzipGCSWriterResolver) Close() error {
return w.client.Close()
}
type gzipGCSWriter struct {
ctx context.Context
cancelGCSWrite context.CancelFunc
gcsWriter *storage.Writer
gzipWriter io.WriteCloser
}
func (w *gzipGCSWriter) Write(p []byte) (n int, err error) {
return w.gzipWriter.Write(p)
}
func (w *gzipGCSWriter) Close() error {
// When the context is canceled, the GCS writer will abort its job and ensure no object is stored.
// It is important NOT to Close the GCS writer if something went wrong.
// See: https://godoc.org/cloud.google.com/go/storage#Writer.CloseWithError
// By deferring the context cancellation, we ensure it either gets called if an error occurs
// or we release the context after the GCS writer is closed in a success scenario.
defer w.cancelGCSWrite()
err := w.gzipWriter.Close()
if err != nil {
return err
}
err = w.ctx.Err()
if err != nil {
// Avoid closing the GCS writer if it has been already closed.
// It is automatically closed when the context is canceled.
return err
}
return w.gcsWriter.Close()
}
Our understanding is that in order to abort a transmission of a file, we should cancel the context passed to the NewWriter func.
In our case we are wrapping the GCS writer with a gzip compressor.
What we tried to do in the above is only Close the whenever everything else succeeded.
Reason being that it seems that calling Close on an already closed item, or an item that would already be canceled would result in the connection being opened again by calling https://github.com/GoogleCloudPlatform/google-cloud-go/blob/master/storage/writer.go#L81.
The godocs says the following:
It is the caller's responsibility to call Close when writing is done.
I think we should also add: "It is the caller's responsibility to either call Close when writing is done or to cancel the context to abort the transmission and clear the underlying resources."
I feel like our code above is a bit cumbersome and error prone for maintainers to understand.
Is there something we didn't get right or something that is redundant and not needed?
Ideally we are looking for a way that the user of the GzipGCSWriterResolver wouldn't have to worry about the behaviour of the storage.Writer.
However here we have an impedance mismatch between the idiom of always closing a classic writer to free resources and the storage writer that needs not to be closed in case of error.
I hope this all makes sense and that you would be able to help make this implementation a bit more robust.
I'd be happy to be pointed to an good implementation of a gzip writer with correct handling of errors, we've been unable to find a single example that properly leveraged the cancelation and that was gzipping.
I think we should also add: "It is the caller's responsibility to either call Close when writing is done or to cancel the context to abort the transmission and clear the underlying resources."
What do you mean by "clear the underlying resources"? I believe canceling the context is sufficient. Close() does not do any cleanup really; it just makes sure subsequent calls panic.
I would agree that changing our documentation from "It is the caller's responsibility to call Close when writing is done." to "It is the caller's responsibility to call Close or to cancel the context when writing is done." seems reasonable, though.
+ @jba in case I got any part of that wrong.
I believe canceling the context is sufficient. Close() does not do any cleanup really
If it doesn't do any cleanup, then I'm fine with It is the caller's responsibility to call Close or to cancel the context when writing is done.
@jadekler, your phrasing suggests it's okay to cancel the context on a successful write ("writing is done" doesn't distinguish between the two). Maybe
It is the caller's responsibility to call Close after writing completes successfully. To stop writing without saving the data, cancel the context.
Does that work?
it's okay to cancel the context on a successful write
That is fine right? as long as the close has been called before?
Oh, good distinction! https://code-review.googlesource.com/c/gocloud/+/32790
Most helpful comment
@jadekler, your phrasing suggests it's okay to cancel the context on a successful write ("writing is done" doesn't distinguish between the two). Maybe
Does that work?