All documentation points to the following being the best practice for uploading a large file:
String bucketName = "my_unique_bucket";
String blobName = "my_blob_name";
BlobId blobId = BlobId.of(bucketName, blobName);
InputStream inputStream = new FileInputStream(new File("largefile.zip"));
BlobInfo blobInfo = BlobInfo.builder(blobId).contentType("application/octet-stream").build();
byte[] buffer = new byte[1024]; // buffer size is arbitrary
int limit;
// try-with-resources calls writer.close() on exit, even after an exception
try (WriteChannel writer = storage.writer(blobInfo)) {
  try {
    while ((limit = inputStream.read(buffer)) >= 0) {
      writer.write(ByteBuffer.wrap(buffer, 0, limit));
    }
  } catch (Exception ex) {
    // handle exception
  }
}
The question is how do we fail an upload if an exception is thrown midway?
WriteChannel finalizes the upload when close() is called. To avoid finalizing the upload when an exception is thrown, you can call close() explicitly instead of using try-with-resources. Roughly, something like:
WriteChannel writer = storage.writer(blobInfo);
byte[] buffer = new byte[1024]; // buffer size is arbitrary
int limit;
boolean uploadFailed = false;
try {
  while ((limit = inputStream.read(buffer)) >= 0) {
    writer.write(ByteBuffer.wrap(buffer, 0, limit));
  }
} catch (Exception ex) {
  uploadFailed = true;
  // handle exception
}
// only finalize the upload if every chunk was written
if (!uploadFailed) {
  writer.close();
}
We might consider adding an abort() method to WriteChannel, to be called inside a catch block so that a subsequent close() does not finalize the upload. /cc @aozarov
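To make the idea concrete, here is a rough sketch of how such an abort() could behave, written as a wrapper around any java.nio WritableByteChannel (which WriteChannel extends). AbortableChannel is a hypothetical name and is not part of gcloud-java; it only shows the "close() becomes a no-op after abort()" behaviour and does nothing about the underlying upload session.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical wrapper (not part of gcloud-java): close() only reaches the
 * wrapped channel when abort() has not been called.
 */
public class AbortableChannel implements WritableByteChannel {
    private final WritableByteChannel delegate;
    private boolean aborted;
    private boolean closed;

    public AbortableChannel(WritableByteChannel delegate) {
        this.delegate = delegate;
    }

    /** Marks the upload as failed; a later close() will skip the delegate. */
    public void abort() {
        aborted = true;
    }

    @Override
    public int write(ByteBuffer src) throws IOException {
        if (aborted || closed) {
            throw new ClosedChannelException();
        }
        return delegate.write(src);
    }

    @Override
    public boolean isOpen() {
        return !aborted && !closed && delegate.isOpen();
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (!aborted) {
            // For a WriteChannel, this call is what finalizes the upload.
            delegate.close();
        }
    }
}
```

With a wrapper like this the try-with-resources form can stay: wrap storage.writer(blobInfo) in new AbortableChannel(...) and call abort() in the inner catch block, which runs before the implicit close().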
Thanks!
What about a GC finalizer that would close the writer if it is not explicitly cancelled?
@mziccard After some internal discussion, we are still not convinced this would fail the upload, because the writer might get closed and flushed anyway by a finalizer at some later point.
Anybody know if the above statement holds true?
After some internal discussion, we are still not convinced this would fail the upload, because the writer might get closed and flushed anyway by a finalizer at some later point.
I am not aware of any way to "close/abort" a resumable upload session, without actually finalizing the upload. My understanding is that the resumable upload session will expire after a while, if the upload is not finalized. @Capstan is surely more informed than I am (have to ping you directly as I see you are not yet part of the @GoogleCloudPlatform/cloud-storage team).
FYI @stephenplusplus
Edited for a _much_ simpler answer.
Use the DELETE verb on the upload URL. The response to this request, and to all further attempts to upload or query status, will be:
499 Client Closed Request
Content-Type: application/json; charset=UTF-8
{
"error": {
"errors": [
{
"domain": "global",
"reason": "clientClosedRequest",
"message": "clientClosedRequest"
}
],
"code": 499,
"message": "clientClosedRequest"
}
}
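A minimal sketch of what that request could look like from Java, using only java.net.HttpURLConnection. The class and method names are hypothetical, and sessionUri is assumed to be the resumable session URL returned when the upload was started. Any authorization headers a real request might need are left out.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper; not part of gcloud-java. */
public class ResumableUploadCanceller {
    /** Status code a cancelled session is described as returning. */
    static final int CLIENT_CLOSED_REQUEST = 499;

    /**
     * Sends DELETE to the resumable session URL and returns the HTTP status.
     * sessionUri is an assumption: the upload URL handed out at session start.
     */
    static int cancel(String sessionUri) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(sessionUri).toURL().openConnection();
        try {
            conn.setRequestMethod("DELETE");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** True when the status matches the "Client Closed Request" response. */
    static boolean isCancelled(int status) {
        return status == CLIENT_CLOSED_REQUEST;
    }
}
```

A caller would treat isCancelled(cancel(sessionUri)) as confirmation that the session is gone and that nothing written so far will be committed.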
@Capstan (and @mziccard) Thanks for the information. Is the suggestion that gcloud-java would use this to implement an abort on a file write that fails midway?
@fonzy2013 Yes, we could implement that, but it's not a priority at the moment.
Is there a particular use case for deleting a resumable upload session? Regardless of why a chunk upload failed, it seems like a better idea to retry uploading using the resumable session until it expires, rather than deleting the session and opening a new one.
@fonzy2013 You'd really only want to delete a session if you somehow are no longer in control of the session URL and know you need to abort it, e.g.,
The protocol as described by the Google Cloud Storage JSON documentation is essentially the Google Data resumable protocol, aka gdata, albeit minus the GData-Version and Slug headers. I've updated & republished those docs to include instructions on cancelation.
The use case is that we download assets from another system and upload them to GCS. The other system, which is a build server, deletes the assets after a while, which can lead to half-uploaded assets in GCS.
We do retry today, but we would rather have each retry start clean of previous errors.
I'm not entirely sure what is meant by a session in this context as our uploader runs as a service and uploads all the time.
A bit more information that might be useful: we don't use resumable uploads.
What we would like to guarantee is that a half-written file is not committed to GCS.
The solution was to not close the writer, which led to the question about a finalizer or the GC closing it automatically.
We should add abort() following Capstan's instructions.
This has been added to our feature backlog: https://github.com/GoogleCloudPlatform/google-cloud-java/wiki/Feature-backlog. This issue will be closed but is linked in the backlog and can continue to be used for comment and discussion.