I have an issue where I am downloading large files (a few GB) from Google Cloud Storage using the download_to_filename method. My application needs to move into and out of offline mode gracefully, so during testing I disconnected from the network. I found that if the connection drops in the middle of download_to_filename, the application hangs and the function never returns.
Example code:
```python
from google.cloud import storage

client = storage.Client('project-id')
bucket = client.get_bucket('bucket-name')
blob = storage.blob.Blob('Large-file.txt', bucket)
# Hangs indefinitely if the network drops after the download has started.
blob.download_to_filename(blob.name)
```
If I disconnect from the network just before calling blob.download_to_filename, I get a ConnectionError from requests, which is what I would expect. But if I disconnect after blob.download_to_filename has started, the function hangs: no timeout occurs and no exception is raised.
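A plausible explanation, offered as an assumption rather than a confirmed diagnosis: when the link drops mid-transfer, no FIN or RST packet reaches the client, so a blocking read on a socket that has no timeout simply waits for data that never arrives. The stdlib-only sketch below simulates a peer that goes silent using `socket.socketpair`; `recv_or_timeout` is a hypothetical helper for illustration, not part of google-cloud-storage. With a timeout set, `recv` raises `socket.timeout` instead of blocking.

```python
import socket
from typing import Optional


def recv_or_timeout(timeout: Optional[float]) -> str:
    """Read from a connection whose peer never sends (hypothetical helper).

    timeout=None mirrors the reported hang: recv() would block forever.
    A positive timeout makes recv() raise socket.timeout instead.
    """
    local, peer = socket.socketpair()  # peer stays open but never sends
    try:
        local.settimeout(timeout)
        try:
            local.recv(1024)
            return "received data"
        except socket.timeout:
            return "timed out"
    finally:
        local.close()
        peer.close()


print(recv_or_timeout(0.2))  # timed out
```

Calling `recv_or_timeout(None)` would block forever, which is the same symptom seen in download_to_filename.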
I am using Python 3.6.1 on Windows 10 with these package versions:
google-api-core==1.3.0
google-auth==1.5.1
google-cloud-core==0.28.1
google-cloud-storage==1.10.0
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3
I have not been able to find a good workaround for this, but in my opinion it would make sense for download_to_filename to take a timeout argument and raise an exception if the download does not complete in time.
I realize that the download does resume once the network connection is re-established, but in my application the state of those files can change while offline, so resuming does not help me. The timeout could default to None to preserve the existing behavior, while giving callers the option to force the function to return within a reasonable amount of time.
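A caller-side sketch of the proposed semantics, using only the standard library. `call_with_timeout` and `OperationTimeout` are hypothetical names, not part of google-cloud-storage. `timeout=None` keeps the current wait-forever behavior, while a number forces control back to the caller. One caveat: Python cannot kill the stuck thread, only abandon it, so a partially written file may remain on disk and should be treated as invalid.

```python
import threading
from typing import Any, Callable, Optional


class OperationTimeout(Exception):
    """Hypothetical exception raised when the wrapped call misses its deadline."""


def call_with_timeout(func: Callable[..., Any], *args: Any,
                      timeout: Optional[float] = None, **kwargs: Any) -> Any:
    """Run func(*args, **kwargs) in a worker thread and wait up to `timeout` s."""
    result = {}

    def worker() -> None:
        try:
            result["value"] = func(*args, **kwargs)
        except BaseException as exc:  # re-raised below in the caller's thread
            result["error"] = exc

    # Daemon thread, so a permanently stuck call cannot block interpreter exit.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)  # timeout=None waits forever (the existing behavior)
    if thread.is_alive():
        raise OperationTimeout(f"call did not finish within {timeout} s")
    if "error" in result:
        raise result["error"]
    return result.get("value")
```

For the example above, this would be called as `call_with_timeout(blob.download_to_filename, blob.name, timeout=600)`. The 600-second value is arbitrary and would need to be sized to the file size and expected bandwidth, since a multi-GB download can legitimately take a long time.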
I also posted a question to StackOverflow about a possible solution (https://stackoverflow.com/questions/52239860/download-to-filename-hangs-if-network-disconnects-in-the-middle), but this is more of a request for an improvement to the API.
Also, I am not sure whether this issue applies to upload_from_filename as well, but that method might need a similar change.
/cc @frankyn
+1, I am experiencing a similar issue when uploading files.
Being able to set a timeout on such operations would help me recover more gracefully.
We might need to handle this in google-resumable-media. @dhermes WDYT?
@tseaver Sounds reasonable. Any good ideas on reproducing this?
@dhermes maybe monkey-patch requests.request with something which doesn't return?
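One way to follow that suggestion without touching a real network, sketched with only the standard library. `FakeSession` and `fetch` are hypothetical stand-ins for the HTTP session and the library code under test, and `unittest.mock.patch.object` swaps in a `request` method that blocks until released. Against the real stack, the patch target would presumably be `requests.Session.request` rather than the module-level `requests.request`, because the google-auth transport issues requests through a session object; that is an assumption worth verifying.

```python
import threading
from unittest import mock


class FakeSession:
    """Hypothetical stand-in for the HTTP session the library uses."""

    def request(self, method, url, **kwargs):
        return "200 OK"


def fetch(session, url):
    """Hypothetical code under test: issues one GET through the session."""
    return session.request("GET", url)


def reproduce_hang(wait_seconds=0.2):
    """Return (still_blocked, outcome) after patching request() to hang."""
    unblock = threading.Event()
    outcome = {}

    def never_returns(self, method, url, **kwargs):
        # Behaves like a connection that went silent: blocks until released.
        unblock.wait()
        raise ConnectionError("simulated network loss")

    def target():
        try:
            outcome["value"] = fetch(FakeSession(), "https://example.invalid/file")
        except ConnectionError as exc:
            outcome["error"] = exc

    with mock.patch.object(FakeSession, "request", never_returns):
        worker = threading.Thread(target=target, daemon=True)
        worker.start()
        worker.join(wait_seconds)
        still_blocked = worker.is_alive()  # True means the hang was reproduced
        unblock.set()  # release the thread so the test cleans up after itself
        worker.join()
    return still_blocked, outcome
```

A real test would then assert that the code under test raises a timeout error instead of leaving the worker blocked.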
@tseaver @dhermes We are seeing the same issue when uploading: our thread hangs forever. Every few seconds our setup uploads a relatively small file (under 100 kB) to a bucket, and we depend on those files existing.
What is the recommended workaround for this issue? As things stand, this problem keeps our code from being production-grade.