Dvc: Failed to upload to GCS due to connection time out

Created on 4 Oct 2019 · 8 Comments · Source: iterative/dvc

Please provide information about your setup
DVC version(i.e. dvc --version), Platform and method of installation (pip, homebrew, pkg Mac, exe (Windows), DEB(Linux), RPM(Linux))

DVC version: 0.61.2
Method of installation: pip
Platform: Mac

As discussed on Discord, I'm having an issue uploading a ~50 MB dataset to GCS. I'm using google-cloud-storage=1.19.0. Below is the error output when running with -v.

DEBUG: PRAGMA user_version;
DEBUG: fetched: [(3,)]
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: Preparing to upload data to 'gs://dse/dvc/dataset_20191004'
DEBUG: Preparing to collect status from gs://dse/dvc/dataset_20191004
DEBUG: Collecting information from local cache...
DEBUG: Path .dvc/cache/a7/a404b4826cca3b01dd5d8e6326de3e inode 7438277                                                                                                                                  
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?                                                                                                                                      
DEBUG: fetched: [('1570168433714557952', '54785677', 'a7a404b4826cca3b01dd5d8e6326de3e', '1570174167312645120')]                                                                                        
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?                                                                                                                                                   
DEBUG: cache '.dvc/cache/a7/a404b4826cca3b01dd5d8e6326de3e' expected 'a7a404b4826cca3b01dd5d8e6326de3e' actual 'a7a404b4826cca3b01dd5d8e6326de3e'                                                       
DEBUG: Collecting information from remote cache...                                                                                                                                                      
DEBUG: Uploading '.dvc/cache/a7/a404b4826cca3b01dd5d8e6326de3e' to 'gs://dse/dvc/dataset_20191004/a7/a404b4826cca3b01dd5d8e6326de3e'                 
ERROR: failed to upload '.dvc/cache/a7/a404b4826cca3b01dd5d8e6326de3e' to 'gs://dse/dvc/dataset_20191004/a7/a404b4826cca3b01dd5d8e6326de3e' - ('Connection aborted.', timeout('The write operation timed out',))

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(2,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
ERROR: failed to push data to the cloud - 1 files failed to upload
------------------------------------------------------------
Traceback (most recent call last):
  File "dvc/command/data_sync.py", line 50, in run
  File "dvc/repo/__init__.py", line 33, in wrapper
  File "dvc/repo/push.py", line 28, in push
  File "dvc/data_cloud.py", line 63, in push
  File "dvc/remote/local/__init__.py", line 403, in push
  File "dvc/remote/local/__init__.py", line 393, in _process
dvc.exceptions.UploadError: 1 files failed to upload

I have tried pushing with the option -j 1, but I still encounter the same issue.

Labels: bug, p0-critical, research

All 8 comments

For the record: Another user is experiencing a similar issue: https://discordapp.com/channels/485586884165107732/485596304961962003/629269604614799370

I'm not able to reproduce this on Linux; I will try on a Mac later.

For the record: both users are from the same team.

For the record: I tried running this minimal example:

from google.cloud.storage import Client

# Project, bucket and paths below are placeholders.
client = Client("myprojectname")

bucket = client.bucket("mybucket")
blob = bucket.blob("mypath")
# Uploads the local file with the library's default chunk size.
blob.upload_from_filename("path/to/my/local/file/to/upload")

and the user was still experiencing the same error. It turned out, though, that lowering chunk_size from the default 100 MB to the minimum of 256 KB made this work. ~8 MB worked too in this example, but dvc push was still failing, apparently because our multithreaded upload was overwhelming the network.
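A minimal sketch of the workaround described above, for anyone who wants to try it outside of dvc. The helper names `aligned_chunk_size` and `upload_with_chunk_size` are hypothetical and are not part of dvc or google-cloud-storage; the sketch assumes only that `Bucket.blob()` accepts a `chunk_size` argument and that the value must be a multiple of 256 KB.

```python
# The 256 KB granularity that google-cloud-storage requires for chunk_size.
CHUNK_QUANTUM = 256 * 1024


def aligned_chunk_size(requested_bytes):
    """Round a requested size down to a multiple of 256 KB (at least 256 KB)."""
    return max(CHUNK_QUANTUM, (requested_bytes // CHUNK_QUANTUM) * CHUNK_QUANTUM)


def upload_with_chunk_size(project, bucket_name, blob_name, local_path, chunk_size):
    """Upload one file with an explicit chunk size (hypothetical helper)."""
    # Imported lazily so the size helper above works without the package installed.
    from google.cloud.storage import Client

    client = Client(project)
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_name, chunk_size=aligned_chunk_size(chunk_size))
    blob.upload_from_filename(local_path)
```

For example, `upload_with_chunk_size("myprojectname", "mybucket", "mypath", "path/to/file", 256 * 1024)` would correspond to the smallest chunk size mentioned above. Whether a given size avoids the timeout depends on the network, so treat the value as something to experiment with.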

It is also important to note that gsutil works differently (it does not even use the google-cloud-storage package) and has a dynamic chunk size: https://github.com/googleapis/google-cloud-dotnet/issues/1480#issuecomment-330962961. It might be worth implementing something similar for us too (and maybe even contributing it back to google-cloud-storage).
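To make the idea of a dynamic chunk size concrete, here is one possible sketch. It is not how gsutil does it and not what dvc ended up implementing; it is just an assumed strategy of starting large and halving the chunk size whenever an upload times out. The `upload` callable and the use of the built-in `TimeoutError` are assumptions for illustration; a real integration would need to catch whatever exception the HTTP layer actually raises.

```python
MIN_CHUNK = 256 * 1024  # smallest chunk size, also the required granularity


def upload_with_backoff(upload, start_chunk, min_chunk=MIN_CHUNK):
    """Call upload(chunk_size), halving the chunk size after each timeout.

    Returns the chunk size that succeeded. Re-raises the timeout once the
    minimum chunk size has also failed.
    """
    chunk = start_chunk
    while True:
        try:
            upload(chunk)
            return chunk
        except TimeoutError:
            if chunk <= min_chunk:
                raise
            # Halve, then round down to a multiple of min_chunk.
            chunk = max(min_chunk, (chunk // 2 // min_chunk) * min_chunk)


# Simulated uploader (hypothetical): pretends the network only copes
# with chunks of 8 MB or less.
attempts = []


def fake_upload(chunk_size):
    attempts.append(chunk_size)
    if chunk_size > 8 * 1024 * 1024:
        raise TimeoutError("The write operation timed out")


working_chunk = upload_with_backoff(fake_upload, 100 * 1024 * 1024)
```

In this simulation the loop tries 100 MB, 50 MB, 25 MB, 12.5 MB and finally 6.25 MB, which is the first size under the simulated limit. A real implementation would also have to decide whether to restart or resume the upload after each failure.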

@syahrulhamdani How is it going, need any help? :slightly_smiling_face:

Another user is experiencing this issue https://discordapp.com/channels/485586884165107732/485596304961962003/636843168356368387 . Bumping up the priority.

Hi @efiop, sorry for the very late response. I'm just starting to work on this.
I've been thinking about a few things to implement, but it looks like you've already done it. How is it going?

@syahrulhamdani Sorry, I should've notified you about the progress. Indeed, I bumped the priority since more users ran into this, so I've submitted #2661. Could you try it out and tell us if it works for you as well?

@syahrulhamdani The patch was released in 0.66.0, please upgrade and give it a try 🙂 Thanks for the feedback!
