DVC: Google Drive pulls (via links) not working from different Google accounts

Created on 1 Sep 2020 · 10 Comments · Source: iterative/dvc

Bug Report

Please provide information about your setup

I am trying to set up a Google Drive folder as a remote (following the QuickStart section of https://dvc.org/doc/user-guide/setup-google-drive-remote), so that different users with access to that folder can dvc pull from it.
After setting up the remote with the suggested commands and pushing the data to the drive from the original repository, I tried it out by pulling the same data from a different repository.

If I pull the data with the same Google account, that is, the account that owns the Google Drive folder, everything works properly and the data is downloaded:

$ dvc pull
  0% Querying cache in gdrive://1paaaG4GUnX6JbIA8YjgaIoBny-75PWOr| |0/1 [00:00<?Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?client_id=710796635688-iivsgbgsb6uv1fap6635dhvuei09o66c.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.appdata&access_type=offline&response_type=code&approval_prompt=force

Enter verification code: 4/3gGXwx3Az_jqvajzFgs83fnIuJB5MHGtl5Yx5lxYjGvp5DgfItbzfOo
Authentication successful.
A       data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_60m.tif                                                                   
A       data/classifier.p                                                       
A       data/COS/                                                               
A       data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_20m.tif                                                                   
A       data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_10m.tif
5 files added and 6 files fetched

However, if I pull the data using a different Google account that has access to the Google Drive folder (I tried both read-only and read-write) but did not create it, some of the files are not pulled:

$ dvc pull
  0% Querying cache in gdrive://1paaaG4GUnX6JbIA8YjgaIoBny-75PWOr| |0/1 [00:00<?Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?client_id=710796635688-iivsgbgsb6uv1fap6635dhvuei09o66c.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.appdata&access_type=offline&response_type=code&approval_prompt=force

Enter verification code: 4/3gHwqEbvv9UFkGO6gWolMrPFZ7TQL8-mCiTZf9Ff6iLhIXdzI1kp82M
Authentication successful.
WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:                                                               
name: data/classifier.p, md5: c5fca7973ca9dd67fa01e4838c28894d
name: data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_20m.tif, md5: cd964055edcd27e510c8678e0cbfc762
name: data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_10m.tif, md5: c5f3b2745e4bdda26c967b97cb4d87ae
name: data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_60m.tif, md5: 2c46e072ade3aff18a28819a811ddedc
WARNING: Cache 'c5f3b2745e4bdda26c967b97cb4d87ae' not found. File 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_10m.tif' won't be created.                                                                     
WARNING: Cache 'c5fca7973ca9dd67fa01e4838c28894d' not found. File 'data/classifier.p' won't be created.
WARNING: Cache '2c46e072ade3aff18a28819a811ddedc' not found. File 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_60m.tif' won't be created.
WARNING: Cache 'cd964055edcd27e510c8678e0cbfc762' not found. File 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_20m.tif' won't be created.
A       data/COS/                                                               
1 file added and 4 files failed
ERROR: failed to pull data from the cloud - Checkout failed for following targets:
data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_10m.tif
data/classifier.p
data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_60m.tif
data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_20m.tif
Is your cache up to date?
<https://error.dvc.org/missing-files>

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

Output of dvc version:

$ dvc version
DVC version: 1.6.1 (brew)
---------------------------------
Platform: Python 3.8.5 on macOS-10.15.6-x86_64-i386-64bit
Supports: azure, gdrive, gs, http, https, s3, ssh, oss, webdav, webdavs
Cache types: reflink, hardlink, symlink
Repo: dvc, git

Additional Information (if any):

Verbose version of the dvc pull on the second account:

dvc pull -v
2020-09-01 14:45:55,898 DEBUG: Check for update is enabled.
2020-09-01 14:45:55,903 DEBUG: fetched: [(3,)]                        
2020-09-01 14:45:55,926 DEBUG: Assuming '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/94/ec581ec2cb6d4285008cfa12751544.dir' is unchanged since it is read-only
2020-09-01 14:45:55,927 DEBUG: Assuming '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/94/ec581ec2cb6d4285008cfa12751544.dir' is unchanged since it is read-only
2020-09-01 14:45:55,931 DEBUG: Preparing to download data from 'gdrive://1paaaG4GUnX6JbIA8YjgaIoBny-75PWOr'
2020-09-01 14:45:55,931 DEBUG: Preparing to collect status from gdrive://1paaaG4GUnX6JbIA8YjgaIoBny-75PWOr
2020-09-01 14:45:55,931 DEBUG: Collecting information from local cache...
2020-09-01 14:45:55,933 DEBUG: Assuming '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/94/ec581ec2cb6d4285008cfa12751544.dir' is unchanged since it is read-only
2020-09-01 14:45:55,934 DEBUG: Assuming '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/8a/a69ad03842ece8aefb82269994e95a' is unchanged since it is read-only
2020-09-01 14:45:55,935 DEBUG: cache '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/c5/fca7973ca9dd67fa01e4838c28894d' expected 'c5fca7973ca9dd67fa01e4838c28894d' actual 'None'
2020-09-01 14:45:55,936 DEBUG: cache '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/2c/46e072ade3aff18a28819a811ddedc' expected '2c46e072ade3aff18a28819a811ddedc' actual 'None'
2020-09-01 14:45:55,936 DEBUG: cache '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/cd/964055edcd27e510c8678e0cbfc762' expected 'cd964055edcd27e510c8678e0cbfc762' actual 'None'
2020-09-01 14:45:55,938 DEBUG: Assuming '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/09/695ec8e64d473a2207320c555728f3' is unchanged since it is read-only
2020-09-01 14:45:55,939 DEBUG: cache '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/c5/f3b2745e4bdda26c967b97cb4d87ae' expected 'c5f3b2745e4bdda26c967b97cb4d87ae' actual 'None'
2020-09-01 14:45:55,940 DEBUG: Collecting information from remote cache...      
2020-09-01 14:45:55,941 DEBUG: Querying 1 hashes via object_exists
2020-09-01 14:45:56,295 DEBUG: GDrive remote auth with config '{'client_config_backend': 'settings', 'client_config_file': 'client_secrets.json', 'save_credentials': True, 'oauth_scope': ['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/drive.appdata'], 'save_credentials_backend': 'file', 'save_credentials_file': '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/tmp/gdrive-user-credentials.json', 'get_refresh_token': True, 'client_config': {'client_id': '710796635688-iivsgbgsb6uv1fap6635dhvuei09o66c.apps.googleusercontent.com', 'client_secret': 'a1Fz59uTpVNeG_VGuSKDLJXv', 'auth_uri': 'https://accounts.google.com/o/oauth2/auth', 'token_uri': 'https://oauth2.googleapis.com/token', 'revoke_uri': 'https://oauth2.googleapis.com/revoke', 'redirect_uri': ''}}'.
2020-09-01 14:45:58,334 DEBUG: Querying 0 hashes via object_exists              
2020-09-01 14:45:58,336 DEBUG: Matched '0' indexed hashes                       
2020-09-01 14:45:58,337 DEBUG: Estimated remote size: 256 files                 
2020-09-01 14:45:58,337 DEBUG: Querying '4' hashes via traverse                 
2020-09-01 14:45:59,137 WARNING: Some of the cache files do not exist neither locally nor on remote. Missing cache files:
name: data/classifier.p, md5: c5fca7973ca9dd67fa01e4838c28894d
name: data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_20m.tif, md5: cd964055edcd27e510c8678e0cbfc762
name: data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_10m.tif, md5: c5f3b2745e4bdda26c967b97cb4d87ae
name: data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_60m.tif, md5: 2c46e072ade3aff18a28819a811ddedc
2020-09-01 14:45:59,146 DEBUG: checking if 'data/COS'('{'md5': '94ec581ec2cb6d4285008cfa12751544.dir'}') has changed.
2020-09-01 14:45:59,147 DEBUG: Assuming '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/94/ec581ec2cb6d4285008cfa12751544.dir' is unchanged since it is read-only
2020-09-01 14:45:59,147 DEBUG: Assuming '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/8a/a69ad03842ece8aefb82269994e95a' is unchanged since it is read-only
2020-09-01 14:45:59,148 DEBUG: Assuming '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/09/695ec8e64d473a2207320c555728f3' is unchanged since it is read-only
2020-09-01 14:45:59,149 DEBUG: Path '/Users/franciscocastanheira/Documents/test/forest_classifier/data/COS' inode '8709005095'
2020-09-01 14:45:59,150 DEBUG: fetched: [('daedd6e6cbd4a0d06666761dad2f983b', '429299716', '94ec581ec2cb6d4285008cfa12751544.dir', '1598950176726693888')]
2020-09-01 14:45:59,150 DEBUG: 'data/COS' hasn't changed.                       
2020-09-01 14:45:59,151 DEBUG: Data 'data/COS' didn't change.                   
2020-09-01 14:45:59,153 DEBUG: checking if 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_20m.tif'('{'md5': 'cd964055edcd27e510c8678e0cbfc762'}') has changed.
2020-09-01 14:45:59,153 DEBUG: 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_20m.tif' doesn't exist.
2020-09-01 14:45:59,154 DEBUG: cache '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/cd/964055edcd27e510c8678e0cbfc762' expected 'cd964055edcd27e510c8678e0cbfc762' actual 'None'
2020-09-01 14:45:59,155 WARNING: Cache 'cd964055edcd27e510c8678e0cbfc762' not found. File 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_20m.tif' won't be created.
2020-09-01 14:45:59,157 DEBUG: checking if 'data/classifier.p'('{'md5': 'c5fca7973ca9dd67fa01e4838c28894d'}') has changed.
2020-09-01 14:45:59,158 DEBUG: 'data/classifier.p' doesn't exist.               
2020-09-01 14:45:59,159 DEBUG: cache '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/c5/fca7973ca9dd67fa01e4838c28894d' expected 'c5fca7973ca9dd67fa01e4838c28894d' actual 'None'
2020-09-01 14:45:59,160 WARNING: Cache 'c5fca7973ca9dd67fa01e4838c28894d' not found. File 'data/classifier.p' won't be created.
2020-09-01 14:45:59,162 DEBUG: checking if 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_10m.tif'('{'md5': 'c5f3b2745e4bdda26c967b97cb4d87ae'}') has changed.
2020-09-01 14:45:59,163 DEBUG: 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_10m.tif' doesn't exist.
2020-09-01 14:45:59,164 DEBUG: cache '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/c5/f3b2745e4bdda26c967b97cb4d87ae' expected 'c5f3b2745e4bdda26c967b97cb4d87ae' actual 'None'
2020-09-01 14:45:59,164 WARNING: Cache 'c5f3b2745e4bdda26c967b97cb4d87ae' not found. File 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_10m.tif' won't be created.
2020-09-01 14:45:59,166 DEBUG: checking if 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_60m.tif'('{'md5': '2c46e072ade3aff18a28819a811ddedc'}') has changed.
2020-09-01 14:45:59,167 DEBUG: 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_60m.tif' doesn't exist.
2020-09-01 14:45:59,168 DEBUG: cache '/Users/franciscocastanheira/Documents/test/forest_classifier/.dvc/cache/2c/46e072ade3aff18a28819a811ddedc' expected '2c46e072ade3aff18a28819a811ddedc' actual 'None'
2020-09-01 14:45:59,168 WARNING: Cache '2c46e072ade3aff18a28819a811ddedc' not found. File 'data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_60m.tif' won't be created.
2020-09-01 14:45:59,170 DEBUG: fetched: [(6,)]                                  
4 files failed
2020-09-01 14:45:59,173 ERROR: failed to pull data from the cloud - Checkout failed for following targets:
data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_20m.tif
data/classifier.p
data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_10m.tif
data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_60m.tif
Is your cache up to date?
<https://error.dvc.org/missing-files>
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/Cellar/dvc/1.6.1/libexec/lib/python3.8/site-packages/dvc/command/data_sync.py", line 26, in run
    stats = self.repo.pull(
  File "/usr/local/Cellar/dvc/1.6.1/libexec/lib/python3.8/site-packages/dvc/repo/__init__.py", line 34, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/usr/local/Cellar/dvc/1.6.1/libexec/lib/python3.8/site-packages/dvc/repo/pull.py", line 36, in pull
    stats = self._checkout(  # pylint: disable=protected-access
  File "/usr/local/Cellar/dvc/1.6.1/libexec/lib/python3.8/site-packages/dvc/repo/checkout.py", line 101, in _checkout
    raise CheckoutError(stats["failed"], stats)
dvc.exceptions.CheckoutError: Checkout failed for following targets:
data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_20m.tif
data/classifier.p
data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_10m.tif
data/ImageOrigin/S2A_MSIL1C_20180130T112311_N0206_R037_T29TNE_20180130T133625_60m.tif
Is your cache up to date?
<https://error.dvc.org/missing-files>
------------------------------------------------------------

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2020-09-01 14:45:59,179 DEBUG: Analytics is enabled.
2020-09-01 14:45:59,405 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/76/g7sd_rv572537qcyv7v77jk40000gr/T/tmpmdfblcw1']'
2020-09-01 14:45:59,407 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/76/g7sd_rv572537qcyv7v77jk40000gr/T/tmpmdfblcw1']'
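For anyone reading the log above, here is a minimal Python sketch of how the reported hashes relate to the local cache paths. It only mirrors the layout visible in the debug lines (the first two hex characters form the subdirectory, the rest the file name); the helper names `cache_path` and `parse_missing` are hypothetical and are not part of DVC.

```python
from pathlib import PurePosixPath


def cache_path(md5: str, cache_dir: str = ".dvc/cache") -> PurePosixPath:
    """Map a hash to the cache layout seen in the log: <cache_dir>/<2 chars>/<rest>."""
    return PurePosixPath(cache_dir) / md5[:2] / md5[2:]


def parse_missing(lines):
    """Turn 'name: <path>, md5: <hash>' warning lines into (path, hash) pairs."""
    pairs = []
    for line in lines:
        if line.startswith("name: ") and ", md5: " in line:
            name, md5 = line[len("name: "):].rsplit(", md5: ", 1)
            pairs.append((name, md5.strip()))
    return pairs


warning_lines = [
    "name: data/classifier.p, md5: c5fca7973ca9dd67fa01e4838c28894d",
]
for name, md5 in parse_missing(warning_lines):
    # Prints the local cache location that DVC reported as empty for this file.
    print(name, "->", cache_path(md5))
```

This can help check by hand whether a given object is present locally or in the remote folder.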

Let me know if you need any more information.

bug

All 10 comments

@castanhas98 could you please log in to the Google Drive UI with your second email (account) and try to find and download one of the files DVC reports in the logs as missing?

Also, could you share exactly how you gave the second account access in the UI?

@shcheklein I was able to download one of the missing files using the second email account.

Regarding giving access through the UI:

  • Right-click the folder name
  • Get shareable link
  • Edit the options for "Get link" to allow anyone with the link to be able to edit.
  • Copy the link
  • Open the link in a different browser where the second account is logged in.

I hope this helps.

@castanhas98 thanks for the clarification. I'll check this scenario tomorrow (my time), but my first guess would be that just setting "everybody with a link can access" is not enough for accessing folders/files via the Google Drive API. DVC doesn't have a link in the same sense as you do when you copy it.

Could you instead do the same but click the Share button (on the remote directory 1paaaG4GUnX6JbIA8YjgaIoBny-75PWOr in your case) in the menu and explicitly add the email of the second account?
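The explicit share can also be done programmatically. The following is only a sketch, assuming a Drive API v3 client object (`service`) has already been built elsewhere, for example with `googleapiclient.discovery.build("drive", "v3", credentials=...)` under the folder owner's credentials. The helper names and the example email are hypothetical; the request body follows the Drive v3 `permissions.create` call.

```python
def build_user_permission(email: str, role: str = "reader") -> dict:
    """Build the body for a Drive v3 permissions.create request for one user."""
    if role not in {"reader", "commenter", "writer"}:
        raise ValueError(f"unsupported role: {role!r}")
    return {"type": "user", "role": role, "emailAddress": email}


def share_folder(service, folder_id: str, email: str, role: str = "reader"):
    """Grant `email` access to `folder_id`.

    `service` is assumed to be an authorized Drive v3 client created by the caller.
    """
    body = build_user_permission(email, role)
    return (
        service.permissions()
        .create(fileId=folder_id, body=body, fields="id")
        .execute()
    )


# Example call (requires real credentials, so it is left commented out):
# share_folder(service, "1paaaG4GUnX6JbIA8YjgaIoBny-75PWOr", "second.user@example.com")
```

A `writer` role would presumably be needed for accounts that should also push; that is an assumption and not something confirmed in this thread.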

@shcheklein It does work like this, thank you!

Either way, please let me know if you find a way of doing so without the explicit share, since, ideally, the objective is for it to be accessible to everyone with the link without having to do anything else.

Thank you again for your help!

@castanhas98 I'll check, but to be honest I doubt that it can be done. The problem is that when you access things programmatically, there is no link; all DVC has are IDs, REST APIs, etc.
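To illustrate the difference between a link and an ID, here is a small sketch. It assumes the browser share link has the form `https://drive.google.com/drive/folders/<ID>?usp=sharing` (an assumption about the link format, not something shown in this thread) and that DVC is configured only with the `gdrive://<ID>` URL seen in the logs. The function names are hypothetical.

```python
from urllib.parse import urlparse


def folder_id_from_share_link(link: str) -> str:
    """Extract the folder ID from a Drive folder link (assumed link format)."""
    parts = [p for p in urlparse(link).path.split("/") if p]
    if "folders" not in parts or parts.index("folders") + 1 >= len(parts):
        raise ValueError(f"no folder ID found in {link!r}")
    return parts[parts.index("folders") + 1]


def dvc_remote_url(folder_id: str) -> str:
    """Build the gdrive:// URL form that appears in the DVC logs."""
    return f"gdrive://{folder_id}"


link = "https://drive.google.com/drive/folders/1paaaG4GUnX6JbIA8YjgaIoBny-75PWOr?usp=sharing"
print(dvc_remote_url(folder_id_from_share_link(link)))
# prints: gdrive://1paaaG4GUnX6JbIA8YjgaIoBny-75PWOr
```

Only the ID reaches DVC, so the query part of the link that a browser user opens plays no role in the API requests.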

What you can do in your case is use a Service Account: https://dvc.org/doc/user-guide/setup-google-drive-remote#using-service-accounts . Setup is a little more involved, but the outcome is a single secret file that you can share with the teammates you want to have access to this directory.

But to be honest, my take on this is that by giving access explicitly, you have better, more granular security in place. For example, you could always deny access to a specific user. Most likely you can also see who accessed what, etc.

@jorgeorpinel should we update the docs to better explain exactly how the directory should be shared?

@shcheklein I see, thanks for the help, I'll check out the service account alternative!

@castanhas98 thanks, let us know how it goes and if you have more questions!

@jorgeorpinel should we update the docs to better explain exactly how the directory should be shared?

Will look into this... ⏳

Edit the options for "Get link" to allow anyone with the link to be able to edit.
Could you instead do the same but click the Share button... and add explicitly an email of the second account?

Alright, noted this in https://github.com/iterative/dvc.org/pull/1787/files#diff-8c9a2bb064543d6c885069e9eba8d2b8 @shcheklein
