Dvc: error adding a file to dvc repo on NAS storage

Created on 20 Nov 2019  ·  16 Comments  ·  Source: iterative/dvc

DVC version is 0.62.1 installed by pip on Ubuntu.

I have shared NAS storage mounted on my system and want to create a DVC repo in the storage.
I could successfully initialize a repo with the dvc init command, but adding a file fails with the error message shown below.

$ ls -al
total 0
drwxrwxrwx 2 root root 0 Nov 20  2019 .  <-- has write permission
drwxrwxrwx 2 root root 0 Oct 25 10:37 ..
drwxrwxrwx 2 root root 0 Oct 29 15:30 data  <-- want to add this directory after creating a repo
$ dvc init --no-scm
+---------------------------------------------------------------------+
|                                                                     |
|        DVC has enabled anonymous aggregate usage analytics.         |
|     Read the analytics documentation (and how to opt-out) here:     |
|              https://dvc.org/doc/user-guide/analytics               |
|                                                                     |
+---------------------------------------------------------------------+

What's next?
------------
- Check out the documentation: https://dvc.org/doc
- Get help and share ideas: https://dvc.org/chat
- Star us on GitHub: https://github.com/iterative/dvc
$ ls -al
total 0
drwxrwxrwx 2 root root 0 Nov 20 09:50 .
drwxrwxrwx 2 root root 0 Oct 25 10:37 ..
drwxrwxrwx 2 root root 0 Oct 29 15:30 data
drwxrwxrwx 2 root root 0 Nov 20 09:50 .dvc  <-- repo created successfully
$ dvc add -R data  <-- trying to add the directory recursively
ERROR: unexpected error - [Errno 1] Operation not permitted  <-- failed with an error message that does not reveal the cause

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!  <-- made me write this issue
$
Labels: awaiting response, bug

All 16 comments

Hi @midnightradio !

Could you please show full log for dvc add -v -R data?

Also, are you sure -R is really what you want? Is data a directory with a giant number of files?

Hi, @efiop
Thanks for your quick follow up!

There are just a few files in the directory. The verbose output follows.

$ tree data
data
├── darpa-timit-acousticphonetic-continuous-speech.zip
└── openslr
    └── zeroth
        ├── README
        └── zeroth_korean.tar.gz

2 directories, 3 files
$ dvc add -v -R data
ERROR: unexpected error - [Errno 1] Operation not permitted
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/main.py", line 41, in main
    cmd = args.func(args)
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/command/base.py", line 47, in __init__
    updater.check()
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/updater.py", line 51, in check
    self._with_lock(self._check, "checking")
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/updater.py", line 41, in _with_lock
    with self.lock:
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/flufl/lock/_lockfile.py", line 334, in __enter__
    self.lock()
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/lock.py", line 54, in lock
    super(Lock, self).lock(timedelta(seconds=DEFAULT_TIMEOUT))
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/flufl/lock/_lockfile.py", line 208, in lock
    self._touch()
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/flufl/lock/_lockfile.py", line 462, in _touch
    os.utime(filename or self._claimfile, (t, t))
PermissionError: [Errno 1] Operation not permitted
------------------------------------------------------------

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

@midnightradio Hm, weird. Could you show dvc version output? Also, could you check your permissions? Are you able to create files in your project's directory? Are you able to touch foo && stat too? So far this seems like you have an issue with your mount, maybe some incorrect or missing mounting options, hard to put my finger on anything specific. Also, does git status work in your repo dir?

@efiop Actually, the answers to your questions are already in the first post, but let me double-check.

$ dvc --version
0.62.1
$ touch foo
$ stat foo
  File: foo
  Size: 0               Blocks: 0          IO Block: 16384  regular empty file
Device: 48h/72d Inode: 8260955347  Links: 1
Access: (0777/-rwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-11-20 13:27:51.860794700 +0900
Modify: 2019-11-20 13:27:51.860794700 +0900
Change: 2019-11-20 13:28:44.555995200 +0900
 Birth: -

When I do the same operation (init and add) for the same data in local storage, it works well.

Previously, I missed the last thing @efiop asked me to try, and this time I found something interesting by trying git in the same directory. It seems the filesystem does not allow creating a lock file even though the directory has full permissions.

$ git init
error: chmod on /mnt/DSshare/DAI/STT/.git/config.lock failed: Operation not permitted
fatal: could not set 'core.filemode' to 'false'
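
A minimal sketch for narrowing down which metadata calls the mount rejects. The traceback above ends in os.utime(filename, (t, t)) inside flufl.lock, and git fails on chmod, so the probe below tries both on a scratch file. The function name probe_metadata_ops is hypothetical and not part of DVC. One possible explanation, not confirmed in this thread: POSIX lets any user with write permission set a file's timestamps to "now" (which is what touch does), but setting explicit timestamps requires owning the file or being privileged, and stat above reports the files as owned by root.

```python
import errno
import os
import tempfile
import time


def probe_metadata_ops(directory):
    """Try the metadata calls that lock files rely on and report each result.

    Returns a dict mapping an operation name to None on success, or to the
    errno name (for example "EPERM") on failure.
    """
    results = {}
    # Create a scratch file inside the directory under test.
    fd, path = tempfile.mkstemp(dir=directory, prefix="probe-")
    os.close(fd)
    now = time.time()
    operations = {
        # Same effect as `touch`: set both timestamps to the current time.
        "utime_now": lambda: os.utime(path, None),
        # Same call shape as the one in the flufl.lock traceback.
        "utime_explicit": lambda: os.utime(path, (now, now)),
        # Roughly what git attempts on .git/config.lock.
        "chmod": lambda: os.chmod(path, 0o644),
    }
    try:
        for name, operation in operations.items():
            try:
                operation()
                results[name] = None
            except OSError as exc:
                results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        # Clean up the scratch file whatever happened above.
        os.remove(path)
    return results
```

Calling probe_metadata_ops("/mnt/DSshare/DAI/STT") on the affected machine should show which of the three operations fails there. On an ordinary local filesystem all three are expected to succeed.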

I think we should still do something about the lock. Git is probably not the best analogy for us in this case.

@midnightradio could you elaborate on your case a little bit please? Why do you want to run the repo on a NAS storage directly?

@midnightradio I meant dvc version and not dvc --version 🙂 Could you run that one please?

@efiop dvc version gives the error below, and it prints the same message twice. /mnt/DSshare is the mount point of the NAS storage, and /mnt/DSshare/DAI/STT is the working directory where I ran dvc version.

$ dvc version
ERROR: unexpected error - /mnt/DSshare/DAI/STT is not a git repository

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
$ ERROR: unexpected error - /mnt/DSshare/DAI/STT is not a git repository

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

Here's the result of the same command run with the working directory in local storage.

dvc version
DVC version: 0.62.1
Python version: 3.7.3
Platform: Linux-4.15.0-64-generic-x86_64-with-debian-buster-sid
Binary: False
Cache: reflink - False, hardlink - True, symlink - True
Filesystem type (cache directory): ('ext4', '/dev/sdb1')
Filesystem type (workspace): ('ext4', '/dev/sdb1')

@midnightradio Hm, that's interesting. Could you run dvc version -v please?

Hi, @shcheklein

There's no specific reason for storing data or keeping a dvc repo on shared NAS storage. There just happened not to be enough space on my local disk for newly downloaded data, so I stored it on the shared disk. Then I wanted to make a dvc remote for the data and tried to create a dvc repo in the same directory where I stored the data, but that failed.

I don't think this is a bug, given that even git cannot initialize a repo on this kind of storage.

@efiop It looks like a bug that dvc version depends on having a git repo, while dvc init allows creating a repo with the --no-scm option.

ERROR: unexpected error - /mnt/DSshare/DAI/STT is not a git repository
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/scm/git/__init__.py", line 50, in __init__
    self.repo = git.Repo(self.root_dir)
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/git/repo/base.py", line 184, in __init__
    raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /mnt/DSshare/DAI/STT

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/main.py", line 42, in main
    ret = cmd.run()
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/command/version.py", line 48, in run
    repo = Repo()
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/repo/__init__.py", line 84, in __init__
    self.scm = SCM(self.root_dir)
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/scm/__init__.py", line 27, in SCM
    return Git(root_dir)
  File "/home/hjlee/miniconda3/envs/pandas/lib/python3.7/site-packages/dvc/scm/git/__init__.py", line 53, in __init__
    raise SCMError(msg.format(self.root_dir))
dvc.scm.base.SCMError: /mnt/DSshare/DAI/STT is not a git repository
------------------------------------------------------------

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

As I mentioned previously, I would rather move the data to local storage before making a dvc repo, since the shared storage doesn't even play well with git. So I'm ready to close this issue on that point, unless others want to create a dvc repo on weird storage more than I did.

However, dvc version raising an error in a dvc repo that was created without git is something I think should be fixed. Maybe I can try.

Thanks @midnightradio! I think both issues are pretty valid. It's a reasonable case to run dvc add on attached storage to get data under DVC control initially, since it's too large to fit on your local disk. @efiop will decide whether to keep this one open (since it looks like you have a workaround), but we will definitely keep this scenario in mind.

For the version issue - yes, let's create a separate ticket and if you can contribute the PR that would be awesome 🙏 We'll try to help you with that.

@midnightradio, the error in https://github.com/iterative/dvc/issues/2818#issuecomment-555885457 is due to git init earlier. But, anyway, version should quietly work here.

I was able to reproduce the same error with:

temp=$(mktemp -d)
cd $temp
dvc init --no-scm
mkdir .git
dvc version -v

Output is quite similar:

Traceback (most recent call last):
  File "/home/saugat/repos/iterative/dvc/dvc/scm/git/__init__.py", line 52, in __init__
    self.repo = git.Repo(self.root_dir)
  File "/home/saugat/repos/iterative/dvc/.env/py36/lib/python3.6/site-packages/git/repo/base.py", line 184, in __init__
    raise InvalidGitRepositoryError(epath)
git.exc.InvalidGitRepositoryError: /tmp/tmp.hOiYxhj5Wh

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/saugat/repos/iterative/dvc/dvc/main.py", line 49, in main
    ret = cmd.run()
  File "/home/saugat/repos/iterative/dvc/dvc/command/version.py", line 49, in run
    repo = Repo()
  File "/home/saugat/repos/iterative/dvc/dvc/repo/__init__.py", line 87, in __init__
    self.scm = SCM(self.root_dir)
  File "/home/saugat/repos/iterative/dvc/dvc/scm/__init__.py", line 26, in SCM
    return Git(root_dir)
  File "/home/saugat/repos/iterative/dvc/dvc/scm/git/__init__.py", line 55, in __init__
    raise SCMError(msg.format(self.root_dir))
dvc.scm.base.SCMError: /tmp/tmp.hOiYxhj5Wh is not a git repository
------------------------------------------------------------


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
ERROR: unexpected error - /tmp/tmp.hOiYxhj5Wh is not a git repository                                                                                   


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

/tmp/tmp.hOiYxhj5Wh via py36 
ERROR: unexpected error - /tmp/tmp.hOiYxhj5Wh is not a git repository

@skshetry Thanks for the reproducer! So it looks like we are effectively dealing with a broken .git here. I'm a bit hesitant to catch this error and fall back to NoSCM, as it might cause us to ignore real errors in the future. A broken git directory in a git repo seems like a very bad environment issue to me, so I don't feel it should be solved on dvc's side. Also, the initial "operation not permitted" error is even more serious (os.utime not working!), and no one knows what else would break there. So I'll close this issue for now, since there is a workaround of simply using that weird FS as an external cache directory.
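
For anyone who hits the same "is not a git repository" message, a rough sketch of a check for a leftover, incomplete .git directory like the one in the reproducer. The helper name classify_git_dir is hypothetical and this is not how DVC or GitPython validate repositories. It is only a heuristic that assumes a normally initialized repository has HEAD, objects/ and refs/ inside .git.

```python
import os


def classify_git_dir(root_dir):
    """Return "missing", "incomplete", or "plausible" for root_dir/.git.

    Heuristic only: a freshly initialized repository normally contains a
    HEAD file plus the objects/ and refs/ directories.
    """
    git_dir = os.path.join(root_dir, ".git")
    if not os.path.exists(git_dir):
        return "missing"
    if os.path.isfile(git_dir):
        # Worktrees and submodules use a .git *file* that points elsewhere.
        return "plausible"
    has_head = os.path.isfile(os.path.join(git_dir, "HEAD"))
    has_objects = os.path.isdir(os.path.join(git_dir, "objects"))
    has_refs = os.path.isdir(os.path.join(git_dir, "refs"))
    if has_head and has_objects and has_refs:
        return "plausible"
    return "incomplete"
```

An "incomplete" result in a repo created with dvc init --no-scm suggests that removing the stray .git directory is worth trying before looking for a problem in dvc itself.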

@skshetry You are right! I tried again with your comment in mind, and it turned out the dvc version -v error was caused neither by the weird FS nor by dvc itself. An incomplete '.git' directory must have been left over from the git init I ran before trying git status, and that caused the error. Thanks for making it clear.

I should have been more precise. Sorry for my misleading report about the dvc version command.

@midnightradio No worries! Glad you've found the cause! Thank you for the feedback! 🙂
