DVC usage: 'dvc add .'

Created on 5 Mar 2020 · 7 comments · Source: iterative/dvc

DVC version: 0.84

Is there a way to run a single command and have DVC update all my tracked folders?
Normally with Git I just run git add . and all my files, except the ones listed in .gitignore, are staged. In .gitignore I write a '* except ...' style pattern. It's simple and just works.
But with DVC I have 5 tracked files, and there's no way to run dvc add . with a .dvcignore file the same way I do with Git. The command tries to add the parent of the folder I'm working in:

(tf-od) vntdeca@WW-LG-9121:~/visionexperiments$ dvc add .
Adding...
ERROR: stage working or file path '/home/vntdeca' is outside of DVC repo
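One possible reading of that error, offered as an assumption rather than a confirmed diagnosis: if DVC writes the .dvc file for a target next to the target (that is, in the target's parent directory), then for . at the repo root that location is the repo's parent, /home/vntdeca here, which is outside the repo. A minimal Python sketch of such a check; stage_file_dir, is_inside_repo and check_add_target are hypothetical helpers, not DVC's API:

```python
import os


def stage_file_dir(target: str) -> str:
    """Directory where a .dvc file for `target` would be written.

    ASSUMPTION (simplification): the .dvc file goes next to the target,
    i.e. into the target's parent directory.
    """
    return os.path.dirname(os.path.abspath(target))


def is_inside_repo(path: str, repo_root: str) -> bool:
    """True if `path` is the repo root itself or somewhere below it."""
    path = os.path.abspath(path)
    repo_root = os.path.abspath(repo_root)
    # commonpath compares whole components, so /tmp/projX is not inside /tmp/proj
    return os.path.commonpath([path, repo_root]) == repo_root


def check_add_target(target: str, repo_root: str) -> None:
    """Hypothetical pre-check that mirrors the error message above."""
    wdir = stage_file_dir(target)
    if not is_inside_repo(wdir, repo_root):
        raise ValueError(
            f"stage working or file path '{wdir}' is outside of DVC repo"
        )
```

Under this assumption, check_add_target("folder1", repo_root) passes, while check_add_target(".", repo_root) run from the repo root fails, because the parent of the root is outside it.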

Normally, I follow the process:

git add .
dvc add folder1
dvc add folder2 #and so on for all dvc tracked files...
git commit -m "..."
dvc push
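A small Python sketch of the same routine, so the folder list lives in one place. update_commands and run are hypothetical helpers: the first only builds the command lines, the second prints them and executes them only when dry_run=False. It places git add after the dvc add calls, the order suggested later in the thread, so the freshly written .dvc files get staged:

```python
import shlex
import subprocess
from typing import List, Sequence


def update_commands(folders: Sequence[str], message: str) -> List[List[str]]:
    """Build the add/commit/push sequence for a list of DVC-tracked folders."""
    cmds = [["dvc", "add", folder] for folder in folders]  # one dvc add per folder
    cmds.append(["git", "add", "."])                       # stage the updated .dvc files
    cmds.append(["git", "commit", "-m", message])
    cmds.append(["dvc", "push"])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, run(update_commands(["folder1", "folder2"], "update my data")) just prints the five commands; pass dry_run=False to actually run them.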

Am I using it wrong, or is it the command's fault?

Labels: discussion, enhancement

All 7 comments

@denisb411 There is no way to avoid the initial explicit add with DVC. I'm not sure how dvc add . would even work: how would it determine the difference between small and large files?

So, the first time you have to run:

dvc add folder1
dvc add folder2
...

There is, however, a command that can kind of update directories already tracked by DVC: dvc commit

# add a new file to folder1
# remove a file from folder2
dvc commit
# folder1.dvc updated, new files saved to the DVC cache (.dvc/cache)
# folder2.dvc updated
git commit folder1.dvc folder2.dvc -m "update my data"
git push 
dvc push

Would something like this satisfy your needs?
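To make "dvc commit updates directories that are already tracked" more concrete, here is a simplified Python sketch of change detection: hash every file, fold the sorted (path, hash) pairs into one directory hash, and compare it with the hash recorded earlier. This is in the spirit of the hash a .dvc file records, but the exact hashing scheme and the helper names below are assumptions, not DVC's implementation:

```python
import hashlib
import os
from typing import Dict, List


def file_md5(path: str) -> str:
    """md5 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def dir_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its md5."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = file_md5(full)
    return manifest


def dir_hash(root: str) -> str:
    """One hash for the whole directory, derived from its sorted manifest."""
    h = hashlib.md5()
    for rel, md5 in sorted(dir_manifest(root).items()):
        h.update(f"{rel}\0{md5}\n".encode())
    return h.hexdigest()


def changed_dirs(recorded: Dict[str, str]) -> List[str]:
    """Tracked directories whose current hash differs from the recorded one."""
    return [d for d, old in recorded.items() if dir_hash(d) != old]
```

Adding a file to folder1 or removing one from folder2 changes that directory's hash, which is the signal that its .dvc file needs updating.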

@shcheklein I was thinking that the idea behind dvc add . is to add everything except the files listed in .dvcignore. I do this with Git.

For example, in .gitignore I have:

# Ignore everything
*

# But not these files...
!*.ipynb

And it works pretty well when I run git add .

I tried dvc commit, expecting it to update all my files (the documentation says it does), but it keeps forcing me to answer [Yes/No] for each file that was excluded (or something like that), which is very annoying: there's no way to answer yes for every file at once. dvc add doesn't behave this way, though.

Also, why does the documentation everywhere say to run dvc commit after git commit? Isn't the process inverted? DVC updates the .dvc files for you, and then you commit those to Git. That's what I understood about the tool after watching several videos and reading the documentation.


I don't know if it's bad usage on my part or a UX fault.

@denisb411
I think you are right that we have this inverted in our documentation. @shcheklein Would you agree?

As for committing itself: there is a -f flag that forces overwriting, so you can use it if you want to commit without the prompt.
e.g.:
dvc commit -f data.dvc

> Also, why does the documentation everywhere say to run dvc commit after git commit? Isn't the process inverted? DVC updates the .dvc files for you, and then you commit those to Git. That's what I understood about the tool after watching several videos and reading the documentation.

Great catch! This is a bug... It would be great if you could send a PR; otherwise we will take care of it as a regular update soon (cc @jorgeorpinel).

> I was thinking that the idea behind dvc add . is to add everything except the files listed in .dvcignore

OK, let me try to clarify a bit.

So, let's say I have a folder3 and I haven't run dvc add folder3 on it yet... should it add it as well?

Or say I have README.md in the workspace: should it add that?

@shcheklein Using this example, when you run dvc add . you wouldn't have to worry about folder3 or README.md being added by mistake, because the .dvcignore file would contain something like:

# ignore everything
*

# except...
!folder1
!folder2

So every time you run dvc add ., it would update only folder1 and folder2 because of the .dvcignore. Currently it doesn't work like this... but this is Git's normal behavior.
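The "ignore everything, then re-include" rule can be sketched in Python as gitignore-style matching where the last matching pattern wins and a leading ! re-includes what earlier lines ignored. This is a simplified assumption of how such a .dvcignore could be applied to the top-level entries of the workspace (no directory anchoring, no escapes, no nested paths); parse_ignore, is_ignored and entries_to_add are hypothetical names:

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def parse_ignore(text: str) -> List[str]:
    """Keep non-blank lines; a comment only counts when '#' starts the line."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(name: str, patterns: List[str]) -> bool:
    """Last matching pattern wins; '!pattern' re-includes a matching name."""
    ignored = False
    for pat in patterns:
        negate = pat.startswith("!")
        body = pat[1:] if negate else pat
        if fnmatchcase(name, body.rstrip("/")):
            ignored = not negate
    return ignored


def entries_to_add(entries: Iterable[str], patterns: List[str]) -> List[str]:
    """Top-level entries a hypothetical `dvc add .` would pick up."""
    return [e for e in entries if not is_ignored(e, patterns)]
```

With the .dvcignore above, entries_to_add(["folder1", "folder2", "folder3", "README.md"], rules) keeps only folder1 and folder2. Because comments must start the line, a line such as "* #ignore everything" would be read as one literal pattern rather than "*" plus a comment.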

Let's not get into whether this is right or wrong... I know a lot of people work this way with Git, including me, and I think DVC should behave the same way if the idea is for the tool to have a UX very similar to Git's.

@pared Thanks for the tip about the -f flag on dvc commit. I didn't know about it, and I think it's a workaround for the case I mentioned above.

> there's no way to run dvc add . with a .dvcignore file the same way I do with Git

> I'm not sure how dvc add . would even work

@shcheklein It could ignore anything tracked by Git AND any files in .dvcignore. So it seems to me this is a request for 2 separate enhancements. Not a bad idea! We might also consider a dvc add --all option (recursive) to complete the Git analogy. However...

> let's say I have a folder3 and I haven't run dvc add folder3 on it yet... should it add it as well?

> @shcheklein Using this example, when you run dvc add . you

I agree with Ivan here, though. There may be regular files not yet added to Git, and dvc add . would put them in .gitignore, so users would have to be very careful about that and know about .dvcignore, which isn't that common. It could become a regular source of confusion for most users.
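The selection rule proposed above (skip anything Git already tracks, skip anything matched by .dvcignore) can be sketched as a plain filter, which also makes the concern just raised easy to see. add_candidates and git_tracked_top_level are hypothetical helpers; the latter assumes it runs from the repo root and relies on git ls-files -z printing '/'-separated paths relative to that directory:

```python
import subprocess
from typing import Callable, Iterable, List, Set


def git_tracked_top_level(repo: str = ".") -> Set[str]:
    """Top-level names that contain at least one Git-tracked file."""
    out = subprocess.run(
        ["git", "ls-files", "-z"], cwd=repo, check=True,
        capture_output=True, text=True,
    ).stdout
    return {path.split("/", 1)[0] for path in out.split("\0") if path}


def add_candidates(workspace: Iterable[str], git_tracked: Set[str],
                   dvc_ignored: Callable[[str], bool]) -> List[str]:
    """Entries a hypothetical `dvc add .` might pick up under this proposal:
    everything not already tracked by Git and not matched by .dvcignore."""
    return [p for p in workspace
            if p not in git_tracked and not dvc_ignored(p)]
```

For example, add_candidates(["folder1", "folder2", "README.md", "notes.txt"], {"README.md"}, lambda p: p == "folder2") returns ["folder1", "notes.txt"]: a stray notes.txt that nobody meant to version with DVC gets selected, and would then end up in .gitignore.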

> dvc add folder2 #and so on for all dvc tracked files...
> git commit -m "..."

> Am I using it wrong

@denisb411 Probably not, but you seem to be missing a git add AFTER dvc add (to add the DVC-files). I'll assume that was just a typo 🙂

> dvc commit... it keeps forcing me to answer [Yes/No] for each file

Another good point. Maybe we should add a -y option to the command. Yes, there's -f, @pared, but it has other implications.
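A -y flag would only answer the prompts with yes, instead of forcing the overwrite the way -f does. A minimal Python sketch of that behavior; the flag is only a proposal in this thread, and confirm and commit_all are hypothetical names, not DVC code:

```python
from typing import Callable, List, Sequence


def confirm(question: str, assume_yes: bool = False,
            ask: Callable[[str], str] = input) -> bool:
    """Ask a [y/n] question unless assume_yes (a hypothetical -y) is set."""
    if assume_yes:
        return True  # skip the prompt entirely
    answer = ask(f"{question} [y/n] ").strip().lower()
    return answer in ("y", "yes")


def commit_all(outputs: Sequence[str], assume_yes: bool = False,
               ask: Callable[[str], str] = input) -> List[str]:
    """Return the outputs the user agreed to commit (one prompt per output)."""
    return [out for out in outputs
            if confirm(f"Commit changes to '{out}'?", assume_yes, ask)]
```

With assume_yes=True every output is committed without a single prompt, which is the behavior Denis was asking for; the ask parameter exists only so the prompting can be exercised without a terminal.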

> I know a lot of people work this way with Git

Denis, DVC is not Git, and since it works on top of Git, we can't always expect the exact same behavior; it just doesn't always make sense, as explained above.

So, in conclusion, the main proposal here is interesting but not fully convincing.
I think we should consider adding a -y flag to dvc commit, though. Thoughts?


P.S. I'll look into the documentation issue ⏳

> Also, why does the documentation everywhere say to run dvc commit after git commit? Isn't the process inverted? DVC updates the .dvc files for you, and then you commit those to Git

> you are right that we have this inverted in our documentation...
> Great catch! This is a bug...

Yes, we can definitely improve many of the explanations around dvc commit. Thanks for the heads-up, Denis! I'm addressing this in iterative/dvc.org/pull/1094.
