Same as .gitignore, but for dvc. It will tell dvc which paths to ignore when caching data. This would be extremely useful for ignoring auto-generated or temporary files and directories that are created as a side effect and carry nothing useful.
The first iteration should make Repo.stages() skip paths that match the patterns listed in .dvcignore when collecting stages.
echo '/directory_with_millions_of_files' > .dvcignore
dvc status # should not enter `directory_with_millions_of_files`
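The pruning behaviour requested above can be sketched in a few lines of Python. This is only an illustration, not DVC's implementation: `walk_with_ignore` and `is_ignored` are hypothetical names, and the matching is a simplification that compares `fnmatch` patterns against base names only, not full .gitignore semantics.

```python
import fnmatch
import os


def is_ignored(name, patterns):
    """Return True if a file or directory name matches any ignore pattern."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)


def walk_with_ignore(root, patterns):
    """Yield file paths under root without ever entering ignored directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to descend into
        # the removed directories, so their contents are never listed.
        dirnames[:] = [d for d in dirnames if not is_ignored(d, patterns)]
        for name in filenames:
            if not is_ignored(name, patterns):
                yield os.path.join(dirpath, name)
```

The key point is that the ignored directory is dropped before the walk descends into it, so the cost of a directory with millions of files is never paid.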
The second iteration should support dvc add/run. We have a separate issue for it at https://github.com/iterative/dvc/issues/1876
@efiop could you please clarify a bit more? Is it mostly for users or for internal DVC usage? Should we include .dvc/ in .dvcignore in the case of internal usage?
It is mostly for users, as a convenient way to keep some paths/files/dirs from being cached/tracked by dvc. Good point about .dvc/ though, we will act as if it is in .dvcignore by default :)
Is this somehow related to https://github.com/iterative/dvc/issues/1471 ?
I saw in the chat that the problem was that the MD5 of a directory changed because the file system created a .DS_Store file in it, so the directory was detected as changed.
I can see the same thing happening in other cases if you are not careful enough (e.g. vim swap files, file-locking mechanisms that create dotfiles, IDE-specific files / .tags, etc.)
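To make the .DS_Store problem concrete, here is a minimal sketch of a directory checksum that skips ignored names. It is not DVC's actual hashing scheme: `dir_md5` is a hypothetical helper that simply hashes relative paths and file contents in a fixed order, and the `ignore` patterns are matched against base names with `fnmatch`.

```python
import fnmatch
import hashlib
import os


def dir_md5(root, ignore=()):
    """Hash relative paths and contents of all non-ignored files under root."""
    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place for a deterministic traversal order
        for name in sorted(filenames):
            if any(fnmatch.fnmatch(name, pattern) for pattern in ignore):
                continue  # ignored files do not contribute to the checksum
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode("utf-8"))
            with open(path, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()
```

With `ignore=[".DS_Store"]`, a stray .DS_Store file leaves the checksum unchanged; without it, the same file makes the directory look modified.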
If we are introducing .dvcignore, it would also be great to have a global .dvcignore, the same way you can have a global .gitignore.
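One way a global file could combine with a per-repository one is to read both and concatenate their patterns. The sketch below assumes exactly that and nothing more: `load_patterns` is a hypothetical helper, the file paths are supplied by the caller, and where a global .dvcignore would actually live is left open.

```python
import os


def load_patterns(*paths):
    """Collect ignore patterns from several files, in the order given."""
    patterns = []
    for path in paths:
        if not os.path.isfile(path):
            continue  # a missing file simply contributes no patterns
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                # Skip blank lines and comment lines, as .gitignore does.
                if line and not line.startswith("#"):
                    patterns.append(line)
    return patterns
```

A caller would pass the global file first and the repository's .dvcignore second, so that repository-specific patterns come last in the combined list.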
@mroutis Not related to #1471; this issue is exclusively about which files dvc tracks.
Great idea about a global .dvcignore! I didn't know .gitignore could be global.
I confirm that .DS_Store files are a pain when dvc add-ing directories on macOS, and I completely agree with the need for a .dvcignore file.
I think an equally pressing issue is that having a large (4 million files) un-cached folder slows down dvc, as it needs to traverse the whole folder before executing any command.
Adding support for .dvcignore would add the required capability to address this issue.
Please create a ticket or a page to document the changes.