dvc: support .dvcignore

Created on 14 Jan 2019 · 7 Comments · Source: iterative/dvc

Same as .gitignore, but for dvc. It will tell dvc which paths to ignore when caching data. Would be extremely useful for ignoring auto-generated/temporary files/directories that are created as a side effect and don't carry anything useful.

The first iteration should make Repo.stages() skip paths matching the patterns listed in .dvcignore when collecting stages.

echo '/directory_with_millions_of_files' > .dvcignore
dvc status # should not enter `directory_with_millions_of_files`
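
To make the intended behaviour concrete, here is a minimal sketch of a stage collector that never descends into ignored directories. It is not DVC's implementation: the helper names (load_patterns, is_ignored, collect_stage_files) are hypothetical, stage files are assumed to be any file ending in .dvc, and matching uses Python's fnmatch rather than full .gitignore semantics (no negation, no `**`). The .dvc/ directory is skipped unconditionally, as suggested later in this thread.

```python
import fnmatch
import os


def load_patterns(root, filename=".dvcignore"):
    """Read non-empty, non-comment lines from the ignore file, if present."""
    path = os.path.join(root, filename)
    if not os.path.exists(path):
        return []
    with open(path) as fobj:
        lines = (line.strip() for line in fobj)
        return [line for line in lines if line and not line.startswith("#")]


def is_ignored(rel_path, patterns):
    """Simplified matching (an assumption, not gitignore-complete):
    a leading '/' anchors the pattern to the repo root, anything else
    is matched against the base name only."""
    rel_path = rel_path.replace(os.sep, "/")
    for pattern in patterns:
        if pattern.startswith("/"):
            if fnmatch.fnmatch(rel_path, pattern.strip("/")):
                return True
        elif fnmatch.fnmatch(os.path.basename(rel_path), pattern.rstrip("/")):
            return True
    return False


def collect_stage_files(root):
    """Yield *.dvc files under root without entering ignored directories."""
    patterns = load_patterns(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)

        def rel(name):
            return name if rel_dir == "." else os.path.join(rel_dir, name)

        # Editing dirnames in place stops os.walk from entering those dirs,
        # so an ignored directory with millions of files is never listed.
        dirnames[:] = [
            d for d in dirnames
            if d != ".dvc" and not is_ignored(rel(d), patterns)
        ]
        for name in filenames:
            if name.endswith(".dvc") and not is_ignored(rel(name), patterns):
                yield os.path.join(dirpath, name)
```

The important design choice is pruning at the directory level rather than filtering files after the walk: filtering afterwards would still pay the cost of listing everything inside the ignored directory.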

Second iteration should support dvc add/run. We have a separate issue for it at https://github.com/iterative/dvc/issues/1876

enhancement feature request p2-medium

All 7 comments

@efiop could you please clarify a bit more? Is it for users mostly or for internal DVC usage? Should we include .dvc/ into .dvcignore in the case of internal usage?

It is for users mostly as a convenient way to ignore some paths/files/dirs from being cached/tracked by dvc. Good point about .dvc/ though, we will act as if it is in .dvcignore by default :)

Is this somehow related to https://github.com/iterative/dvc/issues/1471?

I saw in the chat that the problem was that the MD5 of a directory changed because the file system created a .DS_Store file in it, so dvc detected the directory as changed.

I can see the same stuff happening for other things if you are not careful enough (e.g. vim swap files, file locking mechanisms that create dotfiles, IDE specific files / .tags, etc.)

If we are introducing .dvcignore it would also be great to have a global .dvcignore the same way you can have a global .gitignore
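
The .DS_Store problem described above can be reproduced with a small sketch. This is not how DVC computes directory checksums; dir_md5 and file_md5 are hypothetical helpers that hash a sorted listing of relative paths and per-file MD5s, and the ignore argument stands in for patterns that would come from .dvcignore.

```python
import fnmatch
import hashlib
import os


def file_md5(path):
    """MD5 of a file's contents, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(1024 * 1024), b""):
            md5.update(chunk)
    return md5.hexdigest()


def dir_md5(root, ignore=()):
    """MD5 over the sorted 'relative/path checksum' entries of a directory.

    Files whose base name matches any pattern in `ignore` are left out,
    so they cannot influence the result.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in ignore):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            entries.append("{} {}".format(rel, file_md5(path)))
    listing = "\n".join(sorted(entries))
    return hashlib.md5(listing.encode("utf-8")).hexdigest()
```

With no ignore patterns, a stray .DS_Store (or a vim swap file) changes the directory checksum even though no data changed; calling dir_md5(root, ignore=[".DS_Store", "*.swp"]) keeps the checksum stable.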

@mroutis Not related to #1471; it is exclusively about dvc tracking files.

Great idea about a global .dvcignore! I didn't know .gitignore could be global.

I confirm that .DS_Store files are a pain when running dvc add on directories on macOS, and I completely agree on the need for a .dvcignore file.

I think an equally pressing issue is that having a large (4 million files) un-cached folder slows down dvc, as it needs to traverse the whole folder before executing any command.

Adding support for .dvcignore would add the required capability to address this issue.

Please create a ticket or a page to document the changes.
