Dvc: .dvcignore is broken on negation when blacklisting all

Created on 24 Jun 2020 · 5 Comments · Source: iterative/dvc

Bug Report

Please provide information about your setup

I am not sure how far this extends, but many of the .dvcignore patterns I tried failed when I blacklisted everything and then tried to whitelist some paths:

*
!scripts

*
!/scripts

/*
!/scripts/

/*
!/scripts/*

/*
!/scripts/**

/*
!scripts/

What worked:

/*
!/scripts

Why do I feel suddenly the spelling of scripts is wrong? :smile:

bug p2-medium research

All 5 comments

CC @pared is this expected?

Since we have been designing .dvcignore to comply with .gitignore, I will use git's behaviour as the reference.
It seems to me that we have some discrepancies in behaviour between git and dvc. I prepared a script to illustrate that:

#!/bin/bash

rm -rf repo
mkdir repo

pushd repo

git init --quiet
dvc init --quiet

git commit -am "ble"

mkdir data
echo 1 >> data/1
echo 2 >> data/2
echo 3 >> data/3
echo not_ignored >> data/file
mkdir data/dir
echo not_ignored >> data/dir/file

echo 'dvci*' >> .gitignore 

echo unrelated >> dvci0

echo '*' > dvci1

echo '*' > dvci2
echo '!/dir' >> dvci2
echo '!/file' >> dvci2

echo '*' > dvci3
echo 'dir' >> dvci3
echo 'file' >> dvci3

echo '/*' > dvci4
echo '!/dir/' >> dvci4
echo '!/file/' >> dvci4

echo '/*' > dvci5
echo '!/dir/*' >> dvci5

echo '/*' > dvci6
echo '!/dir/**' >> dvci6

echo '/*' > dvci7
echo '!dir/' >> dvci7
echo '!file/' >> dvci7

echo '/*' > dvci8
echo '!/dir' >> dvci8
echo '!/file' >> dvci8


for i in {0..8}
do
    echo '### START ###'
    echo 'ignore content:'
    cat dvci$i
    echo '----------------'
    cp dvci$i data/.dvcignore
    echo 'dvc results:' 
    python -c "from dvc.repo import Repo;print(list(Repo().tree.walk_files('data')))"


    # git test
    echo ''
    echo 'git results:'
    cp dvci$i data/.gitignore
    res=()
    for value in "data/1" "data/2" "data/3" "data/dir/file" "data/file"
    do
        if [[ ! $(git check-ignore $value) ]]; then
            res+="'$value', "
        fi
    done
    echo "[${res[@]}]"

    echo '### STOP ###'
    echo ''
    echo ''
done

The two differences occur in the following situations:

ignore content:
/*
!/dir/
!/file/
----------------
dvc results:
[]

git results:
['data/dir/file', ]

and

ignore content:
/*
!dir/
!file/
----------------
dvc results:
[]

git results:
['data/dir/file', ]
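
To make the git results above easier to follow, here is a minimal pure-Python sketch of the gitignore rules that seem to be in play: the last matching pattern wins, a trailing slash matches directories only, and nothing below an excluded directory can be re-included. This is a simplified model written for illustration, not git's, DVC's, or PathSpec's implementation. The names `_matches` and `is_ignored` are hypothetical, `**` is not handled, and paths are taken relative to the directory holding the ignore file (`data/` in the script).

```python
from fnmatch import fnmatchcase


def _matches(pattern, rel_path, is_dir):
    """Match one pattern (negation already stripped) against a relative path."""
    dir_only = pattern.endswith("/")
    body = pattern.rstrip("/")
    # A slash at the start or in the middle anchors the pattern
    # to the directory that holds the ignore file.
    anchored = "/" in body
    body = body.lstrip("/")
    if dir_only and not is_dir:
        return False
    path_parts = rel_path.split("/")
    if not anchored:
        # Unanchored patterns are compared against the last component only.
        return fnmatchcase(path_parts[-1], body)
    pattern_parts = body.split("/")
    return len(pattern_parts) == len(path_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


def is_ignored(patterns, rel_path):
    """Return True if the file at rel_path is ignored under this simplified model."""
    parts = rel_path.split("/")
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        is_dir = depth < len(parts)  # every component but the last is a directory
        ignored = False
        for pattern in patterns:
            negated = pattern.startswith("!")
            body = pattern[1:] if negated else pattern
            if _matches(body, prefix, is_dir):
                ignored = not negated  # the last matching pattern wins
        if ignored:
            # Once a parent directory (or the file itself) is excluded,
            # nothing below it can be re-included.
            return True
    return False


files = ["1", "2", "3", "dir/file", "file"]
patterns = ["/*", "!/dir/", "!/file/"]
kept = [f for f in files if not is_ignored(patterns, f)]
print(kept)  # prints ['dir/file']
```

Under this model `!/dir/` re-includes the directory `dir`, and `dir/file` is never matched by `/*` because that pattern only covers direct children. `!/file/` has no effect on the regular file `file` because of its trailing slash. That is consistent with the git output above, which supports reading the empty dvc result as the deviation.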

I also compared against the results from before the latest dvc optimization (https://github.com/iterative/dvc/pull/3967); they were the same as for master.

So, in conclusion, I think it's a bug.

I compared the result to #4120 and there is no difference.

@karajan1001, I think this issue is with the upstream library, as this issue looks similar to: https://github.com/cpburnz/python-path-specification/issues/19, but I haven't looked closely to verify.

@skshetry I agree.

I looked into the PathSpec code, and the following result looks strange to me: it is very different from the logic I inferred from the PathSpec code.

ignore content:
/*
!dir/
!file/


git results:
['data/dir/file', ]
