Dvc: `dvc add` error using wildcard patterns on Windows?

Created on 12 Oct 2020 · 7 Comments · Source: iterative/dvc

Bug Report

`dvc add` fails when I use a wildcard pattern. It wrongly appends the .dvc suffix to my pattern instead of to the matching files.

Please provide information about your setup

Output of dvc version:

$ dvc version
Platform: Python 3.8.3 on Windows-10-10.0.18362-SP0
Supports: http, https, ssh
Cache types: hardlink
Cache directory: NTFS on D:\
Workspace directory: NTFS on D:\
Repo: dvc, git

Additional Information (if any):

If applicable, please also provide the --verbose output of the command, e.g. dvc add --verbose.

Labels: awaiting response, feature request, good first issue, help wanted, p3-nice-to-have, windows

All 7 comments

@karajan1001 What does that error say, btw? And could you also provide the verbose output, please?

So far it looks like the problem is in how your shell handles the wildcard pattern. It does not seem to expand * and ? at all and passes them to dvc as is. DVC itself doesn't support wildcard patterns at all; we simply rely on your shell to expand them and pass the resulting list to dvc add. E.g. in bash, dvc add *.mov actually results in dvc add 1.mov 2.mov ... 10.mov, so dvc gets the list of files and not the original pattern.
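
To check what a shell actually hands to a program, a tiny script like the one below can help. It is only a diagnostic sketch, not part of DVC: `unexpanded_patterns` and `WILDCARD_CHARS` are hypothetical names made up for this example.

```python
import sys

# Characters that a globbing shell such as bash would normally expand.
WILDCARD_CHARS = "*?["


def unexpanded_patterns(args):
    """Return the arguments that still contain wildcard characters."""
    return [arg for arg in args if any(ch in arg for ch in WILDCARD_CHARS)]


if __name__ == "__main__":
    # Show the raw arguments exactly as the shell passed them in.
    print("received:", sys.argv[1:])
    print("unexpanded:", unexpanded_patterns(sys.argv[1:]))
```

Saved under any file name and run with `*.mov` as its argument, it prints the individual file names if the shell expanded the pattern, and the literal `*.mov` if it did not. Note that bash also passes the pattern through literally when nothing matches it.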

Yes, obviously DVC didn't get the correct file list. This might be an issue with (1) the environment or (2) a package DVC relies on, rather than with DVC itself. But

  1. Git didn't suffer from the same issue in the same directory.
  2. Wildcards are widely used in daily work, so some other DVC commands may have the same problem.

@karajan1001 Great point about git. Git indeed supports some globbing natively, so to match it we also need to pass the targets through Python's glob.glob. The use case only covers shells that don't support globbing natively (I'm surprised PS didn't do that, maybe I'm missing something), so it is pretty limited :slightly_frowning_face:
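
As a rough illustration of passing targets through the standard `glob` module, here is a minimal sketch. The function name `expand_targets` is hypothetical and this is not DVC's actual code; keeping patterns that match nothing is a design assumption of the sketch.

```python
import glob


def expand_targets(targets):
    """Expand wildcard patterns the way a globbing shell would.

    A target that matches nothing is kept as is, so the caller can still
    report a clear "does not exist" error for it.
    """
    expanded = []
    for target in targets:
        # Sort the matches so the result order is deterministic.
        matches = sorted(glob.glob(target))
        expanded.extend(matches if matches else [target])
    return expanded
```

One caveat with this approach: a file whose real name contains `*`, `?` or `[` would be interpreted as a pattern, so real code would probably need a way to opt out of expansion.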

@efiop
I tested on my computer, and PowerShell didn't expand the patterns.

According to Stack Overflow, we have to implement wildcard expansion ourselves.

@karajan1001 Thanks for the research! :pray: So we do need to pass the targets through glob.glob to implement that. We could start by doing that only in dvc/repo/add.py, but there might be a better way to do it everywhere. We could add a custom argparse action that passes the targets through glob.glob, but it seems more fitting to implement it at the API level (dvc/repo/) instead of just the CLI (dvc/command/).
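
For reference, the CLI-level option mentioned above could look roughly like this. It is a sketch only: the `GlobTargets` class, `build_parser` and the `add-sketch` program name are hypothetical and do not come from the DVC code base.

```python
import argparse
import glob


class GlobTargets(argparse.Action):
    """Hypothetical argparse action that expands wildcard targets."""

    def __call__(self, parser, namespace, values, option_string=None):
        expanded = []
        for value in values:
            matches = sorted(glob.glob(value))
            # Keep patterns that match nothing so later code can report them.
            expanded.extend(matches or [value])
        setattr(namespace, self.dest, expanded)


def build_parser():
    # Stand-in parser for illustration, not the real dvc command line.
    parser = argparse.ArgumentParser(prog="add-sketch")
    parser.add_argument("targets", nargs="+", action=GlobTargets)
    return parser
```

An action like this only helps callers that go through the command line. Code that calls the Python API directly would bypass it, which is the reason given above for preferring the API level.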

Related https://github.com/iterative/dvc/issues/4419

+1 on this feature!
