Dvc: ui: add a way to undo `dvc add` easily

Created on 5 Oct 2019  ·  9 Comments  ·  Source: iterative/dvc

Maybe repurpose dvc remove for this, which would at least be somewhat intuitive.

The hard part here is removing the added file from the cache: dvc gc is dangerous right now and might be slow, and simply removing the cache file doesn't account for possible other references to it.
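To make the "other refs" concern concrete, here is a minimal Python sketch of the check that a safe undo would need. It is not DVC's implementation: the function name `hashes_safe_to_delete` and the `refs` mapping (DVC-file name to the set of cache hashes its outputs point to) are hypothetical and exist only for illustration.

```python
from typing import Dict, Set


def hashes_safe_to_delete(target: str, refs: Dict[str, Set[str]]) -> Set[str]:
    """Return the cache hashes referenced by `target` and by no other DVC-file.

    `refs` is a hypothetical mapping from DVC-file name to the hashes of its
    outputs; building it from real DVC-files is out of scope for this sketch.
    """
    # Collect every hash that some other DVC-file still points to.
    still_used: Set[str] = set()
    for name, hashes in refs.items():
        if name != target:
            still_used |= hashes
    # Only hashes that nobody else references can be dropped from the cache.
    return refs.get(target, set()) - still_used


refs = {"a.dvc": {"h1", "h2"}, "b.dvc": {"h2"}}
print(hashes_safe_to_delete("a.dvc", refs))  # h2 is shared with b.dvc, so it is kept
```

Note that this sketch only sees the DVC-files it is handed. References from other branches or commits would not be visible to it, which is part of why the problem is hard.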

This is inspired by https://github.com/iterative/dvc.org/issues/625 and a corresponding Stack Overflow question.

discussion ui

All 9 comments

@Suor Are you suggesting that dvc remove file.dvc should remove the cached outputs of file.dvc and file.dvc itself, but not touch the files in the workspace? This would indeed undo dvc add file. Maybe it should also remove the output files' entries from .gitignore, to be exactly the undo operation.
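The .gitignore part of that undo could look roughly like the Python sketch below. The helper name `drop_gitignore_entry` is hypothetical, and the sketch assumes the entry was written either as the bare output name or with a leading slash (for example `/data.xml`); it works on the file's text only and does not touch the cache or the DVC-file.

```python
def drop_gitignore_entry(gitignore_text: str, output_name: str) -> str:
    """Return `gitignore_text` without the line that ignores `output_name`.

    Assumption: the entry is either `output_name` or `/output_name`.
    All other lines are kept in their original order.
    """
    wanted = {output_name, "/" + output_name}
    kept = [
        line
        for line in gitignore_text.splitlines()
        if line.strip() not in wanted
    ]
    # Keep a trailing newline unless the file ends up empty.
    return "\n".join(kept) + ("\n" if kept else "")


print(drop_gitignore_entry("/data.xml\n/other.csv\n", "data.xml"))
```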

@dashohoxha basically yes. We need to decide on retiring current dvc remove behaviour first though.

@Suor: We need to decide on retiring current dvc remove behaviour first though.

@shcheklein has already asked for dvc remove to be deprecated/removed. I proposed that maybe we should improve it, instead of removing. But now I think that your approach of re-purposing it is better.

Hi, what about dvc reset to follow git naming?

@mazzma12 The analogy between Git and DVC is not perfect because DVC does not have an intermediate staging step before you commit. When you run dvc add, it also does an implicit dvc commit (saving the file to the cache). To remove a file from Git that was committed by mistake, you actually need git rm --cached file && git commit --amend -CHEAD (https://help.github.com/en/enterprise/2.15/user/articles/removing-files-from-a-repositorys-history#removing-a-file-added-in-the-most-recent-unpushed-commit)

But dvc reset makes sense to me, since dvc remove might seem a bit scary to a beginner. Even dvc untrack would be ok for me.

git reset is a bad name, we shouldn't use it as a guide.

git reset is a bad name, we shouldn't use it as a guide

Maybe it's bad, but the functionality will be understandable to end users without much explanation.

Before we discuss the name, let's try to come up with a description of what this command is supposed to do, and in which cases. Then it should be easier to discuss whether reset is relevant or not. As often happens, I think we have somewhat different implementations/behaviors in mind.

Questions I have in mind:

  • does it accept DVC-file, output path?
  • what does it do in case of no arguments (default)?
  • does it only affect cache and/or workspace?

I'm just getting started with DVC so I will miss a lot of things, but maybe my impression as a new user helps.

As I followed the dvc.org video tutorial I accidentally ran dvc add data instead of dvc add data/data.xml. As an inexperienced dvc (but experienced git) user, I tried undoing my mistake with dvc rm data. I would have been fine if rm actually deleted my data; if it did, I would have expected an option similar to git rm --cached.

I tried dvc remove but was irritated since I know nothing yet about stage entries. dvc remove data told me:

ERROR: failed to remove 'data' - 'dvc.yaml' does not exist.

It was frustrating to be able to add a file, but removing it requires some additional file I know nothing about. But I guess this is fine if the command will be improved or deprecated anyway.


does it accept DVC-file, output path?

Since I don't yet know much about the *.dvc files, I'm intuitively trying to run commands on my paths / files, not on DVC-managed files (I wouldn't run git commands on the .git/ directory either).

what does it do in case of no arguments (default)?

Just show usage as git rm does?
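The two answers above (accept an output path as well as a DVC-file, and show usage when no arguments are given) can be sketched in Python as follows. This is only an illustration of the proposed behaviour, not an existing DVC command: the program name `dvc-untrack` and both function names are hypothetical, and the sketch assumes the DVC-file for an output is named by appending `.dvc` to its path.

```python
import argparse


def resolve_dvc_file(target: str) -> str:
    """Map a user-supplied target to the DVC-file that tracks it.

    Assumption: the DVC-file for `data/data.xml` is `data/data.xml.dvc`.
    A target that already ends in `.dvc` is returned unchanged.
    """
    if target.endswith(".dvc"):
        return target
    return target.rstrip("/") + ".dvc"


def build_parser() -> argparse.ArgumentParser:
    # "dvc-untrack" is a placeholder name, not a real DVC command.
    parser = argparse.ArgumentParser(prog="dvc-untrack")
    # nargs="+" makes argparse print usage and exit when no target is given,
    # similar in spirit to what git rm does.
    parser.add_argument("targets", nargs="+", help="output paths or DVC-files")
    return parser


args = build_parser().parse_args(["data/data.xml", "data.dvc"])
print([resolve_dvc_file(t) for t in args.targets])
```

Accepting plain paths keeps the command usable for someone who has not yet learned what a DVC-file is, which is the situation described above.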

I'll just remove the whole hello-world and will finish the tutorial first, but maybe my new-user experience is of some help :)
