Checkout requires unnecessary user interaction if a data file was modified:
$ cd example-get-started
$ dvc pull
$ echo 111 >> data/data.xml
$ git checkout - # jump to a previous commit
$ dvc checkout
WARNING: data 'data/data.xml' exists. Removing before checkout.
83%|████████▎ |data/prepared 5/6 [00:00<00:00, 1.32kfile/s]
file 'data/data.xml' is going to be removed. Are you sure you want to proceed? [y/n]
### <---- waiting for an input
It feels like dvc checkout needs to be consistent with git checkout which does not modify changed files but just warns about it (even without risk of deleting data):
$ vi prepare.dvc
$ git checkout -
M prepare.dvc
Note: checking out '72e0f12cb2eca10f557c846bc706d0a5a321c9f3'.
...
Another advantage of the git-checkout is avoiding user interaction (y/n questions) which has benefits as discussed in https://github.com/iterative/dvc/issues/2027#issuecomment-530239931
EDITED:
To be consistent to git checkout DVC should fail when a modified file in the workspace was based on an updated version of a file (compared to a file which needs to be checkout).
So, two dvc checkout behaviors are proposed none of them with user prompt: keep a modified file in workspace or fail.
to be fair it is not exactly the same as git because you have to specify explicitly - to overwrite the file
git checkout does not do this by default as far as I remember. So, the question should be - should we ignore changed files? Should we fail (raise an exception)?
And it's def not a "bug" - it should "enhancement" and/or "research".
Also, this change (and the discussion we had around dvc run) should be consistent with each other and with other commands that have prompts now, or don't have anything at all (dvc repro removes all data w/o asking or failing, for example).
to be fair it is not exactly the same as
gitbecause you have to specify explicitly-to overwrite the file
Could you please explain what is the difference?
So, the question should be - should we ignore changed files? Should we fail (raise an exception)?
Why should we ignore or fail? Why don't we just behave as Git? What is the reason to change user behavior?
EDITED:
Also, this change (and the discussion we had around dvc run) should be consistent with each other and with other commands that have prompts now, or don't have anything at all (dvc repro removes all data w/o asking or failing, for example).
Absolutely! My proposal - we should get rid of prompts whenever it is possible.
Could you please explain what is the difference?
@dmpetrov dvc checkout is the same as git checkout(which does not behave the way you described). dvc checkout is _not_ the same as git checkout - which is btw doing this for me:
error: Your local changes to the following files would be overwritten by checkout:
<file-name>
Please commit your changes or stash them before you switch branches.
Aborting
Even git checkout - is _not_ overwriting data. To actually loose something you have to explicitly do something like:
git checkout -- file-name which is _very explicit_ and is very different from just running git checkout (dvc checkout in out case).
@shcheklein please explain the difference between git checkout abc1234 and git checkout -?
@dmpetrov
dvc checkoutis the same asgit checkout(which does not behave the way you described).dvc checkoutis _not_ the same asgit checkout -which is btw doing this for me:
It behaves the way I described. It just handles conflicts differently depending on file that needs to be checkout. Also, it does not ask user question in any of the cases.
because you have to specify explicitly
-to overwrite the file
It looks like this is the root cause of the miss understanding. - means the last commit. It behaves the same way as git checkout abc1234.
ok, agreed, git checkout w/o arguments is no-op (so it's hard to compare it exactly). What I wanted to say, that if you want for it to loose data you have to explicitly specify it (with -- <file name>)
What I wanted to say, that if you want for it to loose data you have to explicitly specify it (with
-- <file name>)
This is not related to this issue. This is an independent feature.
This is not related to this issue. This is an independent feature.
Probably, I don't follow you this time. My point is that git checkout - is not silently overwriting uncommitted changes, unless I'm missing something. It fails with a non-zero error code, right? So, it does not but just warns about it.
For example, non-zero exit code behavior forces you account for it when you write scripts. So, like a prompt - it won't be free in terms of scripting. It's exactly the same to my mind.
Also, link to #2027 (comment) is not relevant to my mind. While in dvc run you specify something explicitly (I would still prefer to fail at least there, because it can extremely painful to lose even intermediate, end results - which might happen easily, because right now dvc run is used for multiple purposes - creating, modifying and running stages), in dvc checkout you don't specify anything explicitly.
Probably, I don't follow you this time. My point is that
git checkout -is not silently overwriting uncommitted changes, unless I'm missing something. It fails with a non-zero error code, right? So, it does notbut just warns about it.
I did not propose overwriting uncommitted changes. I proposed getting rid of user interaction and getting to the parity with Git.
Just to make it clear... In some cases, Git keeps uncommitted changes in your workspace after checkout with no aborting the command. This happens when there are no file changes between the current commit and a commit you are jumping to. In other cases - it aborts checkout.
Both of the cases make sense. We should not introduce a new logic. Instead, we should reutilize users knowledge and intuition with Git. However, there might be a good reason to change the logic - I'd love to hear this.
Also, link to #2027 (comment) is not relevant to my mind.
It was a link to a specific comment (not the whole #2027) and to a discussion about user interaction in the command line.
So, to summarize and make it clear. Your suggestion is to abort all DVC commands that potentially overwrite uncommitted changes. If that is the case, then I'm totally fine with this.
This is my take on abort vs prompt:
set -eux) abort is less safer choice.Am I missing any other pros/cons?
Here I'm talking only about this particular issue - prompt in dvc checkout. Yes, I suggest replacing prompt in dvc checkout to abort in the cases when keeping a current file can lead to inconsystency\loosing data.
In general, I only suggest avoiding to use prompt. In some cases, it should be replaced by overwriting files in other - aborting commands.
This summarisation needs to be discussed separately from this issue. I've created a separate issue\question: https://github.com/iterative/dvc/issues/2498
@dmpetrov awesome! thanks. I would probably recommend updating the issue/ticker description to be more specific - like in your last comment. So that it won't be necessary to read it to the end (at least for me it was confusing as you can tell, haha :) ) And may be include the link to the new ticket - abort vs prompt as well.
@shcheklein thank you for the feedback - updated the description of the current issue a bit and added more details to #2498
Duplication. #2801 contains more details.