dvc update --rev hello_world file.dvc
is still not supported, but it seems like a very handy alternative to re-importing:
dvc import --rev hello_world https://github.com/dmpetrov/dataset file
Copying the main part of the discussion about this from Slack:
@Suor: When you do
dvc import -r ...
on an existing import stage, it rewrites the revision, thus switching a tag or a branch.
Dmitry: ...we need a shortcut for re-importing/import, and update looks like a reasonable alternative...
Alexander: You mean you want to avoid retyping the URL?
Dmitry: Yes. Also, update seems like a proper command name for updating the version.
I understand that switching branches is kind of an exception, but this is not what I usually do with imported datasets.
Updating vs. re-importing is definitely a source of confusion, and it's reflected in our docs (which also require the term "fixed-revision import"). See the following excerpts:
From https://dvc.org/doc/command-reference/import#example-fixed-revisions-re-importing:
If the Git revision moves (e.g. a branch), you may use dvc update to bring the data up to date. However, for typically static references (e.g. tags), or for SHA commits, in order to actually "update" an import, it's necessary to re-import the data instead, by using dvc import again without or with a different --rev. This will overwrite the import stage...
From https://dvc.org/doc/command-reference/update#examples:
For typically static references (e.g. tags), or for SHA commits, dvc update will not have any effect on the import. Refer to the re-importing example to learn how to "update" fixed-revision imports.
From https://dvc.org/doc/use-cases/data-registry#example (in an expandable section):
In order to actually "update" it, do not use dvc update. Instead, re-import the data
So I like the idea of dvc update --rev, because it would probably simplify all these explanations, although it would also require changing its command reference to explain that two types of updates are supported:
- keeping url AND rev (if present): only updating the actual data and the rev_lock field.
- changing rev AND looking for changes: updating the data and rev_lock as well.
Alternatively, we could introduce a new command or subcommand such as dvc import move to change an import stage's rev value, in order to then dvc update it.
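To make the two proposed update modes concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical model of an import stage that holds only the url, rev and rev_lock fields discussed here; ImportStage, update and the resolve callback are illustrative names, not DVC's actual internals, and the refs and SHAs are made up.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class ImportStage:
    """Hypothetical, simplified model of an import stage's repo fields."""
    url: str
    rev: Optional[str]  # branch, tag or SHA the user asked for (None = default branch)
    rev_lock: str       # exact commit the data was last fetched from


def update(
    stage: ImportStage,
    resolve: Callable[[str, Optional[str]], str],
    rev: Optional[str] = None,
) -> ImportStage:
    """Sketch of the two proposed update modes.

    Without `rev`: keep `url` and `rev`, refresh only `rev_lock`.
    With `rev`: switch `rev` first, then refresh `rev_lock` as well.
    `resolve(url, rev)` is a hypothetical stand-in for looking up the
    commit that `rev` currently points to in the source repository.
    """
    new_rev = stage.rev if rev is None else rev
    new_lock = resolve(stage.url, new_rev)
    return replace(stage, rev=new_rev, rev_lock=new_lock)


# Usage with a fake remote (hypothetical refs and SHAs, for illustration only).
REFS = {"master": "c3", "hello_world": "a1", "v2": "b2"}


def fake_resolve(url: str, rev: Optional[str]) -> str:
    # Ignores the URL and treats None as the default branch.
    return REFS[rev or "master"]


stage = ImportStage("https://github.com/dmpetrov/dataset", "hello_world", "a1")
print(update(stage, fake_resolve))            # static tag: rev_lock stays "a1"
print(update(stage, fake_resolve, rev="v2"))  # rev becomes "v2", rev_lock becomes "b2"
```

In this model, the first call shows why a plain update has no effect on a tag-pinned import, while passing a new rev switches the stage without retyping the URL, which is the shortcut being requested.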
Hey, I'll try to create a solution for this feature!
@dmpetrov hey, what is the expected behavior after dvc update --rev another-rev? Should it lock the original import to another-rev, or should it avoid side effects like that?
The current behavior is that it locks to another-rev, but it also pulls the rev from the remote each time. Shouldn't it just use the cache if the last commits are the same?
@ilgooz dvc update and dvc update --rev latest_rev should do the same, so yes, it should lock to another-rev, because that is how dvc update currently works too.
Do you mean that it is pulling the rev on each update? If so, that is how our current caching is implemented. There is no need to tackle that in this PR; feel free to create an issue for it though 🙂
Thanks, yes, I asked two different Qs and got my answers!
@efiop, why is this p0 btw?
@skshetry Just trying to unblock @andronovhopf. Thank you so much for the fix! 🙏