Extracted from https://github.com/iterative/dvc/issues/3369#issuecomment-589248972
This would be useful for dataset registries where each dataset is a directory that uses a different DVC remote. No default remote is set in this kind of project to help prevent people from accidentally pushing a dataset to the wrong remote (e.g. different S3 bucket keys). In a project like that, get and import won't work because they expect a default remote.
Possible implementation solutions
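1. Add --remote option to get/import: not great because requires manually inspecting the source project config file before being able to get/import.
2. get/import can try every remote in the source project config file sequentially and use the first one that contains the target data to download
3. New behavior in dvc remote default to be able to set default remotes for specific paths in a project (i.e. this would be done in the source data reg, not when getting/importing)
Thoughts?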
Add --remote option to get/import: not great because requires manually inspecting the source project config file before being able to get/import.
Will work, and is already in TODO :)
get/import can try every remote in the source project config file sequentially and use the first one that contains the target data to download
Kinda like how yum/apt/etc. try out different mirrors until something works. I like that idea! I would not jump into implementing it right away, but would give it a thought, since there might be some arch decisions that we need to figure out.
New behavior in dvc remote default to be able to set default remotes for specific paths in a project (i.e. this would be done in the source data reg, not when getting/importing)
It has been discussed previously somewhere, but in reality, it is not trivial to organise in a nice way. We either have to create some file that tells which paths are pushed where, or we have to abuse dvc-files and specify remotes inside them, which means that they will be overwritten on some operations, which is not great. There is a subrepo (--subdir) PR though, that solves that problem for monorepo cases, which is, I would say, one of the most popular scenarios in which users might want to have different remotes for different paths. So I would wait for --subdir and then see if there are new requests coming in.
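To make the "some file that tells which paths are pushed where" option concrete, here is a minimal Python sketch of how such a mapping could be resolved. It is only an illustration: the PATH_REMOTES mapping, the remote names, and the remote_for helper are hypothetical and not part of DVC. The assumption is that the deepest mapped directory containing a path wins.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping of registry directories to remote names.
# Nothing like this exists in DVC; it stands in for "some file that
# tells which paths are pushed where".
PATH_REMOTES = {
    "images": "images-remote",
    "images/raw": "raw-images-remote",
    "text": "text-remote",
}


def remote_for(path: str, mapping: dict) -> Optional[str]:
    """Return the remote of the deepest mapped directory containing `path`."""
    target = PurePosixPath(path)
    # Walk from the path itself up through its parents, so the first
    # match is the longest (most specific) prefix.
    for candidate in (target, *target.parents):
        remote = mapping.get(str(candidate))
        if remote is not None:
            return remote
    return None  # no mapping applies, so there is no remote to use


print(remote_for("images/raw/cat.png", PATH_REMOTES))
print(remote_for("audio/clip.wav", PATH_REMOTES))
```

Even in this toy form, the extra file has to be kept in sync with the directory layout, which is the maintenance cost mentioned above.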
Add --remote option
already in TODO :)
Nice, I didn't know. Is there an issue or PR for it? This could be the short-term solution.
get/import can try every remote
I like that idea! ... there might be some arch decisions
arch decisions?
There is a subrepo (--subdir) PR though
Yeah, this is an alternative solution but adds complexity (managing parent vs children DVC projects).
Thanks Ruslan!
Nice, I didn't know. Is there an issue or PR for it? This could be the short-term solution.
@jorgeorpinel, check #2466
OK so this issue is left here just to consider the enhancement of:
get/import can try every remote in the source project config file sequentially and use the first one that contains the target data to download
I think.
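For reference, the "try every remote sequentially" idea can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how DVC is implemented: the Remote class, its has/fetch methods, and fetch_from_first_match are hypothetical stand-ins, and remotes are modelled as in-memory dictionaries keyed by checksum.

```python
from dataclasses import dataclass, field


@dataclass
class Remote:
    """Hypothetical stand-in for a configured remote; not a DVC class."""
    name: str
    objects: dict = field(default_factory=dict)  # checksum -> file contents

    def has(self, checksum: str) -> bool:
        return checksum in self.objects

    def fetch(self, checksum: str) -> bytes:
        return self.objects[checksum]


def fetch_from_first_match(remotes, checksum):
    """Try remotes in the order given; return (remote name, data) from the first hit."""
    for remote in remotes:
        if remote.has(checksum):
            return remote.name, remote.fetch(checksum)
    # Every remote was checked and none holds the target data.
    raise LookupError(f"{checksum!r} not found in any of {len(remotes)} remotes")


# Two remotes in the order they might appear in the source project config.
remotes = [
    Remote("images", {"abc123": b"image bytes"}),
    Remote("text", {"def456": b"corpus bytes"}),
]
name, data = fetch_from_first_match(remotes, "def456")
print(name)
```

One design question this leaves open is ordering: with real remotes each existence check is a network round trip, so the order in which remotes are tried affects how long a get/import takes.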