EDIT by @shcheklein
Repurposed from a few issues - both come from some misunderstanding/lack of proper docs on how config files (default vs local vs system vs global) play together.
https://github.com/iterative/dvc.org/issues/1368#issuecomment-634456922
Currently, there are only two ways to handle secrets (e.g. connection strings, tokens, etc.):
The .dvc/config.local file. Since this file is not shared among developers in Git, a team has to agree on a local configuration to pull/push data.
The AZURE_STORAGE_CONNECTION_STRING environment variable - but if one has different storage destinations, one variable is not enough.
It would be very handy to use .dvc/config but with our own placeholders for secrets:
['remote "my-azure-remote"']
url = azure://my-container/dvc/
connection_string = $MY_SECRET
If the right secret is set, we can share .dvc/config without revealing secrets in Git.
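To make the request concrete, here is a minimal Python sketch of what such placeholder expansion could look like. It is not something DVC does per this thread: `expand_placeholders` is a hypothetical helper, the secret value is a made-up placeholder, and the standard-library `configparser` only stands in for DVC's real config loader.

```python
import configparser
from string import Template

# The shared config from the proposal, with a placeholder instead of the secret.
CONFIG_TEXT = """
['remote "my-azure-remote"']
url = azure://my-container/dvc/
connection_string = $MY_SECRET
"""


def expand_placeholders(text, env):
    """Parse INI text and replace $NAME placeholders with values from env.

    Hypothetical helper: unknown placeholders are left untouched.
    """
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    return {
        section: {
            key: Template(value).safe_substitute(env)
            for key, value in parser[section].items()
        }
        for section in parser.sections()
    }


# In real use env would be os.environ; a plain dict keeps the sketch self-contained.
resolved = expand_placeholders(CONFIG_TEXT, {"MY_SECRET": "example-connection-string"})
print(resolved)
```

The file committed to Git would only ever contain `$MY_SECRET`; the real value would live in each developer's environment.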
(at the request of @pared after speaking on Discord)
I have a data registry backed by Azure blob storage, hosted on GitHub. The credentials for Azure/the definition of the remote (let's call it "data-registry") are in .dvc/config.local (in the data-registry repo), and as such are not committed to version control.
I now want to use data from it in a project, which I would expect to be able to do with something like:
dvc import [email protected]:myorganization/data-registry.git data/data.json
It clones the git repo, finds data.json.dvc, checks the local cache (doesn't find it), and then breaks on remote_conf = repo.config["remote"][name.lower()] with KeyError: 'data-registry'. I have the remote defined in config.local, same as in the data-registry, but it never seems to get to the point of using that information.
I am using the --local option because I don't want to put my credentials under version control (i.e. the connection_string when it comes to Azure). Since the remote is not defined in the original repo but is defined in the project repo, I would expect DVC to use the definition that actually exists, even though it is not in the original repo. Taking this principle further, if I had defined it in both the original repo and the project repo, I would expect DVC to override the config from the original repo with the one provided in the project repo.
A workaround is possible by using an environment variable to supply the connection string.
@steffansluis You could also use your --system or --global configs to define those remotes. Those will be used by get/import.
Looks like we have two different use cases here: one for the regular workflow, one for get and import. Maybe it's better to split this into two tickets (or even close it if we can't come up with some action points).
@kopytjuk have you tried to use two configs simultaneously?
The way I understood your concern is that if we define a remote (name and all settings) in a local config (for security concerns) all other team members have to agree on the url and potentially name for the remote as well.
It's not a well known or documented feature of DVC, but DVC merges sections with same name from different configs. What it means in our case:
we put
['remote "my-azure-remote"']
url = azure://my-container/dvc/
in the regular .dvc/config and it is shared across all team members.
at the same moment, we can put
['remote "my-azure-remote"']
connection_string = <conn string>
in the local config - .dvc/config.local.
This way every team member can specify a connection string per remote, while everyone agrees on the URL being used.
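The merging described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `merge_configs` is a hypothetical helper, `configparser` stands in for DVC's own loader, and the connection string is a placeholder value.

```python
import configparser

# Shared .dvc/config, committed to Git.
SHARED_CONFIG = """
['remote "my-azure-remote"']
url = azure://my-container/dvc/
"""

# Per-user .dvc/config.local, never committed (placeholder secret).
LOCAL_CONFIG = """
['remote "my-azure-remote"']
connection_string = placeholder-conn-string
"""


def merge_configs(*texts):
    """Merge INI texts section by section; later texts override earlier ones."""
    merged = {}
    for text in texts:
        parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
        parser.read_string(text)
        for section in parser.sections():
            # Same-named sections are combined option by option, not replaced.
            merged.setdefault(section, {}).update(parser[section].items())
    return merged


merged = merge_configs(SHARED_CONFIG, LOCAL_CONFIG)
print(merged)
```

The resulting section carries both the shared `url` and the per-user `connection_string`, which is the effect the two snippets above rely on.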
Would it solve the issue for you, @kopytjuk ?
@steffansluis
in your case, it is the same idea, but you should be using --system or --global configs. It makes sense, since dvc get, for example, is by definition a "global" command - you can run it outside of a repo, so it does not even have access to .dvc/config.local.
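A rough sketch of why the lookup fails for import/get and why a global definition helps. It models config levels as plain dicts, and `resolve_remote` is a hypothetical stand-in for the `repo.config["remote"][name.lower()]` lookup quoted earlier, not DVC's actual code; the URL and connection string are placeholders.

```python
def resolve_remote(name, *levels):
    """Merge remote sections from the given levels (later wins), then look one up."""
    merged = {}
    for level in levels:
        for remote, options in level.items():
            merged.setdefault(remote, {}).update(options)
    # Raises KeyError when no visible level defines the remote.
    return merged[name.lower()]


# A fresh clone of the registry: its config.local is not in Git, so the
# "data-registry" remote is not visible from the clone itself.
cloned_repo_config = {}

# Hypothetical user-level (--global) settings with placeholder values.
global_config = {
    "data-registry": {
        "url": "azure://my-container/dvc/",
        "connection_string": "placeholder-conn-string",
    }
}

try:
    resolve_remote("data-registry", cloned_repo_config)
except KeyError as exc:
    print("remote not found:", exc)

print(resolve_remote("data-registry", global_config, cloned_repo_config))
```

The importing project's own .dvc/config.local never appears among the levels here, which mirrors the behavior reported in this thread.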
Please let us know what you think.
@shcheklein I am now indeed using --global to define the connection_string for my remotes, and that works nicely for dvc import (and presumably dvc get as well). I think the config behavior could use some documentation and maybe a slight rethinking to make it a little more intuitive?
I discussed it with @efiop a bit; the main point of the current implementation is to avoid collisions in the config, as those would lead to strange and hard-to-debug behavior. There seems to be some resemblance to how package managers like apt/yum deal with this, e.g. "mirrors" (https://discordapp.com/channels/485586884165107732/565699007037571084/713859443204554792).
I think it makes sense to close or "rework" this issue into a general polishing of the remotes/config, although it would be even less actionable at this point. Using --system or --global should cover the use cases that this issue is about. It might be nice to document the current behavior in the meantime.
@steffansluis okay, repurposing and moving this to iterative/dvc.org for now! Thanks for the feedback.
It's not a well known or documented feature of DVC, but DVC merges sections with same name from different configs. What it means in our case ...
Hey, thank you for your idea - that "merging" behaviour was not known to me - seems like a good solution.