Dvc: webdav: parameterize the url

Created on 29 Nov 2020  ·  15 Comments  ·  Source: iterative/dvc

The cause

The recent urls for webdav access from nextcloud and owncloud contain the username like this:

example.com/owncloud/remote.php/dav/files/USERNAME/path/to/the/content

If one wants to share a repository with such a remote, each user has to change the url, together with the user name and password. Whereas user name and password are clearly local information, the url, or at least its structure, is common to all users.

Proposal

One could introduce common variables in the strings. Hence a webdav url might look like

example.com/owncloud/remote.php/dav/files/${USERNAME}/path/to/the/content

where the dollar sign and the braces denote the variable access.
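As a rough illustration of the proposal, and not of anything DVC implements, the substitution could be sketched with Python's standard `string.Template`, which uses the same `${NAME}` syntax. The function name `expand_url` and the allow-list `ALLOWED_VARIABLES` are hypothetical; restricting the permitted names is one possible answer to the security question below.

```python
from string import Template

# Hypothetical allow-list: only these variable names may appear in a remote url.
ALLOWED_VARIABLES = {"USERNAME"}


def expand_url(url_template, values):
    """Expand ${NAME} placeholders, rejecting names outside the allow-list."""
    template = Template(url_template)
    # Collect every placeholder name, written either as $NAME or ${NAME}.
    names = set()
    for match in template.pattern.finditer(url_template):
        name = match.group("named") or match.group("braced")
        if name:
            names.add(name)
    unknown = names - ALLOWED_VARIABLES
    if unknown:
        raise ValueError(f"variables not allowed in url: {sorted(unknown)}")
    # substitute() raises KeyError if an allowed variable has no value.
    return template.substitute({name: values[name] for name in names})


shared_url = "example.com/owncloud/remote.php/dav/files/${USERNAME}/path/to/the/content"
print(expand_url(shared_url, {"USERNAME": "alice"}))
```

With this shape, the shared configuration would keep the template, and each user would supply only the value of `USERNAME` locally.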

Questions

  • Is that approach feasible?
  • Which variable names should be allowed? If we allow too much, it might become a security hazard.
awaiting response

All 15 comments

@ANaumann85 quick question, since I don't know that much about webdav. If user A sets dav/files/A/storage as a remote and user B is trying to access it as dav/files/B/storage, will we see the same set of files? Another way to put this question: does dav/files/${USERNAME} define a per-user location, or does ${USERNAME} here serve as auth only?

@ANaumann85, a good workaround is to use password-protected public-link share and use share-id as username and token as a password.

Check https://doc.owncloud.org/server/10.5/user_manual/files/access_webdav.html#accessing-public-link-shares-over-webdav for more details. The downside of this is that you won't be able to see "Activity" if that matters (though, there's not much of a use of it as dvc's cache is content-addressable).

@ANaumann85 quick question, since I don't know that much about webdav. If user A sets dav/files/A/storage as a remote and user B is trying to access it as dav/files/B/storage, will we see the same set of files? Another way to put this question: does dav/files/${USERNAME} define a per-user location, or does ${USERNAME} here serve as auth only?

Strictly speaking these are two different questions.

  1. The url dav/files/<username>/storage links to the same files only if the storage folder is shared between the users. Otherwise they link to independent folders.
  2. The url part dav/files/${USERNAME} references only the location and has nothing directly to do with authentication. The authentication might be done through user/password or token.

So after thinking and looking at the rclone remote of git annex, there is an even better way than variables.

One could introduce

  • a webdav provider (like owncloud, nextcloud, ..)
  • specify a base url, that would be example.com/owncloud/remote.php (or just example.com?)
  • a path (or prefix) inside the webdav. That would be the path/to/the/content part in the example above.

Then the webdav backend would construct the correct url based on that information. Furthermore, one could see that as two configuration flavors for the same kind of backend:

  1. just url + authentication
  2. provider based configuration
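A minimal sketch of the provider-based flavor, assuming the base url points at the installation root (the thread leaves open whether it should include remote.php). The table `PROVIDER_PATHS` and the function `build_webdav_url` are hypothetical names; the ownCloud/Nextcloud path layout is taken from the example url at the top of this issue, and other providers would need their own entries.

```python
from urllib.parse import quote

# Assumed provider table: the layout for ownCloud and Nextcloud mirrors the
# example url from this issue; entries for other providers are not known here.
PROVIDER_PATHS = {
    "owncloud": "remote.php/dav/files/{username}",
    "nextcloud": "remote.php/dav/files/{username}",
}


def build_webdav_url(base_url, provider, username, prefix):
    """Join base url, provider-specific part and prefix into one url."""
    if provider not in PROVIDER_PATHS:
        raise ValueError(f"unknown webdav provider: {provider}")
    # Percent-encode the username so it stays a single path segment.
    provider_part = PROVIDER_PATHS[provider].format(username=quote(username, safe=""))
    parts = [base_url.rstrip("/"), provider_part, prefix.strip("/")]
    return "/".join(part for part in parts if part)


print(build_webdav_url(
    "webdavs://example.com/owncloud", "owncloud", "alice", "path/to/the/content"
))
```

Only the username would then differ between users, while base url, provider and prefix could live in the shared configuration.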

@ANaumann85, a good workaround is to use password-protected public-link share and use share-id as username and token as a password.

Check https://doc.owncloud.org/server/10.5/user_manual/files/access_webdav.html#accessing-public-link-shares-over-webdav for more details. The downside of this is that you won't be able to see "Activity" if that matters (though, there's not much of a use of it as dvc's cache is content-addressable).

Now it works. At the beginning, I did not know where to get the token. It is simply the 15-character string at the end of the share link. And I did not know that dvc changes the remote type if one changes the protocol in the url.

@ANaumann85, from the sharing tab of the item, open Public Link -> "Create Public Link".
A popup appears. Then, add an appropriate share name, choose either of the "Download/View/Upload" and "Download/View/Edit" permissions, enter a password and then click the "Share" button.

Refer here for more info: https://doc.owncloud.com/server/user_manual/files/public_link_shares.html#creating-public-link-shares

Note that if the public link is disabled or the permission is restricted by the Administrator, they might not appear on the UI.


Then, copy the link address, it will be similar to: https://webdav.host.com/s/ylh7am5LshCVLU8.

The part after the /s/ is the share-id/user, i.e. ylh7am5LshCVLU8 in the link above. The password is the one that you added on the popup. The url should be webdav://example.com/owncloud/public.php/webdav.
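The mapping from a share link to the remote settings can be sketched as follows. The function name `remote_settings_from_share` and the dictionary keys are illustrative assumptions rather than a confirmed DVC interface; the url and the share link are the example values from this comment.

```python
from urllib.parse import urlsplit


def remote_settings_from_share(share_link, password, webdav_url):
    """Derive user, password and url for a public-link share."""
    segments = [s for s in urlsplit(share_link).path.split("/") if s]
    # A share link is expected to end in /s/<share-id>.
    if len(segments) < 2 or segments[-2] != "s":
        raise ValueError(f"not a public share link: {share_link}")
    return {
        "url": webdav_url,      # the public.php/webdav endpoint
        "user": segments[-1],   # the share id acts as the user name
        "password": password,   # the password chosen in the share popup
    }


settings = remote_settings_from_share(
    "https://webdav.host.com/s/ylh7am5LshCVLU8",
    "share-password",  # placeholder value
    "webdav://example.com/owncloud/public.php/webdav",
)
print(settings["user"])
```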

Please let me know if you face any issues.

Oh, I see that you already solved the issue. dvc gc might not work properly depending on which permission you used (Download/View/Upload vs Download/View/Edit), but the other commands should work.

Interesting to see, that you can select different permissions. I can only select either "readonly", or "upload and edit" or "upload only".

But to get you right, you mean that dvc gc --cloud might not work without Edit (or upload?) permissions, right? Without the argument --cloud, the command dvc gc should not change the remote.

Interesting to see, that you can select different permissions

You must be on an old version. I just use it locally most of the time, so it's the latest one. You can try out the latest here: https://demo.owncloud.org/

But to get you right, you mean that dvc gc --cloud might not work without Edit (or upload?) permissions, right? Without the argument --cloud, the command dvc gc should not change the remote.

Yup, I meant --cloud/-c, as it requires delete permission which is only available on the Edit option.

@ANaumann85

So after thinking and looking at the rclone remote of git annex, there is even a better way than variables.

sounds like a right approach, also it can be introduced w/o breaking the compatibility (only if base_url is specified, resolve remote url using it).

Does the resolve mechanism also take the user and append it to the base_url?

Then the webdav backend would construct the correct url based on that information. Furthermore, one could see that as two configuration flavors for the same kind of backend:

@ANaumann85 not sure I follow here, could you elaborate a bit please?


One question that is still not clear to me (again, mostly because I don't "feel" the regular WebDAV workflow) is what would be the recommended way to set up a DVC remote (considering that it should be shared). Inside user home directories? And share it like it was mentioned? How are conflicts solved in that case (if a user already has a directory with the same name in her user space)?

Do all providers support this? How common is it across providers to have URLs with public shares by ID?

@ANaumann85

So after thinking and looking at the rclone remote of git annex, there is even a better way than variables.

sounds like a right approach, also it can be introduced w/o breaking the compatibility (only if base_url is specified, resolve remote url using it).

Does the resolve mechanism also take the user and append it to the base_url?

Yes. My idea was that the remote internally uses <base_url>/<providerSpecificPart>/<prefix> for storage. The <providerSpecificPart> would be

  • in case of nextCloud and ownCloud
  • can be anything, for other cloud providers. Sadly, I know only our nextcloud instance for sure.

Then the webdav backend would construct the correct url based on that information. Furthermore, one could see that as two configuration flavors for the same kind of backend:

@ANaumann85 not sure I follow here, could you elaborate a bit please?

One question that is still not clear to me (again, mostly because I don't "feel" the regular WebDAV workflow) is what would be the recommended way to set up a DVC remote (considering that it should be shared). Inside user home directories? And share it like it was mentioned? How are conflicts solved in that case (if a user already has a directory with the same name in her user space)?

With the recent nextcloud instance, I did not face conflicts with already existing folders. But I'll check what happens then. In an earlier owncloud instance, a shared folder with the name "A" was placed in a subfolder "shared/A". But to me it feels that the new approach allows these possibilities to be taken into account automatically.

But that problem would also arise if different users create shares with the same name. That needs some more investigation.

Do all providers support this? How common is it across providers to have URLs with public shares by ID?

At least nextcloud and owncloud support the shares by ID and token. I have no experience with other cloud stores.

Yes. My idea was that the remote internally uses <base_url>/<providerSpecificPart>/<prefix> for storage.

makes sense. Does rclone have logic internally that detects a specific provider type?

can we ask users to specify providerSpecificPart as part of the base_url? Probably not much will change in that case.

At least nextcloud and owncloud support the shares by ID and token.

for them probably it means that using shared ID is the preferable way (no name collisions, same stable URL, etc) ... I would update docs in this case.

Yes, the shared ID is preferable. But I do not know if they work with group shares.

Regarding the providerSpecificPart: the idea behind it is that it might contain the username. Hence, if we ask the user for it, they have to enter that information twice. Rclone asks the user for the provider name and implements the access internally.

From looking at the rclone interface and the number of supported cloud stores, I got the impression that it would be a fruitful alternative to provide something like an "rclone remote". With that you would automatically support a lot of cloud store providers.

Another workaround is to just overwrite the url. E.g.

in .dvc/config

['remote "myremote"']
url = webdavs://example.com/owncloud/remote.php/dav/files/USERNAME

and in .dvc/config.local

['remote "myremote"']
url = webdavs://example.com/owncloud/remote.php/dav/files/efiop
password = 123456789
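The effect of this override can be pictured as a per-remote merge in which local options win over shared ones. This is only a conceptual sketch with a hypothetical function `merge_remote_config`, not DVC's actual config loader.

```python
def merge_remote_config(shared, local):
    """Overlay local remote options on top of the shared ones."""
    # Copy the shared options so the inputs are left untouched.
    merged = {name: dict(options) for name, options in shared.items()}
    for name, options in local.items():
        merged.setdefault(name, {}).update(options)
    return merged


shared = {  # corresponds to .dvc/config
    "myremote": {"url": "webdavs://example.com/owncloud/remote.php/dav/files/USERNAME"},
}
local = {  # corresponds to .dvc/config.local
    "myremote": {
        "url": "webdavs://example.com/owncloud/remote.php/dav/files/efiop",
        "password": "123456789",
    },
}
print(merge_remote_config(shared, local)["myremote"]["url"])
```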

Also base_url is similar to remote:// notation that we currently have. E.g.

['remote "mybase"']
url = webdavs://example.com/owncloud/remote.php/dav/files/
['remote "myremote"']
url = remote://mybase/efiop/some/path
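How such a remote:// reference expands can be sketched like this. The function `resolve_url` and the plain dictionary of remotes are illustrative assumptions; DVC's own resolution logic may differ in detail.

```python
from urllib.parse import urlsplit


def resolve_url(remotes, url, _seen=()):
    """Expand remote://<name>/<path> against the url of remote <name>."""
    parts = urlsplit(url)
    if parts.scheme != "remote":
        return url  # already a concrete url
    name = parts.netloc
    if name in _seen:
        raise ValueError(f"circular remote reference: {name}")
    # The referenced remote may itself use the remote:// notation.
    base = resolve_url(remotes, remotes[name], _seen + (name,))
    return base.rstrip("/") + "/" + parts.path.lstrip("/")


remotes = {
    "mybase": "webdavs://example.com/owncloud/remote.php/dav/files/",
    "myremote": "remote://mybase/efiop/some/path",
}
print(resolve_url(remotes, remotes["myremote"]))
```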

Yes, you are right. But the disadvantage of url is that it contains the username.
My use case was a group shared folder. If the folders get restructured, then every user has to change the url part manually in the local configuration.
If one splits the url into a username and a base part, one has to change the url (or base url) only once, in the file .dvc/config.

But I think the simplest and most stable solution is to use the shared links from the answer above.
