Hi,
I set up a NextCloud instance as a remote for DVC. So far so good! The only weak spot is that I cannot browse my pushed data, because the file extension is missing after a push.
Is it possible to preserve the file extension during a push, so one can look at the data on the remote server / cloud?
Best regards
Sebastian
Hi @smoosbau, thanks for creating an issue. This is fundamental to how DVC and DVC remotes work.
A DVC remote is a backup of the cache, meant to be used by DVC.
And because DVC deduplicates files, there's no one-to-one mapping from file names to stored objects that could preserve the file extension. I know this doesn't look good on file-hosting services such as GDrive, NextCloud, etc., but right now the best approach is to treat the remote as a black box.
Note that all the metadata lives not in the DVC remote but in the Git repository. The best way to inspect files is with DVC commands, or by getting a direct link with dvc get --show-url.
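Since the file name and extension only live in Git, one possible workaround for browsing is to look up the hash recorded there and find the matching object on the remote by hand. Below is a minimal Python sketch of that lookup. It assumes the remote layout used by DVC 1.x/2.x, where a file's object sits at <first 2 hex chars of its MD5>/<remaining 30 chars> under the remote's root folder (newer DVC versions add a files/md5/ prefix). The function names, the dvc-storage folder and the MD5 values are hypothetical placeholders.

```python
from typing import Dict


def remote_object_path(md5: str, remote_root: str) -> str:
    """Where an object with this MD5 would sit on the remote (assumed 1.x/2.x layout)."""
    if len(md5) != 32:
        raise ValueError(f"expected a 32-character MD5 hex digest, got {md5!r}")
    return f"{remote_root.rstrip('/')}/{md5[:2]}/{md5[2:]}"


def locate_on_remote(outs: Dict[str, str], remote_root: str) -> Dict[str, str]:
    """Map each tracked path (name + extension, known only to Git) to its remote object."""
    return {path: remote_object_path(md5, remote_root) for path, md5 in outs.items()}


# Placeholder values: in a real repo you would copy each md5 from the
# outs entry of the corresponding .dvc file (or from dvc.lock).
outs = {
    "data.xml": "0123456789abcdef0123456789abcdef",
    "model.pkl": "fedcba9876543210fedcba9876543210",
}

for path, obj in locate_on_remote(outs, "dvc-storage").items():
    print(f"{path:10} -> {obj}")
```

Downloading that object from NextCloud and saving it under the name on the left would give you a file you can open normally. This sketch only handles single files: for tracked directories DVC records a hash ending in .dir that points to a listing object, which remote_object_path rejects.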
That said, we have a similar open issue, but I can't seem to find it. I'll comment if I do. Thanks again.
Found the issue: https://github.com/iterative/dvc/issues/3621, though it does not solve this.
It's more about recovering a repo, i.e. mapping each file in the cache/remote to a new empty file based on its prefix; you'd still need to _ignore_ the cache.
Thanks for your quick response @skshetry!
Then I'll have to find a workaround! :)
@smoosbau just to add my 2cs to this :), primarily to help with a workaround.
As @skshetry mentioned, by default DVC keeps files in a so-called "content-addressable" key-value store. This is done for a few reasons: deduplication is one of them, the ability to version things in the first place is another, and there are other benefits that you can read about here, for example. If you have data.xml in your repo and you update its content, which version should we store? In DVC we store both; we just rename them in a way that lets DVC find them later. Namely, DVC creates .dvc or dvc.lock/dvc.yaml files that store the information about the original file name, and those go into Git.
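To make the content-addressable idea concrete, here is a toy Python sketch (not DVC's actual code). It assumes the simplified key scheme of DVC 1.x/2.x caches and remotes, where an object is named <first 2 hex chars of its MD5>/<remaining 30 chars>; newer releases add a prefix such as files/md5/, and DVC's real hashing has extra details this ignores. ContentAddressableStore, object_key and history are made-up names for illustration.

```python
import hashlib
from typing import Dict


def object_key(content: bytes) -> str:
    """Storage key for `content`, DVC-style (simplified).

    The key is the MD5 of the bytes, split into a 2-character directory
    and the remaining 30 characters. The file name and extension play
    no part in it.
    """
    digest = hashlib.md5(content).hexdigest()
    return f"{digest[:2]}/{digest[2:]}"


class ContentAddressableStore:
    """Toy in-memory stand-in for a cache/remote: key -> bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = object_key(content)
        # Identical content maps to the same key, so it is stored only once.
        self.objects.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


# The "Git side": which key each version of data.xml had. In a real
# project, .dvc / dvc.lock files play this role.
store = ContentAddressableStore()
history: Dict[str, str] = {
    "v1": store.put(b"<data>1</data>"),
    "v2": store.put(b"<data>2</data>"),
    "v3": store.put(b"<data>1</data>"),  # reverted to the v1 content
}
```

Two versions of data.xml become two objects, the reverted third version costs no extra space, and nothing in a key hints at .xml: that information lives only in the history record, which is what Git holds in a real project.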
This way Git becomes a place where we keep the information about file names, about extensions, etc. Thus, a few questions:
There is no single answer to this, but I can share some things we have done so far, and would really love to hear your opinion on what is missing, what your use case is, etc.:
The dvc list command. You run something like dvc list https://github.com/iterative/example-get-started and get all the files in that repo, including DVC-tracked ones:
.gitignore
README.md
data
dvc.lock
dvc.yaml
model.pkl
params.yaml
prc.json
scores.json
src
(Here model.pkl is tracked by DVC, so it is not visible in the repository's web UI.)
Then you could use dvc get, dvc import, or dvc.api, which have a similar interface, to access those files.
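For the dvc.api route, a minimal Python sketch might look like the following. dvc.api.open and dvc.api.get_url are part of DVC's Python API; the wrapper names read_tracked_file and tracked_file_url are illustrative, and keyword arguments may differ slightly between DVC versions.

```python
from typing import Optional

# Example repo from this thread; any Git repo with DVC-tracked files works.
REPO = "https://github.com/iterative/example-get-started"


def read_tracked_file(path: str, repo: str = REPO, rev: Optional[str] = None) -> bytes:
    """Read a DVC-tracked file by its Git-side name, wherever its object lives."""
    import dvc.api  # deferred so this module loads even without DVC installed

    with dvc.api.open(path, repo=repo, rev=rev, mode="rb") as f:
        return f.read()


def tracked_file_url(path: str, repo: str = REPO, rev: Optional[str] = None) -> str:
    """Storage URL of the object behind `path`, similar to dvc get --show-url."""
    import dvc.api  # deferred for the same reason as above

    return dvc.api.get_url(path, repo=repo, rev=rev)
```

For example, read_tracked_file("model.pkl") would fetch the model from the example project's remote, and tracked_file_url("model.pkl") would show where its object sits there. Both calls need DVC installed and network access, and the remote used comes from the repo's own DVC config.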

Bottom line: with the default DVC way of organizing projects and data, you don't access data directly. The Git repo becomes the entry point, with all the benefits but also some potential limitations. It would be really great to learn more about your use case and what you think about the options ^^.