I've noticed that, in general, the different mechanisms DVC has for handling external storage are confusing. This includes DVC remotes, external dependencies/outputs/cache, the `dvc import` and `dvc import-url` commands, and even more general concepts like "remote location" and "external data source".
We have independent docs explaining each one, but no single document (or consistent references throughout) that lets users clearly identify which mechanism does what or solves which problem. In fact, I wonder whether the same external data problem could be solved with more than one of these mechanisms (which could add to the confusion).
Well, we do have this page, https://dvc.org/doc/user-guide/managing-external-data (and issue #143), which tries to combine external outputs and external cache into a single doc, but I think the effort should probably encompass all of these mechanisms.
I'm not sure exactly how to address this, so I'm just opening this possible issue and cc'ing @iterative/engineering for a discussion. Thanks
UPDATE: Also related: #103, maybe #143, #497, and #563 (especially given https://github.com/iterative/dvc.org/issues/563#issuecomment-521731979 and https://github.com/iterative/dvc.org/issues/563#issuecomment-522054626).
@jorgeorpinel, I don't have a solution for this, but I agree that having a page explaining how to deal with external storage would be helpful.
A good next step would be to have several questions that we want this page to answer.
Maybe searching Discord for keywords like `external storage` and `remote` would help figure out the FAQs from the community.
What I've identified (from skimming Discord) is that there are a lot of questions related to remote configuration and usage. Currently, there's no single page to refer to for tips and tricks when setting up a remote. Users need to go through the `remote` subcommands and click on the expandable boxes to learn about it. There was already an issue about this (currently closed) that I think might be related: #499
Another frequent question is about NFS / CIFS; we have no entry about this in our docs (as you already noted in your comment).
Per https://github.com/iterative/dvc.org/issues/654#issuecomment-536330090
The idea (I was waiting for the engine to properly support 3 levels) is to eventually have one single Manage External Data second-level section that will include _External Outputs_, _External Dependencies_, _External Cache_, _Shared Cache_ (or a better title to explain cases like #565, #455), and other related articles.
So who can let us know when the 3rd level of the sidebar is enabled, @shcheklein @iAdramelk? Or what issue can I follow for that purpose? Thanks 🙂
@jorgeorpinel it was done by @iAdramelk here https://github.com/iterative/dvc.org/issues/305
Closing in favor of epic #520