First, https://dvc.org/doc/user-guide/external-outputs describes cache configuration, not outputs. All the examples show how to reconfigure the cache. Let's rename it to something like Cache Reconfiguration or Setup Cache, and change the section's introductory paragraph accordingly.
[ ] Local cache reconfiguration is missing. Something like dvc config cache.dir /mnt/cache.
Also, it does not clearly explain why we use cache.{s3, gs, ssh, hdfs} but for the local cache we use cache.dir, not cache.local: cache.dir is a shortcut for cache.local.
[ ] Consolidate everything related to Managing External Data (also including https://dvc.org/doc/user-guide/external-dependencies) under this document, adding an index.md file to do an overview of different approaches and features DVC gives to support it.
I think the External outputs name is suitable, since the article explains both cache configuration (as a necessity for outputs) and the outputs themselves. You can see dvc run commands showing how to use external outputs.
We do need to better explain the external output case for a local file outside of the DVC project. Also, cache.dir is a shortcut for cache.local. That needs better explaining as well.
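For reference, here is a rough sketch of the two configuration styles being compared. The first command is the one quoted in the issue description. The other two assume the external cache setup from the User Guide at the time this was written; the remote name s3cache and the bucket path are hypothetical placeholders, and option names may differ in other DVC versions.

```shell
# Local cache: point the project's cache at a directory outside the workspace.
# (Command quoted from the issue description; /mnt/cache is just an example path.)
dvc config cache.dir /mnt/cache

# External cache on S3, as assumed from the external-outputs guide of that era:
# first define a remote (hypothetical name and URL), then reference it from the cache section.
dvc remote add s3cache s3://mybucket/cache
dvc config cache.s3 s3cache
```

The asymmetry is visible here: cache.dir takes a path directly, while cache.s3 takes the name of a previously defined remote. That is the difference the doc should spell out.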
This issue is a little old, and I see the doc is now called "Managing External Data", but from what I'm reading in the description this problem is still pending an update, right? Should I also rename the URL from external-outputs so that it at least matches external-data?
I somehow missed when the renaming took place, and I'm not sure about the reasons. To me it makes things more confusing, since external-data is more about dvc add, while external outputs is about dvc run, which is described there as well. Maybe it would be worth splitting that article into two, external-data and external-output?
@efiop @jorgeorpinel the idea was to consolidate everything related to "external data management" under this new section + add index.md to do an overview of different approaches and features DVC gives to support it. That was just the first step, but it's still way better than having just "external outputs" (as a User Guide top level section name), which is not descriptive at all, especially for the dvc add case.
Makes sense. Maybe we should update the title and description of the issue to detail that idea? Then we'll just prioritize accordingly. I could address this after #425, for example.
@jorgeorpinel yep, feel free to update accordingly ... please, check if there are other tickets related to this already
@shcheklein ok I updated the description, does it look good?
@jorgeorpinel looks good!
Question:
Consolidate everything related to Managing External Data
What about external data sources? I.e. DVC remotes, dvc import and dvc import-url commands, or even more general concepts like "remote location" and "external data source". (from #566) Should we mention any of those in this new consolidated doc, or keep separate?
@jorgeorpinel I think we can definitely mention them to provide a full picture of the different options for connecting data from different sources
So basically #566 is a duplicate of this issue. Meaning, the consolidated document described here could solve both issues, right? I'll merge them if so.
This is a duplicate of #566 and https://github.com/iterative/dvc.org/issues/657#issuecomment-536857675. Please close 🙂