I was a bit surprised to see that DVC recently added analytics tracking without asking users' permission, and without updating the docs: https://github.com/iterative/dvc/pull/1395
I bet most users would rather not send system and other information to some random endpoint in the cloud. This kind of feature (and more importantly the lack of transparency) makes it harder to trust tools like DVC.
I suggest asking users upfront during dvc init if they want to enable analytics. That adds transparency and would at least give users a chance to opt in/out.
@mdscruggs, thank you for the great question!
Yes, we've indeed added additional telemetry to DVC via https://github.com/iterative/dvc/pull/1395. We are working on additional documentation and on changing dvc init. It will mention that we are collecting some anonymized usage stats, how to opt out, our motivation, the terms, etc. Overall, it'll improve transparency. It was never our intention to hide this information (all analytics-related work was done via GitHub tickets); we just haven't had enough time to put everything together at once.
To give some overview (just to highlight certain important things, while we are preparing the document):
Run dvc config core.analytics false to disable it within a project; add --global to disable it per user, or --system to disable it for everyone.
We'll keep this ticket open and will close it when we change dvc init and provide additional documentation. Thanks again, @mdscruggs!
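For reference, here are the opt-out commands mentioned above, laid out one per scope. This is a minimal sketch: it assumes dvc is installed and that the first command is run inside a DVC project. The remark about elevated privileges for --system is an assumption, not something stated in this thread.

```shell
# Disable analytics for the current project only
dvc config core.analytics false

# Disable analytics for the current user (all projects)
dvc config --global core.analytics false

# Disable analytics for everyone on this machine
# (assumption: may require elevated privileges to write the system-wide config)
dvc config --system core.analytics false
```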
Thanks for the thorough and quick reply @shcheklein, and thank you and the team for the work on DVC. It's a really useful tool!
It's good to hear the intent behind the analytics feature. My main concerns as a potential user of any open-source software include trust (in the devs/project/code), security, API stability, and of course performance/features (roughly in that order). Security risks can be introduced merely by upgrading OSS packages...which is akin to what could happen here with the release of DVC's analytics feature. Appropriate transparency and documentation should come alongside such features, not afterwards.
I suggest that your docs also include clear specifics of how the data is used in addition to how it is not used, along with how the data is stored, replicated, shared, retained, accessed, etc. Ideally the data would be rapidly aggregated, such that individual events are not persisted...although I realize that may not be feasible. Even better would be to share the statistics you're gathering publicly, in the spirit of OSS (you have a nice website to put them on too!).
Thanks again. Please know my intent is to provide constructive feedback, and that DVC is quite a nice tool.
@mdscruggs sorry for the delay in responding.
We've made a change to dvc init.
We've published a document that describes all the details - https://dvc.org/doc/user-guide/analytics
Appropriate transparency and documentation should come alongside such features, not afterwards.
Totally agree, we should have published that document first and started by modifying dvc init to mention it. It's our mistake, even though we had no intention of hiding anything (we had a ticket on GitHub that was open to everyone).
Even better would be to share the statistics you're gathering publicly, in the spirit of OSS (you have a nice website to put them on too!).
This is our intention: to share an aggregated view of the data once we have a better understanding of how to do that.
Please know my intent is to provide constructive feedback
Totally, we rely on this kind of feedback to grow a healthy and open community around DVC. Thanks again!
Please take a look at these changes and let me know what your thoughts are.