Having an option to share data files in a peer-to-peer way is probably a good idea. It eliminates the need to pay for external services, and scales much better in the "public open project" situation (where a lot of cloners would mean substantial S3 costs).
IPFS is probably the easiest to support here, together with DAT. Using BitTorrent directly seems complicated.
Hi @remram44 !
Great idea, thank you! If you wish to implement support for some of those, please feel free to take a look at the `dvc/remote/` directory in our project, where all current remote drivers are implemented. To support data pulling/pushing, you would only need to implement the `download()`, `upload()` and `exists()` methods, so it should be pretty easy. We will be happy to merge any proper pull request :slightly_smiling_face:
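To make the shape of such a driver concrete, here is a minimal sketch. It is not DVC's actual base class or signatures: `DirectoryRemote`, `file_md5`, `demo` and the checksum-as-key convention are all assumptions for illustration, and a plain local directory stands in for the P2P store (IPFS, Dat, etc.).

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


class DirectoryRemote:
    """Hypothetical driver: a local directory stands in for a P2P store."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def exists(self, key):
        # True if an object with this key is already stored remotely.
        return (self.root / key).is_file()

    def upload(self, local_path, key):
        # Push a local file to the remote under the given key.
        shutil.copyfile(local_path, self.root / key)

    def download(self, key, local_path):
        # Pull the object stored under the key into a local file.
        if not self.exists(key):
            raise FileNotFoundError(key)
        shutil.copyfile(self.root / key, local_path)


def file_md5(path):
    # Assumption: objects are addressed by the MD5 checksum of their content.
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def demo():
    # Round trip: upload a file, then download it and compare the contents.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        src = tmp / "data.csv"
        src.write_text("a,b\n1,2\n")
        remote = DirectoryRemote(tmp / "store")
        key = file_md5(src)
        if not remote.exists(key):
            remote.upload(src, key)
        restored = tmp / "restored.csv"
        remote.download(key, restored)
        return restored.read_text() == src.read_text()
```

A real P2P driver would replace the three `shutil`/filesystem calls with the protocol's own add, fetch and lookup operations; the rest of the structure would presumably stay the same.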
A current workaround for that would be to just pack your `.dvc/cache` directory and share it with others using any P2P protocols you'd like, including BitTorrent :slightly_smiling_face:
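A rough sketch of that workaround using only the Python standard library is below. The helper names `pack_cache` and `unpack_cache` and the archive name are made up for illustration; how the archive is then distributed (torrent, IPFS, ...) is left out.

```python
import tarfile
from pathlib import Path


def pack_cache(cache_dir, archive_path):
    # Bundle the whole cache directory into one gzip-compressed tarball.
    cache_dir = Path(cache_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under "cache/..." so the archive unpacks inside .dvc/.
        tar.add(cache_dir, arcname=cache_dir.name)
    return Path(archive_path)


def unpack_cache(archive_path, dvc_dir):
    # Extract a received archive into the recipient's .dvc/ directory.
    # Only unpack archives from a source you trust.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dvc_dir)
```

The sender would call something like `pack_cache(".dvc/cache", "dvc-cache.tar.gz")` and the recipient `unpack_cache("dvc-cache.tar.gz", ".dvc")` in their own clone.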
Thanks,
Ruslan
Indeed this looks easy to add. I don't have the cycles to attempt this now, but I might try in the future.
The idea of storing data in p2p/blockchain looks very appealing. We develop DVC based mostly on our industrial data science experience, where p2p is not a big part of the environment. But it might become one soon! Recently, I got another request (not on GitHub) regarding the denet.pro dApp for storing data for DVC.
It would be great to understand this p2p datasets landscape:
If there is a demand we can definitely implement this.
@remram44 please let me know if you use this kind of storage. I would really like to discuss the use cases and your thoughts. Or perhaps you can connect us to other users or the tool/protocol creators.
I'm thinking about the case where you make an analysis public, e.g. publish it on GitHub. Having everyone download from your S3 bucket would incur charges, and hosting it on some box in your lab would provide very limited bandwidth. Peer-to-peer solutions would scale nicely.
Are there any plans to implement this? @remram44 @dmpetrov
Would a contribution still be valued?
Also I can see further applications in the field of scientific reproducibility and general public data sharing.
@icks No such plans from the core team, at least for now. We would appreciate it if you could share your thoughts on this and the scenarios in which you would like to use it. Contributions are always welcome, feel free to give it a shot. Ping us here or on Discord if you need any help :slightly_smiling_face:
@icks it would be a good addition indeed. Unfortunately, as @efiop mentioned, it would take a while for the core team to prioritize this :( We would really love for the community to contribute here, and we can provide all the support and help needed.