In the past, download locations for software have disappeared for various reasons: source releases were deleted, domain name registrations expired, websites were redesigned, and so on. This has led to packages failing to build.
Although there are sometimes mirrors of the source code, these might come with the same issue, for example when they are exact clones of the original site.
It might be good to use Software Heritage as a fallback download location, or at least as a source in case we want to provide our own mirror of source code that disappeared from the original site. Software Heritage already indexes everything from Debian, GitHub, GitLab, PyPI, Google Code, GNU and several others, and makes it available under a unique hash.
Adding @edolstra
Do they have a content-addressable mirror?
Yes, you can get content from SH using various checksums. There is rate limiting in place, though (for Hydra, for example, that could possibly be lifted if needed).
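To make the checksum lookup concrete, here is a rough sketch of how a download URL could be derived from a hash. It assumes the public Software Heritage web API exposes raw file content under a path of the form `/api/1/content/<algo>:<hash>/raw/`; the function name `swh_raw_content_url` and the list of accepted algorithms are illustrative, so check them against the current API documentation before relying on this.

```python
# Assumption: base URL and path layout of the public Software Heritage web API.
SWH_API = "https://archive.softwareheritage.org/api/1"

# Assumption: checksum types accepted by the content endpoint.
SUPPORTED_ALGOS = {"sha1", "sha1_git", "sha256", "blake2s256"}


def swh_raw_content_url(algo: str, hex_digest: str) -> str:
    """Build the URL for the raw bytes of a single file, looked up by checksum.

    Hypothetical helper: it only formats a URL and performs no network access.
    """
    if algo not in SUPPORTED_ALGOS:
        raise ValueError(f"unsupported checksum type: {algo}")
    digest = hex_digest.lower()
    if not digest or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError("digest must be a non-empty hexadecimal string")
    return f"{SWH_API}/content/{algo}:{digest}/raw/"
```

Note that this addresses individual files by their content hash. A release tarball is only found this way if that exact file was archived, which is a separate question from whether the project's source tree is in the archive.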
I like the idea! Do they support such usage of their system?
My guess is that it depends on how we implement it, and we would need to talk to them about it. If we use it only as a backup (once the original is no longer available) and limit it to Hydra at first, I expect it would be no problem for Software Heritage to lift the rate limit.
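A minimal sketch of the "backup only" idea, in case it helps the discussion: try the original location first, then any mirrors, and only then the archive, accepting a download only if its hash matches. This is not how `fetchurl` in nixpkgs is implemented. The function `fetch_with_fallback` and the injected `fetch` callable are hypothetical names used for illustration.

```python
import hashlib
from typing import Callable, Iterable, Optional


def fetch_with_fallback(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    expected_sha256: str,
) -> Optional[bytes]:
    """Return the first payload whose SHA-256 matches, trying URLs in order.

    `urls` should list the original location first and the archive last,
    so the archive is only contacted when everything else has failed.
    `fetch` is any callable that returns the bytes for a URL and raises
    OSError on failure (urllib.error.URLError is a subclass of OSError).
    """
    for url in urls:
        try:
            data = fetch(url)
        except OSError:
            continue  # location is gone or unreachable, try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256.lower():
            return data
        # Hash mismatch: treat it as a failed download and keep going.
    return None
```

Because the archive sits at the end of the list, it is only hit for sources that are already broken upstream, which should keep the request volume small enough to make the rate-limit conversation easier.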
Guix implemented it recently: https://lwn.net/Articles/784401/
And a direct link to the blog post.
Imho "Debian, GitHub, GitLab, PyPI, Google Code, GNU" are not the pain points we have with nixpkgs right now. The biggest issue I see is mirrors of nonfree software such as Oracle, Nvidia, AMD or Dropbox. Sources like that are no issue for Guix, because the packages in its repository are all free software.
I really like the idea of mirroring all the sources we use but maybe we can get help from other projects or technologies (ipfs, archive.org) which do not have the restriction to only mirror free source code.
@makefu Right. For me, Steam, AMD and Adobe Flash are the only issues of this kind. I think most of the time it would be a licensing issue to redistribute nonfree software.
But it would still be very elegant to use Software Heritage from a long-term perspective. It might save you some day. Imagine that in 50 years you want to open an obscure file format that is popular today. If we implement this NOW, you will be able to use today's nixpkgs then.
That's two different problems to solve.
Imho "Debian, GitHub, GitLab, PyPI, Google Code, GNU" are not the pain points we have with nixpkgs right now.
@makefu They are not the problem right now, but they're guaranteed to be one in the future, especially the commercial companies in that list.
Given that all of those repositories will eventually go down, isn't it better to have as fallback an institution that explicitly has the purpose of long-term conservancy? OTOH I agree that when the problem presents itself, we can implement a SH backend, so maybe this doesn't need our attention now.
I agree that a decentralized storage layer like IPFS would also be very interesting. But maybe that's the topic for another issue?
I am totally for implementing Software Heritage as a backend, I just wanted to point out that the things currently mirrored are only a subset of the sources we have (and that break) in nixpkgs.
The nice thing with the current setup in nixpkgs is that we can just add more options.
I agree that a decentralized storage layer like IPFS would also be very interesting. But maybe that's the topic for another issue?
@makefu:
I really like the idea of mirroring all the sources we use but maybe we can get help from other projects or technologies (ipfs, archive.org) which do not have the restriction to only mirror free source code.
We do not have that restriction in Software Heritage. You're free to deposit code with a non-free license. It's just easier for us to mirror the main forges in priority, because that's where most of the code is available.
@seirl thanks for the reply! That is great to hear.
Is it also possible to mirror blobs (e.g. VirtualBox Extensions or Adobe Flash Player binary)?
@makefu It's technically possible, but it's not the intent of Software Heritage to mirror binaries. We won't filter them, but be aware that there are size restrictions that will apply. If it's the exception rather than the norm (e.g., one package that vendors a small proprietary .so) it seems totally fine, but please don't use Software Heritage as a binary cache :-)
Cheers.
Just to add, the size restriction is currently at 100 MiB, but it's not a hard guarantee and could change in the future. We expect most of the source files deposited to be way under that.
@seirl thanks for clearing that up, I am sure more people are interested in this response as well :+1:
Imho "Debian, GitHub, GitLab, PyPI, Google Code, GNU" are not the pain points we have with nixpkgs right now.
Just because GitHub is not expected to go down anytime soon doesn't prevent people from deleting their repositories. Oh, and GitHub itself is also regularly down or has trouble for a few hours.
This issue has been mentioned on NixOS Discourse. There might be relevant details there:
https://discourse.nixos.org/t/how-to-fetch-lfs-enabled-repo-with-fetchfromgithub/5890/8
This is still not merged.