Currently in registries we store the SHA-1 hash of the tree of each registered version (git-tree-sha1).
Since SHA-1 is no longer considered secure, and since Git is planning to move to SHA-256, I think we should also start storing the SHA-256 hash of the tree of each registered version. Presumably we would call it git-tree-sha256.
I'm not sure where this issue should go. Feel free to move it to a different repo.
Two steps here:
It should help that Tar.jl can already compute the Git SHA-256 tree hash of any tarball, e.g.:
julia> import Tar
julia> Tar.tree_hash(`bzcat /Users/stefan/tmp/General.tar.bz2`)
"e387f456eb982cf13aed1a214df2a8b8434302c4"
julia> Tar.tree_hash(`bzcat /Users/stefan/tmp/General.tar.bz2`, algorithm="git-sha256")
"87a284f324fc6df4778b81de03bdc303d59587b4b1c839d586cf49a80bb3eec3"
I'm not even sure how to coax git into doing that yet, so we're definitely getting the jump on this.
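For anyone who wants to try this without a registry tarball at hand, here is a minimal sketch that builds a throwaway tarball and computes both hashes with `Tar.tree_hash`. The helper name `both_tree_hashes` and the `hello.txt` file are made up for illustration; only `Tar.create` and `Tar.tree_hash` with its `algorithm` keyword come from Tar.jl.

```julia
import Tar

# Hypothetical helper: returns (sha1, sha256) tree hashes of a tiny sample tree.
function both_tree_hashes()
    mktempdir() do dir
        # One small file is enough to get a non-empty tree.
        write(joinpath(dir, "hello.txt"), "hello\n")
        # Tar.create with a single argument writes to a temp file and returns its path.
        tarball = Tar.create(dir)
        try
            sha1 = Tar.tree_hash(tarball)                              # default: "git-sha1"
            sha256 = Tar.tree_hash(tarball; algorithm="git-sha256")
            return (sha1, sha256)
        finally
            rm(tarball; force=true)
        end
    end
end

sha1, sha256 = both_tree_hashes()
println("git-sha1:   ", sha1)
println("git-sha256: ", sha256)
```

The SHA-1 result is a 40-character hex string and the SHA-256 result a 64-character one, so the two are easy to tell apart when stored side by side.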
@staticfloat, since the storage servers already have all the tarballs in question, what's the best way to leverage that? Hit them directly to serve said tarballs, thereby avoiding trashing the LRU cache on the package servers? Or run a script directly on one of those machines?
Regarding the location of this issue, I think we can coordinate from here and open specific issues on the other repos about specific changes. E.g. RegistryTools seems like the right place for step 1, while General is where the PR resulting from step 2 will have to go. Those can link back here to make it easier to keep track of what's going on.
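To make the proposal concrete, here is a rough sketch of what a version entry might look like with both hashes stored. This is only an assumption about the eventual layout: the `git-tree-sha256` key is the name proposed above, not an existing registry field, `version_entry` is a hypothetical helper, and the hash values are just placeholders copied from the Tar.jl example.

```julia
import TOML

# Hypothetical helper: builds one version entry carrying both tree hashes.
# "git-tree-sha256" is the proposed key name, not something registries store today.
function version_entry(version::AbstractString, sha1::AbstractString, sha256::AbstractString)
    return Dict(version => Dict(
        "git-tree-sha1"   => sha1,
        "git-tree-sha256" => sha256,
    ))
end

# Placeholder hashes, reused from the tarball example purely for illustration.
entry = version_entry(
    "1.0.0",
    "e387f456eb982cf13aed1a214df2a8b8434302c4",
    "87a284f324fc6df4778b81de03bdc303d59587b4b1c839d586cf49a80bb3eec3",
)

# Print the entry as TOML with keys in sorted order.
TOML.print(stdout, entry; sorted=true)
```

Keeping the existing `git-tree-sha1` key alongside the new one would let older clients keep working while newer ones verify against SHA-256.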