Pkg.jl: Feature request: start recording SHA-256 tree hashes in registries

Created on 10 Jun 2020 · 2 Comments · Source: JuliaLang/Pkg.jl

Currently, registries store the SHA-1 hash of the Git tree of each registered version (git-tree-sha1).

Since SHA-1 is no longer considered secure against collision attacks, and since Git is planning to move to SHA-256, I think we should also start storing the SHA-256 hash of the tree of each registered version. Presumably we would call it git-tree-sha256.
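To make the proposal concrete, here is a minimal sketch of reading a version entry that carries both keys. It assumes the usual Versions.toml layout with one table per version; git-tree-sha256 is only the key name proposed above, the hash values are placeholders rather than real tree hashes, and preferred_tree_hash is a hypothetical helper.

```julia
import TOML

# Placeholder values, not real tree hashes.
sha1_placeholder = "0"^40
sha256_placeholder = "1"^64

# "git-tree-sha256" is the key name proposed in this issue,
# not an existing registry field.
versions_toml = """
["1.0.0"]
git-tree-sha1 = "$(sha1_placeholder)"
git-tree-sha256 = "$(sha256_placeholder)"
"""

versions = TOML.parse(versions_toml)
entry = versions["1.0.0"]

# Hypothetical helper: prefer the SHA-256 hash when present,
# otherwise fall back to the existing SHA-1 hash.
function preferred_tree_hash(entry::AbstractDict)
    haskey(entry, "git-tree-sha256") && return ("sha256", entry["git-tree-sha256"])
    return ("sha1", entry["git-tree-sha1"])
end

algo, value = preferred_tree_hash(entry)
println(algo, " ", value)
```

Keeping git-tree-sha1 alongside the new key would let older clients keep working while newer ones pick up the stronger hash.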


I'm not sure where this issue should go. Candidates include:

  • General registry
  • Pkg.jl
  • Registrator.jl
  • RegistryTools.jl
  • some other place?

Feel free to move this issue to a different repo.

All 2 comments

Two steps here:

  1. Update the tooling to start saving git-tree-sha256 as well as git-tree-sha1 when registering new versions.
  2. Write a script to run through all the versions in the registry, get a tarball for each one, compute the git-tree-sha256 hash of the tarball, and add it to the registry data.
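Step 2 could look roughly like the sketch below. This is not an existing tool: backfill_sha256! and the tarball_for callback are hypothetical names, and the demo runs on a throwaway directory rather than real registry data. It checks each tarball against the recorded git-tree-sha1 before adding the new hash, on the assumption that a mismatched tarball should not be trusted.

```julia
import Tar

# Hypothetical helper: `versions` maps version strings to registry entries,
# `tarball_for` maps a version string to a local tarball of that version's tree.
function backfill_sha256!(versions::AbstractDict, tarball_for)
    for (ver, entry) in versions
        tarball = tarball_for(ver)
        # Check the tarball against the recorded SHA-1 before trusting it.
        sha1 = Tar.tree_hash(tarball)
        sha1 == entry["git-tree-sha1"] ||
            error("tree hash mismatch for version $ver")
        entry["git-tree-sha256"] = Tar.tree_hash(tarball; algorithm="git-sha256")
    end
    return versions
end

# Demo on a throwaway source tree instead of real registry data.
src = mktempdir()
write(joinpath(src, "Project.toml"), "name = \"Example\"\n")
tarball = Tar.create(src)

versions = Dict{String,Any}(
    "0.1.0" => Dict{String,Any}("git-tree-sha1" => Tar.tree_hash(tarball)),
)
backfill_sha256!(versions, ver -> tarball)
println(versions["0.1.0"]["git-tree-sha256"])
```

In a real run, tarball_for would fetch from wherever the tarballs live, which is exactly the open question about storage servers versus package servers.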

It should help that Tar.jl can compute the git-tree-sha256 hash of any tarball, e.g.:

julia> import Tar

julia> Tar.tree_hash(`bzcat /Users/stefan/tmp/General.tar.bz2`)
"e387f456eb982cf13aed1a214df2a8b8434302c4"

julia> Tar.tree_hash(`bzcat /Users/stefan/tmp/General.tar.bz2`, algorithm="git-sha256")
"87a284f324fc6df4778b81de03bdc303d59587b4b1c839d586cf49a80bb3eec3"

I'm not even sure how to coax git into doing that yet, so we're definitely getting the jump on this.
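For context on why the two outputs differ in length (40 versus 64 hex characters), here is a small sketch of how Git hashes a single file (a blob), using the SHA standard library. The SHA-1 framing is standard Git behavior; that the SHA-256 object format uses the same framing with a different hash function is my understanding of Git's hash transition plan and should be treated as an assumption. git_blob_hash is an illustrative name, not a Tar.jl or Git API, and a full tree hash is built up from object hashes like these.

```julia
using SHA

# Git hashes a blob as "blob <byte length>\0" followed by the file contents.
# `hashfn` selects the hash function; sha1 matches today's Git object IDs.
function git_blob_hash(content::AbstractVector{UInt8}; hashfn=sha1)
    header = Vector{UInt8}("blob $(length(content))\0")
    return bytes2hex(hashfn(vcat(header, content)))
end

no_bytes = UInt8[]
println(git_blob_hash(no_bytes))                  # SHA-1 ID of the empty blob
println(git_blob_hash(no_bytes; hashfn=sha256))   # same framing, SHA-256 (assumed)
```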

@staticfloat, since the storage servers already have all the tarballs in question, what's the best way to leverage that? Hit them directly to serve said tarballs, thereby avoiding trashing the LRU cache on the package servers? Or run a script directly on one of those machines?

Regarding the location of this issue, I think we can coordinate from here and open specific issues on the other repos for specific changes. For example, RegistryTools seems like the right place for step 1, while General is where the PR resulting from step 2 will have to go. Those issues can link back here to make it easier to keep track of what's going on.
