Go: Add ability to get full commit hash using go mod

Created on 7 Oct 2019 · 12 Comments · Source: golang/go

What version of Go are you using (go version)? - 1.13

$ go version
go version go1.13 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

amd64

go env Output

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/mankad/Library/Caches/go-build"
GOENV="/Users/mankad/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/mankad/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/mankad/go/athens/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/gr/y6wkhscx6mb8nnkvqp48g83h0000gn/T/go-build496226794=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

This is not an issue -- this is an enhancement request

What did you expect to see?

github.com/xi2/xz 48954b6210f8d154cb5f8484d3a3e1f83489309e

What did you see instead?

github.com/xi2/xz v0.0.0-20171230120015-48954b6210f8

Labels: FeatureRequest, WaitingForInfo, modules

All 12 comments

What's the use-case? (Why is this important enough to build into the go command, rather than a third-party tool?)

CC @jayconrod

The use case is identifying commit hashes the way they appear on upstream sources like GitHub.

Previous package managers used with Go (Vndr, Deps) used full commit hashes to identify and download specific checkouts, so it would be useful for Go Mod to offer the same option.

For a personal use case, it helps reduce the chances of a collision during identification, working with multiple projects using different package managers.

> For a personal use case, it helps reduce the chances of a collision during identification, working with multiple projects using different package managers.

The pseudo-version encodes a 12-digit hash prefix and a timestamp.

The inclusion of the timestamp makes an accidental hash collision much less likely. And if your dependencies are malicious enough to publish an _intentional_ hash collision, you probably don't want to be depending on them in the first place.

Fortunately, they aren't made to collide intentionally. We maintain an inventory for static code analysis that tracks these projects as git repos, identified by commit hash. We have been cataloguing Go projects since 2015, and chose to identify repos without proper tags by their full commit hashes.

  • From an analytics perspective, a full commit hash helps increase consistency across the inventory we have.
  • For static code analysis, it enables us to map the full hash to vulnerabilities or known risks from feeds, and brings uniformity in things as simple as information retrieval.
  • This feature would give us and others in the community an easy way to identify the correct checkout.

If you're not worried about collisions, why does this need to be in the go tool? It seems like you could easily write a third-party tool that scrapes the output of go list -m, identifies the underlying repos (if they're in supported version control systems), and resolves the versions to commit hashes.

We run "go mod graph" to generate the dependency list, but each entry carries only the pseudo-version. Finding the underlying repo for each of those seems like a non-trivial task, particularly when the go command already knows that information and simply does not expose it.
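For readers unfamiliar with that output, here is a minimal sketch of splitting `go mod graph` lines into module paths and versions. It assumes the usual format of two space-separated `path@version` fields per line, with the main module printed without a version. The type names, `ParseGraph`, and the `example.com/app` module are hypothetical and exist only for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ModuleVersion is one side of a dependency edge. Version is empty for
// the main module, which "go mod graph" prints without an "@version".
type ModuleVersion struct {
	Path    string
	Version string
}

// Edge records that From requires To.
type Edge struct {
	From, To ModuleVersion
}

// splitModule separates "path@version" into its two parts.
func splitModule(s string) ModuleVersion {
	if i := strings.LastIndex(s, "@"); i >= 0 {
		return ModuleVersion{Path: s[:i], Version: s[i+1:]}
	}
	return ModuleVersion{Path: s}
}

// ParseGraph (hypothetical helper) parses text in the format printed by
// "go mod graph": one edge per line, two whitespace-separated fields.
func ParseGraph(output string) ([]Edge, error) {
	var edges []Edge
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("unexpected line: %q", sc.Text())
		}
		edges = append(edges, Edge{From: splitModule(fields[0]), To: splitModule(fields[1])})
	}
	return edges, sc.Err()
}

func main() {
	// Sample input; example.com/app is a made-up main module.
	out := "example.com/app github.com/xi2/xz@v0.0.0-20171230120015-48954b6210f8\n"
	edges, err := ParseGraph(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, e := range edges {
		fmt.Println(e.To.Path, e.To.Version)
	}
}
```

This only separates the path from the pseudo-version. Mapping each path to a repository and a full commit hash is the remaining step that the comment describes as non-trivial.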

I'm curious what would be expected in a CVE: the pseudo-version or the full commit hash? I looked for an example, and this CVE appears to reference a full commit hash. If that is the expected case, then it would make sense to use, or at least be able to produce, full hashes.

@taikuukaits, I would expect a CVE to include ranges of affected releases, dates, and commits.

Given that most users should be consuming released versions, the range of releases should make it nearly trivial to identify whether a given user is affected, and users consuming pseudo-versions could start by checking the date range and then verifying the specific commit if they are near the boundary.

The general argument "previous Go package managers did X so Go modules must too" would lead to modules being a union of all possible features. Instead we are aiming for a simplified set that will be easy to understand and work with moving forward.

There is basically zero chance of an accidental collision in a 12-hex-digit hash in typical repo sizes. And there is basically 100% chance of a malicious collision with a 40-hex-digit hash, since SHA1 is broken. So expanding to 40 digits from 12 would not accomplish anything except making the files harder to read. And of course tags like v1.2.3 record 0 hex digits.
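The claim about accidental collisions can be checked with the standard birthday approximation, p ≈ 1 − exp(−n(n−1)/(2·16^d)) for n commits and d hex digits. The sketch below assumes hashes behave like uniformly random values, which covers accidental collisions only, and it ignores the timestamp that a pseudo-version also carries. `CollisionProbability` is a name invented for this example.

```go
package main

import (
	"fmt"
	"math"
)

// CollisionProbability (hypothetical helper) estimates the chance that at
// least two of n uniformly random hashes share the same prefix of the given
// number of hex digits, using the birthday approximation
// 1 - exp(-n(n-1) / (2 * 16^hexDigits)).
func CollisionProbability(n float64, hexDigits int) float64 {
	space := math.Pow(16, float64(hexDigits))
	// Expm1 keeps precision when the probability is very small.
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	for _, n := range []float64{1e3, 1e4, 1e5, 1e6} {
		fmt.Printf("%8.0f commits, 12 hex digits: p ≈ %.3g\n", n, CollisionProbability(n, 12))
	}
}
```

Under these assumptions, a repository with 10,000 commits has a collision probability on the order of 10^-7 for a 12-digit prefix.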

The answer for a real guarantee about avoiding collisions is go.sum, which uses a more secure hash (SHA256) that is VCS-independent and that we can therefore update quickly as needed. The hash also applies to both pseudo-versions and tagged versions.

Given the commit hash prefix/version tag you can avoid collisions (if for some reason you think they are likely) by looking up the prefix/tag, fetching the code, calculating the go.sum checksum, and checking against the go.sum file. Or you can let the go command do this for you, which it does all the time. :-)
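To give a feel for the "calculating the go.sum checksum" step, here is a sketch modeled on the `h1:` scheme as implemented in golang.org/x/mod/sumdb/dirhash: a SHA-256 over a sorted list of per-file SHA-256 sums, base64-encoded. The helper name `HashFiles`, the in-memory file map, and the sample file contents are assumptions for this example. The go command itself works from the module's file tree or zip, so treat this as an approximation and not a drop-in replacement.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// HashFiles (hypothetical helper) computes an "h1:"-style hash over a set
// of files held in memory. Keys are file names of the form
// "module@version/path"; values are the file contents.
func HashFiles(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // the summary must not depend on map order

	summary := sha256.New()
	for _, name := range names {
		sum := sha256.Sum256(files[name])
		// One summary line per file: hex digest, two spaces, file name.
		fmt.Fprintf(summary, "%x  %s\n", sum[:], name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

func main() {
	// Made-up file contents, for illustration only.
	files := map[string][]byte{
		"github.com/xi2/xz@v0.0.0-20171230120015-48954b6210f8/doc.go": []byte("package xz\n"),
		"github.com/xi2/xz@v0.0.0-20171230120015-48954b6210f8/LICENSE": []byte("sample license text\n"),
	}
	fmt.Println(HashFiles(files))
}
```

Because the hash covers file contents and not VCS metadata, it gives the same protection whether the version is a tag or a pseudo-version.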

I would agree that just because it used to do it is not sufficient reason. I think it's less about the collisions themselves and more about having to translate a go-specific format into the more widely used full commit hash.

My argument would simply be that full hashes are what is present on the source repository, and full hashes are what appear in a CVE, so being able to get full hashes out of go.mod would be helpful. The lookup has to occur somewhere, and if the information is already present and go already knows how to do it, then it seems reasonable for it to occur there. I wouldn't say full hashes need to be used everywhere; even the ability to add a flag to graph would be sufficient.

For example if we could pass "--long" as a flag to go mod graph.
go mod graph --long

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

I am looking at this and trying to figure out what the commit is on GitHub for the repo highlighted in orange:

[Screenshot from 2020-04-13 12-31-02]

How can I figure it out? All I want is to copy the commit hash, go to GitHub, and load it, but I can't.

@ORESoftware The commit hash is 6d1c4477e6b9; however, that repo does not seem to exist.

Unlike many projects, the Go project does not use GitHub Issues for general discussion or asking questions. For questions about using Go, see https://github.com/golang/go/wiki/Questions.

This issue is closed and closed issues are not monitored. If you are still having an issue with Go, please open a new issue.
