Go: proposal: cmd/go: secure releases with transparency log

Created on 23 May 2018  ·  60 Comments  ·  Source: golang/go

[The text of this proposal is outdated. Find the whole proposal here.]

This is a proposal for a long-term plan to provide transparency logs to verify the authenticity of Go releases. It's not something we are ready to implement anytime soon.

Transparency logs are append-only Merkle trees that are easy to audit and that provide efficient proofs of inclusion. They are used for Certificate Transparency and are starting to be used for binary transparency.

They are a good fit for securing releases:

  • The log will fetch releases directly from the source, punting the spam issue to GitHub (etc.) or domain registries (as we can ban accounts/domains).
  • Clients will ask the log(s) for the release hash, and for proof that it was included in the append-only log. Module authors can audit the logs for their own projects, or get notified about new versions of them.
  • A hypothetical go release tool can trigger submission of the version to the log, and then verify that its hash matches what the developer has on disk. This is especially nice as it keeps the host (i.e. GitHub) honest.
  • Logs can also gossip with each other to make sure that a different version has not been observed before. (This is important so that two logs don't end up disagreeing on a version hash when the author changes the tag in between two log submissions.) go release can also check with logs that a version does not exist yet before tagging it.
  • Logs can be audited by third parties by comparing their entries to the packages fetched from git (maybe using the GitHub API to learn about new releases as soon as they are pushed) or by clients by comparing their global (#24117) or observed modverify files.
  • Proxies can be integrated with this system so that they will verify packages they are proxying. We can then support the concept of a trusted proxy, so that for example internal company systems will connect only to the proxy and not to the external logs.

The security of such a system is superior to what is provided by modverify, which is effectively pinning to the view of the developer adding the dependency. Transparency logs pin to the first time the version was globally observed, and with the go release workflow they pin directly to the view of the developer who created the dependency.

We can probably build the implementation on top of Trillian, a transparency log (and map) implementation which has the explicit concept of "personalities" for the custom use-case logic. (CT is a Trillian personality.)
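Independent of which log implementation is used, the client-side check boils down to verifying a Merkle inclusion proof against a signed tree head. Below is a minimal, self-contained sketch of that verification in the RFC 6962 (Certificate Transparency) style. It is illustrative only: the eventual Go implementation uses its own tile-based layout, and the two-leaf example data is invented.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash and nodeHash use the RFC 6962 domain-separation prefixes
// (0x00 for leaves, 0x01 for interior nodes).
func leafHash(entry []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x00})
	h.Write(entry)
	return h.Sum(nil)
}

func nodeHash(left, right []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x01})
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// verifyInclusion recomputes the root from a leaf, its index, the tree size,
// and the audit path, and compares it to the advertised root hash.
func verifyInclusion(index, size uint64, entry []byte, proof [][]byte, root []byte) bool {
	if index >= size {
		return false
	}
	fn, sn := index, size-1
	r := leafHash(entry)
	for _, p := range proof {
		if sn == 0 {
			return false
		}
		if fn&1 == 1 || fn == sn {
			r = nodeHash(p, r)
			for fn&1 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			r = nodeHash(r, p)
		}
		fn >>= 1
		sn >>= 1
	}
	return sn == 0 && bytes.Equal(r, root)
}

func main() {
	// Two-leaf toy log: root = nodeHash(leafHash(a), leafHash(b)).
	a := []byte("example.com/m v1.0.0 h1:...")
	b := []byte("example.com/m v1.0.1 h1:...")
	root := nodeHash(leafHash(a), leafHash(b))
	fmt.Println(verifyInclusion(0, 2, a, [][]byte{leafHash(b)}, root)) // true
	fmt.Println(verifyInclusion(1, 2, b, [][]byte{leafHash(a)}, root)) // true
}
```

The essential property is that the proof size grows only logarithmically with the log size, which is what makes per-download verification cheap.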

Ideally, these logs would be operated by multiple players in the community, and a client could choose to trust or submit to any number of them.

We can build the tooling outside the go tool as a way to check/generate modverify entries to experiment until we feel comfortable with it.

Labels: FrozenDueToAge, Proposal, Proposal-Accepted, early-in-cycle, modules


All 60 comments

Not sure what the difference is between this issue and #24117.

24117 is about a system-wide go.sum (which might be made superfluous by this).

Got it. I retitled #24117 to avoid the confusion.

Change https://golang.org/cl/165018 mentions this issue: design: add 25530-notary.md

Published formal proposal: https://golang.org/design/25530-notary.

Regarding the lookup endpoint in the notary proposal.
Let's imagine a scenario where a company doesn't want to leak import paths and therefore uses a proxy that caches the notary log, as described in:

The heavier, complete solution for notary privacy concerns is for developers to put their usage behind a proxy, such as a local Athens instance or JFrog’s GoCenter, assuming those proxies add support for proxying and caching the Go notary service endpoints. (Those endpoints are designed to be highly cacheable for exactly this reason, and a proxy with a full copy of the notary log doesn’t have to leak any information about what modules are in use, at the cost of maintaining its own index to answer lookup requests.)

Wouldn't the company end up with lots of leaks for every new developer who forgets to set GOPROXY? I imagine that this will be common during a migration, for example. Or developers who try things out in a container and forget.

Would it make sense to instead hash the module name & version for lookups?

When Go turns the notary mechanism on by default, it seems that all developers will suddenly have to be aware of immediately setting the GOPROXY or GONOPROXY environment variable, or private import paths will leak without them realizing it. Is my assumption incorrect?

@SamWhited makes some excellent points here:
https://groups.google.com/d/msg/golang-dev/DD88cds-LuI/1ndy2ol3BQAJ

The Go community might benefit from a foundation, to provide services under a different corporate umbrella than Google.

The Go team at Google will run the Go notary as a service to the Go ecosystem, similar to running godoc.org and golang.org. There is no plan to allow use of alternate notaries, which would add complexity and potentially reduce the overall security of the system, allowing different users to be attacked by compromising different notaries.

@rsc This decision will discriminate against Go programmers in countries where golang.org is blocked, such as China, or unavailable to countries like Crimea, Cuba, Iran, North Korea, Sudan, and Syria. I hope that you will reconsider this decision so that access to Google owned IP addresses is not the deciding factor in who can securely use modules post Go 1.12.

@davecheney Even allowing multiple notaries, we'd need an unblocked entity to run a service for those clients, and the clients to configure it. With the proposed design, that entity can run a proxy which maintains a full copy of the notary, and clients can configure it as their proxy.

I don't have the policy background to make a full assessment, but I believe this proposal has if anything the potential to raise security above Go 1.12 behavior (which will still be available with GONOPROXY=* GONOVERIFY=*): it enables untrusted proxies, so clients are free to select any proxy that meets the requirements not to be blocked wherever they are, and still be protected on the content of the code they fetch. It provides more flexibility in who to connect to, not less.

@FiloSottile is there a way to distribute the notary logs equitably? Saying that a proxy operator in a country, who is unable to access google ip space, can act as a proxy assumes they have a mechanism to access an up to date copy of the information their government denies them. I think this is an unreasonable burden to place on those operators who may face legal sanctions from their government for hosting such information.

Change https://golang.org/cl/165380 mentions this issue: cmd/go: add notary simulation and GONOVERIFY support

@davecheney I again am not qualified to tell what content is sanctioned and how, but note that the entire contents of the notary are module names, versions, and hashes. Moreover, a notary can refuse to serve certain records without breaking support for all other ones.

Also, a proxy can 404 the actual module contents, and as long as a notarization is available, the go client is allowed to fall back to the VCS for fetching, so you can run a proxy that serves exclusively the notary.
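As a concrete illustration of such a notary-only proxy, here is a minimal sketch that forwards only the read endpoints discussed in this thread (lookup, latest, tiles) and refuses everything else. The upstream host (sum.golang.org, the checksum database that eventually shipped), the listen address, the exact endpoint path prefixes, and the absence of caching are all simplifying assumptions, not a recommended deployment.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	// Upstream checksum database / notary host; illustrative only.
	upstream, err := url.Parse("https://sum.golang.org")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	director := proxy.Director
	proxy.Director = func(req *http.Request) {
		director(req)
		req.Host = upstream.Host // present the upstream's own host name
	}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Forward only the log's read endpoints; everything else is refused,
		// so this proxy serves the notary data and no module contents.
		switch {
		case strings.HasPrefix(r.URL.Path, "/lookup/"),
			strings.HasPrefix(r.URL.Path, "/tile/"),
			r.URL.Path == "/latest":
			proxy.ServeHTTP(w, r)
		default:
			http.NotFound(w, r)
		}
	})
	log.Fatal(http.ListenAndServe("localhost:8080", nil))
}
```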

@FiloSottile If golang.org is blocked in a locale (including inside highly regulated firms, not just politically defined locales), I think the problem is that the proxy server will be unable to fetch contents of notary.golang.org for caching.

We've seen a similar but slightly different instance of this problem with module proxies hosted somewhere with restricted access to public VCS servers.

Some additional thoughts to add to my previous comment.

Adding to @marwan-at-work's question about GONOPROXY, there is no way to verify private modules, since /latest is signed by the google server and all the tiles get authenticated up to the top-level hash served by /latest.

Specifically this means that all private modules need to be listed in GONOVERIFY (that's been confirmed in https://github.com/gomods/athens/issues/1105 as well as the proposal under the last paragraph of https://go.googlesource.com/proposal/+/master/design/25530-notary.md#command-client)

I think forcing every developer to use GONOVERIFY to allow their code to build against private module dependencies is problematic because it encourages less secure practices over time. Multiple VCS servers get deployed inside organizations as time goes on. This means that the number of entries inside GONOVERIFY may need to increase if these VCS servers may not share a common suffix.

For example:

  • Team 1 goes and deploys an internal GitLab server at team1.gitlab
  • All internal developers who write apps that depend on team 1's modules set GONOVERIFY=*.team1.gitlab or, more generally, GONOVERIFY=*.gitlab
  • Team 2, who is completely unaware of team 1's gitlab server, deploys team2.vcs
  • Any developer who relies on team 1's and team 2's modules inside their codebase needs to add *.vcs to their GONOVERIFY list

As an organization matures, this pattern continues, and each developer has to continuously update their GONOVERIFY in lockstep. Of course, the path of least resistance will always be to turn off verification, so that's what most people will inevitably do.
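To make the pattern-list churn above concrete, here is a hypothetical sketch of matching a comma-separated, glob-style GONOVERIFY value against module paths. The matching semantics (path.Match applied to the leading path elements) are an assumption chosen for illustration, not the go command's actual behavior.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// skipVerify reports whether modulePath matches any pattern in the list.
// A pattern is compared against the same number of leading path elements
// that it contains.
func skipVerify(patterns, modulePath string) bool {
	for _, pat := range strings.Split(patterns, ",") {
		pat = strings.TrimSpace(pat)
		if pat == "" {
			continue
		}
		n := strings.Count(pat, "/") + 1
		elems := strings.SplitN(modulePath, "/", n+1)
		if len(elems) < n {
			continue
		}
		if ok, _ := path.Match(pat, strings.Join(elems[:n], "/")); ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := "*.team1.gitlab,*.vcs"
	fmt.Println(skipVerify(patterns, "git.team1.gitlab/lib/foo")) // true
	fmt.Println(skipVerify(patterns, "team2.vcs/service/bar"))    // true
	fmt.Println(skipVerify(patterns, "github.com/gin-gonic/gin")) // false
}
```

Every new internal host means another pattern, which is exactly the lockstep maintenance problem described above.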

I have two proposals to help with this problem, one lightweight and one heavier weight.

The Light Proposal

First, the lightweight proposal could be to allow developers to specify, _in a file that gets checked into the codebase_, the modules to not verify. With this file, we can achieve two benefits for private codebases:

  1. Private modules and/or public modules that shouldn't be verified are identified once for all developers, with no extra configuration on any one development machine
  2. CI systems can ensure that no code gets merged without all private modules listed in that file, with no extra configuration in the CI system (the build will fail if a private module is not listed in the file)

The Heavy Proposal

Second, the heavier proposal would enable private notaries to keep their own log of just the private modules that an organization wants verified. I suggest that developers be able to add a directive to the go.sum file to indicate that a hash should be verified by a different notary server. This means that they should be able to specify a notary URL and the location of a public key to use when authenticating the return value from /latest for that URL.

Final Note

And one final note, I haven't seen firsthand the need for the second proposal inside an organization (although I can imagine some scenarios), but, since this feature can be used for verification of any module public or private, it certainly would help in other situations where golang.org is blocked. @davecheney described some of those in https://github.com/golang/go/issues/25530#issuecomment-469626733.

_Edit: updated reference to @davecheney and vcs domain suffixes_

@arschles, you raise a good point about the configuration burden. These things seem clear to me:

  • It must be possible to opt out of verification for some paths.
  • The proxy cannot be trusted to tell developers which paths have opted out.
  • An environment variable like GONOVERIFY is needed for the general case.

I like your team1/team2 example and I think it does motivate some way to share configuration, like in your "light proposal". It may be that focusing on this one variable is wrong and that there should be a per-module go.env file (in the sense of #30411). We need to figure out what the right mechanism is, and to limit the complexity.

In the "heavy proposal", can you help me understand the scenario in which it makes sense to have a notary for private modules? If an organization can't trust its own internal TLS connections, whether to their source code server or their internal proxy, they're in a pretty bad place, right? Having an internal notary seems either unnecessary or else not nearly enough.

I agree with @rsc that private code does not necessarily need to be verified. But there's a good chance that, in a large enough company, you want to make sure the code you share across the company has a shared go.sum that everyone validates against.

However, the problem with GONOVERIFY, whether it's in a committed file as @arschles suggested or in an env var, is that we end up with two bad choices and can only pick one of them:

  1. Leak all import paths by default.
  2. Do not verify anything by default.

The reason is that the proposal suggests turning on the notary by default in 1.13.

This will cause a hard failure the moment a user (or a CI/CD system) with private code upgrades to 1.13 without specifying GONOVERIFY. Not only will the private import paths leak, but the failure will break the compatibility promise: once 1.13 comes out, many people will upgrade and watch most of their builds fail.

Suggestion:

Would it be possible to not turn on the notary by default in 1.13? One way to do this is to look at GONOVERIFY: if it has a value, the notary is turned on; otherwise it's off.

And maybe in 1.14 or higher we can turn the notary on, which would give the entire community "enough" time to migrate.

@davecheney, we're aware of the concern about various countries and looking into what we can do, if anything. Please note that https://support.google.com/a/answer/2891389?hl=en is a page specifically about Google sign-in for business services (G Suite), and to be _very_ clear, nothing about the notary requires Google sign-in.

If someone wants to set up an alternate notary, that's easy, and we will publish a reference server (https://go-review.googlesource.com/q/f:notary). If Go users want to use an alternate notary, that's easy too - edit $GOROOT/lib/notary/notary.cfg once it exists. If someone in a country blocked off from Google sets up a notary and other users decide to change their Go setups to trust that notary, that's completely OK.

@marwan-at-work, I'm trying to understand the part about Go 1.13 vs Go 1.14. If we delay changing the default until Go 1.14, doesn't that make all the bad things you mentioned about Go 1.13 happen in Go 1.14 instead? What is it exactly that you expect users to do with their "enough time" during the "Go 1.13 has been released but Go 1.14 has not" window?

@rsc I was just hoping to get a confirmation that this proposal would in fact break people's builds and leak their import paths if they were not aware of GONOVERIFY. Is that assumption correct?

Delaying to 1.14 is just one suggestion, which I think would give people more time to ensure they have a GONOVERIFY configured. There are a few other options of course, but I just wanted to double check my assumption above first before diving deeper into alternatives.

Thanks

@rsc good point regarding internal TLS, I didn't think of that!

Considering @davecheney's comment about locales, my thoughts about internal organizations, and @marwan-at-work's question about essentially leaking private module names by default, would it make sense long term to make the notary opt-in, and easier to configure for other notaries? I see a few benefits to those two properties:

  • The opt-in behavior helps prevent folks from shooting themselves in the foot and leaking private module names (@marwan-at-work posed this one)
  • Making it easier for folks to configure their machine to talk to a different notary allows organizations to verify public modules while still reducing the likelihood that private module names get leaked
  • Making it easy for any developer to point their toolchain to a different notary makes verification more accessible to folks in locales that can't ping golang.org (@davecheney posed this one)

Here's a rough proposal on how a local toolchain could be configured, with ideas taken from above comments and #30411:

  • If the developer does nothing, verification is turned off
  • The Go tool still comes with the notary.golang.org public key built-in, and allows developers to add a notary=default or similar to a project-local go.env. Setting to default turns on verification using notary.golang.org and the built-in public key
  • If developers want to run their own notary, they can override it in the project-specific go.env or in their machine-global notary.cfg. notary.cfg should take precedence to protect against cases where a single private codebase is misconfigured. A few notes on running notaries that use notary.golang.org:

    • The proposal doesn't indicate that you can specify a custom notary URL in notary.cfg, so that would need to be added

    • If a private notary is going to cache tiles from notary.golang.org and also verify private modules at the same time - which it will need to unless you can turn off verification per-hash in the go.sum - then it will need to re-sign the values returned by the google notary's /latest endpoint

There are some assumptions above. Looking forward to hearing what you think.

Oh, and I didn't mention above that the rough proposal enables, I think, 4 different ways to configure verification, which feels like too much complexity. I'm of course open to reducing the options, but I do think that some kind of per-VCS-repository configuration is a must.

I think it would be a huge win if we had the notary on by default. Imagine if users had to manually opt in to HTTPS. Much of the internet today might still be on HTTP.

The only problem with turning the notary on by default is the fact that Go will never be able to automagically find out which import paths are public and which are private.

Therefore, I'd like to throw in to the mix the idea of turning on the Notary by default while still preventing private import paths from leaking to the network.

To be able to turn it on by default and not break everyone, here are the options I can think of aside from my "delay" option above:

1. Consider the on-by-default feature a breaking change and bump to Go 2.0:

This can be done instead of in 1.13, or maybe the notary can be opt-in for 1.13 (and higher), as @arschles suggested, until Go actually wants to introduce 2.0, and only then do we include the notary as an on-by-default feature.

2. Let the notary be turned on by default, but under some conditions:

Looking at how the community is using Go today we can make the following assumptions:

A. Most people right now are not using a GOPROXY.

B. I can safely say that many people are using Go modules while many are still using dep or older alternatives. In other words: they are using regular VCS fetching.

Therefore, we can potentially detect the user's current state (old or new) and based on that turn on/off the notary verification.

For example, we can turn on the notary based on the following conditions:

Condition 1: if the Go version in go.mod is 1.13 (or above), turn the notary on.
Condition 2: if the Go version in go.mod is Go 1.12 (and below) and GOPROXY is on, turn the notary on.
Condition 3: if the Go version in go.mod is Go 1.12 (and below) and GOPROXY is off, turn off the notary.

We can potentially get rid of Condition 2, and just have Condition 1 be the only way a notary is on-by-default.

would it make sense long term to make the notary opt-in

Sorry, but no, it wouldn't (as Marwan already said). One of the things we do in Go is make sure to have the right defaults out of the box. "Insecure by default" is the wrong default.

One idea I have gone back and forth about is maybe making the lookup send SHA256(module)@version instead of module@version. Obviously the notary does not know how to invert SHA256 in general, so it would only be able to answer for module paths it was already aware of. But then we'd need to have a separate way to tell the notary about a new module path. So 'go get' of a module unknown to the notary would fail after leaking only its SHA256 hash, and then you'd have a choice:

  1. If it's a private module, set GONOVERIFY appropriately.
  2. If it's a public module, run something like 'go notify module' so the notary will recognize SHA256(module) in the future.

And then you run 'go get' again and this time it works. I don't really like the complexity of that. If most corp users would be using their own proxy anyway (which could reject any notary lookups for private packages itself), then there's no benefit to this added complexity. Maybe the default would be module@version but there could be an 'opaque mode' that sends SHA256(module)@version instead. Or maybe the reverse. Or maybe something else entirely.

I don’t know if it’s worth the complexity of the manual submissions either, but if we do it, we should do it with k-anonymity, where we send a short hash prefix and answer with all possible matches. Module names have very little entropy, so reversing SHA256 is not actually hard.
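For reference, a small sketch of what the two lookup variants discussed here would send: the full SHA-256 of the module path, or only a short prefix of it for k-anonymity. The prefix length and the example module are illustrative assumptions, not part of any specified protocol.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func main() {
	module := "github.com/gin-gonic/gin" // example public module
	sum := sha256.Sum256([]byte(module))
	full := hex.EncodeToString(sum[:])

	// Full-hash lookup: hides the path only from parties who cannot guess it,
	// since module names have little entropy.
	fmt.Println("full hash: ", full)

	// k-anonymity lookup: send just a short prefix; the server replies with
	// every known module whose hash starts with it. The prefix length here
	// is made up for illustration.
	fmt.Println("k-anon key:", full[:4])
}
```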

So 'go get' of a module unknown to the notary would fail after leaking only its SHA256 hash

@rsc this would still be a breaking change as far as I can see. If so, can that be avoided by looking at go.mod's go version?

On another note, can GONOVERIFY and/or GOPROXY be included in go.mod as new directives similar to require and replace? Or potentially a new file format in the module root as Aaron suggested?

Regarding the privacy concern, maybe the notary and the proxy to Google should be opt-in to comply with EU law.

Perhaps this is off-base, but if the information that is ultimately sent is SHA256(module)@version (or perhaps a prefix or similar for k-anonymity), then it seems like a 'go notify module' would be a one-time operation over the lifetime of a given module path for a public module?

If so, I wonder if 'go release' could have a role here? Setting aside what the exact flag might be, something like 'go release -public' could do the normal 'go release' behavior plus perform the operation suggested for 'go notify module' above. That might help shift the expectation and behavior to be something that a public module author expects to do (once) if they want to facilitate consumers of their public modules.

In addition, or perhaps alternatively, I wonder if it would be feasible for 'go release' when run without any '-public' flag to check at that moment in time whether or not the module is known to the notary, and report that back to the user. Depending on the ultimate resolution of GONOVERIFY behavior, perhaps 'go release' could check that as well. That could perhaps help public and private authors do the right thing.

Under that proposal, 'go notify module' or similar could still exist for the hopefully rarer case where a public module author forgot to do the right thing but a module consumer would like to get the information into the notary.

In short, 'go release' could be seen here as helping codify some best practices of things to do when releasing a module.

Also, would the index service be able to automatically trigger getting the large majority of SHA256(module) in place for public modules?

From https://blog.golang.org/modules2019:

And the index service makes it easy for mirrors, godoc.org, and any other similar sites to keep up with all the great new code being added to the Go ecosystem every day.

@marwan-at-work, I don't know what you mean by "breaking change" in this context. Tools like the go command are not covered by the Go 1 compatibility document. They change behavior from time to time.

@FiloSottile, k-anonymity seems like overkill. Also it doesn't work: if we send SHA256(path)@version and the notary knows what path must be but doesn't have that version yet, it can go grab that version. If we send a truncated SHA256 then the notary has to go (try to) grab k versions, which is k-inefficient.

Remember, the problem I am solving with the SHA256 is just "don't tell the notary a private import path". I am _not_ solving "don't tell the notary which module you are interested in at all". The latter problem is solved by proxies, which can prefetch the entire database and serve it themselves.

@thepudds, yes, once there is a 'go release' we definitely want it to validate that the copy the notary sees matches the local copy, and that would imply telling the notary about the path. And yes, 'go notify module' is only once per path so the vast majority of users would never need to run it - someone almost certainly already has.

Fair enough regarding turning verification on by default

I think @marwan-at-work and I (correct me if I'm wrong, Marwan) are both trying to prevent build breakages out of the box (i.e. turning off verification for private modules), but approaching it from slightly different directions. And I've been talking a lot about making it easier in the UX to use other notaries (i.e. for folks who can't access golang.org or who want to use other notaries).

In the former, I agree with @marwan-at-work's conditions, and ideas to set noverify modules in the go.mod. To help with the latter, the only thing I'd add to the options in the go.mod is making it easier to set or point to the public key of the notary you want to talk to. Also in the latter case, I don't think sending anything up to notary.golang.org is an option in the cases we need to be looking at for the latter, so I'm not sure if that solves anything.

I also have concerns over adding a go mod notify because essentially that requires that someone who releases a new ~version~ module path needs to add something to their CI/CD systems (or do it manually) to run go mod notify, right? That would be a big diversion from the current workflow for all module authors, which is tag it and you're done. That would disrupt another extremely common workflow that affects lots of people (even if it isn't covered under the compatibility guarantee)

_edit: s/version/path_

@arschles, I'm pretty reluctant to add configuration variables to either go.mod or go.sum. Your notary/proxy configuration is a local decision, not one that should be exposed to all clients of your module. Cloning someone's repo and cd'ing into it probably shouldn't default you into a whole separate proxy/notary/etc. And 'git checkout \

@arschles, I share your concern about the extra step in go notify. That's why it's not in the proposal. But if using the SHA256 hash makes people sufficiently more comfortable with the notary, it might (or might not) be the right tradeoff. That said, I don't see why a CI/CD system would need to run go mod notify, nor would most users.

First, my assumption about CI/CD usage is that you only push to that system once you've at least built the code locally. If you've built the code locally, the go command updated go.mod and go.sum to include any newly-added dependencies. Any entries in go.sum are always accepted as correct. A complete go.sum therefore implies no notary access at all.

Second, we added the -mod=readonly flag exactly for CI/CD systems, so that even if go.mod were not up to date they would not try to update it. I don't know whether -mod=readonly also applies to go.sum today, but it should, and if not we'll fix that (#30667). If a CI/CD system is using -mod=readonly, then, an incomplete go.sum will trigger a local failure, again no notary access at all.

It's true that during local development, some developer somewhere in the ecosystem will have to run go mod notify once for each public module path. If we roll it into go release, the author can do it easily. If not, the first user will. Most users will never need to do this. In fact, if you try to use a public module and find that you need to run go mod notify, that's a very strong signal that literally no one else has ever used that module as a dependency, and you might rethink blazing that trail. :-)

More generally, any answer we reach for the default behavior is going to be some compromise between:

  • defending against server and network compromises (do nothing),
  • bandwidth (download the entire notary database and look for what you need),
  • and privacy (be more selective in what is downloaded).

It's not going to be possible to satisfy everyone, with any decision. Ultimately the goal of this discussion is to try to find a default behavior that is as acceptable as possible for as many people as possible. I really appreciate everyone engaging respectfully and helpfully as we work through this.

I've updated the design doc, expanding the Security and Privacy sections quite a bit to capture the discussions to this point.

FYI, I think this is the recent diff on the proposal document and the corresponding CL, if interested.

I agree putting something like an actual proxy server into a go.mod would not be desirable, and I am not in love with putting a large amount of additional configuration into go.mod. But I wonder if some form of user _intent_ regarding public vs. nonpublic might make sense in go.mod, so that it could be checked in to VCS and could then drive sanity checking of the other related configuration settings or environment variables by things like go get, go release, etc. I don't know that it would ultimately make sense to do that, but if it was pursued, would it be just two cases that really matter regarding user intent (e.g., some type of public vs. nonpublic in go.mod)? Of course, there would be follow-on thinking needed about defaults, dealing with Go 1.11/1.12 go.mod files, how/when it is set or checked, etc., but perhaps it might be reasonable to have effectively one bit encoded somehow in go.mod regarding intent.

Separately, regarding sending the hash(modulepath)@version, it seems like a nice win, and while it might be a modest increase in complexity in one area, it might be a net reduction in overall complexity (e.g., if it ends up driving down complexity in some of the related questions around other settings, or by reducing the complexity or penalty of having a "wrong" default for a subset of users). The biggest downside seems to be the potential for complexity around getting the hash into the notary. It would not be great if the end result was "you need to read the documentation to understand when, how, and why to invoke go notify <module>", but it certainly at least "feels" like people could be successfully guided by the go tool almost 100% of the time via _some_ combination of default behavior, informative messages suggesting likely resolution, automatic validation, flags, etc. as part of the otherwise natural module workflow (e.g., for go get, go release, or perhaps even go init), in addition to the indexing service doing it automatically for the large majority of repos on major public code hosting sites.

@rsc the issues that config variables in go.mod would cause sound like a good reason to leave them out, I agree.

Since GH issues don't have threading, I've submitted two posts in golang-dev to capture two of the earlier questions I've asked in here. I'll focus this just on the SHA256 / go mod notify idea. I've tried to gather some related points made in previous posts and I'll try to address them here. I apologize in advance if I missed something important. I certainly don't intend to cherry pick points or build strawmen here.

My assumption about CI/CD usage is that you only push to that system once you've at least built the code locally.

+1, this is mine too. As you said, that would mean that go mod notify was run somewhere for all of your dependencies, but it would still be up to a developer to run go mod notify on their own if they're releasing a new module path of their own. That's the part that concerns me because it would be the first time that module authors would not be able to git tag and be "done" with releasing their new module path.

Even though that's definitely not a good practice, it happens a lot when folks are prototyping something new or don't have any tests at all. I'm guilty of the latter, on multiple counts 😁

That's a very strong signal that literally no one else has ever used that module as a dependency, and you might rethink blazing that trail. :-)

I'd say so too 😄. However, this means that either the author has to run go mod notify (see above) or most new packages don't get used. In fact I'd guess that the only new modules that get adopted would be from "trusted" (or "famous") authors

I've updated the design doc, expanding the Security and Privacy sections quite a bit to capture the discussions to this point.

Thank you 😄

I wonder if some form of user intent regarding public vs. nonpublic might make sense in go.mod

@thepudds can you help me understand how the intent would be acted upon by the go tool? I'm trying to get a sense of how the sanity checks differ from skipping verification

Just as a heads up, I am going to be away from work (including GitHub!) starting pretty soon and continuing for the next two weeks, ramping back up gradually the week of April 1. I'll catch up on any discussion here and on golang-dev when I return. Thanks for the excellent conversation so far.

Hi there. I'm late to the party and maybe from left field, but I am part of an OWASP project that is trying to raise the security bar for package management systems across the development ecosystem.

As such, we took quick interest in the work happening here once we became aware of it and wanted to support and encourage further work as well. We recognize that golang is a little different than, say, npm and rubygems, where we have seen significant security issues emerge. Still, we think that there is more that can be done.

Please let me know if you would rather see separate issues raised, or would like to have a discussion in some other forum (eg. a list), or whatever is most constructive from your collective perspective.

There are a number of good things already in the proposal, obviously. Kudos for that.

As a developer, I still have some concerns:

  1. A library I am using could have a known security issue and I would never find out about it. It would be great to see something like go audit similar to npm audit. As a start, this could just look up known issues for currently used known versions.
  2. A developer contributing to a library I use could have their password guessed and a new version of the library released with a back door. Based on the sum approach described here, I may know that the module changed but I would never know for sure by who.
  3. We would suggest some indication of whether the authentication to the release is strong. In practice it seems like this defers to GitHub authentication. If I understand correctly though, that is optional and may not require a strong password or MFA.
  4. Ideally, we would like to see the packages signed by the authors. We know that is a high bar that most ecosystems have not been able to effectively implement but that would be stronger than a checksum.
  5. I'm not 100% clear on the internals to know if the go sum would have enough info (eg. hash of commit for each release) to be able to show what actually changed, but that would be important to have too. In other words, the version and checksum need to also match some fixed source code that we can go see (and audit).
  6. It seems like because the Go ecosystem is decentralized, there may not be one place to go to report security issues. I see the disclosure process for the main Go project, and am on the announce list now but it is not clear how "other ecosystem issues" will be handled. That may be intentional at this point but it poses a risk to me as a developer.

There is probably more here but this seems like a good start for the conversation.

As always, I could be missing something - or many things - so hopefully this can come across as constructive input. Your work here, on ecosystem level pieces, is so critical to building a more secure overall software environment and there are a bunch of people from our community who would likely be willing to talk or even help with pieces of this. Thanks!

Hi @mkonda, other people might have more specific comments, but I wanted to at least briefly share some pointers that might be of interest to you.

Regarding your points 1. and 3., see for example #24031, and especially https://github.com/golang/go/issues/24031#issuecomment-407798552. As far as I understand it, that would enable declaring insecure versions of a module after a release has been published, and would do so without relying on a centralized authority. (edit: as you'll notice if you click on that link, that is an open proposal. That said, I've seen the core Go team state it is important to address that use case. Another alternative discussed prior to that was expanding godoc.org to enable reporting on pair-wise incompatibilities and security issues).

Regarding your point 2., two related items that might be of interest to you:

  • "Our Software Dependency Problem" ("Download and run code from strangers on the internet. What could go wrong?")

  • "Go Modules in 2019", which includes "Finally, we mentioned earlier that the module index will make it easier to build sites like godoc.org. Part of our work in 2019 will be a major revamp of godoc.org to make it more useful for developers who need to discover available modules and then decide whether to rely on a given module or not.".

Together, those two references I think describe a desire to provide services to make it easier for people within the ecosystem to evaluate the quality of a dependency. Perhaps evaluating something like whether or not a dependency has MFA set up for GitHub authentication could be a piece of that.

Regarding:

I'm not 100% clear on the internals to know if the go sum would have enough info (eg. hash of commit for each release) to be able to show what actually changed but that would be important to have too.

There is a brief description in the documentation:
https://golang.org/cmd/go/#hdr-Module_downloading_and_verification

The go command maintains, in the main module's root directory alongside go.mod, a file named go.sum containing the expected cryptographic checksums of the content of specific module versions. Each time a dependency is used, its checksum is added to go.sum if missing or else required to match the existing entry in go.sum.

The go command maintains a cache of downloaded packages and computes and records the cryptographic checksum of each package at download time.

I believe it is a SHA256.
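For the curious, here is a rough approximation of how an "h1:"-style directory checksum can be computed: hash each file, then hash the sorted listing of per-file hashes. The go command's own dirhash code is authoritative; details such as how files are named inside the module zip are simplified in this sketch.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hash1 approximates an "h1:" checksum over an in-memory set of files:
// each file is hashed individually, the "hash  name" lines are sorted by
// file name, and the resulting listing is hashed again.
func hash1(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	listing := sha256.New()
	for _, name := range names {
		fileSum := sha256.Sum256(files[name])
		fmt.Fprintf(listing, "%x  %s\n", fileSum, name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(listing.Sum(nil))
}

func main() {
	files := map[string][]byte{
		"example.com/m@v1.0.0/go.mod": []byte("module example.com/m\n"),
		"example.com/m@v1.0.0/m.go":   []byte("package m\n"),
	}
	fmt.Println(hash1(files)) // a go.sum-style h1: line for this file set
}
```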

Hi @mkonda, 1 and 3 are indeed open issues. They are thankfully orthogonal to what the checksum database solves, so we can tackle them separately. (Making sure the content is authentic, vs making sure it's "secure".)

On 2, I don't really believe we can get widespread adoption of authentication beyond the code host. Even if we made every author sign their releases (which is unrealistic for a number of reasons), they will most likely just sign what's in their repository, effectively delegating that trust.

It is important though that we prevent proxies and attackers from publishing versions unbeknownst to the author, and the checksum database log helps greatly with that: any third-party auditor can offer a service to notify owners of new releases in their repositories, and I hope we will see many kinds of that service.

That, combined with go release #26420 checking the checksum database, ensures that users only see releases that match what was on the developer machine. I don't think we can do any better than that.

I realize I am late to complain, but while I understand the need for verifying the integrity of go modules, what this proposal amounts to is using a single central server by default, creating a single point of failure.

Furthermore, if GOPROXY and GOSUMDB are set by default to central Google servers, then not only people in countries such as China, but people all over the world behind firewalls, such as restrictive corporate firewalls, will experience difficulties in using Go. All these people will be forced to use a custom Go proxy with support for sumdb. It doesn't make for a great user experience that the first thing a Go user has to do when they start using the language is configure a proxy and a sum DB. See #31755 for a related discussion.

I'd like to ask: is there no other way to make checking the module checksums more decentralized and give a better user experience to firewalled users?

The design goal is to make sure everyone agrees on the same contents for the same version, so a degree of centralization is necessary. The point of the transparency log is to ensure that the log is _not_ a single point of failure: compromising it is not enough because it will lead to detection by the auditors.

The checksum db is also designed to allow for decentralized proxies. As you mentioned, anyone that can reach any untrusted proxy can successfully use the sumdb. (This also makes the sumdb not a single point of failure for read availability.)

If you want to discuss the defaults, please use #31755. The sumdb design can support any conclusion that that issue comes to.

(As an example, consider a "blockchain", that you might think of as more decentralized. You still have to talk to nodes somehow. If you have a way to talk to a node, you can also speak the proxy protocol with them as-is, and reach the sumdb through that.)

I'm going to send a CL that enables both the Go checksum database and the Go module mirror by default in module mode.

There remain issues to resolve with this proposal, so we cannot turn the checksum database on for all users. However, now that we are not enabling modules for all users, it seems reasonable to enable the checksum database for module users, so that we can more precisely understand the exact problems and develop solutions. People who aren't ready to move to modules yet will not be affected by enabling the checksum database, and modules users having particular trouble with the checksum database can turn it off for the specific modules (go env -w GONOSUMDB=mysite.com/*) or entirely (go env -w GOSUMDB=off), but I encourage them to file specific issues as well, so that we can address them.

If there are any show-stopper issues that we can't address before Go 1.13 is released, we will back out the change. But we need to understand better what the issues are, especially the as-yet-unknown ones.

Thanks.

Change https://golang.org/cl/178179 mentions this issue: cmd/go: default to GOPROXY=https://proxy.golang.org and GOSUMDB=sum.golang.org

Editorial comments about the document...

The use of a transparent log for module hashes aligns with a broader trend of using transparent logs
to enable detection of misbehavior by partially trusted systems, what the Trillian team calls
“General Transparency.”

This is the first mention of Trillian in the doc (despite the linked papers). Maybe the term needs elaboration or a link (https://github.com/google/trillian).

There are two main privacy concerns: exposing the text of private modules paths to the database, and
exposing usage information for public modules to the databas.

DATABAS => DATABASE

The complete solution for not exposing either private module path text or public module usage
information is to us a proxy or a bulk download.

US => USE

Privacy in CI/CD Systems

Acronym is never defined in document (https://en.wikipedia.org/wiki/CI/CD)

I'm (maybe stupidly) unable to grasp one of the core issues in this design. The services claimed as valuable and necessary are about doing a lookup to validate a pending action ("about to do this and want to know data to use in verification") or about the past-tense version of the same.

It seems important to me that in both cases the client (go get, et al.) starts by knowing something that may be a secret (the import path and version number), which is the key used to obtain the not-a-secret value of the checksum. Because of privacy, non-public code, likely misconfiguration, etc., it seems that the sometimes-secret info is the last thing you'd want to use as the public lookup key.

I read the part about possibly using "Private Module SHA256s" but that seemed to miss the point by focusing on the reverse mapping table and first-time publication issues. Here is what I don't understand--why does the database ever need the "clear text" import path and version number? Why must it ever be sent anywhere? It is only useful in that form (I think) to go get on the client side.

Instead, have go get et al. create, locally, a hashed/encrypted token from the path and version, one with strong one-way attributes, and use that as the query key, store that in databases, etc. In such a world the module checksum responses are just as easy to supply, are totally "transparent" on the data-to-be-secured side (the checksum) and totally opaque on the other (machine names, import paths, versions, timestamps, traffic analysis, etc.). It sidesteps completely the issue of private data...which will be harmlessly useless to all.

What have I overlooked here?

Following up...because I forgot that this is not "lunch in Charlie's" and it is my duty to explain all the implications:

Yes, my proposal is about an opaque 256-bit one-way cryptographic hash of the query string (the go get info: path+version, perhaps hashed atop the hash of the module source in the Merkle-Damgard sense.)...

...and a transparent-but-otherwise meaningless 256-bit one-way cryptographic hash of the module source -- the existing answer to database queries.

This leads to an exactly 512-bit per record database that would be remarkably simple to maintain and serve, with any complexity being the existing dance around security-through-federation. (Which is nice!)

The result is a public database with properties beyond "encryption at rest" -- not one byte of this database tells you anything that can be used from the database end to know about module paths, machine/url paths, developer identity, etc. It is, in this way, a giant mystery and thus provably safe in any privacy sense no matter how it is configured or maintained, nothing can leak because nothing is there to leak.

Yet, when used the other way, it is fully supportive. go get or other tools want to know about an importable module: they internalize the request path+version, and query with that key. They get back the hash value for the module just as now.

What is lost, you might presume, is the joy of voyeuristically looking at database keys to build a network map of the Land of Go, and G-tools leveraging that map to monitor, report, and assist in various open-ended activities. Nothing in the rationale seems to argue for this map and its conceptual buildability from query strings shows the leaky nature of the present design. If it is not needed, then maybe it is not wanted.

However (this is the new part that I thought would be clear without mention but now I'm thinking that thinking is not in the spirit of distributed discussions) it happens to be true that a better map of the Land of Go is buildable in my opaque-key design. What is needed is a tool or company that knows how to crawl the public web looking for openly shared .go,.mod,... files and then download the import path and version strings from those. These can easily be interned to opaque keys and requests made to the database. When a key is present, then so is the version's hash. all of this--the three tuple of provably-shared-source path&version, the resulting key, and the stored hash--are then united for building the Land of Go tooling.

This way, the map if desired, is never built from private code because that code is not shared on the web. So provenance is provable. Security is implicit. Misconfiguration can't hurt. That's what I meant.

The reason module names need to be available in plaintext in the database is for auditing purposes. A transparent log is only useful if it is scrutinized and held accountable, and to perform a number of checks the auditors need to know what the module names are.

An example: we will want a notification service that can email me for any new module like github.com/FiloSottile/..., so I get to know if fake modules are being published.

Also, the search space is not that wide so hashes can be reversed in most cases, which is why the private lookup proposal uses hash prefixes, to lean on the equivalent of k-anonymity.

(Thanks for the edits, I'll make a PR next week, but feel free to go ahead and make one in the meantime if you'd like.)

On May 21, I wrote:

I'm going to send a CL that enables both the Go checksum database and the Go module mirror by default in module mode.

There remain issues to resolve with this proposal, so we cannot turn the checksum database on for all users. However, now that we are not enabling modules for all users, it seems reasonable to enable the checksum database for module users, so that we can more precisely understand the exact problems and develop solutions. People who aren't ready to move to modules yet will not be affected by enabling the checksum database, and modules users having particular trouble with the checksum database can turn it off for the specific modules (go env -w GONOSUMDB=mysite.com/*) or entirely (go env -w GOSUMDB=off), but I encourage them to file specific issues as well, so that we can address them.

If there are any show-stopper issues that we can't address before Go 1.13 is released, we will back out the change. But we need to understand better what the issues are, especially the as-yet-unknown ones.

Go 1.13 beta has been out for a while with the checksum database enabled, and overall it seems to be working well. No show-stopper issues have been identified that I am aware of. We have not resolved the comments about wanting to change the content of a module without triggering an error, but the design of the system is meant to catch exactly that. And people who want not to be stopped can always turn off the checksum database.

We made it easier to turn off both the proxy and the checksum database together, selectively, with the new GOPRIVATE environment variable (see 'go help environment').

Overall it seems like the consensus here is that we can move forward with this and accept this proposal, since no show-stopper issues have been identified. Am I missing anything?

Will leave this open for a week to collect final comments.

(It is always fine to file a new issue for other problems found with the checksum database.)

One discussion on this issue was around the rather vague link to Google's standard privacy policy. We have posted a more detailed page about privacy and the proxy, sum, and index servers at https://proxy.golang.org/privacy. (Please file any feedback in separate issues.)

Marked this last week as likely accept w/ call for last comments (https://github.com/golang/go/issues/25530#issuecomment-520998619).
No comments, so accepting.

Already implemented, so closing.

Where does the tree size in the lookup endpoint come from?

The tree size from lookup is:

  • different from the one in latest
  • changing from time to time

Example:

Lookup: 338124

Latest: 338145


Lookup: https://sum.golang.org/lookup/github.com/gin-gonic/gin@v1.4.0

github.com/gin-gonic/gin v1.4.0 h1:3tMoCCfM7ppqsR0ptz/wi1impNpT7/9wQtMZ8lr1mCQ=
github.com/gin-gonic/gin v1.4.0/go.mod h1:OW2EZn3DO8Ln9oIKOvM++LBO+5UPHJJDH72/q/3rZdM=

go.sum database tree
338124
Gsz639f3wDBB3gnyYzg58D9C91Cb9FWyvNrpltzl2uE=

— sum.golang.org Az3grjft44BfvQ3qiWzZRPWjK4wXbWLkf/BzMVlM3BgnbmnADL7AHSEm+v43AtYpFwS0glukjcqbIVXfq4hDvq0xNgg=

Latest: https://sum.golang.org/latest

go.sum database tree
338145
Megb2heVg8xuaXGRNBmkCPjA8EhHVXH7HkNuWpOImOY=

— sum.golang.org Az3grpMDXtFEl9vfNmeqqNKY6HORkeagC2TwQ6WA6WV5a0ykPurrlO1FvtxHEqLmvntnZawTbAw9OWSeOU8Il4NoGwg=

It only has to be higher than the record number returned in that lookup, so that the client is guaranteed to have at least one STH that includes the record (while the latest endpoint might be cached). It is always reconciled with the previous latest STH on the client side.
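To observe this relationship directly, here is a small sketch that fetches the tree heads embedded in a /lookup response and in /latest and prints their sizes. It deliberately skips signature verification and proof checking, which the real client performs; the gin lookup URL is the one from the example above.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"strconv"
	"strings"
)

// treeSize extracts the size line that follows "go.sum database tree"
// in a signed tree head note.
func treeSize(body string) (int64, error) {
	lines := strings.Split(body, "\n")
	for i, line := range lines {
		if strings.TrimSpace(line) == "go.sum database tree" && i+1 < len(lines) {
			return strconv.ParseInt(strings.TrimSpace(lines[i+1]), 10, 64)
		}
	}
	return 0, fmt.Errorf("tree head not found")
}

func fetch(url string) string {
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	return string(b)
}

func main() {
	lookup := fetch("https://sum.golang.org/lookup/github.com/gin-gonic/gin@v1.4.0")
	latest := fetch("https://sum.golang.org/latest")

	ls, err := treeSize(lookup)
	if err != nil {
		log.Fatal(err)
	}
	hs, err := treeSize(latest)
	if err != nil {
		log.Fatal(err)
	}
	// The lookup's tree head may lag the latest one because it can be served
	// from a cache; both are guaranteed to cover the looked-up record.
	fmt.Printf("lookup tree size: %d, latest tree size: %d\n", ls, hs)
}
```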

@FiloSottile
Yes, an STH that includes the record works fine.

But the tree size in the lookup endpoint seems to be unpredictable: it's always less than the one in the latest endpoint, and it changes from time to time. (I reached the endpoints in a browser to check.)

I just want to find out the rule for the tree size in the lookup endpoint.

Where is the source code or documentation that explains the rule?

There is no rule, except that it must be higher than the record number.

IIRC it depends on the interaction between the various caches and the internal database lookup. The lookup responses are cached more aggressively than the latest one, but not forever. That should explain the behavior you see. This code is all pretty Google-specific, so it's not open source. The important property of the transparent log is that as long as the (open source) client checks the proofs correctly, you don't have to trust the server to operate honestly.
