Nixpkgs: Recommend buildGoPackage instead of buildGoModule in the nixpkgs manual

Created on 9 Apr 2020 · 38 Comments · Source: NixOS/nixpkgs

In the nixpkgs manual, the default function to package a Go application is buildGoModule (since the release 19.03). This function creates one derivation to fetch all dependencies. This is really convenient to use since we only have to provide one hash: the hash of a directory containing all of these dependencies.

However, it has several drawbacks:

  • long term reproducibility is harder to achieve because we don't know exactly what sources (revision/checksum) are used by this application (sources mirroring is harder)
  • source dependencies are not shared across packages
  • users can not know the source dependencies of a package

There is also a Nix issue to restrict fixed-output derivations: it would no longer be possible to create this kind of derivation.

We have another function, buildGoPackage. To use it, we first need to run a tool to generate a derivation per dependency. This is less convenient to use, but cleaner, since we are exposing all dependencies and not only a blob of sources. Unfortunately, this function is tagged as "legacy" in the manual :(

So, I think it would be better to:

  1. recommend buildGoPackage to package Go applications,
  2. and if buildGoPackage doesn't work, then fall back to buildGoModule.

I think buildGoModule is a really nice function to quickly package an application. It has to be in the nixpkgs manual, but we should avoid using it to package applications in nixpkgs.

I don't know how I should proceed on this. Should I "just" create a PR to update the manual? WDYT?

Note this could also be applied to buildRustPackage versus Carnix/Crate2Nix.

enhancement golang documentation

All 38 comments

Really? I always thought as a reviewer it was the other way around! Since buildGoPackage is documented as a function to build "legacy" Go programs, not supporting Go modules. I thought Nixpkgs considers Go modules and buildGoModule as the future...

For completeness' sake, there are two main advantages of using buildGoModule:

  1. there is much less generated code stored inside of nixpkgs. The larger nixpkgs is, the longer it takes to load a channel.
  2. less evaluation is happening while accessing the derivation.

These seem to go against the goal of having all the sources declared though so it has to be a trade-off unless another road is discovered. Point (2) can be mitigated by using the recursive nix feature once it becomes widely available.

Thank you for bringing this up!

I have had PRs rejected before because of buildRustPackage (which is the Rust equivalent of buildGoModule), but Rust has the added downside that there seems to be no good alternative (there isn't a vetted 2nix tool yet). I would really like an official stance on this, as it's hindering the packaging of certain applications that I'd love to package. They've simply been blocked by "This uses fixed-output derivations, you can't merge it" or, on the other hand, "your x2nix tool generates too many nix files, which is slow, you can't merge it".

https://github.com/NixOS/nixpkgs/pull/68702#discussion_r333258482

I've also had PRs rejected for doing the opposite (In this case it was a Node project):

https://github.com/NixOS/nixpkgs/pull/49082#issuecomment-433599774

So far, this split, with no clear decision about fixed-output derivations and 2nix tools, is rather frustrating as a contributor.

The nixpkgs documentation implies that buildGoPackage is legacy and doesn't support Go modules, which has made me use buildGoModule in the past, but that doesn't seem quite true: vgo2nix lets us generate deps.nix files from go.mod.

My personal opinion is that I'd happily take an inflated nixpkgs source tree over the flakiness and questionable purity of fixed-output derivations. Ideally we'd have true recursive Nix and could properly integrate with third-party package and build systems...

Thanks for bringing this up @arianvp. The positions are clearly untenable if both approaches are being rejected and no alternative is being proposed. Are there any other possible roads that we can take?

buildGoModule also has a hard time with private repos.

We know from experience with the python packages that having one version of each dependency doesn't scale. It also means that we are not using the versions of dependencies that were tested by upstream.

That being said, it might be possible to generate a big index of all the dependencies and gain some compression by doing that.

long term reproducibility is harder to achieve because we don't know exactly what sources (revision/checksum) are used by this application (sources mirroring is harder)

The source dependencies are built into a giant bundle of info inside the module download derivation - you have a full copy of the source in the module fetcher derivation. If that's not enough, we could modify it to generate a vendor directory which would also include the full raw source.

source dependencies are not shared across packages

Are they with buildGoPackage?

users can not know the source dependencies of a package

See answer to first question - but I don't think this is true.

CC @kalbasit

and if buildGoPackage doesn't work, then fall back to buildGoModule.

Every single package which builds with buildGoModule should also build with buildGoPackage if you download the same sources, AFAIK. This sounds like we're proposing killing buildGoModule.

PR rejections

This sounds pretty painful + ridiculous. I've definitely had conflicting reviewer comments, but nothing so bad that PRs got rejected :(

We have another function which is buildGoPackage. To use it, we first need to run a tool to generate a derivation per dependency. This is less convenient to use, but cleaner since we are exposing all dependencies, and not only a blob of sources.

The ergonomics of a go.deps file are really painful from a Go perspective. It makes using Go significantly more taxing, since you have to specify your dependencies twice. I guess I could use triply recursive nix in my personal setup, but it would probably change my advice from "oh you're trying to deploy some go code to your server, nix makes this 10 lines of code" to "well you can do it, but honestly it's painful and you'll need to do this multi layered thing to generate some nix files to feed into other nix files and then it'll build".

My personal opinion is that I'd happily take an inflated nixpkgs source tree over the flakiness and questionable purity of fixed-output derivations.

Amusingly enough, Go module downloads are actually perfectly reproducible. There is a public certificate transparency log that will alert if you get two downloads with different contents. Every Go project using Go modules also stores its local hashes in the repo, so you'll detect conflicts there as well. There was a nix bug with some flakiness initially, but this has all been fixed - if you're still seeing it, please let me know. The bug is definitely in nix land, not Go :)

Hilariously enough, as I've been auto converting some code from buildGoPackages to buildGoModules, I'm seeing that a bunch of the source hashes in buildGoPackages are incorrect, and don't match what upstream is serving. So I think go modules are doing a better job than nix in terms of making sure you build with what you expect.

Thanks for bringing this up @arianvp. The positions are clearly untenable if both approaches are being rejected and no alternative is being proposed. Are there any other possible roads that we can take?

One middle-of-the-road approach is to modify buildGoModule so you can specify your hashes per dependency, rather than for the whole go file. This would satisfy concerns around deduplication, while allowing the overall hash as an alternative would keep the nice ergonomics. You would probably have to start downloading all Go source code from the centralized mirror, and do a mapping to the http download blob, but it should be possible to do without requiring reentrant nix. You would probably also have to specify the version.

To add a data point: a modification to vgo2nix to obtain the full git hash and store the sha256 is enough to allow builtins.fetchGit to be used in buildGoPackage (almost always, there are edge cases). My specific use case was to allow private repos for a project that required the versions in go.mod, and accepting the non-module dependency resolution was not an option.

https://github.com/adisbladis/vgo2nix/pull/34

To add a data point: a modification to vgo2nix to obtain the full git hash and store the sha256 is enough to allow builtins.fetchGit to be used in buildGoPackage (almost always, there are edge cases). My specific use case was to allow private repos for a project that required the versions in go.mod, and accepting the non-module dependency resolution was not an option.

So private repos like this should be fully supported in Go modules. If you have a package called bar, stored at a repo at foo.git/thing/foobar, then you can put a replace directive in your go.mod file and everything should work. If buildGoModules doesn't work (and I don't see why it wouldn't, since it just reuses the normal go code), let me know and I'd be happy to debug.

More details can be seen at
https://github.com/golang/go/wiki/Modules#when-should-i-use-the-replace-directive

You may also want to set
GOPRIVATE= since go will log your code hash + names to the transparency (sum) database otherwise.

https://github.com/golang/go/wiki/Modules#go-113

The source dependencies are built into a giant bundle of info inside the module download derivation - you have a full copy of the source in the module fetcher derivation. If that's not enough, we could modify it to generate a vendor directory which would also include the full raw source.

The difference is that in one case you know the sources at evaluation time, while in the case of the giant bundle, you need to build this giant bundle derivation and then write something to extract URLs: you cannot write a Nix expression to generate this list of sources.
For instance, this information allows Nix to fall back on mirrors if the original source is no longer available (see the nix.conf option "hashed-mirrors" in the Nix manual). My long-term plan is to use Software Heritage as a generic fallback mirror.

source dependencies are not shared across packages
Are they with buildGoPackage?

Since they are in separate derivations, yes they are (when they share the same sha256).

Every single package which builds with buildGoModule should also build with buildGoPackage if you download the same sources, AFAIK. This sounds like we're proposing killing buildGoModule.

I have often hit issues with tools generating the deps.nix file in the past. In this kind of situation, I think we should be able to use buildGoModule in nixpkgs: my feeling is that in practice, we would still sometimes need buildGoModule.

"oh you're trying to deploy some go code to your server, nix makes this 10 lines of code" to "well you can do it, but honestly it's painful and you'll need to do this multi layered thing to generate some nix files to feed into other nix files and then it'll build"

I'm definitely in favor of keeping the buildGoModule function in nixpkgs because it's more convenient than buildGoPackage. So, if someone wants to package a Go app for their own needs, they could use the buildGoModule function. However, for a Go app in nixpkgs, I would prefer to recommend buildGoPackage.

Amusingly enough, go module downloads are actually perfectly reproducible.

If the upstream source is no longer available (when a repository is removed from GitHub, for instance), I don't think you can still download it (I don't know if the Go community maintains a mirror of GitHub). This could explain why a lot of Go projects "vendorize" their deps in their own repository :/

The difference is that in one case you know the sources at evaluation time, while in the case of the giant bundle, you need to build this giant bundle derivation and then write something to extract URLs: you cannot write a Nix expression to generate this list of sources.

I think you have the ordering here inverted - you list the required modules for a go module by just looking at the go.mod file. The bundle derivation just fetches them :)

I still don't fully understand why you want a list of sources here, but you can pop this out of a derivation pretty easily by importing the go source, then running one go command to list all the items to download.
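To make the "just look at the go.mod file" point concrete, here is a minimal sketch in Python (not part of the thread, and not how buildGoModule or vgo2nix work internally) that reads the `require` directives of a go.mod file. The function name `parse_requirements` and the module `example.com/hello` are made up for illustration. It ignores `replace` and `exclude` directives and quoted paths, and a go.mod file does not necessarily list every transitive dependency; the full build list comes from `go list -m all`.

```python
def parse_requirements(go_mod_text):
    """Return (module_path, version) pairs from the require directives of a go.mod file.

    Simplified on purpose: replace/exclude directives and quoted paths are ignored.
    """
    requirements = []
    in_block = False
    for raw_line in go_mod_text.splitlines():
        # Drop trailing comments such as "// indirect".
        line = raw_line.split("//", 1)[0].strip()
        if not line:
            continue
        if in_block:
            if line == ")":
                in_block = False
            else:
                path, version = line.split()[:2]
                requirements.append((path, version))
        elif line.startswith("require") and line.endswith("("):
            # Start of a multi-line "require ( ... )" block.
            in_block = True
        elif line.startswith("require "):
            # Single-line form: "require <path> <version>".
            path, version = line.split()[1:3]
            requirements.append((path, version))
    return requirements


if __name__ == "__main__":
    example = """\
module example.com/hello

go 1.14

require (
    golang.org/x/text v0.3.2
    rsc.io/quote v1.5.2 // indirect
)
"""
    for path, version in parse_requirements(example):
        print(path, version)
```

Note that this only yields names and versions; the hashes needed for fixed-output fetches would still have to come from somewhere else, which is the crux of the disagreement above.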

The other thing that's worth pointing out is that go sources are in a lot of formats - the go specific format, git, svn, and maybe tar(?). I expect more formats will keep getting added here. I don't think there is a lot of value in having to reimplement the go download logic in nix + keep that up to date as stuff changes. It's going to cause some significant churn and be super frustrating when trying to change stuff. I guess you could maybe only implement the go generic download format, and then only download through the proxy API, but then you're losing the ability to download from the source, and the go tool already downloads from the proxies so it's not significantly better.

The difference is that in one case you know the sources at evaluation time, while in the case of the giant bundle, you need to build this giant bundle derivation and then write something to extract URLs: you cannot write a Nix expression to generate this list of sources.
For instance, this information allows Nix to fall back on mirrors if the original source is no longer available (see the nix.conf option "hashed-mirrors" in the Nix manual). My long-term plan is to use Software Heritage as a generic fallback mirror.

buildGoModule already does this by using modern go tooling - it first hits the mirror at proxy.golang.org, then flips over to directly trying the source (github etc...). I don't think I've ever seen this fail on the reliability front, but if we're worried, we could add the other 6 mirrors that are listed at
https://github.com/golang/go/wiki/Modules#are-there-always-on-module-repositories-and-enterprise-proxies
But honestly it's pretty unlikely the source and proxy.golang.org are down at the same time. If we have a lot of users in china, adding the china mirror to get around censorship might be useful, but I expect that the nix caches aren't that available in china anyway. If doing these builds were in a realtime serving path for me I'd probably add a 3rd version, but hydra is pretty tolerant here + not critical.
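The proxy-first lookup described here can be sketched as follows. This is a minimal Python illustration of how a module zip URL is formed under the GOPROXY protocol (`<proxy>/<module>/@v/<version>.zip`, with uppercase letters written as `!` plus the lowercase letter). The helper names `escape_case`, `proxy_zip_url` and `candidate_urls` are invented for this sketch, and the special `direct` entry is simply skipped because it means "fetch from the VCS", which has no single URL.

```python
def escape_case(text):
    """Case-encode a module path or version: 'A' becomes '!a'."""
    return "".join("!" + ch.lower() if "A" <= ch <= "Z" else ch for ch in text)


def proxy_zip_url(module, version, proxy="https://proxy.golang.org"):
    """URL of the source zip for one module version on a GOPROXY-style server."""
    return f"{proxy}/{escape_case(module)}/@v/{escape_case(version)}.zip"


def candidate_urls(module, version, proxies):
    """URLs to try in order; 'direct' and 'off' are not proxy servers and are skipped."""
    return [
        proxy_zip_url(module, version, proxy)
        for proxy in proxies
        if proxy not in ("direct", "off")
    ]


if __name__ == "__main__":
    for url in candidate_urls(
        "github.com/BurntSushi/toml", "v0.3.1", ["https://proxy.golang.org", "direct"]
    ):
        print(url)
```

Because the URL is a pure function of the module path and version, a list of such URLs could in principle be computed without running the go tool, although the hash of the downloaded zip is a separate question.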

I have often hit issues with tools generating the deps.nix file in the past. In this kind of situation, I think we should be able to use buildGoModule in nixpkgs: my feeling is that in practice, we would still sometimes need buildGoModule.

Why? If the tools break, we should just hand-write it while we wait for the tools to be fixed. I think switching to a nix format that you're not willing to write by hand probably indicates that we shouldn't be using it. I remember when I was using buildGoPackage for my personal stuff, it literally took me a few weeks to figure out how to inject local folders into it, and I had to spend a silly amount of time reading its implementation.

If the upstream source is no longer available (when a repository is removed from GitHub, for instance), I don't think you can still download it (I don't know if the Go community maintains a mirror of GitHub).

https://proxy.golang.org/ should have an indefinite copy of literally anything that's been downloaded by a vanilla version of go. It's not just github specific :) There are another 6 proxies run by other parties if you're worried about that specific proxy.

This could explain why a lot of Go projects "vendorize" their deps in their own repository:/

I don't think I've ever seen a Go dependency vanishing. Normally vendoring was done since it overrides your GOPATH and makes multi-person collaboration a lot easier if people have different versions of dependencies (this was a hugely frustrating issue in the early days, which was why lots of people ended up just using a Makefile and putting their whole Go workspace in a repo). This has been superseded by modules, and vendor folders appear to be getting deleted. I wouldn't be surprised if vendor support is gone by Go 1.16 (but I'm just guessing).

I would suggest to add an optional goDeps to buildGoModule so that we can switch back and forth between a generated deps.nix and a modSha256 attribute based on the package. I am currently looking into vgo2nix to make it fit for that use case.

Related: #86349

It also means that we are not using the versions of dependencies that were tested by upstream.

This is trading potential bugs for potential security issues, which is a massive problem with these lock formats. If we do not handle those issues, then what advantage do we as integrator still offer? By blindly using lock files we're no better than those using Docker images with outdated/insecure versions of software. Granted, we can't be expected to fix all of this, but we should not ignore that.

That being said, it might be possible to generate a big index of all the dependencies and gain some compression by doing that.

Looking at the contents of a go2nix deps.nix file, it seems reasonable to me to have a single file with a mapping which all Nixpkgs buildGoPackage / buildGoModule packages use. The mapping keys contain goPackagePath and rev, so multiple versions can be allowed. Is this possible as a future optimization? Note I am not familiar with Go at all.
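The shared mapping proposed here can be sketched in a few lines. This is a Python illustration only, under the assumption that each dependency entry carries `goPackagePath`, `rev` and `sha256` fields, modelled loosely on a deps.nix entry. The function name `build_shared_index`, the package names and the placeholder revisions and hashes are all hypothetical.

```python
def build_shared_index(packages):
    """Merge per-package dependency lists into one index keyed by (goPackagePath, rev).

    packages maps a package name to a list of dicts with the keys
    "goPackagePath", "rev" and "sha256". Returns (index, users), where index
    maps each key to its sha256 and users maps each key to the packages using it.
    """
    index = {}
    users = {}
    for package, deps in packages.items():
        for dep in deps:
            key = (dep["goPackagePath"], dep["rev"])
            known = index.setdefault(key, dep["sha256"])
            if known != dep["sha256"]:
                # Same path and revision but a different hash: one entry is wrong.
                raise ValueError(f"conflicting sha256 for {key}")
            users.setdefault(key, []).append(package)
    return index, users


if __name__ == "__main__":
    # Placeholder revisions and hashes, not real ones.
    text_dep = {"goPackagePath": "golang.org/x/text", "rev": "rev-a", "sha256": "hash-a"}
    sys_dep = {"goPackagePath": "golang.org/x/sys", "rev": "rev-b", "sha256": "hash-b"}
    index, users = build_shared_index({"app-one": [text_dep, sys_dep], "app-two": [text_dep]})
    print(len(index), "unique sources;", users[("golang.org/x/text", "rev-a")])
```

Keying on both path and revision lets several versions of one dependency coexist, while identical entries are stored once. The conflict check would also surface stale hashes of the kind mentioned earlier in the thread.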

Looking at the contents of a go2nix deps.nix file, it seems reasonable to me to have a single file with a mapping which all Nixpkgs buildGoPackage / buildGoModule packages use. The mapping keys contain goPackagePath and rev, so multiple versions can be allowed. Is this possible as a future optimization? Note I am not familiar with Go at all.

Sounds like an excellent idea to me. Guix has this:

https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/golang.scm?id=073c64dc792c79c2a988edae9f4e568d3b09f3b5

Also for crates.io:

https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/crates-io.scm?id=f87fa003617fe990bef4005800a9f40726494b25

There's also https://sum.golang.org/ which can be used to create something similar to mach-nix.

That's not what I mean. That's just putting all expressions for all the packages in the same file, like we used to do in e.g. python-packages.nix. Anyway, it's just a detail.

That's just putting all expressions for all the packages in the same file.

What I tried to point out is that they put not only end user programs there but the modules themselves (e.g).

@FRidh I like where this is going. The more data we have in a software-consumable format, the more tooling we can develop around it. After all, software is a force multiplier and we should take advantage of this as much as possible.

To follow your idea, we would start storing a subset of each package registry. The subset would be all the (name, version) tuples of the dependencies that applications depend on. E.g.:
pkgs/registries/crates.io/<name>-<version>.json.

(The reason to create separate files is so that loading a single Rust application doesn't require parsing all the crates.io dependencies. It also minimizes git merge issues.)

Then on the leaf node, on the application level, the lock file would be translated to a list of references to those .json (or .toml or whatever) files.

We would need a tool that automates the creation of those files, and garbage-collects when references to them are being lost.
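The bookkeeping such a tool would need can be sketched as follows. This is a minimal Python illustration that only assumes the `pkgs/registries/<registry>/<name>-<version>.json` layout from the comment above. The function names `registry_path` and `plan_registry_update` and the application names are hypothetical; no such tool exists in the thread.

```python
def registry_path(registry, name, version):
    """Path of the per-dependency file, following the layout proposed above."""
    return f"pkgs/registries/{registry}/{name}-{version}.json"


def plan_registry_update(existing_files, applications, registry="crates.io"):
    """Work out which registry files to create and which to garbage-collect.

    existing_files: iterable of paths currently in the tree.
    applications: mapping of application name to a list of (name, version) tuples
                  taken from its lock file.
    Returns (to_create, to_delete) as sorted lists of paths.
    """
    referenced = {
        registry_path(registry, name, version)
        for deps in applications.values()
        for name, version in deps
    }
    existing = set(existing_files)
    return sorted(referenced - existing), sorted(existing - referenced)


if __name__ == "__main__":
    existing = [
        "pkgs/registries/crates.io/libc-0.2.66.json",
        "pkgs/registries/crates.io/memchr-2.2.1.json",
    ]
    apps = {"some-app": [("libc", "0.2.66"), ("regex", "1.3.4")]}
    to_create, to_delete = plan_registry_update(existing, apps)
    print("create:", to_create)
    print("delete:", to_delete)
```

Since every file is derived from the lock files of leaf applications, both lists can be recomputed from scratch on each run, which keeps the garbage-collection step stateless.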

Once we have that we can start writing tools to push version bounds around. We could, for example, assuming semver, bump tiny versions to compress the dependencies. When a package is marked as insecure it will become easier to bump all the programs that depend on it. ...
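One possible reading of "bump tiny versions to compress the dependencies" is sketched below in Python. It assumes plain `MAJOR.MINOR.PATCH` version strings without pre-release or build suffixes, and assumes that a higher patch release is always a safe substitute, which is exactly the semver assumption stated above. The function name `compress_patch_versions` is hypothetical.

```python
def parse_version(version):
    """Split a plain MAJOR.MINOR.PATCH string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def compress_patch_versions(pinned):
    """Map every pinned (name, version) to the newest patch release in its series.

    pinned: iterable of (name, version) tuples collected from all lock files.
    Returns a dict from each original tuple to the version string to use instead.
    """
    pinned = list(pinned)
    newest_patch = {}
    for name, version in pinned:
        major, minor, patch = parse_version(version)
        series = (name, major, minor)
        newest_patch[series] = max(newest_patch.get(series, patch), patch)
    replacements = {}
    for name, version in pinned:
        major, minor, _ = parse_version(version)
        replacements[(name, version)] = f"{major}.{minor}.{newest_patch[(name, major, minor)]}"
    return replacements


if __name__ == "__main__":
    pins = [("libc", "0.2.66"), ("libc", "0.2.69"), ("serde", "1.0.104")]
    for original, chosen in compress_patch_versions(pins).items():
        print(original, "->", chosen)
```

The same grouping would help with security bumps: once a series is known to be insecure, every application mapped to it can be found through the replacement table.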

Regarding software heritage and hashed mirrors, the only thing that's truly necessary is to avoid outputHashMode = recursive, which is not compatible at all with any type of mirroring, and relies on the build tools to produce the src directory (instead of a simple fetch).

For the Rust platform, for instance, you merely need this single tarball to build ripgrep:

$ nix-build -A ripgrep.cargoDeps
/nix/store/q7wr64657jx4dp6b2zq05vsxg0vvlp85-ripgrep-12.0.1-vendor.tar.gz

$ ls -lh /nix/store/q7wr64657jx4dp6b2zq05vsxg0vvlp85-ripgrep-12.0.1-vendor.tar.gz
-r--r--r-- 2 root root 16M Dec 31  1969 /nix/store/q7wr64657jx4dp6b2zq05vsxg0vvlp85-ripgrep-12.0.1-vendor.tar.gz

Producing it requires a full Cargo setup and your Cargo.lock file, but once you've done it you can stick it on a hashed mirror or the software heritage and future builders/installers never need to hit the internet or execute cargo at all. This is ideal, since cargo itself does some direct internet access to find the repo metadata.

I'd like to do the same with Bazel, since mirroring src packages for it is a nightmare otherwise. Go is much less of a problem because it's simpler and the goproxy hooks already "just work", but it might not be unreasonable to do there.


A related consideration is that upstream Go/Rust ecosystems are heavily leaning on static linkage with lockfiles for library dependencies. If we decide we want a consistent distribution approach for library versions, we're swimming against the current here.

In the case of Rust there's potentially something to be gained here, since the compilation is slow and CPU/RAM intensive; but in the case of Go the builds are so fast already that the simple solution of duplicating compilation of libraries seems like a non-issue.

For Rust, there might eventually be a new solution: https://github.com/kolloch/crate2nix/issues/102

@bhipple I don't think this issue is the place to talk about this, so I'll try to be brief. Actually, a lot of fixed-output derivations with hashMode = recursive (fetchFromGitHub, for instance) are really suitable for "mirroring", because they only consider the content of the archive and not the container itself. Software Heritage people don't want to store archives. They prefer to store source code and build archives on demand. However, tar archives are not designed to be reproducible. This means a release archive coming from GitHub doesn't have the same hash as the same release coming from Software Heritage (concretely, the file owners are not identical, for instance).
I would be happy to elaborate, but I don't want to "pollute" this issue with unrelated topics...
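The archive-versus-content distinction can be demonstrated with a small self-contained Python sketch. It builds two tar archives holding the same file but recorded with different owner names, then compares a hash of the archive bytes with a hash of the contents only. The `content_hash` function is a simplified stand-in invented for this example; it is not Nix's recursive (NAR) hashing, which also covers directories, symlinks and the executable bit.

```python
import hashlib
import io
import tarfile


def make_tar(files, owner):
    """Build an uncompressed tar archive in memory, recording `owner` as the user name."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp, so only the owner differs
            info.uname = owner
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def archive_hash(tar_bytes):
    """Hash of the container: sensitive to every byte of tar metadata."""
    return hashlib.sha256(tar_bytes).hexdigest()


def content_hash(tar_bytes):
    """Hash of file names and contents only, ignoring owners and timestamps."""
    digest = hashlib.sha256()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as archive:
        members = sorted((m for m in archive.getmembers() if m.isfile()), key=lambda m: m.name)
        for member in members:
            data = archive.extractfile(member).read()
            # Length-prefix the data so that file boundaries are unambiguous.
            digest.update(member.name.encode() + b"\0" + str(len(data)).encode() + b"\0")
            digest.update(data)
    return digest.hexdigest()


if __name__ == "__main__":
    files = {"README": b"hello\n"}
    first = make_tar(files, owner="root")
    second = make_tar(files, owner="mirror")
    print("archive hashes equal:", archive_hash(first) == archive_hash(second))
    print("content hashes equal:", content_hash(first) == content_hash(second))
```

A mirror that regenerates archives on demand can therefore satisfy a content-based hash even when it cannot reproduce the original tarball byte for byte.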

Looking at the contents of a go2nix deps.nix file, it seems reasonable to me to have a single file with a mapping which all Nixpkgs buildGoPackage / buildGoModule packages use.

For go2nix that might make sense, but with the advent of Go modules (previously known as vgo) that's no longer the case.

For go2nix that might make sense, but with the advent of Go modules (previously known as vgo) that's no longer the case.

Why not? What has changed information wise? I had a look at your vgo2nix and the go.sum there still contains the information that is needed. repo or go path, ref and rev. All that's needed for builtins.fetchGit.
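To see what information go.sum actually carries, here is a minimal Python sketch that parses it. The function name `parse_go_sum` is hypothetical and the hash strings in the example are placeholders, not real checksums. As far as I understand the format, each line holds a module path, a version and an `h1:` hash; a commit id only appears when the version is a pseudo-version, and the `h1:` value is computed over a manifest of the module's files rather than over a tarball, so it may not be usable directly as a Nix fetcher hash.

```python
def parse_go_sum(go_sum_text):
    """Return {(module_path, version): h1_hash} for the module trees listed in go.sum.

    Lines whose version ends in "/go.mod" hash only the go.mod file and are skipped.
    """
    entries = {}
    for line in go_sum_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # blank or malformed line
        module, version, digest = fields
        if version.endswith("/go.mod"):
            continue
        entries[(module, version)] = digest
    return entries


if __name__ == "__main__":
    # Placeholder hashes for illustration only.
    example = """\
golang.org/x/text v0.3.2 h1:PLACEHOLDER-TREE-HASH=
golang.org/x/text v0.3.2/go.mod h1:PLACEHOLDER-GOMOD-HASH=
"""
    for (module, version), digest in parse_go_sum(example).items():
        print(module, version, digest)
```

A generator in the spirit of vgo2nix would still need to resolve each (module, version) pair to a fetchable source and compute a hash in the format the fetcher expects.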

This is trading potential bugs for potential security issues, which is a massive problem with these lock formats.

We've talked about this before in https://github.com/NixOS/nixpkgs/pull/76646#issuecomment-570001476 and I'm still of the opinion that they are good because we can package things _correctly_.
That doesn't mean we need to use the upstream lock-files.

We could add a generic "re-locking" passthru attribute that https://github.com/ryantm/nixpkgs-update (or some other update mechanism) is aware of so it knows how to re-lock a dependency graph.
Correctness is an important factor to consider.

Another data point. I actually think buildRustPackage is buggy in its current form. cargo-vendor doesn't seem to be byte-reproducible. Every time I build our work project from scratch on a new machine, there is a 50/50 chance I get this error and need to update the cargoSha256:

hash mismatch in fixed-output derivation '/nix/store/4g6fjwn2pdfdrlqq7f93gqgqngj8dnxg-cryptobox-c-2019-10-22-vendor.tar.gz':
  wanted: sha256:0m85c49hvvxxv7jdipfcaydy4n8iw4h6myzv63v7qc0fxnp1vfm8
  got:    sha256:1v9930zaznnig9kmpy9dinc6hmgwljbp4lpa6wi0h4qd210q35fz

This seems really bad to me; as fixed-output derivations that are not byte-reproducible will lead to very bad bugs.

How is this never an issue on nixpkgs? Because we never purge the cache?

Maybe we should try building all the buildRustPackage packages in nixpkgs from scratch and see if they have a similar bug?

How is this never an issue on nixpkgs? Because we never purge the cache?

Because once it's in the binary cache, the other builder machines will pick it up from there. Another factor is that the design of Hydra uses a single machine to evaluate and send the builds out, so even that machine has a consistent disk cache.

We are well on the way to replacing buildRustPackage with crate2nix in nixpkgs for most packages (https://github.com/kolloch/crate2nix/issues/102), which would no longer have this issue.

Another data point. I actually think buildRustPackage is buggy in its current form. cargo-vendor doesn't seem to be byte-reproducible.

Can you provide some examples? I have rebuilt buildRustPackage cargo vendor directories hundreds of times on machines with no binary cache and have yet to see a problem. A couple rare cases have issues with Apple's filesystem representing unicode characters in filenames differently than Linux, resulting in different hashes between the platforms, but that's a consistent failure that needs to be fixed in the expression, not a flake.

You can check by running this, replacing ripgrep with the package you want to check:

nix-build -A ripgrep.cargoDeps --check

It also means that we are not using the versions of dependencies that were tested by upstream.

This is trading potential bugs for potential security issues, which is a massive problem with these lock formats.

Checking in thousands of lines of generated nix expression fetcher code that can't be easily read or reviewed by a human presents security problems of its own, though.

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/parsing-go-sum-and-cargo-lock-files-to-spare-the-need-for-fixed-output-derivations/7367/1

@nlewo also made some progress on getting vgo2nix to work nicely with buildGoModule: https://github.com/nlewo/nixpkgs/commit/09ac010e4fe4fb5cfa89a452912cc759dba20427#diff-5cc2ec5eab07fa4ee4e3ba57846f5134R53 This means we no longer have evil fixed-output derivations.

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/fixed-output-derivations-to-become-part-of-flake-inputs/8263/1

Can't we just write an adapter (parser + transponder) for Nix so that it can _instantiate_ and _evaluate_ go.mod properly?

... coming from the serious doubt, that there would be any bits of information missing in go.mod.
