Go: cmd/go: unclear how to cache transitive dependencies in a Docker image

Created on 17 Sep 2018 · 56 comments · Source: golang/go

What version of Go are you using (go version)?

go version go1.11 linux/amd64

Does this issue reproduce with the latest release?

yes

What did you do?

I'm attempting to populate a Docker cache layer with compiled dependencies based on the contents of go.mod. The general recommendation with Docker is to use go mod download; however, this only provides caching of sources.

go build all can be used to compile these sources but instead of relying on go.mod contents, it requires my application source to be present to determine which deps to build. This causes a cache invalidation on every code change and renders the step useless.

Here's a Dockerfile demonstrating my issue:

FROM golang:1.11-alpine
RUN apk add git

ENV CGO_ENABLED=0 GOOS=linux

WORKDIR /app

COPY go.mod go.sum ./

RUN go mod download

# this fails
RUN go build all
# => go: warning: "all" matched no packages

COPY . .

# this now works but isn't needed
RUN go build all

# compile app along with any unbuilt deps
RUN go build

From package lists and patterns:

When using modules, "all" expands to all packages in the main module and their dependencies, including dependencies needed by tests of any of those.

where the main module is defined by the contents of go.mod (if I'm understanding this correctly).

Since "the main module's go.mod file defines the precise set of packages available for use by the go command", I would expect go build all to rely on go.mod and build any packages listed within.

Other actions which support "all" have this issue but some have flags which resolve it (go list -m all).

Labels: NeedsInvestigation, modules


I don't think all makes sense in a non GOPATH world; previously all
expanded to GOPATH/src/..., taking into account that GOPATH may be a list.

I think you should use ./...

Thanks Dave, go build ./... is a bit of an improvement since it doesn't include the test dependencies that all does. However it still requires my application source to be present and gives go: warning: "./..." matched no packages if run with only go.mod and go.sum present.

Can you explain more about your use case? Perhaps with an example repo to demonstrate what you are doing. Thanks.

For sure. I've found in most previous projects that dependency build times are fast enough to not be an issue so in the end the existing behaviour is probably fine.

Part of my current project is the creation of a custom Terraform Provider for managing some of our internal systems. Building the Terraform packages only happens once locally so not a big deal, but they need to be rebuilt every time a new docker image is built. When these packages are already compiled, go build completes in under a second. When they need to be rebuilt from scratch, go build can take up to two minutes locally or longer on our CI servers.

Some time can be saved by using go mod download to cache the Terraform package sources but afaict there is no command to compile them after download without having our package main present for go build to determine what the dependencies actually are.

Based on the existing module documentation, I would expect the go.mod file to have an accurate list of required dependencies and for the toolchain to be able to rely on it in isolation.

We do similar things with projects in other languages for building Docker images. The flow is generally:

  1. Copy package manifest (Gemfile, package.json, etc.) into container
  2. Download dependency code and compile associated libraries (bundle, npm install, etc.)
  3. Copy the rest of our project source into container

This lets us avoid having to rebuild dependencies on every commit. It would be nice if this could be replicated with the Go module system. go mod download gets us halfway but doesn't allow caching of compilation artifacts.

Here's an example repo: https://github.com/wedow/docker-go-build

To see the issue we're having, clone it and run docker build ., then add a comment or something to main.go and run docker build . again. Ideally all deps would be built and cached prior to the COPY . . step, and the final go build would be a sub-second operation.

I think what you're after here is:

go list -export $(go list -m)/...
The -export flag causes list to set the Export field to the name of a
file containing up-to-date export information for the given package.

This will populate the build cache (go env GOCACHE) with the results of compiling for the -export flag. The module cache ($GOPATH/pkg/mod) as you say contains the module-related caches.

If you want to install main packages too then:

go install $(go list -f '{{ $ip := .ImportPath}}{{if eq .Name "main"}}{{$ip}}{{end}}' $(go list -m)/...)

go build all can be used to compile these sources but instead of relying on go.mod contents, it requires my application source to be present to determine which deps to build.

Yes, that is working as designed: in module mode, all refers to the transitive imports of the packages in the main module, not the packages in its module dependencies. That's not going to change.

This causes a cache invalidation on every code change and renders the step useless.

If the code changes are only in your .go source files, then only the cache entries for the packages containing those source files should be invalidated: the cache contents for the other transitive dependencies should be unaffected.

The build artifact cache is separate from the module cache: the former is controlled by GOCACHE (and on Linux defaults to $HOME/.cache/go-build, or $XDG_CACHE_HOME/go-build if that is set), while the latter is a subdirectory of the first entry in GOPATH. You may need to set the GOCACHE environment variable to make sure it is within the container; see Build and test caching for details.

Can you confirm that both the build cache and the module cache are present and populated in your docker image after the first go build all?

Thanks guys, I think there may be some confusion about which caches are being affected and when.

The issue is in how docker caches layers after each operation. When my source files are changed, all side effects which occur after the COPY . . line (such as populating GOCACHE) are lost. Those changes are isolated in a layer which has been invalidated and must be fully rebuilt.

The go list -export $(go list -m)/... command works great for populating GOCACHE but since it must come after COPY . ., it must be fully re-run during every build. go build all also has this issue.

I'm looking for a command which can compile the dependencies listed in the go.mod file in isolation so that it can occur before that COPY . . line that adds our sources to the container. Totally understand if that's not possible with the current module system. I may just experiment with parsing and building the deps separately.

The go list -export $(go list -m)/... command works great for populating GOCACHE but since it must come after COPY . ., it must be fully re-run during every build

I'm unclear why you say it must come after the copy - please can you explain?

I'm looking for a command which can compile the dependencies listed in the go.mod file in isolation so that it can occur before that COPY . .

go list -export $(go list -m)/... should be all you need here. But let's unravel the question above first.

@myitcv, note that -export may at some point do less than a full build. I don't think it's a perfect fit for the use-case.

You have to export both GOCACHE and GOPATH/pkg/mod:

Example:

FROM golang:1.11-alpine AS mod
RUN apk add -U git
WORKDIR /src
COPY go.mod .
COPY go.sum .
RUN go mod download

FROM golang:1.11-alpine
COPY --from=mod $GOCACHE $GOCACHE
COPY --from=mod $GOPATH/pkg/mod $GOPATH/pkg/mod
WORKDIR /src
COPY . .
RUN go build

@myitcv the go list trick only works if you have your source present.
The way we avoid re-downloading all deps is to simply copy over go.mod and go.sum then run go mod download which creates the package source cache, but does not create the compiled cache of the modules.

So we're looking for a way to get the stuff listed in go.mod compiled and placed in ~/.cache before we copy all the project source over; this lets us avoid the lengthy recompile of our deps on each build.

Think of it as a two-phase build:
phase 1: copy go.mod, download and (hopefully) compile deps
phase 2: copy project source and compile our stuff against the phase 1 cache

@dbudworth it doesn't really seem possible to do what we're looking to do with the currently available tooling. I came up with a hacky workaround to get the results I was looking for and just updated my example repo to illustrate it.

The basic idea is the use of a dummy import file which can trigger the compilation of dependencies when run through go build. This file is added with go.mod to the docker image, then compiled to prime the cache, then removed before adding the real application source files.

While I'd much prefer a way to compile dependencies separate from application code as part of the official toolchain, this method does dramatically reduce subsequent docker image build times for our project and has really sped up our CI process.

Same issue here. go build step depends on main.go AND it compiles vendor dependencies. Which means every time we change main.go, it will recompile it AND all of the vendor dependencies. The only way around that for now appears to be @wedow's workaround of a dummy_main.go that includes dummy imports of all vendor dependencies. So we run go build on that file first, and only then we COPY/ADD main.go and go build the latter (but this later go build now reuses deps pre-compiled with the previous go build).

This would be somewhat easier to handle if docker build supported a -v option so we could mount a "compilation cache" directory at build time.

Would it be possible to add a --install or --compile flag to go mod download, that would compile and cache the downloaded packages?

@benweissmann, that seems like it would have significant overlap with go get, which does build and install the requested packages.

@dinvlad

go build step depends on main.go AND it compiles vendor dependencies. Which means every time we change main.go, it will recompile it AND all of the vendor dependencies.

The Go build cache is content-addressed, and contains intermediate artifacts. If you are correctly storing the build cache (as @hinshun describes), then it should not recompile dependencies whose sources are unchanged.

The only way around that for now appears to be @wedow's workaround of a dummy_main.go that includes dummy imports of all vendor dependencies.

You can use go list to query the dependencies of your top-level package and request to build those dependencies explicitly. (A dummy .go file is fine too, but not strictly necessary.)

Please try the above approach (saving both GOCACHE and GOPATH/pkg/mod and using go list to compute the set of packages to warm the cache) and let us know if there are any remaining issues.

@bcmills I'm kind of at a loss on how to explain the issue in a different way. The go list approach is incompatible with docker's caching mechanism. It requires the presence of my application source. Any subsequent change to that source invalidates docker's cache which also throws away anything in GOCACHE.

Similarly, @hinshun's approach of copying GOCACHE from a previous build step has no effect because go mod download doesn't populate GOCACHE. There is nothing to be copied.

You mention an --install flag would overlap with go get, but go get requires application source whereas go mod download does not and works on .mod files. If there is a way to have either go get operate on .mod files in isolation, or have go mod download populate GOCACHE after downloading, there'd be no issue. Since this doesn't work, we need a new option or command or something to accomplish this.

Personally, go mod download --install or even go mod install seem like good fits.

The go list approach is incompatible with docker's caching mechanism. It requires the presence of my application source. Any subsequent change to that source invalidates docker's cache

Yes, you'd need to prime the cache in your Docker image from a specific version of your application source, and changing that source would invalidate the image caching. (I suspect that you could discard that source from the final image, but I don't use Docker much so I'm a bit fuzzy on the details.)

You could also use go list to compute the dependency versions (and dependency packages), and build those even without your application source.

go get does not require your application source in general: it can download packages and modules as needed. (You still need to pass it an appropriate list of packages to build, though.)

you'd need to prime the cache in your Docker image

I may be misunderstanding you but Docker doesn't have this capability.

Correct me if I'm wrong but you're suggesting using go list to generate a separate list of dependencies from what's already maintained in the .mod file and also committing that when dependencies are updated. Then using that list to build those dependencies (with go get or otherwise).

If you're suggesting using go list as part of the Dockerfile, that again doesn't work due to the cache invalidation issue. Unless go list has an option for parsing only the .mod file?

Correct me if I'm wrong but you're suggesting using go list to generate a separate list of dependencies from what's already maintained in the .mod file and also committing that when dependencies are updated. Then using that list to build those dependencies (with go get or otherwise).

Yes, exactly: use go list to produce a list of modules and versions, and to separately produce a list of packages to prime in the cache. Then commit that alongside your Dockerfile (or wherever you like), and have the Docker image run the equivalent of go mod init foo && go get -m $(<module_list.txt) && go get $(<package_list.txt) && rm go.mod go.sum.

Is module_list.txt the output of go list -m all? And is package_list.txt = go.sum? Thanks

Is module_list.txt the output of go list -m all?

Probably, yes. With the main module (go list -m) filtered out, and perhaps with the output transformed a bit into something that works as an argument to go get.

And is package_list.txt = go.sum?

No, it's probably more like go list all minus go list ./... (with both commands evaluated in module mode).

Hmm, thanks. If we experiment with a wrapper to produce the right lists, would you accept a PR to add this as a first-class option for go mod download?

To be honest, I'm a bit confused on the purpose of the .mod and .sum files if they're not meant as a list of dependencies. Seems weird to create a third file for this purpose.

Other language ecosystems don't seem to have an issue with building dependencies based on some sort of manifest file.

  1. bundle install only needs a Gemfile
  2. npm install only needs a package.json
  3. cargo build only needs a Cargo.toml and an empty src/lib.rs file.

What makes Go special that this isn't feasible?

EDIT: To clarify, go mod download works exactly as expected on .mod files. All we want is to compile whatever was just downloaded by that command. It's cumbersome to maintain custom tooling to generate a redundant list of dependencies just to be able to compile them. I can't understand why if go mod download is able to fetch dependencies, another command can't exist to then build them.

I'd also be very happy to come up with a PR for this if it'd be welcome.

@wedow, you _can_ copy in your go.mod and go.sum files to list the module versions. So I suppose you don't need a module_list.txt; those files suffice.

You still need a package_list.txt to tell the build exactly which targets you want to be warmed in the cache. (Presumably you don't want to pre-build packages that aren't actually needed to satisfy the transitive imports of the packages and tests you're running in the image.)

Alternatively, if your docker daemon is recent enough (18.09+), you can export DOCKER_BUILDKIT="1" to use the new image builder with cache mount features. Currently only available on the experimental Dockerfile frontend, see: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md#run---mounttypecache

# syntax = docker/dockerfile:experimental
FROM golang
...
RUN --mount=type=cache,target=/root/.cache/go-build go build ...

Change https://golang.org/cl/175985 mentions this issue: cmd/coordinator: stop using gitlock, use go modules

Change https://golang.org/cl/176257 mentions this issue: cmd/buildlet/stage0, cmd/scaleway: stop using gitlock, use go modules

FWIW I ran into this exact same issue and found no great solution, and I know it's not sexy but I just used go mod vendor to vendor the dependencies on the host before COPYing the source into the Docker build, then set GOFLAGS=-mod=vendor prior to building binaries inside the Docker build. Then the vendor directory is just cleaned up after the Docker build. Until there's a way to download dependencies from go.mod/go.sum, using a temporary vendor directory just feels much simpler and more maintainable than these other solutions.

Thanks @hinshun for your comment, it's how I found out about Buildkit, which is working really well for our team! 🙌

In case anyone is using Go modules (and coming from a Node/Ruby background such as myself), there's no good way to "pre-compile" the Go modules ahead of doing the full compilation. I think the fact that Go is a compiled language makes this particularly challenging, but others can feel free to correct me on that as I'm new to Go! However, using Buildkit, what you can do is use a mounted cache that is shared across builds (for a given build agent) so that your Go modules don't re-download every time:

# syntax = docker/dockerfile:experimental
FROM golang:1.12-stretch
...
RUN --mount=type=cache,target=/go/pkg/mod go build ...

The different thing to note here ^ is the target=/go/pkg/mod. As mentioned previously in this thread go mod download doesn't populate the go build cache (GOCACHE, which is usually someplace like: /root/.cache/go-build), but rather downloads the modules to a different location, which took me a while to work out amongst all the googling, and that's: /go/pkg/mod.

This alone is a great optimisation! The Go modules downloads were taking _forever_ for us and they hardly ever change. You can apply that same cache mount logic to other things besides the Go modules too:

...
RUN --mount=type=cache,target=/var/cache/apk apk add --update curl ...
...
RUN --mount=type=cache,target=/usr/local/share/.cache/yarn/v1 yarn ...

Buildkit (and in particular this cache mount beauty) brought our full monorepo build times down from about 45 minutes to 12 minutes 🎉

Hope this helps someone 🤷‍♀😄

Maybe an even better approach is to just mount or copy $GOPATH/pkg/mod from the host, which would already have the dependencies. This would make first-time builds even faster.

@nicollecastrog can you give a full example of your approach as I can't fully understand it.

btw this worked for me:

FROM golang:1.12 AS mod
WORKDIR $GOPATH/src/github.com/open-fresh/avalanche
COPY go.mod .
COPY go.sum .
RUN GO111MODULE=on go mod download

FROM golang:1.12 as build
COPY --from=mod $GOCACHE $GOCACHE
COPY --from=mod $GOPATH/pkg/mod $GOPATH/pkg/mod
WORKDIR $GOPATH/src/github.com/open-fresh/avalanche
COPY . .
RUN GO111MODULE=on CGO_ENABLED=0 GOOS=linux go build -o=/bin/avalanche ./cmd

FROM scratch
COPY --from=build /bin/avalanche /bin/avalanche
EXPOSE 9001
ENTRYPOINT ["/bin/avalanche"]

the first build takes a while, but each additional one is fast as long as the go.mod/go.sum files don't change.

@nicollecastrog @krasi-georgiev I just went through this exercise. Let me try to summarize my findings:
Buildkit-supported cache mounts (i.e. --mount=type=cache,target=) are mounted at build time only for the duration of a given RUN command. They are unmounted immediately after that layer (i.e. RUN ...) completes. This means that if you want to cache Go modules at build time, you need to mount the Go module directory in the RUN statement that executes your go build.

What this looks like:

RUN --mount=type=cache,target=/opt/gopath/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go mod download && go build

Note, that something like the following is actually incorrect from a caching perspective:

RUN --mount=type=cache,target=/opt/gopath/pkg/mod go mod download
RUN --mount=type=cache,target=/root/.cache/go-build go build

The reason why this is incorrect is because the mount cache that holds the Go modules is unmounted right after the first RUN executes and is not present for the go build. If you were to inspect an intermediate container right after the go mod download, you'll see that /opt/gopath/pkg/mod is actually empty (because it was unmounted). The commands will still run, but your Go modules will not be cached, and you'll see go build downloading modules again.

Hope this helps!

@arjunpur If you are using Buildkit's cache mounts you should not run go mod download and just let go build fetch the dependencies based on the pkg/mod cache.

@hinshun why is that?

You're running go mod download && go build which is the equivalent of go build.

Won't that skip downloading the test dependencies?

If we assume that go.mod has been populated after a go build - can we just go get everything listed in require(...)? Also go list -m all only prints out my module name and nothing else, maybe because I'm not using a repo, and my module name is just testmod.

Following the advice given by @hinshun, @nicollecastrog and @arjunpur, I made a PR to Kubeapps that I think solves this exact problem. Here is the Dockerfile for reference:

# syntax = docker/dockerfile:experimental

FROM golang:1.13 as builder
WORKDIR /go/src/github.com/kubeapps/kubeapps
COPY go.mod go.sum ./
COPY vendor vendor
COPY pkg pkg
COPY cmd cmd
ARG VERSION
# With the trick below, Go's build cache is kept between builds.
# https://github.com/golang/go/issues/27719#issuecomment-514747274
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    CGO_ENABLED=0 go build -installsuffix cgo -ldflags "-X main.version=$VERSION" ./cmd/tiller-proxy

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /go/src/github.com/kubeapps/kubeapps/tiller-proxy /proxy
EXPOSE 8080
CMD ["/proxy"]

Compared to having go mod download cached by Docker, but not the Go build cache, this brings the total build time (for make kubeapps/tiller-proxy) down from 40 seconds to about 5 seconds.

A simpler way:

FROM golang:1.13-alpine

RUN apk update \
    && apk add --no-cache git

WORKDIR /attacker

COPY ./go.mod .

RUN go mod graph | cut -d '@' -f 1 | cut -d ' ' -f 2 | sort | uniq | tr '\n' ' ' | xargs go get -v

COPY . .

RUN CGO_ENABLED=0 go test -c 

CMD ./attacker.test

Took me a while to find out that such a simple thing that all other languages that I ever used made me take for granted can't be done in Go natively.

It is quite a pain having to switch our entire CI ecosystem because of a missing command option, but it seems like the only way for modules and caching to have any meaning besides allowing code outside of GOPATH, and to achieve Go's goal of fast builds while using containerized applications.

It is quite a pain having to switch our entire CI ecosystem because of a missing command option

@Fryuni, what is the “missing command option” to which you refer? Most of the recent progress on this issue has been folks figuring out the proper docker configuration (https://github.com/golang/go/issues/27719#issuecomment-485008577, https://github.com/golang/go/issues/27719#issuecomment-514747274, and others), rather than any proposed changes in the go command.

@bcmills A command to build the dependency cache. Or a flag to do it with go mod download.

In Python, for example, pip install -r requirements.txt will download all the dependencies _and_ compile C dependencies if they are not pre-compiled. That means that this dockerfile will have everything in the cache correctly and won't recompile the dependencies:

FROM python:3.7
WORKDIR /app
COPY requirements.txt /app
RUN pip install -r requirements.txt

CMD ["python", "main.py"]
COPY . /app

The same is equally simple in Node, Java, Ruby, etc.
But in Go... this happens:

FROM golang:1.13
WORKDIR /src
COPY go.mod go.sum /src/
RUN go mod download

# At this point there is no cache for the dependencies binaries, but there should be
# Either with a command like `go mod build-cache` or a flag for the previous like `go mod download --build-cache`

# Making a cache after this is totally useless as it will be thrown away by any change in the code
COPY . /src

RUN go build -o /app . # The cache is only created here

CMD ["/app"]

Everyone is figuring out ways to use other Docker features to _compensate_ for this missing feature, using experimental features to _sidestep_ Docker's layer architecture just to inject a cache into a RUN command.

That is exactly why we are having to change our CI ecosystem. We currently use managed solutions, but those (very wisely) do not allow _experimental_ features to be enabled on dockerd on your CI pipeline. We are changing to a self-hosted solution in order to use them.

Agree with @Fryuni. FWIW my first attempt was to do this with go mod download, and I was also looking for a flag on the command that would facilitate this. That would be a good solution IMO. I’m still using go mod vendor as a workaround and it has worked great. Presumably a go mod download flag would produce a similar but more elegant solution.

I'd like to avoid experimental docker features to cache go module compilation artifacts, so I tried @Feresey's approach of using go get to download and install dependencies.

RUN go mod graph | cut -d '@' -f 1 | cut -d ' ' -f 2 | sort | uniq | tr '\n' ' ' | xargs go get -v

However, this failed on this go.mod file:

module github.com/my-module

go 1.13

require (
    gopkg.in/DataDog/dd-trace-go.v1 v1.20.1
)

To repro without a go.mod file:

$ go get gopkg.in/DataDog/dd-trace-go.v1

go get gopkg.in/DataDog/dd-trace-go.v1: no Go source files

I'm not sure what's going on here. All the actual imports are gopkg.in/DataDog/dd-trace-go.v1/ddtrace so I don't know why the module is different.

@jschaf, go get (without the -d argument) requests to fetch and build the _packages_ named on the command line.

Packages are not 1:1 with modules: a module _contains_ packages — often many of them, and often many that are not going to be relevant to building the packages in _your_ module. That's why much of the discussion above (for example, https://github.com/golang/go/issues/27719#issuecomment-483777680) focuses on packages rather than modules.

It's also why a flag to go mod download would not be a great fit for this use-case: if we were to add some flag to go mod download that also builds all of the packages _within_ the downloaded modules, it would encourage folks to build (and cache) a bunch of extraneous dependencies that they won't actually end up needing.
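To make the package/module distinction concrete: a package is provided by the module whose path is the longest prefix of its import path at a path-element boundary. A simplified sketch (moduleFor is hypothetical; the go command additionally checks which candidate module actually contains the package directory):

```go
package main

import (
	"fmt"
	"strings"
)

// moduleFor returns the module in mods that provides importPath: the
// longest module path equal to importPath or a prefix of it ending at a
// "/" boundary. ok is false if no module matches.
func moduleFor(importPath string, mods []string) (mod string, ok bool) {
	for _, m := range mods {
		if importPath == m || strings.HasPrefix(importPath, m+"/") {
			if len(m) > len(mod) {
				mod, ok = m, true
			}
		}
	}
	return mod, ok
}

func main() {
	mods := []string{"gopkg.in/DataDog/dd-trace-go.v1"}
	for _, p := range []string{
		"gopkg.in/DataDog/dd-trace-go.v1/ddtrace",
		"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer",
	} {
		m, _ := moduleFor(p, mods)
		fmt.Printf("%s is provided by module %s\n", p, m)
	}
}
```

Many packages map to one module, and the module path itself need not be a package, which is the situation in the dd-trace-go report above.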

Honestly, prebuilding a cache that has _more_ than what I'm gonna need is way better than not building any cache at all. After all, that is the build image; having extra data there is not a problem, it is expected. The final binary should be moved to another image in a multi-stage build, as per best practices, to have small Docker images _at the end_.

Also, I never expected it to build only the cache of what I'm going to use, but the cache of the dependencies declared, whether my code use them or not. This is cache done _before_ the code is added to the image, it obviously cannot optimize for the code.

This is similar to what happens with TypeScript: you install all your dependencies and transitive dependencies entirely, but when you compile it to JS it only includes what is actually used.

I'm running into this now that we've switched to using modules, whereas before we could use:

RUN go get -d -v ./...
RUN go install -v ./...

Now it seems our only options are to use an experimental Docker engine feature that isn't supported by our CI or some of our devs' machines, or to live with slow builds.

I think there's been a lot of confusion about docker cache vs build cache and go module source cache vs go module build cache. To reiterate the issue for @bcmills what we all really want is:

go mod download --install

or

go mod install

This would allow us to leverage the non-experimental layer caching that existing Docker versions have had forever, the same way we already use it to avoid re-downloading the module sources every time we build an image.

For example, here is a Dockerfile that caches the Go module sources in a Docker layer, so subsequent docker builds reuse the cached layer and don't re-download the modules. The problem is that if I change anything in the main source, all the modules have to be rebuilt. If go mod download --install existed, the compiled modules would also be cached in a Docker layer, and go build would only have to build the example app itself instead of all of its dependencies.

FROM golang:1-alpine AS build
ARG COMMIT_HASH
WORKDIR /example-app

COPY ./go.mod ./go.mod
COPY ./go.sum ./go.sum

RUN go mod download

COPY ./*.go ./

ENV GOARCH=amd64
ENV CGO_ENABLED=0
ENV GOOS=linux

RUN go build -o example .

FROM scratch
WORKDIR /app
ENV PATH=/bin/
COPY --from=build /example-app/example ./example
ENTRYPOINT ["./example"]

Would it be possible to add a --install or --compile flag to go mod download that would compile and cache the downloaded packages?

Personally, go mod download --install or even go mod install seem like good fits.

Correct me if I'm wrong, but isn't the requested extension to the go command equivalent to the following?

go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v

This obviously depends on jq to transform the JSON output so it would still be nice to have it built into the go command to avoid that extra dependency.

Example of usage in a Dockerfile:

FROM golang:1.14-alpine AS build
WORKDIR /go/src/app
ENV CGO_ENABLED=0
RUN apk add --no-cache jq
COPY go.mod go.sum ./
RUN go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v
COPY . .
RUN go build -o /go/bin/app

FROM gcr.io/distroless/base
COPY --from=build /go/bin/app /
ENTRYPOINT ["/app"]

@futek, unfortunately that doesn't always work; it fails with:

The command '/bin/sh -c go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v' returned a non-zero code: 123

reproduction repo: https://github.com/montanaflynn/golang-docker-cache

@montanaflynn, that looks like broken software you're trying to build, not an issue with the jq kludge.

    github.com/coreos/bbolt: github.com/coreos/[email protected]: parsing go.mod:
    module declares its path as: go.etcd.io/bbolt
            but was required as: github.com/coreos/bbolt

@futek wrote: "This obviously depends on jq to transform the JSON output so it would still be nice to have it built into the go command to avoid that extra dependency."

That would be very simple to write as a go run buildall.go helper, to avoid the jq dependency. It could also easily subsume the --json and xargs parts.

@tv42, if you remove the line:

RUN go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v

Then it works and actually downloads far fewer dependencies, presumably just what's needed for the resulting binary. I think there are edge cases, and associated logic already in the go CLI, that any solution for caching the built dependencies should apply.

Example Dockerfile and docker build logs: https://gist.github.com/montanaflynn/9c7365f0b74635f18268f12897b0b6eb

There are other one-liner shell solutions in this thread that try to feed the dependencies reported by go mod into go get; for example, this comment suggested:

RUN go mod graph | cut -d '@' -f 1 | cut -d ' ' -f 2 | sort | uniq | tr '\n' ' ' | xargs go get -v

That kind of worked for my reproduction, except that while it installed even more dependencies than go mod download --json | jq -r '"\(.Path)@\(.Version)"' | xargs go get -v, it still missed some that were later picked up by go build. It also failed entirely for another commenter.

Example Dockerfile and docker build logs: https://gist.github.com/montanaflynn/2d8a5532077e501ec86b4ad643cd1075

I think these one-liner combinations of go mod and go get, while they may work for a specific set of dependencies, will run into issues when used across the full spectrum of software built with Go. That is why this needs to be part of the official go CLI, where any problems can be reported and fixed.

@montanaflynn, about the jq one-liner failing with exit code 123 on your reproduction repo (https://github.com/montanaflynn/golang-docker-cache):

Right, it appears that passing all indirect dependencies to go get is not equivalent to what happens during a normal build. I think it suffices to just pass the direct dependencies to go get like in this one-liner:

go mod graph | grep "^$(go mod edit -json | jq -r .Module.Path) " | cut -d ' ' -f 2 | xargs go get -v

(i.e., grab the module path from go.mod and use it to keep only the main module's direct dependencies from the output of go mod graph)

You also mentioned this one-liner suggested earlier in the thread:

RUN go mod graph | cut -d '@' -f 1 | cut -d ' ' -f 2 | sort | uniq | tr '\n' ' ' | xargs go get -v

Now that I look at this one again, it seems to be trying to do exactly the same thing, relying on the fact that the root module doesn't have a version suffix (@...). However, it didn't work with any version of cut I tried: cut prints lines that don't contain the delimiter at all, even though they have no second field. Passing -s to cut suppresses those lines, which makes the following one-liner almost equivalent to the one above, except that it throws away the version suffix, which seems wrong (would go get always pick the correct version?):

go mod graph | cut -d '@' -f 1 | cut -s -d ' ' -f 2 | xargs go get -v

Building on that, this is the "simplest" version I can come up with that also retains the version:

go mod graph | grep -v '@.*@' | cut -d ' ' -f 2 | xargs go get -v

I'm sure there are a lot of ways to do this (which could break in various subtle ways) so I'm still voting for an official go flag/command that handles this correctly without the need to maintain one-liners/scripts like this.

@futek, I appreciate the thought, but that command fails for Drone's dependencies.

go: github.com/NVIDIA/[email protected] requires
    k8s.io/[email protected] requires
    k8s.io/[email protected]: git init --bare in /go/pkg/mod/cache/vcs/917454838ed90b2f0e9868490d4b59302d7a7e8f8826d51d313bd68be346ecce: exec: "git": executable file not found in $PATH

Even after installing git it still fails with this error:

go: github.com/NVIDIA/[email protected] requires
    k8s.io/[email protected] requires
    k8s.io/[email protected]: reading k8s.io/api/go.mod at revision v0.0.0: unknown revision v0.0.0

When I remove RUN go mod graph | grep -v '@.*@' | cut -d ' ' -f 2 | xargs go get -v and just let go build handle the dependencies, it works fine.

For some projects it can certainly improve Docker build times dramatically, but it doesn't work for every project. I'll still be using it for a few projects where I know it works with their dependencies; in some cases I'm seeing Docker image builds that are 10x faster!

By the way I think this might be a little simpler to understand and only requires a single pipe to awk:

go mod graph | awk '{if ($1 !~ "@") print $2}' | xargs go get -v