Go: proposal: cmd/go: allow -toolexec tools to opt in to build caching

Created on 31 Aug 2020 · 14Comments · Source: golang/go

From go help build:

        -toolexec 'cmd args'
                a program to use to invoke toolchain programs like vet and asm.
                For example, instead of running asm, the go command will run
                'cmd args /path/to/asm <arguments for asm>'.

-toolexec is a very powerful flag to control or modify how Go's toolchain programs get run. The classic use case is for development of Go itself; for example, using a program to measure how much CPU and wall time each compiler and linker invocation takes, or to insert a debugger in a particular toolchain program execution.

Issue https://github.com/golang/go/issues/27628 already covers an existing problem: right now, the use of -toolexec does not invalidate the build cache, which is a problem in both of the examples above. We agree that that bug should be fixed, but consider it to be a separate issue.

However, we argue that it would be reasonable for a toolexec program's output to be cached when the program is deterministic. The use case we have in mind is tools which alter a Go build predictably; for example, a code obfuscator which modifies the arguments passed to the compiler and linker in a deterministic and reproducible way. Without caching, such a tool is extremely slow, especially for big projects with many packages.

The way go build handles caching properly for go tool compile seems to be to first ask it compile -V=full, which shows the compiler's full version, and then use that as part of the cache key. This means that the "package build" operations will need to be run again if the compiler version changes or if the input changes, but not otherwise.

We propose adding similar functionality when using -toolexec. For example, when running go build -toolexec=mytool, the Go tool would run the tool with a well defined version flag like mytool -toolversion, similar to the compiler's -V=full. If it returns a non-zero exit code, we do no caching at all, which would align with the non-deterministic use cases in https://github.com/golang/go/issues/27628. If the command succeeds, its output is added to the cache key, and proper build caching takes place.

We think this is the best of both worlds, because current non-deterministic toolexec programs would continue to use no caching at all, while other Go-specific complex tools could take advantage of the build cache.

We should also define the bare minimum of information for a -toolversion flag to report. go tool compile -V=full seems to include the toolchain's version as well as the compiler's package build ID:

$ go tool compile -V=full
compile version devel +54e18f1c2a Fri Aug 28 20:01:41 2020 +0000 buildID=DU3bt_lJqYe1kVhgbnOf/1cyl58sUXfIZ9K-RctHM/5cl10_pQMFsyS4FJFGwM/fqivXvY1n7nVWhRpqqqP

We realise that this proposal is somewhat far-fetched, and the easy answer might be "-toolexec was never meant for this kind of use". However, we argue that it opens up a very exciting world of possibilities in Go tooling, especially with tools that closely integrate with the toolchain itself to alter builds.

https://github.com/golang/go/issues/29430 is a related proposal, but it also has a very different objective. It proposes an entirely new interface for code rewriting tools, which is yet to be defined. We simply propose to improve the existing -toolexec interface in a very specific and backwards-compatible way. We also think both features could coexist; not all build-altering tools will simply need to alter source code - for example, the code obfuscator shown above also enforces some linker flags like -w -s.

/cc @bcmills @cherrymui @ianlancetaylor @josharian @jayconrod @ianthehat

GoCommand Proposal ToolSpeed Tools

Source

mvdan

👍7

Most helpful comment

Indeed, it works. Here's a proof of concept I wrote today: https://github.com/burrowers/garble/commit/308e4aef293d8e14dc5da979874b1c4141ad655d

Unfortunately, via plain go build I don't have a good way to obtain myexec's own version. For now, I just do a sha256 sum of os.Executable, and add the flags passed to myexec which affect the output. Hopefully https://github.com/golang/go/issues/37475 can be fixed soon, as then I'd avoid the extra I/O.

I'll leave this proposal open for a few more days, just in case this current solution isn't enough for any other tools or people following this thread. For my own immediate use, it appears I don't need any extra features after all, so I'm happy.

mvdan on 18 Sep 2020

❤3

All 14 comments

The problem with general caching for toolexec is that we can't tell whether the tool is idempotent (such as a transformer or annotator or some sort of idempotent analyzer), or intended specifically to measure repeated invocations (such as a profiler or tracer), or even consistent from run to run (such as an interactive debugger with user-supplied edits to arbitrary memory locations).

Tool versioning aside, I think we would also need some way to determine whether the output is actually intended to be cached.

bcmills on 31 Aug 2020

we would also need some way to determine whether the output is actually intended to be cached.

That's what the existence of the flag is meant to convey. We could call it -gocachekey if you think -toolversion is not specific enough.

mvdan on 31 Aug 2020

Consider go build -toolexec myexec.
It sounds like you are saying to run myexec -V=full.
Why are we not running myexec compile -V=full?
Why does myexec matter at all?
What matters is what happens in the tools it runs.
Using "myexec compiler -V=full" would work for toolstash.

I see #27628 and wanting to get timing repeated, but that seems orthogonal to the use of toolexec.
You can't run "time go build" twice in a row and get consistent answers either (the second uses results cached by the first).
"go build -toolexec=time" behaving like a fine-grained "time go build" does not bother me too much.
Maybe for that case you want to be able to checkpoint the cache and rewind.
(Delete all cached results past a certain moment, or maybe "go build -do-not-write-new-things-to-cache".)

rsc on 2 Sep 2020

It sounds like you are saying to run myexec -V=full.
Why are we not running myexec compile -V=full?
Why does myexec matter at all?

Both myexec -gocachekey and go tool compile -V=full would be in the cache key when running go build -toolexec myexec, now that I think about it. This is because myexec and go tool compile are independent and can change their behavior separately.

Running only myexec compile -V=full and expecting that to contain both cache key inputs at once is an option, too. I don't feel strongly one way or another.

mvdan on 6 Sep 2020

What kinds of things do you envision myexec doing that would make only using compile -V=full not good enough as a cache key? Myexec's job should be to run some kind of executable. Would it ever _change_ the behavior of that executable so that it didn't work the way it was compiled to work? (How it was compiled to work should be covered by the -V=full output.)

rsc on 16 Sep 2020

The immediate use case is indeed altering Go builds, such as changing the input Go source or altering the flags passed to the compiler and linker. This is how I implemented a binary obfuscator, which I tried to explain in the original post.

Any tool which aims to build a Go binary with altered Go source code would also benefit from this. https://github.com/golang/go/issues/29430 shows fuzzing as an example, since one needs to instrument the code for fuzzing.

mvdan on 16 Sep 2020

@mdvan In that case it seems pretty clear that the tool that is altering the behavior of the compiler should be responsible for altering the -V=full output as well.

rsc on 18 Sep 2020

That's actually a good point, I'm not sure why I never thought of it. I'll give that a try; please give me a week before making a decision on this proposal.

mvdan on 18 Sep 2020

For the record (because I wasn't sure above), go build -toolexec=myexec _already_ runs myexec compile -V=full:

% go build -toolexec=echo helloworld.go
go tool compile -V=full: unexpected output:
    /Users/rsc/go/pkg/tool/darwin_amd64/compile -V=full
go tool compile -V=full: unexpected output:
    /Users/rsc/go/pkg/tool/darwin_amd64/compile -V=full
go tool compile -V=full: unexpected output:
    /Users/rsc/go/pkg/tool/darwin_amd64/compile -V=full

rsc on 18 Sep 2020

Indeed, it works. Here's a proof of concept I wrote today: https://github.com/burrowers/garble/commit/308e4aef293d8e14dc5da979874b1c4141ad655d

mvdan on 18 Sep 2020

❤3

Alright, withdrawing proposal as per my comment above. Thanks again @rsc.

mvdan on 22 Sep 2020

Unfortunately, via plain go build I don't have a good way to obtain myexec's own version.

I don't understand this. Every program gets a build ID, right? It seems like you should be able to fetch the build ID from os.Executable instead of doing the full sha256 of the executable.

rsc on 23 Sep 2020

👍1

Oh, I was completely unaware that I could use go tool buildid on a final Go binary. I thought it would only work on object files. That indeed seems to be significantly faster on a 7MiB binary.

mvdan on 23 Sep 2020

Indeed, it works. Here's a proof of concept I wrote today: burrowers/garble@308e4ae

I should note that it does not work just yet, because my implementation was far too naive. I need to first properly read the documentation on what build IDs are (from src/cmd/go/internal/work/buildid.go), and learn how to alter the hashes in the right way without altering the format.

So, it should still work, I just went about it the wrong way in my first attempt.

mvdan on 23 Sep 2020

Was this page helpful?

0 / 5 - 0 ratings