Go: cmd/go: do not cache tool output if tools print to stdout/stderr

Created on 11 Sep 2018 · 27Comments · Source: golang/go

Update, Oct 7 2020: see https://github.com/golang/go/issues/27628#issuecomment-702252564 for most recent proposal in this issue.

What version of Go are you using (`go version`)?

tip (2e5c32518ce6facc507862f4156d4e6ac776754f), also Go 1.11

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

darwin/amd64

What did you do?

$ go build -toolexec=/usr/bin/time hello.go
# command-line-arguments
        0.01 real         0.00 user         0.00 sys
# command-line-arguments
        0.12 real         0.11 user         0.02 sys
$ go build hello.go 
# command-line-arguments
        0.01 real         0.00 user         0.00 sys
# command-line-arguments
        0.12 real         0.11 user         0.02 sys
$

What did you expect to see?

The second invocation of go build doesn't have -toolexec, so it should not invoke the toolexec command (which I think it doesn't), nor reprint its output.

What did you see instead?

toolexec output is reprinted.

In fact, I think it probably should not cache at all if -toolexec is specified, since the external command that toolexec invokes may do anything, and (intentionally) not reproducible.

cc @dr2chase

Proposal Proposal-Accepted

Source

cherrymui

👍2

Most helpful comment

Thinking about my comment above a bit more, I think this issue should still be fixed by simply doing no caching when -toolexec is used. That would be strictly better than what we have right now.

Once this issue is fixed, I can then file the "opt-in caching for -toolexec" idea as a separate proposal.

mvdan on 9 Dec 2019

👍5

All 27 comments

Using a cooked command "wrap" =

#!/bin/bash
echo -n `basename "$1"` 1>&2 ; /usr/bin/time "$@"
date 1>&2

it keeps reprinting the same date+time. This is definitely cached output.

go build -toolexec wrap .
go build .
rm hello
go build .

dr2chase on 11 Sep 2018

27207 is about `-exec prevents test caching`. Maybe there needs to be a more holistic review of -exec, -toolexec, and friends, and how they all relate to caching.

mark-rushakoff on 11 Sep 2018

I'm not sure whether we should cache the result of -toolexec (see https://github.com/golang/go/issues/27207#issuecomment-420628219), but if we do, the value of the -toolexec flag should be included in the cache key.

So there's definitely a problem here either way: we just need to decide whether the fix is to update the cache key or exclude the result from caching.

bcmills on 12 Sep 2018

I actually don't think those are the right choices. There is a plausible use of -toolexec to invoke a debugger or other interactive program, in which case we should never cache if -toolexec is used. And there is a plausible use of -toolexec to invoke a logger or other build annotator, in which we should completely ignore -toolexec and its argument for caching purposes. I can't think of an important use case in which we want to cache based on whether we are using the same -toolexec option as before.

I would probably pick "never cache if -toolexec is used" but I don't feel very strongly about it.

ianlancetaylor on 12 Sep 2018

👍5

I would probably pick "never cache if -toolexec is used"

I vote for this.

cherrymui on 12 Sep 2018

👍3

It looks like we have a consensus. (Moving from NeedsDecision to NeedsFix.)

bcmills on 12 Sep 2018

I agree that this bug needs fixing one way or another. The only way I find -toolexec useful right now is like -a -toolexec=whatever.

I have some thoughts about caching, though. It's unfortunate if we simply throw caching out the window, because it's entirely reasonable for toolexec programs to be deterministic given their input.

The way go build handles caching properly for go tool compile seems to be to first ask it compile -V=full, and use that as part of the cache key. This means the compile operations will need re-doing if the compiler version changes, but not otherwise.

Could we do the same for -toolexec as an opt-in? For example, if one does go build -toolexec=mytool, the Go tool would run a well defined version flag like mytool -toolversion, similar to the compiler's -V=full. If that fails, we do no caching at all with -toolexec=mytool. If it succeeds, its output is added to the cache key, and caching takes place.

I think this is the best of both worlds, because current non-deterministic toolexec programs would continue to use no caching at all, while other Go-specific complex tools could take advantage of the build cache.

We would have to define the bare minimum of information for -toolversion to report, though. For example, would a stringified https://golang.org/pkg/runtime/debug/#Module be enough, if available? Or perhaps a hash of the file contents in https://golang.org/pkg/os/#Executable?

mvdan on 8 Dec 2019

👍2

Thinking about my comment above a bit more, I think this issue should still be fixed by simply doing no caching when -toolexec is used. That would be strictly better than what we have right now.

Once this issue is fixed, I can then file the "opt-in caching for -toolexec" idea as a separate proposal.

mvdan on 9 Dec 2019

👍5

Going to mark this as a proposal since the answer is unclear and it is bound up in #41145. Maybe they should be merged.

rsc on 2 Sep 2020

I do think the answer is clear here; the default behavior with -toolexec should be to not use the build cache at all. You can see #41145 as an extension of this behavior, to allow opting into using the build cache properly.

In terms of implementation, if we want both issues implemented, they could be implemented together. But I don't think this issue needs to be a proposal, because as far as I can tell there's consensus that it needs to be done.

mvdan on 6 Sep 2020

Turns out that build caching with -toolexec is possible, by simply altering the output of [...]/compile -V=full and [...]/link -V=full. See https://github.com/golang/go/issues/41145#issuecomment-694612401 and the following comments.

So this brings forward a dilemma. If this proposal is accepted and implemented, my ability to use the Go build cache with the method above will break, because -toolexec will simply disable the build cache entirely.

The problem is still that -toolexec doesn't know whether it's given a deterministic tool that should be cached, or an interactive/changing tool that should entirely disable the cache.

Here's a suggested idea: make a wrapper for a not-to-be-cached tool like time, so that when called with just -V=full it will return a random version, meaning that it's never cached. This would quickly fill up the build cache though, so perhaps we could take a shortcut and make it so that if -V=full returns a non-zero exit status or prints nothing, no caching should take place.

mvdan on 22 Sep 2020

Using a wrapper seems complicated. The commands like time and gdb doesn't need to understand the arguments. If we go with the wrapper, it will need to parse the arguments.

However, the commands like time and gdb will have extra output when asking the -V=full output. Maybe the extra output should signal the go command not to cache? But that is not ideal: I can imagine a command that records things to file doesn't have extra output.

cherrymui on 22 Sep 2020

I agree that using a wrapper seems complicated, but it seems like the simplest answer from the point of view of the toolchain.

Using extra output to signal "no caching" is probably the worst option of all, because in my solution above I am using extra output to add extra hashes/input to the cache key.

I'm not saying that what I'm doing with -V=full is perfect, but it is what Russ suggested in the other thread, so I take it as being more or less supported. It would be very unfortunate if fixing this issue just breaks that and means I need to reopen that proposal.

mvdan on 22 Sep 2020

I still think -toolexec should be completely orthogonal from the cache. There are times when you want to force a tool to run (as an extreme case, -a), and we can and should handle those independently of whether -toolexec is being used.

rsc on 23 Sep 2020

I've read your last comment a number of times, but I'm having trouble understanding it, sorry.

If you mean that one should be able to use -toolexec with or without build the build cache, I agree, which is why I raised the other proposal to add such a knob. If you mean something else, you lost me :)

mvdan on 24 Sep 2020

I mean that the use of the build cache should be _completely_ separate from -toolexec.
The two should operate independently of each other, as they do today.
Neither knows anything about the other. It should stay that way.

If there are problems with wanting to avoid certain entries in the build cache at certain times,
that's a general problem, not limited to -toolexec, and so any solution should not involve -toolexec.

If you don't want a cache, there is currently -a.
If that's too big a hammer, then let's address that separate from -toolexec.

rsc on 30 Sep 2020

Fair enough, I agree that we don't need to make this about toolexec.

If you don't want a cache, there is currently -a.

I don't think that's quite right, though. -a does not disable the cache, it merely forces rebuilding, but it still writes to the cache. This results in unintended behavior when one wants to truly not use the cache at all, since we still write to the cache - for example:

$ cd tools # golang.org/x/tools
$ go build -a -toolexec=time ./cmd/stringer
[...]
# golang.org/x/tools/cmd/stringer
0.23user 0.04system 0:00.17elapsed 160%CPU (0avgtext+0avgdata 75048maxresident)k
0inputs+0outputs (0major+14062minor)pagefaults 0swaps
$ go build ./cmd/stringer
[...]
# golang.org/x/tools/cmd/stringer
0.23user 0.04system 0:00.17elapsed 160%CPU (0avgtext+0avgdata 75048maxresident)k
0inputs+0outputs (0major+14062minor)pagefaults 0swaps

Because stdout/stderr of the tools such as the compiler and linker is also cached, we've now written cache entries with the -toolexec=time effect. The only way to get rid of those is to rebuild again without toolexec, like go build -a ./cmd/stringer. So -a doesn't quite disable the build cache in a useful way.

mvdan on 30 Sep 2020

I see, thanks. I got this issue mixed up in my head with the other one (#41145).

@mvdan, above you wrote:

I do think the answer is clear here; the default behavior with -toolexec should be to not use the build cache at all.

I disagree that the answer is clear. Maybe it is clear for this one example of -toolexec=time, but that's not what -toolexec is for.

I added -toolexec specifically for toolstash. If you are using -toolexec toolstash you definitely _do_ want caching of results: why should substituting an alternate compiler disable caching?

The build cache came after -toolexec, but its design explicitly contemplates -toolexec and takes care to preserve good performance from -toolexec toolstash. See go doc -u cmd/go/internal/work.toolID. This was not an accident and should not be lightly discarded.

The design of -toolexec was not directly intended for -toolexec=time or -toolexec=gdb. That was an accident - a mostly happy one, but an accident nonetheless. The fact that -toolexec=time fell out naturally from the design doesn't mean it should force awkward changes to the design.

So as far as this proposal issue's current title - "do not cache tool output when using -toolexec" - I don't believe that's the right decision.

One possible way to address -toolexec=time would be to add a new -cachewrite=false flag that allows reading from the build cache but disables writes to it. Then you run -toolexec=time -cachewrite=false. It's easy to forget, though. Also, the existence of this flag would encourage thinking about the cache as something that should be manually managed, which disturbs me quite a bit. A cache that you need to manage by hand to get semantically correct behavior is not actually a cache.

rsc on 1 Oct 2020

Counter-proposal: any time a command prints to stdout or stderr, its results are not cached.

The most common time a tool prints to stdout/stderr are failures, and those are already not cached because of the non-zero exit status. The new rule would generalize that: any print cannot be cached.

At first this seems a bit surprising and more tied to tool behavior than you'd expect, but it turns out that it would remove a bunch of special case code that been introduced since the build cache was first added.

The original build cache didn't save stdout/stderr - it just didn't seem necessary.
One unintended effect was that -toolexec=time did not pollute future build output with cached timing prints.

But then go build -gcflags=-m errors didn't reprint the -m output on a second run (#22587).
We fixed that in CL 77110 by caching and reprinting the stdout/stderr too.

But then we still missed stdout/stderr for certain skipped compile+link steps (#23877).
We fixed that in CL 128903 with even more special logic.

And now go build -toolexec=time prints the timing output without -toolexec,
which is kind of the opposite of the -gcflags=-m issue.

(There were probably other issues I am forgetting.)

Instead of continuing to finesse the exact behavior of when stdout/stderr are cached or not,
why not turn back the clock and re-fix #22587 by saying:
"don't cache build artifacts when a command prints anything".
Then #22587 is fixed, #23877 doesn't happen, this issue doesn't happen either,
-toolexec stays independent of the cache decision, and -toolexec=toolstash stays fast.

And it would let us delete all the output-handling code in the build cache (the test cache is separate and would stay).

Thoughts?

rsc on 1 Oct 2020

👍4

I'd be fine with the counter-proposal. Almost all the time when I want the compiler to print something (using toolexec or not), I'd expect it actually invokes the compiler.

cherrymui on 1 Oct 2020

I disagree that the answer is clear.

For the record, I've changed my mind since then, which is the entire reason I retracted the other proposal. So yes, I agree with your comment.

I'm fine with the counter-proposal too. My tool which wraps the compiler and linker changes their behavior, and alters the output of -V=full to propagate that to the build cache, but otherwise doesn't print anything extra to stdout/stderr. So nothing would break for me.

mvdan on 1 Oct 2020

I like the counter-proposal also.

As a rookie compiler developer I used to add code like this to the compiler somewhere while testing/debugging a change:

  if os.Getenv("ENABLE_EXPERIMENTAL_FOOBAR") {
     fmt.Fprintf("enabling the experimental foobar!\n")
     <do something that causes the compiler to emit different code or DWARF>
  }

Since my new environment variable wasn't visible to build caching machinery, this me caused no end of trouble. I've since moved away from this entirely (I now use explicit command line flags, and make sure that I add my new flags to the recordFlags() call in the gc's main). The counter-proposal would make this process a little easier; just have to make sure that the "experimental" code has some trace output.

thanm on 1 Oct 2020

Retitled and will include in the proposal minutes under the new title, but seems headed for a likely accept.

rsc on 7 Oct 2020

(Crosslink) #41973 is an example that will be fixed if we do this.

cherrymui on 14 Oct 2020

Based on the discussion above, this seems like a likely accept.

rsc on 14 Oct 2020

🎉2

In case anyone wants to start working on this soon, heads up that I'm writing testdata/script/toolexec.txt for #15677, so it could be reused for this issue later too.

mvdan on 17 Oct 2020

No change in consensus, so accepted.

rsc on 21 Oct 2020

🎉2

Was this page helpful?

0 / 5 - 0 ratings

Related issues

x/tools/cmd/stringer: cannot import vendored dependencies

stub42 · 3Comments

cmd/compile: testing/quick misbehaves on Nexus 9 linux/arm64

rsc · 3Comments

proposal: sync: Map.Delete method should return bool, indicating if key was deleted or not

lkarlslund · 3Comments

x/build/cmd/coordinator: ssh proxy should support scp

bradfitz · 3Comments

proposal: cmd/vet: vet should warn when time.Time type (or types embed it) is used as map keys.

go101 · 3Comments