go version
)?tip (2e5c32518ce6facc507862f4156d4e6ac776754f), also Go 1.11
Yes
go env
)?darwin/amd64
$ go build -toolexec=/usr/bin/time hello.go
# command-line-arguments
0.01 real 0.00 user 0.00 sys
# command-line-arguments
0.12 real 0.11 user 0.02 sys
$ go build hello.go
# command-line-arguments
0.01 real 0.00 user 0.00 sys
# command-line-arguments
0.12 real 0.11 user 0.02 sys
$
The second invocation of go build
doesn't have -toolexec
, so it should not invoke the toolexec command (which I think it doesn't), nor reprint its output.
toolexec output is reprinted.
In fact, I think it probably should not cache at all if -toolexec
is specified, since the external command that toolexec invokes may do anything, and (intentionally) not reproducible.
cc @dr2chase
Using a cooked command "wrap" =
#!/bin/bash
echo -n `basename "$1"` 1>&2 ; /usr/bin/time "$@"
date 1>&2
it keeps reprinting the same date+time. This is definitely cached output.
go build -toolexec wrap .
go build .
rm hello
go build .
-exec prevents test caching
. Maybe there needs to be a more holistic review of -exec, -toolexec, and friends, and how they all relate to caching.I'm not sure whether we should cache the result of -toolexec
(see https://github.com/golang/go/issues/27207#issuecomment-420628219), but if we do, the value of the -toolexec
flag should be included in the cache key.
So there's definitely a problem here either way: we just need to decide whether the fix is to update the cache key or exclude the result from caching.
I actually don't think those are the right choices. There is a plausible use of -toolexec
to invoke a debugger or other interactive program, in which case we should never cache if -toolexec
is used. And there is a plausible use of -toolexec
to invoke a logger or other build annotator, in which we should completely ignore -toolexec
and its argument for caching purposes. I can't think of an important use case in which we want to cache based on whether we are using the same -toolexec
option as before.
I would probably pick "never cache if -toolexec
is used" but I don't feel very strongly about it.
I would probably pick "never cache if -toolexec is used"
I vote for this.
It looks like we have a consensus. (Moving from NeedsDecision
to NeedsFix
.)
I agree that this bug needs fixing one way or another. The only way I find -toolexec
useful right now is like -a -toolexec=whatever
.
I have some thoughts about caching, though. It's unfortunate if we simply throw caching out the window, because it's entirely reasonable for toolexec programs to be deterministic given their input.
The way go build
handles caching properly for go tool compile
seems to be to first ask it compile -V=full
, and use that as part of the cache key. This means the compile operations will need re-doing if the compiler version changes, but not otherwise.
Could we do the same for -toolexec
as an opt-in? For example, if one does go build -toolexec=mytool
, the Go tool would run a well defined version flag like mytool -toolversion
, similar to the compiler's -V=full
. If that fails, we do no caching at all with -toolexec=mytool
. If it succeeds, its output is added to the cache key, and caching takes place.
I think this is the best of both worlds, because current non-deterministic toolexec programs would continue to use no caching at all, while other Go-specific complex tools could take advantage of the build cache.
We would have to define the bare minimum of information for -toolversion
to report, though. For example, would a stringified https://golang.org/pkg/runtime/debug/#Module be enough, if available? Or perhaps a hash of the file contents in https://golang.org/pkg/os/#Executable?
Thinking about my comment above a bit more, I think this issue should still be fixed by simply doing no caching when -toolexec
is used. That would be strictly better than what we have right now.
Once this issue is fixed, I can then file the "opt-in caching for -toolexec
" idea as a separate proposal.
Going to mark this as a proposal since the answer is unclear and it is bound up in #41145. Maybe they should be merged.
I do think the answer is clear here; the default behavior with -toolexec
should be to not use the build cache at all. You can see #41145 as an extension of this behavior, to allow opting into using the build cache properly.
In terms of implementation, if we want both issues implemented, they could be implemented together. But I don't think this issue needs to be a proposal, because as far as I can tell there's consensus that it needs to be done.
Turns out that build caching with -toolexec
is possible, by simply altering the output of [...]/compile -V=full
and [...]/link -V=full
. See https://github.com/golang/go/issues/41145#issuecomment-694612401 and the following comments.
So this brings forward a dilemma. If this proposal is accepted and implemented, my ability to use the Go build cache with the method above will break, because -toolexec
will simply disable the build cache entirely.
The problem is still that -toolexec
doesn't know whether it's given a deterministic tool that should be cached, or an interactive/changing tool that should entirely disable the cache.
Here's a suggested idea: make a wrapper for a not-to-be-cached tool like time
, so that when called with just -V=full
it will return a random version, meaning that it's never cached. This would quickly fill up the build cache though, so perhaps we could take a shortcut and make it so that if -V=full
returns a non-zero exit status or prints nothing, no caching should take place.
Using a wrapper seems complicated. The commands like time
and gdb
doesn't need to understand the arguments. If we go with the wrapper, it will need to parse the arguments.
However, the commands like time
and gdb
will have extra output when asking the -V=full output. Maybe the extra output should signal the go command not to cache? But that is not ideal: I can imagine a command that records things to file doesn't have extra output.
I agree that using a wrapper seems complicated, but it seems like the simplest answer from the point of view of the toolchain.
Using extra output to signal "no caching" is probably the worst option of all, because in my solution above I am using extra output to add extra hashes/input to the cache key.
I'm not saying that what I'm doing with -V=full
is perfect, but it is what Russ suggested in the other thread, so I take it as being more or less supported. It would be very unfortunate if fixing this issue just breaks that and means I need to reopen that proposal.
I still think -toolexec should be completely orthogonal from the cache. There are times when you want to force a tool to run (as an extreme case, -a), and we can and should handle those independently of whether -toolexec is being used.
I've read your last comment a number of times, but I'm having trouble understanding it, sorry.
If you mean that one should be able to use -toolexec
with or without build the build cache, I agree, which is why I raised the other proposal to add such a knob. If you mean something else, you lost me :)
I mean that the use of the build cache should be _completely_ separate from -toolexec.
The two should operate independently of each other, as they do today.
Neither knows anything about the other. It should stay that way.
If there are problems with wanting to avoid certain entries in the build cache at certain times,
that's a general problem, not limited to -toolexec, and so any solution should not involve -toolexec.
If you don't want a cache, there is currently -a.
If that's too big a hammer, then let's address that separate from -toolexec.
Fair enough, I agree that we don't need to make this about toolexec.
If you don't want a cache, there is currently -a.
I don't think that's quite right, though. -a
does not disable the cache, it merely forces rebuilding, but it still writes to the cache. This results in unintended behavior when one wants to truly not use the cache at all, since we still write to the cache - for example:
$ cd tools # golang.org/x/tools
$ go build -a -toolexec=time ./cmd/stringer
[...]
# golang.org/x/tools/cmd/stringer
0.23user 0.04system 0:00.17elapsed 160%CPU (0avgtext+0avgdata 75048maxresident)k
0inputs+0outputs (0major+14062minor)pagefaults 0swaps
$ go build ./cmd/stringer
[...]
# golang.org/x/tools/cmd/stringer
0.23user 0.04system 0:00.17elapsed 160%CPU (0avgtext+0avgdata 75048maxresident)k
0inputs+0outputs (0major+14062minor)pagefaults 0swaps
Because stdout/stderr of the tools such as the compiler and linker is also cached, we've now written cache entries with the -toolexec=time
effect. The only way to get rid of those is to rebuild again without toolexec, like go build -a ./cmd/stringer
. So -a
doesn't quite disable the build cache in a useful way.
I see, thanks. I got this issue mixed up in my head with the other one (#41145).
@mvdan, above you wrote:
I do think the answer is clear here; the default behavior with -toolexec should be to not use the build cache at all.
I disagree that the answer is clear. Maybe it is clear for this one example of -toolexec=time, but that's not what -toolexec is for.
I added -toolexec specifically for toolstash. If you are using -toolexec toolstash you definitely _do_ want caching of results: why should substituting an alternate compiler disable caching?
The build cache came after -toolexec, but its design explicitly contemplates -toolexec and takes care to preserve good performance from -toolexec toolstash. See go doc -u cmd/go/internal/work.toolID
. This was not an accident and should not be lightly discarded.
The design of -toolexec was not directly intended for -toolexec=time or -toolexec=gdb. That was an accident - a mostly happy one, but an accident nonetheless. The fact that -toolexec=time fell out naturally from the design doesn't mean it should force awkward changes to the design.
So as far as this proposal issue's current title - "do not cache tool output when using -toolexec" - I don't believe that's the right decision.
One possible way to address -toolexec=time would be to add a new -cachewrite=false flag that allows reading from the build cache but disables writes to it. Then you run -toolexec=time -cachewrite=false. It's easy to forget, though. Also, the existence of this flag would encourage thinking about the cache as something that should be manually managed, which disturbs me quite a bit. A cache that you need to manage by hand to get semantically correct behavior is not actually a cache.
Counter-proposal: any time a command prints to stdout or stderr, its results are not cached.
The most common time a tool prints to stdout/stderr are failures, and those are already not cached because of the non-zero exit status. The new rule would generalize that: any print cannot be cached.
At first this seems a bit surprising and more tied to tool behavior than you'd expect, but it turns out that it would remove a bunch of special case code that been introduced since the build cache was first added.
The original build cache didn't save stdout/stderr - it just didn't seem necessary.
One unintended effect was that -toolexec=time did not pollute future build output with cached timing prints.
But then go build -gcflags=-m errors
didn't reprint the -m output on a second run (#22587).
We fixed that in CL 77110 by caching and reprinting the stdout/stderr too.
But then we still missed stdout/stderr for certain skipped compile+link steps (#23877).
We fixed that in CL 128903 with even more special logic.
And now go build -toolexec=time
prints the timing output without -toolexec,
which is kind of the opposite of the -gcflags=-m
issue.
(There were probably other issues I am forgetting.)
Instead of continuing to finesse the exact behavior of when stdout/stderr are cached or not,
why not turn back the clock and re-fix #22587 by saying:
"don't cache build artifacts when a command prints anything".
Then #22587 is fixed, #23877 doesn't happen, this issue doesn't happen either,
-toolexec stays independent of the cache decision, and -toolexec=toolstash stays fast.
And it would let us delete all the output-handling code in the build cache (the test cache is separate and would stay).
Thoughts?
I'd be fine with the counter-proposal. Almost all the time when I want the compiler to print something (using toolexec or not), I'd expect it actually invokes the compiler.
I disagree that the answer is clear.
For the record, I've changed my mind since then, which is the entire reason I retracted the other proposal. So yes, I agree with your comment.
I'm fine with the counter-proposal too. My tool which wraps the compiler and linker changes their behavior, and alters the output of -V=full
to propagate that to the build cache, but otherwise doesn't print anything extra to stdout/stderr. So nothing would break for me.
I like the counter-proposal also.
As a rookie compiler developer I used to add code like this to the compiler somewhere while testing/debugging a change:
if os.Getenv("ENABLE_EXPERIMENTAL_FOOBAR") {
fmt.Fprintf("enabling the experimental foobar!\n")
<do something that causes the compiler to emit different code or DWARF>
}
Since my new environment variable wasn't visible to build caching machinery, this me caused no end of trouble. I've since moved away from this entirely (I now use explicit command line flags, and make sure that I add my new flags to the recordFlags() call in the gc's main). The counter-proposal would make this process a little easier; just have to make sure that the "experimental" code has some trace output.
Retitled and will include in the proposal minutes under the new title, but seems headed for a likely accept.
(Crosslink) #41973 is an example that will be fixed if we do this.
Based on the discussion above, this seems like a likely accept.
In case anyone wants to start working on this soon, heads up that I'm writing testdata/script/toolexec.txt
for #15677, so it could be reused for this issue later too.
No change in consensus, so accepted.
Most helpful comment
Thinking about my comment above a bit more, I think this issue should still be fixed by simply doing no caching when
-toolexec
is used. That would be strictly better than what we have right now.Once this issue is fixed, I can then file the "opt-in caching for
-toolexec
" idea as a separate proposal.