The compiler (gc) has come a long way, however the inliner is sometimes lacking, for example any function with a for-loop
no matter how simple it is, won't get inlined so we either have to use a goto
loop instead (which is used few times in the runtime) or just have a one big ugly function.
My proposal is to add a go:inline
directive that will force inlining, even if it's just for non-exported functions.
This will help make a lot of code much cleaner and allows separation of big functions that can't be currently inlined.
If the proposal is accepted, I'd like to work on it.
Another semi-common example:
func (xx *XXHash64) WriteString(s string) (int, error) {
if len(s) == 0 {
return 0, nil
}
return xx.Write([]byte(s))
}
// cannot inline (*XXHash64).WriteString: non-leaf method
Related: #17566 #14768
@dr2chase @josharian @randall77 @mvdan
This proposal has basically no chance of being accepted.
Go eschews knobs.
Compiler directives are widely disliked.
I don't think it can be implemented reasonably, if at all. Functions are not eligible for inlining for many reasons. These include conceptual implementation difficulty: I have no idea how to inline a function containing defer. And practical implementation difficulty: It took a dedicated person a summer to get non-leaf inlining working correctly, even with part of the implementation already written. And it's still not released because of the binary size and compilation time aspects. Supporting such a directive would mean solving a host of such problems.
I personally have long wanted a much smaller directive that causes compilation to fail if a function is deemed ineligible for inlining. This would allow me to refactor without fear that future inlining changes will cause my no-op inlining to become a silent performance regression. Even this much more limited proposal has been rejected forcefully.
All that said, if you'd like to work on improving inlining, that'd be fabulous. There are many opportunities to improve inlining decisions, to improve how we do inlining (e.g. partial inlining), and to expand the set of functions that can be unlined.
@josharian The flag -gcflags "-m"
makes the go compiler print decisions on inlining. Maybe you can build a tool that parses this output and verifies it against some "must inline" list. Most people won't need it, but if you really do, then you could add that tool to your build pipeline.
@neelance yes, or alternatively parse the output of go tool nm
and make sure the inlined function has been dead-code-eliminated. (Note that I want it most of all for package runtime
, so "my build pipeline" is probably go test runtime
.)
@josharian I already suspected that you want it for the stdlib. May I ask why you didn't build it yet? I get the point of not adding such a feature to the Go compiler itself, but a standalone tool doesn't do any harm if only used by individuals voluntarily. I just cringe when I hear "fear of refactoring". ;-)
@neelance just haven't gotten to it yet; I have little contribution time at the moment and lots I want to do. I'd be delighted if someone else wanted to jump in and do this. The recently introduce runtime function tophash
would be an excellent and relevant test case. :)
Change https://golang.org/cl/57410 mentions this issue: runtime: add TestIntendedInlining
For the reasons Josh outlined above, we're not going to do this. Note also that even in C/C++, the "inline" directive does not really mean "this must be inlined". It is more a guideline than a rule.
Any reason why inlining is more difficult in go than other languages like java or c++.
I was trying to find the rationale behind no branches / 40 expressions or less restrictions for inlined functions.
Any pointers are appreciated.
While trying to write performant code (dynamic proto3 marshaling), hand inlining a function brought down marshal time from 5ms to 4ms. That can be significant depending on your performance goals.
@mandarjog A closed issue is not the place for this discussion. General discussions about inlining in Go should use a forum; see https://golang.org/wiki/Questions . Specific discussions about the gc toolchain should take place on the golang-dev mailing list.
I'm going to re-open this for further, given the recent discussion at https://github.com/golang/go/issues/17566.
In an attempt to migrate discussion to here, to keep the threads distinct, I will also paste in some other folks' comments from there.
The question I have for
//go:inline
is, does it apply to functions or to callsites? It sounds like the proposal is the former, but I worry that people would really want the latter.
Callsite pragma sounds interesting to me. Maybe go a bit farther and say that such pragmas are required to be intra-package (caller and callee both in package X, and callee is not exported)? Even then a "must" inline pragmas is still a scary prospect. Well-intentioned users could easily blow up package compile times by accident. Also seems like the inliner would have to build some sort of call graph in order to detect cycles in must-inline directives.
Even then a "must" inline pragmas is still a scary prospect.
Agreed. Any request for inlining will fundamentally need to be a request, not a promise. I imagine the initial implementation would be a relaxation or removal of the budget, but not the other constraints.
For callsite inlining requests, does that propagate to only the immediate function being called (possibly using mid-stack inlining), or to as much of the call tree from that point onwards as is possible?
What (if anything) does this do in the case of recursive calls which can be partially mid-stack inlined? There is discussion of this case in a few scattered issues and CLs, because it turns out to be important in a few places in the runtime.
My rationale for asking for this is at https://github.com/golang/go/issues/17566#issuecomment-478303912. In particular, an argument that I believe does not get weaker with better (but not perfect) heuristics is that having the inliner decisions be opaque and subtle means that unrelated changes by other developers can significantly impact performance, and the only way to get robustness is currently writing the _caller_ in assembly.
My proposal, instead of //go:inline
, is
package runtime
// Inline requests, but does not guarantee, that the caller be inlined.
function Inline() {}
Like runtime.KeepAlive
and testing.Helper
. And it's not a magic comment.
It's per-function instead of per-callsite, but I never needed the latter myself.
A totally different option is to have a “flatten” directive that forces a function to be a leaf, that is all calls within the function body are recursively inlined. This is usually simpler to reason about because people usually know which functions should go fast, while it’s sometimes harder to foresee whether a function should always be inlined or not.
Moreover the flatten directive doesn’t propagate transitively, so a module author that misuses flatten shouldn’t cause exponential compile time behavior like a force-inline directive might cause.
A small collection of examples where this would have been useful.
https://speakerdeck.com/gtank/micro-optimizing-go-code?slide=21
https://speakerdeck.com/gtank/i-wanna-go-fast?slide=12
https://lemire.me/blog/2017/09/05/go-does-not-inline-functions-when-it-should/
https://github.com/golang/go/issues/17566#issuecomment-478681653
I also had half a dozen people complain about it to me or to my immediate circle in the last year or so. I'll ask them to chime in with examples.
I've had my share of fights with the inliner. Some representative examples are in some private code: a column-oriented database which naturally has some very important tight loops.
if <sanity check> { panic }
safety checks because panic prevented inlining, but more recently I was able to put them back in. (I still have to be careful about the size of the methods, though.)In all these cases, it would be very helpful to have an assertion that can simply fail compilation if something cannot be inlined. Even without any fundamentally new inlining capabilities, it would greatly increase my confidence in making code adjustments and changing Go versions if I could tell the compiler my inlining wishes rather than just leaving scary comments for programmers.
My proposal, instead of //go:inline, is [...] runtime.Inline
A function call isn't a great fit for this use case. What should happen here?
func f() {
if false {
runtime.Inline()
}
}
What if it was x := false; if x
instead of a constant? What if x
was a false global that no one ever wrote to? In all these cases, testing.Helper
and runtime.KeepAlive
have clear semantics, by thinking of them as actual function invocations (which it is in the former case, and was for a release in the latter case).
I'll ask them to chime in with examples.
Please do. They are helpful not only in making the case, but in seeing opportunities for immediate improvement. George's problematic inlining case turned out to be a compiler bug around inlining intrinsics.
a comment stating that it's important that all the methods are inlined
Someone or other I proposed //go:mustinline
, which would cause a compilation failure if inlining didn't happen (according to the normal rules). We ended up with TestIntendedInlining
tests. Feel free to steal them. :)
A totally different option is to have a “flatten” directive
Very interesting. There are some tricky cases that I don't quite see how to express with this directive, though. Consider:
func f() {
for {
if rare {
impossibleToInline()
}
definitelyNeedsToBeInlined()
}
}
You can't flatten f
, but flattening definitelyNeedsToBeInlined
is impossible. I'd be good to try it out on a bunch of examples to see whether it works well in practice.
I suppose you could also fix this by tweaks the semantics of flatten
. It could mean "fail to compile if you can't flatten the function", which is safer, but breaks the code above. It could also mean "flatten as much as you can, throughout the tree", which helps with the code above.
There's another problem with flattening across package lines: you can't request recursive inlining of a function that has already been compiled without recursive inlining--it breaks package-at-a-time compilation. I suppose you could specify that flatten only works within a package. That might suffice, and doing better inter-package inlining might take up the slack. But it is another complication.
It would be neat to find a way to enable something like
runtime.Inline(func() { a = mul(b, c) })
but I can't think of anything that doesn't involve complex closure edge-cases.
I'll ask them to chime in with examples.
Please do. They are helpful not only in making the case, but in seeing opportunities for immediate improvement. George's problematic inlining case turned out to be a compiler bug around inlining intrinsics.
I think his benchmarks are still useful to show the performance impact, even if they are not meaningful to discuss the heuristic. My point is that any imperfect heuristic is a balancing game and a ticking bomb. As soon as someone comes along and changes something innocent, performance might drop.
(We should still fix bugs in the heuristic, of course.)
One example of a function that could really use some sort of inlining guarantee I have is the quarterRound
implementation in chacha20. If it stops being inlined the performance of the generic implementation of chacha20
would be awful. (The inlining cost is why it doesn't use bits.RotateLeft32
.)
If we did the closure technique then I don't think we should use the runtime
package since it isn't a runtime
operation. Maybe a compiler/must
package (or compiler/try
)? If the closure were interpreted as reducing the inlining cost of the called function to 0 then it would be easy to write inlineable functions:
func quarterRound(a, b, c, d uint32) (uint32, uint32, uint32, uint32) {
try.Inline(func() {
a += b
d ^= a
...
b = (b << 7) | (b >> 25)
})()
return a, b, c, d
}
Still a bit ugly and hard to read though.
I'll notice that this is one of those cases where the goal of the author is better described by simply saying:
func (s *Cipher) XORKeyStream(dst, src []byte) {
compiler.Flatten()
[...]
}
as the idea is that XORKeyStream
must become linear code with no function calls, irrespective of how quarterRound
, xor
or bits.Rotate*
are implemented.
I wonder to what extent we can separate the “what” from the “why”.
//go:inline
specifies a particular mechanism, but I don't think the details of the concrete mechanism actually matter most of the time: it's important that calls to these functions be efficient, but whether that is done by inlining more aggressively, choosing a more efficient calling convention, or something else entirely should really be up to the compiler.
The thing that you need to communicate to the compiler is “this function is called frequently enough that it's worth spending the extra compile time and binary size to optimize heavily”. If you can't express that through profile-guided optimization, then perhaps an annotation like //go:hotfunction
would more clearly express the relevant information (and more clearly capture the compiler's obligation — or lack thereof — to use any particular mechanism).
The inlining cost is why it doesn't use
bits.RotateLeft32
.
bits.RotateLeft32
should have a cheaper inlining cost than the |
expression. Please file an issue if not.
perhaps an annotation like
//go:hotfunction
would more clearly express the relevant information
//go:O3
? :P
Anyway, this is a nice idea. It's a good way to dip our toes in the FGO water (and ultimately perhaps decide to ignore the hint in favor of actual FGO?) and it is not hard to imagine finding other places in the compiler where we could choose to spend a few extra cycles on such functions.
For what it's worth, GCC supports both __attribute__((hot))
and __attribute__((cold))
for functions and labels (putting the attribute on a label affects whether a branch to that label is predicted as taken). When profiling feedback is available, GCC ignores these attributes.
GCC also supports function attributes always_inline
, noinline
, and flatten
. Among many others.
Re #17566, package-runtime-limited annotations are fine to experiment with, without a proposal. We have plenty of those already.
But for the broader space of all Go packages, micromanaging optimizations at the source code level like //go:inline
or __attribute__((hot))
etc cuts against the grain of Go's approach to software development. This is why we have only one GC tuning knob for example (just GOGC). Make the tools smart enough to make a good decision instead of turning all the programmers into tuning AIs.
Closing again.
A few people pointed out that I didn't respond directly to the various examples of automatic inlining not being optimal for various performance results. To address that explicitly, the answer needs to be to improve the inlining rules. Look at what the compiler does, understand why it's wrong, and fix it.
An inline
keyword is like C's old register
keyword. It's telling the compiler something it should be able to do on its own. If our compiler is not doing a good enough job, the bugs to file are specific instances where it should be doing better, so we can fix those.
@rsc I did not reopen the conversation because I believe the issue is automatic inlining not being optimal.
What I tried to communicate at https://github.com/golang/go/issues/21536#issuecomment-478753006 and https://github.com/golang/go/issues/17566#issuecomment-478303912 is that:
I believe 2. in particular is not in line with Go's approach to team collaboration and explicit behavior, and while annotations are also not in line with Go's approach, so is assembly (3.).
I'd love to be wrong about 1. but no one from the compiler team has expressed confidence that I am.
I tend to agree with your first point, it's going to be hard to make inlining happen at all the points which are performance critical, and not blow up the binary by inlining too much everywhere else.
So I think we're going to need to be somewhat conservative in our rules. I'm ok with that. I'm a fan of simplicity even if it costs a little bit of performance.
I don't really agree with your second point. If you're modifying the code, you should expect possible performance changes, and you should run benchmarks as part of validating those changes. Inlining is only one of many ways in which you could break the performance characteristics of a function by modifying its code.
I suspect a heuristic could be good enough, but we have not put adequate effort into finding such a heuristic, and are proceeding only gradually.
I did an experiment last night where I exposed all the inliner knobs and turned them up well past 11%. I'm working on the spreadsheet and how to share it but TLDR is:
Ridiculous inlining tends to tickle build bugs in the runtime, which limited how far I could take this.
I suspect the compiler is much better than I am at deciding what to inline roughly 95% of the time.
It's true that you should probably be benchmarking performance-sensitive code when changing it, but it's still useful to get more direct feedback on what specifically changed -- if a change is part of a larger change, it can be very non-obvious that the actual substantive performance change is a single function no longer being inlined. (There's also the risk that the change actually comes about because of a new compiler, or better yet, a new compiler and a code change, where either alone doesn't change the behavior.)
I do note that "inline", and "register" for that matter, can have additional compiler-hinting semantics that are sometimes non-obvious, like "also this implies you can't take the thing's address". I am not sure whether I like that or not -- I dislike the additional complexity, but I see the appeal of being able to give the compiler hints about a thing's behavior or assumptions.
I do sort of like an annotation for "be sure to let me know if this can't be inlined", but that's something I think we can already do with existing tools and a bit of shell scripting.
- short of a feedback-guided optimization framework, I don't believe any heuristic can be good enough, so sometimes assembly will have to be written instead
- any implicit behavior is opaque and unreliable: if someone else later changes something about an inlined function that incorrectly makes it not inline, performance will tank, so it will either require coordination across every contributor to the code, or force assembly to be written
- forcing users to use assembly feels strictly worse than forcing them to add annotations
You are claiming that, without //go:inline
, people must write assembly
to work around compiler inlining decisions that they need to override.
This is untrue. At most, people must copy and paste Go code. No assembly required.
A little copying is better than a new, subtle source code annotation.
(Not quite the same ring as the original, I admit.)
A few other, much more minor points:
(1) short of a feedback-guided optimization framework, I don't believe any heuristic can be good enough
It depends on what "good enough" means. For one person, today's compiler might be "good enough". For someone else, no compiler would ever be "good enough". That needs to be defined better. And without filed bugs for the specific cases that aren't good enough today, how would we know? The first step has to be filing those bugs.
(2) any implicit behavior is opaque and unreliable: if someone else later changes something about an inlined function that incorrectly makes it not inline, performance will tank,
The performance cliff you are describing is true of almost every compiler optimization, but most notably escape analysis. We avoid making escape analysis "opaque" by giving developers tools to understand the compiler's decisions, specifically -gcflags=-m
, which also reports inlining decisions. And we avoid making it "unreliable" by writing tests that specific cases are handled a particular way. If there are specific inlining cases that are important work today and are important to keep working (like maybe chacha20 quarterRound), it may be worth adding them as test cases (in $GOROOT/test/inline*.go
).
so it will either require coordination across every contributor to the code,
As Keith pointed out, tests and benchmarks exist precisely to enable this kind of coordination. It is not some impossibility but instead a reality of all our development.
(3) forcing users to use assembly feels strictly worse than forcing them to add annotations
Regarding annotations more generally, there is a significant learning, maintenance, and semantic cost to annotations and they are historically _avoided_ for this kind of thing in Go. Unlike modern C, Go is _not_ meant to be a replacement for an assembler in all cases. If you really need very precise control over details of code generations, assembly can be an entirely appropriate answer. And that is OK! But again the proper phrasing of (3) would be "forcing users to copy and paste code". And that makes the argument for micromanagement of the compiler's decision here even weaker.
Most helpful comment
This proposal has basically no chance of being accepted.
Go eschews knobs.
Compiler directives are widely disliked.
I don't think it can be implemented reasonably, if at all. Functions are not eligible for inlining for many reasons. These include conceptual implementation difficulty: I have no idea how to inline a function containing defer. And practical implementation difficulty: It took a dedicated person a summer to get non-leaf inlining working correctly, even with part of the implementation already written. And it's still not released because of the binary size and compilation time aspects. Supporting such a directive would mean solving a host of such problems.
I personally have long wanted a much smaller directive that causes compilation to fail if a function is deemed ineligible for inlining. This would allow me to refactor without fear that future inlining changes will cause my no-op inlining to become a silent performance regression. Even this much more limited proposal has been rejected forcefully.
All that said, if you'd like to work on improving inlining, that'd be fabulous. There are many opportunities to improve inlining decisions, to improve how we do inlining (e.g. partial inlining), and to expand the set of functions that can be unlined.