For the following functions (cl120140044):
```go
func (v *Value) Load() interface{} {
	return valueLoad(v)
}

// Implemented in assembly.
func valueLoad(v *Value) interface{}
```
more time is spent in the Value.Load thunk than in the valueLoad that does actual work:
```
26.20% atomic.test atomic.test [.] sync/atomic_test.func.053
14.78% atomic.test atomic.test [.] sync/atomic.(*Value).Load
14.41% atomic.test atomic.test [.] assertE2Tret
10.50% atomic.test atomic.test [.] sync/atomic.valueLoad
```
The compiler should inline the Value.Load thunk.
And probably also cases that add a few extra arguments, like:
```go
func foo(x int) int {
	return bar(x, true)
}
```
I opened issue #4735 last year for more detailed debug info, for the same purpose.
I have also run across cases where it would be helpful to inline "forwarding" functions.
For instance, consider this pattern:
```go
func F() { f() } // well-documented, exported function in x.go

func f() // stub for asm implementation in x_amd64.go

func f() { // pure-Go implementation in x_other.go
	// ...
}
```
If this is a very fast function, having double function-call overhead instead of single is significant. (The alternative is to declare and document F twice.) And if you want to write the pure-Go implementation of f in x.go as well, so that a test can compare the asm implementation with the pure-Go one, that requires another level of forwarding (so triple function-call overhead):
```go
// x.go
func F() { f() }
func fgo() { /* ... */ }

func f() // stub for asm implementation in x_amd64.go

func f() { fgo() } // x_other.go
```
(I also described this issue in this go-nuts thread.)
There's a forthcoming proposal for mid-stack inlining in #19348 that I believe should address this.
See also https://golang.org/cl/147361.
I haven't tried it, but it looks like something that addresses what this issue asks to address.
As a follow-up to that CL I started sending CLs to inline the fast paths of some of the sync facilities: https://go-review.googlesource.com/c/go/+/152698
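The idea in those CLs is to outline the slow path into a separate function so that the remaining fast path fits within the inlining budget. A minimal sketch of the shape (illustrative only, not the exact sync code):

```go
package p

import "sync/atomic"

const mutexLocked = 1

// Mutex is a stand-in for sync.Mutex (illustrative only).
type Mutex struct {
	state int32
}

// Lock's body is just a CAS and a call, so it fits the inlining
// budget and can be inlined at call sites.
func (m *Mutex) Lock() {
	// Fast path: grab the unlocked mutex with a single CAS.
	if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
		return
	}
	// Slow path, outlined so the fast path stays inlinable.
	m.lockSlow()
}

// lockSlow handles the contended case (spinning, queueing,
// semaphore wait, and so on).
func (m *Mutex) lockSlow() {
	// ...
}
```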
I also tried to apply the same approach to lock/unlock in the runtime, but for now it turns out to be impossible: lock and unlock are mutually recursive with throw, and the compiler doesn't like that (both because they are mutually recursive and because it blows the inlining budget). The only way to solve it would be to move the call to throw out of the fast path, as sketched below.
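A hypothetical sketch of that restructuring (names and fields are made up; this is not the actual runtime code):

```go
package p

import "sync/atomic"

const (
	mutexUnlocked uint32 = 0
	mutexLocked   uint32 = 1
)

// mutex is a stand-in for the runtime's mutex (illustrative only).
type mutex struct {
	key uint32
}

// lock no longer references throw at all, which both breaks the
// mutual recursion with throw and shrinks lock below the budget.
func lock(l *mutex) {
	// Fast path: uncontended lock taken with a single CAS.
	if atomic.CompareAndSwapUint32(&l.key, mutexUnlocked, mutexLocked) {
		return
	}
	lockSlow(l)
}

//go:noinline
func lockSlow(l *mutex) {
	// The futex/sleep loop, and every sanity check that can call
	// throw, would move here, out of the inlinable fast path.
}
```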
@CAFxX if you mark throw as //go:noinline, does that help with the mutual recursion/budget problem?
@josharian I didn't think of that, but alas it doesn't work:
```
src/runtime/lock_futex.go:46:6: cannot inline lock: recursive
```
I suppose this is because the check for recursion is done before caninl and is likely unaware of noinline annotations. FWIW, maybe this check could be folded into caninl so that it is aware of actual inlining restrictions/decisions. (I'm assuming the only reason inlining of recursive functions is disabled is to avoid inlining a copy of a function within that same function, although I'm not really sure of the root reason.)
Another thing I noticed, though only partially related, is that no DCE seems to run before inlining, so things like the following happen (I'm compiling on linux/amd64 with race mode disabled). Notice all the "dead" ifs:
```
src/math/bits/bits.go:283:6: can inline Len as: func(uint) int { if UintSize == 32 { }; return Len64(uint64(x)) }
src/runtime/cgocall.go:176:14: inlining call to dolockOSThread func() { if GOARCH == "wasm" { }; _g_ := getg(); _g_.m.lockedg.set(_g_); _g_.lockedm.set(_g_.m) }
src/runtime/stack.go:1277:24: inlining call to stackmapdata func(*stackmap, int32) bitvector { if stackDebug > 0 { }; return bitvector literal }
src/runtime/malloc.go:1105:6: can inline nextSample as: func() int32 { if GOOS == "plan9" { }; return fastexprand(MemProfileRate) }
src/go/token/position.go:452:15: inlining call to sync.(*RWMutex).RLock method(*sync.RWMutex) func() { if bool(false) { }; if atomic.AddInt32(&sync.rw.readerCount, int32(1)) < int32(0) { sync.runtime_SemacquireMutex(&sync.rw.readerSem, bool(false)) }; if bool(false) { } }
src/runtime/type.go:269:20: inlining call to reflectOffsUnlock func() { if raceenabled { }; unlock(&reflectOffs.lock) }
```
(longer output is here: https://gist.github.com/CAFxX/c901a4420216d154144b4ef4b469b2cf)
Now, all of these were clearly within the inlining budget, but it's likely that many others would fit within the budget if DCE ran before inlining. I'm not sure whether there is already a bug for this or whether I should file one.
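For context, math/bits.Len (the source behind the first line above) looks roughly like this; on a 64-bit platform the UintSize == 32 branch is dead and its body is dropped, yet the residual empty if still shows up in the reported inline body:

```go
// Len returns the minimum number of bits required to represent x;
// the result is 0 for x == 0.
func Len(x uint) int {
	if UintSize == 32 {
		return Len32(uint32(x))
	}
	return Len64(uint64(x))
}
```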
> I suppose this is because the check for recursion is done before caninl
Yes. I'd need to double-check, but IIRC the recursion check is done by visitBottomUp, which is shared across a few use cases. But we could add a flag to it to break on //go:noinline when using it for inlining analysis. This is (probably) a straightforward change. If you'd like, I can look into it. Or you are welcome to.
> no DCE is done before inlining
We do basic DCE before inlining. See e.g. https://go-review.googlesource.com/c/go/+/37499/ and other links from https://github.com/golang/go/issues/19699. See also a related discussion in https://github.com/golang/go/issues/16871. There are others. The general problem is that inlining happens early, and at that point we don't have much completed analysis to drive a more thorough DCE. Ideas for cheaply increasing early DCE coverage are always welcome (cheap in both execution time and code complexity).
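For illustration, the kind of branch the existing early pass can remove is one whose condition is already a compile-time constant (a minimal sketch, assuming a package-level constant):

```go
const debug = false

func get(m map[string]int, k string) int {
	if debug {
		// Constant-false branch: eliminated before the inliner
		// measures this function against its budget.
		println("get", k)
	}
	return m[k]
}
```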
Change https://golang.org/cl/154058 mentions this issue: cmd/compile: don't recurse into go:noinline during inlining walk
> I suppose this is because the check for recursion is done before caninl
@CAFxX I had a few minutes to kill so I whipped up CL 154058, if you'd like to patch that in and then experiment some with your runtime lock/unlock changes. And it should be relatively clear how to beef up the change if there are mechanisms other than go:noinline that you want to use to avoid a recursive walk. (See inl.go for how to write said mechanisms.)
This should be fixed with mid-stack inlining.
@randall77 awesome! Just tested tip and indeed, the forwarding overheads I mentioned in https://github.com/golang/go/issues/8421#issuecomment-264089533 (both the "double" and "triple" versions) are gone.