For the following functions (cl120140044):
```go
func (v *Value) Load() interface{} {
	return valueLoad(v)
}

// Implemented in assembly.
func valueLoad(v *Value) interface{}
```
more time is spent in the Value.Load thunk than in the valueLoad that does actual work:
```
26.20% atomic.test atomic.test [.] sync/atomic_test.func.053
14.78% atomic.test atomic.test [.] sync/atomic.(*Value).Load
14.41% atomic.test atomic.test [.] assertE2Tret
10.50% atomic.test atomic.test [.] sync/atomic.valueLoad
```
The compiler should inline the Value.Load thunk.
And probably also cases that add a few extra arguments, like:
```go
func foo(x int) int {
	return bar(x, true)
}
```
I opened issue #4735 last year for more detailed debug info, for the same purpose.
I have also run across cases where it would be helpful to inline "forwarding" functions.
For instance, consider this pattern:
```go
func F() { f() } // well-documented, exported function in x.go

func f() // stub for asm implementation in x_amd64.go

func f() { // pure-Go implementation in x_other.go
	// ...
}
```
If this is a very fast function, having double function-call overhead instead of single is significant. (The alternative is to declare and document F twice.) And if you want to write the pure-Go implementation of f in x.go as well, so that a test can compare the asm implementation with the pure-Go one, that requires another level of forwarding (so triple function-call overhead):
```go
// x.go
func F() { f() }
func fgo() { /* ... */ }

func f() // stub for asm implementation in x_amd64.go

func f() { fgo() } // x_other.go
```
(I also described this issue in this go-nuts thread.)
There's a forthcoming proposal for mid-stack inlining in #19348 that I believe should address this.
See also https://golang.org/cl/147361.
I haven't tried it, but it looks like something that addresses what this issue asks to address.
As a follow-up to that CL I started sending CLs to inline the fast paths of some of the sync facilities: https://go-review.googlesource.com/c/go/+/152698
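The idea in those CLs is to outline the slow path into a separate function so that the remaining fast path fits within the inlining budget. A minimal sketch of the shape (illustrative only, not the exact sync code):

```go
package p

import "sync/atomic"

const mutexLocked = 1

// Mutex is a stand-in for sync.Mutex (illustrative only).
type Mutex struct {
	state int32
}

// Lock's body is just a CAS and a call, so it fits the inlining
// budget and can be inlined at call sites.
func (m *Mutex) Lock() {
	// Fast path: grab the unlocked mutex with a single CAS.
	if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
		return
	}
	// Slow path, outlined so the fast path stays inlinable.
	m.lockSlow()
}

// lockSlow handles the contended case (spinning, queueing,
// semaphore wait, and so on).
func (m *Mutex) lockSlow() {
	// ...
}
```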
I also tried to apply the same approach to lock/unlock in the runtime, but for now it turns out to be impossible: lock and unlock are mutually recursive with throw, and the compiler doesn't like that (both because they are mutually recursive and because it blows the inlining budget). The only way to solve it would be to move the call to throw out of the fast path, as sketched below.
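A hypothetical sketch of that restructuring (names and fields are made up; this is not the actual runtime code):

```go
package p

import "sync/atomic"

const (
	mutexUnlocked uint32 = 0
	mutexLocked   uint32 = 1
)

// mutex is a stand-in for the runtime's mutex (illustrative only).
type mutex struct {
	key uint32
}

// lock no longer references throw at all, which both breaks the
// mutual recursion with throw and shrinks lock below the budget.
func lock(l *mutex) {
	// Fast path: uncontended lock taken with a single CAS.
	if atomic.CompareAndSwapUint32(&l.key, mutexUnlocked, mutexLocked) {
		return
	}
	lockSlow(l)
}

//go:noinline
func lockSlow(l *mutex) {
	// The futex/sleep loop, and every sanity check that can call
	// throw, would move here, out of the inlinable fast path.
}
```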
@CAFxX if you mark throw as //go:noinline, does that help with the mutual recursion/budget problem?
@josharian I didn't think of that, but alas it doesn't work:
```
src/runtime/lock_futex.go:46:6: cannot inline lock: recursive
```
I suppose this is because the check for recursion is done before caninl and is likely unaware of noinline annotations. FWIW, maybe this check could be folded into caninl so that it is aware of actual inlining restrictions/decisions. (I'm assuming the only reason inlining of recursive functions is disabled is to avoid inlining a copy of a function within that same function, although I'm not really sure of the root reason.)
Another thing I noticed, though only partially related, is that no DCE seems to run before inlining, so things like the following happen (I'm compiling on linux/amd64 with race mode disabled). Notice all the "dead" ifs:
```
src/math/bits/bits.go:283:6: can inline Len as: func(uint) int { if UintSize == 32 { }; return Len64(uint64(x)) }
src/runtime/cgocall.go:176:14: inlining call to dolockOSThread func() { if GOARCH == "wasm" { }; _g_ := getg(); _g_.m.lockedg.set(_g_); _g_.lockedm.set(_g_.m) }
src/runtime/stack.go:1277:24: inlining call to stackmapdata func(*stackmap, int32) bitvector { if stackDebug > 0 { }; return bitvector literal }
src/runtime/malloc.go:1105:6: can inline nextSample as: func() int32 { if GOOS == "plan9" { }; return fastexprand(MemProfileRate) }
src/go/token/position.go:452:15: inlining call to sync.(*RWMutex).RLock method(*sync.RWMutex) func() { if bool(false) { }; if atomic.AddInt32(&sync.rw.readerCount, int32(1)) < int32(0) { sync.runtime_SemacquireMutex(&sync.rw.readerSem, bool(false)) }; if bool(false) { } }
src/runtime/type.go:269:20: inlining call to reflectOffsUnlock func() { if raceenabled { }; unlock(&reflectOffs.lock) }
```
(longer output is here: https://gist.github.com/CAFxX/c901a4420216d154144b4ef4b469b2cf)
Now, all of these were clearly within the inlining budget, but it's likely that many others would fit within the budget if DCE ran before inlining. I'm not sure whether there is already a bug for this or whether I should file one.
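For context, math/bits.Len (the source behind the first line above) looks roughly like this; on a 64-bit platform the UintSize == 32 branch is dead and its body is dropped, yet the residual empty if still shows up in the reported inline body:

```go
// Len returns the minimum number of bits required to represent x;
// the result is 0 for x == 0.
func Len(x uint) int {
	if UintSize == 32 {
		return Len32(uint32(x))
	}
	return Len64(uint64(x))
}
```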
> I suppose this is because the check for recursion is done before caninl
Yes. I'd need to double-check, but IIRC the recursion check is done by visitBottomUp, which is shared across a few use cases. But we could add a flag to it to break on //go:noinline when using it for inlining analysis. This is (probably) a straightforward change. If you'd like, I can look into it. Or you are welcome to.
> no DCE is done before inlining
We do basic DCE before inlining. See e.g. https://go-review.googlesource.com/c/go/+/37499/ and other links from https://github.com/golang/go/issues/19699. See also a related discussion in https://github.com/golang/go/issues/16871. There are others. The general problem is that inlining happens early, and at that point we don't have much completed analysis to drive a more thorough DCE. Ideas for cheaply increasing early DCE coverage are always welcome (cheap in both execution time and code complexity).
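For illustration, the kind of branch the existing early pass can remove is one whose condition is already a compile-time constant (a minimal sketch, assuming a package-level constant):

```go
const debug = false

func get(m map[string]int, k string) int {
	if debug {
		// Constant-false branch: eliminated before the inliner
		// measures this function against its budget.
		println("get", k)
	}
	return m[k]
}
```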
Change https://golang.org/cl/154058 mentions this issue: cmd/compile: don't recurse into go:noinline during inlining walk
> I suppose this is because the check for recursion is done before caninl
@CAFxX I had a few minutes to kill so I whipped up CL 154058, if you'd like to patch that in and then experiment some with your runtime lock/unlock changes. And it should be relatively clear how to beef up the change if there are mechanisms other than go:noinline that you want to use to avoid a recursive walk. (See inl.go for how to write said mechanisms.)
This should be fixed with mid-stack inlining.
@randall77 awesome! Just tested tip and indeed, the forwarding overheads I mentioned in https://github.com/golang/go/issues/8421#issuecomment-264089533 (both the "double" and "triple" versions) are gone.