Go: cmd/compile: autogenerated wrapper is not inlined

Created on 10 May 2018 · 6Comments · Source: golang/go

What version of Go are you using (`go version`)?

Master:

go version devel +5bd88b0 Thu May 10 18:02:07 2018 +0000 linux/amd64

Running encoding/binary benchmarks I see significant slowdown (compared to 1.10):

WriteSlice1000Int32s-4                           9.51Âµs Â± 2%    12.83Âµs Â± 2%    +34.92%  (p=0.008 n=5+5)
ReadSlice1000Int32s-4                            9.34Âµs Â± 2%    12.60Âµs Â± 2%    +34.90%  (p=0.008 n=5+5)

This is caused by compiler not inlining autogenerated binary.(*bigEndian).PutUint32, which does nothing but shuffles arguments and calls binary.bigEndian.PutUint32.

Bisect points to ca2f85fd3f32b2a4c863a2de602876bd31e9d956, which is IMHO strange, because it doesn't enable new export format by default.
@mdempsky any ideas?

FrozenDueToAge NeedsInvestigation Performance

Source

TocarIP

👍4

Most helpful comment

@TocarIP I see you are systematically checking benchmarks and running down significant regressions. Just wanted to say a hearty thank you for doing this!

josharian on 11 May 2018

👍7 😄1

All 6 comments

In genwrapper (cmd/compile/internal/gc/subr.go) I see:

        // TODO(mdempsky): Investigate why this doesn't work with
        // indexed export. For now, we disable even in non-indexed
        // mode to ensure fair benchmark comparisons and to track down
        // unintended compilation differences.
        if false {
                inlcalls(fn)
        }

Should we at least change false to flagiexport, since benchmarks are done?

TocarIP on 10 May 2018

@TocarIP I see you are systematically checking benchmarks and running down significant regressions. Just wanted to say a hearty thank you for doing this!

josharian on 11 May 2018

👍7 😄1

Additional regressions caused by that CL:

StableInt1K-4         148µs ± 0%      237µs ± 0%    +59.40%  (p=0.008 n=5+5)
StableInt64K-4       13.2ms ± 1%     21.0ms ± 1%    +59.11%  (p=0.008 n=5+5)
SortInt1K-4           134µs ± 0%      201µs ± 1%    +50.75%  (p=0.008 n=5+5)
SortInt64K-4         13.0ms ± 1%     19.3ms ± 1%    +48.86%  (p=0.008 n=5+5)
Stable1e6-4           9.76s ± 0%     13.39s ± 0%    +37.26%  (p=0.008 n=5+5)
Sort1e6-4             2.46s ± 0%      3.37s ± 0%    +37.08%  (p=0.008 n=5+5)
Sort1e4-4            16.2ms ± 0%     22.2ms ± 0%    +36.50%  (p=0.016 n=5+4)
Stable1e2-4           155µs ± 0%      211µs ± 0%    +35.67%  (p=0.008 n=5+5)

bufio
WriterFlush-4       26.8ns ± 0%     31.6ns ± 0%    +18.09%  (p=0.016 n=5+4)

There are more regressions, but these are the most significant ones (highest delta).

quasilyte on 11 May 2018

I see you are systematically checking benchmarks and running down significant regressions.

Probably we should automate this through a nightly job so that offfending CLs are caught quicker.

agnivade on 13 May 2018

👍2

Change https://golang.org/cl/135697 mentions this issue: cmd/compile/internal/gc: inline autogenerated (*T).M wrappers

gopherbot on 17 Sep 2018

👍3

I believe this is fixed now. The CL above, and possibly the slightly more aggressive mid-stack inlining makes the wrapper inline successfully.
Tip is still slower than 1.10, but only by a few percent.