Go: cmd/compile: high inline cost of encoding/binary and math.Float32bits

Created on 23 Nov 2020  ·  3 Comments  ·  Source: golang/go

What version of Go are you using (go version)?

$ go version
1.15.5

Does this issue reproduce with the latest release?

Yes, also happens on tip.

What did you do?

The inline cost calculation for these functions seems inconsistent: https://go.godbolt.org/z/YMKWjc

package ex

import (
    "math"
    "encoding/binary"
)

type Point struct {
    X, Y float32
}

type Quad struct {
    From, Ctrl, To Point
}

func EncodeQuad(d [24]byte, q Quad) {
    binary.LittleEndian.PutUint32(d[0:], math.Float32bits(q.From.X))
    binary.LittleEndian.PutUint32(d[4:], math.Float32bits(q.From.Y))
    binary.LittleEndian.PutUint32(d[8:], math.Float32bits(q.Ctrl.X))
    binary.LittleEndian.PutUint32(d[12:], math.Float32bits(q.Ctrl.Y))
    binary.LittleEndian.PutUint32(d[16:], math.Float32bits(q.To.X))
    binary.LittleEndian.PutUint32(d[20:], math.Float32bits(q.To.Y))
}

func EncodeQuad2(d [6]float32, q Quad) {
    d[0] = q.From.X
    d[1] = q.From.Y
    d[2] = q.Ctrl.X
    d[3] = q.Ctrl.Y
    d[4] = q.To.X
    d[5] = q.To.Y
}

func EncodeQuad3(d [6]uint32, q Quad) {
    d[0] = math.Float32bits(q.From.X)
    d[1] = math.Float32bits(q.From.Y)
    d[2] = math.Float32bits(q.Ctrl.X)
    d[3] = math.Float32bits(q.Ctrl.Y)
    d[4] = math.Float32bits(q.To.X)
    d[5] = math.Float32bits(q.To.Y)
}

The inline cost for those:

$ go tool compile -m -m ex.go | grep cost
ex.go:16:6: cannot inline EncodeQuad: function too complex: cost 318 exceeds budget 80
ex.go:25:6: can inline EncodeQuad2 with cost 42 as: func([6]float32, Quad) { d[0] = q.From.X; d[1] = q.From.Y; d[2] = q.Ctrl.X; d[3] = q.Ctrl.Y; d[4] = q.To.X; d[5] = q.To.Y }
ex.go:34:6: cannot inline EncodeQuad3: function too complex: cost 90 exceeds budget 80

There seems to be a really large difference between the costs of EncodeQuad2, EncodeQuad3 and EncodeQuad. I would expect all of them to be inlinable and similar in cost.

NeedsInvestigation

All 3 comments

As far as the Float32bits part goes, it could be similar to #42739.

The big difference in EncodeQuad is that PutUint32 has to handle unaligned writes, whose cost depends on the architecture. On architectures that support unaligned writes it is cheap, but on those that do not it is still 24 single-byte writes. The other EncodeQuads can assume aligned writes.
At inline cost calculation time, we have to assume the worst for PutUint32, because the real cost depends on whether the unaligned write combining optimization happens.

BTW, EncodeQuad* doesn't do anything. It encodes into an argument that is thrown away upon return. You want to use a slice or pointer to an array, not a bare array.

> BTW, EncodeQuad* doesn't do anything. It encodes into an argument that is thrown away upon return. You want to use a slice or pointer to an array, not a bare array.

Thanks, I'm aware. The code is a simplification of https://github.com/golang/go/issues/28941#issuecomment-732446273 to ensure bounds check elimination (BCE) works.
