Go: cmd/compile: better append of unmodified slices

Created on 5 Mar 2020  路  7Comments  路  Source: golang/go

What version of Go are you using (go version)?

$ go version
go version go1.14 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN="/Users/s.rabot/go/bin"
GOCACHE="/Users/s.rabot/Library/Caches/go-build"
GOENV="/Users/s.rabot/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/s.rabot/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/z0/rcywz4nn3jg6g96h67c_4xzdw31lvl/T/go-build757609694=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package main

import (
    "testing"
)

func BenchmarkNoCap(b *testing.B) {
    for i := 0; i < b.N; i++ {
        var slice []string

        slice = append(slice, []string{"a", "q", "w"}...)
        slice = append(slice, []string{"z", "s", "x"}...)
    }
}

func BenchmarkCap(b *testing.B) {
    for i := 0; i < b.N; i++ {
        slice := make([]string, 0, 6)

        slice = append(slice, []string{"a", "q", "w"}...)
        slice = append(slice, []string{"z", "s", "x"}...)
    }
}

func BenchmarkCapNoMagicNumber(b *testing.B) {
    for i := 0; i < b.N; i++ {
        base := []string{"a", "q", "w"}
        fixed := []string{"z", "s", "x"}

        slice := make([]string, 0, len(base)+len(fixed))

        slice = append(slice, base...)
        slice = append(slice, fixed...)
    }
}

func BenchmarkCapNoMagicNumberFixedSlices(b *testing.B) {
    for i := 0; i < b.N; i++ {
        base := [...]string{"a", "q", "w"}
        fixed := [...]string{"z", "s", "x"}

        slice := make([]string, 0, len(base)+len(fixed))

        slice = append(slice, base[:]...)
        slice = append(slice, fixed[:]...)
    }
}

What did you expect to see?

I would have expected BenchmarkCapNoMagicNumber and BenchmarkCapNoMagicNumberFixedSlices to have the same throughput.

What did you see instead?

$ go test -bench . -count=3
goos: darwin
goarch: amd64
BenchmarkNoCap-12                            7771320           142 ns/op
BenchmarkNoCap-12                            7877163           142 ns/op
BenchmarkNoCap-12                            7857156           143 ns/op

BenchmarkCap-12                             59972187            20.2 ns/op
BenchmarkCap-12                             58082768            23.4 ns/op
BenchmarkCap-12                             44653009            25.6 ns/op

BenchmarkCapNoMagicNumber-12                15412473            74.8 ns/op
BenchmarkCapNoMagicNumber-12                14540749            75.1 ns/op
BenchmarkCapNoMagicNumber-12                15240964            74.7 ns/op

BenchmarkCapNoMagicNumberFixedSlices-12     64831053            21.0 ns/op
BenchmarkCapNoMagicNumberFixedSlices-12     64266398            21.4 ns/op
BenchmarkCapNoMagicNumberFixedSlices-12     60569865            22.7 ns/op
NeedsInvestigation Performance

Most helpful comment

It鈥檚 definitely possible for them to be improved. The question鈥攚hich I can鈥檛 answer offhand鈥攊s whether it is possible to make a small, local improvement that would cover this case, or whether it would require an overhaul. (There are good independent reasons for an overhaul, but what is needed鈥攅liminating one of the compiler鈥檚 intermediate representations鈥攊s a huge undertaking.)

All 7 comments

@randall77, I think you've got the title backwards. It looks like the sliced array is outperforming the slice-literal version for some reason.

Sorry, yes.

Slow routine allocates because len(base)+len(fixed) of slices isn't determined at compile time, whereas with arrays the expression is constant.

Or to be more precise, during the escape analysis phase, len(base)+len(fixed) isn't known to be a constant. (Later on, SSA figures it out.)

So both are optimized to the extent of what they can be or is there room for improvement in the slice case ?

It鈥檚 definitely possible for them to be improved. The question鈥攚hich I can鈥檛 answer offhand鈥攊s whether it is possible to make a small, local improvement that would cover this case, or whether it would require an overhaul. (There are good independent reasons for an overhaul, but what is needed鈥攅liminating one of the compiler鈥檚 intermediate representations鈥攊s a huge undertaking.)

Is this a duplicate of #20533?

Was this page helpful?
0 / 5 - 0 ratings