Go: cmd/compile: optimize bitset tests

Created on 8 May 2019 · 7Comments · Source: golang/go

What version of Go are you using (`go version`)?

go version go1.12.1 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (`go env`)?

set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\rillig\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\rillig\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Program Files\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\rillig\Program Files\cygwin\tmp\go-build152819318=/tmp/go-build -gno-record-gcc-switches

What did you do?

https://play.golang.org/p/G1EhaxsYsAV

What did you expect to see?

When a bitset is queried for a single bit, it doesn't matter whether set & q is compared to 0 or to the expected bit. Comparing to 0 saves an instruction on x86_64 and doesn't modify a register.

Therefore I expected the generated code to use TESTQ $0x8, AX instead of the combined ANDQ and CMPQ.

I didn't measure how often this pattern appears in the wild, but when I saw the generated code I immediately wondered why it uses two instructions when the same effect can be achieved with a single instruction and even fewer side effects.

What did you see instead?

  bitset.go:13          MOVQ os.Args+8(SB), AX
  bitset.go:14          NOPL
  bitset.go:14          XORPS X0, X0
  bitset.go:14          MOVUPS X0, 0x40(SP)
  bitset.go:14          LEAQ runtime.types+65664(SB), CX
  bitset.go:14          MOVQ CX, 0x40(SP)
  bitset.go:9           ANDQ $0x8, AX              <-- seems inefficient
  bitset.go:9           CMPQ $0x8, AX              <-- seems inefficient
  bitset.go:9           SETE AL
  bitset.go:14          MOVZX AL, AX

FrozenDueToAge NeedsFix Performance help wanted

Source

rillig

Most helpful comment

Yes, it should use BTL. It's smaller, but probably not any faster than TESTQ.
It might be hard to find a benchmark which cares. I'm happy for this to go in as long as the benchmarks don't get worse.

randall77 on 8 May 2019

👍6

All 7 comments

Indeed. We currently handle x & 8 == 0 but not x & 8 == 8.

randall77 on 8 May 2019

Change https://golang.org/cl/175957 mentions this issue: cmd/compile: optimize bitset tests

gopherbot on 8 May 2019

This optimisation is only valid for x & y == y if y has exactly one bit set (i.e. is a power of two).

See this example: https://play.golang.org/p/fAqS02z3Fgm

tmthrgd on 8 May 2019

@tmthrgd it's right, finding a way to fix it.

@randall77 is there anyway to check a *ssa.Value is power of two? I see there's a function isPowerOfTwo, but it gets an int64 instead.

cuonglm on 8 May 2019

isPowerOfTwo is probably the right one to use.
You're going to need to make the rules require that one argument is a constant.

It would be nice if you could do this optimization in the generic rules, something like:

(Eq64 (And64 x (Const64 [y])) (Const64 [y])) && isPowerOfTwo(y) -> (Ne64 (And64 x (Const64 [y])) (Const64 [0]))

(And 32/16/8 versions, Ne -> Eq also)
That way all the architectures get the optimization.

randall77 on 8 May 2019

👍1

@randall77 interesting, with these rules added in generic.rules:

// Optimize bitsets
(Eq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [y])) && isPowerOfTwo(y)
  -> (Neq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [0]))
(Neq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [y])) && isPowerOfTwo(y)
  -> (Eq(8|16|32|64) (And(8|16|32|64) <t> x (Const(8|16|32|64) <t> [y])) (Const(8|16|32|64) <t> [0]))

For return set&8 == 8, the assembly code generate using BTL instead of TESTQ:

"".AllBitsSet STEXT nosplit size=15 args=0x18 locals=0x0
    0x0000 00000 (main.go:8)    TEXT    "".AllBitsSet(SB), NOSPLIT|ABIInternal, $0-24
    0x0000 00000 (main.go:8)    FUNCDATA    $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (main.go:8)    FUNCDATA    $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (main.go:8)    FUNCDATA    $2, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
    0x0000 00000 (main.go:12)   PCDATA  $0, $0
    0x0000 00000 (main.go:12)   PCDATA  $1, $0
    0x0000 00000 (main.go:12)   MOVQ    "".set+8(SP), AX
    0x0005 00005 (main.go:12)   BTL $3, AX
    0x0009 00009 (main.go:12)   SETCS   "".~r2+24(SP)
    0x000e 00014 (main.go:12)   RET

I benched with no speed improvement compare to ANDQ + COMPQ.

cuonglm on 8 May 2019

randall77 on 8 May 2019

👍6

Was this page helpful?

0 / 5 - 0 ratings

Related issues

all: WebAssembly ("wasm") support

bradfitz · 147Comments

cmd/go: options missing from cgo whitelists

ianlancetaylor · 138Comments

cmd/go: -buildmode=c-shared should work on windows

chai2010 · 216Comments

sort: make sorting easier, add Slice, SliceStable, SliceIsSorted, reflect.Swapper

bradfitz · 158Comments

runtime: FreeBSD memory corruption involving fork system call

derekmarcotte · 243Comments

Go: cmd/compile: optimize bitset tests

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

Most helpful comment

All 7 comments

Related issues

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?