Execute go1 benchmarks.
RegexpMatchEasy0_1K is much slower (at https://github.com/golang/go/commit/86b2f29676c52774d91dda96e0ba5d4d7bcd3b47) as compared to 1.7.1
benchmark old ns/op new ns/op delta
BenchmarkRegexpMatchEasy0_32-4 102 100 -1.96%
BenchmarkRegexpMatchEasy0_1K-4 249 1016 +308.03%
BenchmarkRegexpMatchEasy1_32-4 105 99.3 -5.43%
BenchmarkRegexpMatchEasy1_1K-4 437 426 -2.52%
benchmark old MB/s new MB/s speedup
BenchmarkRegexpMatchEasy0_32-4 310.92 319.40 1.03x
BenchmarkRegexpMatchEasy0_1K-4 4110.44 1007.33 0.25x
BenchmarkRegexpMatchEasy1_32-4 304.17 322.33 1.06x
BenchmarkRegexpMatchEasy1_1K-4 2342.11 2398.62 1.02x
go version devel +86b2f29 Sat Oct 15 07:13:17 2016 +0000 linux/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/home/mohit/tmp/t/go"
GOTOOLDIR="/home/mohit/tmp/t/go/pkg/tool/linux_amd64"
TERM="dumb"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build951092435=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
uname -sr: Linux 4.4.0-38-generic
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial
/lib/x86_64-linux-gnu/libc.so.6: GNU C Library (Ubuntu GLIBC 2.23-0ubuntu3) stable release version 2.23, by Roland McGrath et al.
gdb --version: GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
That looks odd. Are you sure that's not a fluke?
Can you try again with many runs (use -count=20 on both), and then paste the output of https://godoc.org/rsc.io/benchstat ?
That looks odd. Are you sure that's not a fluke?
It did seem odd to me too initially. But git log --grep=MatchEasy also suggests that the specified benchmark is slower. As an example, please refer to the stats in commit messages for c03ed491fe8d11622a2ecac6bfc87da91879dfa8 and dd24b1098ae8923b98a13560c89ae59fc0c49258.
Following are the stats with -count=20:
name old time/op new time/op delta
RegexpMatchEasy0_32-4 108ns ± 8% 102ns ±10% -5.53% (p=0.000 n=20+19)
RegexpMatchEasy0_1K-4 248ns ± 1% 1023ns ± 1% +312.02% (p=0.000 n=18+18)
RegexpMatchEasy1_32-4 103ns ± 1% 99ns ± 4% -3.70% (p=0.000 n=16+18)
RegexpMatchEasy1_1K-4 461ns ±18% 434ns ± 3% -5.83% (p=0.000 n=18+18)
name old speed new speed delta
RegexpMatchEasy0_32-4 295MB/s ± 8% 313MB/s ± 9% +5.92% (p=0.000 n=20+19)
RegexpMatchEasy0_1K-4 4.12GB/s ± 1% 1.00GB/s ± 1% -75.69% (p=0.000 n=18+18)
RegexpMatchEasy1_32-4 310MB/s ± 1% 324MB/s ± 5% +4.28% (p=0.000 n=16+18)
RegexpMatchEasy1_1K-4 2.19GB/s ±16% 2.36GB/s ± 3% +7.57% (p=0.000 n=20+18)
Thanks. I'll let @randall77 decide the importance of this regression for Go 1.8.
Looks like this is related to the change in how bytes.Index works. @TocarIP
When looking for a smallish multicharacter string (in this case, 26 characters), we used to look for the first character, then check the remaining characters, then repeat. Now we use fun SSE (AVX?) instructions to look for multiple characters at once. The new way will definitely be faster when the first character matches but the rest don't (it is not slowed down by partial matches). The old way looks like it is faster when there are few matches of the first character. I'm surprised about how much faster it is, however.
It is possible that we should do some combination algorithm, looking for just the first character and switching once we get a few false positives (partial but not complete matches).
All that said, the regexp is still doing >1GB/sec. I think this may be an instance of "something blazingly fast got a bit less blazingly fast".
Thanks for the investigation.
Fast or not, 3x slower doesn't seem like "a bit less".
I'll experiment with dynamically switching between indexbyte and substring search.
I was thinking we could, in each of those assembly loops, skip ahead by 16 if we compare 16 bytes against the first byte (copied 16 times) and find no match.
It does seem like we should fix this one way or the other.
The belief is that the kind of check in RegexpMatchEasy0_1K
is common enough that it shouldn't get 3x slower.
Just FYI, RegexpMatchEasy0_1K is the main loser in go1 benchmarks when comparing go tip to go1.7:
BinaryTree17-4 2.88s ± 3% 2.81s ± 2% -2.41% (p=0.001 n=9+9)
Fannkuch11-4 3.10s ± 1% 3.03s ± 2% -2.50% (p=0.000 n=9+10)
FmtFprintfEmpty-4 54.7ns ± 7% 49.5ns ± 6% -9.48% (p=0.000 n=10+10)
FmtFprintfString-4 152ns ± 6% 140ns ±10% -7.69% (p=0.006 n=10+10)
FmtFprintfInt-4 142ns ± 8% 120ns ± 5% -15.33% (p=0.000 n=10+10)
FmtFprintfIntInt-4 218ns ± 9% 182ns ± 7% -16.45% (p=0.000 n=10+10)
FmtFprintfPrefixedInt-4 210ns ± 8% 189ns ± 7% -9.78% (p=0.000 n=10+9)
FmtFprintfFloat-4 308ns ±11% 285ns ± 8% -7.69% (p=0.003 n=10+10)
FmtManyArgs-4 877ns ± 5% 809ns ± 1% -7.74% (p=0.000 n=9+9)
GobDecode-4 8.13ms ± 4% 7.23ms ± 3% -11.03% (p=0.000 n=10+10)
GobEncode-4 6.66ms ± 1% 6.57ms ±10% ~ (p=0.515 n=8+10)
Gzip-4 248ms ± 0% 249ms ± 1% ~ (p=1.000 n=7+8)
Gunzip-4 38.4ms ± 6% 38.8ms ±10% ~ (p=0.739 n=10+10)
HTTPClientServer-4 57.5µs ± 7% 62.6µs ± 7% +8.93% (p=0.000 n=9+10)
JSONEncode-4 18.8ms ± 4% 18.8ms ± 4% ~ (p=0.780 n=9+10)
JSONDecode-4 53.0ms ± 4% 54.7ms ± 9% ~ (p=0.218 n=10+10)
Mandelbrot200-4 4.85ms ± 1% 4.79ms ± 1% -1.20% (p=0.002 n=10+10)
GoParse-4 3.61ms ± 5% 3.47ms ± 2% -3.76% (p=0.000 n=10+8)
RegexpMatchEasy0_32-4 86.6ns ± 4% 86.4ns ± 5% ~ (p=0.765 n=10+9)
RegexpMatchEasy0_1K-4 206ns ± 5% 859ns ± 3% +317.14% (p=0.000 n=10+10)
RegexpMatchEasy1_32-4 87.5ns ± 4% 80.3ns ± 3% -8.25% (p=0.000 n=10+10)
RegexpMatchEasy1_1K-4 389ns ± 6% 359ns ± 2% -7.59% (p=0.000 n=10+10)
RegexpMatchMedium_32-4 137ns ± 4% 126ns ± 3% -7.97% (p=0.000 n=9+9)
RegexpMatchMedium_1K-4 41.8µs ± 3% 39.6µs ± 5% -5.24% (p=0.000 n=8+10)
RegexpMatchHard_32-4 1.93µs ± 0% 1.96µs ± 2% +1.37% (p=0.011 n=9+10)
RegexpMatchHard_1K-4 59.6µs ± 7% 59.5µs ± 5% ~ (p=0.497 n=10+9)
Revcomp-4 520ms ± 7% 461ms ± 5% -11.36% (p=0.000 n=9+9)
Template-4 62.7ms ± 3% 55.4ms ± 1% -11.58% (p=0.000 n=9+10)
TimeParse-4 338ns ±11% 308ns ± 2% -8.74% (p=0.000 n=10+9)
TimeFormat-4 350ns ± 0% 324ns ± 1% -7.40% (p=0.000 n=10+10)
CL https://golang.org/cl/31690 mentions this issue.
Most helpful comment
Just FYI,
RegexpMatchEasy0_1Kis the main loser in go1 benchmarks when comparing go tip to go1.7: