Go: regexp: RegexpMatchEasy0_1K slower at tip

Created on 15 Oct 2016  Â·  10Comments  Â·  Source: golang/go

What did you do?

Execute go1 benchmarks.

What did you see?

RegexpMatchEasy0_1K is much slower (at https://github.com/golang/go/commit/86b2f29676c52774d91dda96e0ba5d4d7bcd3b47) as compared to 1.7.1

benchmark                          old ns/op     new ns/op     delta
BenchmarkRegexpMatchEasy0_32-4     102           100           -1.96%
BenchmarkRegexpMatchEasy0_1K-4     249           1016          +308.03%
BenchmarkRegexpMatchEasy1_32-4     105           99.3          -5.43%
BenchmarkRegexpMatchEasy1_1K-4     437           426           -2.52%

benchmark                          old MB/s     new MB/s     speedup
BenchmarkRegexpMatchEasy0_32-4     310.92       319.40       1.03x
BenchmarkRegexpMatchEasy0_1K-4     4110.44      1007.33      0.25x
BenchmarkRegexpMatchEasy1_32-4     304.17       322.33       1.06x
BenchmarkRegexpMatchEasy1_1K-4     2342.11      2398.62      1.02x

System details

go version devel +86b2f29 Sat Oct 15 07:13:17 2016 +0000 linux/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/home/mohit/tmp/t/go"
GOTOOLDIR="/home/mohit/tmp/t/go/pkg/tool/linux_amd64"
TERM="dumb"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build951092435=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
uname -sr: Linux 4.4.0-38-generic
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:    16.04
Codename:   xenial
/lib/x86_64-linux-gnu/libc.so.6: GNU C Library (Ubuntu GLIBC 2.23-0ubuntu3) stable release version 2.23, by Roland McGrath et al.
gdb --version: GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
FrozenDueToAge NeedsFix

Most helpful comment

Just FYI, RegexpMatchEasy0_1K is the main loser in go1 benchmarks when comparing go tip to go1.7:

BinaryTree17-4              2.88s ± 3%      2.81s ± 2%    -2.41%    (p=0.001 n=9+9)
Fannkuch11-4                3.10s ± 1%      3.03s ± 2%    -2.50%   (p=0.000 n=9+10)
FmtFprintfEmpty-4          54.7ns ± 7%     49.5ns ± 6%    -9.48%  (p=0.000 n=10+10)
FmtFprintfString-4          152ns ± 6%      140ns ±10%    -7.69%  (p=0.006 n=10+10)
FmtFprintfInt-4             142ns ± 8%      120ns ± 5%   -15.33%  (p=0.000 n=10+10)
FmtFprintfIntInt-4          218ns ± 9%      182ns ± 7%   -16.45%  (p=0.000 n=10+10)
FmtFprintfPrefixedInt-4     210ns ± 8%      189ns ± 7%    -9.78%   (p=0.000 n=10+9)
FmtFprintfFloat-4           308ns ±11%      285ns ± 8%    -7.69%  (p=0.003 n=10+10)
FmtManyArgs-4               877ns ± 5%      809ns ± 1%    -7.74%    (p=0.000 n=9+9)
GobDecode-4                8.13ms ± 4%     7.23ms ± 3%   -11.03%  (p=0.000 n=10+10)
GobEncode-4                6.66ms ± 1%     6.57ms ±10%      ~      (p=0.515 n=8+10)
Gzip-4                      248ms ± 0%      249ms ± 1%      ~       (p=1.000 n=7+8)
Gunzip-4                   38.4ms ± 6%     38.8ms ±10%      ~     (p=0.739 n=10+10)
HTTPClientServer-4         57.5µs ± 7%     62.6µs ± 7%    +8.93%   (p=0.000 n=9+10)
JSONEncode-4               18.8ms ± 4%     18.8ms ± 4%      ~      (p=0.780 n=9+10)
JSONDecode-4               53.0ms ± 4%     54.7ms ± 9%      ~     (p=0.218 n=10+10)
Mandelbrot200-4            4.85ms ± 1%     4.79ms ± 1%    -1.20%  (p=0.002 n=10+10)
GoParse-4                  3.61ms ± 5%     3.47ms ± 2%    -3.76%   (p=0.000 n=10+8)
RegexpMatchEasy0_32-4      86.6ns ± 4%     86.4ns ± 5%      ~      (p=0.765 n=10+9)
RegexpMatchEasy0_1K-4       206ns ± 5%      859ns ± 3%  +317.14%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-4      87.5ns ± 4%     80.3ns ± 3%    -8.25%  (p=0.000 n=10+10)
RegexpMatchEasy1_1K-4       389ns ± 6%      359ns ± 2%    -7.59%  (p=0.000 n=10+10)
RegexpMatchMedium_32-4      137ns ± 4%      126ns ± 3%    -7.97%    (p=0.000 n=9+9)
RegexpMatchMedium_1K-4     41.8µs ± 3%     39.6µs ± 5%    -5.24%   (p=0.000 n=8+10)
RegexpMatchHard_32-4       1.93µs ± 0%     1.96µs ± 2%    +1.37%   (p=0.011 n=9+10)
RegexpMatchHard_1K-4       59.6µs ± 7%     59.5µs ± 5%      ~      (p=0.497 n=10+9)
Revcomp-4                   520ms ± 7%      461ms ± 5%   -11.36%    (p=0.000 n=9+9)
Template-4                 62.7ms ± 3%     55.4ms ± 1%   -11.58%   (p=0.000 n=9+10)
TimeParse-4                 338ns ±11%      308ns ± 2%    -8.74%   (p=0.000 n=10+9)
TimeFormat-4                350ns ± 0%      324ns ± 1%    -7.40%  (p=0.000 n=10+10)

All 10 comments

That looks odd. Are you sure that's not a fluke?

Can you try again with many runs (use -count=20 on both), and then paste the output of https://godoc.org/rsc.io/benchstat ?

That looks odd. Are you sure that's not a fluke?

It did seem odd to me too initially. But git log --grep=MatchEasy also suggests that the specified benchmark is slower. As an example, please refer to the stats in commit messages for c03ed491fe8d11622a2ecac6bfc87da91879dfa8 and dd24b1098ae8923b98a13560c89ae59fc0c49258.

Following are the stats with -count=20:

name                   old time/op    new time/op    delta
RegexpMatchEasy0_32-4     108ns ± 8%     102ns ±10%    -5.53%  (p=0.000 n=20+19)
RegexpMatchEasy0_1K-4     248ns ± 1%    1023ns ± 1%  +312.02%  (p=0.000 n=18+18)
RegexpMatchEasy1_32-4     103ns ± 1%      99ns ± 4%    -3.70%  (p=0.000 n=16+18)
RegexpMatchEasy1_1K-4     461ns ±18%     434ns ± 3%    -5.83%  (p=0.000 n=18+18)

name                   old speed      new speed      delta
RegexpMatchEasy0_32-4   295MB/s ± 8%   313MB/s ± 9%    +5.92%  (p=0.000 n=20+19)
RegexpMatchEasy0_1K-4  4.12GB/s ± 1%  1.00GB/s ± 1%   -75.69%  (p=0.000 n=18+18)
RegexpMatchEasy1_32-4   310MB/s ± 1%   324MB/s ± 5%    +4.28%  (p=0.000 n=16+18)
RegexpMatchEasy1_1K-4  2.19GB/s ±16%  2.36GB/s ± 3%    +7.57%  (p=0.000 n=20+18)

Thanks. I'll let @randall77 decide the importance of this regression for Go 1.8.

Looks like this is related to the change in how bytes.Index works. @TocarIP

When looking for a smallish multicharacter string (in this case, 26 characters), we used to look for the first character, then check the remaining characters, then repeat. Now we use fun SSE (AVX?) instructions to look for multiple characters at once. The new way will definitely be faster when the first character matches but the rest don't (it is not slowed down by partial matches). The old way looks like it is faster when there are few matches of the first character. I'm surprised about how much faster it is, however.

It is possible that we should do some combination algorithm, looking for just the first character and switching once we get a few false positives (partial but not complete matches).

All that said, the regexp is still doing >1GB/sec. I think this may be an instance of "something blazingly fast got a bit less blazingly fast".

Thanks for the investigation.

Fast or not, 3x slower doesn't seem like "a bit less".

I'll experiment with dynamically switching between indexbyte and substring search.

I was thinking we could, in each of those assembly loops, skip ahead by 16 if we compare 16 bytes against the first byte (copied 16 times) and find no match.

It does seem like we should fix this one way or the other.
The belief is that the kind of check in RegexpMatchEasy0_1K
is common enough that it shouldn't get 3x slower.

Just FYI, RegexpMatchEasy0_1K is the main loser in go1 benchmarks when comparing go tip to go1.7:

BinaryTree17-4              2.88s ± 3%      2.81s ± 2%    -2.41%    (p=0.001 n=9+9)
Fannkuch11-4                3.10s ± 1%      3.03s ± 2%    -2.50%   (p=0.000 n=9+10)
FmtFprintfEmpty-4          54.7ns ± 7%     49.5ns ± 6%    -9.48%  (p=0.000 n=10+10)
FmtFprintfString-4          152ns ± 6%      140ns ±10%    -7.69%  (p=0.006 n=10+10)
FmtFprintfInt-4             142ns ± 8%      120ns ± 5%   -15.33%  (p=0.000 n=10+10)
FmtFprintfIntInt-4          218ns ± 9%      182ns ± 7%   -16.45%  (p=0.000 n=10+10)
FmtFprintfPrefixedInt-4     210ns ± 8%      189ns ± 7%    -9.78%   (p=0.000 n=10+9)
FmtFprintfFloat-4           308ns ±11%      285ns ± 8%    -7.69%  (p=0.003 n=10+10)
FmtManyArgs-4               877ns ± 5%      809ns ± 1%    -7.74%    (p=0.000 n=9+9)
GobDecode-4                8.13ms ± 4%     7.23ms ± 3%   -11.03%  (p=0.000 n=10+10)
GobEncode-4                6.66ms ± 1%     6.57ms ±10%      ~      (p=0.515 n=8+10)
Gzip-4                      248ms ± 0%      249ms ± 1%      ~       (p=1.000 n=7+8)
Gunzip-4                   38.4ms ± 6%     38.8ms ±10%      ~     (p=0.739 n=10+10)
HTTPClientServer-4         57.5µs ± 7%     62.6µs ± 7%    +8.93%   (p=0.000 n=9+10)
JSONEncode-4               18.8ms ± 4%     18.8ms ± 4%      ~      (p=0.780 n=9+10)
JSONDecode-4               53.0ms ± 4%     54.7ms ± 9%      ~     (p=0.218 n=10+10)
Mandelbrot200-4            4.85ms ± 1%     4.79ms ± 1%    -1.20%  (p=0.002 n=10+10)
GoParse-4                  3.61ms ± 5%     3.47ms ± 2%    -3.76%   (p=0.000 n=10+8)
RegexpMatchEasy0_32-4      86.6ns ± 4%     86.4ns ± 5%      ~      (p=0.765 n=10+9)
RegexpMatchEasy0_1K-4       206ns ± 5%      859ns ± 3%  +317.14%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-4      87.5ns ± 4%     80.3ns ± 3%    -8.25%  (p=0.000 n=10+10)
RegexpMatchEasy1_1K-4       389ns ± 6%      359ns ± 2%    -7.59%  (p=0.000 n=10+10)
RegexpMatchMedium_32-4      137ns ± 4%      126ns ± 3%    -7.97%    (p=0.000 n=9+9)
RegexpMatchMedium_1K-4     41.8µs ± 3%     39.6µs ± 5%    -5.24%   (p=0.000 n=8+10)
RegexpMatchHard_32-4       1.93µs ± 0%     1.96µs ± 2%    +1.37%   (p=0.011 n=9+10)
RegexpMatchHard_1K-4       59.6µs ± 7%     59.5µs ± 5%      ~      (p=0.497 n=10+9)
Revcomp-4                   520ms ± 7%      461ms ± 5%   -11.36%    (p=0.000 n=9+9)
Template-4                 62.7ms ± 3%     55.4ms ± 1%   -11.58%   (p=0.000 n=9+10)
TimeParse-4                 338ns ±11%      308ns ± 2%    -8.74%   (p=0.000 n=10+9)
TimeFormat-4                350ns ± 0%      324ns ± 1%    -7.40%  (p=0.000 n=10+10)

CL https://golang.org/cl/31690 mentions this issue.

Was this page helpful?
0 / 5 - 0 ratings