Go: crypto/cipher: optimize safeXORBytes

Created on 5 Nov 2019  Â·  10Comments  Â·  Source: golang/go

The discussion at https://github.com/golang/go/issues/31586 includes several techniques for optimizing xor of byte slices, including using encoding/binary (with native endianness) and loop unrolling. It'd be nice to apply them to safeXORBytes, which is used on many non-amd64 platforms. An optimized safeXORBytes might even be fast enough that we could delete fastXORBytes.

Somewhat related: #30553

cc @nhooyr

NeedsInvestigation Performance Suggested help wanted

Most helpful comment

(Aside: I've wanted encoding/binary.NativeEndian to exist on multiple occasions. Here's another one.)

All 10 comments

(Aside: I've wanted encoding/binary.NativeEndian to exist on multiple occasions. Here's another one.)

Will give this a shot on the weekend.

(Aside: I've wanted encoding/binary.NativeEndian to exist on multiple occasions. Here's another one.)

I agree this would be very useful. Some discussion in the past: https://groups.google.com/forum/#!topic/golang-nuts/3GEzwKfRRQw

Will give this a shot on the weekend.

Awesome. No rush, in any case—the freeze just started, so nothing will go in for three months.

Thanks for the pointer to previous discussion of native endianness. I’ll file a proposal to discuss.

If you implement a generic, fast xor function, it could probably stand to be used in x/crypto/sha3 — the current implementation there is somewhat dubious due to alignment assumptions.

(See CL 203837.)

CC @FiloSottile

@bcmills good point, thanks. If we put it in a place that both x/crypto and crypto/cipher can reach it, that'd presumably also resolve #30553.

Proposal at #35398.

Added in https://github.com/nhooyr/go/commit/24ce00e112af127052f7c46e5ec50cadb1cab97f

Results against xorBytesSSE2 on amd64:

$ /Users/nhooyr/src/golang/go/bin/go test -run=_ -bench=XORBytes
goos: darwin
goarch: amd64
pkg: crypto/cipher
BenchmarkXORBytes/8Bytes-8      176461338            6.70 ns/op 1194.11 MB/s
BenchmarkXORBytes/128Bytes-8    100000000           11.6 ns/op  11062.06 MB/s
BenchmarkXORBytes/2048Bytes-8   12895827            95.2 ns/op  21517.42 MB/s
BenchmarkXORBytes/32768Bytes-8            627291          1596 ns/op    20527.27 MB/s
BenchmarkFastSafeXORBytes/8Bytes-8      147817434            7.92 ns/op 1009.53 MB/s
BenchmarkFastSafeXORBytes/128Bytes-8    66896612            17.2 ns/op  7438.88 MB/s
BenchmarkFastSafeXORBytes/2048Bytes-8    6217075           185 ns/op    11069.10 MB/s
BenchmarkFastSafeXORBytes/32768Bytes-8    397446          3007 ns/op    10897.15 MB/s
PASS
ok      crypto/cipher   12.014s

So safe fast is still about 2x as slow. Still 10x faster in my testing than basic simple xor though.

https://github.com/nhooyr/go/commit/68261d7dcf45c3e97f74d7d851301c945bf0eda7

Added safe xor to the benchmark:

$ /Users/nhooyr/src/golang/go/bin/go test -run=_ -bench=XORBytes
goos: darwin
goarch: amd64
pkg: crypto/cipher
BenchmarkXORBytes/default/8B-8          180542707            6.79 ns/op 1178.83 MB/s
BenchmarkXORBytes/safe/8B-8             135818607            8.73 ns/op  916.11 MB/s
BenchmarkXORBytes/fastSafe/8B-8         139375618            8.62 ns/op  927.86 MB/s
BenchmarkXORBytes/default#01/128B-8     100000000           10.9 ns/op  11787.06 MB/s
BenchmarkXORBytes/safe#01/128B-8        13410099            83.8 ns/op  1527.65 MB/s
BenchmarkXORBytes/fastSafe#01/128B-8            60092373            19.7 ns/op  6490.28 MB/s
BenchmarkXORBytes/default#02/2048B-8            13841852            83.5 ns/op  24532.14 MB/s
BenchmarkXORBytes/safe#02/2048B-8                 974487          1180 ns/op    1735.13 MB/s
BenchmarkXORBytes/fastSafe#02/2048B-8            5222248           220 ns/op    9314.25 MB/s
BenchmarkXORBytes/default#03/32768B-8             855936          1448 ns/op    22634.24 MB/s
BenchmarkXORBytes/safe#03/32768B-8                 61302         19043 ns/op    1720.75 MB/s
BenchmarkXORBytes/fastSafe#03/32768B-8            332857          3440 ns/op    9524.70 MB/s
PASS
ok      crypto/cipher   17.433s

Sometimes nearly 3x as slow as the current default.

cc @FiloSottile for removing assembly in favor of pure Go

Filippo, is 2x-3x in this case within range of switching to pure Go?

There may also be further optimizations available (either in the pure Go or in the compiler); I haven't checked yet, and may not for a little while.

cc @CAFxX who I believe likes optimizing stuff like this

Was this page helpful?
0 / 5 - 0 ratings

Related issues

brtzsnr picture brtzsnr  Â·  168Comments

griesemer picture griesemer  Â·  808Comments

bradfitz picture bradfitz  Â·  147Comments

networkimprov picture networkimprov  Â·  194Comments

rsc picture rsc  Â·  242Comments