Both Go 1.8 and Go tip show very slow server-side handshake performance for RSA certificates when the client doesn't use a TLS session cache:
$ go get -u github.com/valyala/fasthttp/fasthttputil
$ GOMAXPROCS=1 go test github.com/valyala/fasthttp/fasthttputil -bench=TLSHandshake
goos: linux
goarch: amd64
pkg: github.com/valyala/fasthttp/fasthttputil
BenchmarkPlainHandshake 300000 3953 ns/op
BenchmarkTLSHandshakeWithClientSessionCache 20000 81960 ns/op
BenchmarkTLSHandshakeWithoutClientSessionCache 500 3493016 ns/op
BenchmarkTLSHandshakeWithCurvesWithClientSessionCache 20000 80307 ns/op
BenchmarkTLSHandshakeWithCurvesWithoutClientSessionCache 500 3518508 ns/op
PASS
ok github.com/valyala/fasthttp/fasthttputil 11.683s
The results show that a single amd64 core can perform only about 300 handshakes per second for new clients without session tickets. This is very discouraging compared to the openssl performance described at https://istlsfastyet.com/ :
$ openssl version
OpenSSL 1.0.2g 1 Mar 2016
$ openssl speed ecdh
...
op op/s
256 bit ecdh (nistp256) 0.0001s 12797.0
384 bit ecdh (nistp384) 0.0007s 1416.8
521 bit ecdh (nistp521) 0.0005s 1968.0
Note that openssl performs 12797 256-bit ECDH operations per second on a single CPU core. This is about 40x higher than the result of the comparable BenchmarkTLSHandshakeWithCurvesWithoutClientSessionCache above. Below are CPU profiles for this benchmark:
Mixed client and server profile:
(pprof) top20
Showing nodes accounting for 154.20ms, 87.56% of 176.10ms total
Dropped 200 nodes (cum <= 0.88ms)
Showing top 20 nodes out of 103
flat flat% sum% cum cum%
82.50ms 46.85% 46.85% 82.50ms 46.85% math/big.addMulVVW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
19.50ms 11.07% 57.92% 113.20ms 64.28% math/big.nat.montgomery /home/aliaksandr/work/go-tip/src/math/big/nat.go
13.70ms 7.78% 65.70% 13.70ms 7.78% runtime.memmove /home/aliaksandr/work/go-tip/src/runtime/memmove_amd64.s
7.40ms 4.20% 69.90% 21.10ms 11.98% math/big.nat.divLarge /home/aliaksandr/work/go-tip/src/math/big/nat.go
5ms 2.84% 72.74% 5ms 2.84% math/big.mulAddVWW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
3.40ms 1.93% 74.67% 3.40ms 1.93% crypto/sha256.block /home/aliaksandr/work/go-tip/src/crypto/sha256/sha256block_amd64.s
2.70ms 1.53% 76.21% 2.70ms 1.53% math/big.subVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
2.70ms 1.53% 77.74% 2.70ms 1.53% p256MulInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
2.60ms 1.48% 79.22% 2.60ms 1.48% math/big.addVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
2.60ms 1.48% 80.69% 2.60ms 1.48% p256SqrInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
2.10ms 1.19% 81.89% 2.10ms 1.19% math/big.shlVU /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
1.80ms 1.02% 82.91% 1.80ms 1.02% syscall.Syscall /home/aliaksandr/work/go-tip/src/syscall/asm_linux_amd64.s
1.30ms 0.74% 83.65% 1.30ms 0.74% crypto/elliptic.p256Sqr /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
1.10ms 0.62% 84.27% 2.30ms 1.31% sync.(*Pool).Put /home/aliaksandr/work/go-tip/src/sync/pool.go
1ms 0.57% 84.84% 4.20ms 2.39% math/big.nat.add /home/aliaksandr/work/go-tip/src/math/big/nat.go
1ms 0.57% 85.41% 3.70ms 2.10% math/big.nat.mulAddWW /home/aliaksandr/work/go-tip/src/math/big/nat.go
1ms 0.57% 85.97% 1ms 0.57% sync.(*Mutex).Lock /home/aliaksandr/work/go-tip/src/sync/mutex.go
1ms 0.57% 86.54% 1ms 0.57% sync.(*Mutex).Unlock /home/aliaksandr/work/go-tip/src/sync/mutex.go
0.90ms 0.51% 87.05% 24.90ms 14.14% math/big.(*Int).GCD /home/aliaksandr/work/go-tip/src/math/big/int.go
0.90ms 0.51% 87.56% 5.40ms 3.07% math/big.(*Int).Mul /home/aliaksandr/work/go-tip/src/math/big/int.go
Server profile:
(pprof) top20 Server
Showing nodes accounting for 149.20ms, 84.72% of 176.10ms total
Dropped 110 nodes (cum <= 0.88ms)
Showing top 20 nodes out of 86
flat flat% sum% cum cum%
82.50ms 46.85% 46.85% 82.50ms 46.85% math/big.addMulVVW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
19.50ms 11.07% 57.92% 113.20ms 64.28% math/big.nat.montgomery /home/aliaksandr/work/go-tip/src/math/big/nat.go
13.40ms 7.61% 65.53% 13.40ms 7.61% runtime.memmove /home/aliaksandr/work/go-tip/src/runtime/memmove_amd64.s
7.40ms 4.20% 69.73% 21.10ms 11.98% math/big.nat.divLarge /home/aliaksandr/work/go-tip/src/math/big/nat.go
5ms 2.84% 72.57% 5ms 2.84% math/big.mulAddVWW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
2.70ms 1.53% 74.11% 2.70ms 1.53% math/big.subVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
2.60ms 1.48% 75.58% 2.60ms 1.48% math/big.addVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
2.30ms 1.31% 76.89% 2.30ms 1.31% crypto/sha256.block /home/aliaksandr/work/go-tip/src/crypto/sha256/sha256block_amd64.s
2.10ms 1.19% 78.08% 2.10ms 1.19% math/big.shlVU /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
1.40ms 0.8% 78.88% 1.40ms 0.8% p256MulInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
1.30ms 0.74% 79.61% 1.30ms 0.74% syscall.Syscall /home/aliaksandr/work/go-tip/src/syscall/asm_linux_amd64.s
1.20ms 0.68% 80.30% 1.20ms 0.68% p256SqrInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
1.10ms 0.62% 80.92% 2.30ms 1.31% sync.(*Pool).Put /home/aliaksandr/work/go-tip/src/sync/pool.go
1ms 0.57% 81.49% 4.20ms 2.39% math/big.nat.add /home/aliaksandr/work/go-tip/src/math/big/nat.go
1ms 0.57% 82.06% 3.70ms 2.10% math/big.nat.mulAddWW /home/aliaksandr/work/go-tip/src/math/big/nat.go
1ms 0.57% 82.62% 1ms 0.57% sync.(*Mutex).Lock /home/aliaksandr/work/go-tip/src/sync/mutex.go
1ms 0.57% 83.19% 1ms 0.57% sync.(*Mutex).Unlock /home/aliaksandr/work/go-tip/src/sync/mutex.go
0.90ms 0.51% 83.70% 24.90ms 14.14% math/big.(*Int).GCD /home/aliaksandr/work/go-tip/src/math/big/int.go
0.90ms 0.51% 84.21% 5.40ms 3.07% math/big.(*Int).Mul /home/aliaksandr/work/go-tip/src/math/big/int.go
0.90ms 0.51% 84.72% 114.30ms 64.91% math/big.nat.expNNMontgomery /home/aliaksandr/work/go-tip/src/math/big/nat.go
Client profile:
(pprof) top20 Client
Showing nodes accounting for 14.10ms, 8.01% of 176.10ms total
Showing top 20 nodes out of 202
flat flat% sum% cum cum%
2.60ms 1.48% 1.48% 2.60ms 1.48% p256SqrInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
2.10ms 1.19% 2.67% 2.10ms 1.19% p256MulInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
1.60ms 0.91% 3.58% 2.50ms 1.42% math/big.nat.divLarge /home/aliaksandr/work/go-tip/src/math/big/nat.go
1.10ms 0.62% 4.20% 1.10ms 0.62% crypto/sha256.block /home/aliaksandr/work/go-tip/src/crypto/sha256/sha256block_amd64.s
0.90ms 0.51% 4.71% 0.90ms 0.51% crypto/elliptic.p256Sqr /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
0.80ms 0.45% 5.17% 4.50ms 2.56% crypto/elliptic.p256PointDoubleAsm /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
0.80ms 0.45% 5.62% 0.80ms 0.45% math/big.addMulVVW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
0.70ms 0.4% 6.02% 0.70ms 0.4% syscall.Syscall /home/aliaksandr/work/go-tip/src/syscall/asm_linux_amd64.s
0.50ms 0.28% 6.30% 1.30ms 0.74% math/big.basicMul /home/aliaksandr/work/go-tip/src/math/big/nat.go
0.40ms 0.23% 6.53% 0.40ms 0.23% math/big.subVV /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
0.30ms 0.17% 6.70% 0.30ms 0.17% crypto/elliptic.p256Select /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
0.30ms 0.17% 6.87% 0.30ms 0.17% crypto/hmac.New /home/aliaksandr/work/go-tip/src/crypto/hmac/hmac.go
0.30ms 0.17% 7.04% 0.30ms 0.17% math/big.mulAddVWW /home/aliaksandr/work/go-tip/src/math/big/arith_amd64.s
0.30ms 0.17% 7.21% 0.30ms 0.17% p256SubInternal /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
0.30ms 0.17% 7.38% 0.30ms 0.17% runtime.mallocgc /home/aliaksandr/work/go-tip/src/runtime/mbitmap.go
0.30ms 0.17% 7.55% 0.30ms 0.17% runtime.memmove /home/aliaksandr/work/go-tip/src/runtime/memmove_amd64.s
0.20ms 0.11% 7.67% 0.60ms 0.34% crypto/elliptic.p256PointAddAffineAsm /home/aliaksandr/work/go-tip/src/crypto/elliptic/p256_asm_amd64.s
0.20ms 0.11% 7.78% 1.70ms 0.97% encoding/asn1.parseField /home/aliaksandr/work/go-tip/src/encoding/asn1/asn1.go
0.20ms 0.11% 7.89% 0.20ms 0.11% math/big.nat.setBytes /home/aliaksandr/work/go-tip/src/math/big/nat.go
0.20ms 0.11% 8.01% 0.20ms 0.11% runtime.heapBitsSetType /home/aliaksandr/work/go-tip/src/runtime/mbitmap.go
As you can see, the client side takes about 1/10 of the CPU time taken by the server side.
@agl , @vkrasnov
@valyala , you should use ECDSA instead of RSA if you can. RSA is not very optimized in Go.
@vkrasnov , then ECDSA should probably go before RSA in initDefaultCipherSuites?
@valyala , you need an ECDSA certificate, you can try it and see if it helps:
go run `go env GOROOT`/src/crypto/tls/generate_cert.go --host=localhost --ecdsa-curve=P256
@vkrasnov , thanks - this raised the performance from 300 handshakes per second to 2000 handshakes per second on a single CPU core:
$ GOMAXPROCS=1 go test -run=111 -bench=Handshake -cpuprofile=cpu.pprof -benchtime=1s
goos: linux
goarch: amd64
pkg: github.com/valyala/fasthttp/fasthttputil
BenchmarkPlainHandshake 300000 3968 ns/op
BenchmarkTLSHandshakeWithClientSessionCache 20000 89539 ns/op
BenchmarkTLSHandshakeWithoutClientSessionCache 3000 554257 ns/op
BenchmarkTLSHandshakeWithCurvesWithClientSessionCache 20000 90366 ns/op
BenchmarkTLSHandshakeWithCurvesWithoutClientSessionCache 3000 570461 ns/op
PASS
ok github.com/valyala/fasthttp/fasthttputil 10.383s
Are there plans to improve handshake performance for RSA certificates?
Are there plans to improve handshake performance for RSA certificates?
I personally have no such immediate plans. RSA is past its prime, and its usage is constantly dropping.
Added some benchmarks https://golang.org/cl/44730/
Change https://golang.org/cl/74851 mentions this issue: math/big: speed-up addMulVVW on amd64
@vkrasnov @bradfitz Does this mean that RSA is not recommended and that RSA-related code generally won't be optimized in the future?
According to https://www.ssl.com/article/comparing-ecdsa-vs-rsa/ , ECDSA is significantly more vulnerable to Shor’s algorithm (a quantum computing attack) than RSA, and at the moment I'm more concerned about that than about the benefits of ECDSA.