signal: segmentation fault (core dumped)
FAIL golang.org/x/crypto/chacha20poly1305 1.843s
https://build.golang.org/log/dc44fcbd057280191957fc9c9cf056eceee7d416
Seems to have started at CL 203837 (sha3: align (*state).storage), which frankly makes no sense, and to be flaky, which is also weird.
Seeing it also on the BSDs.
Any pattern to the affected go revisions or architectures?
(Possibly related to #35473 or #35470 or #35462?)
CC @mdempsky
This is not flaky, it only passes on the release branches. Bisecting with the help of a Raspberry Pi.
2ff746d7dc3ce5ce7034bfcc3af16b7b8eab7413 is the first bad commit
commit 2ff746d7dc3ce5ce7034bfcc3af16b7b8eab7413
Author: Cherry Zhang <[email protected]>
Date: Sun Oct 20 17:25:02 2019 -0400
runtime: add async preemption support on ARM
This CL adds support of call injection and async preemption on
ARM.
Injected call, like sigpanic, has special frame layout. Teach
traceback to handle it.
Change-Id: I887e90134fbf8a676b73c26321c50b3c4762dba4
Reviewed-on: https://go-review.googlesource.com/c/go/+/202338
Run-TryBot: Cherry Zhang <[email protected]>
Reviewed-by: Austin Clements <[email protected]>
The only arm assembly we have is poly1305/sum_arm.s, which wasn't touched in the last 3 years.
/cc @cherrymui @aclements
Bad news, this affects the vendored chacha20poly1305, so it's a Go 1.14 release blocker =(
/cc @golang/osp-team
The assembly file in x/crypto/poly1305 is broken. It uses the G register as a temporary register, which violates the Go ABI. E.g. https://go.googlesource.com/crypto/+/refs/heads/master/poly1305/sum_arm.s#33
This is not signal safe. We already don't preempt assembly functions. But the signal handler needs the G register. It is not only the preemption signal, but any signals, including profiling or using os/signal package, that could crash the program. E.g. running on the Linux/ARM builder with async preemption disabled but profiling enabled,
# GODEBUG=asyncpreemptoff=1 ./chacha20poly1305.test -test.v -test.short -test.cpuprofile=/tmp/x
=== RUN TestVectors
--- PASS: TestVectors (0.02s)
=== RUN TestRandom
=== RUN TestRandom/Standard
Segmentation fault (core dumped)
Turning on profiling, it also fails with Go 1.13.
Looks like it's been broken since it was added in 2015 in https://github.com/golang/crypto/commit/4d48e5fa3d62b5e6e71260571bf76c767198ca02
At a five second glance it doesn't look like a trivial rewrite. Perhaps for 1.14 we should just drop the arm assembly code. Is this performance critical?
Yeah, I figured it would be that again. We had similar issues with other old ARM assembly.
Let's drop it. This assembly was never ported to the Writer interface (https://golang.org/cl/111335) anyway, so it was going to stop being used by chacha20poly1305 soon one way or another (https://golang.org/cl/206977) and no one uses poly1305 standalone.
The performance drop is as dramatic as you'd expect on 32 bit ARM, but "only" 2x for the combined chacha20poly1305...
name old time/op new time/op delta
64 1.44µs ± 0% 9.44µs ± 0% +556.77% (p=0.000 n=9+9)
1K 16.2µs ± 0% 131.9µs ± 0% +714.99% (p=0.000 n=9+9)
2M 44.2ms ± 0% 283.6ms ± 0% +541.38% (p=0.000 n=10+9)
64Unaligned 1.82µs ± 0% 9.44µs ± 0% +417.32% (p=0.000 n=9+9)
1KUnaligned 22.3µs ± 0% 131.8µs ± 0% +489.78% (p=0.000 n=10+10)
2MUnaligned 57.1ms ± 0% 284.6ms ± 1% +398.71% (p=0.000 n=9+8)
Write64 8.38µs ± 0% 8.39µs ± 0% +0.17% (p=0.002 n=9+8)
Write1K 131µs ± 0% 131µs ± 0% ~ (p=0.237 n=8+10)
Write2M 283ms ± 0% 284ms ± 0% ~ (p=0.077 n=9+9)
Write64Unaligned 8.38µs ± 0% 8.40µs ± 0% ~ (p=0.099 n=9+10)
Write1KUnaligned 131µs ± 1% 131µs ± 0% ~ (p=0.247 n=10+10)
Write2MUnaligned 284ms ± 0% 284ms ± 0% ~ (p=0.277 n=8+9)
name old speed new speed delta
64 44.5MB/s ± 0% 6.8MB/s ± 0% -84.77% (p=0.000 n=9+9)
1K 63.3MB/s ± 0% 7.8MB/s ± 0% -87.73% (p=0.000 n=9+9)
2M 47.4MB/s ± 0% 7.4MB/s ± 0% -84.41% (p=0.000 n=10+9)
64Unaligned 35.1MB/s ± 0% 6.8MB/s ± 0% -80.67% (p=0.000 n=9+7)
1KUnaligned 45.8MB/s ± 0% 7.8MB/s ± 0% -83.04% (p=0.000 n=10+10)
2MUnaligned 36.8MB/s ± 0% 7.4MB/s ± 1% -79.95% (p=0.000 n=9+8)
Write64 7.64MB/s ± 0% 7.63MB/s ± 0% -0.14% (p=0.007 n=9+8)
Write1K 7.83MB/s ± 0% 7.84MB/s ± 0% ~ (p=0.819 n=8+10)
Write2M 7.40MB/s ± 0% 7.39MB/s ± 0% ~ (p=0.102 n=9+9)
Write64Unaligned 7.63MB/s ± 0% 7.62MB/s ± 0% ~ (p=0.100 n=9+9)
Write1KUnaligned 7.83MB/s ± 0% 7.84MB/s ± 0% ~ (p=0.453 n=9+10)
Write2MUnaligned 7.38MB/s ± 0% 7.37MB/s ± 0% ~ (p=0.085 n=8+9)
name old time/op new time/op delta
Chacha20Poly1305/Open-64 22.2µs ± 0% 34.6µs ± 0% +55.42% (p=0.000 n=10+10)
Chacha20Poly1305/Seal-64 20.2µs ± 0% 32.4µs ± 0% +60.57% (p=0.000 n=9+9)
Chacha20Poly1305/Open-64-X 26.5µs ± 0% 39.7µs ± 0% +49.71% (p=0.000 n=9+9)
Chacha20Poly1305/Seal-64-X 25.1µs ± 0% 39.0µs ± 0% +55.07% (p=0.000 n=10+10)
Chacha20Poly1305/Open-1350 160µs ± 0% 326µs ± 0% +103.19% (p=0.000 n=10+9)
Chacha20Poly1305/Seal-1350 158µs ± 0% 323µs ± 1% +104.45% (p=0.000 n=9+9)
Chacha20Poly1305/Open-1350-X 166µs ± 0% 332µs ± 0% +100.68% (p=0.000 n=10+10)
Chacha20Poly1305/Seal-1350-X 164µs ± 1% 331µs ± 0% +101.96% (p=0.000 n=10+9)
Chacha20Poly1305/Open-8192 842µs ± 1% 1798µs ± 0% +113.42% (p=0.000 n=9+10)
Chacha20Poly1305/Seal-8192 856µs ± 0% 1812µs ± 0% +111.72% (p=0.000 n=9+9)
Chacha20Poly1305/Open-8192-X 849µs ± 0% 1805µs ± 0% +112.66% (p=0.000 n=10+9)
Chacha20Poly1305/Seal-8192-X 864µs ± 0% 1820µs ± 0% +110.70% (p=0.000 n=9+9)
name old speed new speed delta
Chacha20Poly1305/Open-64 2.88MB/s ± 0% 1.85MB/s ± 0% -35.76% (p=0.008 n=6+7)
Chacha20Poly1305/Seal-64 3.17MB/s ± 1% 1.97MB/s ± 0% -37.78% (p=0.000 n=10+8)
Chacha20Poly1305/Open-64-X 2.41MB/s ± 0% 1.61MB/s ± 0% -33.29% (p=0.000 n=9+9)
Chacha20Poly1305/Seal-64-X 2.55MB/s ± 0% 1.64MB/s ± 0% -35.61% (p=0.000 n=10+9)
Chacha20Poly1305/Open-1350 8.43MB/s ± 0% 4.15MB/s ± 0% -50.78% (p=0.000 n=10+10)
Chacha20Poly1305/Seal-1350 8.55MB/s ± 0% 4.18MB/s ± 0% -51.12% (p=0.000 n=9+9)
Chacha20Poly1305/Open-1350-X 8.16MB/s ± 0% 4.06MB/s ± 0% -50.18% (p=0.000 n=10+10)
Chacha20Poly1305/Seal-1350-X 8.24MB/s ± 1% 4.08MB/s ± 1% -50.53% (p=0.000 n=10+10)
Chacha20Poly1305/Open-8192 9.73MB/s ± 1% 4.56MB/s ± 0% -53.15% (p=0.000 n=9+10)
Chacha20Poly1305/Seal-8192 9.57MB/s ± 0% 4.52MB/s ± 0% -52.77% (p=0.000 n=9+9)
Chacha20Poly1305/Open-8192-X 9.65MB/s ± 0% 4.54MB/s ± 0% -52.95% (p=0.000 n=10+7)
Chacha20Poly1305/Seal-8192-X 9.47MB/s ± 1% 4.50MB/s ± 0% -52.50% (p=0.000 n=10+9)
Change https://golang.org/cl/213880 mentions this issue: poly1305: drop broken arm assembly
Reopening for importing into the go tree.
Change https://golang.org/cl/214078 mentions this issue: src/go.mod: update x/crypto to drop broken poly1305 arm assembly
Most helpful comment
At a five second glance it doesn't look like a trivial rewrite. Perhaps for 1.14 we should just drop the arm assembly code. Is this performance critical?