Go: x/crypto/chacha20poly1305: segmentation fault on arm

Created on 11 Nov 2019  ·  12Comments  ·  Source: golang/go

signal: segmentation fault (core dumped)
FAIL    golang.org/x/crypto/chacha20poly1305    1.843s

https://build.golang.org/log/dc44fcbd057280191957fc9c9cf056eceee7d416

Seems to have started at CL 203837 (sha3: align (*state).storage), which frankly makes no sense, and to be flaky, which is also weird.

NeedsInvestigation release-blocker

Most helpful comment

At a five second glance it doesn't look like a trivial rewrite. Perhaps for 1.14 we should just drop the arm assembly code. Is this performance critical?

All 12 comments

Seeing it also on the BSDs.

Any pattern to the affected go revisions or architectures?

(Possibly related to #35473 or #35470 or #35462?)

CC @mdempsky

This is not flaky, it only passes on the release branches. Bisecting with the help of a Raspberry Pi.

2ff746d7dc3ce5ce7034bfcc3af16b7b8eab7413 is the first bad commit

commit 2ff746d7dc3ce5ce7034bfcc3af16b7b8eab7413
Author: Cherry Zhang <[email protected]>
Date:   Sun Oct 20 17:25:02 2019 -0400

    runtime: add async preemption support on ARM

    This CL adds support of call injection and async preemption on
    ARM.

    Injected call, like sigpanic, has special frame layout. Teach
    traceback to handle it.

    Change-Id: I887e90134fbf8a676b73c26321c50b3c4762dba4
    Reviewed-on: https://go-review.googlesource.com/c/go/+/202338
    Run-TryBot: Cherry Zhang <[email protected]>
    Reviewed-by: Austin Clements <[email protected]>

The only arm assembly we have is poly1305/sum_arm.s, which wasn't touched in the last 3 years.

/cc @cherrymui @aclements

Bad news, this affects the vendored chacha20poly1305, so it's a Go 1.14 release blocker =(

/cc @golang/osp-team

The assembly file in x/crypto/poly1305 is broken. It uses the G register as a temporary register, which violates the Go ABI. E.g. https://go.googlesource.com/crypto/+/refs/heads/master/poly1305/sum_arm.s#33

This is not signal safe. We already don't preempt assembly functions. But the signal handler needs the G register. It is not only the preemption signal, but any signals, including profiling or using os/signal package, that could crash the program. E.g. running on the Linux/ARM builder with async preemption disabled but profiling enabled,

# GODEBUG=asyncpreemptoff=1 ./chacha20poly1305.test -test.v -test.short -test.cpuprofile=/tmp/x
=== RUN   TestVectors
--- PASS: TestVectors (0.02s)
=== RUN   TestRandom
=== RUN   TestRandom/Standard
Segmentation fault (core dumped)

Turning on profiling, it also fails with Go 1.13.

Looks like it's been broken since it was added in 2015 in https://github.com/golang/crypto/commit/4d48e5fa3d62b5e6e71260571bf76c767198ca02

At a five second glance it doesn't look like a trivial rewrite. Perhaps for 1.14 we should just drop the arm assembly code. Is this performance critical?

Yeah, I figured it would be that again. We had similar issues with other old ARM assembly.

Let's drop it. This assembly was never ported to the Writer interface (https://golang.org/cl/111335) anyway, so it was going to stop being used by chacha20poly1305 soon one way or another (https://golang.org/cl/206977) and no one uses poly1305 standalone.

The performance drop is as dramatic as you'd expect on 32 bit ARM, but "only" 2x for the combined chacha20poly1305...

name              old time/op    new time/op    delta
64                  1.44µs ± 0%    9.44µs ± 0%  +556.77%  (p=0.000 n=9+9)
1K                  16.2µs ± 0%   131.9µs ± 0%  +714.99%  (p=0.000 n=9+9)
2M                  44.2ms ± 0%   283.6ms ± 0%  +541.38%  (p=0.000 n=10+9)
64Unaligned         1.82µs ± 0%    9.44µs ± 0%  +417.32%  (p=0.000 n=9+9)
1KUnaligned         22.3µs ± 0%   131.8µs ± 0%  +489.78%  (p=0.000 n=10+10)
2MUnaligned         57.1ms ± 0%   284.6ms ± 1%  +398.71%  (p=0.000 n=9+8)
Write64             8.38µs ± 0%    8.39µs ± 0%    +0.17%  (p=0.002 n=9+8)
Write1K              131µs ± 0%     131µs ± 0%      ~     (p=0.237 n=8+10)
Write2M              283ms ± 0%     284ms ± 0%      ~     (p=0.077 n=9+9)
Write64Unaligned    8.38µs ± 0%    8.40µs ± 0%      ~     (p=0.099 n=9+10)
Write1KUnaligned     131µs ± 1%     131µs ± 0%      ~     (p=0.247 n=10+10)
Write2MUnaligned     284ms ± 0%     284ms ± 0%      ~     (p=0.277 n=8+9)

name              old speed      new speed      delta
64                44.5MB/s ± 0%   6.8MB/s ± 0%   -84.77%  (p=0.000 n=9+9)
1K                63.3MB/s ± 0%   7.8MB/s ± 0%   -87.73%  (p=0.000 n=9+9)
2M                47.4MB/s ± 0%   7.4MB/s ± 0%   -84.41%  (p=0.000 n=10+9)
64Unaligned       35.1MB/s ± 0%   6.8MB/s ± 0%   -80.67%  (p=0.000 n=9+7)
1KUnaligned       45.8MB/s ± 0%   7.8MB/s ± 0%   -83.04%  (p=0.000 n=10+10)
2MUnaligned       36.8MB/s ± 0%   7.4MB/s ± 1%   -79.95%  (p=0.000 n=9+8)
Write64           7.64MB/s ± 0%  7.63MB/s ± 0%    -0.14%  (p=0.007 n=9+8)
Write1K           7.83MB/s ± 0%  7.84MB/s ± 0%      ~     (p=0.819 n=8+10)
Write2M           7.40MB/s ± 0%  7.39MB/s ± 0%      ~     (p=0.102 n=9+9)
Write64Unaligned  7.63MB/s ± 0%  7.62MB/s ± 0%      ~     (p=0.100 n=9+9)
Write1KUnaligned  7.83MB/s ± 0%  7.84MB/s ± 0%      ~     (p=0.453 n=9+10)
Write2MUnaligned  7.38MB/s ± 0%  7.37MB/s ± 0%      ~     (p=0.085 n=8+9)
name                          old time/op    new time/op    delta
Chacha20Poly1305/Open-64        22.2µs ± 0%    34.6µs ± 0%   +55.42%  (p=0.000 n=10+10)
Chacha20Poly1305/Seal-64        20.2µs ± 0%    32.4µs ± 0%   +60.57%  (p=0.000 n=9+9)
Chacha20Poly1305/Open-64-X      26.5µs ± 0%    39.7µs ± 0%   +49.71%  (p=0.000 n=9+9)
Chacha20Poly1305/Seal-64-X      25.1µs ± 0%    39.0µs ± 0%   +55.07%  (p=0.000 n=10+10)
Chacha20Poly1305/Open-1350       160µs ± 0%     326µs ± 0%  +103.19%  (p=0.000 n=10+9)
Chacha20Poly1305/Seal-1350       158µs ± 0%     323µs ± 1%  +104.45%  (p=0.000 n=9+9)
Chacha20Poly1305/Open-1350-X     166µs ± 0%     332µs ± 0%  +100.68%  (p=0.000 n=10+10)
Chacha20Poly1305/Seal-1350-X     164µs ± 1%     331µs ± 0%  +101.96%  (p=0.000 n=10+9)
Chacha20Poly1305/Open-8192       842µs ± 1%    1798µs ± 0%  +113.42%  (p=0.000 n=9+10)
Chacha20Poly1305/Seal-8192       856µs ± 0%    1812µs ± 0%  +111.72%  (p=0.000 n=9+9)
Chacha20Poly1305/Open-8192-X     849µs ± 0%    1805µs ± 0%  +112.66%  (p=0.000 n=10+9)
Chacha20Poly1305/Seal-8192-X     864µs ± 0%    1820µs ± 0%  +110.70%  (p=0.000 n=9+9)

name                          old speed      new speed      delta
Chacha20Poly1305/Open-64      2.88MB/s ± 0%  1.85MB/s ± 0%   -35.76%  (p=0.008 n=6+7)
Chacha20Poly1305/Seal-64      3.17MB/s ± 1%  1.97MB/s ± 0%   -37.78%  (p=0.000 n=10+8)
Chacha20Poly1305/Open-64-X    2.41MB/s ± 0%  1.61MB/s ± 0%   -33.29%  (p=0.000 n=9+9)
Chacha20Poly1305/Seal-64-X    2.55MB/s ± 0%  1.64MB/s ± 0%   -35.61%  (p=0.000 n=10+9)
Chacha20Poly1305/Open-1350    8.43MB/s ± 0%  4.15MB/s ± 0%   -50.78%  (p=0.000 n=10+10)
Chacha20Poly1305/Seal-1350    8.55MB/s ± 0%  4.18MB/s ± 0%   -51.12%  (p=0.000 n=9+9)
Chacha20Poly1305/Open-1350-X  8.16MB/s ± 0%  4.06MB/s ± 0%   -50.18%  (p=0.000 n=10+10)
Chacha20Poly1305/Seal-1350-X  8.24MB/s ± 1%  4.08MB/s ± 1%   -50.53%  (p=0.000 n=10+10)
Chacha20Poly1305/Open-8192    9.73MB/s ± 1%  4.56MB/s ± 0%   -53.15%  (p=0.000 n=9+10)
Chacha20Poly1305/Seal-8192    9.57MB/s ± 0%  4.52MB/s ± 0%   -52.77%  (p=0.000 n=9+9)
Chacha20Poly1305/Open-8192-X  9.65MB/s ± 0%  4.54MB/s ± 0%   -52.95%  (p=0.000 n=10+7)
Chacha20Poly1305/Seal-8192-X  9.47MB/s ± 1%  4.50MB/s ± 0%   -52.50%  (p=0.000 n=10+9)

Change https://golang.org/cl/213880 mentions this issue: poly1305: drop broken arm assembly

Reopening for importing into the go tree.

Change https://golang.org/cl/214078 mentions this issue: src/go.mod: update x/crypto to drop broken poly1305 arm assembly

Was this page helpful?
0 / 5 - 0 ratings