Go: cmd/compile: go binaries not working on exynos 64 bit CPUs

Created on 26 Oct 2018  ·  27Comments  ·  Source: golang/go

Go binaries have stopped working on Galaxy S9+. The breaking commit seems to be 0a7ac93c27c9ade79fe0f66ae0bb81484c241ae5.

What version of Go are you using (go version)?

go 1.11.1 (worked in go 1.10.4)

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Android on Arm64 (Android 8.0 on a Galaxy S9+)

What did you do?

Used gomobile to compile the following code and used it in an Android app to display a toast notification.

package hello

import (
    "runtime"
)

func Message() string {
    return "Hello: " + runtime.Version()
}

What did you expect to see?

The app displaying a Toast with the text "Hello: "

What did you see instead?

The app crashing with a SIGILL.

Credits

The problem was brought to our attention in the gophers slack channel by @shipit.

Git bisect output

0a7ac93c27c9ade79fe0f66ae0bb81484c241ae5 is the first bad commit
commit 0a7ac93c27c9ade79fe0f66ae0bb81484c241ae5
Author: Wei Xiao <[email protected]>
Date:   Fri Nov 3 02:05:28 2017 +0000

    cmd/compile: improve atomic add intrinsics with ARMv8.1 new instruction

    ARMv8.1 has added new instruction (LDADDAL) for atomic memory operations. This
    CL improves existing atomic add intrinsics with the new instruction. Since the
    new instruction is only guaranteed to be present after ARMv8.1, we guard its
    usage with a conditional on CPU feature.

    Performance result on ARMv8.1 machine:
    name        old time/op  new time/op  delta
    Xadd-224    1.05µs ± 6%  0.02µs ± 4%  -98.06%  (p=0.000 n=10+8)
    Xadd64-224  1.05µs ± 3%  0.02µs ±13%  -98.10%  (p=0.000 n=9+10)
    [Geo mean]  1.05µs       0.02µs       -98.08%

    Performance result on ARMv8.0 machine:
    name        old time/op  new time/op  delta
    Xadd-46      538ns ± 1%   541ns ± 1%  +0.62%  (p=0.000 n=9+9)
    Xadd64-46    505ns ± 1%   508ns ± 0%  +0.48%  (p=0.003 n=9+8)
    [Geo mean]   521ns        524ns       +0.55%

    Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9
    Reviewed-on: https://go-review.googlesource.com/81877
    Run-TryBot: Cherry Zhang <[email protected]>
    TryBot-Result: Gobot Gobot <[email protected]>
    Reviewed-by: Cherry Zhang <[email protected]>

:040000 040000 848bfd65d49b5fe8b5472954c50e6596a4ad15a6 c77dcbdac660c41fb79e07785745459b5d704489 M  src
FrozenDueToAge NeedsInvestigation release-blocker

Most helpful comment

The easy sledgehammer is disabling the optimization if runtime.GOOS == "android".

@gopherbot please consider this for backport to 1.11. It's an unavoidable crash on a popular Android phone and easy to avoid from Go.

All 27 comments

grafik
CPU info

Seems like a likely culprit.
The question is: is the go toolchain not guarding the new instruction correctly, or is your system incorrectly reporting that the instruction is available, when it isn't?

I see "atomics" on your list of cpu features. That's the feature we're using to guard the new instruction.
Is that not the right one? Or is your chip lying to you?

How can I help to find that out?

Read the docs for that new instruction (LDADDAL)?
I notice that your chip has two different speed cores on it. Maybe some support LDADDAL and some don't?
Read through src/internal/cpu/cpu_arm64.go and look for the atomics flag. See if the constants there agree with the hwcap bits we should be checking.
Compile a binary and make sure the guard code around the new instruction looks correct.

We've encountered the error only in the 64 bit version, compiling it with 32 bit support only worked perfectly.

I notice that your chip has two different speed cores on it. Maybe some support LDADDAL and some don't?

All cores seem to support atomics (flags are set).

Read through src/internal/cpu/cpu_arm64.go and look for the atomics flag. See if the constants there agree with the hwcap bits we should be checking.

They do

Compile a binary and make sure the guard code around the new instruction looks correct.

This goes a little beyond my capabilities.

Reading through the ARM manual (https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf) I suspect there is a check in ID_AA64ISAR0_EL1 (PDF page 2518) for the Arm 8.1 atomics missing.

Also a hint is the CPU info from the device:

$ cat /proc/cpuinfo                                                                                   
processor   : 0
BogoMIPS    : 52.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x0
CPU part    : 0xd05
CPU revision    : 1
// + 3 more

processor   : 4
BogoMIPS    : 52.00
Features    : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp
CPU implementer : 0x53
CPU architecture: 8
CPU variant : 0x1
CPU part    : 0x002
CPU revision    : 0
// + 3 more

as you can see the small cores are ARM 8.0 and the big cores are 8.1.

I notice that your chip has two different speed cores on it. Maybe some support LDADDAL and some don't?

Seems like your first suspicion was right, while they all support atomics, LDADDAL is only available on Arm 8.1, not 8.0.

We've encountered the error only in the 64 bit version, compiling it with 32 bit support only worked perfectly.

That's expected, the new instruction is only used for 64-bit builds.

All cores seem to support atomics (flags are set).

Well, they claim to. We should test that claim. Can you write some assembly with a LDADDAL instruction in it, compile and run it? No Go, just C and/or assembly.
There may be a way to use gcc intrinsics to get that assembly op without writing assembly, not sure.

This goes a little beyond my capabilities.

I took a look and it seems ok at first glance, but the Go arm64 disassembler gives up on the instruction in question, so I can't be sure. I need access to the platform disassembler. Could you build this program (go build program.go), run objdump -d on the executable, and post the disassembled code for main.main?

package main

import "sync/atomic"

var p int64

func main() {
    atomic.AddInt64(&p, 1)
}

We could add a test for the CPU variant if there's an easy way to get it.
Will the OS move a Go process between cores of different types? If so, we need to find the lowest common denominator feature set somehow.

I wish the OS didn't lie to us and present the atomics feature when all cores don't support it. It's going to be hard (possibly, impossible) for us to work around it.

Piece of information while I try to get you your objdump ready
smartselect_20181027-011758_termux

Here is the object dump

0000000000087c08 <main.main>:
   87c08:   90000520    adrp    x0, 12b000 <runtime.trace+0xf2a0>
   87c0c:   913a8c00    add x0, x0, #0xea3
   87c10:   39400000    ldrb    w0, [x0]
   87c14:   b5000120    cbnz    x0, 87c38 <main.main+0x30>
   87c18:   b24003e0    orr x0, xzr, #0x1
   87c1c:   90000521    adrp    x1, 12b000 <runtime.trace+0xf2a0>
   87c20:   913c8021    add x1, x1, #0xf20
   87c24:   c85ffc22    ldaxr   x2, [x1]
   87c28:   8b000042    add x2, x2, x0
   87c2c:   c81bfc22    stlxr   w27, x2, [x1]
   87c30:   b5ffffbb    cbnz    x27, 87c24 <main.main+0x1c>
   87c34:   d65f03c0    ret
   87c38:   b24003e0    orr x0, xzr, #0x1
   87c3c:   90000521    adrp    x1, 12b000 <runtime.trace+0xf2a0>
   87c40:   913c8021    add x1, x1, #0xf20
   87c44:   f8e00022    ldaddal x0, x2, [x1]
   87c48:   8b000042    add x2, x2, x0
   87c4c:   17fffffa    b   87c34 <main.main+0x2c>
    ...

Thanks. That code looks correct to me.
I think your taskset test says it best. Some of your chips just can't execute the instruction, even though the OS says they can.
I'm not sure what the fix would be.
How would a C program handle this? Can any C program actually use the new instructions without being run inside a taskset cage?

@randall77 the only way I can think of would be for the C code to put itself in a taskset cage. Go code could potentially do something similar by using syscall.SYS_SCHED_SETSCHEDULER (syscall equivalent of taskset), but this seems extremely messy. Auto-detecting this may also be difficult.

Also, ARM documentation states that the big and little cores must have the same instruction support. Therefore, this is really a hardware bug. Unfortunately, it is a widespread bug.

Source: https://developer.arm.com/technologies/big-little

Clarification: The 4 big cores in the Samsung Exynos 9810 are custom Mongoose 3 cores which support ~ARMv8.3~ ARMv8.0, and the 4 little cores are standard ARM Cortex-A55 cores which support ARMv8.2.

~Except LDADDAL is supposed to be ARMv8.1 so the A55 might actually have full ARMv8.2 support.~
I was wrong, LDADDAL is supported by the Cortex-A55. It is the other way around - the little cores have a newer ISA version than the big cores.

If we can confirm that the Go compiler is doing this right, then the best solution may be to report this to Samsung. The most reasonable fix I can think of would be updating the kernel to report the lowest common feature denominator instead of that of the local core.

I agree, this isn't a bug in Go. It's a bug in the feature set reported by the kernel.
I'm going to close this issue on the Go end.
It would probably be helpful for someone to write a C/asm program to demonstrate the problem and report that to the kernel or chip maintainers.

Update:

I have made a minimal C example of this. It is available on a gist: https://gist.github.com/jadr2ddude/aee7b47a91b70ee3dd75cad97e5ff4e0

Also on twitter: https://twitter.com/CompuJad/status/1057744906943377408

Even though it's a bug in the kernel or hardware, I think Go should avoid triggering it. It seems gomobile (1.11) programs are completely broken on one of the most widely used Android phones.

We have also experienced this behaviour, however only in debug binaries. We could not trigger it with release binaries (somehow).
Perhaps the process was scheduled on ARM v8.2 cores...

In order to detect this, we would need to read /proc/cpuinfo, and check for a mongoose 3 core. Cannot rely on the reported architecture variant - the kernel reports this incorrectly.

The easy sledgehammer is disabling the optimization if runtime.GOOS == "android".

@gopherbot please consider this for backport to 1.11. It's an unavoidable crash on a popular Android phone and easy to avoid from Go.

Backport issue(s) opened: #28586 (for 1.11).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

Change https://golang.org/cl/147377 mentions this issue: runtime: avoid arm64 8.1 atomics on Android

I don't have a Samsung S9+ to test CL 147377. Can someone with the offending hardware verify the fix?

I can confirm that it works now (android app built with gomobile and this Go version).

Update: the kernel problem was resolved in an update. The kernel no longer reports support for atomics. Image from @noxer of my atomics test.

smartselect_20190210-013206_termux

Was this page helpful?
0 / 5 - 0 ratings