Julia: Darwin/ARM64 tracking issue

Created on 11 Jul 2020 · 26 comments · Source: JuliaLang/julia

I figured it would be worth having a single issue to track all the known issues on Apple Silicon. I'll try to keep this list updated as things get fixed or people encounter additional issues.

  • [x] Add MacOS(:aarch64) as a valid platform (https://github.com/JuliaLang/Pkg.jl/pull/1916)
  • [x] Port Mach exception handling (https://github.com/JuliaLang/julia/pull/36592)
  • [x] Unconditionally enable CRC32 (#36624)
  • [x] Hook up ARM feature detection (via sysctl hw.optional)
  • [x] Figure out where to get a Fortran compiler from (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96168 for the gcc discussion - other compilers seem farther away)
  • [x] Build BB shard (should be relatively straightforward once we have a Fortran compiler - https://github.com/JuliaPackaging/Yggdrasil/pull/1626)
  • [ ] Fix synchronous memory exception delivery (#36625)
  • [x] Upgrade config.sub (fixed by #36615)
  • [x] GMP fails to compile (fixed by #36616)
  • [ ] PCRE2 crashes (https://bugs.exim.org/show_bug.cgi?id=2618 https://github.com/zherczeg/sljit/pull/90)
  • [x] uv_cpu_info errors (tracked at https://github.com/libuv/libuv/issues/2911)
  • [x] LLVM 9 processes ARM64 relocations incorrectly (seems to be fixed in LLVM 10 - needs #35318)
  • [ ] Our libosxunwind is 7 years old and does not have ARM64 support - we should replace it with LLVM libunwind
  • [x] Unknown LLVM issue with the following symptom (might be a general LLVM10 issue?): (https://github.com/JuliaPackaging/Yggdrasil/pull/1318)
      From worker 14:   While deleting: i8* %splitgep
      From worker 14:   An asserting value handle still pointed to this value!
      From worker 14:   UNREACHABLE executed at /Users/julia/julia/deps/srccache/llvm-10.0.0/lib/IR/Value.cpp:917!
      From worker 14:   
      From worker 14:   signal (6): Abort trap: 6
      From worker 14:   in expression starting at /Users/julia/julia/usr/share/julia/stdlib/v1.6/LinearAlgebra/test/diagonal.jl:11
  • [ ] Some sort of issue during precompile:
Generating REPL precompile statements... 22/28
ERROR: LoadError: IOError: stream is closed or unusable
  • [ ] Test failure in worlds test:
worlds                             (4) |         failed at 2020-11-13T00:31:04.270
On worker 4:
BoundsError: attempt to access 3-element BitVector at index [0:3]
  • [ ] Test failure in numbers test (related to SIGFPE handling):
Worker 6 terminated.
numbers                            (6) |         failed at 2020-11-13T00:31:34.703
ProcessExitedException(6)
  • [ ] Segfault in complex test:
    complex (2) | started at 2020-11-13T00:39:12.332
      From worker 2:    
      From worker 2:    signal (11): Segmentation fault: 11
      From worker 2:    in expression starting at /Users/julia/julia23/test/complex.jl:30
      From worker 2:    jl_method_error_bare at /Users/julia/julia23/usr/lib/libjulia.1.6.dylib (unknown line)
      From worker 2:    jl_method_error at /Users/julia/julia23/usr/lib/libjulia.1.6.dylib (unknown line)
      From worker 2:    jl_apply_generic at /Users/julia/julia23/usr/lib/libjulia.1.6.dylib (unknown line)
      From worker 2:    do_call at /Users/julia/julia23/usr/lib/libjulia.1.6.dylib (unknown line)
  • [ ] Test failures in complex test (filed as #38419)
  • [ ] Several tests run forever:
LinearAlgebra/triangular (running for 61 minutes)
LinearAlgebra/addmul (running for 55 minutes)
bitarray (running for 53 minutes)
iterators (running for 52 minutes)
ccall (running for 39 minutes)
loading (running for 39 minutes)
sorting (running for 24 minutes)
  • [ ] Test failure in inference
compiler/inference                 (5) |         failed at 2020-11-13T01:24:18.980
Test Failed at /Users/julia/julia23/test/compiler/inference.jl:944
  Expression: break_21369()
    Expected: ErrorException
      Thrown: BoundsError
  • [ ] Build system hacks since we don't have a native GCC toolchain built (#38421)
Labels: arm, help wanted, mac

All 26 comments

Is the compiler enabling all the features available by default?

In other words, does it pass https://github.com/JuliaLang/julia/blob/a23a4ff08da5b6d95e9a35eee96e3260a452c02b/src/crc32c.c#L328 by default? Or do we have to add +crc one way or another ourselves?

Here's what's enabled by default:

#define __ARM64_ARCH_8__ 1
#define __ARM_64BIT_STATE 1
#define __ARM_ACLE 200
#define __ARM_ALIGN_MAX_STACK_PWR 4
#define __ARM_ARCH 8
#define __ARM_ARCH_ISA_A64 1
#define __ARM_ARCH_PROFILE 'A'
#define __ARM_FEATURE_CLZ 1
#define __ARM_FEATURE_CRYPTO 1
#define __ARM_FEATURE_DIRECTED_ROUNDING 1
#define __ARM_FEATURE_DIV 1
#define __ARM_FEATURE_FMA 1
#define __ARM_FEATURE_IDIV 1
#define __ARM_FEATURE_LDREX 0xF
#define __ARM_FEATURE_NUMERIC_MAXMIN 1
#define __ARM_FEATURE_UNALIGNED 1
#define __ARM_FP 0xE
#define __ARM_FP16_ARGS 1
#define __ARM_FP16_FORMAT_IEEE 1
#define __ARM_NEON 1
#define __ARM_NEON_FP 0xE
#define __ARM_NEON__ 1
#define __ARM_PCS_AAPCS64 1
#define __ARM_SIZEOF_MINIMAL_ENUM 4
#define __ARM_SIZEOF_WCHAR_T 4
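
(For context, the compile-time path behind the crc32c.c check linked above looks roughly like the sketch below, written against the standard ACLE intrinsics rather than Julia's actual crc32c.c. Note that __ARM_FEATURE_CRC32 is absent from this list, so without an explicit +crc the hardware path gets compiled out.)

    /* Rough sketch of a compile-time-guarded CRC32C kernel using the
       standard ACLE intrinsics (not Julia's actual crc32c.c).
       __ARM_FEATURE_CRC32 is only predefined when the target includes +crc,
       e.g. -march=armv8-a+crc, so with the defaults above this path is dead. */
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    #ifdef __ARM_FEATURE_CRC32
    #include <arm_acle.h>

    static uint32_t crc32c_hw(uint32_t crc, const unsigned char *buf, size_t len)
    {
        /* Consume 8 bytes at a time with the hardware CRC32C instruction. */
        while (len >= 8) {
            uint64_t chunk;
            memcpy(&chunk, buf, sizeof(chunk));
            crc = __crc32cd(crc, chunk);
            buf += 8;
            len -= 8;
        }
        /* Finish the tail one byte at a time. */
        while (len--)
            crc = __crc32cb(crc, *buf++);
        return crc;
    }
    #endif /* __ARM_FEATURE_CRC32 */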

Since I doubt there'll be a mac without crc32, we should just add that to the default feature flags in our Makefile. For everything else we can do runtime detection with sysctl.

I'm surprised that it enables crypto but not crc... Yeah, I don't think it's worth doing runtime detection here.

And from https://github.com/JuliaLang/julia/pull/36592#issuecomment-656984903 it doesn't seem to provide all the features that LLVM may use

The features currently detectable appear to be:

hw.optional.neon_fp16: fullfp16
hw.optional.armv8_1_atomics: lse
hw.optional.armv8_crc32: crc
hw.optional.armv8_2_fhm: fp16fml
__ARM_FEATURE_CRYPTO (compile time): aes, sha2
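
A minimal runtime probe for those hw.optional keys might look like the sketch below (sysctlbyname simply fails for keys that don't exist, which we can treat as the feature being absent):

    /* Sketch: runtime CPU feature detection on Darwin via sysctl hw.optional.* */
    #include <stdio.h>
    #include <sys/sysctl.h>

    static int have_feature(const char *key)
    {
        int val = 0;
        size_t len = sizeof(val);
        /* A missing key just means the feature is not present on this CPU/OS. */
        if (sysctlbyname(key, &val, &len, NULL, 0) != 0)
            return 0;
        return val != 0;
    }

    int main(void)
    {
        printf("crc      = %d\n", have_feature("hw.optional.armv8_crc32"));
        printf("lse      = %d\n", have_feature("hw.optional.armv8_1_atomics"));
        printf("fullfp16 = %d\n", have_feature("hw.optional.neon_fp16"));
        printf("fp16fml  = %d\n", have_feature("hw.optional.armv8_2_fhm"));
        return 0;
    }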

The ones that should be supported on that CPU (all required by armv8.3-a) are jsconv, complxnum, rcpc, ccpp, rdm. Some of the floating-point ones are quite interesting.

Also interesting: since fp16fml is reported, the feature set is closer to that of the A13 than the A12 (that, or the LLVM feature set for the A12 is wrong...).


Anyway, this is probably a low priority item...

Looks like they're just shipping an old LLVM. E.g. if I try to build jsconv (just to see whether it would run), I get: fatal error: error in backend: Cannot select: intrinsic %llvm.aarch64.fjcvtzs

Huh, which LLVM version do they have? Over at https://github.com/JuliaLang/julia/blob/a23a4ff08da5b6d95e9a35eee96e3260a452c02b/src/features_aarch64.h#L24 I was assuming that as long as the feature is available in AArch64.td it's usable... Is that not the case? (and/or is that a Mac-only problem?)

Huh, which LLVM version do they have

I don't know. It claims to be LLVM 12, but Apple lies about versions. I'm building upstream clang now to try it out.

It also seems that although the feature was added in https://reviews.llvm.org/D54633, which is in LLVM 8.0, the intrinsic wasn't added until https://reviews.llvm.org/D64495 much later. Does that error mean that it's a recognized intrinsic but just isn't supported by the backend? I guess just writing inline assembly should be good enough for testing.
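
For example, a standalone probe along these lines should do (just a sketch; the assembler needs -march=armv8.3-a, or an explicit .arch directive, to accept the mnemonic):

    /* Sketch: exercise the ARMv8.3-a FJCVTZS instruction (the "jsconv" feature)
       via inline assembly, bypassing the llvm.aarch64.fjcvtzs intrinsic.
       Build with something like: cc -march=armv8.3-a fjcvtzs_probe.c */
    #include <stdio.h>
    #include <stdint.h>

    static int32_t fjcvtzs(double x)
    {
        int32_t result;
        /* FP Javascript convert to signed fixed-point, rounding toward zero.
           FJCVTZS also sets the Z flag, hence the "cc" clobber. */
        __asm__ volatile("fjcvtzs %w0, %d1" : "=r"(result) : "w"(x) : "cc");
        return result;
    }

    int main(void)
    {
        printf("%d\n", fjcvtzs(3.9));   /* expected: 3 */
        printf("%d\n", fjcvtzs(-2.5));  /* expected: -2 (truncates toward zero) */
        return 0;
    }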

Fails upstream too.

Works with raw llc and +mattr though, so I'm gonna say it does exist.

... I thought the error you got was a backend one... (so llc should behave the same as clang, unless clang emits the wrong IR...)

I manually added the correct mattr to llc. I also managed to get it to work with -mcpu=apple-a12 at the clang level (it appears to default to apple-a7). I filed an issue with Apple to get a better error message and to bump the default.

Ah, OK. So you didn't set the target when running with clang.

I tried, but mattr=armv8.3-a+jsconv didn't seem to do it.

  From worker 14: While deleting: i8* %splitgep
  From worker 14: An asserting value handle still pointed to this value!
  From worker 14: UNREACHABLE executed at /Users/julia/julia/deps/srccache/llvm-10.0.0/lib/IR/Value.cpp:917!

Ah, this is where I've seen this issue... It's not Darwin or ARM/AArch64 specific and it's fixed by https://reviews.llvm.org/D84031

Can we get a BB shard going without the Fortran compiler, and see how much of the BB ecosystem can be built?

Just thinking out loud here. The major use of Fortran in the Julia build is to build LAPACK (part of the OpenBLAS build). We could have a Fortran-to-Julia translator and move LAPACK to Julia. Of course BB has a bunch of other Fortran libraries, and there are lots of commercial software packages that need Fortran compilers.

We could have a Fortran-to-Julia translator and move LAPACK to Julia.

If anyone is interested in helping, I'll be happy to add and maintain a Fortran-to-Julia translator in LFortran. We already have LLVM and C++ backends. It took us quite some time to get to this point, as a lot of infrastructure had to be figured out and implemented, but we now have the foundation of a production C++ implementation of the compiler and are making rapid progress in adding features. As an example of what already works, this Fortran code:

https://gitlab.com/lfortran/lfortran/-/blob/7384b0ff81eaa2043281e48ae5158d34fcbf26f6/integration_tests/arrays_04.f90

gets correctly translated to this C++ code (and it compiles and runs):

https://gitlab.com/lfortran/lfortran/-/blob/master/tests/reference/cpp-arrays_04-ae9bd17.stdout

The C++ translator itself is implemented here: https://gitlab.com/lfortran/lfortran/-/blob/7384b0ff81eaa2043281e48ae5158d34fcbf26f6/src/lfortran/codegen/asr_to_cpp.cpp; as you can see, it is a simple visitor pattern over the Abstract Semantic Representation (ASR), which contains all the types, with everything figured out and ready for LLVM or C++ translation.

I don't like making predictions about how long it will take us to be able to compile LAPACK, but I am hoping it is in the range of months now.

Assuming we could translate LAPACK to C++ (or Julia) automatically, correctly, and quickly in a few months, what would the workflow be?

I can imagine two workflows in the future:

  • You translate once and just maintain the resulting code in C++ (or Julia). We will try to ensure the translator produces nice, readable, and maintainable C++ code.

  • You keep LAPACK in Fortran, but translate each new version to C++ or Julia. That way, when upstream makes changes, you will get them.

Regarding the speed and performance of the translated code, it is currently unclear to me whether there is some obstacle that would prevent it from matching the performance of the original Fortran code. But we will find out, and I would think it should be possible to translate in a way that keeps the performance.

LAPACK will keep moving upstream, so we have to keep running the translator on every new version - perhaps this could even be integrated into BinaryBuilder. Performance shouldn't be a major problem, since 90% of the performance comes from calling the BLAS anyway. The main problem will be testing correctness. Presumably the translated LAPACK tests plus the Julia tests would be sufficient to get started.

@ViralBShah that makes sense. Regarding correctness: my goal is for people to use LFortran as a regular Fortran compiler via LLVM, which will ensure that the parsing -> AST -> ASR -> LLVM pipeline is all correct. The ASR -> C++ backend thus starts from a well-tested starting point (the ASR) that has been exercised via the LLVM route; there will be bugs, but they will be well isolated, and engineering-wise I think this can be delivered and made robust. The ASR -> Julia backend would be similar.

I am very excited about this, and I will keep you updated. As I said, it will probably take us months to get something initially usable, and then it takes time to mature everything, so I don't want to give you false hope that it can fix your immediate problem; but I will work towards this, and I think it will become very useful to a lot of people once it matures.

I think for actively developed upstream projects, we'd rather just use LFortran as a straight LLVM compiler. The automatic translation part mostly makes sense where people want to do new development in Julia.

Just learned that there’s some ongoing effort at porting the GCC backend: https://github.com/iains/gcc-darwin-arm64

Yep, we're on top of it (https://github.com/JuliaPackaging/Yggdrasil/pull/1626), thanks!

Should the "LLVM 9 processes ARM64 relocations incorrectly" item be marked done, since the linked PR is merged?

I've updated the tracking list with all items I currently know about.

I wonder how well Julia will run on Rosetta 2.

Works ok, but at reduced perf of course.
