I tried to build the debug build on Power (ppc64le) and got the following error:
Sysimage built. Summary:
Total ─────── 372.283735 seconds
Base: ─────── 130.814623 seconds 35.1384%
Stdlibs: ──── 241.469020 seconds 64.8616%
JULIA /home/cloud-user/builddebug/julia/usr/lib/julia/sys-debug-o.a
Generating precompile statements... 1738 generated in 491.452075 seconds (overhead 298.335000 seconds)
signal (11): Segmentation fault
in expression starting at none:0
_ZN12_GLOBAL__N_120PPCTargetELFStreamer6finishEv at /home/cloud-user/builddebug/julia/usr/bin/../lib/libLLVM-9jl.so (unknown line)
_ZN4llvm10MCStreamer6FinishEv at /home/cloud-user/builddebug/julia/usr/bin/../lib/libLLVM-9jl.so (unknown line)
_ZN4llvm10AsmPrinter14doFinalizationERNS_6ModuleE at /home/cloud-user/builddebug/julia/usr/bin/../lib/libLLVM-9jl.so (unknown line)
_ZN12_GLOBAL__N_118PPCLinuxAsmPrinter14doFinalizationERN4llvm6ModuleE at /home/cloud-user/builddebug/julia/usr/bin/../lib/libLLVM-9jl.so (unknown line)
_ZN4llvm13FPPassManager14doFinalizationERNS_6ModuleE at /home/cloud-user/builddebug/julia/usr/bin/../lib/libLLVM-9jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/cloud-user/builddebug/julia/usr/bin/../lib/libLLVM-9jl.so (unknown line)
_ZN4llvm6legacy11PassManager3runERNS_6ModuleE at /home/cloud-user/builddebug/julia/usr/bin/../lib/libLLVM-9jl.so (unknown line)
operator() at /home/cloud-user/juliav1.3.1_2debug/julia/src/aotcompile.cpp:508
jl_dump_native at /home/cloud-user/juliav1.3.1_2debug/julia/src/aotcompile.cpp:542
jl_write_compiler_output at /home/cloud-user/juliav1.3.1_2debug/julia/src/precompile.c:88
jl_atexit_hook at /home/cloud-user/juliav1.3.1_2debug/julia/src/init.c:227
main at /home/cloud-user/juliav1.3.1_2debug/julia/ui/repl.c:218
.annobin_libc_start.c at /lib64/libc.so.6 (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
Allocations: 167065988 (Pool: 167027255; Big: 38733); GC: 135
/bin/sh: line 1: 2330 Segmentation fault (core dumped) JULIA_BINDIR=/home/cloud-user/builddebug/julia/usr/bin /home/cloud-user/builddebug/julia/usr/bin/julia-debug -O0 -C "native" --output-o /home/cloud-user/builddebug/julia/usr/lib/julia/sys-debug-o.a.tmp --startup-file=no --warn-overwrite=yes --sysimage /home/cloud-user/builddebug/julia/usr/lib/julia/sys.ji /home/cloud-user/juliav1.3.1_2debug/julia/contrib/generate_precompile.jl 1
*** This error is usually fixed by running `make clean`. If the error persists, try `make cleanall`. ***
make[1]: *** [/home/cloud-user/juliav1.3.1_2debug/julia/sysimage.mk:87: /home/cloud-user/builddebug/julia/usr/lib/julia/sys-debug-o.a] Error 1
make: *** [/home/cloud-user/juliav1.3.1_2debug/julia/Makefile:87: julia-sysimg-debug] Error 2
Could anyone point out what might be going wrong? The non-debug build, however, builds fine.
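The frames inside libLLVM in the backtrace above are mangled C++ names, which makes it hard to see where the crash happens. A minimal sketch for making them readable, assuming `c++filt` from GNU binutils is installed (the helper name `demangle_frames` is made up for this example and falls back to printing the input unchanged if the tool is missing):

```shell
#!/bin/sh
# Three mangled frames copied from the backtrace above.
frames='_ZN12_GLOBAL__N_120PPCTargetELFStreamer6finishEv
_ZN4llvm10MCStreamer6FinishEv
_ZN4llvm10AsmPrinter14doFinalizationERNS_6ModuleE'

# demangle_frames: read mangled names on stdin, print readable C++ names.
# Assumption: c++filt is on PATH; otherwise the input is passed through as-is.
demangle_frames() {
    if command -v c++filt >/dev/null 2>&1; then
        c++filt
    else
        cat
    fi
}

printf '%s\n' "$frames" | demangle_frames
```

With `c++filt` available, the top frame should resolve to `PPCTargetELFStreamer::finish()` in an anonymous namespace, which places the crash in LLVM's PowerPC ELF streamer finalization rather than in Julia's own code.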
My reply is not going to help with this particular issue, but hopefully it provides the bigger picture.
It's going to be very hard to fix ppc issues, since almost nobody who develops Julia has access to these machines. Even so, when issues are fixed, there is no CI and stuff keeps regressing. In order to make PPC support solid, the PPC user community needs to step in with code, funds, ownership of the port, and of course, reporting issues as you have done here.
What you can do as a user to help is to report these issues to your organization or IBM, and tell them that you would like to see Julia running on this architecture.
Even so, when issues are fixed, there is no CI and stuff keeps regressing
We have/had CI, but we never got to the point where it was failure-free and we could turn it on for the repo in general. Right now that CI is provided through the ResearchComputing group at MIT (https://researchcomputing.mit.edu/satori/home/)
I agree that we need more people looking at this, preferably with support through IBM. But part of that is opening issues, so that we can point people to them as known issues with PPC in Julia.
Should we delete the ppc64le label?
Responding to @ViralBShah here - there are free Linux on Power VMs available to all open source developers at https://osuosl.org/services/powerdev/. Also, any project that is Travis CI-enabled can add arch: ppc64le to their .travis.yml and get free CI at travis-ci.com (backed by Power servers in the IBM Power cloud). IBM has several customers interested in using Julia for large projects. My team does not have language experts but we do have a lot of porting experience. We can definitely help out in the porting, but anything that requires language-related support, e.g. interfaces to LLVM and such, will probably need help from experts in that community. @shirodkara is part of my team and will be helping out here as he can on porting and validation. It sounds like the MIT CI solution is also a good possibility, depending on community preference.
Should we delete the ppc64le label?
Done.
We used to run the CI on OSU, but had to move recently. OSU is good for individual developers, but the systemic issue that Viral is alluding to is that developers primarily care about the platforms they have access to and want to run code on. The subset of people wanting to run code on PPC has been fairly small.
I have been trying to maintain it as best as I can, but my time commitment to that has been fairly limited.
@gerrith3 Happy to have a conversation about how we can help. https://github.com/JuliaLang/julia/labels/power gives the current status of things I know to be broken.
The buildbot is online at https://build.julialang.org/#/builders/45 and if we get the tests to pass reliably we can turn them on to be visible and enforce that new changes can't regress PowerPC support. Some issues require Julia knowledge, other issues require LLVM knowledge (and IBM's LLVM team is responsive to bugs filed on the LLVM bug tracker).
@vchuravy accurately describes the situation.
@gerrith3 Thanks for chiming in. In the past, discussions with IBM (in my capacity as Julia Computing CEO) and various Power users at DOE have failed to materialize funding for the Julia ecosystem on Power. Julia Computing would be happy to maintain Power with the right business incentives. The list of things is also not small and a complete supported port requires a long list of items to be tested - LLVM, Julia codegen, multi-threading, distributed capabilities, full Julia test suite passing at all times, Julia packages, binarybuilder, GPU integration, CI (including Power+GPU), etc.
In the meanwhile, it is valuable to make a start. Personally, as a community member, I am always enthusiastic to see more architectures being able to run Julia, and it is great to see IBM actively supporting Julia in the form of contributions.
@ViralBShah: Thanks for the quick response. We are going to have an internal/IBM conversation on this - the challenges are a bit broader than my team can handle but we'll see if we can get some additional help. We have a major retailer looking to use Julia on Power and have heard of a few of our other customers (mostly very large!) wanting to use Julia as well. I understand that we exited our relationship with Julia last time we had a major re-focusing effort and reduced our investment in HPC across the board, which is apparently where most of the requests for Julia were coming from at that time. I will also have my team evaluate the open tasks more thoroughly, get me a sizing and a summary of what can be handled within our team and what can not. Can we possibly talk in a few days/week and see if we can build a plan going forward? We are obviously extremely interested in helping our customers here!
Running with a Make.user containing:
LLVM_DEBUG=1
LLVM_ASSERTIONS=1
FORCE_ASSERTIONS=1
I get a better stacktrace:
julia-debug: /home/vchuravy/julia/deps/srccache/llvm-9.0.1/include/llvm/MC/MCSymbol.h:303: const llvm::MCExpr* llvm::MCSymbol::getVariableValue(bool) const: Assertion `isVariable() && "Invalid accessor!"' failed.
signal (6): Aborted
in expression starting at none:0
gsignal at /lib64/libc.so.6 (unknown line)
abort at /lib64/libc.so.6 (unknown line)
__assert_fail_base at /lib64/libc.so.6 (unknown line)
__assert_fail at /lib64/libc.so.6 (unknown line)
getVariableValue at /home/vchuravy/julia/deps/srccache/llvm-9.0.1/include/llvm/MC/MCSymbol.h:303
finish at /home/vchuravy/julia/deps/srccache/llvm-9.0.1/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp:199
Finish at /home/vchuravy/julia/deps/srccache/llvm-9.0.1/lib/MC/MCStreamer.cpp:911
doFinalization at /home/vchuravy/julia/deps/srccache/llvm-9.0.1/lib/CodeGen/AsmPrinter/AsmPrinter.cpp:1666
doFinalization at /home/vchuravy/julia/deps/srccache/llvm-9.0.1/lib/Target/PowerPC/PPCAsmPrinter.cpp:1386
doFinalization at /home/vchuravy/julia/deps/srccache/llvm-9.0.1/lib/IR/LegacyPassManager.cpp:1703
runOnModule at /home/vchuravy/julia/deps/srccache/llvm-9.0.1/lib/IR/LegacyPassManager.cpp:1779
run at /home/vchuravy/julia/deps/srccache/llvm-9.0.1/lib/IR/LegacyPassManager.cpp:1863
run at /home/vchuravy/julia/deps/srccache/llvm-9.0.1/lib/IR/LegacyPassManager.cpp:1894
operator() at /home/vchuravy/julia/src/jitlayers.cpp:1025
jl_dump_native at /home/vchuravy/julia/src/jitlayers.cpp:1064
jl_write_compiler_output at /home/vchuravy/julia/src/precompile.c:93
jl_atexit_hook at /home/vchuravy/julia/src/init.c:227
main at /home/vchuravy/julia/ui/repl.c:218
generic_start_main.isra.0 at /lib64/libc.so.6 (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
Allocations: 160184348 (Pool: 160149044; Big: 35304); GC: 158
I managed to build the debug build successfully on PPC with LLVM 10,
using https://github.com/JuliaLang/julia/pull/35318
[vchuravy@service0001 julia-10-bb]$ cat Make.user
USE_BINARYBUILDER=0
USE_BINARYBUILDER_LLVM=1
Slightly worrying: the corresponding CI job (https://build.julialang.org/#/builders/45/builds/8714/steps/8/logs/stdio) failed with
signal (11): Segmentation fault
in expression starting at none:0
_ZN12_GLOBAL__N_120PPCTargetELFStreamer6finishEv at /buildworker/worker/package_linuxppc64le/build/usr/bin/../lib/libLLVM-10jl.so (unknown line)
Allocations: 159123504 (Pool: 159085215; Big: 38289); GC: 131
Segmentation fault (core dumped)
make[1]: *** [/buildworker/worker/package_linuxppc64le/build/usr/lib/julia/sys-o.a] Error 1
That failure happened while building make release rather than make debug, so the crash appears to be non-deterministic...
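Since the crash seems to come and go between runs, one rough way to gauge how often it happens is to repeat the failing step and count the failures. This is only a sketch: `count_failures` is a hypothetical helper, not part of Julia's build system, and the stand-in command below would need to be replaced by the real sysimage step.

```shell
#!/bin/sh
# count_failures N CMD...: run CMD N times and print how many runs failed.
# Hypothetical helper for illustration; not part of Julia's Makefiles.
count_failures() {
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed"
}

# Stand-in command that always fails, so this prints 3.
count_failures 3 false
```

For the real build, the command would be whatever regenerates the sysimage object (the log above names the make target julia-sysimg-debug). Each run has to start from a state where that object is actually rebuilt, otherwise make will skip the step and report success.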
Opened upstream bug: https://bugs.llvm.org/show_bug.cgi?id=45366