Omr: Create a CI pipeline for RISC-V

Created on 5 Dec 2019 · 31 Comments · Source: eclipse/omr

Provide the means to launch a CI build that is triggered from @omr-genie for RISC-V.

Initially, this will target building the compiler component and running the Tril tests. This can be expanded to include the JitBuilder tests once it is proven to work on RISC-V. This has a dependency on #4431.

@janvrany provided the following as a starting point for a CI pipeline that he has been using successfully: https://github.com/janvrany/omr/commit/d6193856f1fd0c526974e429ad2af88144473f83

This should eventually include building the various OMR components targeting RISC-V and running the appropriate fvtests on the emulator. #4432, #4433, #4434, and #4435 are likely prerequisites of this.

riscv ci

All 31 comments

Just a little update: I'm working on a pipeline to build and test on RISC-V - including compiler tests.
So far I only compile it natively but do not run tests. There are two things yet to be done:

  • port test hangs up, it seems that's because of file locking on NFS (my workspace is NFS-mounted)
  • compilertriltests has to be run so that it excludes tests known to fail (as compiler is not 100% finished, there are still some evaluators to be implemented)
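One way to exclude known failures without touching the test sources is Google Test's `--gtest_filter` flag, which accepts a negative section after a `-`. The following is only a sketch of that alternative, not the approach taken in this thread; the helper name `gtest_exclude_filter` is hypothetical, and the suite in the usage comment is just an illustration.

```shell
#!/bin/sh
# Build a Google Test filter that runs everything except the given patterns.
# Patterns are joined with ':' after the '-' that starts the negative section.
gtest_exclude_filter() {
    if [ "$#" -eq 0 ]; then
        printf '%s\n' '--gtest_filter=*'
        return 0
    fi
    excluded=$1
    shift
    for pattern in "$@"; do
        excluded="$excluded:$pattern"
    done
    printf '%s\n' "--gtest_filter=*-$excluded"
}

# Illustrative usage only (the excluded suite is an example, not a vetted list):
# ./fvtest/compilertriltest/comptest \
#     "$(gtest_exclude_filter 'SelectCompareTest/Int32SelectInt8CompareTest.*')"
```

A filter like this keeps the exclusion list in the pipeline rather than in the tests, so it has to be maintained by hand as evaluators get implemented.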

As soon as I have a pipeline that works, tests something and passes, I plan to open a PR.

Port tests seem to pass once PR #4434 with two suggested fixes is merged in.

I have just opened PR #4760 with basic, compile-only pipelines. Despite my previous comment, I decided to skip tests initially. Now I have to go through all failing compiler tests and add SKIP_IF() (only where they fail because of an unimplemented feature, of course). This may take some time.

The idea is that once PR #4760 is merged it is enough to actually set up a CI on Eclipse infrastructure.

Just an update:
As it seems it will take a little longer to set up CI for RISC-V, I have set up jobs that compile OMR every time a PR is merged into master (both cross-compile and native-compile). If anything fails, I'll have a look. If anyone else is interested in getting email notifications for failed builds, let me know.

Meanwhile, I'm waiting for #4760 to be reviewed and merged and working on (compiler) tests so we can enable tests on RISC-V too.

Another update,

Both #4760 _RISC-V: add CI pipelines for RISC-V_ and #4821 _RISC-V: enable compiler tests in CI pipeline_ have been merged.

There's one outstanding issue caused by commit cb146bd, which requires OMR_GC_FULL_POINTERS or OMR_GC_COMPRESSED_POINTERS to be specified when configuring a build. I have opened PR #4912 to fix the RISC-V pipelines, but it has been decided to address the issue by essentially reverting the effect of commit cb146bd.

If there's anything else I can do, let me know. I'd be happy to help set up Jenkins jobs.

Commit 26a9eb2e fixes the cross-compilation so now everything is ready to set up build slaves and jobs for OMR.

I'm just bumping this as there is some investigation being done on it. @fjeremic @janvrany @AdamBrousseau

I spent the last day and a bit wrestling with Docker, which turned out to be a super subtle issue that seems silly in hindsight. Anyway, I have a partial Dockerfile which will cross-compile OMR for RISC-V and enable us to run OMR tests via QEMU on x86-64 in reasonable time. I think we can proceed with this for both PR testing and builds. Here is the Dockerfile:

FROM debian:buster

# Add Debian Unstable (Sid) repositories
RUN printf "Package: *\nPin: release a=unstable\nPin-Priority: 10\n" | tee /etc/apt/preferences.d/unstable.pref
RUN printf "deb http://ftp.debian.org/debian unstable main\ndeb-src http://ftp.debian.org/debian unstable main\n" | tee /etc/apt/sources.list.d/unstable.list

# Install QEMU and mmdebstrap and repository keys (req'd to build root filesystem and run installed system)
RUN apt-get update
RUN apt-get install -y mmdebstrap qemu-user-static qemu-system-misc binfmt-support debian-ports-archive-keyring gcc-riscv64-linux-gnu rsync wget sudo

# Fix keys no longer being valid on Debian 10
RUN wget http://ftp.debian.org/debian/pool/main/d/debian-ports-archive-keyring/debian-ports-archive-keyring_2019.11.05_all.deb
RUN dpkg -i debian-ports-archive-keyring_2019.11.05_all.deb

# Install tooling needed for cross compiling
RUN sudo apt-get install -y gcc-riscv64-linux-gnu g++-riscv64-linux-gnu libgcc1-riscv64-cross libc6-dev-riscv64-cross libstdc++-8-dev-riscv64-cross linux-libc-dev-riscv64-cross wget git vim python3 ninja-build build-essential pkg-config libglib2.0-dev sudo cmake

# Download required riscv.h and riscv-opc.h from GNU binutils
RUN wget "-O/usr/riscv64-linux-gnu/include/riscv.h" 'https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=include/opcode/riscv.h;hb=HEAD'
RUN wget "-O/usr/riscv64-linux-gnu/include/riscv-opc.h" 'https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=blob_plain;f=include/opcode/riscv-opc.h;hb=HEAD'

# Create a "root filesystem" containing all required libraries and tools to link and run OMR
WORKDIR /root
RUN git clone https://github.com/janvrany/riscv-debian.git
WORKDIR /root/riscv-debian/roofs

# Clone and build QEMU
WORKDIR /root
RUN git clone https://git.qemu.org/git/qemu.git
WORKDIR qemu
RUN ./configure --target-list=riscv64-linux-user
RUN make -j4

# Clone and build OMR native
WORKDIR /root
RUN git clone https://github.com/eclipse/omr
WORKDIR omr/build-native
RUN cmake ..
RUN make -j4

# Setup directories for RISC-V cross-compile
WORKDIR ../build-riscv64

You can build it with:

docker build . --tag=omr-riscv-cross

Once built we need to run the container under privileged mode:

docker run -it --privileged omr-riscv-cross

Finally these are the steps which could not be completed in the Dockerfile because of the subtle issue. It turns out the following RUN command:

RUN bash /root/riscv-debian/debian-mk-rootfs.sh /root/riscv-debian/rootfs

will fail with the following error:

I: riscv64 cannot be executed, falling back to qemu-user
E: binfmt_misc not found in /proc/mounts -- not mounted?

This happens because the apt-get install of qemu-user-static above should have set up a mount point mapping a device to a directory. However, this was not possible because docker build does not support privileged operations such as using the mount utility. Because this mount point does not exist, the RUN command calling the debian-mk-rootfs.sh script fails with the above error. Unfortunately, apt-get seems to silently skip creating this mount, and it doesn't fail!
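Since the package install gives no hint that the mount is missing, a small guard could detect the condition before the rootfs script runs. This is a minimal sketch that only inspects the mounts table; the function name `binfmt_misc_mounted` is hypothetical, and the optional argument exists just so the check can be exercised against a sample file.

```shell
#!/bin/sh
# Succeeds if a filesystem of type binfmt_misc appears in the mounts table.
# $1 is the table to inspect; it defaults to /proc/mounts.
binfmt_misc_mounted() {
    awk '$3 == "binfmt_misc" { found = 1 } END { exit !found }' "${1:-/proc/mounts}"
}

# Hypothetical guard to place before the rootfs script:
# binfmt_misc_mounted || { echo "binfmt_misc is not mounted" >&2; exit 1; }
```

The check looks at the filesystem type column rather than the mount path, so it does not report success for an unrelated mount that merely mentions the path.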

There is no way to fix this via docker build so the following has to be done manually at this point. We have to create the mount ourselves. So now that we've run the container we can manually issue the following commands:

mount -t binfmt_misc binfmt_misc /proc/sys/fs/binfmt_misc
mkdir /root/riscv-debian/rootfs && bash /root/riscv-debian/debian-mk-rootfs.sh /root/riscv-debian/rootfs

This will build the filesystem needed for RISC-V cross compilation. At the end it will ask you to enter a password for the root user in the RISC-V filesystem. You can just enter an empty password to continue (press the Enter key a few times). Now we continue:

cmake .. \
  -Wdev -DCMAKE_BUILD_TYPE=Debug -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
  -DOMR_COMPILER=ON -DOMR_TEST_COMPILER=ON \
  -DOMR_JITBUILDER=ON -DOMR_JITBUILDER_TEST=ON \
  -DCMAKE_FIND_ROOT_PATH=/root/riscv-debian/rootfs \
  -DCMAKE_TOOLCHAIN_FILE=../cmake/toolchains/riscv64-linux-cross.cmake \
  -DOMR_TOOLS_IMPORTFILE=../build-native/tools/ImportTools.cmake
make -j4

Now we can test it out by running TRIL tests for example:

time ~/qemu/build/qemu-riscv64 -L ~/riscv-debian/rootfs ./fvtest/compilertriltest/comptest
...
real    16m5.083s
user    15m52.763s
sys 0m12.816s

16 minutes for all the TRIL tests is not bad at all for an emulated platform. Next step is to try and get a farm machine so we can start installing all these tools and getting it configured for running the tests.

@jdekonin @AdamBrousseau we have a total of 13 AMD64 UNB machines on the OMR farm. Looking at the load statistics we can for sure spare a few for RISC-V cross compile build + test via QEMU as shown above. This will greatly help the progress @janvrany and @shingarov have been making to ensure code getting contributed does not break RISC-V moving forward.

Are we able to spare say two machines to begin with and reimage them to Debian 10 so we can model them after the above Dockerfile and prepare them for cross compiled builds via Jenkins?

If I may, I'd suggest trying to make it run on the existing build machines. IIUC, all it takes is to:

  1. install RISC-V cross-compilation toolchain and qemu
  2. drop pre-built debian rootfs somewhere where CI pipeline can find it.

The advantage would be that we (well, not me :-)) don't have to reimage machines, and we can use all machines for RISC-V builds, with the load spread across all of them.
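The two prerequisites listed above could be verified on a build machine with a small preflight check. This is a sketch under assumptions: the function name `check_riscv_prereqs` is hypothetical, the tool names in the usage comment are the usual binary names for the cross GCC and QEMU user-mode emulator, and the rootfs path is a placeholder.

```shell
#!/bin/sh
# Print one line per missing prerequisite; succeed only if nothing is missing.
# $1: cross compiler command, $2: QEMU user-mode command, $3: rootfs directory.
check_riscv_prereqs() {
    missing=0
    for tool in "$1" "$2"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool"
            missing=1
        fi
    done
    if [ ! -d "$3" ]; then
        echo "missing rootfs: $3"
        missing=1
    fi
    return "$missing"
}

# Example call; the rootfs location is a placeholder:
# check_riscv_prereqs riscv64-linux-gnu-gcc qemu-riscv64 /path/to/rootfs
```

Running a check like this first in a pipeline would make a misconfigured machine fail fast with a readable message instead of partway through the build.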

@AdamBrousseau: do you know exactly which Linux these build machines are running? @fjeremic, do they run CentOS? If so, what version?

I'm happy to try and see what it takes later this week.

All the x86 systems on the OMR jenkins ci are ubuntu16. I agree with @janvrany, if we can set it up on one system and get it functional we can do the same config for the others to increase coverage.

Ok, I'm going to boot up a clean Ubuntu 16 VM and try the steps above to see if we can get it working there. I will message you @janvrany on Slack if I encounter problems to see if we can work them out. I will report back here once we have something noteworthy.

Ubuntu 16 will not work because the cross-compilation toolchains are not available on that version [1]. They are only available starting Ubuntu 18. I built the rootfs on my Debian 10 Docker image and transferred over to a clean Ubuntu 18 installation and followed all the other steps. I am successfully able to run RISC-V build + test in this environment:

root@sparkler1:~/omr/build-riscv64# time ~/qemu/build/qemu-riscv64 -L ~/riscv-debian/rootfs ./fvtest/compilertriltest/comptest

...

[----------] 450 tests from SelectCompareTest/Int32SelectInt32CompareTest
[----------] 450 tests from SelectCompareTest/Int32SelectInt32CompareTest (4316 ms total)

[----------] 450 tests from SelectCompareTest/Int64SelectInt32CompareTest
[----------] 450 tests from SelectCompareTest/Int64SelectInt32CompareTest (4139 ms total)

[----------] 450 tests from SelectCompareTest/FloatSelectInt32CompareTest
[----------] 450 tests from SelectCompareTest/FloatSelectInt32CompareTest (4255 ms total)

[----------] 450 tests from SelectCompareTest/DoubleSelectInt32CompareTest
[----------] 450 tests from SelectCompareTest/DoubleSelectInt32CompareTest (4144 ms total)
[----------] 450 tests from SelectCompareTest/Int32SelectInt8CompareTest
Assertion failed at /root/omr/compiler/riscv/codegen/OMRTreeEvaluator.cpp:114: false
    Opcode bcmpeq is not implemented

compiling file:line:name at level: warm
#0: function TR_LinuxCallStackIterator::printStackBacktrace(TR::Compilation*)+0x38 [0x16c5876]
#1: function TR_Debug::printStackBacktrace()+0x36 [0x1490eda]
#2: ./fvtest/compilertriltest/comptest() [0x12c74da]
#3: ./fvtest/compilertriltest/comptest() [0x12c75cc]
#4: function TR::assertion(char const*, int, char const*, char const*, ...)+0xba [0x12c768a]
#5: function OMR::RV::TreeEvaluator::unImpOpEvaluator(TR::Node*, TR::CodeGenerator*)+0x56 [0x14c3d3c]
#6: function OMR::CodeGenerator::evaluate(TR::Node*)+0x1b0 [0x14e1574]
#7: function OMR::RV::TreeEvaluator::iselectEvaluator(TR::Node*, TR::CodeGenerator*)+0x7c [0x16d8d6e]
#8: function OMR::CodeGenerator::evaluate(TR::Node*)+0x1b0 [0x14e1574]
#9: function genericReturnEvaluator(TR::Node*, OMR::RealRegister::RegNum, TR_RegisterKinds, TR_ReturnInfo, TR::CodeGenerator*)+0x5c [0x16d7a40]
#10: function OMR::RV::TreeEvaluator::ireturnEvaluator(TR::Node*, TR::CodeGenerator*)+0x42 [0x16d7b90]
#11: function OMR::CodeGenerator::evaluate(TR::Node*)+0x1b0 [0x14e1574]
#12: function OMR::CodeGenerator::doInstructionSelection()+0x758 [0x14e4c2a]

real    19m54.047s
user    19m37.218s
sys     0m16.343s

It seems like the only way to proceed is to upgrade a few of our farm machines from Ubuntu 16 to Ubuntu 18. Unless someone has other alternatives, this seems like the path of least resistance.

@jdekonin / @AdamBrousseau could we put in a request to upgrade 3 of our AMD64 ub16 images to ub18 to be used for RISC-V cross compilation build + test?

[1] https://packages.ubuntu.com/search?keywords=gcc-riscv64-linux-gnu
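A setup script could refuse to continue on a release that is too old for the packaged cross toolchain. The sketch below encodes only the observation from this comment (Ubuntu 18 and later) and reads the standard `ID` and `VERSION_ID` fields of an os-release file; the function name is hypothetical, and other distributions are simply reported as unsupported.

```shell
#!/bin/sh
# Succeeds if the os-release file describes Ubuntu with major version >= 18.
# $1 is the file to read; it defaults to /etc/os-release and should contain a '/'.
ubuntu_has_riscv_cross_packages() {
    os_release=${1:-/etc/os-release}
    (
        ID= VERSION_ID=
        . "$os_release" || exit 2
        [ "$ID" = ubuntu ] || exit 1
        major=${VERSION_ID%%.*}
        [ "${major:-0}" -ge 18 ]
    )
}

# Hypothetical guard:
# ubuntu_has_riscv_cross_packages || echo "cross toolchain packages unavailable" >&2
```

The file is sourced in a subshell so its variables do not leak into the calling script.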

An alternative would be to build (backport) the cross toolchain for Ubuntu 16 and install it there. I may try that if it makes things easier.
Moreover, if upgrading, why not upgrade to 20 (also LTS)?

Seems reasonable, though I haven't tested on 20 if the above is working. If you want to give it a shot to backport to 16 that would be appreciated because we do have machines on hand.

OK, I'll give it a shot in next few days.

I gave it a shot and failed. Will give it one more...

I gave it a second shot and missed, too.

I successfully compiled a GCC 10 cross compiler using this script (a modification of another one I found): https://gist.github.com/janvrany/b34f09bc088bb56074fbb7fe4d2dbfae

However, such GCC is missing a library search path (/usr/lib64/lp64d/riscv64-linux-gnu), not yet sure why, so CMake does not find libraries in sysroot. Working on that. Sigh, this is far trickier than I thought!

@jdekonin / @AdamBrousseau can we get an estimate of how much work it is to reimage? We know we have a working setup with Ubuntu 18. I guess we're trying to weigh investing further effort into getting this thing to work on Ubuntu 16 via @janvrany's efforts vs. just upgrading the OS.

Yeah, I spent a good part of today on it to no avail - I have no idea (so far) how to add this to GCC's search path (i.e., to --print-search-dirs).
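To confirm whether a given directory is on the compiler's library search path, the `libraries:` line of the search-dirs output can be split and compared. This is a diagnostic sketch only and does not fix the missing path; the function name `search_dirs_contain` is hypothetical, and the comparison is textual, so entries containing `..` components are not canonicalized.

```shell
#!/bin/sh
# Succeeds if directory $1 is listed on the "libraries:" line read from stdin.
# Input is expected to look like the output of `gcc -print-search-dirs`.
search_dirs_contain() {
    wanted=${1%/}
    sed -n 's/^libraries: =//p' | tr ':' '\n' | sed 's:/*$::' | grep -Fxq -- "$wanted"
}

# Example usage with the cross compiler (not run here):
# riscv64-linux-gnu-gcc -print-search-dirs \
#     | search_dirs_contain /usr/lib64/lp64d/riscv64-linux-gnu
```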

  • clone existing ub18; 30 mins, although I would suggest we borrow an OpenJ9 instance and do the work directly on that one, as I am guessing a cross-compile environment will eventually be wanted there too, correct?
  • test with OMR build; 30 mins???
    -- it's set up for testing, not compilation, so I'm not sure what the requirements are for an OMR compile; 1 hour
    -- set up cross-compile environment according to the steps provided; 30 mins??

Once functional/completed on 1 system we could then clone to replace however many ub16 systems are wanted replaced. ~30 mins each.

I've moved a ub18 system over from OpenJ9 (since I have permissions in both). What is the best way to test it? Are there instructions or listed requirements for what is needed on the system anywhere? I cannot commit to much for time this week as I have a pretty full schedule but I can see what I can poke along.

https://ci.eclipse.org/omr/computer/ub18-x86-1/

Thanks Joe. Yes, there is a list of the tools needed, along with example step-by-step instructions, here:
https://github.com/eclipse/omr/issues/4641#issuecomment-740316109

Creating the rootfs will not work on Ubuntu 18. The Dockerfile and instructions there are for Debian 10. The only difference for Ubuntu 18 is that we skip the rootfs creation and just copy the rootfs from a Debian 10 Docker image. I have an example Fyre instance with all this working. I can re-create the example on the new image that you added if you want. That will get us all the prerequisites ready to go. Let me know how you want to proceed.
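The comment does not say how the rootfs was copied between machines. One plausible way, shown here purely as an assumption, is to pack the directory into a tar archive on the Debian 10 side and unpack it on the target; the function names are hypothetical, and a real rootfs would likely need to be handled as root so that ownership is preserved.

```shell
#!/bin/sh
# Pack the directory $1 into the tar archive $2, keeping permissions.
pack_rootfs() {
    tar -C "$1" -cpf "$2" .
}

# Unpack the archive $1 into the directory $2, creating it if needed.
unpack_rootfs() {
    mkdir -p "$2" && tar -C "$2" -xpf "$1"
}

# Example usage with placeholder paths (not run here):
# pack_rootfs /root/riscv-debian/rootfs /tmp/rootfs.tar
# unpack_rootfs /tmp/rootfs.tar "$HOME/riscv-debian/rootfs"
```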

I removed ub18-x86-1 as there were package dependencies not present from OMR jenkins. @rajdeepsingh1 has a Debian 10 install in progress for you to use.

I got access to the machine via SSH and will be performing the setup tomorrow.

Ok, so forward progress. I was able to replicate the full setup on our OMR Jenkins machine. Here are the stats:

jenkins@deb10-x64-1:~/omr/build-native$ time make -j4
real    1m9.581s
user    3m35.492s
sys 0m51.346s

jenkins@deb10-x64-1:~/omr/build$ time make -j4
real    15m57.563s
user    48m18.504s
sys 8m34.453s

jenkins@deb10-x64-1:~/omr/build$ time ~/qemu/build/qemu-riscv64 -L ~/riscv-debian/rootfs ./fvtest/compilertriltest/comptest
real    23m58.427s
user    23m14.580s
sys 0m33.196s

Unfortunately we take much longer to build and run the tests on this machine. The test portion also crashed with a segfault during comptest of CompareTest/DoubleCompareOrUnordered which I am going to take a quick peek through now, but comparing to other platforms this seems about 90% of the way to finishing comptest so we can safely add say 3 more minutes to get an accurate timing. So the entire build + test would take ~45 mins. I suppose that is not outrageous, but we may need another machine depending on how bottlenecked we get.
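The estimate can be checked by adding the three `real` times above to the 3-minute allowance for the unfinished part of comptest. The snippet below is just that arithmetic; the helper `real_to_seconds` is hypothetical, drops fractional seconds, and assumes the seconds field is not zero-padded.

```shell
#!/bin/sh
# Convert a `time` value such as "15m57.563s" to whole seconds (fraction dropped).
real_to_seconds() {
    minutes=${1%%m*}
    seconds=${1#*m}
    seconds=${seconds%%.*}
    echo $(( minutes * 60 + seconds ))
}

native=$(real_to_seconds 1m9.581s)      # native build
cross=$(real_to_seconds 15m57.563s)     # RISC-V cross build
tests=$(real_to_seconds 23m58.427s)     # comptest under QEMU, up to the crash
allowance=180                           # the 3 extra minutes assumed above
total=$(( native + cross + tests + allowance ))
echo "total: $(( total / 60 ))m$(( total % 60 ))s"   # prints "total: 44m4s"
```

That comes to roughly 44 minutes, consistent with the ~45 minute figure.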

Going to take a look at the segfault now, and proceed to try to hook something up to Jenkins.

I have just opened a draft PR demonstrating an alternative, CMake-based solution to run tests under simulator for cross-compilation jobs: see #5755

Time to close this issue?

Yes, just waiting to see if anyone objects to enabling it by default:
https://eclipse-omr.slack.com/archives/CCQ8B4B39/p1612289540003400

Enabled by default now. Tested it over in #5785.
