OpenJ9: Porting OpenJ9 to RISC-V

Created on 11 Mar 2019  ·  238 Comments  ·  Source: eclipse/openj9

We plan to port a complete runtime environment, OpenJDK11 + OpenJ9 + OMR (without the JIT), to a RISC-V development board (e.g. the HiFive Unleashed with Linux support; see https://www.sifive.com/boards/hifive-unleashed for details) and get it working on the board.

According to the explanation of RISC-V cross-compilers at https://www.lowrisc.org/docs/untether-v0.2/riscv_compile/, the JDK/JRE needs to run in user mode with Linux support, since the JVM and its applications are multi-threaded.

Overall, there are several prerequisites before compiling OpenJ9:

  • get the Freedom E SDK (https://github.com/sifive/freedom-e-sdk) and OpenOCD ready for use (both already offered at https://www.sifive.com)
  • build a cross compiler with glibc support, e.g. riscv-unknown-linux-gnu-gcc, following the instructions at https://github.com/riscv/riscv-gnu-toolchain; once built, verify it works with a simple test (a hedged build sketch follows this list).
  • probably use the Yocto build system (https://www.yoctoproject.org/ or https://github.com/riscv/meta-riscv) to build everything, including a Linux/RISC-V kernel (https://github.com/riscv/riscv-linux), a root file system image, a bootloader, a toolchain, QEMU, etc.
  • (to be added if anything else)
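
For reference, building the Linux/glibc cross compiler roughly follows the riscv-gnu-toolchain README (a hedged sketch; the install prefix is just an example):

# clone with submodules (binutils, gcc, glibc, ...)
git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
cd riscv-gnu-toolchain
./configure --prefix=/opt/riscv_gnu_toolchain_64bit
# 'make linux' builds the riscv64-unknown-linux-gnu-* toolchain against glibc
make linux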

Considering the complexity and the unexpected challenges of porting, the project will be split into at least 4 sub-tasks as follows:

  • compile a build without the JIT on Linux (changing all required config/settings to get it working) to ensure it behaves as expected; if everything goes well, apply those changes to the compilation with riscv-unknown-linux-gnu-gcc.
  • import the FFI code (libffi) with RISC-V support into OpenJ9 (https://github.com/libffi/libffi/tree/master/src/riscv; an Eclipse CQ needs to be opened for that)
  • figure out whether we need to modify the related config/settings in the Freedom E SDK and how to integrate them into the config/settings of OpenJDK11 + OpenJ9 + OMR so the build compiles with riscv-unknown-linux-gnu-gcc.
  • possibly run via an emulator (e.g. QEMU) on the host to ensure it works before uploading.
  • figure out how to flash/upload the compiled build and the Linux port onto the development board to get it executed/tested.
  • (to be added if anything else)

The issue at #11136 was created separately to track the progress of OpenJ9 JIT on RISC-V.

FYI: @DanHeidinga , @pshipton, @tajila


All 238 comments

@shingarov you might be interested in this one :)

Currently, I have finished the following tasks:

  1. verified the HiFive1 32-bit board with simple tests (e.g. compiling/uploading a program with the Freedom E SDK to the board) to understand how it works.
  2. compiled a 32-bit Linux cross compiler (riscv32-unknown-linux-gnu-gcc), though there might be some issue with the compiler. Still working on that.

@pshipton , I am wondering whether we should disable shared classes on this resource-limited board (only 128M of flash memory available, which must also hold the Linux port/utilities, the Java build, user applications, etc.) since the shared cache is set to 300M by default.

In addition, we might need to disable some modules/unnecessary pieces in the code to make things easier on the board.

@ChengJin01 jlink can generate a java.base-only JRE (no other modules).
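
For illustration, a hedged example of that suggestion (run on the host with any JDK 11; the output directory name is arbitrary):

# produce a runtime image containing only the java.base module
jlink --add-modules java.base --output jre-java.base-only
jre-java.base-only/bin/java -version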

We probably need to disable these modules in the OpenJDK config/settings before compiling the build for riscv32, as there might be no way to run jlink on the compiled build (all executables are already in RISC-V format) on the host machine.

@ChengJin01 When the available disk space is small, the default shared cache size should be 64MB. However, I can guess this still might be too big. We do want to ensure shared classes is working, and could set an even smaller size for this platform, like 16M or even 4M just for test purposes.
@hangshao0

Actually, this is 32-bit, so the default shared cache size should be 16MB already. The 300MB default size is for 64-bit only.

I double-checked the documentation on the HiFive1 (32-bit FE310); it actually has 16MB of off-chip flash memory (128Mbit). So the shared cache still needs to be reduced if possible.
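
For test purposes, the cache size can also be capped explicitly on the command line (a hedged example; -Xscmx sets the cache size and the cache name is arbitrary):

java -Xshareclasses:name=riscv_test -Xscmx4m -version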

In addition, according to the technical spec for the FE310 at
https://docs.platformio.org/en/latest/frameworks/freedom-e-sdk.html
https://github.com/RIOT-OS/RIOT/wiki/Board:-HiFive1

HiFive1 Features & Specifications
Microcontroller: SiFive Freedom E310 (FE310)
SiFive E31 RISC-V Core
Architecture: 32-bit RV32IMAC
...
Memory: 16 KB Instruction Cache, 16 KB Data Scratchpad <--- 16KB RAM
...
Flash Memory: 128 Mbit Off-Chip (ISSI SPI Flash)

and the discussion of a Linux port on the HiFive1 at https://forums.sifive.com/t/is-there-a-linux-distribution-that-can-run-on-hifive1/658/5, it seems the 32-bit FE310 chip has only a tiny RAM (16KB), which is far from enough to support a Linux kernel.

Given that porting an OS kernel to the board is not our focus, there are two options for moving forward:
1) choose the 64-bit HiFive Unleashed FU540 (Linux-capable, multi-core / https://www.sifive.com/boards/hifive-unleashed) to support Linux on chip for the JVM.
2) consider using an RTOS (must support multi-threading, e.g. RTLinux, Zephyr, Apache Mynewt, etc.) with RISC-V 32-bit support. It is still unclear whether we can make this work and how much code/how many libraries in OpenJ9 would need to be adjusted to accommodate an RTOS.

@DanHeidinga

1) Can the HiFive1 boards be extended with additional RAM & disk?
2) Is there an emulator for the RISC-V that we can use while working to procure more suitable boards?

3) We want to target Linux with this work as it simplifies the rest of the porting effort.

Surely supporting the shared classes cache could be considered on the "nice to have" list rather than in the initial set of priority activities for a new platform bring up?

Is there an existing new platform bring-up (ordered) checklist? If not, could we start creating one as part of this effort?

@DanHeidinga ,

  1. Can the HiFive1 boards be extended with additional RAM & disk?

There is no public doc/spec showing it can, but we can bring this question to their forum at https://forums.sifive.com. (already raised at https://forums.sifive.com/t/extending-the-hifive1-board-with-additional-ram-disk/2155)

  1. Is there an emulator for the RISC-V that we can use while working to procure more suitable boards?

The typical emulator is QEMU (https://github.com/riscv/riscv-qemu), which supports both RV64G and RV32G emulation. It will be used to boot Linux and a shell so we can run the JVM after compilation (a sample invocation appears at the end of this comment).

  1. We want to target Linux with this work as it simplifies the rest of the porting effort.

In this case, there are not many options in terms of development boards other than the 64-bit HiFive Unleashed (FU540).

Technically, most of the work (compilation, emulation, etc.) will be finished before uploading everything to the board, which is only the final step to verify whether it really works on the RISC-V chip/hardware. So it is not that urgent for the moment to decide which board to use, as long as we get our build working on RISC-V via the emulator.
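
For reference, a typical qemu-system-riscv64 invocation of that era boots a BBL+kernel payload with a virtio disk (a hedged sketch; the bbl and rootfs.bin file names are placeholders):

qemu-system-riscv64 -nographic -machine virt -smp 4 -m 2G \
  -kernel bbl -append "root=/dev/vda ro console=ttyS0" \
  -drive file=rootfs.bin,format=raw,id=hd0 \
  -device virtio-blk-device,drive=hd0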

Already talked to @mstoodle on Slack: we can create a high-level/generic porting guideline, not specific to any platform, out of our porting experience on RISC-V, which will help people understand the basic/key steps to follow when porting OpenJ9 to a new platform.

we can create a high-level/generic porting guideline, not specific to any platform, out of our porting experience on RISC-V, which will help people understand the basic/key steps to follow when porting OpenJ9 to a new platform

@knn-k has some recent experience porting OpenJ9 onto Aarch64. He might be helpful regarding the porting guideline.

If it is without the JIT, you can take a look at these PRs to begin with, for what I did for the AArch64 VM build:

  • #3502, #3559, #4087, #4333, #4350, #4487, #4696

@knn-k , many thanks for the links to your changesets. I believe we will start with something at least similar (using the RISC-V instruction set), except for the Docker-related parts, as we need to run directly on the hardware/board.

AArch64 VM uses the Docker image for cross-compilation on x86-64 Linux.
I would appreciate any feedback from your RISC-V effort.

[1] The response on extending the 32-bit board with external RAM/disk is as follows:
https://forums.sifive.com/t/extending-the-hifive1-board-with-additional-ram-disk/2155/3

A while ago, one suggestion to extend the amount of memory was to connect a 
SPI RAM to the board’s SPI GPIO pins. It’s slower than direct RAM but depending 
on the use-case it might be enough:
...
Though to be honest it might very well be that your application
 (if it requires “far more than 16MB”) is too much for this board.

So extending the hardware this way seems tricky, and it is hard to say whether the 32-bit board can really support it.

[2] I went over the Spec document for the 64bit board at https://sifive.cdn.prismic.io/sifive%2Ffa3a584a-a02f-4fda-b758-a2def05f49f9_hifive-unleashed-getting-started-guide-v1p1.pdf,

HiFive Unleashed is a Linux development platform for SiFive’s Freedom U540 SoC, 
the world’s first 4+1 64-bit multi-core Linux-capable RISC-V SoC. 
The HiFive Unleashed has 8GB DDR4, 32MB QuadSPI Flash, a Gigabit Ethernet port, 
and a MicroSD card slot (can be used to boot the linux image) for more external storage.

If this is the mainstream hardware setup of the 64-bit board, there should be no RAM/disk limitation
for the Linux kernel + JVM on this board, whether for RAM or the shared cache.


[3] There are 3 options for the cross-compilation & QEMU emulation:
1) manually compile all related artifacts from source, including the cross-compiler + Linux kernel + bootloader + a customized shell environment, etc.; https://github.com/michaeljclark/busybear-linux already integrates everything we need for the cross-compilation,
but it is unclear whether the generated boot image & bootloader work well on real hardware.

2) https://buildroot.org/
Buildroot includes everything required for cross-compilation, but it requires manually choosing a bunch of configuration options.

3) https://github.com/riscv/meta-riscv
Yocto provides a one-stop integration environment for cross-compilation, with pretty much no manual intervention except a simple choice of image type, and comes with a full-featured Linux
environment for RISC-V-based cross-development. The only drawback is that it mostly ends up with a Linux image of several GB, which seems huge to us (it might not be a big problem if booting from the SD card).

I will go with option 1) to compile everything including QEMU, which is the fastest and most straightforward way to resolve the problems on our side before moving forward to on-board/hardware verification (a rough sketch follows below).

If any special configuration/setting required in the boot process on the hardware can't be done via 1), I will go back and check whether 2) or 3) works instead.
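
For reference, option 1) with busybear-linux boils down to something like the following (a hedged sketch; the helper script name is from my reading of that repo and may differ):

git clone --recursive https://github.com/michaeljclark/busybear-linux
cd busybear-linux
# a single make drives the kernel, bbl and rootfs image builds
make
# boot the resulting image under qemu-system-riscv64
scripts/start-qemu.sh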

Already created the 64-bit RISC-V cross compiler:

/opt/riscv_gnu_toolchain_64bit/bin$ ./riscv64-unknown-linux-gnu-gcc  --version
riscv64-unknown-linux-gnu-gcc (GCC) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Will keep compiling other related artifacts.
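
As a quick sanity check of the new toolchain (a hedged example using user-mode QEMU; -static sidesteps the sysroot library lookup at run time):

echo 'int main() { return 42; }' > t.c
/opt/riscv_gnu_toolchain_64bit/bin/riscv64-unknown-linux-gnu-gcc -static t.c -o t
qemu-riscv64 ./t; echo $?   # expect 42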

Hello,

I would like to add some comments.

First off I am a large contributor to meta-riscv and the Yocto/OpenEmbedded world in general. Please let me know what you need and I will help wherever I can. The less time you all spend on getting Linux running the more time you can spend on porting Java :)

Please do not spend time on the HiFive1 board OR 32-bit RISC-V. The HiFive1 board is very small, it's basically a RISC-V version of an Arduino. RISC-V 32-bit support is still lacking so you should start with 64-bit support. You can develop for 64-bit RISC-V on either the HiFive Unleashed or QEMU. QEMU has great RISC-V support and you should be able to do all your development on QEMU. If you want help running QEMU please let me know, I am also a QEMU maintainer and am happy to help here as well.

I think meta-riscv is the best option from the list above (option [3]). It's extremely powerful and you won't have any size issues in either QEMU or on the HiFive Unleashed. Buildroot is also an option (as a maintainer there as well, I can help if required). I would not recommend building it all yourself; you will end up wasting a lot of time building toolchains, Linux and related packages.

If you want to do native work you can also just run Fedora on QEMU and dnf install all the packages you need.

Either way, let me know how to help and I will do what I can

Something else worth noting is that most of the repos on the github riscv account are out of date. For Linux, GNU tools and QEMU you should use mainline not the riscv forks.

@alistair23 , many thanks for your suggestions & recommendations. Actually I already finished compiling everything on 64-bit, including QEMU, with option 1) (just a minor issue with ntpd, which can be ignored for now). We will move forward to the next step of compiling our code if everything goes fine.

The problem with option [3] is that it ends up with a huge Linux image; I already tried it but stopped when it exceeded 9GB (what we need is a normal Linux kernel with decent shell support, so there is no need to integrate everything). With option 1) we are able to customize the size (e.g. 1GB or less if necessary).

I won't go for options 2) or 3) for now unless something unexpected comes up while compiling our code.

Great, I'm glad you got it working!

I'm not sure how you ended up with a 9GB image; my full OpenEmbedded desktop images with debug symbols and self-hosted toolchains aren't that big.

Either way, can you keep me updated on the progress? Also let me know if there is anything I can do to help.

@alistair23 , I didn't try any options other than bitbake core-image-full-cmdline as guided at https://github.com/riscv/meta-riscv, and the whole folder kept growing past 9GB (it eventually broke my VM by running it out of space).

Basically, the compilation of our code will be done with the 64-bit cross-compiler outside of QEMU, and the whole build then mounted inside to be executed there. So we definitely need your support from the QEMU perspective. My guess is there will be a bunch of problems with QEMU once we're done compiling our code, specifically around debugging in QEMU if it crashes or something similar happens.

Just stay subscribed to this issue and you will be kept informed of what is going on in the project.

Ah, the temporary files can get big. That makes more sense.

Great! I am subscribed to this issue so I will keep an eye on things and help where I can. QEMU is very stable and the RISC-V support is also mature. If possible it would be best to build it from the master branch in mainline QEMU. Otherwise the 3.1+ releases are in good shape.

Good luck with this. Java is one of the few missing pieces for RISC-V so it will be great when you get it running :)

Working on another crash issue on Windows. Will get back to this once we address that problem.

I am currently compiling a build without the JIT components (excluded from OpenJ9 & OMR) on a Fyre Linux machine, and then on my Ubuntu VM, to eliminate any compilation errors with the JIT excluded.

If it works well, the next step is to introduce the RISC-V toolchain in the configure settings to see what happens (temporarily ignoring the spec settings & code specific to riscv, and just ensuring it picks up the correct cross-compiler/tools during compilation).

Just finished the compilation on Linux without the JIT on my Ubuntu VM:

root@jincheng-VirtualBox:...# jdk/bin/java -version
JVMJ9VM011W Unable to load j9jit29: /home/jincheng/RISC_V_OPENJ9/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/images/jdk/lib/compressedrefs/libj9jit29.so: cannot open shared object file: No such file or directory
openjdk version "11.0.3-internal" 2019-04-16
OpenJDK Runtime Environment (build 11.0.3-internal+0-adhoc.root.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build master-32fb64b, JRE 11 Linux amd64-64-Bit
 Compressed References 20190401_000000 (JIT disabled, AOT disabled)
OpenJ9   - 32fb64b
OMR      - 5ceecf1
JCL      - ff6f49a based on jdk-11.0.3+2)

and I will now start figuring out how to modify the config/settings for RISC-V on the OpenJDK11 side.

The main reasons our build needs to be compiled outside of QEMU/Linux are:
1) Apart from the cross-compilers (gcc/g++ for riscv), a bunch of tools & commands (e.g. m4, etc.) required for compiling the JDK are not entirely provided in riscv-gnu-toolchain & QEMU/Linux, and there is no need for them to be, as they are offered on the Linux-based build platform.
2) The boot JDK (which is not compiled with the riscv cross-compiler) can't be executed in QEMU/Linux and must be used outside for the whole compilation.

So the first problem we encountered in the JDK configuration check is the lack of X11 support in riscv-gnu-toolchain (it is required for libawt in the JDK):

checking for X... no
configure: error: Could not find X11 libraries.

configure:58129: /opt/riscv_gnu_toolchain_64bit/bin/riscv64-unknown-linux-gnu-g++ -E  conftest.cpp
conftest.cpp:21:10: fatal error: X11/Xlib.h: No such file or directory
 #include <X11/Xlib.h>
          ^~~~~~~~~~~~
compilation terminated.
...

https://github.com/riscv/riscv-gnu-toolchain/tree/master/linux-headers/include
/riscv-gnu-toolchain/linux-headers/include# ls -l
total 48
drwxr-xr-x  2 root root  4096 Apr  2 12:15 asm
drwxr-xr-x  2 root root  4096 Apr  2 12:15 asm-generic
drwxr-xr-x  2 root root  4096 Apr  2 12:15 drm
drwxr-xr-x 28 root root 16384 Apr  2 21:10 linux
drwxr-xr-x  2 root root    56 Apr  2 12:15 misc
drwxr-xr-x  2 root root   135 Apr  2 12:15 mtd
drwxr-xr-x  3 root root  4096 Apr  2 12:15 rdma
drwxr-xr-x  3 root root   145 Apr  2 12:15 scsi
drwxr-xr-x  2 root root   321 Apr  2 12:15 sound
drwxr-xr-x  2 root root    89 Apr  2 12:15 video
drwxr-xr-x  2 root root   110 Apr  2 12:15 xen
<----------------------------------------------------- X11 is missing in riscv-gnu-toolchain

The X11 libraries should be part of riscv-gnu-toolchain; otherwise the cross-compiler/linker fails to locate them during compilation (the library will be used inside QEMU, so we can't reuse the host's copy on the build platform).

There are two options to deal with the problem:
1) figure out how to integrate X11 (https://github.com/mirror/libX11) into riscv-gnu-toolchain, which might take a while to get working (a rough sketch follows this list).
2) skip X11 in configure for the moment, disable all X11-related code later in the compilation, and see how far we can push forward. If that works out, we will get back to X11 once we finish compiling the JDK build.
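
My rough understanding of what option 1) would involve is a standard autotools cross build into the toolchain sysroot (a hedged sketch; it assumes libX11's own dependencies such as xorgproto/libxcb/xtrans have already been cross-built into the same sysroot):

export TOOLCHAIN=/opt/riscv_gnu_toolchain_64bit
export PATH=$TOOLCHAIN/bin:$PATH
export PKG_CONFIG_SYSROOT_DIR=$TOOLCHAIN/sysroot   # so pkg-config resolves against the sysroot
cd libX11
./configure --host=riscv64-unknown-linux-gnu --prefix=/usr
make
make DESTDIR=$TOOLCHAIN/sysroot install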

@alistair23 , do you know the basic steps to build X11 into riscv-gnu-toolchain?

The X11 libraries should be part of riscv-gnu-toolchain; otherwise the cross-compiler/linker fails to locate them during compilation (the library will be used inside QEMU, so we can't reuse the host's copy on the build platform).

The problem isn't that you don't have X11 in the toolchain, it's that you don't have the X11 libraries installed for the guest. My advice would be to use Yocto/OpenEmbedded to build a rootFS. Continuing to build these manually will be more and more of a pain.

The other option is to use RISC-V Fedora, where you probably can dnf install the libraries.

1. Apart from the cross-compilers (gcc/g++ for riscv), a bunch of tools & commands (e.g. m4, etc.) required for compiling the JDK are not entirely provided in riscv-gnu-toolchain & QEMU/Linux, and there is no need for them to be, as they are offered on the Linux-based build platform.

This doesn't sound right. This sounds like you haven't installed the required tools onto the guest image. Almost all packages that run on x86 will run on RISC-V; about 90% of the Fedora/Debian packages are being cross-compiled. My colleague just checked: m4 can be installed in Fedora with a simple dnf install, and it's available as a RISC-V target package in OpenEmbedded.

If you would like to continue cross compiling the packages for RISC-V then I strongly recommend using meta-riscv and OpenEmbedded. It will handle all of these complexities for you as cross compiling an entire distro is a lot of work. If you want to do native compile/development then Fedora (or Debian) is the way to go.

2. The boot JDK (which is not compiled with the riscv cross-compiler) can't be executed in QEMU/Linux and must be used outside for the whole compilation.

Does this mean that it is impossible to natively compile JDK on a non-x86 architecture?

Does this mean that it is impossible to natively compile JDK on a non-x86 architecture?

Technically yes: the compilation of the JDK is done with the cross-compiler, the macro preprocessor and other tools, as well as the boot JDK (used to compile the Java source), which is only compiled for the host and can't be executed in QEMU.

Does this mean that it is impossible to natively compile JDK on a non-x86 architecture?

Technically yes: the compilation of the JDK is done with the cross-compiler, the macro preprocessor and other tools, as well as the boot JDK, which is only compiled for the host and can't be used in QEMU.

The compiler, macro pre-processor and standard build tools will all work on a non-x86 native compile.

So boot JDK is x86 only?

When you say can't be used in QEMU, do you mean QEMU specifically or just that it can't be built on non-x86 architectures?

So boot JDK is x86 only?

Not really; we always need a boot JDK (already compiled) for the compilation. Basically, compiling a JDK always involves a boot JDK (x86-only in our case) which is already ready for use; otherwise there would be no need to compile outside of QEMU, if everything including the boot JDK were ready for use inside.

So boot JDK is x86 only?

Not really; we always need a boot JDK (already compiled) for the compilation. Basically, compiling a JDK always involves a boot JDK (x86-only in our case) which is already ready for use; otherwise there would be no need to compile outside of QEMU, if everything including the boot JDK were ready for use inside.

Ah, so the boot JDK is a bootstrapper that allows you to compile. In this case the boot JDK is already, and only, compiled for x86, so you need to do an x86-host cross compile.

Thanks for clarifying what boot JDK is.

In which case a native Fedora style compile won't work for you. I still recommend OpenEmbedded then.

In which case a native Fedora style compile won't work for you. I still recommend OpenEmbedded then.

At this point, can the cross-compiler locate the correct X11 libraries offered by OpenEmbedded (the whole compilation must happen outside of QEMU)?

OpenEmbedded will allow you to build the entire distro on your host machine (x86).

OpenEmbedded will build the X11 libraries for you and will link with them.

@alistair23, many thanks for your suggestion. I will see whether OpenEmbedded works for us in this case. Let me know whether there is anything we need to be aware of when installing/building OpenEmbedded (links, guides, etc.)

If you want, I can have a go at it. Is there some sort of Java that can be cross-compiled today?

I would recommend just following the docs in meta-riscv. There is also the Yoe distro (https://github.com/YoeDistro/yoe-distro) that seems easy to use, although I don't use it.

Is there some sort of Java that can be cross compiled today?

One might exist, but none is publicly available as far as I know. You could try HotSpot OpenJDK to see how it works. Actually, that's what we are currently doing here.

I figured that; I just meant whether there is some WIP I could test the build process with, or maybe config options to disable the JIT and interpreter. I misread what you posted above and thought you had something running on RISC-V.

Either way, let me know if there is anything else you need

@alistair23 , I already followed the instructions at https://github.com/riscv/meta-riscv to build the image on one of our internal Ubuntu VMs.

bitbake core-image-full-cmdline
runqemu qemuriscv64 nographic
...
root@qemuriscv64:/# uname -a
Linux qemuriscv64 5.0.3-yocto-standard #1 SMP PREEMPT Wed Apr 3 23:31:29 UTC 2019 riscv64 riscv64 riscv64 GNU/Linux

It seems this is pretty much the same as what I already did before to build the QEMU environment (I didn't find any compiler/toolchain pre-installed in the QEMU image), which is only used to launch the target JDK, not to compile the JDK source. Even if the toolchain were installed in there, there is no way to launch the boot JDK inside, as it is compiled for Linux/x86_64.

In our case, the cross-compiler (riscv64-gnu-gcc/g++) must work together with the boot JDK (already compiled for Linux/x86_64) to compile the native and Java code of the target JDK (which runs on riscv64). That being said, the cross-compiler/toolchain must run on the same platform (Linux/x86_64) as the boot JDK (outside of QEMU), which is a requirement of the target JDK configuration.

I also checked the result in an online Fedora/RISC-V environment (also without X11 headers at /usr/include) at https://rwmj.wordpress.com/2018/09/13/run-fedora-risc-v-with-x11-gui-in-your-browser/, which ended up with the same error.

It seems a bunch of native headers in OpenJDK require the X11 libraries to compile, which means we can't simply ignore X11 during compilation. We might still need to figure out how to get X11 working in riscv-gnu-toolchain, or find another way around it.

Does configuring with --with-x=no work around the issue?

It doesn't help, and ends up with the following error:

checking how to link with libstdc++... static
configure: error: It is not possible to disable the use of X11. Remove the --without-x option.
configure exiting with result code 1

as already explained at /riscv_openj9-openjdk-jdk11/make/autoconf/lib-x11.m4

    if test "x${with_x}" = xno; then
      AC_MSG_ERROR([It is not possible to disable the use of X11. Remove the --without-x option.])
    fi

and /riscv_openj9-openjdk-jdk11/make/autoconf/libraries.m4

AC_DEFUN_ONCE([LIB_DETERMINE_DEPENDENCIES],
[
  # Check if X11 is needed
  if test "x$OPENJDK_TARGET_OS" = xwindows || test "x$OPENJDK_TARGET_OS" = xmacosx; then
    # No X11 support on windows or macosx
    NEEDS_LIB_X11=false
  else
    # All other instances need X11, even if building headless only, libawt still
    # needs X11 headers.  <---------------------------
    NEEDS_LIB_X11=true
  fi

I managed to hack the configure script to disable the X11 libraries (see the sketch below) and will keep checking the remaining errors in the config.
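
For reference, the hack amounts to short-circuiting the check quoted above (a hedged sketch against make/autoconf/libraries.m4; OPENJDK_TARGET_CPU is the variable the OpenJDK configure uses for the target architecture):

  # force-disable X11 for the riscv64 cross build only
  if test "x$OPENJDK_TARGET_CPU" = xriscv64; then
    NEEDS_LIB_X11=false
  elif test "x$OPENJDK_TARGET_OS" = xwindows || test "x$OPENJDK_TARGET_OS" = xmacosx; then
    NEEDS_LIB_X11=false
  else
    NEEDS_LIB_X11=true
  fi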

It seems this is pretty much the same as what I already did before to build the QEMU environment (I didn't find any compiler/toolchain pre-installed in the QEMU image), which is only used to launch the target JDK, not to compile the JDK source. Even if the toolchain were installed in there, there is no way to launch the boot JDK inside, as it is compiled for Linux/x86_64.

Yep, OpenEmbedded doesn't include the toolchain in the guest image by default. To install it just edit your IMAGE_FEATURES variable in your conf/local.conf file.

Something like this will give you development and debug packages for all installed packages and it will install the tools required to build and debug natively.

IMAGE_FEATURES += "debug-tweaks dev-pkgs dbg-pkgs tools-sdk tools-debug"

I don't understand why you need native RISC-V compiler tools though. I thought that because you need the bootstrap boot JDK, which only runs on x86, you need to do a cross compile?

I also checked the result in an online Fedora/RISC-V environment (also without X11 headers at /usr/include) at https://rwmj.wordpress.com/2018/09/13/run-fedora-risc-v-with-x11-gui-in-your-browser/

I don't understand this. I have been saying you can use Fedora and you can run it directly in QEMU. I thought that wasn't an option because you need to cross compile on x86 due to the boot JDK bootstrapper?

I don't understand why you need native RISC-V compiler tools though. I thought that because you need the bootstrap boot JDK, which only runs on x86, you need to do a cross compile?

No need for toolchains right now, but it is better to have them ready for later use (compiling & debugging).

The theory is: if we are able to generate the first JDK on RISC-V via cross-compilation (it has to be this way), then the generated JDK can be used as the first boot JDK on RISC-V. That means the whole compilation can move into QEMU along with the generated JDK, and there is no need for cross-compilation after that, which is perfect for later use.

I don't understand this. I have been saying you can use Fedora and you can run it directly in QEMU. I thought that wasn't an option because you need to cross compile on x86 due to the boot JDK bootstrapper?

This is just to double-check to see whether there is any other option for us.

@alistair23 , I already followed the instructions at https://github.com/riscv/meta-riscv to build the image on one of our internal Ubuntu VMs.

So the next step if you want to keep using OpenEmbedded is to either add OpenJ9 to the meta-java layer or to just build an SDK and use that.

If you don't have a lot of OpenEmbedded experience it is probably easier to just build the host SDK as then you don't have to work with the OpenEmbedded build system. The advantage with using the build system though is then you can upstream your OpenJ9 support to meta-java allowing others to use it (not just for RISC-V).

If you do opt for the SDK option (probably the best starting place) then you can build the SDK with this command:
MACHINE=qemuriscv64 bitbake meta-toolchain

No need for toolchains right now, but it is better to have them ready for later use (compiling & debugging).

The theory is: if we are able to generate the first JDK on RISC-V via cross-compilation (it has to be this way), then the generated JDK can be used as the first boot JDK on RISC-V. That means the whole compilation can move into QEMU along with the generated JDK, and there is no need for cross-compilation after that, which is perfect for later use.

Makes sense! This can be easily done with the IMAGE_FEATURES I mentioned above.

You will probably want to set DISTRO_FEATURES += "x11" as well, to install X11.

This is just to double-check to see whether there is any other option for us.

Great!

IMAGE_FEATURES += "debug-tweaks dev-pkgs dbg-pkgs tools-sdk tools-debug"
You will probably want to set DISTRO_FEATURES += "x11" as well, to install X11.

All of this looks great; we will take OpenEmbedded as the preferred option for running/debugging the JDK, and for compiling a fully-featured JDK later on once we finish the cross-compilation.

Already finished the configure part on the OpenJDK11 side:
risc64_configure_log.txt

...
A new configuration has been successfully created in
/root/jchau/temp/.../openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release
using configure arguments '--disable-ddr --openjdk-target=riscv64-unknown-linux-gnu --with-freemarker-jar=/root/jchau/temp/freemarker.jar'.

Configuration summary:
* Debug level:    release
* HS debug level: product
* JVM variants:   server
* JVM features:   server: 'cds cmsgc compiler1 compiler2 epsilongc g1gc jfr jni-check jvmti management nmt parallelgc serialgc services vm-structs' 
* OpenJDK target: OS: linux, CPU architecture: riscv64, address length: 64
* Version string: 11.0.3-internal+0-adhoc.root.openj9-openjdk-jdk11 (11.0.3-internal)

Tools summary:
* Boot JDK:       openjdk version "11.0.1" 2018-10-16 OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.1+13) OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.1+13, mixed mode)  (at /root/jchau/temp/jdk-11.0.1_13_hotspot)
* Toolchain:      gcc (GNU Compiler Collection)
* C Compiler:     Version 8.3.0 (at /opt/riscv_gnu_toolchain_64bit/bin/riscv64-unknown-linux-gnu-gcc)
* C++ Compiler:   Version 8.3.0 (at /opt/riscv_gnu_toolchain_64bit/bin/riscv64-unknown-linux-gnu-g++)

Build performance summary:
* Cores to use:   8
* Memory limit:   16046 MB

Temporarily, the following native libraries required by OpenJDK11 are disabled, as these libraries (only installed on the riscv64-based platform) are not offered by the cross-compilation toolchain:

NEEDS_LIB_X11 = false ---> libx11-dev  (X11 Windows system) 
NEEDS_LIB_FONTCONFIG = false ----> libfontconfig1-dev (fontconfig)
NEEDS_LIB_CUPS = false ---> libcups2-dev (Common Unix Printing System)

I will now start modifying/adding the required makefile-related settings (spec, flags, module.xml, etc.) in OpenJ9 & OMR. For any source & assembly code specific to riscv64, I will temporarily add stub/empty files to see whether we can get through compilation.

Cool!

There seems to be an issue with FreeType, which needs to be disabled on the OpenJDK11 side (it is not offered by the cross-compilation toolchain). Still checking the corresponding scripts to address the problem.

Already disabled the native libraries for FreeType in OpenJDK11; continuing to work on the makefiles/scripts in OpenJ9 & OMR.

Already finished most of the changes (all kinds of settings, makefile-related scripts, stub code, etc.) in OpenJ9 & OMR, and am currently working on an issue with the trace/hook tools (tracegen/tracemerge/hookgen, which are used to generate a bunch of header files) when trying to cross-compile a build.

Need to figure out a way, in the settings, to ensure these tools are built by the local compiler rather than the cross-compiler, as the header files must be generated locally via these tools before building with the cross-compiler.

Just resolved the issue with the trace/hook tools in OMR by overriding the cross compiler with the local compiler only when compiling the source of these tools (see the sketch below). Still investigating the similar issue with the constgen tool in OpenJ9 (need to check whether it has to be built by the local compiler).
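
For reference, the override boils down to pointing the tool-compiler variables at the native gcc while CC/CXX stay on the cross compiler (a hedged illustration; the OMR_TOOLS_* variables can be seen in the OMR configure invocation quoted later in this thread):

sh configure ... \
    'CC=riscv64-unknown-linux-gnu-gcc' 'CXX=riscv64-unknown-linux-gnu-g++' \
    'OMR_TOOLS_CC=gcc' 'OMR_TOOLS_CXX=g++'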

It turns out the constgen tool (only used to generate JIT-related constants) should be skipped, to avoid compiling the JIT. Now back to the issue with signal handling (part of the code base needs to be filled in against the RISC-V instruction set spec to get through compilation).

Currently working on a verification related issue raised by an external user. Will get back to this after fixing it up.

Already finished the signal handling code from the compilation perspective (there may still be errors, but it temporarily passes compilation; we'll need to come back if anything shows up later at execution time), and currently dealing with the errors in libffi/riscv64 to make it compatible with the existing libffi setup in OpenJ9.

dealing with the errors in libffi/riscv64

Have you checked the libffi project? They may have already done a port to riscv64.

libffi is ported to RISC-V. You need to make sure you use the release candidate (or master).

Have you checked the libffi project? They may have already done the a port to riscv64.

I already moved the riscv64 part of the libffi project at https://github.com/libffi/libffi/tree/master/src/riscv into our code, but need to check the compatibility issues detected during compilation, as the libffi code in OpenJ9 is based on an old version. Given that there is no need to update the whole of libffi in OpenJ9 to the latest release, we just need a few modifications to get it to pass (macros/functions were renamed or dropped in the latest libffi and need to be restored for OpenJ9).

Already fixed the compatibility issue with libffi/riscv in OpenJ9 and the compilation failed at:

/opt/riscv_gnu_toolchain_64bit/lib/gcc/riscv64-unknown-linux-gnu/8.3.0
/../../../../riscv64-unknown-linux-gnu/bin/ld: ../..//libj9vm29.so: 
undefined reference to `__riscv_flush_icache'

As explained by the cross-toolchain developers at http://rodrigo.ebrmx.com/github_/riscv/riscv-gcc/issues/140, this is a versioning problem in riscv-glibc (2.26). To be specific, the cross compiler (gcc 8.3.0) requires __riscv_flush_icache, which was first defined/implemented in glibc 2.27 (https://fossies.org/diffs/glibc/2.26_vs_2.27/manual/platform.texi-diff.html).

Given that there is no clear roadmap for when glibc 2.27 will be ready in the cross toolchain (they mentioned they were working on it, but there has been no progress update since) and there is no code calling __riscv_flush_icache in OpenJ9 for riscv for now, I will try disabling __riscv_flush_icache in the source of the cross-toolchain and recompiling the whole toolchain to see whether it works when compiling OpenJ9; otherwise, we might need to manually extract all the code related to __riscv_flush_icache from glibc 2.27 (https://sourceware.org/git/?p=glibc.git / https://www.gnu.org/software/libc/sources.html) into riscv-glibc (2.26) to fit the needs of the cross compiler gcc 8.3.0.

In addition, simply replacing riscv-glibc in the cross toolchain with glibc 2.27 doesn't help, as the riscv-specific config/settings in riscv-glibc (2.26) differ from glibc 2.27.
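
A quick way to confirm which glibc build actually exports the symbol is to inspect the sysroot's libc directly (a hedged diagnostic; the exact path under the sysroot may differ):

riscv64-unknown-linux-gnu-nm -D /opt/riscv_gnu_toolchain_64bit/sysroot/lib/libc.so.6 \
    | grep riscv_flush_icache    # empty on glibc 2.26, present from 2.27 onwards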

The cross-compiler works well now after recompiling the cross-toolchain with __riscv_flush_icache disabled. So I keep working on a bunch of warn-as-errors detected during compilation, except the code issue with cinterp.m4.

Given that cinterp.m4 relates to the assembly code that invokes the interpreter (in callin), which needs to be written manually against the RISC-V instruction set, I am temporarily leaving these stub files (oti/rv64helpers.m4 & vm/rv64cinterp.m4) in place as the last issue to be addressed once everything else in the compilation is resolved.

Already solved all warn-as-errors on JDK11 & OpenJ9 & OMR and currently adding DDR-related changes (previously disabled in compilation) to see how it goes in compilation.

Just double-checked /runtime/ddr/module.xml and realized that j9ddrgen either must be created on the target platform (Linux/QEMU in our case) or has to be disabled, as there is no way to generate the tool via the local compiler when compiling with the cross-compiler (unlike the trace/hook tools, where the cross-compiler can be overridden for the benefit of the cross-compilation). So there are no more DDR changes for the moment during the cross-compilation.

The other thing to be addressed immediately is still the issue with glibc.
I compiled a simple test (print hello world) with the cross-compiler locally and uploaded it to Fedora-riscv/QEMU (the images at https://fedorapeople.org/groups/risc-v/disk-images/ can be launched directly with QEMU after downloading; no need to compile):

/opt/riscv_gnu_toolchain/bin/riscv64-unknown-linux-gnu-gcc   
--sysroot=/opt/riscv_gnu_toolchain/sysroot  -o test  test.c

and it ended up with failure to locate the required glibc as follows:

# ./test: /lib64/lp64d/libc.so.6: version `GLIBC_2.26' not found (required by ./test)

# uname -a
Linux stage4.fedoraproject.org 4.19.0-rc8 #1 SMP Wed Oct 17 15:11:25 UTC 2018 
riscv64 riscv64 riscv64 GNU/Linux

# ls -l  /lib64/lp64d/libc.so.6
lrwxrwxrwx 1 root root 17 Mar  4  2018 /lib64/lp64d/libc.so.6 -> libc-2.27.9000.so

It means a program compiled against glibc 2.26 must be executed with a matching glibc installed on the target platform.
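
The mismatch can be checked from both sides (a hedged diagnostic: objdump -T lists the version tags the binary requires, strings shows what the target libc exports):

# on the build host: glibc versions required by the cross-compiled binary
riscv64-unknown-linux-gnu-objdump -T test | grep GLIBC_
# on the target: versions exported by the installed libc
strings /lib64/lp64d/libc.so.6 | grep '^GLIBC_2'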

Given that the Fedora/riscv comes with glibc 2.27 at https://secondary.fedoraproject.org/pub/alt/risc-v/RPMS/riscv64/

glibc-2.27.9000-7.fc28.riscv64.rpm                                                     2018-05-11 21:20  3.0M 

and Openembedded/riscv with glibc 2.29
at https://layers.openembedded.org/layerindex/branch/master/layer/openembedded-core/

glibc   2.29    GLIBC (GNU C Library)

I will first try to re-compile the whole cross-toolchain again by replacing glibc 2.26 in there with https://github.com/riscv/riscv-glibc/tree/riscv-glibc-2.27 to see whether it works for us.

Still working on the issue with riscv-glibc 2.27 in the cross-toolchain. There seems to be a versioning problem with the Linux headers, plus code issues with riscv-glibc.2.27/sysdeps/unix/sysv/linux/riscv/flush-icache.c (not used in libffi/riscv). So I need to hack the code/makefiles to remove this stuff and see whether it works.

In addition, it seems there is no gdb support on Fedora/riscv for now. According to the explanation at https://github.com/riscv/riscv-binutils-gdb/issues/157, all of the required patches need to be compiled from scratch.

Double-checked on Fedora/riscv as follows:

Last login: Sun Jan 28 15:59:08 on ttyS0
[root@stage4 ~]# uname -r
4.19.0-rc8
[root@stage4 ~]# uname -a
Linux stage4.fedoraproject.org 4.19.0-rc8 #1 SMP Wed Oct 17 15:11:25 UTC 2018 
riscv64 riscv64 riscv64 GNU/Linux
[root@stage4 ~]# gcc --version
gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[root@stage4 ~]# gdb  --version
-bash: gdb: command not found <------------ no gdb support on Fedora/riscv
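
In the meantime, a cross gdb built on the x86-64 host is one hedged workaround (a standard binutils-gdb cross build; the prefix is an example and, per the issue linked above, extra RISC-V patches may still be needed):

git clone https://sourceware.org/git/binutils-gdb.git
cd binutils-gdb
./configure --target=riscv64-unknown-linux-gnu --prefix=/opt/riscv-gdb
make && make install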

In that case, we might as well prepare everything for OpenEmbedded at https://github.com/riscv/meta-riscv (including the cross-toolchain compiled with glibc 2.29 and the Linux image booted with QEMU), unless the compiled build (built against glibc 2.27) works perfectly on Fedora-riscv/QEMU on the first try (the JVM will most likely crash, as there are probably plenty of errors in the code/script changes).

I just noticed that the branch at https://github.com/riscv/riscv-gnu-toolchain/tree/linux-headers-4.15-rc3 already comes with the latest Linux header files plus glibc 2.27. Instead of hacking the existing code/makefiles on glibc 2.26 (which also works, but might introduce issues in compilation), I will try compiling this branch to see how it goes.

Jan and I spent many months going through similar torture before we got to a point where we could make any progress on the OMR JIT side. Especially gdb: even though it's been claimed by some writers that gdb has been "working" for a while, it took Jan (who is one of the gdb maintainers) a significant amount of effort to get it to be actually useful for real-life debugging.

IMHO the shortest / most effort-effective path to getting a functioning Linux-RV64 dev system is to follow Jan's scripts, for both kernel, root fs, and gdb:
https://github.com/janvrany/riscv-debian
They work both for QEMU emulation of RV64 and on real silicon (at least on SiFive Unleashed for sure). I am not saying this scenario fits everyone (e.g. I don't know whether you may have specific reasons why you need Openembedded), I am saying these scripts summarize the half-year of desperate frustrations Jan and I had, and attempt to give a clear shortcut for others to not have to go through the same long thorny path.

I don't know whether you may have specific reasons why you need Openembedded...

@shingarov, many thanks for your suggestion. But our main effort right now is to generate the first JDK, which must be built via cross-compilation rather than native compilation inside a functioning Linux-RV64 dev system (there is no way to create it inside Linux/QEMU, as the tools/Java files have to be compiled by the local compiler first). In that setup, the Linux headers/glibc offered by the cross-toolchain must stay consistent with their equivalents in Linux/QEMU. So OpenEmbedded (with glibc 2.29) remains our backup option if the cross-toolchain works with glibc 2.29.

Already resolved the glibc issue in the toolchain after replacing the older Linux header files and riscv-glibc (2.26) with the latest ones from https://github.com/riscv/riscv-gnu-toolchain/tree/linux-headers-4.15-rc3; continuing with the cross-compilation of OpenJDK11/OpenJ9/OMR.

Currently working on a verifier-related issue; will get back to this after fixing it.

After grabbing the latest changes on OpenJDK11/OpenJ9/OMR, the compilation failed as follows:

../../../gc_glue_java/PointerArrayObjectScanner.hpp: In member function 'virtual GC_IndexableObjectScanner* GC_PointerArrayObjectScanner::splitTo(MM_EnvironmentBase*, void*, uintptr_t)':
../../../gc_glue_java/PointerArrayObjectScanner.hpp:139:132: 
error: invalid new-expression of abstract class type 'GC_PointerArrayObjectScanner'
   new(splitScanner) GC_PointerArrayObjectScanner(env, _parentObjectPtr, _basePtr, _limitPtr, _endPtr, _endPtr + splitAmount, _flags);

History shows the changes above were introduced by GC in #5684 & #5714 last week, and the compilation passed locally with gcc v5.4 and v7.3.

Given that the cross-compiler version in the toolchain is v8.3, I will:
1) first try recompiling the toolchain, replacing v8.3 with v7.3 from https://github.com/riscv/riscv-gcc/tree/riscv-gcc-7.3.0, to see whether that gets through the compilation;
2) otherwise, modify the GC code to follow the compilation rules of v8.3 (most of the errors relate to unimplemented pure-virtual functions, as explained at https://stackoverflow.com/questions/23827014/invalid-new-expression-of-abstract-class-type).

Already fixed the issue with the GC changes (it turns out the latest GC changes somehow didn't get synchronized into my OpenJ9 branch, which led to a mismatch between OpenJ9 and OMR), and I keep compiling to see whether anything else is wrong or whether only minor issues are left in building the JDK.

I have already fixed most of the script/makefile-related issues in the cross-compilation:
configure.log.txt
build.log.txt

and now need to double-check the validity checks on the cross-toolchain to see whether they really matter in building the JDK, as there seems to be a mismatch between the configure run at the very beginning and the configure run in OMR during compilation, as follows:
[1] configure at the very beginning

checking whether we are using the GNU C compiler... yes
checking whether /opt/riscv_gnu_toolchain_glibc2.27_v3/bin/riscv64-unknown-linux-gnu-gcc accepts -g... yes
checking for /opt/riscv_gnu_toolchain_glibc2.27_v3/bin/riscv64-unknown-linux-gnu-gcc option to accept ISO C89... none needed
checking for riscv64-unknown-linux-gnu-g++... /opt/riscv_gnu_toolchain_glibc2.27_v3/bin/riscv64-unknown-linux-gnu-g++
checking resolved symbolic links for CXX... no symlink
configure: Using gcc C++ compiler version 7.3.0 [riscv64-unknown-linux-gnu-g++ (GCC) 7.3.0]
checking whether we are using the GNU C++ compiler... yes
checking whether /opt/riscv_gnu_toolchain_glibc2.27_v3/bin/riscv64-unknown-linux-gnu-g++ accepts -g... yes
checking how to run the C preprocessor... /opt/riscv_gnu_toolchain_glibc2.27_v3/bin/riscv64-unknown-linux-gnu-gcc -E
checking how to run the C++ preprocessor... /opt/riscv_gnu_toolchain_glibc2.27_v3/bin/riscv64-unknown-linux-gnu-g++ -E
configure: Using gcc linker version 2.32 [GNU ld (GNU Binutils) 2.32]

[2] configure in OMR during the cross-compilation

/usr/bin/make -C omr -f run_configure.mk 'SPEC=linux_riscv64_cmprssptrs_cross' 'OMRGLUE=../gc_glue_java' 'CONFIG_INCL_DIR=../gc_glue_java/configure_includes' 'OMRGLUE_INCLUDES=../oti ../include ../gc_base ../gc_include ../gc_stats ../gc_structs ../gc_base ../include ../oti ../nls ../gc_include ../gc_structs ../gc_stats ../gc_modron_standard ../gc_realtime ../gc_trace ../gc_vlhgc' 'EXTRA_CONFIGURE_ARGS='
make[5]: Entering directory '/root/jchau/temp/RISCV_OPENJ9_v2/openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/vm/omr'
sh configure --disable-auto-build-flag 'OMRGLUE=../gc_glue_java' 'SPEC=linux_riscv64_cmprssptrs_cross'  --enable-OMRPORT_OMRSIG_SUPPORT --enable-OMR_GC --enable-OMR_PORT --enable-OMR_THREAD --enable-OMR_OMRSIG --enable-tracegen --enable-OMR_GC_ARRAYLETS --enable-OMR_GC_DYNAMIC_CLASS_UNLOADING --enable-OMR_GC_MODRON_COMPACTION --enable-OMR_GC_MODRON_CONCURRENT_MARK --enable-OMR_GC_MODRON_SCAVENGER --enable-OMR_GC_CONCURRENT_SWEEP --enable-OMR_GC_SEGREGATED_HEAP --enable-OMR_GC_HYBRID_ARRAYLETS --enable-OMR_GC_LEAF_BITS --enable-OMR_GC_REALTIME --enable-OMR_GC_VLHGC --enable-OMR_PORT_ASYNC_HANDLER --enable-OMR_THR_CUSTOM_SPIN_OPTIONS --enable-OMR_NOTIFY_POLICY_CONTROL --disable-debug 'lib_output_dir=$(top_srcdir)/../lib' 'exe_output_dir=$(top_srcdir)/..' 'GLOBAL_INCLUDES=$(top_srcdir)/../include' --enable-debug --enable-OMR_THR_THREE_TIER_LOCKING --enable-OMR_THR_YIELD_ALG --enable-OMR_THR_SPIN_WAKE_CONTROL --enable-OMRTHREAD_LIB_UNIX --enable-OMR_ARCH_RISCV --enable-OMR_ENV_LITTLE_ENDIAN --enable-OMR_PORT_CAN_RESERVE_SPECIFIC_ADDRESS --enable-OMR_GC_IDLE_HEAP_MANAGER --enable-OMR_GC_TLH_PREFETCH_FTA --enable-OMR_GC_CONCURRENT_SCAVENGER --enable-OMR_GC_ARRAYLETS --host=riscv64-unknown-linux-gnu --enable-OMR_ENV_DATA64 'OMR_TARGET_DATASIZE=64' --enable-OMR_GC_COMPRESSED_POINTERS --enable-OMR_INTERP_COMPRESSED_OBJECT_HEADER --enable-OMR_INTERP_SMALL_MONITOR_SLOT --build=x86_64-pc-linux-gnu 'OMR_CROSS_CONFIGURE=yes' 'AR=riscv64-unknown-linux-gnu-ar' 'AS=riscv64-unknown-linux-gnu-as' 'CC=riscv64-unknown-linux-gnu-gcc --sysroot=/opt/riscv_gnu_toolchain_glibc2.27_v3/sysroot' 'CXX=riscv64-unknown-linux-gnu-g++ --sysroot=/opt/riscv_gnu_toolchain_glibc2.27_v3/sysroot' 'OBJCOPY=riscv64-unknown-linux-gnu-objcopy' libprefix=lib exeext= solibext=.so arlibext=.a objext=.o 'CCLINKEXE=$(CC)' 'CCLINKSHARED=$(CC)' 'CXXLINKEXE=$(CXX)' 'CXXLINKSHARED=$(CXX)' 'OMR_HOST_OS=linux' 'OMR_HOST_ARCH=riscv' 'OMR_TOOLCHAIN=gcc' 'OMR_BUILD_TOOLCHAIN=gcc' 'OMR_TOOLS_CC=gcc' 'OMR_TOOLS_CXX=g++' 'OMR_BUILD_DATASIZE=64' 
checking build system type... x86_64-pc-linux-gnu
checking host system type... riscv64-unknown-linux-gnu
checking OMR_HOST_OS... linux
checking OMR_HOST_ARCH... riscv
checking OMR_TARGET_DATASIZE... 64
checking OMR_TOOLCHAIN... gcc
checking for riscv64-unknown-linux-gnu-gcc... (cached) /opt/riscv_gnu_toolchain_glibc2.27_v3/bin/riscv64-unknown-linux-gnu-gcc
checking whether we are using the GNU C compiler... no  <--------
checking whether /opt/riscv_gnu_toolchain_glibc2.27_v3/bin/riscv64-unknown-linux-gnu-gcc accepts -g... no
checking for /opt/riscv_gnu_toolchain_glibc2.27_v3/bin/riscv64-unknown-linux-gnu-gcc option to accept ISO C89... unsupported
checking whether we are using the GNU C++ compiler... no  <------
checking whether riscv64-unknown-linux-gnu-g++ --sysroot=/opt/riscv_gnu_toolchain_glibc2.27_v3/sysroot accepts -g... no
checking for numa.h... no

against the configure in OMR during the compilation on X86

/usr/bin/make -C omr -f run_configure.mk 'SPEC=linux_x86-64_cmprssptrs' 'OMRGLUE=../gc_glue_java' 'CONFIG_INCL_DIR=../gc_glue_java/configure_includes' 'OMRGLUE_INCLUDES=../oti ../include ../gc_base ../gc_include ../gc_stats ../gc_structs ../gc_base ../include ../oti ../nls ../gc_include ../gc_structs ../gc_stats ../gc_modron_standard ../gc_realtime ../gc_trace ../gc_vlhgc' 'EXTRA_CONFIGURE_ARGS='
make[5]: Entering directory '/root/jchau/temp/openj9-openjdk-jdk11/build/linux-x86_64-normal-server-release/vm/omr'
sh configure --disable-auto-build-flag 'OMRGLUE=../gc_glue_java' 'SPEC=linux_x86-64_cmprssptrs'  --enable-OMRPORT_OMRSIG_SUPPORT --enable-OMR_GC --enable-OMR_PORT --enable-OMR_THREAD --enable-OMR_OMRSIG --enable-tracegen --enable-OMR_GC_ARRAYLETS --enable-OMR_GC_DYNAMIC_CLASS_UNLOADING --enable-OMR_GC_MODRON_COMPACTION --enable-OMR_GC_MODRON_CONCURRENT_MARK --enable-OMR_GC_MODRON_SCAVENGER --enable-OMR_GC_CONCURRENT_SWEEP --enable-OMR_GC_SEGREGATED_HEAP --enable-OMR_GC_HYBRID_ARRAYLETS --enable-OMR_GC_LEAF_BITS --enable-OMR_GC_REALTIME --enable-OMR_GC_SCAVENGER_DELEGATE --enable-OMR_GC_STACCATO --enable-OMR_GC_VLHGC --enable-OMR_PORT_ASYNC_HANDLER --enable-OMR_THR_CUSTOM_SPIN_OPTIONS --enable-OMR_NOTIFY_POLICY_CONTROL --disable-debug 'lib_output_dir=$(top_srcdir)/../lib' 'exe_output_dir=$(top_srcdir)/..' 'GLOBAL_INCLUDES=$(top_srcdir)/../include' --enable-debug --enable-OMR_THR_THREE_TIER_LOCKING --enable-OMR_THR_YIELD_ALG --enable-OMR_THR_SPIN_WAKE_CONTROL --enable-OMRTHREAD_LIB_UNIX --enable-OMR_ARCH_X86 --enable-OMR_ENV_DATA64 --enable-OMR_ENV_LITTLE_ENDIAN --enable-OMR_GC_COMPRESSED_POINTERS --enable-OMR_GC_IDLE_HEAP_MANAGER --enable-OMR_GC_TLH_PREFETCH_FTA --enable-OMR_GC_CONCURRENT_SCAVENGER --enable-OMR_INTERP_COMPRESSED_OBJECT_HEADER --enable-OMR_INTERP_SMALL_MONITOR_SLOT --enable-OMR_PORT_CAN_RESERVE_SPECIFIC_ADDRESS --enable-OMR_PORT_NUMA_SUPPORT libprefix=lib exeext= solibext=.so arlibext=.a objext=.o 'AR=ar' 'AS=as' 'CC=gcc' 'CCLINKEXE=gcc' 'CCLINKSHARED=gcc' 'CXX=g++' 'CXXLINKEXE=g++' 'CXXLINKSHARED=g++' 'RM=rm -f' 'OMR_HOST_OS=linux' 'OMR_HOST_ARCH=x86' 'OMR_TARGET_DATASIZE=64' 'OMR_TOOLCHAIN=gcc' 
configure: WARNING: unrecognized options: --enable-OMR_GC_STACCATO
checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking OMR_HOST_OS... linux
checking OMR_HOST_ARCH... x86
checking OMR_TARGET_DATASIZE... 
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether /usr/bin/gcc accepts -g... yes
checking for /usr/bin/gcc option to accept ISO C89... none needed

Comparing the configuration logs of OMR and OpenJDK indicates the sysroot path didn't get picked up when validating the cross-compiler on the OMR side:

configure:4753: checking whether we are using the GNU C compiler
configure:4772: /opt/riscv_gnu_toolchain/bin/riscv64-unknown-linux-gnu-gcc -c   conftest.c >&5
configure:4772: $? = 0
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "OMR"
...
configure:4781: result: no

against the config.log in OpenJDK

configure:37042: checking whether we are using the GNU C compiler
----> configure:37061: /opt/riscv_gnu_toolchain/bin/riscv64-unknown-linux-gnu-gcc -c  
--sysroot=/opt/riscv_gnu_toolchain/sysroot  --sysroot=/opt/riscv_gnu_toolchain/sysroot conftest.c >&5
configure:37061: $? = 0
configure:37070: result: yes

Will further investigate to figure out how to fix the issue.

Investigation shows the misleading checks on the cross-compiler were automatically generated by autoconf via the following macros in the configure script:
/omr/configure.ac

AC_PROG_CC()  // Determine a C compiler to use. If CC is not already set in the environment, check for gcc and cc, then for other C compilers. Set output variable CC to the name of the compiler found.
AC_PROG_CXX() // Determine a C++ compiler to use. Check whether the environment variable CXX or CCC (in that order) is set; if so, then set output variable CXX to its value.

These checks are meaningless in our case, as all of this setup was already done earlier via the OpenJDK configuration, and the cross-compilers work well during the later compilation.
So the macros should be ignored in the case of cross-compilation on RISC-V.

Now move forward to address all code-specific issues during execution.

After uploading the compiled build to Fedora-riscv/QEMU, it failed with the following errors when running java -version, and crashed somewhere after that.

snippet of the snap trace (e.g.):

0x38f00    j9vm.519        > initializeImpl for java/lang/String (0000000000045D00)
0x38f00    j9vm.520         - call preinit hook
0x38f00    j9vm.20          > sendClinit
0x38f00    j9vm.179          > javaLookupMethod(vmStruct 0000000000038F00, targetClass 0000000000045D00, nameAndSig 0000002000B7DDF0, senderClass 0000000000000000, lookupOptions 24612)
0x38f00    j9vm.180           - javaLookupMethod - methodName <clinit>
0x38f00    j9vm.538           - searching methods from 0000000000043938 using linear search
0x38f00    j9vm.181          < exit javaLookupMethod resultMethod 0000000000044BB8
0x38f00    j9vm.222          - sendClinit - class java/lang/String
0x38f00    j9vm.21  ------->  < sendClinit (no <clinit> found)

compared with the trace on x86/AMD64:

0x12cea00    j9vm.519     > initializeImpl for java/lang/String (00000000012DB500)
0x12cea00    j9vm.520      - call preinit hook
0x12cea00    j9vm.20       > sendClinit
0x12cea00    j9vm.179       > javaLookupMethod(vmStruct 00000000012CEA00, targetClass 00000000012DB500, nameAndSig 00007F039D8CD8A0, senderClass 0000000000000000, lookupOptions 24612)
0x12cea00    j9vm.180        - javaLookupMethod - methodName <clinit>
0x12cea00    j9vm.538        - searching methods from 00000000012D9138 using linear search
0x12cea00    j9vm.181       < exit javaLookupMethod resultMethod 00000000012DA3B8
0x12cea00    j9vm.222       - sendClinit - class java/lang/String
0x12cea00    j9vm.580       > resolveStaticFieldRef(method=0000000000000000, ramCP=00000000012DA3E0, cpIndex=222, flags=20, returnAddress=0000000000000000)
0x12cea00    j9vm.142        > resolveClassRef(ramCP=00000000012DA3E0, cpIndex=83, flags=20)
...

Looking at the callin code as follows:

void JNICALL
sendClinit(J9VMThread *currentThread, J9Class *clazz)
{
    Trc_VM_sendClinit_Entry(currentThread);
    J9VMEntryLocalStorage newELS;
    if (buildCallInStackFrame(currentThread, &newELS, false, false)) {
        /* Lookup the method */
        J9Method *method = (J9Method*)javaLookupMethod(currentThread, clazz, (J9ROMNameAndSignature*)&clinitNameAndSig, NULL, J9_LOOK_STATIC | J9_LOOK_NO_CLIMB | J9_LOOK_NO_THROW | J9_LOOK_DIRECT_NAS);
        /* If the method was found, run it */
        if (NULL != method) {
            Trc_VM_sendClinit_forClass(
                    currentThread,
                    J9UTF8_LENGTH(J9ROMCLASS_CLASSNAME(clazz->romClass)),
                    J9UTF8_DATA(J9ROMCLASS_CLASSNAME(clazz->romClass)));
            currentThread->returnValue = J9_BCLOOP_RUN_METHOD;
            currentThread->returnValue2 = (UDATA)method;
            c_cInterpreter(currentThread); <------- to be implemented in the case of RISC-V
        }
        restoreCallInFrame(currentThread);
    }
    Trc_VM_sendClinit_Exit(currentThread);  <------- < sendClinit (no <clinit> found) was sent out from here
}

The failure above indicates the code in c_cInterpreter(currentThread) was missing on RISC-V.

As discussed with @gacholio previously, given that the JIT is disabled, the code here (directly calling the interpreter function) can be written as a C wrapper with an assertion to ensure one of the two return values is "return from callin"; a minimal sketch follows. I will compile the build to see how far it goes with this piece of code.
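
A minimal sketch of the idea, with assumptions flagged: vm->bytecodeLoop is taken as the C interpreter entry point and J9_BCLOOP_EXIT_INTERPRETER stands in for the "return from callin" code - both names are assumptions here rather than confirmed signatures, since the real glue is normally platform assembly.

#include "j9.h"

void
c_cInterpreter(J9VMThread *currentThread)
{
    J9JavaVM *vm = currentThread->javaVM;
    /* returnValue/returnValue2 were preset by the caller,
     * e.g. J9_BCLOOP_RUN_METHOD + the J9Method in sendClinit */
    UDATA rc = vm->bytecodeLoop(currentThread);
    /* with the JIT disabled, the only expected way back out of the
     * interpreter is "return from callin" (assumed constant name) */
    Assert_VM_true(J9_BCLOOP_EXIT_INTERPRETER == rc);
}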

The newly added C code for c_cInterpreter(currentThread) seems to work, but the VM ended up hanging somewhere later on.

[root@stage4]# uname -a
Linux stage4.fedoraproject.org 4.19.0-rc8 #1 SMP Wed Oct 17 15:11:25 UTC 2018 
riscv64 riscv64 riscv64 GNU/Linux

[root@stage4]# jdk11_rv64_openj9/bin/java -Xtrace:iprint=all  -version
...
0x38f00   j9vm.519   > initializeImpl for java/lang/String (0000000000045D00)
0x38f00   j9vm.520    - call preinit hook
0x38f00   j9vm.20     > sendClinit
0x38f00   j9vm.179     > javaLookupMethod(vmStruct 0000000000038F00, targetClass 0000000000045D00, nameAndSig 0000002000B7DDF0, senderClass 0000000000000000, lookupOptions 24612)
0x38f00   j9vm.180      - javaLookupMethod - methodName <clinit>
0x38f00   j9vm.538      - searching methods from 0000000000043938 using linear search
0x38f00   j9vm.181     < exit javaLookupMethod resultMethod 0000000000044BB8
0x38f00   j9vm.222     - sendClinit - class java/lang/String
0x38f00   j9vm.580     > resolveStaticFieldRef(method=0000000000000000, ramCP=0000000000044BE0, cpIndex=222, flags=20, returnAddress=0000000000000000)
0x38f00   j9vm.142      > resolveClassRef(ramCP=0000000000044BE0, cpIndex=83, flags=20)
...
0x38f00      j9vm.124   < getMethodOrFieldID --> result=000000200419FD18
0x38f00     j9jcl.265   - Java_sun_misc_Unsafe_registerNatives
0x38f00   omrport.333   > omrmem_allocate_memory byteAmount=480 callSite=jnimisc.cpp:827
0x38f00   omrport.322   < omrmem_allocate_memory returns 00000020040099C0
0x38f00      j9vm.361   > Attempting to acquire exclusive VM access.
0x38f00      j9vm.366    - First thread to try for exclusive access. Setting the exclusive access state to J9_XACCESS_PENDING
<----- hang occurred somewhere
...

Need to figure out what happened in there.

After adding tracepoints in code, the snaptrace shows it got stuck on mprotect(addr, pageSize, PROT_READ | PROT_WRITE) when calling flushProcessWriteBuffers at acquireExclusiveVMAccess() as follows:

19:26:22.079 0x38f00   j9vm.361    > Attempting to acquire exclusive VM access.
19:26:22.080 0x38f00   j9vm.366     - First thread to try for exclusive access. Setting the exclusive access state to J9_XACCESS_PENDING
19:26:22.096 0x38f00   j9vm.600     - acquireExclusiveVMAccess: CALL flushProcessWriteBuffers
19:26:22.134 0x38f00   j9vm.604     > -->Enter flushProcessWriteBuffers
19:26:22.497 0x38f00   j9vm.605      - flushProcessWriteBuffers: ENTER omrthread_monitor_enter
19:26:22.500 0x38f00   j9vm.606      - flushProcessWriteBuffers: EXIT omrthread_monitor_enter
19:26:22.501 0x38f00   j9vm.607 ---> flushProcessWriteBuffers: ENTER mprotect PROT_READ | PROT_WRITE
<----- no EXIT from mprotect

against the code in acquireExclusiveVMAccess at runtime/vm/VMAccess.cpp

acquireExclusiveVMAccess(J9VMThread * vmThread)
{
#if defined(J9VM_INTERP_TWO_PASS_EXCLUSIVE)
#if defined(J9VM_INTERP_ATOMIC_FREE_JNI_USES_FLUSH)
            flushProcessWriteBuffers(vm);  <-------------------------
#endif /* J9VM_INTERP_ATOMIC_FREE_JNI_USES_FLUSH */
            Assert_VM_true(0 == vm->exclusiveAccessResponseCount);
#endif /* J9VM_INTERP_TWO_PASS_EXCLUSIVE */
...

and flushProcessWriteBuffers at runtime/vm/FlushProcessWriteBuffers.cpp

#if defined(J9VM_INTERP_ATOMIC_FREE_JNI_USES_FLUSH)
flushProcessWriteBuffers(J9JavaVM *vm)
{
...
#elif defined(LINUX) || defined(AIXPPC) /* WIN32 */
    if (NULL != vm->flushMutex) {
        omrthread_monitor_enter(vm->flushMutex);
        void *addr = vm->exclusiveGuardPage.address;
        UDATA pageSize = vm->exclusiveGuardPage.pageSize;
-----> int mprotectrc = mprotect(addr, pageSize, PROT_READ | PROT_WRITE);

As discussed with @gacholio, this is a new feature called "Enable atomic-free JNI" that was implemented on all platforms except ARM and OSX. Given that both ARM and RISC-V belong to the family of RISC (Reduced Instruction Set Computer) architectures, this feature should be disabled on RISC-V for now. I will try to modify the config/settings to see whether it works without the feature.
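
As a side note on the mechanism: the guard-page trick can be reproduced in a standalone sketch. Toggling a page's protection forces the kernel to interrupt every thread of the process (a TLB shootdown), which acts as a process-wide write-buffer flush. This only illustrates the trick used in FlushProcessWriteBuffers.cpp, assuming the RW/NONE toggle shown above; it is not the VM code itself.

#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t pageSize = (size_t)sysconf(_SC_PAGESIZE);
    void *page = mmap(NULL, pageSize, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(MAP_FAILED != page);
    /* flipping the protection forces an IPI to all CPUs running
     * threads of this process, draining their store buffers */
    assert(0 == mprotect(page, pageSize, PROT_READ | PROT_WRITE));
    assert(0 == mprotect(page, pageSize, PROT_NONE));
    munmap(page, pageSize);
    return 0;
}

If this small program also hangs under QEMU, that would point at the emulated mprotect path rather than at the VM.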

After fixing up a few minor issues during compilation, the cross-built JDK is functionally working now on the Linux-RISCV/QEMU (downloaded from https://fedorapeople.org/groups/risc-v/disk-images/) as follows:

[root@stage4 RISCV_OPENJ9]# uname -a
Linux stage4.fedoraproject.org 4.19.0-rc8 #1 SMP Wed Oct 17 15:11:25 UTC 2018 
riscv64 riscv64 riscv64 GNU/Linux

[root@stage4 RISCV_OPENJ9]# jdk11_rv64_openj9_v16/bin/java  -version
openjdk version "11.0.4-internal" 2019-07-16
OpenJDK Runtime Environment (build 11.0.4-internal+0-adhoc.jincheng.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build riscv_openj9_v2_uma-43a37e3, JRE 11 Linux riscv64-64-Bit Compressed References 20190528_000000 (JIT disabled, AOT disabled)
OpenJ9   - 43a37e3
OMR      - 0dd3de9
JCL      - a699a14 based on jdk-11.0.4+4)

[root@stage4 RISCV_OPENJ9]# cat  HelloRiscv.java
public class HelloRiscv {
   public static void main(String[] args) {
      System.out.println("Hello, Linux/RISC-V");
   }
}

[root@stage4 RISCV_OPENJ9]# jdk11_rv64_openj9_v16/bin/javac  HelloRiscv.java
[root@stage4 RISCV_OPENJ9]# jdk11_rv64_openj9_v16/bin/java  HelloRiscv
Hello, Linux/RISC-V

Given that DDR and other required libraries (X11, etc.) are disabled in the cross-build (due to the lack of support in the GNU cross-toolchain), I will start setting up the environment inside Linux-RISCV/QEMU by installing all packages required for native compilation, to see whether it is feasible to compile a full-featured native build with the cross-built JDK above as the boot JDK.

Congrats @ChengJin01 on this major milestone!

Already set up the environment in the Linux-RISCV/QEMU and currently working on the following issue detected during the compilation:

java.lang.UnsatisfiedLinkError: awt (Not found in com.ibm.oti.vm.bootstrap.library.path)
        at java.base/java.lang.ClassLoader.loadLibraryWithPath(ClassLoader.java:1707)
...
        at build.tools.icondata.awt.ToBin.main(ToBin.java:35)
gmake[3]: *** [GensrcIcons.gmk:109: /root/RISCV_OPENJ9/openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/support/gensrc/java.desktop/sun/awt//AWTIcon32_java_icon16_png.java] Error 1
gmake[2]: *** [make/Main.gmk:112: java.desktop-gensrc-src] Error 2

There were a bunch of exceptions on failing to locate the native awt library (lib/libawt.so) in the boot JDK, as it can't be generated in the cross-build due to the lack of X11 support in the cross-toolchain.

Given that the native build should be able to generate the native awt library in the later compilation, I will try to modify the related makefile scripts to see whether the following steps help to fix the issue:
1) disable the code that loads the native awt library at the java level (when the boot JDK is a cross-build), so as to enable compiling the native awt library later for the first native build.
2) if everything works fine, replace the cross-build with the compiled native build (which already contains the native awt library) as the boot JDK and re-enable the java loading code above, so as to compile another native build.

@ChengJin01 does running with -Djava.awt.headless=true help? It puts the VM into an headless mode which may avoid the need for AWT

@DanHeidinga, unfortunately it doesn't work, as the -Djava.awt.headless=true option is already specified in the existing script code according to the compilation log. e.g.

Calling /root/RISCV_OPENJ9/jdk11_rv64_openj9_v16/bin/java 
-XX:+UseSerialGC -Xms32M -Xmx512M -XX:TieredStopAtLevel=1 
-Duser.language=en -Duser.country=US -Xshare:auto 
-Ddtd_home=/root/RISCV_OPENJ9/openj9-openjdk-jdk11/make/data/dtdbuilder 
-Djava.awt.headless=true <------------------ already specified here
-cp /root/RISCV_OPENJ9/openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/buildtools/jdk_
tools_classes build.tools.dtdbuilder.DTDBuilder html32 ...
Exception in thread "main" java.lang.UnsatisfiedLinkError: awt (Not found in com.ibm.oti.vm.bootstrap.library.path)
        at java.base/java.lang.ClassLoader.loadLibraryWithPath(ClassLoader.java:1707)
        at java.base/java.lang.ClassLoader.loadLibraryWithClassLoader(ClassLoader.java:1672)
        at java.base/java.lang.System.loadLibrary(System.java:613)
        at java.desktop/java.awt.Toolkit$3.run(Toolkit.java:1395)
...

It seems the java code in OpenJDK (specifically loadLibraries() at src/java.desktop/share/classes/java/awt/Toolkit.java) is forced to load the native library as long as X11 is enabled.

Hi,
I have successfully managed to compile @ChengJin01 openjdk & openj9 on HiFive Unleashed (i.e., on a physical hardware)

jv@unleashed:~/Projects/J9/openj9-openjdk-jdk11$ pwd
/home/jv/Projects/J9/openj9-openjdk-jdk11
jv@unleashed:~/Projects/J9/openj9-openjdk-jdk11$ uname -a
Linux unleashed 5.0.0-rc1-00028-g0a657e0d72f0 #2 SMP Sun Feb 17 07:27:02 GMT 2019 riscv64 GNU/Linux                                                                         
jv@unleashed:~/Projects/J9/openj9-openjdk-jdk11$ head -n 6 /proc/cpuinfo
processor       : 0
hart            : 1
isa             : rv64imafdc
mmu             : sv39
uarch           : sifive,rocket0

jv@unleashed:~/Projects/J9/openj9-openjdk-jdk11$ cd build/linux-riscv64-normal-server-release/images/jdk/                                                                   

jv@unleashed:~/Projects/J9/openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk$ ./bin/java -version                                                   
openjdk version "11.0.4-internal" 2019-07-16
OpenJDK Runtime Environment (build 11.0.4-internal+0-adhoc.jv.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build riscv_openj9_v2_uma-25784ea1f, JRE 11 Linux riscv64-64-Bit Compressed References 20190604_000000 (JIT disabled, AOT disabled)                      
OpenJ9   - 25784ea1f
OMR      - 0dd3de90
JCL      - 6f627e2338 based on jdk-11.0.4+4)

jv@unleashed:~/Projects/J9/openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk$ javac /tmp/HelloRISCV.java    

jv@unleashed:~/Projects/J9/openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk$ cat /tmp/HelloRISCV.java 
public class HelloRISCV {
   public static void main(String[] args) {
      System.out.println("Hello, Linux/RISC-V");
   }
}

jv@unleashed:~/Projects/J9/openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk$ ./bin/javac /tmp/HelloRISCV.java                                    

jv@unleashed:~/Projects/J9/openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk$ ./bin/java -cp /tmp HelloRISCV      
Hello, Linux/RISC-V
jv@unleashed:~/Projects/J9/openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk$ 

This is a native build on Debian (no cross compilation). AWT seems to be working just fine, tried

./bin/java -jar demo/jfc/SwingSet2/SwingSet2.jar

window opens as expected. However, it is very, very slow - that's expected as there's no JIT (not yet).

The whole process was fairly straightforward, essentially just ../configure and then make all.
You did an amazing job, @ChengJin01

@janvrany , really appreciate your confirmation of the compilation on the physical machine ahead of us.

Here're a few questions I'd like to check with you:
1) Where is your boot JDK? It was supposed to be built with the cross-toolchain, because there is no way to compile a native build without the cross-build as the boot JDK.
2) If the native build was compiled via the cross-build, it should theoretically complain with an exception loading the native awt library, because I already disabled the generation of this library in the cross-build (lack of X11 support in the cross-toolchain), which means there is no awt support in the cross-build.
3) please help to check whether the following lines exist in your build.log for both the native build and the cross-build. e.g.

Compiling 2788 files for java.desktop
...
Creating support/modules_libs/java.desktop/libawt_xawt.so from 57 file(s)
Creating support/modules_libs/java.desktop/libawt.so from 73 file(s)
Creating support/modules_libs/java.desktop/libawt_headless.so from 26 file(s)

These messages above shouldn't exist on the cross-build.

4) as for ./bin/java -jar demo/jfc/SwingSet2/SwingSet2.jar, did all the pictures show up correctly, including the interlaced ones? If yes, that means the JNI/riscv_ffi works correctly.

1. Where is your boot JDK? It was supposed to be built with the cross-toolchain, because there is no way to compile a native build without the cross-build as the boot JDK.

You caught me. I cheated. I did not want to go through the pain of a cross-compilation environment - I have had enough of that, as @shingarov mentioned above. So instead I use OpenJDK 11 in zero mode which is available in Debian's riscv64 repos. That works just fine, but it is very slow and I'm impatient, so I cheated even more and created a "fake" JDK 11 by providing java (javac and so on) commands that run real java on a remote x86-64 server using ssh. I guess you don't like it but it did work. The latter is not needed, you just need to wait loooooonger.

2. If the native build was compiled via the cross-build, it should theoretically complain with an exception loading the native awt library, because I already disabled the generation of this library in the cross-build (lack of X11 support in the cross-toolchain), which means there is no awt support in the cross-build.

As I said, no cross-build was involved and my RISC-V Debian has all X11, ALSA, CUPS and what not libraries ready.

3. please help to check whether the following lines exist in your build.log for both the native build and the cross-build. e.g.

Hmm...I cannot find them. But maybe that's because I re-ran make all a few times as I was tuning NFS params for my fake JDK (NFS attribute caching does not play well with such a hack). I'll remove everything and launch the build tonight, will let you know tomorrow.

4. as for `./bin/java -jar demo/jfc/SwingSet2/SwingSet2.jar`, did all the pictures show up correctly, including the interlaced ones? If yes, that means the JNI/riscv_ffi works correctly.

Not sure which pictures exactly, but see attached screenshots of what I see.

Screenshot from 2019-06-04 17-34-34
Screenshot from 2019-06-04 17-35-17

HTH

So instead I use OpenJDK 11 in zero mode which is available in Debian's riscv64 repos. That works just fine, but it is very slow and I'm impatient, so I cheated even more and created a "fake" JDK 11 by providing ...

That explains why you could get past the disturbing issue with the native awt library, as it is provided by the OpenJDK zero-mode build on Debian.

In any case, we still prefer to compile a full-featured native build via the cross-build with our own changes rather than by other means, which helps to confirm that cross-builds and native builds can be compiled from the same changes.

Hmm...I cannot find them. But maybe that's because I re-ran make all a few times as I was tuning NFS params for my fake JDK...

These messages should show up in the build.log of your native build. Please check whether the following libraries exist in your native build:

lib/libawt_xawt.so
lib/libawt.so
lib/libawt_headless.so

If not, the compilation of java.desktop doesn't look correct, and all awt-related native libraries are missing from the native build (I am still working on this part, so the changes there are still to be updated).

Not sure which pictures exactly, but see attached screenshots of what I see.

The pictures are correct, which confirms that the changes to JNI/riscv_ffi work correctly.

In any case, we still prefer to compile a full-featured native build via the cross-build with our own changes rather than by other means, which helps to confirm that cross-builds and native builds can be compiled from the same changes.

Sure, I was just curious what it would take. In the long run one has to do it the proper way, no question about it.

Hmm...I cannot find them. But maybe that's because I re-ran make all a few times as I was tuning NFS params for my fake JDK...

lib/libawt_xawt.so
lib/libawt.so
lib/libawt_headless.so

Yes, they do exist in my .../images/jdk/lib directory.

Yes, they do exist in my .../images/jdk/lib directory.

That means the native build was compiled correctly with the OpenJDK/zero-mode on Debian/riscv64.

It seems the steps mentioned previously (splitting the compilation into two steps) only work for case 1) but fail for case 2):
case 1): the JNI-related java code that doesn't depend on the native awt library
case 2): the JNI-related java code that must load the native awt library to call the native method initIDs(), but the native awt library doesn't exist in the cross-build.

The detailed explanation is at src/java.desktop/share/classes/java/awt/Toolkit.java as follows:

    /**
     * Initialize JNI field and method ids
     */
    private static native void initIDs();

    /**
     * WARNING: This is a temporary workaround for a problem in the
     * way the AWT loads native libraries. A number of classes in the
     * AWT package have a native method, initIDs(), which initializes
     * the JNI field and method ids used in the native portion of
     * their implementation.
     *
     * Since the use and storage of these ids is done by the
     * implementation libraries, the implementation of these method is
     * provided by the particular AWT implementations (for example,
     * "Toolkit"s/Peer), such as Motif, Microsoft Windows, or Tiny. The
     * problem is that this means that the native libraries must be
     * loaded by the java.* classes, which do not necessarily know the
     * names of the libraries to load. A better way of doing this
     * would be to provide a separate library which defines java.awt.*
     * initIDs, and exports the relevant symbols out to the
     * implementation libraries.
     *
     * For now, we know it's done by the implementation, and we assume
     * that the name of the library is "awt".  -br.
     *
     * If you change loadLibraries(), please add the change to
     * java.awt.image.ColorModel.loadLibraries(). Unfortunately,
     * classes can be loaded in java.awt.image that depend on
     * libawt and there is no way to call Toolkit.loadLibraries()
     * directly.  -hung
     */
    private static boolean loaded = false;
    static void loadLibraries() {
        if (!loaded) {
            java.security.AccessController.doPrivileged(
                new java.security.PrivilegedAction<Void>() {
                    public Void run() {
                        System.loadLibrary("awt");
                        return null;
                    }
                });
            loaded = true;
        }
    }

As mentioned in the comment above,

     * A better way of doing this would be to provide a separate library which defines java.awt.*
     * initIDs, and exports the relevant symbols out to the implementation libraries.

which means we might have to move initIDs() into a separate library instead of awt, in which case it can be generated in the cross-build. Another question is whether this covers all the situations we encounter here (need to investigate case by case).
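
To make the idea concrete, here is a hedged sketch of what the native side of such a separate awt_init library would export: just the initIDs entry points that the java code resolves (the two symbol names match the grep results shown further below). The bodies here are placeholders; the real implementations cache JNI field/method IDs.

#include <jni.h>

JNIEXPORT void JNICALL
Java_java_awt_Toolkit_initIDs(JNIEnv *env, jclass clazz)
{
    /* placeholder; the real implementation caches JNI field/method IDs */
}

JNIEXPORT void JNICALL
Java_sun_java2d_Disposer_initIDs(JNIEnv *env, jclass disposerClass)
{
    /* likewise for sun.java2d.Disposer */
}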

I need to get involved in a job related to the shared classes optimization (one month or so) and will get back to this after it is done.

Hi @ChengJin01 ,
thank you and everyone else for the effort on this matter.

I am trying to repeat what you reported above (running a cross-compiled binary in QEMU/Fedora), but coming from the hardware background I seem to be having some very basic issues...

I do have a RISCV toolchain set (on top of Mint 18), I have been using it for a while now, and I do get through the configuration part, having put together all the dependencies that ./configure asked for, including the freemarker.jar and bootjdk11. However, when I try pointing make to use riscv64-unknown-linux-gnu-gcc, I get things I am, unfortunately, incapable of dealing with:

../../omrmakefiles/rules.mk:372: recipe for target 'ut_omrmm.o' failed
GNUmakefile:304: recipe for target 'gc/base' failed
GNUmakefile:232: recipe for target 'mainbuild' failed
GNUmakefile:32: recipe for target 'phase_omr' failed
/home/apaj/openj9/openj9-openjdk-jdk11/closed/OpenJ9.gmk:530: recipe for target 'build-j9' failed
/home/apaj/openj9/openj9-openjdk-jdk11/closed/make/Main.gmk:49: recipe for target 'j9vm-build' failed

I tried looking around for advice on what else is required for the RISCV cross-compilation environment, but I keep getting directed to the plain old riscv-isa-sim, riscv-pk, riscv-gnu-toolchain, like in the lowRISC blog. I do sense that this has to do with Yocto/OpenEmbedded, I even tried following instructions from this pdf, but... I am, and I am truly sorry for this, simply failing to grasp the concept, having no experience in advanced software topics such as cross-compilation.

So... is there any chance somebody could direct me to a place or maybe even here put together the steps that lead to successful make with riscv64-unknown-linux-gnu-gcc?

Thanks, everyone.

I am trying to repeat what you reported above (running a cross-compiled binary in QEMU/Fedora), but coming from the hardware background I seem to be having some very basic issues...

Hi @apaj,
We have not yet merged/publicized all our changes to OpenJDK & OpenJ9 & OMR (plus the compilation steps) because there is still a bunch of work left to be addressed on the native compilation. A few OMR JIT developers were able to compile a cross-build after downloading our changes privately, only for their tests and further work on the JIT side.

So you will end up with compilation failures, which is expected without these changes, even if you can fix the configuration part at ./configure on your own.

In addition, the cross-toolchain (plus any related documentation online) is poorly maintained, which means it needs to be specially customized for cross-compiling OpenJ9.

@apaj, we can send you the basic compilation steps & links to the changes to get started if you are interested in compiling a cross-build (without JIT) from scratch for private use (please provide an e-mail to me for confirmation).

I notice the cross-toolchain at https://github.com/riscv/riscv-gnu-toolchain has changed recently. Need to double-check whether the latest changes work correctly to compile a cross-build with minimal adjustment (specifically whether the linux-headers (5.0), riscv-gcc (8.3.0) and riscv-glibc in there match during compilation) before getting started on the issue with the native awt library in the native compilation.

Currently refactoring some code in the bytecode verifier to address an issue with illegal class names. Will get back to this once it is finished.

Hello @ChengJin01 ,

thank you so much for responding and being willing to share. My current work on this topic is for testing, but I am working for a company, so I guess it qualifies as commercial work. However, we don't make changes at this level of software, so no proprietary IP would come of this, i.e. everything would remain open source - if that helps. If it does, the address is aleksandar [dot] pajkanovic {at} gmail

If not, I'll try repeating @janvrany 's trick on Debian. I already did some work with Java Zero on Fedora-RISCV, so I hope it shouldn't be much different on Debian. I just hoped to avoid that, since there are no filesystem images for QEMU and I don't have experience with creating a chroot.

Still, thank you very much, I am looking forward to future updates.

@apaj , the compilation steps were sent to your e-mail (please confirm once you receive that) as requested. Let me know if anything confusing.

Hi @ChengJin01 ,
I confirm I received the email.
Thanks a lot, I'll dive right in and let you know how it goes.

@apaj
Yeah, building suitable Debian RISC-V images is a painful and frustrating exercise.
You may want to have a look at https://github.com/janvrany/riscv-debian - pain and frustration forged into a set of nice scripts to prepare a working environment.

The investigation on the latest version of the cross-toolchain shows it includes linux-headers-5.0, gcc-8.3 and glibc-2.28/2.29, which requires the target OS to support glibc 2.28/2.29 in order to run the compiled build; otherwise, it will fail to find the glibc library as follows:

libjvm.so preloadLibrary(/../jdk11_rv64_openj9/lib/compressedrefs/libj9vm29.so): /lib64/lp64d/libc.so.6:
version `GLIBC_2.28' not found (required by /../jdk11_rv64_openj9_v18/lib/compressedrefs/libj9prt29.so)
libjvm.so failed to load: j9vm29

Given that Fedora/RISCV (stage4 image) only supports glibc 2.27 for now, we will stick to the branch linux-headers-4.15-rc3 at https://github.com/riscv/riscv-gnu-toolchain/tree/linux-headers-4.15-rc3 (which includes linux-headers-4.15, gcc-7.2 and glibc-2.27), or replace the existing linux-headers & gcc & glibc in the latest version with those from the branch linux-headers-4.15-rc3. A quick target-side glibc check is sketched below.
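
A trivial C check (compiled with the cross gcc and run on the target) to confirm which glibc an image actually provides, using the standard gnu_get_libc_version() API:

#include <stdio.h>
#include <gnu/libc-version.h>

int main(void)
{
    /* prints e.g. "glibc 2.27" on the Fedora stage4 image */
    printf("glibc %s\n", gnu_get_libc_version());
    return 0;
}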

Thanks @janvrany for sharing!

However, just before I set out on that adventure, @ChengJin01 was kind enough to provide some how-to for cross-compilation and... it worked:

[root@fedora-riscv ~]# ~/java/bin/java -version
openjdk version "11.0.4-internal" 2019-07-16
OpenJDK Runtime Environment (build 11.0.4-internal+0-adhoc.apaj.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build riscv_openj9_v2_uma-f873582, JRE 11 Linux riscv64-64-Bit Compressed References 20190715_000000 (JIT disabled, AOT disabled)
OpenJ9   - f873582
OMR      - c4c7512
JCL      - fd4b314 based on jdk-11.0.4+10)

This is only to confirm that I've managed to built it and roll it in my Fedora RISCV. I will try some more applications in the following days.

Thanks @ChengJin01 , thanks @janvrany and everyone else.

As mentioned previously, the main issue with the native awt library is that the java code loads the awt library (which doesn't exist in the cross-build) so as to call the native method initIDs(), which is irrelevant to AWT GUI operation.

To simplify the initial work on native compilation, I will first explore adding/modifying only a small piece of related code & scripts (e.g. initIDs() in src/java.desktop/share/classes/java/awt/Toolkit.java and src/java.desktop/share/classes/sun/java2d/Disposer.java) to check whether a separate native library for the cross-build (including all required native initIDs()) works for us this way.

I managed to create a native awt_init library (which includes initIDs() of Toolkit.java & Disposer.java) in the cross-compilation as follows:

.../openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk/lib# ls libawt*
libawt_init.debuginfo  libawt_init.so
.../openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk/lib# 
grep -r  Java_java_awt_Toolkit_initIDs  ./*
Binary file ./libawt_init.debuginfo matches
Binary file ./libawt_init.so matches
.../openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk/lib# 
grep -r  Java_sun_java2d_Disposer_initIDs  ./*
Binary file ./libawt_init.debuginfo matches
Binary file ./libawt_init.so matches

The later native compilation on Fedora/RISCV seems to work with this native library, but it still fails when loading the awt library from other java code. Need to analyze the failures case by case.

I notice that the X11-related headers (used for AWT) are already excluded if HEADLESS is defined during compilation. So I will try to recreate the cross-build by specifying --enable-headless-only=yes to avoid generating a separate native library (i.e. keep using the original awt library) intended for initIDs(), to see whether it works to compile a native build on Fedora/RISCV. If that's not the case, or more modification is incurred, we still need to stick with the separate awt_init library.

The native awt library was generated in the cross-build with a few code/script modifications, as --enable-headless-only=yes doesn't cover all native code with X11 headers included.

.../openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk/lib# ls -l libawt*
-rwxr-xr-x 1 root root 3266056 Jul 24 10:09 libawt.debuginfo
-rwxr-xr-x 1 root root   58288 Jul 24 10:09 libawt_headless.debuginfo
-rw-r--r-- 1 root root   13312 Jul 24 10:09 libawt_headless.so
-rw-r--r-- 1 root root  616136 Jul 24 10:09 libawt.so  <-------

.../openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk/lib# grep -r  Toolkit_init  ./*
Binary file ./libawt.debuginfo matches
Binary file ./libawt.so matches
.../openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/images/jdk/lib# grep -r  Disposer_init  ./*
Binary file ./libawt.debuginfo matches
Binary file ./libawt.so matches

Now moving forward to the native compilation on Fedora/RISCV with this cross-build to see whether it really works this way.

No issue/exception has been detected so far when compiling the awt-related java code & libraries in the native compilation on Fedora/RISCV (still in progress). Will see how it goes by the end of the compilation.

After approximately 1~2 days of native compilation (due to the lack of JIT support, it seems most of the time was spent compiling the java code in OpenJDK & OpenJ9 with the cross-build), the final JDK (native build) has been created with the cross-build (in headless AWT mode), and the native awt library was successfully generated as expected:

[root@stage4]# uname -a
Linux stage4.fedoraproject.org 4.19.0-rc8 #1 SMP Wed Oct 17 15:11:25 
UTC 2018 riscv64 riscv64 riscv64 GNU/Linux

[root@stage4 images]# jdk/bin/java   -version
openjdk version "11.0.4-internal" 2019-07-16
OpenJDK Runtime Environment (build 11.0.4-internal+0-adhoc.root.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build riscv_openj9_v2_uma-1499e18b0, 
JRE 11 Linux riscv64-64-Bit Compressed References 20190728_000000 (JIT disabled, AOT disabled)
OpenJ9   - 1499e18b0
OMR      - f3072d61
JCL      - afffeda1a3 based on jdk-11.0.4+11)

[root@stage4 images]# cd jdk/lib
[root@stage4 lib]# ls  -l  libawt*.so
-rw-r--r-- 1 root root 628688 Jul 28 22:01 libawt.so
-rw-r--r-- 1 root root  35528 Jul 28 22:01 libawt_headless.so
-rw-r--r-- 1 root root 404832 Jul 28 22:01 libawt_xawt.so

Now I need to figure out whether it is possible to set up the test environment on Fedora/RISCV to run part of the OpenJ9 sanity tests with the natively built JDK by hacking the setting code/scripts involved.

The compilation of the DDR related tests shows it failed to locate the j9ddr related package as follows:

compile:
     [echo] Ant version is Apache Ant(TM) version 1.10.6 compiled on May 2 2019
     [echo] ============COMPILER SETTINGS============
     [echo] ===fork:                         yes
     [echo] ===executable:                   /root/RISCV_OPENJ9/jdk11_rv64_openj9_native_v3/bin/javac
     [echo] ===debug:                        on
     [echo] ===destdir:                      /root/RISCV_OPENJ9/openj9/test/TestConfig/../../jvmtest/functional/DDR_Test
    [javac] Compiling 54 source files to /root/RISCV_OPENJ9/openj9/test/functional/DDR_Test/bin
    [javac] /root/RISCV_OPENJ9/openj9/test/functional/DDR_Test/src/j9vm/test/ddrext
/AutoRun.java:60: error: package com.ibm.j9ddr.tools.ddrinteractive does not exist <---------
    [javac] import com.ibm.j9ddr.tools.ddrinteractive.Context;
...

which means j9ddr.jar was missing from the JDK (not generated during compilation) as DDR was previously disabled by default.

I need to update the setting scripts involved and install libdwarf-devel on Fedora/RISCV so as to create a new build via the native compilation, to see how it goes generating the DDR related components.

The compilation of DDR failed to generate a j9ddr related class on Fedora/RISCV at:

.../debugtools/DDR_VM/src/com/ibm/j9ddr/vm29/j9/gc/GCHeapMap.java:92: error: cannot find symbol
                MM_MarkMapPointer markMap = sgc._markingScheme()._markMap();
                                               ^
  symbol:   method _markingScheme()
  location: variable sgc of type MM_SegregatedGCPointer
...
3 errors
gmake[3]: *** [/.../openj9-openjdk-jdk11/closed/make/DDR-jar.gmk:60: 
/.../openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release
/support/ddr/classes/_the.BUILD_DDR_CLASSES_batch] Error 1
gmake[2]: *** [/.../openj9-openjdk-jdk11/closed/make/Main.gmk:70: openj9.dtfj-ddr-jar] Error 2

Investigation shows MM_SegregatedGCPointer.class had not yet been generated at ./build/linux-x86_64-normal-server-release/support/ddr/classes/com/ibm/j9ddr/vm29/pointer/generated when compiling GCHeapMap.java, as the compilation of the DDR pointer classes was still under way at a pretty slow speed (most of them were generated but not yet finished with the cross-build due to the lack of JIT support). Theoretically, the DDR pointer & structure classes should be created before compiling the j9ddr related classes in any case. So I need to sort out how to update the DDR related makefile scripts involved to get this working on Fedora/RISCV.

Temporarily investigating a bytecode verifier related issue. Will get back to this once it gets resolved.

I suspect the problem with DDR pointer & structure classes came from the make jobs in parallel specified with --with-jobs=4 to speed up the compilation. Will try to set up --with-jobs=1 in configuration to see how it goes by running jobs serially.

The native compilation result shows --with-jobs=1 doesn't help to generate MM_SegregatedGCPointer.class. Given that MM_SegregatedGCPointer.java was already created at openj9/debugtools/DDR_VM/src/com/ibm/j9ddr/vm29/pointer/generated, need to check what happened to the java code generation in all code/scripts involved.

Investigation shows that superset.dat was not correctly generated on RISCV (MM_GlobalCollector was missing) as compared to that on Linux/x86.
As a result, the generated pointer classes were wrong in the java code.
e.g.

[1]Linux/RISCV
S|MM_SegregatedGC|MM_SegregatedGCPointer|  <--- on RISCV

MM_SegregatedGCPointer.java
@com.ibm.j9ddr.GeneratedPointerClass(structureClass=MM_SegregatedGC.class)
public class MM_SegregatedGCPointer extends StructurePointer { <---- was supposed to extend from MM_GlobalCollectorPointer

[2]Linux/x86
S|MM_SegregatedGC|MM_SegregatedGCPointer|MM_GlobalCollector  <--- on Linux_x86

MM_SegregatedGCPointer.java
@com.ibm.j9ddr.GeneratedPointerClass(structureClass=MM_SegregatedGC.class)
public class MM_SegregatedGCPointer extends MM_GlobalCollectorPointer { <------

Also checked the corresponding libj9ddr_misc29.so.dbg (used to output MM_SegregatedGC in superset.dat) and it seems the DW_TAG_inheritance entry doesn't exist in this file as compared to that of Linux/X86:

[1]Linux/RISCV
 <1><1dd6b>: Abbrev Number: 32 (DW_TAG_class_type)
    <1dd6c>   DW_AT_name        : (indirect string, offset: 0x11755): MM_SegregatedGC
    <1dd70>   DW_AT_declaration : 1
 <1><1dd70>: Abbrev Number: 3 (DW_TAG_pointer_type)
    <1dd71>   DW_AT_byte_size   : 8
    <1dd72>   DW_AT_type        : <0x1dd6b>
 <1><1dd76>: Abbrev Number: 37 (DW_TAG_subprogram)
    <1dd77>   DW_AT_external    : 1

[2]Linux/x86
 <1><37fc6>: Abbrev Number: 45 (DW_TAG_class_type)
    <37fc7>   DW_AT_name        : (indirect string, offset: 0x35d8f): MM_SegregatedGC
    <37fcb>   DW_AT_byte_size   : 456
    <37fcd>   DW_AT_decl_file   : 192
    <37fce>   DW_AT_decl_line   : 37
    <37fcf>   DW_AT_containing_type: <0x1a086>
    <37fd3>   DW_AT_sibling     : <0x3851e>
 <2><37fd7>: Abbrev Number: 23 (DW_TAG_inheritance) <-------
    <37fd8>   DW_AT_type        : <0x25427>
    <37fdc>   DW_AT_data_member_location: 0
    <37fdd>   DW_AT_accessibility: 1    (public)

Need to double-check openj9/runtime/ddr/module.xml to see whether any setting on RISCV was missing during compilation.

The compilation result shows the missing DDR pointer class is now correctly generated by specifying -femit-class-debug-always in CFLAGS & CXXFLAGS for this Linux-based platform:

            <makefilestub data="CFLAGS += -femit-class-debug-always">
                ...
                <include-if condition="spec.linux_riscv64.*"/>
            </makefilestub>
            <makefilestub data="CXXFLAGS += -femit-class-debug-always">
        ...
                <include-if condition="spec.linux_riscv64.*"/>
            </makefilestub>

Results are as follows:

[root@stage4 build]# vi ./linux-riscv64-normal-server-release/vm/superset.dat
...
S|MM_SegregatedGC|MM_SegregatedGCPointer|MM_GlobalCollector

[root@stage4 openj9-openjdk-jdk11]# find . -name MM_SegregatedGCPointer.*
./build/linux-riscv64-normal-server-release/support/ddr/classes/com/ibm/j9ddr/vm29/pointer/generated/MM_SegregatedGCPointer.class
./openj9/debugtools/DDR_VM/src/com/ibm/j9ddr/vm29/pointer/generated/MM_SegregatedGCPointer.java

[root@stage4 openj9-openjdk-jdk11]# vi ./openj9/debugtools/DDR_VM/src/com/ibm/j9ddr/vm29/pointer/generated/MM_SegregatedGCPointer.java
...
@com.ibm.j9ddr.GeneratedPointerClass(structureClass=MM_SegregatedGC.class)
public class MM_SegregatedGCPointer extends MM_GlobalCollectorPointer { <-----

Will double-check whether anything suspicious is left to be addressed during the compilation on Fedora/RISCV; otherwise, will get started setting up the test environment on Fedora/RISCV to see how far it goes in the sanity / extended tests with -Xint specified.

The test environment was already set up on Fedora/RISC-V and the compilation of test cases is still in progress (definitely slow without JIT support). Will see how it goes later in running tests once the test compilation is completed.

The failing sanity tests detected so far are as follows (the whole test run got stuck on an IllegalMonitorStateException / not yet completed):
failing_tests_riscv.TXT

[1] pltest failed with "No such file or directory"
[2] test with -Xjit:count=0 specified
[3] test with -Xaot:forceaot specified
[4] sharedclasses related tests
[5] jvmtitest failed due to timeout (without JIT)
[6] hardware triggered SIGFPE tests (division by zero)
[7] JLM test failed with "InstanceNotFoundException : java.lang:type=Compilation"
[8] UnsafeTests failed with TestNGException
[9] Threads related test with IllegalMonitorStateException

Need to analyze these failures case by case, except for the JIT specific tests, which should be excluded.

As for [1] (pltest failed with "No such file or directory"), the error occurred in j9file_test27() at /runtime/tests/port/j9fileTest.c:

const char *localFilename = "j9file_test27.tst";
I_32 testModes[nModes] = {0, 1, 2, 4, 6, 7, 010, 020, 040, 070, 0100, 0200, 0400, 0700, 0666, 0777, 0755, 0644, 02644, 04755, 06600, 04777};
...
rc = j9file_chmod(localFilename, testModes[m]);
if (expectedMode != rc) {
    outputErrorMessage(PORTTEST_ERROR_ARGS, "j9file_chmod() returned %d expected %d\n", rc, expectedMode);
    break;
}
fd2 = FILE_OPEN_FUNCTION(portLibrary, localFilename, EsOpenWrite, rc);
if (0 == (rc & ownerWritable)) {
    if (-1 != fd2) {
        outputErrorMessage(PORTTEST_ERROR_ARGS, "opened read-only file for writing fd=%d expected -1\n", rc);
   <---- failed when FILE_OPEN_FUNCTION returned 3 to fd2 when rc = 0 / m = 0 & testModes[0] = 0
        break;
    }
...

The location of the failure indicates the error occurred when opening the read-only file j9file_test27.tst after setting its permissions to 0 via chmod.
Given that file-related operations might affect dump generation, I need to scrutinize this piece of omr code to figure out whether there is some difference between Fedora/RISC-V and a generic Linux/X86 platform at this point.

It turns out the problem with j9file_test27() came from the root account initially granted on Fedora/RISC-V, in which case opening a read-only file for writing is allowed (root bypasses the permission check). So the issue was solved with a newly created normal user account; a quick standalone check is sketched below.
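
A tiny standalone check of the behavior (hypothetical test program, reusing the test's file name): as root, open() succeeds on a mode-0 file because root bypasses the permission check, which is exactly what tripped the test.

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *name = "j9file_test27.tst"; /* same name the test uses */
    int fd = open(name, O_CREAT | O_WRONLY, 0600);
    if (fd >= 0) close(fd);
    chmod(name, 0); /* remove all permissions */
    fd = open(name, O_WRONLY); /* root: succeeds; normal user: -1/EACCES */
    printf("open() after chmod(0) returned %d\n", fd);
    if (fd >= 0) close(fd);
    unlink(name);
    return 0;
}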

Now keep investigating the following issues detected in pltest:

[ERR] 1: j9sysinfo_test_get_l1dcache_line_size
 [ERR]  si.c line 2188: j9sysinfo_get_cache_info returned -355
 [ERR]
 [ERR]
 [ERR] 2: j9sysinfo_test_get_levels_and_types
 [ERR]  si.c line 2008: j9sysinfo_get_cache_info returned -355
 [ERR]
 [ERR] 3: j9dump_test_create_dump_with_name
 [ERR]  j9dumpTest.c line  213: j9dump_create returned: 1, with filename: 
 The core file created by child process with pid = 11680 was not found. 
 Expected to find core file with name "/.../cmdLineTester_pltest_0/core.11680"
 [ERR]
 [ERR] 4: j9dump_test_create_dump_from_signal_handler
 [ERR]  j9dumpTest.c line  311: j9dump_create returned: 1, with filename: 
 The core file created by child process with pid = 11689 was not found. 
 Expected to find core file with name "/.../cmdLineTester_pltest_0/core.11689"
 [ERR]
 [ERR] 5: j9dump_test_create_dump_with_NO_name
 [ERR]  j9dumpTest.c line  153: j9dump_create returned: 1, with filename: 
 The core file created by child process with pid = 11696 was not found. 
 Expected to find core file with name "core.11696"

For the j9sysinfo related test cases, they call j9sysinfo_get_cache_info at /runtime/port/unix/j9sysinfo.c, which checks the data at /sys/devices/system/cpu/cpu<N>/cache/

/*
 * Cache information is organized as a set of "index" 
directories in /sys/devices/system/cpu/cpu<N>/cache/.
 * In each index directory is a file containing the cache level 
and another file containing the cache type.
 */

int32_t
j9sysinfo_get_cache_info(struct J9PortLibrary *portLibrary, const J9CacheInfoQuery * query)
{
...

However, cpu<N>/cache doesn't exist on Fedora/RISCV as follows:

[root@stage4 systemd]# cd /sys/devices/system/cpu
[root@stage4 cpu]# ls
cpu0  cpu2  cpu4  cpu6  isolated    offline  possible  uevent
cpu1  cpu3  cpu5  cpu7  kernel_max  online   present
[root@stage4 cpu]# ls *
isolated  kernel_max  offline  online  possible  present  uevent

cpu0:
of_node  subsystem  topology  uevent

cpu1:
of_node  subsystem  topology  uevent

cpu2:
of_node  subsystem  topology  uevent

cpu3:
of_node  subsystem  topology  uevent

cpu4:
of_node  subsystem  topology  uevent

cpu5:
of_node  subsystem  topology  uevent

cpu6:
of_node  subsystem  topology  uevent

cpu7:
of_node  subsystem  topology  uevent

So these test cases should be disabled on Fedora/RISC-V (or made to skip themselves; see the sketch below).
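
If outright disabling is too coarse, a runtime availability check along these lines (a sketch, not the actual port-library code) could let the tests skip themselves when sysfs exposes no cache directories:

#include <stdio.h>
#include <sys/stat.h>

/* nonzero when /sys/devices/system/cpu/cpu0/cache exists */
static int
sysfsCacheInfoAvailable(void)
{
    struct stat st;
    return (0 == stat("/sys/devices/system/cpu/cpu0/cache", &st))
            && S_ISDIR(st.st_mode);
}

int main(void)
{
    printf("cache info %savailable\n",
            sysfsCacheInfoAvailable() ? "" : "not ");
    return 0;
}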

For the j9dump related tests, it seems the system dump was missing in the failing tests.
So I tried java -Xdump:java+system+snap:events=vmstop -version to generate all dump files, but the system dump failed to be created:

[jincheng@stage4 dumptest]$ ../jdk11_rv64_openj9_native_v2_awtddrssl/bin/java -Xdump:java+system+snap:events=vmstop -version
openjdk version "11.0.4-internal" 2019-07-16
OpenJDK Runtime Environment (build 11.0.4-internal+0-adhoc.root.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build riscv_openj9_v2_uma-3e80a2a47, JRE 11 Linux riscv64-64-Bit 
Compressed References 20190816_000000 (JIT disabled, AOT disabled)
OpenJ9   - 3e80a2a47
OMR      - 814b4668
JCL      - a3f7b7cdf8 based on jdk-11.0.4+11)
JVMDUMP039I Processing dump event "vmstop", detail "#0000000000000000" at 2019/08/23 17:44:19 - please wait.
JVMDUMP032I JVM requested System dump using '/home/jincheng/RISCV_OPENJ9
/dumptest/core.20190823.174419.25827.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting 
"|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e" specifies that the core dump 
is to be piped to an external program.  Attempting to rename either core or core.25857.

----> JVMDUMP012E Error in System dump: The core file created by child process 
with pid = 25857 was not found. Expected to find core file 
with name "/home/jincheng/RISCV_OPENJ9/dumptest/core.25857"

JVMDUMP032I JVM requested Java dump using '/home/jincheng/RISCV_OPENJ9
/dumptest/javacore.20190823.174419.25827.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/jincheng/RISCV_OPENJ9/dumptest
/javacore.20190823.174419.25827.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/jincheng/RISCV_OPENJ9
/dumptest/Snap.20190823.174419.25827.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/jincheng/RISCV_OPENJ9
/dumptest/Snap.20190823.174419.25827.0003.trc

[jincheng@stage4 dumptest]$ ls
javacore.20190823.174419.25827.0002.txt  Snap.20190823.174419.25827.0003.trc

Need to further analyze what happened to the system dump.

Temporarily refactoring test cases intended for invalid bootstrap method arguments & a possible code issue with NLS message generation in there. Will get back to this once it gets solved.

Hello @ChengJin01,

I read the whole content of your work which is really impressive!
What I am trying to do is to write a yocto recipe in order to (cross-)compile openj9.
Actually I am trying to (cross-)compile on an x86-64 host for an x86-64 target.

I have passed the configure step and now I am on the compilation.
The problem I am facing now is the one you also faced with the host tools (tracemerge / tracegen / hookgen). The compilation uses the target gcc (x86_64-intel-linux-g++) and it fails because the correct host build flags etc. are missing.

I saw above that you solved the problem by:

Just resolved the issue with the trace/hook tools in OMR by overwriting the cross compiler with the local compiler only when compiling the source of these tools. And keep investigating the similar issue with constgen tool in OpenJ9 (need to check whether it has to be created via the local compiler).

Would it be possible to share the changes you did in order to overwrite the cross compiler and its flags?
Thanks a lot in advance!

Hi @PTamis,

For these trace/hook tools (tracemerge / tracegen / hookgen), there is no need to cross-compile them, as they are used to generate header files/constants etc., which are consumed in the later compilation. So you have to ensure they are compiled only via the local compiler instead of the cross-compiler.

I will send the link to the changes to you for reference once you confirm [email protected] is the e-mail (posted on your github) you prefer.

Hello @ChengJin01,

Yeah I know, this is what I also meant with my previous post. The trace/hook tools require the local / native compiler.
Yes this email is fine, you can send me to that one.

Hi @PTamis, the links to the changes have been sent to you for reference. The setting for the trace/hook tools should be on the OMR side, but you might as well double-check the setting in OpenJ9 in case I missed anything.

Investigation shows the core was generated but relocated to /var/lib/systemd/coredump as explained in http://man7.org/linux/man-pages/man5/coredump.conf.5.html:

/etc/systemd/coredump.conf
Storage=
           Controls where to store cores. ... When "external" (the default), 
            cores will be stored in /var/lib/systemd/coredump/. 

against the configuration Fedora/RISC-V as follows:

 cat /etc/systemd/coredump.conf
#  This file is part of systemd.
...

[Coredump]
#Storage=external  <------ core dump is stored at /var/lib/systemd/coredump by default
#Compress=yes  <---- compressed with .lz4 by default

ls -l  /var/lib/systemd/coredump
total 212604
-rw-r-----  1 ... 'core.DestroyJavaVM\x20h.0.a67f8e39c8bc4e0e8cac422930b884ea.32073.1567051514000000.lz4'
...

To relocate the core dump to the current/working directory, the following settings should be modified in our case:
1) enable the dumpable attribute with 1 or 2 (enabling core dump)

cat /proc/sys/fs/suid_dumpable
0 <--- originally disabled

changes to:

echo 1 >/proc/sys/fs/suid_dumpable

2) disable the compression of the core dump

/etc/systemd/coredump.conf
[Coredump]
#Storage=external
Compress=no  <---- avoid compressing with .lz4

3) avoid piping the core to /usr/lib/systemd/systemd-coredump,
as a core processed by systemd-coredump will be stored at /var/lib/systemd/coredump

cat /proc/sys/kernel/core_pattern
|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %e

changes to:

echo "core.%p" > /proc/sys/kernel/core_pattern

With all these changes above, the core can be correctly generated in the pltest:

DISABLED test targets:
        cmdLineTester_pltest_tty_extended_0
        cmdLineTester_pltest_numcpus_notBound_0
        cmdLineTester_pltest_numcpus_bound_win_0

PASSED test targets:
        cmdLineTester_pltest_0
        cmdLineTester_pltest_numcpus_bound_linux_0

TOTAL: 11   EXECUTED: 2   PASSED: 2   FAILED: 0   DISABLED: 3   SKIPPED: 6
ALL TESTS PASSED

In addition, there might still be issues with the generated core dump, as jdmpview was unable to open the core as follows:

e.g.
bin/jdmpview -core core.3332
DTFJView version 4.29.5, using DTFJ version 1.12.29003
Loading image from DTFJ...

Could not load dump file and/or could not load XML file: <------
Dump: core.3332 not recognised by any core reader
For a list of commands, type "help"; for how to use "help", type "help help"
...

Given that this is something related to DDR (I suspect something is missing in the core generation), I will get back to investigating the problem after solving the rest of the test failures.

I will be on vacation from Sept. 4th (tomorrow) to Sept. 22nd and will get back to keep working on the remaining issues here.

I am currently trying to exclude all JIT specific test cases in scripts involved before addressing other issues.

Already excluded most of the JIT specific test cases in the sanity test and am currently double-checking the tests related to shared classes.

Already addressed the following issues previously detected in the sanity test on Fedora/QEMU emulator, which were mostly directly/indirectly related to JIT.

[1] pltests failed with "No such file or directory" 
<--- changed the core pattern setting in Linux to generate the system core
 (there is still problem with the core content / will investigate later)

[2] tests with -Xjit:count=0 specified 
<--- excluded due to the lack of JIT support

[3] tests with -Xaot:forceaot specified 
<--- excluded due to the lack of JIT support

[4] sharedclasses related tests 
<---  changed the test script code to recognize riscv64 as 64-bit

[5] jvmtitest failed due to timeout 
<---- changed the test framework to disable the timeout counter due to the lack of JIT support

[6] hardware triggered SIGFPE tests (division by zero)
<---- excluded for now (suspect there is no support on Fedora/QEMU)

[7] JLM tests failed with "InstanceNotFoundException : java.lang:type=Compilation"
<-- excluded as the tests are intended to obtain the info of JIT compiler via MXBean.

[9] Threads related tests with IllegalMonitorStateException
<---- excluded as these tests are part of JIT_Test.

Now investigating the issue with [8] UnsafeTests (specifically testArrayCompareAndExchangeObject)

[6] hardware floating-point related tests
<---- excluded for now due to the lack of support on RV64 according to the RISC-V Specification.

I don't know the details, but I do not understand the comment on lack of support. Any CPU with
F and/or D extensions has support for single and/or double floating point. For sure, HiFive Unleashed has both:

jv@unleashed:~$ head -n 5 /proc/cpuinfo 
processor       : 0
hart            : 1
isa             : rv64imafdc
mmu             : sv39
uarch           : sifive,rocket0

I don't have QEMU at hand to check, but I'm fairly sure QEMU does support F and D extensions too.
Single / double precision support is actually in our RISC-V codegen for TR, see for instance:
https://github.com/janvrany/omr/blob/jv/riscv-devel/compiler/riscv/codegen/FPTreeEvaluator.cpp#L211

For sure, you can have an RV64 CPU without FP support, but I'm inclined to say we require FP support for now (and fix that in the future if needed)

I don't know the details, but I do not understand the comment on lack of support. Any CPU with
F and/or D extensions has support for single and/or double floating point. For sure, HiFive Unleashed has both:
...
I don't have QEMU at hand to check, but I'm fairly sure QEMU does support F and D extensions too.
Single / double precision support is actually in our RISC-V codegen for TR.
For sure, you can have RV64 bit cpu without FP support, but I'm inclined to say we require FP support for now (and fix that in future if needed)

@janvrany , sorry for the confusion (I already updated the previous comment) and many thanks for your clarification on the F and D extensions (I had already noticed this part in the specification).

The test is not intended for floating point support but for the SIGFPE signal on Fedora/QEMU (a test with division by zero).

Specifically, there are both a software-triggered SIGFPE test (passed) and a hardware-triggered SIGFPE test (failed) in our sanity suite, as follows:
[1] software triggered SIGFPE

Testing: softwareFloat  
Test start time: 2019/08/19 15:31:51 Coordinated Universal Time
Running command: java -Xint -Xdump:system:none -cp "/.../openj9_test/test/TestConfig/scripts/testKitGen/../../../../jvmtest/functional/cmdline_options_testresources/cmdlinetestresources.jar" VMBench.GPTests.GPTest   softwareFloat
Time spent starting: 111 milliseconds
Time spent executing: 6467 milliseconds
Test result: PASSED
Output from test:
 [ERR] Unhandled exception
 [ERR] Type=Floating point error vmState=0x00040000
 [ERR] J9Generic_Signal_Number=00000088 Signal_Number=00000008 Error_Value=00000000 Signal_Code=fffffffa
 [ERR] Handler1=0000002000AE0956 Handler2=0000002000BC64EA
 [ERR] PC=000000200004735E RA=0000002000047352 SP=0000002000A1EFB0 X3=0000000000012870
...
void JNICALL
Java_VMBench_GPTests_GPTest_gpSoftwareFloat(JNIEnv *env, jclass clazz, jobject arg1)
{
#if !defined(WIN32)
#if 1
    pthread_kill(pthread_self(), SIGFPE); <---- it works on Fedora/QEMU
#else
...
#endif
#endif
}

[2] hardware triggered SIGFPE

Testing: hardwareFloat  
Test start time: 2019/08/19 15:31:58 Coordinated Universal Time
Running command: java -Xint -Xdump:system:none -cp ".../openj9_test/test/TestConfig/scripts/testKitGen/../../../../jvmtest/functional/cmdline_options_testresources/cmdlinetestresources.jar" VMBench.GPTests.GPTest   hardwareFloat
Time spent starting: 104 milliseconds
Time spent executing: 4724 milliseconds
Test result: FAILED
Output from test:
 [OUT] 10 / 0 = -1
 [ERR] Survived hardware-triggered SIGFPE! <-------------
>> Success condition was not found: [Output match: Unhandled exception]

/* Avoid error in static analysis */
int gpTestGlobalZero = 0;
...
void JNICALL
Java_VMBench_GPTests_GPTest_gpHardwareFloat(JNIEnv *env, jclass clazz, jobject arg1)
{
#if defined(WIN32)
...
#else

    int a = 10; /* clang optimizes the case of dividend==1 */
    int b = gpTestGlobalZero; /* Avoid error in static analysis */
    int c = a/b; <-------- it fails to trigger SIGFPE on Fedora/QEMU (divided by 0)
    printf ("%i / %i = %i\n", a, b, c);  /* here to stop compiler from optimizing out the div-by-zero */
#endif
}

Based on the results, it turns out there is no way to trigger a hardware SIGFPE on Fedora/QEMU; this should be postponed until we get the real hardware to double-check. A minimal standalone reproducer is sketched below.
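
For completeness, the behavior can be reproduced outside the VM with a few lines of C mirroring the GPTest above (the volatile global keeps the compiler from folding the division away):

#include <stdio.h>

volatile int gpTestGlobalZero = 0; /* keeps the divide in the binary */

int main(void)
{
    int a = 10;
    int b = gpTestGlobalZero;
    int c = a / b; /* x86: SIGFPE; RISC-V: no trap, c == -1 */
    printf("%d / %d = %d\n", a, b, c);
    return 0;
}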

@ChengJin01
Thanks for clarification. I do think this is actually a correct behavior according to RISC-V spec. See
The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Chapter 6 “M” Standard Extension for Integer Multiplication and Division:

The semantics for division by zero and division overflow are summarized in Table 6.1. The quotient of division by zero has all bits set, i.e. 2^XLEN − 1 for unsigned division or −1 for signed division.

I double-checked on the Unleashed and indeed it behaves as in QEMU, that is, 10 / 0 results in -1 and no SIGFPE.

@janvrany, many thanks for the confirmation on the hardware. That means the test should be excluded as invalid on RISC-V.

The test isn't exactly invalid, it tells us more about what we need to do to get Java working correctly. A divide by zero in Java still needs to throw an ArithmeticException and cannot return -1.

@pshipton, the RISC-V Spec explains why they designed this way as follows:

We considered raising exceptions on integer divide by zero, 
with these exceptions causing a trap in most execution environments. 
However, this would be the only arithmetic trap in the standard ISA 
(floating-point exceptions set flags and write default values, 
but do not cause traps) and would require language implementors to 
interact with the execution environment’s trap handlers for this case. 

Further, where language standards mandate that a divide-by-zero exception must 
cause an immediate control flow change, only a single branch instruction needs to 
be added to each divide operation, and this branch instruction can be inserted 
after the divide and should normally be very predictably not taken, 
adding little runtime overhead. The value of all bits set is returned 
for both unsigned and signed divide by zero to simplify the divider circuitry. 

The value of all 1s is both the natural value to return for unsigned divide, 
representing the largest unsigned number, and 
also the natural result for simple unsigned divider implementations. 
Signed division is often implemented using an unsigned division circuit 
and specifying the same overflow result simplifies the hardware.

Ok, but regardless, Java still needs to work correctly.

I double-checked the interpreter code in /runtime/vm/BytecodeInterpreter.hpp; a typical example:

    /* ..., dividend, divisor => ..., result */
    VMINLINE VM_BytecodeAction
    idiv(REGISTER_ARGS_LIST)
    {
        VM_BytecodeAction rc = EXECUTE_BYTECODE;
        I_32 divisor = *(I_32*)_sp;
        I_32 dividend = *(I_32*)(_sp + 1);
        if (0 == divisor) {  <----------------
            rc = THROW_DIVIDE_BY_ZERO;
        } else {
            _pc += 1;
            _sp += 1;
            if (!((I_32_MIN == dividend) && (-1 == divisor))) {
                *(I_32*)_sp = dividend / divisor;
            }
        }
        return rc;
    }

        case THROW_DIVIDE_BY_ZERO: \
            goto divideByZero; \

divideByZero:
    updateVMStruct(REGISTER_ARGS);
    prepareForExceptionThrow(_currentThread);
    setCurrentExceptionNLS(_currentThread, J9VMCONSTANTPOOL_JAVALANGARITHMETICEXCEPTION, J9NLS_VM_DIVIDE_BY_ZERO);

So our code does check whether the divisor is zero and throws an ArithmeticException, regardless of hardware/platform.

till we got the real hardware to double-check

FWIW, the only "real hardware" that can act as the ultimate reference to check against is the Spike simulator. By definition, if a silicon implementation (such as the SiFive U-540 chip, which I presume you are referring to) disagrees with Spike, then it's a bug in the U-540.

For the failing test testArrayCompareAndExchangeObject in [8] UnsafeTests, the investigation shows it failed to swap the old Object address with a new Byte[] address in an Object array when calling the native method Unsafe.compareAndExchangeObject() as follows:

    public void testArrayCompareAndExchangeObject() throws Exception {
        testObject(new Object[models.length], COMPAREANDEXCH); <----
    }

    private static final Object compareValueObject = new Object();
    protected static final Object[] models = new Object[] { modelByte, ... };

    protected void testObject(Object target, String method) throws Exception {
...
else if (method.equals(COMPAREANDEXCH)) {
     myUnsafe.putObject(base(target, i), offset, compareValueObject);
     checkSameAt(compareValueObject,
---> myUnsafe.compareAndExchangeObject(base(target, i), offset, compareValueObject, models[i])); 

     }

against the native code in /runtime/oti/UnsafeAPI.hpp

    static VMINLINE j9object_t
    compareAndExchangeObject(J9VMThread *currentThread, MM_ObjectAccessBarrierAPI *objectAccessBarrier, j9object_t object, UDATA offset, j9object_t *compareValue, j9object_t *swapValue)
    {
      ...
   else if (offset & J9_SUN_STATIC_FIELD_OFFSET_TAG) {
     ...
----> result = objectAccessBarrier->inlineStaticCompareAndExchangeObject(currentThread, 
fieldClass, (j9object_t*)valueAddress, *compareValue, *swapValue, true);
     }
...

/runtime/gc_include/ObjectAccessBarrierAPI.hpp
    VMINLINE j9object_t
    inlineStaticCompareAndExchangeObject(J9VMThread *vmThread, J9Class *clazz, j9object_t *destAddress, 
j9object_t compareObject, j9object_t swapObject, bool isVolatile = false)
    {
#elif defined(J9VM_GC_COMBINATION_SPEC)
        if (j9gc_modron_wrtbar_always == _writeBarrierType) {
            return vmThread->javaVM->memoryManagerFunctions->j9gc_objaccess_staticCompareAndExchangeObject(vmThread, clazz, destAddress, compareObject, swapObject);
        } else {
   ...
  ---> protectIfVolatileBefore(isVolatile, false);
   j9object_t result = staticCompareAndExchangeObjectImpl(vmThread, destAddress, compareObject, swapObject, isVolatile);
  ---> protectIfVolatileAfter(isVolatile, false);
  ...
  }

    VMINLINE static void
    protectIfVolatileBefore(bool isVolatile, bool isRead)
    {
        if (isVolatile) {
            if (!isRead) {
                VM_AtomicSupport::writeBarrier();  <------
            }
        }
    }

    VMINLINE static void
    protectIfVolatileAfter(bool isVolatile, bool isRead)
    {
        if (isVolatile) {
            if (isRead) {
                VM_AtomicSupport::readBarrier();
            } else {
                VM_AtomicSupport::readWriteBarrier(); <-----
            }
        }
    }

omr/include_core/AtomicSupport.hpp
e.g.
    VMINLINE static void
    writeBarrier()
    {
    /* Neither x86 nor S390 require a write barrier - the compiler fence is sufficient */
#if !defined(ATOMIC_SUPPORT_STUB)
#if defined(AIXPPC) || defined(LINUXPPC)
        __lwsync();
#elif defined(_MSC_VER)
        _ReadWriteBarrier();
#elif defined(__GNUC__)
#if defined(ARM)
        __sync_synchronize();
#elif defined(AARCH64) /* defined(ARM) */
        __asm __volatile ("dmb ishst":::"memory");
#else /* defined(AARCH64) */
        asm volatile("":::"memory");
#endif /* defined(ARM) */
#elif defined(J9ZOS390)
        __fence();
#endif /* defined(AIXPPC) || defined(LINUXPPC) */
#endif /* !defined(ATOMIC_SUPPORT_STUB) */
    }

It looks like the RISC-V specific assembly code is missing from the read & write barriers needed for the CompareAndExchange operation on Object. Need to double-check against the RISC-V Spec to see whether the fence instruction can be used to handle this on RISC-V; a sketch of the idea follows.
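A minimal sketch, assuming a hypothetical RISCV platform macro and GCC inline assembly (an illustration, not the actual patch), of fence-based barriers in the style of AtomicSupport.hpp:

#if defined(RISCV) /* hypothetical platform guard */
    VMINLINE static void
    writeBarrier()
    {
        /* store-store: order earlier writes before later writes */
        __asm__ volatile("fence w,w" ::: "memory");
    }

    VMINLINE static void
    readBarrier()
    {
        /* load-load: order earlier reads before later reads */
        __asm__ volatile("fence r,r" ::: "memory");
    }

    VMINLINE static void
    readWriteBarrier()
    {
        /* full barrier: order all earlier memory operations before all later ones */
        __asm__ volatile("fence rw,rw" ::: "memory");
    }
#endif /* defined(RISCV) */

Per the spec, fence pred,succ orders every memory operation in the predecessor set before any in the successor set, so fence rw,rw provides the full barrier that readWriteBarrier requires.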

Already added the missing code in the read & write barriers and fixed the issue with the CompareAndExchange operation on Object in UnsafeTests:

...
PASSED: testArrayCompareAndSetBoolean (Regular)
PASSED: testArrayCompareAndSetByte (Regular)
PASSED: testArrayCompareAndSetChar (Regular)
PASSED: testArrayCompareAndSetDouble (Regular)
PASSED: testArrayCompareAndSetFloat (Regular)
PASSED: testArrayCompareAndSetInt (Regular)
PASSED: testArrayCompareAndSetLong (Regular)
PASSED: testArrayCompareAndSetObject (Regular)
...
===============================================
    UnsafeTests
    Tests run: 433, Failures: 0, Skips: 0
===============================================

Will keep running the tests to see whether any other issue is detected, apart from the system dump generation.

In addition, it seems the GNU cross-toolchain at https://github.com/riscv/riscv-gnu-toolchain was updated recently (the branch linux-headers-4.15-rc3, which included linux-headers-4.15, gcc-7.2.0 and riscv-glibc-2.27, was removed). Need to re-build the cross-compiler with the latest changes to see whether everything still works.

@janvrany suggested I double-check whether it is feasible to do the cross-compilation only by mounting the Fedora Linux OS image on the local system (used for --sysroot, which previously pointed to the sysroot path in the cross-toolchain). Given that there is only a small set of X11 related changes and no changes to DDR (an issue with the libdwarf library), I will first try to remove all the X11 changes to see how it goes. If everything goes fine, some of the changes in JDK11 will be updated/removed soon based on the latest results.

Already solved the X11 compilation issue with --sysroot pointing to the mounted Fedora OS image, and the resulting JDK works well in Fedora/QEMU. Now getting started on checking whether it is feasible to enable DDR in cross-compilation (specifically, getting the cross-compiler to locate the sysroot path of the dwarf related headers/libraries).

Already discussed with @keithc-ca: it seems there is no easy way to achieve this in cross-compilation unless we reorganize the DDR framework to run ddrgen locally on the host system to generate all DDR artifacts targeted for RISC-V, which seems too complicated, and there is no strong motivation to do it this way for the moment. So we still need a two-step compilation for DDR.

Still detected test failures in InstanceFieldVarHandleTests & ArrayVarHandleTests in VarHandle when calling the native method compareAndExchange in the extended tests. Suspect the previous fence code was incomplete in dealing with the read/write barriers. Need to double-check the code there against the RISC-V Spec to see what happened.

The investigation shows the failing tests passed when adding a printf message before calling __sync_val_compare_and_swap via lockCompareExchangeU32 (we are using this built-in function on RISC-V for CAS operations), which suggests some synchronization/contention issue exists inside __sync_val_compare_and_swap. Need to disassemble this function to see how it works.

Looking at the assembly code in lockCompareExchangeU32 (which directly calls __sync_val_compare_and_swap on RISC-V) as follows:

omr/util/omrutil/AtomicFunctions.cpp
uint32_t
compareAndSwapU32(uint32_t *location, uint32_t oldValue, uint32_t newValue)
{
    return VM_AtomicSupport::lockCompareExchangeU32(location, oldValue, newValue);
}

000000000016de40 <compareAndSwapU32>:
...
  16de52:       02f58263                beq     a1,a5,16de76 <compareAndSwapU32+0x36>
  16de56:       0f50000f                fence   iorw,ow
  16de5a:       1404252f                lr.w.aq a0,(s0)
  16de5e:       00951563                bne     a0,s1,16de68 <compareAndSwapU32+0x28>
  16de62:       1d2427af                sc.w.aq a5,s2,(s0) <-- The RL bit is missing in the SC instruction
  16de66:       fbf5                    bnez    a5,16de5a <compareAndSwapU32+0x1a>
...

The RISC-V Spec says:

 Setting the aq bit on the LR instruction, 
and setting both the aq (acquire)  and the rl (release) bit on the SC instruction <---
makes the LR/SC sequence sequentially consistent, 
meaning that it cannot be reordered with earlier
or later memory operations from the same hart.
...
Software should not set the rl bit on an LR instruction unless the aq bit is also set, 
nor should software set the aq bit on an SC instruction unless the rl bit is also set. <---

This means the RL (release) bit must also be set on the SC (store-conditional) instruction to make the memory operations in the LR/SC sequence sequentially consistent.

So we replaced __sync_val_compare_and_swap with hand-written assembly code, changing sc.w.aq to sc.w.aqrl, and all previously failing compareAndExchange tests passed.
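For illustration, here is a minimal sketch (assuming GCC inline assembly on RV64; not the exact OMR patch) of a sequentially consistent 32-bit CAS loop with the rl bit set on the store-conditional:

#include <stdint.h>

uint32_t
casU32SeqCst(volatile uint32_t *location, uint32_t oldValue, uint32_t newValue)
{
    uint64_t observed = 0;
    uint64_t scFailed = 0;
    int64_t expected = (int32_t)oldValue; /* lr.w sign-extends on RV64; match it */
    __asm__ volatile(
        "1: lr.w.aq   %0, (%2)     \n" /* load-reserved, acquire */
        "   bne       %0, %3, 2f   \n" /* current value != expected: give up */
        "   sc.w.aqrl %1, %4, (%2) \n" /* store-conditional, acquire + release */
        "   bnez      %1, 1b       \n" /* reservation lost: retry */
        "2:                        \n"
        : "=&r"(observed), "=&r"(scFailed)
        : "r"(location), "r"(expected), "r"((uint64_t)newValue)
        : "memory");
    return (uint32_t)observed;
}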

Will check on the remaining tests to see what is left to be addressed.

For the system dump issue, the exception from jdmpview indicates it failed to recognize the file format:

DTFJView version 4.29.5, using DTFJ version 1.12.29003
Loading image from DTFJ...

java.io.IOException: Dump: core.20191023.193721.12962.0001.dmp not recognised by any core reader
        at com.ibm.j9ddr.corereaders.CoreReader.readCoreFile(CoreReader.java:139)
        at com.ibm.j9ddr.view.dtfj.image.J9DDRImageFactory.getImage(J9DDRImageFactory.java:126)
        at openj9.dtfj/com.ibm.dtfj.image.j9.ImageFactory.getImage(ImageFactory.java:251)
        at openj9.dtfjview/com.ibm.jvm.dtfjview.commands.OpenCommand.imagesFromCommandLine(OpenCommand.java:114)
        at openj9.dtfjview/com.ibm.jvm.dtfjview.commands.OpenCommand.run(OpenCommand.java:82)
        at openj9.dtfj/com.ibm.java.diagnostics.utils.Context.tryCommand(Context.java:141)
        at openj9.dtfj/com.ibm.java.diagnostics.utils.Context.execute(Context.java:97)
        at openj9.dtfjview/com.ibm.jvm.dtfjview.CombinedContext.execute(CombinedContext.java:174)
        at openj9.dtfjview/com.ibm.jvm.dtfjview.CombinedContext.execute(CombinedContext.java:158)
        at openj9.dtfjview/com.ibm.jvm.dtfjview.Session.imageFromCommandLine(Session.java:610)
        at openj9.dtfjview/com.ibm.jvm.dtfjview.Session.sessionInit(Session.java:226)
        at openj9.dtfjview/com.ibm.jvm.dtfjview.Session.<init>(Session.java:175)
        at openj9.dtfjview/com.ibm.jvm.dtfjview.Session.getInstance(Session.java:171)
        at openj9.dtfjview/com.ibm.jvm.dtfjview.DTFJView.launch(DTFJView.java:51)
        at openj9.dtfjview/com.ibm.jvm.dtfjview.DTFJView.main(DTFJView.java:46)
Could not load dump file and/or could not load XML file: Dump: core.20191023.153548.8401.0001.dmp not recognised by any core reader

against the code at:

public class CoreReader
{
    private static final List<Class<? extends ICoreFileReader>> coreReaders;
...
    static {
        List<Class<? extends ICoreFileReader>> localReaders = new ArrayList<>();

        // AIX must be the last one, since its validation condition is very
        // weak.
        localReaders.add(MiniDumpReader.class);
        localReaders.add(ELFDumpReaderFactory.class);
        <--- ELFDumpReaderFactory should be used with the ELF file format in our case
        localReaders.add(MachoDumpReaderFactory.class);
    ...
        localReaders.add(AIXDumpReaderFactory.class);
        coreReaders = Collections.unmodifiableList(localReaders);
...
    public static ICore readCoreFile(String path)
            throws IOException
    {
    ...
            for (Class<? extends ICoreFileReader> clazz : coreReaders) {
            try {
                ICoreFileReader reader = clazz.newInstance();
    DumpTestResult result = reader.testDump(path); <---- the exception was thrown out here

                if (result == DumpTestResult.RECOGNISED_FORMAT) {
                    return reader.processDump(path);
                } else {
                    accruedResult = result.accrue(accruedResult); <-------
...
        switch (accruedResult) { <----- accruedResult = UNRECOGNISED_FORMAT
        case FILE_NOT_FOUND:
            throw new FileNotFoundException("Could not find: " + new File(path).getAbsolutePath());
        case UNRECOGNISED_FORMAT:
---> throw new IOException("Dump: " + path + " not recognised by any core reader");

Looking at the code that verifies the file format in ELFDumpReaderFactory.testDump():

/DDR_VM/src/com/ibm/j9ddr/corereaders/elf/ELFDumpReaderFactory.java
    public DumpTestResult testDump(String path) throws IOException
    {
        if (! new File(path).exists()) {
            return DumpTestResult.FILE_NOT_FOUND;
        }

        return ELFFileReader.isELF(CoreReader.getFileHeader(path)) ? 
DumpTestResult.RECOGNISED_FORMAT : DumpTestResult.UNRECOGNISED_FORMAT;
    }

/DDR_VM/src/com/ibm/j9ddr/corereaders/CoreReader.java
    public static byte[] getFileHeader(String path) throws IOException {
        ImageInputStream iis = new FileImageInputStream(new File(path));
        return getFileHeader(iis);
    }

    public static byte[] getFileHeader(ImageInputStream iis) throws IOException {
        byte[] data = new byte[2048];

        try {
            iis.seek(0);        //position at start of the stream
            iis.readFully(data);
....
/DDR_VM/src/com/ibm/j9ddr/corereaders/elf/ELFFileReader.java
    public static boolean isELF(byte[] signature)
    {
        // 0x7F, 'E', 'L', 'F'
        return (0x7F == signature[0] && 0x45 == signature[1]
                && 0x4C == signature[2] && 0x46 == signature[3]);
    }

For ELFFileReader, the idea of verifying the core file format is simple: read the first 2048 bytes of the specified core file and check the first 4 bytes against 0x7F, 'E', 'L', 'F' to confirm the file format is ELF.
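As a quick host-side cross-check of the same logic (a standalone sketch, independent of DTFJ/DDR), the magic bytes can be verified directly:

#include <cstdio>

/* Report whether a file starts with the ELF magic 0x7F 'E' 'L' 'F',
 * mirroring ELFFileReader.isELF(). */
static bool isElfFile(const char *path)
{
    unsigned char magic[4] = {0};
    std::FILE *f = std::fopen(path, "rb");
    if (NULL == f) {
        return false;
    }
    size_t n = std::fread(magic, 1, sizeof(magic), f);
    std::fclose(f);
    return (4 == n) && (0x7F == magic[0]) && ('E' == magic[1])
            && ('L' == magic[2]) && ('F' == magic[3]);
}

int main(int argc, char **argv)
{
    if (argc > 1) {
        std::printf("%s: %s\n", argv[1], isElfFile(argv[1]) ? "ELF" : "not ELF");
    }
    return 0;
}

Running it against the dump should report ELF, matching the od check that follows.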

Checking the core file in the hex format indicates its header is ELF as follows:

[jincheng@stage4 testjdk]$ od -t c -t x1 core.20191023.193721.12962.0001.dmp | head
0000000 177   E   L   F 002 001 001  \0  \0  \0  \0  \0  \0  \0  \0  \0
              7f  45  4c  46  02  01  01  00  00  00  00  00  00  00  00  00 <-----
0000020 004  \0 363  \0 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

In that case, it doesn't make any sense for the code to end up with UNRECOGNISED_FORMAT if the core reader is ELFDumpReader.

So I need to build a JDK with print messages to check two things:
1) whether ELFDumpReader is used in verifying the core file format;
2) if so, why it failed to read the ELF header 0x7F, 'E', 'L', 'F'.

The JDK needs to be built inside Fedora/QEMU to enable DDR, which might take over a day to finish (without JIT support). In the meantime I will keep investigating the following failing tests detected in the extended tests.

FAILED test targets:
(attachAPI related) 
        TestAttachAPI_0
        TestSunAttachClasses_0
        TestAttachErrorHandling_0
        TestManagementAgent_0
(GC related)
        gcPolicyNogcTest_0
        gcPolicyNogcTest_1
        gcPolicyNogcOOMTest_0
        gcPolicyNogcOOMTest_1
(VM argument)
        VmArgumentTests_0

The failing test cases in VmArgumentTests_0 are as follows:

        FAILED: testMappedOptions
        java.lang.AssertionError: Target process failed
            at org.testng.Assert.fail(Assert.java:96)
            at org.openj9.test.vmArguments.VmArgumentTests.runProcess(VmArgumentTests.java:1585)
            at org.openj9.test.vmArguments.VmArgumentTests.runAndGetArgumentList(VmArgumentTests.java:1545)
            at org.openj9.test.vmArguments.VmArgumentTests.testMappedOptions(VmArgumentTests.java:479)

        FAILED: testXprod
        java.lang.AssertionError: Target process failed
            at org.testng.Assert.fail(Assert.java:96)
            at org.openj9.test.vmArguments.VmArgumentTests.runProcess(VmArgumentTests.java:1585)
            at org.openj9.test.vmArguments.VmArgumentTests.runAndGetArgumentList(VmArgumentTests.java:1545)
            at org.openj9.test.vmArguments.VmArgumentTests.testXprod(VmArgumentTests.java:568)

By adding messages to print out the specified arguments and error output, it turns out these two test cases are JIT specific and should be excluded in our case:

testMappedOptions:
.../jdk/bin/java
-classpath
.:testng.jar:vmArguments
vmArguments.ArgumentDumper
JVMJ9VM004E Cannot load library required by: -Xjit:count=1 <-----
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Exception in thread "main" java.lang.AssertionError: Target process failed
        at org.testng.Assert.fail(Assert.java:94)
        at vmArguments.VmArgumentTests.runProcess(VmArgumentTests.java:1588)
        at vmArguments.VmArgumentTests.runAndGetArgumentList(VmArgumentTests.java:1547)
        at vmArguments.VmArgumentTests.testMappedOptions(VmArgumentTests.java:480)
...

testXprod:
.../jdk/bin/java
-classpath
.:testng.jar:vmArguments
-Xprod
-Xint
-Xprod
-Xprod
-Xjit <-------
vmArguments.ArgumentDumper
JVMJ9VM004E Cannot load library required by: -Xjit <-----
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Exception in thread "main" java.lang.AssertionError: Target process failed
        at org.testng.Assert.fail(Assert.java:94)
        at vmArguments.VmArgumentTests.runProcess(VmArgumentTests.java:1588)
        at vmArguments.VmArgumentTests.runAndGetArgumentList(VmArgumentTests.java:1547)
        at vmArguments.VmArgumentTests.testXprod(VmArgumentTests.java:569)
...

I double-checked the AttachAPI related test results on X86_64 against the results on RISC-V, and it turns out both of them ended up with the same failures in the same test cases:

FAILED test targets:
        TestAttachAPI_0
        TestSunAttachClasses_0
        TestAttachErrorHandling_0
        TestManagementAgent_0

TestTargetResult_riscv_Java8andUp.txt
TestTargetResult_x86_64_Java8andUp.txt

So the AttachAPI test failures have nothing to do with RISC-V and should be excluded in our case.

In addition, the GC related test failures might be caused by the VM options specified on the command line (via environment variable), which overwrote the GC options in the corresponding playlist.xml in the test suite. So I need to double-check whether it works with the correct settings.

I am currently starting to investigate the system dump issue, and it turns out the tooling failed to extract the RISC-V data because there is no DDR code supporting RISC-V.

Given that the DDR code might be related to hardware specifics (e.g. registers), I will try to add code & placeholders (where it is JIT specific) to see how far it goes on the emulator (QEMU). To achieve this, I will first add the changes to an X86_64 build (any build with DDR support is capable of manipulating DDR data regardless of CPU & platform, whereas a build with DDR support on RISC-V must be compiled inside Fedora/QEMU, which is extremely time-consuming for troubleshooting) to see whether it works. If everything works fine (on the VM side without JIT involved), I will move forward to generate a build inside Fedora/QEMU to verify the results.

I already finished running all functional tests (sanity & extended), excluding all JIT specific tests (JIT_test, cmdLineTester_decompilationTests, etc.), part of the GC tests (no Metronome GC support on RISC-V, etc.), and some test failures irrelevant to RISC-V (e.g. tests that ended up with the same failure on Ubuntu/AMD64). Now investigating two test failures to see what happened:

FAILED test targets:
        ContendedFieldsTests_90_1
        TestFileLocking_0

Isolated one of the failing methods on RISC-V in TestFileLocking_0, e.g. as follows:

testCB_CB begin
java.lang.reflect.InaccessibleObjectException: 
Unable to make public openj9.internal.tools.attach.target.FileLock(java.lang.String,int) 
accessible: module java.base does not "exports openj9.internal.tools.attach.target" 
to unnamed module @3a38b48a
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:340)
        at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:280)
        at java.base/java.lang.reflect.Constructor.checkCanSetAccessible(Constructor.java:189)
        at java.base/java.lang.reflect.Constructor.setAccessible(Constructor.java:182)
        at fileLock.NativeFileLock.<init>(NativeFileLock.java:42)
        at fileLock.GenericFileLock.lockerFactory(GenericFileLock.java:66)
        at fileLock.TestFileLocking.contendForLock(TestFileLocking.java:118)
        at fileLock.TestFileLocking.testCB_CB(TestFileLocking.java:319)
        at fileLock.TestFileLocking.main(TestFileLocking.java:534)

against the output of the latest JDK11/OpenJ9 build (Ubuntu/X86_64):

testCB_CB begin
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by fileLock.NativeFileLock (file:/root/jchau/temp/compare_test/) to constructor openj9.internal.tools.attach.target.FileLock(java.lang.String,int)
WARNING: Please consider reporting this to the maintainers of fileLock.NativeFileLock
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[TestFileLocking] [ERROR] Locker process started
[TestFileLocking] [ERROR] blocking=true
[TestFileLocking] [ERROR] WARNING: An illegal reflective access operation has occurred
[TestFileLocking] [ERROR] WARNING: Illegal reflective access by fileLock.NativeFileLock (file:/root/jchau/temp/compare_test/) to constructor openj9.internal.tools.attach.target.FileLock(java.lang.String,int)
[TestFileLocking] [ERROR] WARNING: Please consider reporting this to the maintainers of fileLock.NativeFileLock
[TestFileLocking] [ERROR] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
[TestFileLocking] [ERROR] WARNING: All illegal access operations will be denied in a future release
[TestFileLocking] [ERROR] PROGRESS_WAITING
[TestFileLocking] [ERROR] Locker process blocking lock on /tmp/lockDir/lockFile succeeded
[TestFileLocking] [ERROR] PROGRESS_LOCKED
[TestFileLocking] [ERROR] received stop
[TestFileLocking] [ERROR] Locker process stopping
[TestFileLocking] [ERROR] closing the lock file
[TestFileLocking] [ERROR] PROGRESS_TERMINATED
testCB_CB ok

It looks like there was something wrong in JDK11 when checking the accessibility of openj9.internal.tools.attach.target; it was supposed to trigger a warning instead of directly throwing an exception. Need to add print messages to figure out why it failed with my changes.

Investigation shows the test case testCB_CB in /Java8andUp/src/org/openj9/test/fileLock/TestFileLocking.java checks the accessibility of openj9.internal.tools.attach.target and then launches another JVM process to run the test. The test case passed with the cross-build by explicitly adding --add-exports java.base/openj9.internal.tools.attach.target=ALL-UNNAMED both on the command line and in the JVM options of the newly launched JVM process (that process also checks the accessibility of openj9.internal.tools.attach.target in code, so the test case checks it twice):

[jincheng@stage4 testjdk]$ ../jdk11_images_cross/jdk/bin/java  
--add-exports java.base/openj9.internal.tools.attach.target=ALL-UNNAMED 
-cp .:testng.jar:fileLock.jar   fileLock.TestFileLocking
testCB_CB begin
cmdLineBuffer = /home/jincheng/RISCV_OPENJ9/jdk11_images_cross/jdk/bin/java 
-Xint -Dcom.ibm.tools.attach.enable=yes 
--add-exports java.base/openj9.internal.tools.attach.target=ALL-UNNAMED 
-classpath .:testng.jar:fileLock.jar fileLock.Locker lockFile native_doblocking <--- the second JVM process

[TestFileLocking] [ERROR] Locker process started
[TestFileLocking] [ERROR] blocking=true
[TestFileLocking] [ERROR] PROGRESS_WAITING
[TestFileLocking] [ERROR] Locker process blocking lock on /tmp/lockDir/lockFile succeeded
[TestFileLocking] [ERROR] PROGRESS_LOCKED
[TestFileLocking] [ERROR] received stop
[TestFileLocking] [ERROR] Locker process stopping
[TestFileLocking] [ERROR] closing the lock file
[TestFileLocking] [ERROR] PROGRESS_TERMINATED
testCB_CB ok

Meanwhile, the test case passed with the build compiled entirely inside Fedora/QEMU without specifying the --add-exports option:

[jincheng@stage4 testcase]$ uname -a
Linux stage4.fedoraproject.org 4.19.0-rc8 #1 SMP Wed Oct 17 15:11:25 UTC 2018 riscv64 riscv64 riscv64 GNU/Linux
[jincheng@stage4 testcase]$ ../images_jdk11_native_v1/jdk/bin/java  -cp .:testng.jar:fileLock.jar   fileLock.TestFileLocking
testCB_CB begin
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by fileLock.NativeFileLock (file:/home/jincheng/RISCV_OPENJ9/testcase/fileLock.jar) to constructor openj9.internal.tools.attach.target.FileLock(java.lang.String,int)
WARNING: Please consider reporting this to the maintainers of fileLock.NativeFileLock
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[TestFileLocking] [ERROR] Locker process started
[TestFileLocking] [ERROR] blocking=true
[TestFileLocking] [ERROR] WARNING: An illegal reflective access operation has occurred
[TestFileLocking] [ERROR] WARNING: Illegal reflective access by fileLock.NativeFileLock (file:/home/jincheng/RISCV_OPENJ9/testcase/fileLock.jar) to constructor openj9.internal.tools.attach.target.FileLock(java.lang.String,int)
[TestFileLocking] [ERROR] WARNING: Please consider reporting this to the maintainers of fileLock.NativeFileLock
[TestFileLocking] [ERROR] WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
[TestFileLocking] [ERROR] WARNING: All illegal access operations will be denied in a future release
[TestFileLocking] [ERROR] PROGRESS_WAITING
[TestFileLocking] [ERROR] Locker process blocking lock on /tmp/lockDir/lockFile succeeded
[TestFileLocking] [ERROR] PROGRESS_LOCKED
[TestFileLocking] [ERROR] received stop
[TestFileLocking] [ERROR] Locker process stopping
[TestFileLocking] [ERROR] closing the lock file
[TestFileLocking] [ERROR] PROGRESS_TERMINATED
testCB_CB ok

So the problem only occurs with the cross-build.

Given that exporting OpenJ9 specific packages to all unnamed modules is granted by default, I suspect there is missing script code in JDK11 to address this situation when cross-building the JDK.

Further investigation shows the problem has nothing to do with the existing changes on RISC-V but the specified build-jdk itself.

To be specific, a normal compilation process needs a build-jdk in the final stage to populate the legacy Java 8 packages info (src/java.base/share/classes/jdk/internal/module/jdk8_packages.dat) into the module system of the build being compiled, so there is no need to read jdk8_packages.dat again to determine the exported packages at run time. That means the Java 8 package data comes from the jdk8_packages.dat stored in the build-jdk rather than the compiled build.

For local compilations, there is no need to specify a build-jdk, as a minimal JDK built from the same source is automatically created to populate the Java 8 package data into the compiled build.
For cross-compilations, a build-jdk (a JDK that runs on the local system) must be specified in place of the minimal JDK to populate the Java 8 package data, as the minimal build only runs on the target system.

Originally there was no difference between jdk8_packages.dat on OpenJDK11/OpenJ9 and on OpenJDK11/Hotspot, until it was modified late last year to add OpenJ9 specific packages:
jdk8_packages_in_jdk11_hotspot_dat.txt
jdk8_packages_in_jdk11_openj9_dat.txt
and also was updated last month to replace com.ibm.tools.attach.target with openj9.internal.tools.attach.target.

The problem in our case is that an older build-jdk (without openj9.internal.tools.attach.target in its jdk8_packages.dat) was used to populate the Java 8 package data for the compiled build. Thus, there is no way to determine the exported status of openj9.internal.tools.attach.target at run time.

To avoid the issue, either the latest version of OpenJDK11/OpenJ9 (with openj9.internal.tools.attach.target in jdk8_packages.dat) or a local build compiled from the same source must be specified as the build-jdk for the cross-compilation.

Now keep investigating the rest of failing test cases detected previously.

I am currently investigating the failing test case testContentionGroupsAndClass in ContendedFieldsTests, which failed with -XX:-RestrictContended specified on the command line.

        FAILED: testContentionGroupsAndClass
        java.lang.AssertionError: org.openj9.test.contendedfields.TestClasses$ContGroupContClass 
calculated size 64 actual size 24 difference -40
            at org.testng.AssertJUnit.fail(AssertJUnit.java:59)
            at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24)
---> at org.openj9.test.contendedfields.FieldUtilities.checkObjectSize(FieldUtilities.java:67)
            at org.openj9.test.contendedfields.ContendedFieldsTests.testContentionGroupsAndClass(ContendedFieldsTests.java:194)

against the failing class at /Java8andUp/src_110_up/org/openj9/test/contendedfields/TestClasses.java

    @Contended
    static class ContGroupContClass {
        @Contended("g1")
        public int intField1;
        @Contended("g1")
        public int intField2;
        @Contended("g1")
        public int intField3;
    }

Looking at the test code in ContendedFieldsTests:

    public void testContentionGroupsAndClass() {
        Object testObjects[] = {new ContGroupContClass(), new ContGroupsContClass()};
        for (Object testObject: testObjects) {
---> FieldUtilities.checkObjectSize(testObject, LOCKWORD_SIZE, paddingIncrement);
            FieldUtilities.checkFieldAccesses(testObject);
        }
    }

/Java8andUp/src/org/openj9/test/contendedfields/FieldUtilities.java
    public static void checkObjectSize(Object testObject, int hiddenFieldsSize, int padding) {
        final Class<? extends Object> testClass = testObject.getClass();
        long fieldsSize = calculateFieldsSize(testClass) + REFERENCE_SIZE /* header */ + hiddenFieldsSize;
        fieldsSize = Math.max(16, fieldsSize); /* minimum object size */
        fieldsSize = ((fieldsSize + OBJECT_ALIGNMENT - 1)/OBJECT_ALIGNMENT) * OBJECT_ALIGNMENT;
 ------------> fieldsSize is 24 bytes before adjusting with padding (paddingIncrement)
        if (padding > 0) { /* round up to a multiple of the cache size */
            fieldsSize = ((fieldsSize + padding - 1)/padding) * padding;
        }
 -------> if the cache line size is known and padding = 64 bytes,
              then fieldsSize becomes 64 bytes after rounding up with padding (paddingIncrement)

        long actualSize = JavaAgent.getObjectSize(testObject); <---- 24 bytes
 -------> fieldsSize is 128 bytes on x86_64 with padding in code
 -------> fieldsSize remains 24 bytes on RISC-V/QEMU, the same as before, with no padding applied

        long difference = actualSize - fieldsSize;
        final String msg = testClass.getName()+" calculated size "+fieldsSize+" actual size "+actualSize +" difference "+(difference);
        logger.debug(msg);
        if (difference != padding) { /* the actual may be one cache line size too big */
            assertTrue(msg, Math.abs(difference) == 0);
        }
    }

So the investigation above shows the -XX:-RestrictContended option doesn't have any impact on the actual object size on Fedora/QEMU.

In OpenJ9, -XX:-RestrictContended depends on the cache line size (which requires calling j9sysinfo_get_cache_info) to do the padding adjustment, as follows:

/runtime/vm/jvminit.c
static UDATA
protectedInitializeJavaVM(J9PortLibrary* portLibrary, void * userData)
{
...
    IDATA queryResult = 0;
    J9CacheInfoQuery cQuery = {0};
    cQuery.cmd = J9PORT_CACHEINFO_QUERY_LINESIZE;
    cQuery.level = 1;
    cQuery.cacheType = J9PORT_CACHEINFO_DCACHE;
    queryResult = j9sysinfo_get_cache_info(&cQuery); <----
    if (queryResult > 0) {
        vm->dCacheLineSize = (UDATA)queryResult; <-----
    } else {
        Trc_VM_contendedLinesizeFailed(queryResult);
    }

/runtime/vm/ObjectFieldInfo.hpp
    ObjectFieldInfo(J9JavaVM *vm, J9ROMClass *romClass):
        _cacheLineSize(0),
...
    {
        if (J9ROMCLASS_IS_CONTENDED(romClass)) {
            UDATA dCacheLineSize = vm->dCacheLineSize;
            if (dCacheLineSize > 0) {
                _cacheLineSize = (U_32) dCacheLineSize; <--------
               _useContendedClassLayout = true;
            }
        }
    }
...

According to the check at https://github.com/eclipse/openj9/issues/5058#issuecomment-524930948,
the cache directory doesn't exist under /sys/devices/system/cpu/cpu<N> on Fedora riscv via QEMU,
in which case _cacheLineSize remains 0, meaning no padding is applied when calculating the object size.

@janvrany also helped to check on Debian/QEMU and ended up with the same result (no cache info via QEMU), but the cache info does exist on Debian booted on the U540 dev board, which means the issue has nothing to do with the OS image but with the emulator itself (QEMU doesn't expose this information). For reference, the SiFive HiFive U540 CPU has a 64-byte cache line.

Given that OpenJ9 depends on the directory /sys/devices/system/cpu/cpu<N>/cache to determine whether to apply padding, I will add a piece of code to detect this situation and skip the cache info query on RISC-V if the directory doesn't exist (e.g. when booted via QEMU); a sketch of the idea follows.
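A minimal sketch of that guard (a hypothetical helper, not the actual patch), assuming a simple stat of the sysfs directory before issuing the cache-info query:

#include <sys/stat.h>

/* Hypothetical helper: true only when sysfs exposes the cache topology;
 * cpu0 is assumed representative of all harts. */
static bool
sysfsCacheInfoAvailable(void)
{
    struct stat st;
    return (0 == stat("/sys/devices/system/cpu/cpu0/cache", &st))
        && S_ISDIR(st.st_mode);
}

protectedInitializeJavaVM() would then call j9sysinfo_get_cache_info() only when this returns true, leaving _cacheLineSize at 0 (no padding) under QEMU.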

For now, these failing tests will be postponed till we receive the dev board to double-check.

I worked with @LinHu2016 to double-check two failing GC-related test cases, gcRegressionTests and gcPolicyNogcTests (previously detected in the sanity & extended tests), both of which share a very similar issue: failure to throw OOM / trigger GC while allocating objects in a loop.

Investigation shows these are timer-based tests, which means OOM should be thrown within a given time, under the assumption that the test allocates objects fast enough to exhaust the heap within that time span.

e.g. in gcRegressionTests:

/cmdLineTests/gcRegressionTests/src/com/ibm/tests/garbagecollector/SpinAllocate.java
    /**
     * @param args Takes one argument:  number of seconds to spin for before terminating with a message that the test ran to completion.
     * This argument is required.  It must be in the range [1-60]
     */
    public static void main(String[] args)
    {
        if (1 == args.length)
        {
            int secondsToSpin = Integer.parseInt(args[0]);

   if ((secondsToSpin >= 1) && (secondsToSpin <= 60)) <--- the maximal value is 60secs
            {
   long finishTime = System.currentTimeMillis() + (secondsToSpin * 1000);
      while (System.currentTimeMillis() < finishTime)
                {
                    _objectHolder = new Object();
                }
                System.out.println("Test ran to completion");
            }

jdk/bin/java -Xcompressedrefs -XX:+UseCompressedOops -Xjit -Xgcpolicy:balanced 
-Xint   -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError 
-Xdump:none -Xms8m -Xmx8m -Xgc:fvtest=forceExcessiveAllocFailureAfter=5 
-verbose:gc -Xverbosegclog:foo.log -cp .:gcRegressionTests.jar  
garbagecollector.SpinAllocate 20 <----- OOM should be thrown out within 20secs

Given that there is no JIT support on RISC-V (and even on real hardware it would still be slow compared to x86_64), the tests on RISC-V allocate objects very slowly, so the timer ran out while the JVM was still allocating objects (not yet close to the threshold that triggers OOM):

jdk11_riscv64/bin/java -Xcompressedrefs -XX:+UseCompressedOops -Xjit -Xgcpolicy:balanced
 -Xint   -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError 
-Xdump:none -Xms8m -Xmx8m -Xgc:fvtest=forceExcessiveAllocFailureAfter=5 
-verbose:gc -Xverbosegclog:foo.log -cp .:gcRegressionTests.jar  
garbagecollector.SpinAllocate 20 <-- 20secs
...
objCount = 9878
objCount = 9879 <------ allocated objects not enough to trigger OOM
Test ran to completion

as compared to the same test on X86_64:

jdk11_x86_64/bin/java -Xcompressedrefs -XX:+UseCompressedOops -Xjit -Xgcpolicy:balanced -Xint   
-Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xdump:none -Xms8m -Xmx8m 
-Xgc:fvtest=forceExcessiveAllocFailureAfter=5 -verbose:gc -Xverbosegclog:foo.log -cp 
.:gcRegressionTests.jar  garbagecollector.SpinAllocate 20  <-- 20secs
...
objCount = 57443
objCount = 57444 <--- the count of objects triggering OOM on x86_64
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
         ...
    at garbagecollector.SpinAllocate.main(SpinAllocate.java:51)

Thus, the timer or count in the test code has to be assigned a value large enough for the allocations to trigger OOM on RISC-V:

jdk11_riscv64/bin/java -Xcompressedrefs -XX:+UseCompressedOops -Xjit -Xgcpolicy:balanced -Xint   
-Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xdump:none -Xms8m -Xmx8m 
-Xgc:fvtest=forceExcessiveAllocFailureAfter=5 -verbose:gc -Xverbosegclog:foo.log -cp 
.:gcRegressionTests.jar  garbagecollector.SpinAllocate 200 <--- 200 secs
...
objCount = 56933
objCount = 56934 <--- the count of objects triggering OOM on RISC-V
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        ...
    at garbagecollector.SpinAllocate.main(SpinAllocate.java:51)

From the investigation above, the JVM needs much more time than expected to trigger OOM or reach the GC threshold on RISC-V, so these tests need to be updated accordingly.

With the newly added DDR code, the basic jdmpview & DDR specific commands work well as follows (the system dump was generated with a locally compiled build with DDR enabled on Fedora/QEMU):

[1] jdmpview specific commands:

jdk11_images_x86_64_rv64_11_11_2019/jdk/bin/jdmpview -core   core.20191110.051335.21531.0001.dmp
DTFJView version 4.29.5, using DTFJ version 1.12.29003
Loading image from DTFJ...

> info sys <------

Machine OS: Linux
Machine name:   stage4.fedoraproject.org
Machine IP address(es):
        fe80:0:0:0:5054:ff:fe12:3456
System memory:  8379387904
Dump creation time: 2019/11/10 00:13:35:266
Dump creation time (nanoseconds): 127157239519000
Java version:
JRE 11 Linux riscv-64 (build 11.0.5-internal+0-adhoc.jincheng.openj9-openjdk-jdk11)
...

> info proc <------
     Process ID:
      21531
     Native thread IDs:
      21562 21532 
     Command line:
      /.../jdk/bin/java -Xdump:system:events=throw,filte
     Java VM init options: 
      ...
     JIT was disabled for this runtime
     Environment variables:
      HOSTNAME=stage4.fedoraproject.org
      OLDPWD=/home/jincheng/RISCV_OPENJ9/testjdk

> info memory
JRE: 2421707888 bytes / 4645 allocations
|  
+--VM: 2421695672 bytes / 4632 allocations
|  |  
|  +--Classes: 70906104 bytes / 238 allocations
|  |  |  
|  |  +--Shared Class Cache: 67108960 bytes / 2 allocations
|  |  |  
|  |  +--Other: 3797144 bytes / 236 allocations
|  |  
|  +--Modules: 522544 bytes / 2109 allocations
...

> info mod
        /.../jdk/bin/java @ 0x10000, <no section information>
      /lib/ld-linux-riscv64-lp64d.so.1 @ 0x2000000000, <no section information>
         ...
      linux-vdso.so.1 @ 0x200001a000, sections:
       0x200001a120 - 0x200001a158, name: ".hash", size: 0x38
       0x200001a158 - 0x200001a198, name: ".gnu.hash", size: 0x40
           ...
      /.../jdk/lib/compressedrefs/libomrsig.so @ 0x200001c000, <no section information>
      /lib64/lp64d/libz.so.1 @ 0x2000026000, <no section information>
      /lib64/lp64d/libpthread.so.0 @ 0x200003b000, <no section information>
       ...

> info class java/lang/String
name = java/lang/String
    ID = 0x48f00    superID = 0x3d000    
    classLoader = 0xfff221c8    modifiers: public final 
    number of instances:     2731
    total size of instances on the heap: 43696 bytes
Inheritance chain....
    java/lang/Object
       java/lang/String
...
Fields......
      static fields for "java/lang/String"
        private static final long serialVersionUID = -6849794470754667710 (0xa0f0a4387a3bb342)
        static final boolean enableCompression = false
...
Methods......

Bytecode range(s): 200faf4d54 -- 200faf4d7b:  private void checkLastChar(char)
Bytecode range(s): 200faf4d9c -- 200faf4da9:  byte coder()
Bytecode range(s): 200faf4dcc -- 200faf4e16:  void getBytes(byte[], int, byte)
Bytecode range(s): 200faf4e40 -- 200faf4e6f:  static void checkIndex(int, int)
Bytecode range(s): 200faf4e90 -- 200faf4ebf:  static void checkOffset(int, int)
...

[2] DDR specific commands:

> !classforname   *String <-----------
Searching for classes named '*String' in VM=200400d2b0
!j9class 0x0000000000048F00 named java/lang/String
Found 1 class(es) named *String

> !j9class 0x0000000000048F00
J9Class at 0x48f00 {
  Fields for J9Class:
    0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
    0x8: struct J9ROMClass* romClass = !j9romclass 0x000000200FAF3E78
    0x10: void** superclasses = !j9x 0x0000000000047DC8
    0x18: UDATA classDepthAndFlags = 0x00000000020E0001 (34471937)
    0x20: U32 classDepthWithFlags = 0x00000000 (0)
    0x24: U32 classFlags = 0x00000000 (0)
...
>  !j9romclass 0x000000200FAF3E78
J9ROMClass at 0x200faf3e78 {
  Fields for J9ROMClass:
    0x0: U32 romSize = 0x000078F0 (30960)
    0x4: U32 singleScalarStaticCount = 0x00000007 (7)
    0x8: J9SRP(J9UTF8) className = !j9utf8 0x000000200FAE33FC
    0xc: J9SRP(J9UTF8) superclassName = !j9utf8 0x000000200FAE1590
    0x10: U32 modifiers = 0x00000031 (49)
...
> !romclassforname   *String   <-----------
Searching for ROMClasses named '*String' in VM=200400d2b0
!j9romclass 0x000000200FE21F20 named java/util/Formatter$FixedString
!j9romclass 0x000000200FE221B8 named java/util/Formatter$FormatString
!j9romclass 0x000000200FDA6EC0 named java/security/Provider$UString
!j9romclass 0x000000200FAF3E78 named java/lang/String
Found 4 ROMClass(es) named *String

> !dumpromclasslinear 0x000000200FAF3E78    <-----------
ROM Class 'java/lang/String' at 0x000000200FAF3E78
0x000000200FAF3E78-0x000000200FAF3F28 [            (SECTION) romHeader                    ]    176 bytes
0x000000200FAF3F28-0x000000200FAF4748 [            (SECTION) constantPool                 ]   2080 bytes
0x000000200FAF4748-0x000000200FAF4854 [            (SECTION) fields                       ]    268 bytes
0x000000200FAF4854-0x000000200FAF4860 [            (SECTION) interfacesSRPs               ]     12 bytes
0x000000200FAF4860-0x000000200FAF486C [            (SECTION) innerClassesSRPs             ]     12 bytes
0x000000200FAF486C-0x000000200FAF4D40 [            (SECTION) cpNamesAndSignaturesSRPs     ]   1236 bytes
0x000000200FAF4D40-0x000000200FAF97E8 [            (SECTION) methods                      ]  19112 bytes

> !dumpromclass  0x000000200FAF3E78  <-----------
ROM Size: 0x78f0 (30960)
Class Name: java/lang/String
Superclass Name: java/lang/Object
Source File Name: String.java
Generic Signature: Ljava/lang/Object;Ljava/io/Serializable;Ljava/lang/Comparable<Ljava/lang/String;>;Ljava/lang/CharSequence;
Sun Access Flags (0x31): public final super 
J9  Access Flags (0xe810000): (final fields) (preverified) 
Class file version: 55.0
Instance Shape: 0xe
Intermediate Class Data (30960 bytes): 200faf3e78
Maximum Branch Count: 32
Interfaces (3):
  java/io/Serializable
  java/lang/Comparable
  java/lang/CharSequence
Declared Classes (3):
   java/lang/String$UnsafeHelpers   java/lang/String$StringCompressionFlag   java/lang/String$CaseInsensitiveComparator
Fields (20):
  Name: serialVersionUID
  Signature: J
  Access Flags (7c001a): private static final 

CP Shape Description:
  . S S S 
  ...
  C C v v 
...
Methods (148):
  Name: checkLastChar
  Signature: (C)V
  Access Flags (10040002): private 
  Max Stack: 3
  Argument Count: 2
  Temp Count: 0

    0 iload1 
    1 bipush 92
    3 ificmpne 19
    6 newdup 20 java/lang/IllegalArgumentException
...

> !methodforname   *Stack0   <-----------
Searching for methods named '*Stack0' in VM=0x000000200400D2B0...
!j9method 0x000000000016B348 --> StackTest.Stack0()V
Found 1 method(s) named *Stack0


> !j9method 0x000000000016B348
J9Method at 0x16b348 {
  Fields for J9Method:
    0x0: U8* bytecodes = !j9x 0x00000020042205F8
    0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x000000000016B1D0 (flags = 0x0)
    0x10: void* methodRunAddress = !j9x 0x0000000000000007
    0x18: void* extra = !j9x 0x0000000000000001
}
Signature: StackTest.Stack0()V !bytecodes 0x000000000016B348
...

> !bytecodes 0x000000000016B348  <-----------
  Name: Stack0
  Signature: ()V
  Access Flags (50001): public 
  Max Stack: 1
  Argument Count: 1
  Temp Count: 0

    0 aload0 
    1 invokevirtual 5 StackTest.Stack1()V
    4 return0 

  Debug Info:
...
> !threads <-------
    !stack 0x0003c200   !j9vmthread 0x0003c200  !j9thread 0x2004006f20  tid 0x541c (21532) // (main)
    !stack 0x0011cf00   !j9vmthread 0x0011cf00  !j9thread 0x2004007910  tid 0x542e (21550) // (Common-Cleaner)
    !stack 0x00137400   !j9vmthread 0x00137400  !j9thread 0x20042156a8  tid 0x5430 (21552) // (Concurrent Mark Helper)
    !stack 0x00139800   !j9vmthread 0x00139800  !j9thread 0x2004215ba0  tid 0x5431 (21553) // (GC Slave)
    !stack 0x0013bd00   !j9vmthread 0x0013bd00  !j9thread 0x2004216440  tid 0x5432 (21554) // (GC Slave)
    !stack 0x0013e100   !j9vmthread 0x0013e100  !j9thread 0x2004216938  tid 0x5433 (21555) // (GC Slave)
    !stack 0x00140500   !j9vmthread 0x00140500  !j9thread 0x2004216e30  tid 0x5434 (21556) // (GC Slave)
    !stack 0x00142a00   !j9vmthread 0x00142a00  !j9thread 0x2004217810  tid 0x5435 (21557) // (GC Slave)
    !stack 0x00144e00   !j9vmthread 0x00144e00  !j9thread 0x2004217d08  tid 0x5436 (21558) // (GC Slave)
    !stack 0x00147200   !j9vmthread 0x00147200  !j9thread 0x2004218200  tid 0x5437 (21559) // (GC Slave)
    !stack 0x0016a400   !j9vmthread 0x0016a400  !j9thread 0x200421a3d8  tid 0x5439 (21561) // (Attach API wait loop)

The failing commands (e.g. !stack, !stackslots, info thread) are mostly related to stack frames:

> !stack 0x0003c200
Error executing DDR command !stack 0x0003c200 : null
Caused by: java.lang.ExceptionInInitializerError
    at java.base/java.lang.J9VMInternals.ensureError(J9VMInternals.java:193)
    at java.base/java.lang.J9VMInternals.recordInitializationFailure(J9VMInternals.java:182)
    at com.ibm.j9ddr.vm29.tools.ddrinteractive.commands.StackWalkCommand.run(StackWalkCommand.java:91)
    at com.ibm.j9ddr.tools.ddrinteractive.Context.tryCommand(Context.java:229)
    at com.ibm.j9ddr.tools.ddrinteractive.Context.execute(Context.java:202)
    at com.ibm.j9ddr.tools.ddrinteractive.DDRInteractive.execute(DDRInteractive.java:356)
    at com.ibm.j9ddr.command.CommandReader.processLine(CommandReader.java:79)
    at com.ibm.j9ddr.tools.ddrinteractive.DDRInteractive.processLine(DDRInteractive.java:331)
    ... 16 more
Caused by: java.lang.IllegalArgumentException: Unsupported platform <------
    at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalkerUtils.<clinit>(StackWalkerUtils.java:100)
    ... 22 more

> info thread <------
 process id: 21531

  thread id: 21532
   registers:
   native stack sections:
   native stack frames:
   properties:
   associated Java thread: 
    name:          main
    Thread object: java/lang/Thread @ 0xfff03ff8
    Native info:   !j9vmthread 0x3c200  !stack 0x3c200
    Daemon:        false
    ID:            1 (0x1)
    Priority:      5
    Thread.State:  RUNNABLE 
    JVMTI state:   ALIVE RUNNABLE 
    Java stack frames: Exception in thread "main" java.lang.ExceptionInInitializerError
    at java.base/java.lang.J9VMInternals.ensureError(J9VMInternals.java:193)
    at java.base/java.lang.J9VMInternals.recordInitializationFailure(J9VMInternals.java:182)
    at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker$StackWalker_29_V0.walkStackFrames(StackWalker.java:197)
    at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker$StackWalker_29_V0.walkStackFrames(StackWalker.java:166)
    at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalker.walkStackFrames(StackWalker.java:97)
    at com.ibm.j9ddr.vm29.view.dtfj.java.DTFJJavaThread.walkStack(DTFJJavaThread.java:370)
    at com.ibm.j9ddr.vm29.view.dtfj.java.DTFJJavaThread.getStackFrames(DTFJJavaThread.java:220)
    at openj9.dtfjview/com.ibm.jvm.dtfjview.commands.infocommands.InfoThreadCommand.printJavaStackFrameInfo(InfoThreadCommand.java:692)
    at openj9.dtfjview/com.ibm.jvm.dtfjview.commands.infocommands.InfoThreadCommand.printJavaThreadInfo(InfoThreadCommand.java:604)
Caused by: java.lang.IllegalArgumentException: Unsupported platform <------
    at com.ibm.j9ddr.vm29.j9.stackwalker.StackWalkerUtils.<clinit>(StackWalkerUtils.java:100)
    at com.ibm.j9ddr.vm29.tools.ddrinteractive.commands.StackWalkCommand.run(StackWalkCommand.java:91)
    at com.ibm.j9ddr.tools.ddrinteractive.Context.tryCommand(Context.java:229)

Need to figure out what happened in StackWalkerUtils.java and what else is missing in the other code involved.

I've fixed the stack frame issue in DDR and verified the result with an X86_64 build (compiled from the same source plus the fix).

The simple test code is as follows:

import java.util.Properties;

public class StackTest
{
    public void Stack0(){
         Stack1();
    }
    public void Stack1(){
         Stack2();
    }
    public void Stack2(){
         Stack3();
    }
    public void Stack3(){
        Properties jvm = System.getProperties();
        jvm.list(System.out);
        throw new IllegalArgumentException("test Exception");
    }
    public static void main(String[] args)
    {
        StackTest  test = new StackTest();
        test.Stack0();
    }
}

The core dump below was generated by a RISC-V build compiled on Fedora/QEMU (without the fix):

jdk11_images_x86_64/jdk/bin/jdmpview -core   core.20191110.051335.21531.0001.dmp
DTFJView version 4.29.5, using DTFJ version 1.12.29003
Loading image from DTFJ...

For a list of commands, type "help"; for how to use "help", type "help help"
Available contexts (* = currently selected context) : 

Source : file:/.../core.20191110.051335.21531.0001.dmp
    *0 : PID: 21562 : JRE 11 Linux riscv-64 (build 11.0.5-internal+0-adhoc.jincheng.openj9-openjdk-jdk11)

> !threads
    !stack 0x0003c200   !j9vmthread 0x0003c200  !j9thread 0x2004006f20  tid 0x541c (21532) // (main) <-----
    !stack 0x0011cf00   !j9vmthread 0x0011cf00  !j9thread 0x2004007910  tid 0x542e (21550) // (Common-Cleaner)
    !stack 0x00137400   !j9vmthread 0x00137400  !j9thread 0x20042156a8  tid 0x5430 (21552) // (Concurrent Mark Helper)
    !stack 0x00139800   !j9vmthread 0x00139800  !j9thread 0x2004215ba0  tid 0x5431 (21553) // (GC Slave)
...
> !stack 0x0003c200 <------
<3c200>                             Generic special frame
<3c200>     !j9method 0x000000000016B3A8   StackTest.Stack3()V
<3c200>     !j9method 0x000000000016B388   StackTest.Stack2()V
<3c200>     !j9method 0x000000000016B368   StackTest.Stack1()V
<3c200>     !j9method 0x000000000016B348   StackTest.Stack0()V
<3c200>     !j9method 0x000000000016B3C8   StackTest.main([Ljava/lang/String;)V
<3c200>                             JNI call-in frame
<3c200>                             Native method frame

> !stackslots  0x0003c200
<3c200> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x000000000003C200 ***
<3c200>     ITERATE_O_SLOTS
<3c200>     RECORD_BYTECODE_PC_OFFSET
<3c200> Initial values: walkSP = 0x00000000000FB2D0, PC = 0x0000000000000001, literals = 0x0000000000000000, A0 = 0x00000000000FB2E8, j2iFrame = 0x0000000000000000, ELS = 0x0000002000A1F550, decomp = 0x0000000000000000
<3c200> Generic special frame: bp = 0x00000000000FB2E8, sp = 0x00000000000FB2D0, pc = 0x0000000000000001, cp = 0x0000000000000000, arg0EA = 0x00000000000FB2E8, flags = 0x0000000000000000
<3c200> Bytecode frame: bp = 0x00000000000FB300, sp = 0x00000000000FB2F0, pc = 0x00000020042206A0, cp = 0x000000000016B1D0, arg0EA = 0x00000000000FB310, flags = 0x0000000000000000
<3c200>     Method: StackTest.Stack3()V !j9method 0x000000000016B3A8 <-----------
<3c200>     Bytecode index = 36
<3c200>     Using local mapper
<3c200>     Locals starting at 0x00000000000FB310 for 0x0000000000000002 slots
<3c200>         I-Slot: a0[0x00000000000FB310] = 0x00000000FFF78810
<3c200>         I-Slot: t1[0x00000000000FB308] = 0x00000000FFF041B0
<3c200> Bytecode frame: bp = 0x00000000000FB328, sp = 0x00000000000FB318, pc = 0x0000002004220651, cp = 0x000000000016B1D0, arg0EA = 0x00000000000FB330, flags = 0x0000000000000000
<3c200>     Method: StackTest.Stack2()V !j9method 0x000000000016B388
<3c200>     Bytecode index = 1
<3c200>     Using local mapper
<3c200>     Locals starting at 0x00000000000FB330 for 0x0000000000000001 slots
<3c200>         I-Slot: a0[0x00000000000FB330] = 0x00000000FFF78810
<3c200> Bytecode frame: bp = 0x00000000000FB348, sp = 0x00000000000FB338, pc = 0x0000002004220625, cp = 0x000000000016B1D0, arg0EA = 0x00000000000FB350, flags = 0x0000000000000000
<3c200>     Method: StackTest.Stack1()V !j9method 0x000000000016B368
<3c200>     Bytecode index = 1
<3c200>     Using local mapper
<3c200>     Locals starting at 0x00000000000FB350 for 0x0000000000000001 slots
<3c200>         I-Slot: a0[0x00000000000FB350] = 0x00000000FFF78810
<3c200> Bytecode frame: bp = 0x00000000000FB368, sp = 0x00000000000FB358, pc = 0x00000020042205F9, cp = 0x000000000016B1D0, arg0EA = 0x00000000000FB370, flags = 0x0000000000000000
<3c200>     Method: StackTest.Stack0()V !j9method 0x000000000016B348
<3c200>     Bytecode index = 1
<3c200>     Using local mapper
<3c200>     Locals starting at 0x00000000000FB370 for 0x0000000000000001 slots
<3c200>         I-Slot: a0[0x00000000000FB370] = 0x00000000FFF78810
<3c200> Bytecode frame: bp = 0x00000000000FB388, sp = 0x00000000000FB378, pc = 0x00000020042206D5, cp = 0x000000000016B1D0, arg0EA = 0x00000000000FB398, flags = 0x0000000000000000
<3c200>     Method: StackTest.main([Ljava/lang/String;)V !j9method 0x000000000016B3C8
<3c200>     Bytecode index = 9
<3c200>     Using local mapper
<3c200>     Locals starting at 0x00000000000FB398 for 0x0000000000000002 slots
<3c200>         I-Slot: a0[0x00000000000FB398] = 0x00000000FFF78800
<3c200>         I-Slot: t1[0x00000000000FB390] = 0x00000000FFF78810
<3c200> JNI call-in frame: bp = 0x00000000000FB3C0, sp = 0x00000000000FB3A0, pc = 0x0000002000B68CD0, cp = 0x0000000000000000, arg0EA = 0x00000000000FB3C0, flags = 0x0000000000020000
<3c200>     New ELS = 0x0000000000000000
<3c200> JNI native method frame: bp = 0x00000000000FB448, sp = 0x00000000000FB3C8, pc = 0x0000000000000007, cp = 0x0000000000000000, arg0EA = 0x00000000000FB448, flags = 0x000000000000000C
<3c200>     Object pushes starting at 0x00000000000FB3C8 for 12 slots
<3c200>         Push[0x00000000000FB3C8] = 0x00000000FFF78800
<3c200>         Push[0x00000000000FB3D0] = 0x00000000832409F8
<3c200>         Push[0x00000000000FB3D8] = 0x000000008326D418
<3c200> <end of stack>
<3c200> *** END STACK WALK (rc = NONE) ***

>  !j9method 0x000000000016B3A8  <--------------------                         
J9Method at 0x16b3a8 {
  Fields for J9Method:
    0x0: U8* bytecodes = !j9x 0x000000200422067C
    0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x000000000016B1D0 (flags = 0x0)
    0x10: void* methodRunAddress = !j9x 0x0000000000000007
    0x18: void* extra = !j9x 0x0000000000000001
}
Signature: StackTest.Stack3()V !bytecodes 0x000000000016B3A8
ROM Method: !j9rommethod 0x0000002004220668
Next Method: !j9method 0x000000000016B3C8
> !bytecodes 0x000000000016B3A8
  Name: Stack3
  Signature: ()V
  Access Flags (50001): public 
  Max Stack: 3
  Argument Count: 1
  Temp Count: 1

    0 getstatic 8 java/lang/System.out Ljava/io/PrintStream;
    3 ldc 1 (java.lang.String) Following are the JVM information of your OS :
    5 invokevirtual 9 java/io/PrintStream.println(Ljava/lang/String;)V
    8 getstatic 8 java/lang/System.out Ljava/io/PrintStream;
   11 ldc 2 (java.lang.String) 
   13 invokevirtual 9 java/io/PrintStream.println(Ljava/lang/String;)V
   16 invokestatic 10 java/lang/System.getProperties()Ljava/util/Properties;
   19 astore1 
   20 aload1 
   21 getstatic 8 java/lang/System.out Ljava/io/PrintStream;
   24 invokevirtual 11 java/util/Properties.list(Ljava/io/PrintStream;)V
   27 newdup 12 java/lang/IllegalArgumentException
   30 dup 
   31 ldc 3 (java.lang.String) test Exception
   33 invokespecial 13 java/lang/IllegalArgumentException.<init>(Ljava/lang/String;)V
   36 athrow 

  Debug Info:
    Line Number Table (5):
      Line:    16 PC:     0
      Line:    17 PC:     8
      Line:    20 PC:    16
      Line:    21 PC:    20
      Line:    22 PC:    27

    Variables (0):

> info thread  <---------------------
 process id: 21531

  thread id: 21532
   registers:
   native stack sections:
   native stack frames:
   properties:
   associated Java thread: 
    name:          main
    Thread object: java/lang/Thread @ 0xfff03ff8
    Native info:   !j9vmthread 0x3c200  !stack 0x3c200
    Daemon:        false
    ID:            1 (0x1)
    Priority:      5
    Thread.State:  RUNNABLE 
    JVMTI state:   ALIVE RUNNABLE 
    Java stack frames: 
     bp: 0x00000000000fb310  method: void StackTest.Stack3()  source: StackTest.java:22
      objects: <no objects in this frame>
     bp: 0x00000000000fb330  method: void StackTest.Stack2()  source: StackTest.java:13
      objects: <no objects in this frame>
     bp: 0x00000000000fb350  method: void StackTest.Stack1()  source: StackTest.java:10
      objects: <no objects in this frame>
     bp: 0x00000000000fb370  method: void StackTest.Stack0()  source: StackTest.java:7
      objects: <no objects in this frame>
     bp: 0x00000000000fb398  method: void StackTest.main(String[])  source: StackTest.java:27
      objects: <no objects in this frame>
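For reference, here is a minimal sketch of the StackTest program implied by the frames, source line numbers and !bytecodes output above (a reconstruction for illustration only; the exact source layout is an assumption):

// Hypothetical reconstruction of StackTest, inferred from the dump above.
import java.util.Properties;

public class StackTest {
    public void Stack3() {
        System.out.println("Following are the JVM information of your OS :");
        System.out.println("");
        Properties props = System.getProperties(); // invokestatic at bytecode 16
        props.list(System.out);                    // invokevirtual at bytecode 24
        throw new IllegalArgumentException("test Exception"); // athrow at bytecode 36
    }
    public void Stack2() { Stack3(); }
    public void Stack1() { Stack2(); }
    public void Stack0() { Stack1(); }

    public static void main(String[] args) {
        new StackTest().Stack0(); // produces the five Java stack frames walked above
    }
}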

Given that the RISC-V build that generated the system dump above didn't include the fix, I need to compile another build with the fix on Fedora/QEMU to double-check these results against the build on Fedora/RISC-V.

The build with the fix (compiled on Fedora/QEMU) works well when running stack-frame-related DDR commands in jdmpview, as follows:

[jincheng@stage4 ~]$ uname -a
Linux stage4.fedoraproject.org 4.19.0-rc8 #1 
SMP Wed Oct 17 15:11:25 UTC 2018 
riscv64 riscv64 riscv64 GNU/Linux

[jincheng@stage4 ]$ .../jdk/bin/jdmpview -core  core.20191114.230459.2117.0001.dmp
DTFJView version 4.29.5, using DTFJ version 1.12.29003
Loading image from DTFJ...

For a list of commands, type "help"; for how to use "help", type "help help"
Available contexts (* = currently selected context) : 

Source : file:/.../ddr_test/core.20191114.230459.2117.0001.dmp
    *0 : PID: 2150 : JRE 11 Linux riscv-64 (build 11.0.5-internal+0-adhoc.jincheng.openj9-openjdk-jdk11)

> info thread <------
 process id: 2117

  thread id: 2119
   registers:
   native stack sections:
   native stack frames:
   properties:
   associated Java thread: 
    name:          main
    Thread object: java/lang/Thread @ 0xfff03ff8
    Native info:   !j9vmthread 0x3c200  !stack 0x3c200
    Daemon:        false
    ID:            1 (0x1)
    Priority:      5
    Thread.State:  RUNNABLE 
    JVMTI state:   ALIVE RUNNABLE 
    Java stack frames: 
     bp: 0x00000000000fb310  method: void StackTest.Stack3()  source: StackTest.java:22
      objects: <no objects in this frame>
     bp: 0x00000000000fb330  method: void StackTest.Stack2()  source: StackTest.java:13
      objects: <no objects in this frame>
     bp: 0x00000000000fb350  method: void StackTest.Stack1()  source: StackTest.java:10
      objects: <no objects in this frame>
     bp: 0x00000000000fb370  method: void StackTest.Stack0()  source: StackTest.java:7
      objects: <no objects in this frame>
     bp: 0x00000000000fb398  method: void StackTest.main(String[])  source: StackTest.java:27
      objects: <no objects in this frame>


> !threads  <------
    !stack 0x0003c200   !j9vmthread 0x0003c200  !j9thread 0x2004006f70  tid 0x847 (2119) // (main)
    !stack 0x0011cf00   !j9vmthread 0x0011cf00  !j9thread 0x2004007960  tid 0x85a (2138) // (Common-Cleaner)
    !stack 0x00137400   !j9vmthread 0x00137400  !j9thread 0x20041fc768  tid 0x85c (2140) // (Concurrent Mark Helper)
    !stack 0x00139800   !j9vmthread 0x00139800  !j9thread 0x20041fcc60  tid 0x85d (2141) // (GC Slave)
    !stack 0x0013bd00   !j9vmthread 0x0013bd00  !j9thread 0x20041fd280  tid 0x85e (2142) // (GC Slave)
    !stack 0x0013e100   !j9vmthread 0x0013e100  !j9thread 0x20041fd778  tid 0x85f (2143) // (GC Slave)
    !stack 0x00140500   !j9vmthread 0x00140500  !j9thread 0x20041fdc70  tid 0x860 (2144) // (GC Slave)
    !stack 0x00142a00   !j9vmthread 0x00142a00  !j9thread 0x20041fe650  tid 0x861 (2145) // (GC Slave)
    !stack 0x00144e00   !j9vmthread 0x00144e00  !j9thread 0x20041feb48  tid 0x862 (2146) // (GC Slave)
    !stack 0x00147200   !j9vmthread 0x00147200  !j9thread 0x20041ff040  tid 0x863 (2147) // (GC Slave)
    !stack 0x00172400   !j9vmthread 0x00172400  !j9thread 0x2004201298  tid 0x865 (2149) // (Attach API wait loop)

> !stack 0x0003c200 <----------------
<3c200>                             Generic special frame
<3c200>     !j9method 0x000000000016A7A8   StackTest.Stack3()V
<3c200>     !j9method 0x000000000016A788   StackTest.Stack2()V
<3c200>     !j9method 0x000000000016A768   StackTest.Stack1()V
<3c200>     !j9method 0x000000000016A748   StackTest.Stack0()V
<3c200>     !j9method 0x000000000016A7C8   StackTest.main([Ljava/lang/String;)V
<3c200>                             JNI call-in frame
<3c200>                             Native method frame

> !stackslots  0x0003c200 <------------
<3c200> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x000000000003C200 ***
<3c200>     ITERATE_O_SLOTS
<3c200>     RECORD_BYTECODE_PC_OFFSET
<3c200> Initial values: walkSP = 0x00000000000FB2D0, PC = 0x0000000000000001, literals = 0x0000000000000000, A0 = 0x00000000000FB2E8, j2iFrame = 0x0000000000000000, ELS = 0x0000002000A1F550, decomp = 0x0000000000000000
<3c200> Generic special frame: bp = 0x00000000000FB2E8, sp = 0x00000000000FB2D0, pc = 0x0000000000000001, cp = 0x0000000000000000, arg0EA = 0x00000000000FB2E8, flags = 0x0000000000000000
<3c200> Bytecode frame: bp = 0x00000000000FB300, sp = 0x00000000000FB2F0, pc = 0x0000002002CC62C0, cp = 0x000000000016A5D0, arg0EA = 0x00000000000FB310, flags = 0x0000000000000000
<3c200>     Method: StackTest.Stack3()V !j9method 0x000000000016A7A8
<3c200>     Bytecode index = 36
<3c200>     Using local mapper
...

and also double-checked the shared-classes-specific commands as follows:

> !shrc <------------
!j9sharedclassconfig 0x00000020040BB060

!shrc stats [range|layer=<n>]                  -- Print cache stats
!shrc allstats [range|layer=<n>]               -- Print all cache contents
!shrc rcstats [range|layer=<n>]                -- Print romclass cache contents
!shrc cpstats [range|layer=<n>]                -- Print classpath cache contents
!shrc aotstats [range|layer=<n>]               -- Print aot cache contents
!shrc invaotstats [range|layer=<n>]            -- Print invalidated aot cache contents
!shrc orphanstats [range|layer=<n>]            -- Print orphan cache contents

>  !shrc rcstats  <------------
!j9sharedclassconfig 0x00000020040BB060

Meta data region to be used: 0x00000020160386C4..0x000000201607CFFC
1: 0x000000201607BE0C ROMCLASS: java/lang/Object at !j9romclass 0x00000020125ED000 !STALE!
    Index 0 in !shrc classpath 0x000000201607BE40
1: 0x000000201607BDE0 ROMCLASS: java/lang/J9VMInternals at !j9romclass 0x00000020125ED730 !STALE!
    Index 0 in !shrc classpath 0x000000201607BE40
1: 0x000000201607BDB4 ROMCLASS: com/ibm/oti/vm/VM at !j9romclass 0x00000020125EFDB0 !STALE!
    Index 0 in !shrc classpath 0x000000201607BE40
1: 0x000000201607BD88 ROMCLASS: java/lang/J9VMInternals$ClassInitializationLock at !j9romclass 0x00000020125F16B8 !STALE!
...
Cache contains 5682 classes, 371 orphans, 2 classpaths, 0 URLs, 0 tokens
0 AOT, 8 SCOPES, 7 BYTE data, 0 UNINDEXED DATA, 0 CHARARRAY, 3743 stale
stale bytes 11004852
0 JITPROFILE, 0 JITHINT
AOT data length 0 code length 0 metadata 0 total 0
JITPROFILE data length 0 metadata 0 
JITHINT data length 0 metadata 0 
ROMClass data 4202376 metadata 257428
SCOPE data 21929 metadata 96 total 22025
BYTE data 672 metadata 252 rwarea 0
UNINDEXEDBYTE data 0 metadata 0
CHARARRAY data 0 metadata 0
BYTEDATA Summary
    UNKNOWN 0  HELPER 0  POOL 0  AOTHEADER 0
    JCL 0  VM 0  ROMSTRING 0  ZIPCACHE 0  STARTUPHINTS 672
    JITHINT 0  AOTCLASSCHAIN 0 AOTTHUNK 0
DEBUG Area Summary
    LineNumberTable bytes    : 334422
    LocalVariableTable bytes : 536906

> !shrc allstats  <------------
!j9sharedclassconfig 0x00000020040BB060

Meta data region to be used: 0x00000020160386C4..0x000000201607CFFC
1: 0x000000201607BFC0 SCOPE !j9utf8 0x000000201607BFC8 -Xoptionsfile=/home/jincheng/RISCV_OPENJ9/openj9-openjdk-jdk11/build/linux-riscv64-normal-server-release/jdk/lib/options.default -Xlockword:mode=default,noLockword=java/lang/String,noLockword=java/util/MapEntry,noLockword=java/util/HashMap$Entry,noLockword=org/apache/harmony/luni/util/ModifiedMap$Entry,noLockword=java/util/Hashtable$Entry,noLockword=java/lang/invoke/MethodType,noLockword=java/lang/invok
...
7: 0x00000020160386C4 ROMCLASS: java/util/Collections$EmptyIterator at !j9romclass 0x0000002012783CC0 
    Index 0 in !shrc classpath 0x000000201603FD44

Cache contains 5682 classes, 371 orphans, 2 classpaths, 0 URLs, 0 tokens
0 AOT, 8 SCOPES, 7 BYTE data, 0 UNINDEXED DATA, 0 CHARARRAY, 3743 stale
stale bytes 11004852
0 JITPROFILE, 0 JITHINT
AOT data length 0 code length 0 metadata 0 total 0
JITPROFILE data length 0 metadata 0 
JITHINT data length 0 metadata 0 
ROMClass data 4202376 metadata 257428
SCOPE data 21929 metadata 96 total 22025
BYTE data 672 metadata 252 rwarea 0
UNINDEXEDBYTE data 0 metadata 0
CHARARRAY data 0 metadata 0
BYTEDATA Summary
    UNKNOWN 0  HELPER 0  POOL 0  AOTHEADER 0
    JCL 0  VM 0  ROMSTRING 0  ZIPCACHE 0  STARTUPHINTS 672
    JITHINT 0  AOTCLASSCHAIN 0 AOTTHUNK 0
DEBUG Area Summary
    LineNumberTable bytes    : 334422
    LocalVariableTable bytes : 536906

> !shrc classpath 0x000000201603FD44   <------------
!j9sharedclassconfig 0x00000020040BB060

6: 0x000000201603FD44 CLASSPATH
   0)   0x0000000000000004: /home/jincheng/RISCV_OPENJ9/jdk11_images_native_ddr_11_14_2019/jdk/lib/modules  timestamp: 0x000000005DCBBA28

To verify whether !gpinfo & backtrace work in gdb/debugging (related to the signal handler), I need to compile another build on Fedora/QEMU with a piece of code in OpenJ9 that triggers a crash.
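As a side note, a crash can also be provoked without patching OpenJ9 itself; the following is a hedged sketch (an illustration on my part, not the actual code change mentioned above) that uses sun.misc.Unsafe to force a SIGSEGV in native code, which should exercise the signal handler and produce a system dump that !gpinfo & backtrace can be checked against:

// Hypothetical crash trigger (an assumption, not the OpenJ9 change above):
// writing to address 0 raises SIGSEGV, driving the VM's signal handler.
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class CrashTest {
    public static void main(String[] args) throws Exception {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
        unsafe.putLong(0L, 42L); // null-page write -> SIGSEGV
    }
}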

Already ran the DDR-specific test suites modularityddrtests & dumpromclasstests (which failed previously due to missing DDR code) and all tests passed, as follows:

[jincheng@stage4 TestConfig]$ make _cmdLineTester_modularityddrtests_1

Running make 4.2.1
set TEST_ROOT to /.../test/TestConfig/..
set JDK_VERSION to 11
set JDK_IMPL to openj9
set JVM_VERSION to openjdk11-openj9
set JCL_VERSION to latest
JAVA_HOME was originally set to /.../jdk
set JAVA_HOME to /.../jdk
set SPEC to linux_riscv64_cmprssptrs
Running cmdLineTester_modularityddrtests_1 ...
"/.../jdk/bin/java" -version
openjdk version "11.0.5-internal" 2019-10-15
OpenJDK Runtime Environment (build 11.0.5-internal+0-adhoc.jincheng.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build riscv_openj9_v2_uma-0bf887188, 
JRE 11 Linux riscv-64-Bit Compressed References 20191112_000000 (JIT disabled, AOT disabled)
OpenJ9   - 0bf887188
OMR      - 78b1cb9c
JCL      - f22643e1cd based on jdk-11.0.5+10)

===============================================
Running test cmdLineTester_modularityddrtests_1 ...
===============================================
...

---TEST RESULTS---
Number of PASSED tests: 50 out of 50
Number of FAILED tests: 0 out of 50

cmdLineTester_modularityddrtests_1_PASSED

TEST TARGETS SUMMARY
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PASSED test targets:
    cmdLineTester_modularityddrtests_1

TOTAL: 1   EXECUTED: 1   PASSED: 1   FAILED: 0   DISABLED: 0   SKIPPED: 0
ALL TESTS PASSED


[jincheng@stage4 TestConfig]$ make  _cmdLineTester_dumpromclasstests_0

Running make 4.2.1
set TEST_ROOT to /.../test/TestConfig/..
set JDK_VERSION to 11
set JDK_IMPL to openj9
set JVM_VERSION to openjdk11-openj9
set JCL_VERSION to latest
JAVA_HOME was originally set to /.../jdk
set JAVA_HOME to /.../jdk
set SPEC to linux_riscv64_cmprssptrs
Running cmdLineTester_dumpromclasstests_0 ...
"/.../jdk/bin/java" -version


---TEST RESULTS---
Number of PASSED tests: 3 out of 3
Number of FAILED tests: 0 out of 3

cmdLineTester_dumpromclasstests_0_PASSED


TEST TARGETS SUMMARY
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PASSED test targets:
    cmdLineTester_dumpromclasstests_0

TOTAL: 1   EXECUTED: 1   PASSED: 1   FAILED: 0   DISABLED: 0   SKIPPED: 0
ALL TESTS PASSED

Will add all DDR changes to the corresponding PR soon.

Currently I've been investigating [1] whether the native code can be debugged in gdb, and
[2] whether backtrace in gdb works for a generated system core dump on RISC-V (which should be helpful for troubleshooting in the future).

[1] whether it is feasible to debug native code on RISC-V:

Given that there is no gdb package for Fedora rawhide 2.8_stage4 (nothing available for download from https://secondary.fedoraproject.org/pub/alt/risc-v/RPMS/riscv64 / stage4 is the stable version we are using to port OpenJ9 for now), I compiled gdb (v8.3) from the upstream sources as follows:

wget "https://ftp.gnu.org/gnu/gdb/gdb-8.3.tar.gz"
tar -xvzf gdb-8.3.tar.gz
gdb-8.3$ ./configure
gdb-8.3$ make
make install

Initially I wrote a simple test program to see whether the generated gdb works, as follows:

crashtest.c:
 1 #include <stdio.h>
 2 void stack3() {  *((int *)0) = 5; }
 3 void stack2() { stack3(); }
 4 void stack1() { stack2(); }
 5 void stack0() { stack1(); }
 6 int main()
 7 {
 8    stack0(); <--- set up breakpoint
 9    return 0;
10 }  

[jincheng@stage4 ]$ gcc -g  -o crashtest  crashtest.c
[jincheng@stage4 ]$ gdb 
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
...
This GDB was configured as "riscv64-unknown-linux-gnu".
...
(gdb) file crashtest
Reading symbols from crashtest...
(gdb) b crashtest:8
No source file named crashtest.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (crashtest:8) pending.
(gdb) r
Starting program: /home/jincheng/.../crashtest
[3]+  Stopped                 gdb  <----- exited automatically for unknown reason

Compared to debugging the same code on X86_64:

jincheng@JCHLNXSVL1:~/$ gcc -g -o crashtest  crashtest.c
jincheng@JCHLNXSVL1:~/$ gdb
GNU gdb (Ubuntu 8.2-0ubuntu1~16.04.1) 8.2
...
This GDB was configured as "x86_64-linux-gnu".
(gdb) file  crashtest
Reading symbols from crashtest...done.
(gdb) b  crashtest.c:8
Breakpoint 1 at 0x400503: file crashtest.c, line 8.
(gdb) r
Starting program: /home/jincheng/crashtest

Breakpoint 1, main () at crashtest.c:8
8          stack0();
(gdb) bt
#0  main () at crashtest.c:8
(gdb) s
stack0 () at crashtest.c:5
5       void stack0() { stack1(); }
(gdb) s
stack1 () at crashtest.c:4
4       void stack1() { stack2(); }
(gdb) s
stack2 () at crashtest.c:3
3       void stack2() { stack3(); }
(gdb) bt
#0  stack2 () at crashtest.c:3
#1  0x00000000004004eb in stack1 () at crashtest.c:4
#2  0x00000000004004fc in stack0 () at crashtest.c:5
#3  0x000000000040050d in main () at crashtest.c:8

Unfortunately, the comparison above indicates that gdb on RISC-V somehow exits automatically even though the program was compiled with -g (debug info). I also tried with an OpenJ9 cross-build in debug mode and ended up with the same result:

[jincheng@stage4 bcverify]$ gdb
GNU gdb (GDB) 8.3
...
(gdb) file   /home/jincheng/linux-riscv64-normal-server-fastdebug/images/jdk/bin/java
Reading symbols from /home/jincheng/linux-riscv64-normal-server-fastdebug/images/jdk/bin/java...
(No debugging symbols found in /home/jincheng/linux-riscv64-normal-server-fastdebug/images/jdk/bin/java)
(gdb) set solib-absolute-prefix /usr
(gdb) set solib-search-path  /home/jincheng/linux-riscv64-normal-server-fastdebug/images/vm
Reading symbols from /home/jincheng/linux-riscv64-normal-server-fastdebug/images/vm/libjvm.so...
Reading symbols from /home/jincheng/linux-riscv64-normal-server-fastdebug/images/vm/libjvm.so...
Reading symbols from /home/jincheng/linux-riscv64-normal-server-fastdebug/images/vm/libomrsig.so...
...
(gdb) set args -cp . StackTest
(gdb) b /home/jincheng/linux-riscv64-normal-server-fastdebug/images/vm/bcverify/rtverify.c:490
No symbol table is loaded.  Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (/home/jincheng/linux-riscv64-normal-server-fastdebug/vm/bcverify/rtverify.c:490) pending.
(gdb) r
Starting program: /home/jincheng/linux-riscv64-normal-server-fastdebug/ddr_test/jdk/bin/java -cp . StackTest
[2]+  Stopped                 gdb <----- exited automatically for unknown reason

So it seems the latest version of gdb is not yet mature enough (likely still under development) to debug programs, at least on Fedora_riscv.

@ChengJin01
GDB's riscv64 native target (kind of) works these days, I use it regularly when debugging stuff on RISC-V. However, it is not only GDB that is involved, there was a bug in kernel's PTRACE implementation as well as some missing support in GLIBC. This being said, you need a matching combination of these. I can dig out details if you want.

At the very least,(my) GDB does work on my Debian system, see debian-mk-gdb.sh

I'll check stock GDB 8.3 later today.

@janvrany
But for a core dump generated from a simple test, GDB 8.3 works well:
e.g.

[jincheng@stage4 ]$ gdb  crashtest  -core  core.16922
...
[New LWP 16922]
Core was generated by `./crashtest'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000010416 in stack3 () at crashtest.c:5
5     *((int *)0) = 5;
(gdb) bt
#0  0x0000000000010416 in stack3 () at crashtest.c:5
#1  0x000000000001042c in stack2 () at crashtest.c:10
#2  0x0000000000010442 in stack1 () at crashtest.c:15
#3  0x0000000000010458 in stack0 () at crashtest.c:20
#4  0x000000000001046e in main () at crashtest.c:25

So, it is hard to tell whether GDB 8.3 works overall on Fedora_riscv 2.8_stage4 (there might be a compatibility issue with the glibc used on the target platform, or missing patches). Need to double-check whether a lower version of GDB works around this.

@ChengJin01
It looks to me that GDB 8.3.1 itself is okish:

jv@unleashed:/tmp$ ~/Projects/gdb/master/gdb/gdb --data-directory ~/Projects/gdb/master/gdb/data-directory a.out 
GNU gdb (GDB) 8.3.1       
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "riscv64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:         
<http://www.gnu.org/software/gdb/bugs/>.          
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.                                                                                           

For help, type "help".    
Type "apropos word" to search for commands related to "word"...         
Warning: /usr/src/glibc-2.26/nss: No such file or directory.                                                                                    
Warning: /usr/src/glibc-2.26/libio: No such file or directory.
Reading symbols from a.out... 
(gdb) b crashtest.c:8                                                   
Breakpoint 1 at 0x648: file crashtest.c, line 8.                                                                                                
(gdb) r                                                                 
Starting program: /tmp/a.out                                                                                                                    

Breakpoint 1, main () at crashtest.c:8                                                                                                          
8          stack0(); //<--- set up breakpoint                           
(gdb) bt                      
#0  main () at crashtest.c:8        
(gdb) step                                                              
stack0 () at crashtest.c:5          
5       void stack0() { stack1(); }
(gdb)                                                                                                                                           
stack1 () at crashtest.c:4
4       void stack1() { stack2(); }                                     
(gdb)                                                                                                                                           
stack2 () at crashtest.c:3                                              
3       void stack2() { stack3(); }                                     
(gdb) bt                                                                
#0  stack2 () at crashtest.c:3                                          
#1  0x0000002aaaaaa620 in stack1 () at crashtest.c:4
#2  0x0000002aaaaaa636 in stack0 () at crashtest.c:5
#3  0x0000002aaaaaa64c in main () at crashtest.c:8
(gdb)   


@janvrany
One thing that confused me is the messages coming from your gdb on Debian, as follows:

Warning: /usr/src/glibc-2.26/nss: No such file or directory.                                                                                    
Warning: /usr/src/glibc-2.26/libio: No such file or directory.

Given that the first release of GLIBC on RISC-V is 2.27 as announced at https://sourceware.org/ml/libc-announce/2018/msg00000.html, I am wondering why it was locating glibc-2.26 (which has no RISC-V support) instead of 2.27. Did you compile everything including glibc from scratch on Debian?

The build used to trigger a crash on our side is a cross-build (compiled via the GNU cross-toolchain with the mounted Fedora image / the same OS used to boot via QEMU), so theoretically there should be no compatibility issue between the kernel and the glibc (2.27). If so, the problem might exist between GDB 8.3 and the glibc 2.27/kernel on Fedora_stage4 (rawhide 2.8).

@ChengJin01

One thing that confused me is the messages coming from your gdb on Debian, as follows:

Yes, that's confusing. Ignore that. This is because I have in my ~/.gdbinit something like:

set directories $cdir:$cwd:/usr/src/glibc-2.26/nss:/usr/src/glibc-2.26/libio

My /home is shared over NFS among all hosts in my build farm (this makes things convenient for me). If I remove this line from ~/.gdbinit, these warnings go away.

However, it is not only GDB that is involved, there was a bug in kernel's PTRACE implementation as well as some missing support in GLIBC. This being said, you need a matching combination of these.

I just found the mailing-list discussion about the issue with gdb on Fedora_riscv images at
https://groups.google.com/a/groups.riscv.org/forum/#!topic/sw-dev/CSkwpmjQLqc and the corresponding gdb port at https://github.com/jim-wilson/riscv-linux-native-gdb.

According to the discussion, the matching issue between gdb, kernel/ptrace and glibc does exist; it requires a couple of patches to fix the problem in each of them separately, as some of the patches had not yet been merged into the code base. But the steps there seem tricky (it might require recompiling a patched kernel for Fedora_stage4, etc.) to get things working for OpenJ9.

hello

@edmathew234, is there anything you'd like to bring up with us?

Not really, just confused about your pages

I will re-compile & run all sanity & extended test suites (without JIT involved) with the latest code changes to see what else needs to be addressed apart from the gdb issue (which technically has nothing to do with our changes); this needs a couple of days to finish. If nothing new is detected, everything here should be ready for review and can be merged into the code base after that.

Anything weird detected later on the hardware will be fixed after we receive the dev board (might be available early next year).

Still compiling the whole test suites (sanity & extended) on OpenJ9 (they got messed up previously due to incorrect modifications in one of the test config scripts); will see how it goes after that.

@janvrany already helped to confirm that backtrace works fine on Debian_riscv with all our RISC-V specific changes, which means there is nothing we can do from the OpenJ9 perspective to deal with the gdb issue on Fedora_riscv.

jv@unleashed:/tmp$ .../riscv64-unknown-linux-gnu/gdb/gdb --args ./jdk/bin/java -cp . StackTest
GNU gdb (GDB) 8.3.50.20190905-git    
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.   
Type "show copying" and "show warranty" for details.
This GDB was configured as "riscv64-unknown-linux-gnu".     
...                                                               
Reading symbols from ./jdk/bin/java...                                                                                           
(No debugging symbols found in ./jdk/bin/java)           
(gdb) r                                                                                                                          
Starting program: /tmp/jdk/bin/java -cp . StackTest                                                                              
[Thread debugging using libthread_db enabled]                                                                                    
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
[New Thread 0x20009881e0 (LWP 19755)]                                                                                            
[New Thread 0x2000c2c1e0 (LWP 19756)]
...                                                                                         
verifyBytecodes: Class.Method.Sig = StackTest.Stack3.()V
Thread 2 "main" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x20009881e0 (LWP 19755)]      
verifyBytecodes (verifyData=0x2004086060) at rtverify.c:494
494     rtverify.c: No such file or directory.
(gdb) bt 10                                                                                                                      
#0  verifyBytecodes (verifyData=0x2004086060) at rtverify.c:494
#1  j9rtv_verifyBytecodes (verifyData=verifyData@entry=0x2004086060) at rtverify.c:173                                           
#2  0x0000002000a60d2a in j9bcv_verifyBytecodes (portLib=<optimized out>, clazz=clazz@entry=0x12b000,                            
    romClass=romClass@entry=0x200435ede0, verifyData=verifyData@entry=0x2004086060) at bcverify.c:2547                           
#3  0x00000020009e13ce in performVerification (clazz=0x12b000, currentThread=0x5f00) at ClassInitialization.cpp:148              
#4  classInitStateMachine (currentThread=0x5f00, clazz=0x12b000, desiredState=<optimized out>) at ClassInitialization.cpp:376    
#5  0x00000020009cd9f8 in VM_BytecodeInterpreter::inlInternalsPrepareClassImpl (_pc=<optimized out>, _sp=<optimized out>,        
    this=<optimized out>) at BytecodeInterpreter.hpp:3907
#6  VM_BytecodeInterpreter::run (this=0x20009874c8, vmThread=0x92523bb1116a000) at BytecodeInterpreter.hpp:9527                  
#7  0x00000020009c8d58 in bytecodeLoop (currentThread=<optimized out>) at BytecodeInterpreter.cpp:111                            
#8  0x0000002000a19392 in runCallInMethod (env=0x5f00, receiver=0x0, clazz=0xc5188, methodID=0x200433a7b0, args=0x20009878c8)    
    at callin.cpp:1083                                                                                                           
#9  0x0000002000a280b2 in gpProtectedRunCallInMethod (entryArg=0x2000987888) at jnicsup.cpp:258                                  
(More stack frames follow...)                                   
(gdb)                       

So, we might need to contact the Fedora_riscv guys (from Red Hat) to see whether there is an easier way to work around this in the near future.

Already finished running the functional test suites (sanity & extended) with the cross-builds as follows:
TestTargetResult_functional_12_02_2019_cross_build.tap.txt

TEST TARGETS SUMMARY
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DISABLED test targets:
...
PASSED test targets:
...
FAILED test targets:
        cmdLineTester_modularityddrtests_1   (no DDR support in the cross-build)
        cmdLineTester_dumpromclasstests_0  (no DDR support in the cross-build)
        ContendedFieldsTests_90_1          (no cache-line support via QEMU / needs to be checked on the hardware)
        TestManagementAgent_0          (actually no test case runs in this suite / already excluded)

TOTAL: 584   EXECUTED: 237   PASSED: 233   FAILED: 4   DISABLED: 31   SKIPPED: 316
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

As mentioned at https://github.com/eclipse/openj9/issues/5058#issuecomment-546555872, TestManagementAgent_0 should be totally excluded on riscv64 as there is no test case running in the test suite on RISC-V.

So, there is no technical issue left to be addressed on the cross-build. Will double-check the tests with the latest build compiled on Fedora_riscv to see how it goes.

Already finished the tests (sanity & extended) on the build compiled on Fedora/QEMU:

TEST TARGETS SUMMARY
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

FAILED test targets:
    J9vmTest_5  <---------- one failing case on CriticalRegionTest
    ContendedFieldsTests_90_1 (needs to be checked on the hardware)
    TestManagementAgent_0   (should be excluded)

TOTAL: 584   EXECUTED: 237   PASSED: 234   FAILED: 3   DISABLED: 31   SKIPPED: 316

        +++ j9vm.test.jni.CriticalRegionTest: +++
        command: /.../jdk/bin/java -Xcompressedrefs -Xcompressedrefs -Xjit -Xgcpolicy:gencon -Xint -Xdump -Xms64m -Xmx64m    -classpath /home/jincheng/RISCV_OPENJ9/openj9_test_11_29/test/TestConfig/../../jvmtest/functional/VM_Test/VM_Test.jar:/home/jincheng/RISCV_OPENJ9/openj9_test_11_29/test/TestConfig/../TestConfig/lib/asm-all.jar  j9vm.test.jni.CriticalRegionTest

        Exception in thread "main" java.lang.RuntimeException: 
testAcquireAndSleep_Threads8: not all threads completed before the timeout (7/8)
                at j9vm.test.jni.CriticalRegionTest.reportError(CriticalRegionTest.java:154)
                at j9vm.test.jni.CriticalRegionTest.testAcquireAndSleep(CriticalRegionTest.java:786)
                at j9vm.test.jni.CriticalRegionTest.runBlockingTests(CriticalRegionTest.java:734)
                at j9vm.test.jni.CriticalRegionTest.main(CriticalRegionTest.java:191)
        no-zero exit value: 1
        *** Test FAILED *** (j9vm.test.jni.CriticalRegionTest)

So I reran the failing test on Fedora/QEMU via both FyreVM and a local Ubuntu VM, and it passed without any exception:

TEST TARGETS SUMMARY
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PASSED test targets:
        J9vmTest_5

TOTAL: 1   EXECUTED: 1   PASSED: 1   FAILED: 0   DISABLED: 0   SKIPPED: 0
ALL TESTS PASSED
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

_J9vmTest_5 done

So the problem might be caused by a time-out (slow without JIT) or the weird behaviour of QEMU (sometimes it just gets stuck for no known reason).

Now all RISC-V changes have been submitted for review, as there are no more issues to be addressed.
We will verify the changes on the hardware after receiving the U540 dev board (unavailable until January next year).

the weird behaviour of QEMU (sometimes it just gets stuck for no known reason).

I've seen this too. Try compiling the latest qemu from source; that solved strange hangups for me.

After flashing the latest Fedora_raw image (https://dl.fedoraproject.org/pub/alt/risc-v/repo/virt-builder-images/images/Fedora-Developer-Rawhide-20200108.n.0-sda.raw.xz) to the microSD card with virt-builder on Ubuntu 19.10, I successfully managed to boot it up on the SiFive U540 (https://www.sifive.com/boards/hifive-unleashed)

[root@openj9riscv ~]# uname -a
Linux openj9riscv.ibm.com 
5.5.0-0.rc5.git0.1.1.riscv64.fc32.riscv64 #1 SMP 
Mon Jan 6 17:31:22 UTC 2020 riscv64 riscv64 riscv64 GNU/Linux

with the following MSEL setting (1101 mode) on the hardware against https://github.com/sifive/freedom-u-sdk:

      USB   LED    Mode Select                  Ethernet
 +===|___|==****==+-+-+-+-+-+-+=================|******|===+
 |                | | | | |X| |                 |      |   |
 |                | | | | | | |                 |      |   |
 |        HFXSEL->|X|X|X|X| |X|                 |______|   |
 |                +-+-+-+-+-+-+                            |
 |        RTCSEL-----/ 0 1 2 3 <--MSEL                     |
==========

I will get started setting up everything for cross-compilation (cross-toolchain, QEMU 5.0 required by the raw image, etc.) on Ubuntu 19.10, as virt-builder only works on Ubuntu 18 or later.

Currently investigating an NPE issue at https://github.com/eclipse/openj9/issues/9058. Will get back to this work once the problem in there gets addressed.

I cross-compiled an OpenJ9 build with all my changes, which works well on Fedora_raw booted via QEMU but unexpectedly crashed / ended up with weird exceptions when running on Fedora_raw booted on the HiFive U540 board.

QEMU:
[jincheng_riscv@fedora-riscv RISCV_OPENJ9_FEDORA]$ 
jdk11_rv64_opej9_cross/bin/java  -version
openjdk version "11.0.7-internal" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7-internal+0-adhoc.jincheng.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build riscv_openj9_uma_03_23_2020-01e55bb10, JRE 11 Linux riscv-64-Bit Compressed References 20200328_000000 (JIT disabled, AOT disabled)
OpenJ9   - 01e55bb10
OMR      - 486f78f2a
JCL      - 5867572f18 based on jdk-11.0.7+8)

HiFive U540:
[jincheng_riscv@openj9riscv RISCV_OPENJ9]$ jdk11_rv64_opej9_cross/bin/java  -version
Exception in thread "(unnamed thread)" java/lang/ExceptionInInitializerError
java/lang/ArithmeticException

[jincheng_riscv@openj9riscv RISCV_OPENJ9]$ jdk11_rv64_opej9_cross/bin/java -version
Unhandled exception
Type=Segmentation error vmState=0x00000000
...

After talking to the users of my changes, it seems:
1) It works well on both Debian_riscv/QEMU and Debian_riscv/U540 for @janvrany (our partner on OpenJ9 & OMR/JIT), who works on Debian_riscv/U540:

jv@unleashed:/...$ 
./build/linux-riscv64-normal-server-release/images/jdk/bin/java --version
openjdk 11.0.7-internal 2020-04-14
OpenJDK Runtime Environment (build 11.0.7-internal+0-adhoc.jenkins.j9chengjinriscv64linux2)
Eclipse OpenJ9 VM (build HEAD-01e55bb1, JRE 11 Linux riscv-64-Bit Compressed References 20200401_16 (JIT disabled, AOT disabled)
OpenJ9   - 01e55bb1
OMR      - 486f78f
JCL      - 5867572f1 based on jdk-11.0.7+8)

2) It worked well on both Fedora_stage4 (the outdated version)/QEMU and Fedora_stage4/U540 for another user working on Fedora_stage4/U540 a couple of weeks ago (unfortunately he is unable to access the lab to retrieve the detailed steps of how he got it working on U540 for now / he only roughly remembered that he followed my steps to set everything up for stage4 to create the cross-build)

Meanwhile, I asked the Fedora/riscv developers on #fedora-riscv/IRC to double-check the dmesg output after the crash occurred; it turns out there is nothing wrong on the kernel side.

The feedback above indicates 3 things:
1) there is probably no problem with my changes to OpenJ9.
2) the OpenJ9 cross-build does work on a Linux-based system booted on the U540 (both Debian_riscv and the older version of Fedora_riscv).
3) there might be something new/distinct in the latest version of Fedora_riscv that stops the VM from executing correctly during loading/initialization; it might take quite a long time to figure out what really happens there & to see what we can do to fix it on our side.

To confirm the whole cross-compilation is correct & to rule out anything else that might trigger the crash (e.g. an incorrect setting/config, a messed-up image mounted during cross-compilation, etc.), I will set everything up again (re-download the Fedora_riscv image, configure the tooling & cross-compilation environment, etc.) on Ubuntu 19 to ensure there is nothing wrong/abnormal that I overlooked previously. If the generated cross-build still crashes on U540, then I need to trace it down piece by piece to see what happens in there.

To avoid blocking the OpenJ9 testing on the hardware for too long, I will follow @janvrany's steps at https://github.com/janvrany/riscv-debian to set up a Debian VM for cross-compilation (hopefully later this week or next week) to see how it goes, given that the cross-build proved to work on Debian_riscv/U540. If that eventually yields a workable OpenJ9 cross-build on U540, I will start running the tests on Debian_riscv/U540 first while continuing to investigate the crash issue on Fedora_riscv/U540.

To support the Ubuntu-provided cross-compiler (also available on other platforms such as Fedora and Debian), I modified a couple of scripts to pick up the correct cross-compiler when it is installed directly rather than compiled from source, which seems to work in terms of configuration & cross-compilation (the generated cross-build was verified on Fedora_raw/QEMU):

Tools summary:
* Boot JDK:       openjdk version "11.0.7-internal" 2020-04-14 OpenJDK Runtime Environment (build 11.0.7-internal+0-adhoc.jincheng.openj9-openjdk-jdk11riscv) Eclipse OpenJ9 VM (build riscv_openj9_uma_03_23_2020-04cc5a520, JRE 11 Linux amd64-64-Bit Compressed References 20200415_000000 (JIT enabled, AOT enabled) OpenJ9   - 04cc5a520 OMR      - c9f67b523 JCL      - cc6eef2754 based on jdk-11.0.7+10)  (at /home/jincheng/RISCV_OPENJ9_LOCAL/genjdk_cross/jdk11_openj9_x86_64_riscv_04_15)
* Toolchain:      gcc (GNU Compiler Collection)
* C Compiler:     Version 9.2.1 (at /usr/bin/riscv64-linux-gnu-gcc) <-----
* C++ Compiler:   Version 9.2.1 (at /usr/bin/riscv64-linux-gnu-g++) <------

Will create PRs separately in https://github.com/ibmruntimes/openj9-openjdk-jdk11/issues/218 (OpenJDK11), https://github.com/eclipse/openj9/issues/7421 (OpenJ9) and https://github.com/eclipse/omr/issues/4426 (OMR).

java -version still crashed or ended up with random exceptions on the hardware after setting up the cross-compilation environment again. Will start investigating what happens in the code there.

I managed to trigger dumps for these random exceptions on the debug build, which indicate that the exceptions came from different places in the interpreter at runtime:
e.g.
1) java/lang/NullPointerException

#12 0x0000003fd3c88e94 in setCurrentExceptionUTF (vmThread=0x14f00, exceptionNumber=6, detailUTF=0x0) at exceptionsupport.c:65
#13 0x0000003fd3bf06da in VM_BytecodeInterpreterCompressed::run (this=0x3fd45ebab8, vmThread=0x14f00)
    at BytecodeInterpreter.hpp:9635 <-------------

 9632 nullPointer:
 9633         updateVMStruct(REGISTER_ARGS);
 9634         prepareForExceptionThrow(_currentThread);
 9635         setCurrentExceptionUTF(_currentThread, J9VMCONSTANTPOOL_JAVALANGNULLPOINTEREXCEPTION, NULL); <--------
 9636         VMStructHasBeenUpdated(REGISTER_ARGS);

#14 0x0000003fd3bc5c20 in bytecodeLoopCompressed (currentThread=0x14f00) at BytecodeInterpreter.inc:110
#15 0x0000003fd3cc22a2 in cInterpreter (currentThread=0x14f00) at rv64cinterpnojit.cpp:35
#16 0x0000003fd3cc22d4 in c_cInterpreter (currentThread=0x14f00) at rv64cinterpnojit.cpp:45
#17 0x0000003fd3c76bf0 in sendClinit (currentThread=0x14f00, clazz=0x21f00) at callin.cpp:419
#18 0x0000003fd3c02c4c in initializeImpl (currentThread=0x14f00, clazz=0x21f00) at ClassInitialization.cpp:93
#19 0x0000003fd3c04844 in classInitStateMachine (currentThread=0x14f00, clazz=0x21f00, desiredState=J9_CLASS_INIT_INITIALIZED)
    at ClassInitialization.cpp:634
#20 0x0000003fd3c037dc in initializeClass (currentThread=0x14f00, clazz=0x21f00) at ClassInitialization.cpp:305
#21 0x0000003fd2c320a2 in initializeRequiredClasses (vmThread=0x14f00, dllName=0x3fd2c82eb8 "jclse29")
    at common/jclcinit.c:711
710         /* Initialize java.lang.String which will also initialize java.lang.Object. */
711         vmFuncs->initializeClass(vmThread, stringClass); <------

#22 0x0000003fd2c58c96 in standardInit (vm=0x3fcc00dca0, dllName=0x3fd2c82eb8 "jclse29") at common/stdinit.c:158
#23 0x0000003fd2c6519a in scarInit (vm=0x3fcc00dca0) at common/vm_scar.c:312
#24 0x0000003fd2c6534c in J9VMDllMain (vm=0x3fcc00dca0, stage=14, reserved=0x0) at common/vm_scar.c:376
#25 0x0000003fd3ca7eb8 in runJ9VMDllMain (dllLoadInfo=0x3fcc01c758, userDataTemp=0x3fd45ec100) at jvminit.c:3482

2) java/lang/ArithmeticException

#12 0x0000003fdcf18e94 in setCurrentExceptionUTF (vmThread=0x14f00, exceptionNumber=29, 
    detailUTF=0x3fd80bc640 "divide by zero") at exceptionsupport.c:65 <---------------------------------------------
#13 0x0000003fdcf18f88 in setCurrentExceptionNLS (vmThread=0x14f00, exceptionNumber=29, moduleName=1245271629, 
    messageNumber=36) at exceptionsupport.c:82
#14 0x0000003fdce81812 in VM_BytecodeInterpreterCompressed::run (this=0x3fdd87bab8, vmThread=0x14f00)
    at BytecodeInterpreter.hpp:9658 <----------------

 9655 divideByZero:
 9656         updateVMStruct(REGISTER_ARGS);
 9657         prepareForExceptionThrow(_currentThread);
 ----> 9658   setCurrentExceptionNLS(_currentThread, J9VMCONSTANTPOOL_JAVALANGARITHMETICEXCEPTION, J9NLS_VM_DIVIDE_BY_ZERO      );

#15 0x0000003fdce55c20 in bytecodeLoopCompressed (currentThread=0x14f00) at BytecodeInterpreter.inc:110
#16 0x0000003fdcf522a2 in cInterpreter (currentThread=0x14f00) at rv64cinterpnojit.cpp:35
#17 0x0000003fdcf522d4 in c_cInterpreter (currentThread=0x14f00) at rv64cinterpnojit.cpp:45
#18 0x0000003fdcf06bf0 in sendClinit (currentThread=0x14f00, clazz=0x21f00) at callin.cpp:419
#19 0x0000003fdce92c4c in initializeImpl (currentThread=0x14f00, clazz=0x21f00) at ClassInitialization.cpp:93
#20 0x0000003fdce94844 in classInitStateMachine (currentThread=0x14f00, clazz=0x21f00, desiredState=J9_CLASS_INIT_INITIALIZED)
    at ClassInitialization.cpp:634
#21 0x0000003fdce937dc in initializeClass (currentThread=0x14f00, clazz=0x21f00) at ClassInitialization.cpp:305
#22 0x0000003fcade90a2 in initializeRequiredClasses (vmThread=0x14f00, dllName=0x3fcae39eb8 "jclse29")
    at common/jclcinit.c:711 <-----
710         /* Initialize java.lang.String which will also initialize java.lang.Object. */
711         vmFuncs->initializeClass(vmThread, stringClass); <------

#23 0x0000003fcae0fc96 in standardInit (vm=0x3fd800da40, dllName=0x3fcae39eb8 "jclse29") at common/stdinit.c:158
#24 0x0000003fcae1c19a in scarInit (vm=0x3fd800da40) at common/vm_scar.c:312
#25 0x0000003fcae1c34c in J9VMDllMain (vm=0x3fd800da40, stage=14, reserved=0x0) at common/vm_scar.c:376
#26 0x0000003fdcf37eb8 in runJ9VMDllMain (dllLoadInfo=0x3fd801c4f8, userDataTemp=0x3fdd87c100) at jvminit.c:3482

The native stack traces above indicate the issue occurred during the initialization of java.lang.String when loading the library jclse29, but there is no more information on the native stack as the exception is random.

I also tried to force verification of all bootstrap classes with -Xverify:all but it still ended up with the same exceptions in the interpreter, which means there is no verification error in the bytecode.

As suggested by @gacholio, having the Java stack in dumps might be key to learning what happened in the interpreter, in which case we require a build with DDR enabled to trigger dumps with DDR metadata.

Given that there is no DDR support in the cross-build due to the limitations of ddrgen (it must run on RISC-V to generate all the artifacts for DDR) & OMR, I need to create a build with DDR via the cross-build on QEMU (the emulator), which takes approx. 2 days to finish (compilation on the emulator is terribly slow compared to cross-compilation on the local system), & upload the build onto the hardware to trigger dumps and see how it goes.

In theory, the build created on the emulator should behave the same way as the cross-build, as both of them are compiled against the same OS image. So the emulator-built build should crash or end up with the same exceptions, producing dumps; the dumps can then be downloaded to the local host system and the Java stack inspected with an X86_64 build (with DDR support) generated from the same source code as the RISC-V one.

I managed to create a build with DDR support on Fedora/QEMU and uploaded it to Fedora/U540 to trigger core dumps, which show that execution went out of the scope of the bytecode in String.<clinit>, as follows:

> !threads
    !stack 0x00014f00   !j9vmthread 0x00014f00  !j9thread 0x3fd8006f80  tid 0x1321 (4897) // (<NULL>)
> !stack 0x00014f00
<14f00>                             Generic special frame
<14f00>     !j9method 0x0000000000020DB0   java/lang/String.<clinit>()V
<14f00>                             JNI call-in frame
<14f00>                             Native method frame

> !stackslots 0x00014f00
<14f00> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x0000000000014F00 ***
<14f00>     ITERATE_O_SLOTS
<14f00>     RECORD_BYTECODE_PC_OFFSET
<14f00> Initial values: walkSP = 0x0000000000014E10, PC = 0x0000000000000001, literals = 0x0000000000000000, A0 = 0x0000000000014E28, j2iFrame = 0x0000000000000000, ELS = 0x0000003FDF1D4CD0, decomp = 0x0000000000000000
<14f00> Generic special frame: bp = 0x0000000000014E28, sp = 0x0000000000014E10, pc = 0x0000000000000001, cp = 0x0000000000000000, arg0EA = 0x0000000000014E28, flags = 0x0000000000000000
<14f00> Bytecode frame: bp = 0x0000000000014E40, sp = 0x0000000000014E30, pc = 0x0000003FBDB9DA2D, cp = 0x0000000000020DD0, arg0EA = 0x0000000000014E50, flags = 0x0000000000000000
<14f00>     Method: java/lang/String.<clinit>()V !j9method 0x0000000000020DB0
<14f00>     Bytecode index = 65,589 <----------------
<14f00>     Using local mapper
<14f00>     Locals starting at 0x0000000000014E50 for 0x0000000000000002 slots
<14f00>         I-Slot: t0[0x0000000000014E50] = 0x0000000000000001
<14f00>         I-Slot: t1[0x0000000000014E48] = 0x00000000FFF00878
<14f00> JNI call-in frame: bp = 0x0000000000014E78, sp = 0x0000000000014E58, pc = 0x0000003FDE98E1C0, cp = 0x0000000000000000, arg0EA = 0x0000000000014E78, flags = 0x0000000000000000
<14f00>     New ELS = 0x0000000000000000
<14f00> JNI native method frame: bp = 0x0000000000014EA8, sp = 0x0000000000014E80, pc = 0x0000000000000007, cp = 0x0000000000000000, arg0EA = 0x0000000000014EA8, flags = 0x0000000000000000
<14f00>     Object pushes starting at 0x0000000000014E80 for 1 slots
<14f00>         Push[0x0000000000014E80] = 0x00000000FFF001D0
<14f00> <end of stack>
<14f00> *** END STACK WALK (rc = NONE) ***

> !j9method 0x0000000000020DB0
J9Method at 0x20db0 {
  Fields for J9Method:
    0x0: U8* bytecodes = !j9x 0x0000003FCC8C69F8
    0x8: struct J9ConstantPool* constantPool = !j9constantpool 0x0000000000020DD0 (flags = 0x0)
    0x10: void* methodRunAddress = !j9x 0x0000000000000007
    0x18: void* extra = !j9x 0x0000000000000001
}
Signature: java/lang/String.<clinit>()V !bytecodes 0x0000000000020DB0
ROM Method: !j9rommethod 0x0000003FCC8C69E4
Next Method: !j9method 0x0000000000020DD0
> !bytecodes 0x0000000000020DB0
  Name: <clinit>
  Signature: ()V
  Access Flags (11240008): default static 
  Max Stack: 4
  Argument Count: 0
  Temp Count: 2

    0 getstatic 227 com/ibm/oti/vm/VM.J9_STRING_COMPRESSION_ENABLED Z
    3 putstatic 26 java/lang/String.enableCompression Z
    6 getstatic 26 java/lang/String.enableCompression Z
    9 putstatic 228 java/lang/String.COMPACT_STRINGS Z
   12 newdup 229 java/lang/String$CaseInsensitiveComparator
...
  218 anewarray 234 java/io/ObjectStreamField
  221 putstatic 235 java/lang/String.serialPersistentFields [Ljava/io/ObjectStreamField;
  224 return0  <---------------

against the native stack in the dump:

#12 0x0000003fec40021a in setCurrentExceptionUTF (vmThread=0x14f00, exceptionNumber=29, detailUTF=0x3fe80cf710 "divide by zero")
    at exceptionsupport.c:65
#13 0x0000003fec3a5b5a in VM_BytecodeInterpreterCompressed::run (this=0x3feccd2c48, vmThread=0xffffffffffffffeb)
    at BytecodeInterpreter.hpp:9669
#14 0x0000003fec3a3e08 in bytecodeLoopCompressed (currentThread=<optimized out>) 
    at BytecodeInterpreter.inc:110
#15 0x0000003fec421d52 in c_cInterpreter (currentThread=<optimized out>)
    at rv64cinterpnojit.cpp:35
#16 0x0000003fec3f2b0e in sendClinit (currentThread=currentThread@entry=0x14f00, clazz=clazz@entry=0x21f00)
   at callin.cpp:419
    sendClinit(J9VMThread *currentThread, J9Class *clazz)
{
    Trc_VM_sendClinit_Entry(currentThread);
 ...
            c_cInterpreter(currentThread); <---------

#17 0x0000003fec3bbbe2 in initializeImpl (
    currentThread=currentThread@entry=0x14f00, clazz=clazz@entry=0x21f00)
    at ClassInitialization.cpp:93
#18 0x0000003fec3bcd12 in classInitStateMachine (currentThread=0x14f00, 
    clazz=0x21f00, desiredState=<optimized out>) at ClassInitialization.cpp:634
#19 0x0000003fec04c08a in initializeRequiredClasses (
    vmThread=vmThread@entry=0x14f00, 
    dllName=dllName@entry=0x3fec07c818 "jclse29") at common/jclcinit.c:711

vmFuncs->initializeClass(vmThread, stringClass); <--------------

#20 0x0000003fec060bbe in standardInit (vm=vm@entry=0x3fe800db00, 
    dllName=dllName@entry=0x3fec07c818 "jclse29") at common/stdinit.c:158

> !j9class  0x21f00
J9Class at 0x21f00 {
  Fields for J9Class:
    0x0: UDATA eyecatcher = 0x0000000099669966 (2573637990)
    0x8: struct J9ROMClass* romClass = !j9romclass 0x0000003FBDB880C8
    0x10: void** superclasses = !j9x 0x0000000000021EF8
    0x18: UDATA classDepthAndFlags = 0x00000000020E0001 (34471937)
    0x20: U32 classDepthWithFlags = 0x00000000 (0)
    0x24: U32 classFlags = 0x00000000 (0)
    0x28: struct J9ClassLoader* classLoader = !j9classloader 0x0000003FD8074908
    0x30: struct J9Object* classObject = !j9object 0x0000000083C809F8 // java/lang/Class
    0x38: volatile UDATA initializeStatus = 0x0000000000014F00 (85760)
    0x40: struct J9Method* ramMethods = !j9method 0x000000000001FB50 // java/lang/String.checkLastChar(C)V
    0x48: UDATA* ramStatics = !j9x 0x0000000000021E70
    0x50: struct J9Class* arrayClass = !j9class 0x000000000002FC00 // [Ljava/lang/String;

So the dump indicates something went wrong on a return in the bytecode of String.<clinit> (the method's last bytecode is the return at index 224, yet the recorded bytecode index is 65,589, far beyond the method), which requires inline debugging in the interpreter to see what happened across the whole bytecode of String.<clinit>.

The error likely does not occur returning from the clinit, rather when the VM returns to the clinit from one of the invokes in the clinit.
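To make that concrete, here is a trivial hypothetical illustration (not OpenJ9 code; the names are made up) of the control flow under suspicion: <clinit> performs an invoke, and on each return the interpreter must restore the caller's bytecode PC, which is exactly where a corrupted PC would surface as a bogus bytecode index like the 65,589 above.

// Hypothetical illustration of an invoke-and-return inside <clinit>.
public class ClinitReturnDemo {
    static final boolean FLAG = compute();    // <clinit> invokes compute()
    static boolean compute() { return true; } // returning restores <clinit>'s PC

    public static void main(String[] args) {
        System.out.println(FLAG); // forces class initialization
    }
}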

I tried a couple of times with the cross-compiled debug build on Fedora/U540 but it always ended up skipping over the breakpoints, as follows (the same compilation & debug settings work well with an X86_64 debug build):

[root@openj9riscv RISCV_OPENJ9]# gdb
GNU gdb (GDB) Fedora 9.1-4.0.riscv64.fc32
(gdb) set breakpoint pending on
(gdb) set solib-absolute-prefix /usr
(gdb) set directories jdk11_rv64_dbg_04_28/vm
(gdb) set solib-search-path jdk11_rv64_dbg_04_28/vm
(gdb) file   jdk11_rv64_dbg_04_28/images/jdk/bin/java
Reading symbols from jdk11_rv64_dbg_04_28/images/jdk/bin/java...
Missing separate debuginfo for /.../jdk11_rv64_dbg_04_28/images/jdk/bin/java
(No debugging symbols found in jdk11_rv64_dbg_04_28/images/jdk/bin/java)
(gdb) set args  -version
(gdb) b jdk11_rv64_dbg_04_28/vm/jcl/common/stdinit.c:158
No symbol table is loaded.  Use the "file" command.
Breakpoint 1 (jdk11_rv64_dbg_04_28/vm/jcl/common/stdinit.c:158) pending.
(gdb) b  jdk11_rv64_dbg_04_28/vm/jcl/common/jclcinit.c:711
No symbol table is loaded.  Use the "file" command.
Breakpoint 2 (jdk11_rv64_dbg_04_28/vm/jcl/common/jclcinit.c:711) pending.
(gdb) b jdk11_rv64_dbg_04_28/vm/vm/ClassInitialization.cpp:93
No symbol table is loaded.  Use the "file" command.
Breakpoint 3 (jdk11_rv64_dbg_04_28/vm/vm/ClassInitialization.cpp:93) pending.
(gdb) b jdk11_rv64_dbg_04_28/vm/vm/BytecodeInterpreter.inc:110
No symbol table is loaded.  Use the "file" command.
Breakpoint 4 (jdk11_rv64_dbg_04_28/vm/vm/BytecodeInterpreter.inc:110) pending.
(gdb) r
Starting program: /.../jdk11_rv64_dbg_04_28/images/jdk/bin/java -version
[New LWP 27335]
[New LWP 27336]
Exception in thread "(unnamed thread)" java/lang/ExceptionInInitializerError
java/lang/ArithmeticException <---- the exception was thrown out even without hitting any of the breakpoints
[LWP 27336 exited]
[LWP 27335 exited]
[Inferior 1 (process 27332) exited with code 01]

I need to check with the Fedora/RISC-V guys on #fedora-riscv/IRC to see whether anything else needs to be set up when compiling in debug mode. If nothing special is needed, then the problem might come from the debugger (gdb) itself on U540, as it is not quite stable in debugging & ends up with assertion errors intermittently.

Apart from the debugging issues, I just tried the exact same build as @ChengJin01's (obtained privately) on my Debian on the Unleashed board, and I cannot reproduce the issue there:

jv@unleashed:~/tmp/jdk11_rv64_dbg$ sha1sum images/jdk/bin/java
daee2d87dc3f448596641ea03fffec20ebcd2aca  images/jdk/bin/java
jv@unleashed:~/tmp/jdk11_rv64_dbg$ ./images/jdk/bin/java -version
openjdk version "11.0.7-internal" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7-internal+0-adhoc.jincheng.openj9-openjdk-jdk11riscv)
Eclipse OpenJ9 VM (build riscv_openj9_uma_03_23_2020-0d078804c, JRE 11 Linux riscv-64-Bit Compressed References 20200428_000000 (JIT disabled, AOT disabled)
OpenJ9   - 0d078804c
OMR      - 7559e30e5
JCL      - 15533215ee based on jdk-11.0.7+10)
jv@unleashed:~/tmp/jdk11_rv64_dbg$ 

As you may see, no exception, everything seems fine. SwingSet2 demo seems to work too.

1) @janvrany also confirmed that gdb works for my debug build (cross-compiled with the mounted Fedora image) on Debian/U540:

(gdb) info break                                                                                                                 
Num     Type           Disp Enb Address            What                                                                          
1       breakpoint     keep y   0x000000200143507c in initializeRequiredClasses at common/jclcinit.c:711                         
2       breakpoint     keep y   0x0000002000aafe4c in initializeImpl(J9VMThread*, J9Class*) at ClassInitialization.cpp:93        
3       breakpoint     keep y   <MULTIPLE>                                                                                       
3.1                         y   0x0000002000a76788 in bytecodeLoopCompressed(J9VMThread*) at BytecodeInterpreter.inc:110         
3.2                         y   0x0000002000ab288a in debugBytecodeLoopCompressed(J9VMThread*) at BytecodeInterpreter.inc:110    
(gdb) r                                                                                                                          
Starting program: /home/jv/tmp/jdk11_rv64_dbg/images/jdk/bin/java -version                                                       
[Thread debugging using libthread_db enabled]                                                                                    
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".                                                      
[New Thread 0x2000a171e0 (LWP 2775293)]                                                                                          
[New Thread 0x2000d5a1e0 (LWP 2775297)]                                                                                          
[Switching to Thread 0x2000a171e0 (LWP 2775293)]
Thread 2 "java" hit Breakpoint 1, initializeRequiredClasses (vmThread=0x5f00, dllName=0x2001485e90 "jclse29") at common/jclcinit.
c:711                                                                                                                            
711     common/jclcinit.c: No such file or directory.                                                                            
(gdb) bt
#0  initializeRequiredClasses (vmThread=0x5f00, dllName=0x2001485e90 "jclse29") at common/jclcinit.c:711                [31/1879]
#1  0x000000200145bc86 in standardInit (vm=0x200400ce50, dllName=0x2001485e90 "jclse29") at common/stdinit.c:158                 
#2  0x000000200146816e in scarInit (vm=0x200400ce50) at common/vm_scar.c:312                                                     
#3  0x0000002001468320 in J9VMDllMain (vm=0x200400ce50, stage=14, reserved=0x0) at common/vm_scar.c:376                          
#4  0x0000002000b543da in runJ9VMDllMain (dllLoadInfo=0x200401b868, userDataTemp=0x2000a15100) at jvminit.c:3493                 
#5  0x0000002000c08d3c in pool_do (pool=0x200401b110, doFunction=0x2000b5421c <runJ9VMDllMain>, userData=0x2000a15100) at pool.c:
648                                                                                                                              
#6  0x0000002000b54200 in runInitializationStage (vm=0x200400ce50, stage=14) at jvminit.c:3443                                   

which means at least there is no debug setting missing in the cross-compilation with Fedora, and there might be an issue between gdb, glibc and the kernel on Fedora/U540.

2) After talking to the Fedora/RISC-V tooling developer, it seems there is no special setup required for gdb on Fedora/U540. Meanwhile, they already ran gdb-specific test suites (63K pass and 1K fail; for thread testing, 2628 of those tests PASS and 92 FAIL), which means gdb should work in most cases in theory. In addition, they will be doing multi-threading tests on gdb to see whether it works well (planned, but not at the top of the priority list). I was also advised to debug gdb itself (which requires compiling gdb from source) to see what happens to the debugger when it runs the JVM.

Considering that the investigation of gdb is pretty tricky and might take a while & end up with nothing useful (though we need gdb working to solve the potential interpreter issue on Fedora/RISC-V), and that the build works well on Debian/U540, I plan to follow the instructions at https://github.com/janvrany/riscv-debian to set up a Debian VM, burn a Debian/U540 image to an SD card, and get the OpenJ9 test suites running first to see how it goes while working on the gdb-specific issue.

Currently working on an issue specific to the verifier; will get back to this work once the issue there is fixed.

I already set things up to get Debian_riscv working on the emulator (QEMU).

root@debian-sid-rv64:~# uname -a
Linux debian-sid-rv64 5.0.0-rc1+ #2 SMP PREEMPT Thu May 14 12:58:08 EDT 2020 riscv64 GNU/Linux

but there were still issues with booting the image on the hardware (it hangs somewhere in bootstrap, probably something to do with relocating the ELF binary of the kernel), in which case @janvrany will help double-check next week to see what happens when trying with a new SD card.

I managed to log in to Debian_riscv on the hardware (U540) after setting everything up again from scratch (the hang above was likely caused by a corrupted kernel copied to the SD card previously):

root@unleashed:~# uname -a
Linux unleashed 5.0.0-rc1-56210-g0a657e0d72f0 #1 
SMP Fri May 15 18:05:26 EDT 2020 riscv64 GNU/Linux

The cross-build previously compiled with Fedora_raw also works fine on Debian_riscv/U540 as follows:

jincheng_debian@unleashed:~/RISCV_OPENJ9$ jdk11_openj9_rv64_compile_05_06/bin/java -version
openjdk version "11.0.8-internal" 2020-07-14
OpenJDK Runtime Environment (build 11.0.8-internal+0-adhoc.jincheng.openj9-openjdk-jdk11master)
Eclipse OpenJ9 VM (build master-fcb6d3f43, 
JRE 11 Linux riscv-64-Bit Compressed References 20200506_000000 (JIT disabled, AOT disabled)
OpenJ9   - fcb6d3f43
OMR      - ebc104396
JCL      - 302b1b522a based on jdk-11.0.8+1)

This means there is no issue with java -version on Debian_riscv/U540, as already confirmed by @janvrany previously.

I will double-check the cross-build & native build compiled with Fedora_raw on Debian_riscv/U540 next week and, if everything goes well on the hardware, get started setting up the OpenJ9 test environment for the sanity & extended test suites (excluding JIT tests).

Double-checked both the generated cross-build and the build compiled on Fedora_raw/QEMU (using the same cross-build as the build-jdk); both work fine on Debian_riscv/U540.

[1] the cross-build
jincheng_debian@unleashed:~/RISCV_OPENJ9$ jdk11_openj9_rv64_compile_05_19/bin/java -version
openjdk version "11.0.8-internal" 2020-07-14
OpenJDK Runtime Environment (build 11.0.8-internal+0-adhoc.jincheng.openj9-openjdk-jdk11riscvmaster)
Eclipse OpenJ9 VM (build master-d779b737a, JRE 11 Linux riscv-64-Bit Compressed References 20200519_000000 (JIT disabled, AOT disabled)
OpenJ9   - d779b737a
OMR      - 92e2fea39
JCL      - cfce36dfff based on jdk-11.0.8+3)

[2] the build compiled on the Fedora_raw/QEMU
jincheng_debian@unleashed:~/RISCV_OPENJ9$ jdk11_rv64_qemu_05_20/bin/java   -version
openjdk version "11.0.8-internal" 2020-07-14
OpenJDK Runtime Environment (build 11.0.8-internal+0-adhoc.jinchengriscv.openj9-openjdk-jdk11qemu)
Eclipse OpenJ9 VM (build master-d779b737a, JRE 11 Linux riscv-64-Bit Compressed References 20200519_000000 (JIT disabled, AOT disabled)
OpenJ9   - d779b737a
OMR      - 92e2fea39
JCL      - cfce36dfff based on jdk-11.0.8+3)

Will get started setting up the test environment for the OpenJ9 test suites. (Need to review the existing test changes previously run on the emulator & modify them accordingly, as the OpenJ9 test framework changed earlier this year.)

I just created a PR at https://github.com/riscv/riscv-software-list/pull/31 to ensure everybody in the RISC-V Software Ecosystem is aware of the progress of OpenJ9 on RISC-V.

I started running the whole set of OpenJ9 test suites on Debian_riscv/U540, but a bunch of I/O errors were triggered when running a SharedClasses-specific test suite, as follows:

===============================================
Running test testSCCacheManagement_0 ...
===============================================
...
java.io.FileNotFoundException: /home/jincheng_debian/javasharedresources/C290M11F1A64P_Foo_G41L00.tmp (Read-only file system)
tee: /.../test/TKG/../TKG/test_output_15906155788826/TestTargetResult: Read-only file system
    at java.base/java.io.FileOutputStream.open0(Native Method)
    at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
...
root@unleashed:~# [ 1109.891659] mmc0: tried to HW reset card, got error -110
[ 1110.038392] print_req_error: I/O error, dev mmcblk0, sector 1706496 flags 1
[ 1110.052690] print_req_error: I/O error, dev mmcblk0, sector 1707136 flags 1
[ 1110.067001] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:323: I/O error 10 
writing to inode 1845280 (offset 0 size 0 starting block 7480579)
[ 1110.079192] Buffer I/O error on device mmcblk0p2, logical block 7447554
[ 1110.107800] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:323: I/O error 10 
writing to inode 527947 (offset 0 size 794624 starting block 213442)

I suspect the issue is related to Debian_riscv & the SD card (I'm not sure it is suitable to run SCC tests directly on the SD card). So I will temporarily exclude that test suite and re-compile the whole OpenJ9 test material to keep going with the other test suites, and will get back to this issue after running all of them.

I just spotted a test failure in pltest specific to j9sysinfo_get_cache_info() which was skipped previously on QEMU as /sys/devices/system/cpu/cpu<N>/cache/ only exists on the hardware.

 [ERR] Starting test j9sysinfo_test_get_l1dcache_line_size
 [ERR]   j9sysinfo_get_cache_info is not supported on this platform
 [ERR] si.c line 2205: j9sysinfo_test_get_l1dcache_line_size
          j9sysinfo_get_cache_info should have returned -355 but returned -100
 [ERR] 
 [ERR]      LastErrorNumber: -108
 [ERR]      LastErrorMessage: No such file or directory
 [ERR] 

Given that the j9sysinfo_get_cache_info()-specific test was never verified on the hardware, I need to check what happened to the test & fix it if there is any issue before moving forward with other tests.

The problem with j9sysinfo_get_cache_info on Debian_riscv/U540 is that the index1 directory doesn't exist under /sys/devices/system/cpu/cpu0/cache but only under /sys/devices/system/cpu/cpu1/cache, as follows:

root@unleashed:/sys/devices/system/cpu# ls
cpu0  cpu2  isolated    offline  possible  uevent
cpu1  cpu3  kernel_max  online   present

root@unleashed:/sys/devices/system/cpu/cpu0/cache# ls
index0  <-------  index1 is missing
uevent  

root@unleashed:/sys/devices/system/cpu/cpu1/cache# ls
index0  index1 <------
uevent

Meanwhile, debugging shows the existing code only checks the index directories under one specified CPU folder (cpu0 in this case). To work around this on Debian_riscv, I created a fix (see the sketch after the PR link below) that works correctly in pltest and in the ContendedFieldsTests specific to the data cache line size (these failed on the emulator due to its lack of cache line size support):

 [ERR] Starting test j9sysinfo_test_get_l1dcache_line_size
 [ERR]   DCache line size = 64
 [ERR] Ending test j9sysinfo_test_get_l1dcache_line_size
 [ERR] 

TEST TARGETS SUMMARY
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PASSED test targets:
    cmdLineTester_pltest_0

TOTAL: 1   EXECUTED: 1   PASSED: 1   FAILED: 0   DISABLED: 0   SKIPPED: 0
ALL TESTS PASSED
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

TEST TARGETS SUMMARY
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PASSED test targets:
    ContendedFieldsTests_90_0
    ContendedFieldsTests_90_1
    ContendedFieldsTests_90_2
    ContendedFieldsTests_90_3

TOTAL: 4   EXECUTED: 4   PASSED: 4   FAILED: 0   DISABLED: 0   SKIPPED: 0
ALL TESTS PASSED

A PR with the fix above was created at https://github.com/eclipse/openj9/pull/9775.
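For illustration, the shape of such a workaround is roughly the following: rather than reading the cache hierarchy from cpu0 only, probe each cpu<N>/cache/index<M> directory until a data-cache line size turns up. This is a minimal standalone sketch under those assumptions, not the actual code from the PR:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration (not the code in the PR): probe each
 * /sys/devices/system/cpu/cpu<N>/cache/index<M>/ until we find a Data (or
 * Unified) cache entry with a readable coherency_line_size, instead of
 * assuming cpu0 exposes one. Returns -1 if nothing is found. */
static int probeDCacheLineSize(int maxCpus, int maxIndexes)
{
    char path[128];
    char type[32];
    for (int cpu = 0; cpu < maxCpus; cpu++) {
        for (int idx = 0; idx < maxIndexes; idx++) {
            /* Check the cache type first: skip instruction caches. */
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%d/cache/index%d/type", cpu, idx);
            FILE *f = fopen(path, "r");
            if (NULL == f) {
                continue; /* this index directory doesn't exist on this CPU */
            }
            type[0] = '\0';
            if (NULL == fgets(type, sizeof(type), f)) { /* e.g. "Data\n" */
                type[0] = '\0';
            }
            fclose(f);
            if ((0 != strncmp(type, "Data", 4)) && (0 != strncmp(type, "Unified", 7))) {
                continue;
            }
            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%d/cache/index%d/coherency_line_size",
                     cpu, idx);
            f = fopen(path, "r");
            if (NULL == f) {
                continue;
            }
            int lineSize = 0;
            int ok = fscanf(f, "%d", &lineSize);
            fclose(f);
            if ((1 == ok) && (lineSize > 0)) {
                return lineSize;
            }
        }
    }
    return -1;
}

int main(void)
{
    printf("DCache line size = %d\n", probeDCacheLineSize(8, 8));
    return 0;
}
```

On the U540 this kind of probing would report 64, matching the pltest output above.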

I previously added NoOptions directly next to the 110 mode (the most popular mode, which comes with the gencon GC & the JIT in OpenJ9 tests) to cover as many tests as possible on RISC-V, but it ended up with a bunch of failures in internal Java 8 tests because the wrong libraries were loaded when NoOptions was specified. The reason is that NoOptions originally meant -Xnocompressedrefs, but it now means no option is passed on the command line to the OpenJ9 JDK, in which case -Xcompressedrefs is the default. However, the existing Java 8 library-path setting for NoOptions specifies lib/default (for -Xnocompressedrefs) rather than lib/compressedrefs (for -Xcompressedrefs); these are totally separate builds in Java 8.

To avoid this mismatch, I am trying to separate the RISC-V-specific tests using the NoOptions mode from the existing tests, and will see how that goes in personal builds on all platforms.

I am nearly at the end of the OpenJ9 test suites (sanity & extended) with the cross-build on Debian_riscv/U540, and there is no noticeable error/exception/failure except the I/O issue on the SD card when too many I/O operations occur in parallel. All tests failing with the I/O issue pass when run individually.

Currently I am running the tests in /test/functional/CacheManagement (the last test suite to be verified in OpenJ9/test), attempting to increase the time allowed for each command to avoid any I/O issue (it runs very slowly on the hardware). If nothing wrong is detected in the end, I will create PRs for all the test changes and continue checking the native build (compiled on Fedora_rawhide/QEMU) to see whether all test suites pass without any issue, mainly focusing on the DDR-related tests, given that the cross-build comes without DDR.

All OpenJ9-specific test suites (sanity & extended at https://github.com/eclipse/openj9/tree/master/test) are finished with the cross-build on Debian_riscv/U540, and no issue specific to the cross-build was detected.

Currently I am running the minimal openjdk tests first, comparing the results against X86_64 (with JIT off), before exposing the download link at Adopt to the public:

jdk_math
jdk_util
jdk11_tier1_cipher
jdk11_tier1_buffer
jdk11_tier1_iso8859
jdk11_tier1_pack200

Will get back to check the native build with DDR support after finishing the openjdk tests with the cross-build.

I've finished all the minimal openjdk_tests (including jdk_lang & jdk_pack200) with the cross-build on Debian_riscv/U540, and no RISC-V-specific issue was detected except a couple of timeout cases, some of which I already verified with a single run; all of those passed. I will get started checking the DDR-related tests on the native build (compiled on Fedora_raw/QEMU) to ensure everything works fine.

Just finished the DDR related tests with the native build on Debian_riscv/U540 and everything works fine:
DDR_tests_qemu_build_debian_riscv.txt

TEST TARGETS SUMMARY
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PASSED test targets:
    cmdLineTester_classesdbgddrext_RISCV_0

TOTAL: 1   EXECUTED: 1   PASSED: 1   FAILED: 0   DISABLED: 0   SKIPPED: 0
ALL TESTS PASSED

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
PASSED test targets:
    testDDRExt_General_openj9_0

TOTAL: 1   EXECUTED: 1   PASSED: 1   FAILED: 0   DISABLED: 0   SKIPPED: 0
ALL TESTS PASSED
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Will get back to investigating the exception issue with java -version on Fedora_raw/U540 to see what we can do about this problem.

To figure out what happened to java -version on Fedora_raw/U540, I will first compile a debug build directly on Debian_riscv/U540 (using the cross-build as the boot JDK) so I can single-step through the interpreter.

Currently investigating an issue with the code that checks qualified names when loading classes. Will get back to this once that problem is addressed.

@ChengJin01
I just did a bit of debugging of the java --version crash on HiFive Unleashed. A few observations:
1) It crashes on Debian when running the 5.6.12 kernel (loaded by OpenSBI, if that matters). The exact same JDK on the same Debian image but running on the 5.0.0rc1 kernel (loaded by BBL) DOES NOT crash. This suggests some change in the Linux kernel is causing it. At least this is where to start looking, as it is the only difference I can see.
2) Vanilla GDB master (commit 4d68fd750fa0ce9de7a5245f9eff539f31a95fb6) compiled on RISCV/Debian as ../../configure --prefix=/opt/gdb --disable-werror --with-python=/usr/bin/python3.8 --disable-guile does work. One can see backtraces, and single stepping seems to work fine.
3) When I run it, it produces different output; see below.

I'll do further debugging in GDB later.

```
Unhandled exception
Type=Segmentation error vmState=0x00000000
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=0000003FC5E026C4 Handler2=0000003FC5CC7B9A InaccessibleAddress=0000000000000005
PC=0000003FC5DADC98 RA=0000003FC5DB12D6 SP=0000003FC66DB8E0 GP=0000002AE6833800
TP=0000003FC66DE8F0 T0=74532F676E616C2F T1=0000000000000000 T2=000000000000005F
S0=0000003FC66DBC38 S1=0000000000005F00 A0=FFFFFFFFFFFFFFFF A1=0000000000011DB0
A2=0000000000000000 A3=FFFFFFFFFFFFFFFF A4=0000000000000001 A5=0000003FC5DADC8E
A6=000000000000F600 A7=0000000000005F00 S2=0000003FC4E7727D S3=0000000000005DD0
S4=00000000FFF00490 S5=0000000000001000 S6=00000000FFF00898 S7=0000000000010000
S8=00000000FFF00498 S9=000000000000FFFF S10=00000000000009F8 S11=000000000040F900
T3=0000003FC6793F80 T4=73736572706D6F63 T5=75646F4D2F676E61 T6=000000000000005B
PC=0000003FC5DADC98
FT0 3fd27616c9496e0b (f: 3377032704.000000, d: 2.884576e-01)
FT1 bfe7154748bef6c8 (f: 1220474624.000000, d: -7.213475e-01)
FT2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FT3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FT4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FT5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FT6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FT7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS1 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FA0 ffffffff42ab2760 (f: 1118513024.000000, d: -nan)
FA1 3fe62e42fefa39ef (f: 4277811712.000000, d: 6.931472e-01)
FA2 402f36a092834df1 (f: 2458078720.000000, d: 1.560669e+01)
FA3 bfe00b61fad0540f (f: 4207956992.000000, d: -5.013895e-01)
FA4 ffffffff3e3ab283 (f: 1044034176.000000, d: -nan)
FA5 ffffffff00000000 (f: 0.000000, d: -nan)
FA6 0000003fc0006b10 (f: 3221252864.000000, d: 1.352772e-312)
FA7 0000000000000020 (f: 32.000000, d: 1.581010e-322)
FS2 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS3 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS4 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS5 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FS11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FT8 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FT9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FT10 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FT11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
FCSR 0000000000000001 (f: 1.000000, d: 4.940656e-324)
Target=2_90_20200525_8 (Linux 5.6.12-00007-g0f60ad328c3c)
CPU=riscv (4 logical CPUs) (0x1f3b76000 RAM)
----------- Stack Backtrace -----------

(0x0000003FC5CDB9E2 [libj9prt29.so+0x289e2])

JVMDUMP039I Processing dump event "gpf", detail "" at 2020/07/29 15:26:29 - please wait.
JVMDUMP032I JVM requested System dump using '/home/jv/tmp/jdk/core.20200729.152629.66336.0001.dmp' in response to an event
JVMDUMP010I System dump written to /home/jv/tmp/jdk/core.20200729.152629.66336.0001.dmp
JVMDUMP032I JVM requested Java dump using '/home/jv/tmp/jdk/javacore.20200729.152629.66336.0002.txt' in response to an event
JVMDUMP010I Java dump written to /home/jv/tmp/jdk/javacore.20200729.152629.66336.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/jv/tmp/jdk/Snap.20200729.152629.66336.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/jv/tmp/jdk/Snap.20200729.152629.66336.0003.trc
JVMDUMP013I Processed dump event "gpf", detail "".
```

@janvrany, many thanks for the investigation on Debian_riscv.

Given that the latest kernel version on Fedora_raw is 5.5.0-0, I suspect the problem might be related to either the latest kernel or the new boot loader setup (OpenSBI). I asked the Fedora_raw developers about the 5.0.0 version at https://dl.fedoraproject.org/pub/alt/risc-v/disk-images/fedora/rawhide/20190126.n.0/Developer/

Index of /pub/alt/risc-v/disk-images/fedora/rawhide/20190126.n.0/Developer

Fedora-Developer-Rawhide-20190126.n.0-sda1.raw.xz        2019-01-28 20:22  1.1G
Fedora-Developer-Rawhide-20190126.n.0.CHECKSUM           2019-01-28 20:13  631
bbl-5.0.0-0.rc2.git0.1.0.riscv64.fc30.riscv64            2019-01-28 20:13   15M
config-5.0.0-0.rc2.git0.1.0.riscv64.fc30.riscv64         2019-01-28 20:13  159K
initramfs-5.0.0-0.rc2.git0.1.0.riscv64.fc30.riscv64.img  2019-01-28 20:22   15M
vmlinuz-5.0.0-0.rc2.git0.1.0.riscv64.fc30.riscv64        2019-01-28 20:22   14M

but it seems the old version only supports QEMU, not the hardware.

So if the single step debugging works good on Debian_riscv with kernel 5.6.12, it might be worth debugging from there to find out something more useful.

@ChengJin01

it might be worth debugging from there to find out something more useful.

That's exactly the plan :-)

1) Following the previous dumps generated at https://github.com/eclipse/openj9/issues/5058#issuecomment-620210650, I went over the whole static initializer blocks (<clinit>) of java.lang.String and the classes involved (JITHelpers_class.txt and String_clinit.TXT) and debugged from the following code at runtime/jcl/common/jclcinit.c into the interpreter, single-stepping each bytecode on Debian/U540:

    /* Initialize java.lang.String which will also initialize java.lang.Object. */
    vmFuncs->initializeClass(vmThread, stringClass);

However, the debugging result at debug_debian_String_clinit_log.TXT didn't indicate any way a java/lang/ArithmeticException could occur in the bytecode. To be specific, only div or rem triggers THROW_DIVIDE_BY_ZERO in the interpreter at runtime/vm/BytecodeInterpreter.hpp, which ends up as java/lang/ArithmeticException, but those two instructions only occur on the big_endian path in the <clinit> of the classes involved, unless the corresponding bytecode was corrupted or something abnormal happened in the interpreter (the debugging here still needs to be double-checked later in case anything was missed).
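For context, the pattern that raises this exception in an interpreter looks roughly like the following (a hypothetical sketch of the div/rem handling, not OpenJ9's actual BytecodeInterpreter.hpp code):

```c
#include <stdio.h>

/* Hypothetical shape of an interpreter's idiv handling; illustration only.
 * In OpenJ9, THROW_DIVIDE_BY_ZERO ultimately surfaces to Java code as
 * java/lang/ArithmeticException. */
static int interpretIdiv(int dividend, int divisor, int *result)
{
    if (0 == divisor) {
        return -1; /* analogous to rc = THROW_DIVIDE_BY_ZERO */
    }
    *result = dividend / divisor;
    return 0; /* analogous to rc = EXECUTE_BYTECODE */
}

int main(void)
{
    int r = 0;
    printf("10/2 -> %s, r=%d\n", (0 == interpretIdiv(10, 2, &r)) ? "ok" : "throw", r);
    printf("10/0 -> %s\n", (0 == interpretIdiv(10, 0, &r)) ? "ok" : "throw");
    return 0;
}
```

Since only these two bytecode paths can raise the exception, seeing it without any div/rem being executed points at corrupted bytecode or a misbehaving interpreter rather than a genuine divide by zero.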

2) According to the released history of Linux kernel 5.x at https://en.wikipedia.org/wiki/Linux_kernel_version_history#Releases_5.x.y

Releases 5.x.y

| Version | Original release date | Current version | Maintainer | Support model |
| ------- | --------------------- | --------------- | ---------- | ------------- |
| 5.0 | 3 March 2019 | 5.0.21 | Greg Kroah-Hartman | EOL (maintained from March 2019 to June 2019) |
| 5.1 | 5 May 2019 | 5.1.21 | Greg Kroah-Hartman | EOL (maintained from May 2019 to July 2019) |
| 5.2 | 7 July 2019 | 5.2.20 | Greg Kroah-Hartman | EOL (maintained from July 2019 to October 2019) |
| 5.3 | 15 September 2019 | 5.3.18 | Greg Kroah-Hartman | EOL (maintained from September 2019 to December 2019) |

against the publicized Fedora_rawhide image at https://dl.fedoraproject.org/pub/alt/risc-v/repo/virt-builder-images/images/

root@jincheng-VirtualBox:~# virt-builder --list | grep riscv64
fedora-rawhide-developer-20190703n0 riscv64    Fedora® Rawhide Developer 20190703.n.0 <----
fedora-rawhide-developer-20191030n0 riscv64    Fedora® Rawhide Developer 20191030.n.0
fedora-rawhide-developer-20191123.n.0 riscv64    Fedora® Rawhide Developer 20191123.n.0
fedora-rawhide-minimal-20191123.n.1 riscv64    Fedora® Rawhide Minimal 20191123.n.1
fedora-rawhide-developer-20200108.n.0 riscv64    Fedora® Rawhide Developer 20200108.n.0
fedora-rawhide-minimal-20200108.n.0 riscv64    Fedora® Rawhide Minimal 20200108.n.0

plus the explanation of OpenSBI (replacing BBL in the booting flow) at https://content.riscv.org/wp-content/uploads/2019/12/Summit_bootflow.pdf

Linux Kernel
– Upstream kernel boots in QEMU
– Device tree hosted in Kernel
– v5.3 kernel works with OpenSBI+U-Boot on HiFive Unleashed <-----

I suspect fedora-rawhide-developer-20190703n0 might work with a Linux kernel < 5.2, which means it still works with BBL rather than OpenSBI. Given that the build works on Debian (kernel 5.0) + BBL but fails on Debian (kernel 5.6) + OpenSBI, I plan to try fedora-rawhide-developer-20190703n0 first to see how it goes on U540 before moving forward with debugging on Debian (kernel 5.6) + OpenSBI.

I tried fedora-rawhide-developer-20190703n0 at https://dl.fedoraproject.org/pub/alt/risc-v/repo/virt-builder-images/images/ but it ended up stuck in bootstrap on U540 (whether MSEL mode was set to 1111 or 1101), which confirms the note mentioned at https://dl.fedoraproject.org/pub/alt/risc-v/repo/virt-builder-images/images/index:

notes=Fedora® Rawhide Developer 20190703.n.0
This Fedora/RISCV image contains only unmodified @Core group packages. <--- meaning it is too incomplete to work on U540

So there is no way to know whether Fedora-rawhide ever worked with BBL on U540, as the first workable image on U540, fedora-rawhide-developer-20191030n0, only works with OpenSBI with the 1101 bootstrap setting (I also burned this image to an SD card and booted it up on U540 with the 1101 setting):

notes=Fedora® Rawhide Developer 20191030.n.0
This is the 1st release to support QEMU virt machine and SiFive Unleashed dev board.

against the output of the Linux kernel version on fedora-rawhide-developer-20191030n0:

[root@jincheng ~]# uname -a
Linux jincheng.ibm.com 5.4.0-0.rc4.git0.300.2.riscv64.fc32.riscv64  <------ kernel (> 5.3) supports OpenSBI
#1 SMP Mon Oct 28 11:17:31 UTC 2019
riscv64 riscv64 riscv64 GNU/Linux

Given that fedora-rawhide-developer-20191030n0 is a bit more lightweight and runs faster compared to the latest image fedora-rawhide-minimal-20200108.n.0 we have been using, I will generate a complete snap trace with new tracepoints on U540 and double-check it against the equivalent on X86_64 to see whether there is anything useful there.

@ChengJin01
I tried to debug the crash on Debian, OpenSBI + Linux 5.6.12, where it crashes with SIGSEGV in the interpreter (full backtrace):

FRAME  0 0x0000003FF744FDB6 VM_BytecodeInterpreterCompressed::lastore (BytecodeInterpreter.hpp:5619)

variables:
     1 arrayLength     : 63
     2 index           : 4159067376
     3 this            : 0x3ff7e65ab8
     4 _sp             : 0x5dd0
     5 _pc             : 0x3ff42c727d "P\344)\364?"
     6 rc              : EXECUTE_BYTECODE
     7 arrayref        : 0x1

source: 
  5604                  *(I_64*)_sp = _objectAccessBarrier.inlineIndexableObjectReadI64(_currentThread, arrayref, index);
  5605              }
  5606          }
  5607          return rc;
  5608      }
  5609    
  5610      /* ..., arrayref, index, value1, value2 => ... */
  5611      VMINLINE VM_BytecodeAction
  5612      lastore(REGISTER_ARGS_LIST)
  5613      {
  5614          VM_BytecodeAction rc = EXECUTE_BYTECODE;
  5615          j9object_t arrayref = *(j9object_t*)(_sp + 3);
  5616          if (NULL == arrayref) {
  5617              rc = THROW_NPE;
  5618          } else {
  5619 >>           U_32 arrayLength = J9INDEXABLEOBJECT_SIZE(_currentThread, arrayref);
  5620              U_32 index = *(U_32*)(_sp + 2);
  5621              /* By using U_32 for index, we also catch the negative case, as all negative values are
  5622               * greater than the maximum array size (31 bits unsigned).
  5623               */
  5624              if (index >= arrayLength) {
  5625                  _currentThread->tempSlot = (UDATA)index;
  5626                  rc = THROW_AIOB;
  5627              } else {
  5628                  I_64 value = *(I_64*)_sp;
  5629                  _pc += 1;
  5630                  _sp += 4;
  5631                  _objectAccessBarrier.inlineIndexableObjectStoreI64(_currentThread, arrayref, index, value);
  5632              }
  5633          }

As you can see, arrayref is 0x1, so no wonder it segfaults. _currentThread is 0x5f00, _currentThread->sp is 0x5e10. Here's the stack dump (if I got it right):

0x0000000000005DD0; 0x00000000  
0x0000000000005DD4; 0x00000000  
0x0000000000005DD8; 0x00000000  
0x0000000000005DDC; 0x00000000  
0x0000000000005DE0; 0xffffffff  
0x0000000000005DE4; 0x00000000  
0x0000000000005DE8; 0x00000001  
0x0000000000005DEC; 0x00000000  
0x0000000000005DF0; 0x00000000  
0x0000000000005DF4; 0x00000000  
0x0000000000005DF8; 0x00000000  
0x0000000000005DFC; 0x00000000  
0x0000000000005E00; 0x00000002  
0x0000000000005E04; 0x0000003f  
0x0000000000005E08; 0x00000000  
0x0000000000005E0C; 0x00000000  
0x0000000000005E10; 0x00000000  

At first glance, none of the values look like an object reference.
However, I must admit that so far I'm merely trying to make sense of the code and understand how it works.

I'll continue digging.

Hi @janvrany, many thanks for your debugging on the latest version of Debian on U540.

If the crash really happened in the code near inlineIndexableObjectReadI64 or inlineIndexableObjectStoreI64 (loading/storing array data), then it confirms my earlier suspicion that the problem is related to the access barrier, which may have something to do with OpenSBI, given that OpenSBI enables the full L2 cache for better performance. If so, we need to review all our read/write barrier code in OMR to see what happens to it in this case.

So I will burn the image to an SD card and repeat your steps to generate a debug build, then double-check whether the crash still occurs in the same place before moving forward.

@ChengJin01: just in case you would find that useful: here are my scripts to (reproducibly) compile debug build of OpenJ9 on RISC-V Debian: https://github.com/janvrany/openj9-openjdk-jdk11-devscripts

After a bunch of attempts, I managed to get Debian with Linux kernel 5.6 working on U540:

SiFive FSBL:       2018-03-20
HiFive-U serial #: 00000287

OpenSBI v0.6
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name          : SiFive Freedom U540
Platform HART Features : RV64ACDFIMSU
Platform Max HARTs     : 5
Current Hart           : 4
Firmware Base          : 0x80000000
Firmware Size          : 108 KB
Runtime SBI Version    : 0.2

MIDELEG : 0x0000000000000222
MEDELEG : 0x000000000000b109
PMP0    : 0x0000000080000000-0x000000008001ffff (A)
PMP1    : 0x0000000000000000-0x0000007fffffffff (A,R,W,X)
[    0.000000] OF: fdt: Ignoring memory range 0x80000000 - 0x80200000
[    0.000000] Linux version 5.6.12-149510-g0f60ad328c3c (jincheng@jincheng-pc) (gcc version 10.1.0 (GCC)) #1 SMP Tue Aug 11 15:52:15 EDT 2020
[    0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[    0.000000] printk: bootconsole [sbi0] enabled
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000080200000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000027fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080200000-0x000000027fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080200000-0x000000027fffffff]
[    0.000000] software IO TLB: mapped [mem 0xfbfff000-0xfffff000] (64MB)
[    0.000000] CPU with hartid=0 is not available
[    0.000000] CPU with hartid=0 is not available
[    0.000000] elf_hwcap is 0x112d
...
         Starting Update UTMP about System Runlevel Changes...
[  OK  ] Finished Update UTMP about System Runlevel Changes.
[   38.212843] macb 10090000.ethernet eth0: Link is Up - 100Mbps/Full - flow control tx

Debian GNU/Linux bullseye/sid unleashed console
...
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
...
root@unleashed:~# 
root@unleashed:~# uname -a
Linux unleashed 5.6.12-149510-g0f60ad328c3c #1 SMP Tue Aug 11 15:52:15 EDT 2020 riscv64 GNU/Linux

The next step is to generate a couple of builds (including the debug build) to see how it goes with the latest linux kernel + OpenSBI.

I am currently setting up the test environment for the internal tests on Debian(5.0) with BBL/U540. Will get back to the investigation on the latest linux kernel + OpenSBI after the test is completed.

@ChengJin01 Hi Cheng Jin,
Thanks a lot for your Build_Instructions_V11.md on riscv64. I am working on porting Java (including OpenJDK and OpenJ9) to RISC-V; I am establishing a baseline now, learning and evaluating how hard it is and how much time is needed to get things done.

By following your instructions I have built a JDK image and successfully run ./java -version.

I searched the issues and have no idea yet about enabling the OpenJ9 JIT on RISCV64. Do you have any plan for the OpenJ9 JIT and a new long-term roadmap beyond #5058? Thank you for your advice.

Hi @zdlgv5,

Many thanks for your support of the OpenJ9/RISC-V project! As for the OpenJ9 JIT on RISC-V, it is already on our priority list, but we have not yet started because there are some details we need to figure out as to the best way to move forward. Please stay tuned!

FYI: @0xdaryl

Hi, @zdlgv5,

I'm currently doing some preparatory work to start on the OpenJ9 part of the JIT and hoping to start doing some real JIT work soon (TM).

Hi, @ChengJin01 and @janvrany ,
Thank you for your prompt reply!
I worked on JavaWEB for two years and I'm new here.
I'm learning more about porting and compiling and looking forward to working together with you in the near future!

For those of you who are concerned about the latest status of OpenJ9 JIT on RISC-V, the issue at https://github.com/eclipse/openj9/issues/11136 was created to track the progress in there.

The investigation of java -version (done during my vacation before the Christmas break & already double-checked) on Debian (Linux kernel 5.6) + OpenSBI v0.6/U540 indicates the problem was triggered by lh (the RISC-V load-halfword opcode) failing to sign-extend to the correct value on U540, as follows:

/runtime/vm/BytecodeInterpreter.hpp
gotoImpl(REGISTER_ARGS_LIST, UDATA parmSize)
    {
        if (parmSize == 2) {
            _pc += *(I_16*)(_pc + 1); <-------------- in the interpreter (the _pc value was messed up)
        } else if (parmSize == 4) {
            _pc += *(I_32*)(_pc + 1);
        } else {
            Assert_VM_unreachable();
        }
        return GOTO_BRANCH_WITH_ASYNC_CHECK;
    }

with the assembly:
7319                _pc += *(I_16*)(_pc + 1);    
   0x0000003ff74489be <+34>:    ld  a5,-40(s0)         
   0x0000003ff74489c2 <+38>:    ld  a5,0(a5)           
   0x0000003ff74489c4 <+40>:    ld  a4,-40(s0)         
   0x0000003ff74489c8 <+44>:    ld  a4,0(a4)           
   0x0000003ff74489ca <+46>:    addi    a4,a4,1          
   0x0000003ff74489cc <+48>:    lh  a4,0(a4) <----------- failed to sign-extend to align with the _pc value
   0x0000003ff74489d0 <+52>:    add a4,a4,a5           
   0x0000003ff74489d2 <+54>:    ld  a5,-40(s0)         
   0x0000003ff74489d6 <+58>:    sd  a4,0(a5)           
...
6: x/x $a4
   0xffe1: <-------------- on Debian 5.6 + OpenSBI v.06

while it works good on the 5.0 + BBL to jump to the right path as follows:

1: x/i $pc
=> 0x2000a159cc <VM_BytecodeInterpreterCompressed::gotoImpl(unsigned long*&, unsigned char*&, unsigned long)+48>:   
    lh  a4,0(a4) <--------------
2: /x $a4 = 0xffffffffffffffe1  <-------- correctly sign-extended on Debian 5.0 + BBL

It doesn't mean lh itself caused a trap, but it led to an incorrect _pc value in the bytecode, which derailed the rest of execution (jumping into nowhere...). Here the 16-bit branch offset 0xffe1 is -31, i.e. a backward branch of 31 bytes; zero-extended to 0xffe1 (65505) it instead makes _pc jump far forward.

I discussed this with the Western Digital developers working on the Linux kernel & OpenSBI; they concluded that the problem came from the sign-extension code in OpenSBI at https://github.com/riscv/opensbi/lib/sbi/sbi_misaligned_ldst.c:

int sbi_misaligned_load_handler(ulong addr, ulong tval2, ulong tinst...)
...
    if (!fp)
        SET_RD(insn, regs, val.data_ulong << shift >> shift);  <------

and committed a new fix for the issue two weeks ago at https://github.com/riscv/opensbi/commit/7dcb1e1753e9c5daec0580779ea8c31778bff152

int sbi_misaligned_load_handler(ulong addr, ulong tval2, ulong tinst...)
...
    if (!fp)
        SET_RD(insn, regs, ((long)(val.data_ulong << shift)) >> shift);
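To see why the cast matters: in the buggy version both shifts happen on an unsigned value, so the right shift is logical and zero-fills; casting to a signed type first makes it an arithmetic shift that replicates the sign bit. A minimal standalone illustration (hypothetical values, relying on the usual arithmetic-shift behavior of signed right shifts on these platforms):

```c
#include <stdio.h>

int main(void)
{
    /* Emulate an lh (load halfword) of 0xFFE1 (-31) into a 64-bit register,
     * the way the M-mode misaligned-load handler reconstructs the value. */
    unsigned long raw = 0xFFE1;
    int shift = 64 - 16; /* discard everything above the loaded halfword */

    /* Buggy: the right shift on an unsigned type is logical, so the
     * value is zero-extended. */
    unsigned long buggy = raw << shift >> shift;

    /* Fixed: the cast to long makes the right shift arithmetic,
     * replicating the sign bit, as in the OpenSBI commit above. */
    unsigned long fixed = (unsigned long)(((long)(raw << shift)) >> shift);

    printf("buggy: 0x%016lx\n", buggy); /* 0x000000000000ffe1 */
    printf("fixed: 0x%016lx\n", fixed); /* 0xffffffffffffffe1 */
    return 0;
}
```

The buggy result is exactly the 0xffe1 seen in register a4 in the gdb session above.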

@janvrany (our partner) just verified on Debian 5.6 + the patched OpenSBI v0.6 with the fix above, and java -version seems to work as expected:

jv@unleashed:~/tmp/jdk/bin$ uname -a
Linux unleashed 5.6.12-00007-g0f60ad328c3c #2 SMP Tue Jan 5 10:40:23 GMT 2021 riscv64 GNU/Linux

jv@unleashed:~/tmp/jdk/bin$ ./java --version
openjdk 11.0.10-internal 2021-01-19
OpenJDK Runtime Environment (build 11.0.10-internal+0-adhoc.jenkins.openj9-openjdk-jdk11)
Eclipse OpenJ9 VM (build HEAD-f2e8dfa4, JRE 11 
Linux riscv-64-Bit Compressed References 20201210_56 (JIT disabled, AOT disabled)
OpenJ9   - f2e8dfa4
OMR      - 5c72bed
JCL      - 5e81062d6 based on jdk-11.0.10+5)

The next steps include:
1) will recreate a new Debian image on our side to verify the result once @janvrany finishes pushing the fix to the repo at https://github.com/janvrany/riscv-debian/tree/linux-5.6.
2) will do internal tests to see how it goes on Debian 5.6 with the newly patched OpenSBI on U540.
3) already notified the Fedora/RISC-V developers to create a new Fedora image once OpenSBI is branched to a new release version with the latest fix.
4) will do the same internal tests to double-check on the latest Fedora/RISC-V image after 3) to confirm everything still works good.

Any opinions on whether we ought not to rely on support for misaligned accesses like that? Perhaps there's some performance benefit to be had?

@keithc-ca , OpenSBI is the standardized binary interface (aimed to replace BBL as a new kind of BIOS on RISC-V) intended for all OS images & hardware on RISC-V (see https://github.com/riscv/opensbi & https://riscv.org/wp-content/uploads/2019/06/13.30-RISCV_OpenSBI_Deep_Dive_v5.pdf)
[image: diagram from the OpenSBI slides linked above]

If we chose not to rely on the fix from OpenSBI, a bunch of code changes might be needed throughout the interpreter beyond gotoImpl:

BytecodeInterpreter.hpp (17 matches)
6,970: _pc += *(I_16*)(_pc + 1); 
6,991: _pc += *(I_16*)(_pc + 1); 
7,013: _pc += *(I_16*)(_pc + 1); 
7,034: _pc += *(I_16*)(_pc + 1); 
7,055: _pc += *(I_16*)(_pc + 1); 
7,076: _pc += *(I_16*)(_pc + 1); 
7,097: _pc += *(I_16*)(_pc + 1); 
7,118: _pc += *(I_16*)(_pc + 1); 
7,142: _pc += *(I_16*)(_pc + 1); 
7,166: _pc += *(I_16*)(_pc + 1); 
7,190: _pc += *(I_16*)(_pc + 1); 
7,214: _pc += *(I_16*)(_pc + 1); 
7,238: _pc += *(I_16*)(_pc + 1); 
7,262: _pc += *(I_16*)(_pc + 1); 
7,286: _pc += *(I_16*)(_pc + 1); 
7,310: _pc += *(I_16*)(_pc + 1); 
7,329: _pc += *(I_16*)(_pc + 1); 

and there is no way to determine where an unaligned load/store opcode will land in the generated assembly, so the same failure could surface anywhere in our code, not just the interpreter. (The reason is that the underlying hardware doesn't implement misaligned loads/stores, which is why OpenSBI steps in to emulate them.)
https://github.com/riscv/riscv-isa-manual/issues/394

Use-Case1: Misaligned load/store emulation in M-mode
The M-mode RUNTIME firmware (OpenSBI) will have to do misaligned load/store
emulation for both HS-mode (Hypervisor) and VS-mode (Guest OS) whenever
underlying HW does not implement misaligned load/store. <-----------

This means M-mode
RUNTIME firmware will do unprivledge load for getting faulting instruction.
This is expensive for traps coming from both HS-mode and VS-mode. Particularly,
it's very expensive for traps coming from VS-mode because unprivledge load
can use nested page table walk to read instruction from memory.

https://riscv.org/wp-content/uploads/2019/06/riscv-spec.pdf

An EEI may guarantee that misaligned loads and stores are fully supported, and so the software running inside the execution environment will never experience a contained or fatal address-misaligned trap. In this case, the misaligned loads and stores can be handled in hardware, or via an invisible trap into the execution environment implementation, or possibly a combination of hardware and invisible trap depending on address.

I am not sure whether any RISC-V-specific change in the interpreter would bring a benefit, but the first thing that comes to my mind is unexpected behaviour anywhere afterwards, which would be extremely difficult to troubleshoot.
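For the record, the usual portable way to avoid misaligned accesses at those sites would be a memcpy-based read, which compilers lower to a safe access sequence when told the target lacks misaligned support (e.g. with GCC's -mstrict-align on RISC-V). A minimal sketch of the idea, not a proposed change:

```c
#include <stdint.h>
#include <string.h>

/* Read the signed 16-bit branch offset that follows a branch opcode without
 * dereferencing an unaligned I_16* directly. Illustration only; OpenJ9's
 * interpreter currently relies on misaligned loads being supported. */
static inline int16_t readBranchOffset16(const uint8_t *pc)
{
    int16_t offset;
    memcpy(&offset, pc + 1, sizeof(offset)); /* pc + 1 skips the opcode byte */
    return offset;
}

/* The interpreter line `_pc += *(I_16*)(_pc + 1);` would then become
 * `_pc += readBranchOffset16(_pc);`. */
```

Even so, this only covers the interpreter; any other misaligned access in the rest of the code base (or in code the compiler generates elsewhere) would remain.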

@gacholio & @pshipton, any input on this?

This is the key point in the spec:

An EEI may guarantee that misaligned loads and stores are fully supported

Notice it says 'may', not 'must'. As it is, it would seem our VM will simply not run on an EEI that doesn't support misaligned memory accesses.

Yes, to some extent, which is why OpenSBI needs to handle all such misaligned memory accesses for the hardware. Given that the code is shared across various platforms and the problem never happens on any platform & hardware other than RISC-V, I don't see any special reason to handle this situation on our side specifically for RISC-V (and I doubt it could be fully fixed simply by modifying the interpreter code above, as there may be other generated assembly we are unaware of).

For now, I would say that we can only run on systems which allow misaligned access. I'd rather not pollute the code with macros unnecessarily.

Even on systems that allow misaligned access, there may be a performance penalty, but until any such penalty is quantified, I agree we should leave the code alone.

With a newly created Debian 5.6 + patched OpenSBI (fix upstreamed to https://github.com/janvrany/riscv-debian/tree/linux-5.6 by @janvrany), I verified our build works as expected on U540:
[screenshot: java -version output on U540]
and also checked with a sample OpenJDK application:
[screenshot: the sample application running]

Will set up the test environment on U540 to do internal tests to see whether everything still goes fine with the new fix.

I finished the internal tests, which passed on Debian 5.6 + patched OpenSBI. Will wait for the new version of Fedora_riscv to double-check.
