OpenJ9: Crash in OMR::ValuePropagation::buildBoundCheckComparisonNodes in a PR build

Created on 3 Apr 2020 · 14 Comments · Source: eclipse/openj9

The test in [1] observed two crashes as per https://github.com/eclipse/openj9/pull/9096#issuecomment-608558042 which I will quote here:

The backtrace for the jit_jar_0 failure:

OMR::ValuePropagation::buildBoundCheckComparisonNodes(OMR::ValuePropagation::BlockVersionInfo*, List<TR::Node>*) 
OMR::ValuePropagation::versionBlocks()
TR::LocalValuePropagation::postPerformOnBlocks()
OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int)
OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int)
OMR::Optimizer::optimize()
OMR::Compilation::compile()
TR::CompilationInfoPerThreadBase::compile(J9VMThread*, TR::Compilation*, TR_ResolvedMethod*, TR_J9VMBase&, TR_OptimizationPlan*, TR::SegmentAllocator const&)

The backtrace for the StringPeepholeTest_0 failure:

OMR::ValuePropagation::buildBoundCheckComparisonNodes(OMR::ValuePropagation::BlockVersionInfo*, List<TR::Node>*) 
OMR::ValuePropagation::versionBlocks()
TR::GlobalValuePropagation::perform()
OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int)
OMR::Optimizer::performOptimization(OptimizationStrategy const*, int, int, int)
OMR::Optimizer::optimize()
OMR::Compilation::compile()
TR::CompilationInfoPerThreadBase::compile(J9VMThread*, TR::Compilation*, TR_ResolvedMethod*, TR_J9VMBase&, TR_OptimizationPlan*, TR::SegmentAllocator const&)

I don't see any recent changes in this area so these are likely preexisting problems which are now being exposed.

[1] https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.functional_s390x_linux_Personal/407/

bug jit

All 14 comments

According to [1], we are now starting to see this in our automated builds as well, on x86 and Power. @andrewcraik @cathyzhyi have there been any changes in the area of VP in the last week that you know of? This seems to have started occurring very recently. If someone made changes in this area, perhaps they can help take a look, since this is a regression.

[1] https://openj9.slack.com/archives/CDS7QE9HB/p1586206238032300

@klangman I think this may be related to the following PR:
https://github.com/eclipse/omr/pull/4598/files#diff-0a9a54491cd0a54a9ff1f090bf473a66R6415-R6446

I see this function was changed here and since that change has propagated we have started seeing these failures. Could you please help take a look as this seems to be readily failing?

Grinder shows 3/10 failure rate:
https://ci.eclipse.org/openj9/job/Grinder/753/consoleFull

I'll try to reproduce locally and get a unit test for debugging.

I was going to say the change from @klangman is the only recent change.

Reproduced 6/10 times in a grinder with logging:
https://ci.eclipse.org/openj9/job/Grinder/758/

Ran with:

-Xcompressedrefs -XX:-EnableHCR -Xjit:'count=0,optLevel=hot,{java/util/GregorianCalendar.computeFields(II)I}(tracefull,traceGlobalVP,traceLocalVP,log=computeFields.log)'

This should be good enough to start the investigation. @klangman could you help take a look given you have context having written the code?

> @klangman could you help take a look given you have context having written the code?

Sure, but I have a critical issue I am working on right now, I'll try to get to this later in the day.
The current thinking is that this failure is unrelated to OMR #4598 right?

> Sure, but I have a critical issue I am working on right now, I'll try to get to this later in the day.
> The current thinking is that this failure is unrelated to OMR #4598 right?

That's understandable. I've tagged it to get fixed for the next release so we don't forget about it. The current thinking is that the issue is a direct result of eclipse/omr#4598 which modified the exact same function we are now crashing in.

> The current thinking is that the issue is a direct result of eclipse/omr#4598 which modified the exact same function we are now crashing in.

Oh.. OK.. I was a bit confused by Peter's comment. Thought I was off the hook :-)

Looks like the same problem that was fixed by https://github.com/eclipse/omr/pull/5013

Is that fix in this build? I guess that's what I have to look at next.

I don't believe it's within the powers of mortal men to determine which PRs were included in that build. So all I can say is that the core file looks to me like an example of the issue that I fixed with https://github.com/eclipse/omr/pull/5013.

I think we're talking about the PR build? https://ci.eclipse.org/openj9/job/Build_JDK8_s390x_linux_Personal/589/

15:51:18  OpenJDK Runtime Environment (build 1.8.0_252-internal-jenkins_2020_04_02_15_38-b00)
15:51:18  Eclipse OpenJ9 VM (build HEAD-0e5ad2414, JRE 1.8.0 Linux s390x-64-Bit Compressed References 20200402_589 (JIT enabled, AOT enabled)
15:51:18  OpenJ9   - 0e5ad2414
15:51:18  OMR      - 8b9f8133e
15:51:18  JCL      - 4f7bf238294 based on jdk8u252-b08)

The commit SHA for https://github.com/eclipse/omr/pull/5013 is e3cb67f.

git merge-base --is-ancestor <commit> <version sha> && echo yes
git merge-base --is-ancestor e3cb67f 8b9f8133e && echo yes

The change isn't included in that build.
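
As a rough illustration of the check above, here is a minimal Python sketch that pulls the component SHAs out of the build banner and then asks git whether a fix commit is an ancestor of the build's commit. The helper names (`component_shas`, `classify`, `fix_in_build`) and the repository path in the usage note are hypothetical and were not part of the thread; only the `git merge-base --is-ancestor` command and the SHAs come from it. One thing the sketch makes explicit is that the one-liner with `&& echo yes` prints nothing both when the commit is not an ancestor (exit status 1) and when git fails, for example because a SHA is missing from the local clone (any other non-zero status).

```python
import re
import subprocess

# Banner lines copied from the build log quoted in the thread.
BANNER = """\
15:51:18  OpenJ9   - 0e5ad2414
15:51:18  OMR      - 8b9f8133e
15:51:18  JCL      - 4f7bf238294 based on jdk8u252-b08)
"""


def component_shas(banner):
    """Return {component: sha} for lines shaped like '[timestamp] NAME - <hex sha>'."""
    pattern = re.compile(
        r"^(?:\S+[ \t]+)?(\w+)[ \t]+-[ \t]+([0-9a-f]{7,40})\b",
        re.MULTILINE,
    )
    return {name: sha for name, sha in pattern.findall(banner)}


def classify(returncode):
    """Map the exit status of `git merge-base --is-ancestor` to a verdict."""
    if returncode == 0:
        return "included"
    if returncode == 1:
        return "not included"
    return "error"  # any other status, e.g. a SHA unknown to the local clone


def fix_in_build(fix_sha, build_sha, repo_dir):
    """Ask git (run inside repo_dir) whether fix_sha is an ancestor of build_sha."""
    result = subprocess.run(
        ["git", "merge-base", "--is-ancestor", fix_sha, build_sha],
        cwd=repo_dir,
    )
    return classify(result.returncode)
```

With a local clone of eclipse/omr (the path is a placeholder), `fix_in_build("e3cb67f", component_shas(BANNER)["OMR"], "/path/to/omr")` would be expected to report "not included", matching the conclusion above. Note that an ancestry check only finds the fix under its original SHA; a cherry-pick onto another branch would carry a different SHA and would not be detected this way.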

Thanks for the investigation and help. Closing this one off as it is fixed in upstream. We can reopen if we spot a reproduction.
