Experimenting with the implementation of JEP331 that was introduced in the Java 12 variant of J9, we found the results are off by quite a lot. Since there seems to be a system to these values, a statistical anomaly can be ruled out. To reproduce I'm attaching a minimal JVMTI agent and a sample that shows the problem. We have also verified that the problem still persists in the most recent build.
Sample:
The sample runs for a minute and alternately allocates two types, i.e., A and B. A contains one long (and is thus 16b - which also corresponds to jvmti getObjectSize) and B contains two longs, (24B). Since the object sampling interval is based on allocated memory (rather than objects) and due to the size ratio of B to A, we expect to get about 60% samples of Bs, and 40% samples of A, i.e., B objects are bigger, so it is more likely that B will be hit by random sample.
Agent:
The agent tags classes A and B on load, and maintains two static counters. It disregards all other objects that might get sampled. At the end, it prints the number of samples, and also the ratio between them. We expect to see something like 40% to 60% (within some margin of error), but we actually see consistently 50% to 50%.
Things we could already rule out:
*) Scalar replacement: We are adding all objects to an array list and clear it periodically to avoid scalar replacement. Also, on shutdown the agent forces a GC and makes a heap dump to show the actual distribution of objects, which shows more than 0 objects (meaning they are really being allocated) and a consistent distribution of 50% to 50%
*) Randomization: Since j9 doesnt randomize the sampling interval, the agent resets the interval with a random value at every sample. If it wouldn't, the samples would be off even more.
*) Interfering objects: Even when the agent samples other objects, the ratios of A to B should remain the same.
Things we already found out:
*) Collections matter: If the ratio is wrong and by how much it is wrong strongly depends on which collection we put the object into. Also the size of the collection matters, for example an array list with 1_000 capacity causes the correct ratios, with 1_000_000 elements the ratios are wrong. With a linked list, the ratios are completely off (0.5% As, 99.5% Bs). We understand that depending on the collection different JIT optimizations may be enabled, but even so, the heap dump we take shows the objects are indeed allocated in every case, meaning the ratios reported are wrong. Maybe there is a JIT optimization that doesn't properly respect heap sampling.
@JasonFengJ9 you were involved in the original implementation of this with @charliegracie.
fyi @dmitripivkine
Targeting to the 0.19 release for now as I'm uncertain how large the work required to address this would be.
It appears due to same cause of https://github.com/eclipse/openj9/issues/6024
I wanted to ask if there are any plans of addressing this problem?
@PhilippLengauer Yes, there's plans to fix it. It's targeted it to the 0.19 milestone which is for the Java 14 release in March of 2020 so it may take a bit but it's definitely on our radar.
@JasonFengJ9 Can you connect with @dmitripivkine and flesh out the plan for how to address this?
Chatted with @dmitripivkine about this, the first step is to try disabling TLH and see if that solves the issue reported and evaluate the performance hit.
Chatted with @dmitripivkine about this, the first step is to try disabling TLH and see if that solves the issue reported and evaluate the performance hit.
This is an experiment only. Final solution should be more advanced.
@LinHu2016 Would you please join this investigation to review what JEP331 exact requirements are and what is inconsistent / missing in current implementation?
Discussed with @LinHu2016 , we are going to run a specjbb 2015 with GC option disableInlineAllocation which essentially forces each allocation OOL but still allocate in TLH, and see what's the performance hit.
option -Xgc:fvtest=disableInlineAllocation can be temporary solution for disabling TLH, run the local test SPECjbb2015, it shows disableInlineAllocation option could impact 11-12% runtime performance(critical-jOPS).
result for -Xmx4G -Xms4G -Xmn1G
RUN RESULT: hbIR (max attempted) = 19881, hbIR (settled) = 16584, max-jOPS = 12127, critical-jOPS = 2990
result for -Xmx4G -Xms4G -Xmn1G -Xgc:fvtest=disableInlineAllocation
RUN RESULT: hbIR (max attempted) = 19881, hbIR (settled) = 16584, max-jOPS = 13917, critical-jOPS = 2637
It seems this won't be resolved for the 0.20 release, moving it forward.
Thanks @LinHu2016 making a test build which improves the allocation sampling.
@PhilippLengauer do you mind to give a try this personal build (Linux x86)? If you can provide an email address, I can share a box link, otherwise please suggest another way to share the file ~260M.
@JasonFengJ9 can't you create a link that is visible to everyone? It doesn't need to be private.
Here is the link https://ibm.box.com/s/f7ueudpuzv65huo6fl4l78zhec9sd239
Please advise if it is not accessiable.
I've downloaded it and will run our tests.
Thanks for the build, i could access it and i'll run the reproducer and our more extensive tests soon. I'll let u know.
I checked, the reproducer yields correct ratios and also our more extensive tests report correct ratios now. Thanks a lot for the fix.
On a similar note, however, now that the ratios are correct, the total extrapolated amount of allocations per type is off by a huge amount. Meaning when we take the configured sampling interval, and start estimating the total allocations based on the amount of events, the configured sampling interval, and the size of individual objects, we are off by multiple orders of magnitude. Is there anything that comes to your mind that could cause that? if u want i can create another minimal reproducer and attach it here (or file a new ticket - because technically its not about ratios anymore, but about absolute values).
@PhilippLengauer thanks the note.
the total extrapolated amount of allocations per type is off by a huge amount.
Is this a regression from existing build w/o this test fix? If not, please create a new issue instead. The fix for the ratios could be submitted first.
yes, i can confirm it actually is a regression. with the build publicly downloadable our agent has an accuracy of about +/-1.5% in our tests when its extrapolating absolute allocation counts. Now, we are seeing the numbers being several orders of magnitude off. Ill strip our test and agent to a minimal reproducer and attach it here if that is ok for u?
EDIT: i should clarify: in the publicly downloadable build the ratios where off by a lot (see issue description) but the total number of objects was about right. now the ratios are correct, but the total number (and thus also the number per type) is off.
Ill strip our test and agent to a minimal reproducer and attach it here if that is ok for u?
@PhilippLengauer pls create such a testcase to demonstrate the regression.
I'm attaching the new reproducer, it contains 3 different samples. Every of the sample executes a dedicated allocation pattern for one minute, the agent (like the previous one) then prints the total number of allocations per type. the agent prints on shutdown the extrapolated number of allocations per type as well as the ratios between them. Similarly to the first agent, the agent also executes a heap dump to make sure the objects are really allocated.
The first sample allocates a single type over and over again.
The second sample allocates three different types of the same size.
The third sample allocates two types, one being 24 bytes, and one 16 bytes. This sample is similar to the original one, in that in theory we should hit more samples for the bigger type (because sampling is based on bytes) but the extrapolation should then yield a 1:1 ratio again in terms of count.
I've also included the output for the most recent adopt hotspot, the most recent adopt j9, and the private build from above.
Hotspot shows in the first sample an actual absolute count of 659.4 million objects, vs extrapolated 659.7 million, so pretty close. In the second sample, 257 million objects of every type have been allocated, extrapolation shows in the worst case 248 million, which means the numbers are 3.5% off. In the third sample, absolute allocations are 387 million vs 380 million, so about 1.8% off. the ratios are in all samples under 1% off.
In J9 public build, the first sample shows absolute allocations of 592.6 million vs 585.4 million, so the error here is already bigger. The second sample is similar in that regard. In the third sample, the the ratios are off by 10%. We can see that even though the object sizes are different, the sample counts are almost the same, which shouldnt be! This is basically the original reported issue.
In j9 private build, the first sample shows absolute allocations of 660 million, vs 1.9 million, so thats already off by quite a lot! The second and third sample show errors of similar magnitudes. Also the ratios are still off by quite a lot in the third sample.
Please reach out to me if u have any questions.
Thanks the detailed note, @PhilippLengauer
Run the test build a few times with ReproducerV2.zip, and got quite a few Extrapolation: -nan% : -nan% : -nan% : -nan% : -nan% as following sample output:
openjdk version "11.0.7-internal" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7-internal+0-adhoc.jenkins.BuildJDK11x86-64linuxPersonal)
Eclipse OpenJ9 VM (build JEP331-8372ac552, JRE 11 Linux amd64-64-Bit Compressed References 20200330_1782 (JIT enabled, AOT enabled)
OpenJ9 - 8372ac552
OMR - d49ba311b
JCL - 34c3dd7d55b based on jdk-11.0.7+9)
##########################
ALLOCATION PATTERN: 1 Sample$A
ALLOCATIONS: 7603062 Sample$A
SINK: 1419404382285
0x A - extrapolated 0x
0x B - extrapolated 0x
0x C - extrapolated 0x
0x D - extrapolated 0x
0x E - extrapolated 0x
Extrapolation: -nan% : -nan% : -nan% : -nan% : -nan% (603064)
Verification : 100.00% : 0.00% : 0.00% : 0.00% : 0.00% (603064)
##########################
ALLOCATION PATTERN: 1 Sample$A, 1 Sample$B, 1 Sample$C 0
ALLOCATIONS: 7635542 Sample$C; 7635542 Sample$B; 7635542 Sample$A
SINK: 1726851967497
0x A - extrapolated 0x
0x B - extrapolated 0x
0x C - extrapolated 0x
0x D - extrapolated 0x
0x E - extrapolated 0x
Extrapolation: -nan% : -nan% : -nan% : -nan% : -nan% (906628)
Verification : 33.33% : 33.33% : 33.33% : 0.00% : 0.00% (906628)
##########################
ALLOCATION PATTERN: 1 Sample$D, 1 Sample$E
ALLOCATIONS: 7721787 Sample$E; 7721787 Sample$D
SINK: -853527405606
0x A - extrapolated 0x
0x B - extrapolated 0x
0x C - extrapolated 0x
0x D - extrapolated 0x
1x E - extrapolated 43690x
Extrapolation: 0.00% : 0.00% : 0.00% : 0.00% : 100.00% (443576)
Verification : 0.00% : 0.00% : 0.00% : 50.00% : 50.00% (443576)
Is this expected?
Not sure if this is due to the change introduced by the build, but I do observe different output in a few runs (some with an actual number while others are just -nan%), is there a way to get consistent output?
@JasonFengJ9 you're welcome. since its sampling, we cannot make a test that is 100% always the same. The log files inside the new reproducer is exactly the output, without any changes.
The number next to the types always say 0. that means that literally 0 events are received. Meaning even though the capability has been successfully requested and the jvmti event has been turned on, no events are sent by the JVM. I've rerun the reproducer and everything is good on my machine.
Maybe on your system a GC is automatically selected that doesnt support JEP331. See https://github.com/eclipse/openj9/issues/9045
Compare j9.private.log (run with the test build https://github.com/eclipse/openj9/issues/7740#issuecomment-606958788) with j9.public.log (run with an adopt JDK11 build w/o the test fix)
j9.private.log | j9.public.log
-- | --
ALLOCATION PATTERN: 1 Sample$A | ALLOCATION PATTERN: 1 Sample$A
ALLOCATIONS: 660372211 Sample$A | ALLOCATIONS: 592697166 Sample$A
29x A - extrapolated 1900544x | 8934x A - extrapolated 585498624x
-- | --
ALLOCATION PATTERN: 1 Sample$A, 1 Sample$B, 1 Sample$C 0 | ALLOCATION PATTERN: 1 Sample$A, 1 Sample$B, 1 Sample$C 0
ALLOCATIONS: 421084996 Sample$C; 421084996 Sample$B; 421084996 Sample$A | ALLOCATIONS: 372946728 Sample$C; 372946728 Sample$B; 372946728 Sample$A
40x A - extrapolated 2621440x | 5468x A - extrapolated 358350848x
28x B - extrapolated 1835008x | 5632x B - extrapolated 369098752x
51x C - extrapolated 3342336x | 5736x C - extrapolated 375914496x
-- | --
ALLOCATION PATTERN: 1 Sample$D, 1 Sample$E | ALLOCATION PATTERN: 1 Sample$D, 1 Sample$E
ALLOCATIONS: 499124243 Sample$E; 499124243 Sample$D | ALLOCATIONS: 454054334 Sample$E; 454054334 Sample$D
78x D - extrapolated 5111808x | 8365x D - extrapolated 548208640x
81x E - extrapolated 3538944x | 8652x E - extrapolated 378011648x |
The number in bold is the number of sampling events received.
Note: these two files are provided at https://github.com/eclipse/openj9/issues/7740#issuecomment-610831022. I did a few runs locally and the result seems similar.
It appears the test build triggered much less sampling events than existing builds for this specific testcase.
@LinHu2016 any insights?
fyi @dmitripivkine
I will try to debug
run the test build with jvm option -Xgc:fvtest=disableInlineAllocation , the result looks good, I think it missed disableCachedAllocations() during initialization when disableInlineAllocationForSamplingBytesGranularity = true in code. I will check the code if I missed the cases later.
openjdk version "11.0.7-internal" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7-internal+0-adhoc.jenkins.BuildJDK11x86-64linuxPersonal)
Eclipse OpenJ9 VM (build JEP331-8372ac552, JRE 11 Linux amd64-64-Bit Compressed References 20200330_1782 (JIT enabled, AOT enabled)
OpenJ9 - 8372ac552
OMR - d49ba311b
JCL - 34c3dd7d55b based on jdk-11.0.7+9)#
ALLOCATION PATTERN: 1 Sample$A
ALLOCATIONS: 128871075 Sample$A
SINK: -88564897140028
1994x A - extrapolated 130678784x
0x B - extrapolated 0x
0x C - extrapolated 0x
0x D - extrapolated 0x
0x E - extrapolated 0x
Extrapolation: 100.00% : 0.00% : 0.00% : 0.00% : 0.00% (871075)
Verification : 100.00% : 0.00% : 0.00% : 0.00% : 0.00% (871075)#
ALLOCATION PATTERN: 1 Sample$A, 1 Sample$B, 1 Sample$C 0
ALLOCATIONS: 60042768 Sample$C; 60042768 Sample$B; 60042768 Sample$A
SINK: 15982455724179
943x A - extrapolated 61800448x
908x B - extrapolated 59506688x
915x C - extrapolated 59965440x
0x D - extrapolated 0x
0x E - extrapolated 0x
Extrapolation: 34.09% : 32.83% : 33.08% : 0.00% : 0.00% (128304)
Verification : 33.33% : 33.33% : 33.33% : 0.00% : 0.00% (128304)#
ALLOCATION PATTERN: 1 Sample$D, 1 Sample$E
ALLOCATIONS: 77701908 Sample$E; 77701908 Sample$D
SINK: 6038799506974
0x A - extrapolated 0x
0x B - extrapolated 0x
0x C - extrapolated 0x
1184x D - extrapolated 77594624x
1797x E - extrapolated 78512128x
Extrapolation: 0.00% : 0.00% : 0.00% : 49.71% : 50.29% (403816)
Verification : 0.00% : 0.00% : 0.00% : 50.00% : 50.00% (403816)
I have new test build, the below is the result of running ReproducerV2, @PhilippLengauer please verify if the result is your expected, Thanks
openjdk version "11.0.7-internal" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7-internal+0-adhoc.jenkins.BuildJDK11x86-64linuxPersonal)
Eclipse OpenJ9 VM (build JEP331-7eb88e2d5, JRE 11 Linux amd64-64-Bit Compressed References 20200412_1869 (JIT enabled, AOT enabled)
OpenJ9 - 7eb88e2d5
OMR - 88faafd2d
JCL - 58d6fa1862f based on jdk-11.0.7+9)#
ALLOCATION PATTERN: 1 Sample$A
ALLOCATIONS: 135718403 Sample$A
SINK: -55568655769625
2104x A - extrapolated 137887744x
0x B - extrapolated 0x
0x C - extrapolated 0x
0x D - extrapolated 0x
0x E - extrapolated 0x
Extrapolation: 100.00% : 0.00% : 0.00% : 0.00% : 0.00% (718403)
Verification : 100.00% : 0.00% : 0.00% : 0.00% : 0.00% (718403)#
ALLOCATION PATTERN: 1 Sample$A, 1 Sample$B, 1 Sample$C 0
ALLOCATIONS: 60736107 Sample$C; 60736107 Sample$B; 60736107 Sample$A
SINK: -214067903698992
930x A - extrapolated 60948480x
928x B - extrapolated 60817408x
960x C - extrapolated 62914560x
0x D - extrapolated 0x
0x E - extrapolated 0x
Extrapolation: 33.00% : 32.93% : 34.07% : 0.00% : 0.00% (208321)
Verification : 33.33% : 33.33% : 33.33% : 0.00% : 0.00% (208321)#
ALLOCATION PATTERN: 1 Sample$D, 1 Sample$E
ALLOCATIONS: 78321002 Sample$E; 78321002 Sample$D
SINK: 10800227454054
0x A - extrapolated 0x
0x B - extrapolated 0x
0x C - extrapolated 0x
1207x D - extrapolated 79101952x
1791x E - extrapolated 78249984x
Extrapolation: 0.00% : 0.00% : 0.00% : 50.27% : 49.73% (642004)
Verification : 0.00% : 0.00% : 0.00% : 50.00% : 50.00% (642004)
@PhilippLengauer please try the test build OpenJ9-JDK11-x86-64_linux-20200412-192815.tar.gz at
https://ibm.box.com/s/z75mlew29acgq166fxh5hbdh1g3iorh6
Hi, sorry for the late reply, ill check
@JasonFengJ9 the link is dead for me, it says "file has been removed or is unavailable to you"
@PhilippLengauer could you give following link a try?
OpenJ9-JDK11-x86-64_linux-20200412-192815.tar.gz
https://ibm.box.com/s/jj1uagr0d6uxvf9lg9zcazc2qdmrb4be
this one works, downloading now ...
ive run both the reproducer and our tests, all looks good with that build. awesome, thanks!
which versions will contain the fix? i guess the most recent 11 and 15?
@PhilippLengauer great to hear that!
The fix is expected to in next update/release of JDK11+ subject to reviewing and build process.
@LinHu2016 could you submit the GC PR?
I am going to add additional j9gc_set_allocation_sampling_interval with maximum UDATA to indicate that the sampling is disabled.
VM/GC PRs can be submitted separately.
@JasonFengJ9 OK, I will summit omr and OpenJ9 PRs for review