Openj9: Shared cache hints for GC heap size

Created on 20 Nov 2018 · 64 Comments · Source: eclipse/openj9

Remembering the previous heap size settings (-Xmn, -Xmo) after startup can provide a significant startup benefit in subsequent runs, and can improve footprint as well. The GC data can be stored in the shared cache as a hint.
https://unbscholar.lib.unb.ca/islandora/object/unbscholar%3A8100/datastream/PDF/view
https://ieeexplore.ieee.org/document/8121911

Since a shared cache can be used to run more than one application, which may have different heap requirements, the hint should be associated with an application, or at least a main class.

As by design the GC is initialized before the shared cache, new GC APIs will be needed to adjust the GC heap parameters after initialization. Since the heap parameters can be adjusted before any objects are created, it can happen with very low cost.

Doc issue https://github.com/eclipse/openj9-docs/issues/324

gc vm externals


@vijaysun-omr @mpirvu @amicic @hangshao0 fyi

I guess we are doing this only on gencon ?

Maybe balanced GC too? We haven't done any experimenting with balanced GC to say if/how much it helps, but I'd like to think that we need to stop treating balanced as a second-class citizen if it's not a huge amount of extra work.

bin/java -Xgcpolicy:balanced -verbose:sizes

  -Xmns8M         initial new space size
  -Xmnx128M       maximum new space size

  -Xmos8M         initial old space size
  -Xmox512M       maximum old space size

-Xmn is as relevant in Balanced as in Gencon, although it has a slightly different meaning: the total Nursery for Gencon (both Allocate and Survivor) vs. only Allocate (actually called Eden) in Balanced. Bottom line, we should treat them the same (we should set -Xmns based on the recommendation stored in the SC).

I thought that -Xmo had no semantic meaning in Balanced, but we do seem to obey the option and do something with it. I just did a quick test, and from what I can tell it effectively ends up affecting the total heap sizing (pretty much acting like the -Xmx/-Xms options). See for example this:

./java -verbose:gc,sizes -Xmx64M -Xmos32M -Xgcpolicy:gencon

-Xmns10880K initial new space size
-Xmnx16M maximum new space size
-Xms43648K initial memory size
-Xmos32M initial old space size
-Xmox54656K maximum old space size
-Xmx64M memory maximum

vs

./java -verbose:gc,sizes -Xmx64M -Xmos32M -Xgcpolicy:balanced

-Xmns16M initial new space size
-Xmnx16M maximum new space size
-Xms32M initial memory size
-Xmos32M initial old space size
-Xmox64M maximum old space size
-Xmx64M memory maximum

While it has a different meaning for Gencon and Balanced (old space vs. all space), I think it would still be OK to use it.

In short, -Xmns and -Xmos could be used for both GC policies to uniquely determine the initial total heap and the division between Nursery/Eden and the rest of the heap.
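The relationship can be checked against the -verbose:sizes listings quoted above. This is only a small arithmetic sketch; the `parse_size_kib` helper is hypothetical and handles just the K/M/G suffixes that appear in those listings.

```python
# Hypothetical helper: convert a size string such as "10880K" or "32M" to KiB.
UNITS_IN_KIB = {"K": 1, "M": 1024, "G": 1024 * 1024}

def parse_size_kib(text):
    return int(text[:-1]) * UNITS_IN_KIB[text[-1].upper()]

# Values copied from the gencon listing above.
gencon_xmns = parse_size_kib("10880K")
gencon_xmos = parse_size_kib("32M")
gencon_xms = parse_size_kib("43648K")

# Under gencon the reported initial heap is the sum of new and old space.
gencon_matches = gencon_xmns + gencon_xmos == gencon_xms

# Values copied from the balanced listing above: -Xmos tracks -Xms,
# and -Xmox tracks -Xmx, so -Xmo behaves like a total heap setting.
balanced_matches = (
    parse_size_kib("32M") == parse_size_kib("32M")      # -Xmos vs -Xms
    and parse_size_kib("64M") == parse_size_kib("64M")  # -Xmox vs -Xmx
)

print(gencon_matches, balanced_matches)
```

For gencon, 10880K + 32768K gives the 43648K reported as -Xms, which is why the pair (-Xmns, -Xmos) pins down both the total and the split.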

However, the options should be applied carefully because they might depend on the GC policy under which they were recorded. Should stored values be accompanied by the GC policy, or can we make them invariant?

Yes, we'll have to work out all the relevant environment options that need to match. Or we could store the entire command line.

  • application (main class name)
  • gc policy
  • Xmx

We should design the data record to be extensible in the future. I think there are other values that would be interesting to store from run to run.

Another point to consider: Most of the information stored in SCC is read only, one notable exception being the JIT hints. For most flexibility it would be best to allow RW access to these GC hints.

We can store the main class name and GC policy. If we want to store Xmx, we might want to store Xms as well. There could be so many combinations of Xmx and Xms.

Probably we should turn this feature off if user specified any of -Xmns/-Xmnx/-Xmn/-Xmos/-Xmox/-Xmo/-Xms/-Xmx. I guess -Xmoi, -Xmine, -Xmaxe also matter here ?

If we want to store Xmx, we might want to store Xms as well. There could be so many combinations of Xmx and Xms.

The suggestion to store -Xmx was for the purpose of validating the hint. i.e. a different -Xmx invalidates the hint. However, I'm leaning towards storing the entire command line. As long as the command line remains the same, the gc sizing hints for that command line remain valid. There can be different hints stored for different command lines. I think storing the entire command line should be the first approach. Afterwards we could consider filtering out specific command line options as not being relevant to the GC hints, but not sure this is a necessary feature. During production I expect the command lines don't change.

Probably we should turn this feature off if user specified any of -Xmns/-Xmnx/-Xmn/-Xmos/-Xmox/-Xmo

Agreed. If the values are set explicitly then they shouldn't be overridden.
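A minimal sketch of that rule, assuming the options are available as a plain list of strings. The function name is hypothetical, and the option list is limited to the ones named in this comment (so -Xmoi, -Xmine and -Xmaxe are deliberately not matched here).

```python
import re

# Matches the explicit sizing options listed above, followed by a size
# such as 8M or 43648K. The longer names are tried before their prefixes.
EXPLICIT_SIZING = re.compile(r"^-X(mns|mnx|mn|mos|mox|mo|ms|mx)\d+[kKmMgG]?$")

def hints_allowed(options):
    """Return False if the user set any heap size explicitly."""
    return not any(EXPLICIT_SIZING.match(opt) for opt in options)

print(hints_allowed(["-Xgcpolicy:gencon", "-verbose:gc"]))  # no explicit sizes
print(hints_allowed(["-Xgcpolicy:gencon", "-Xmn1G"]))       # -Xmn given
```

Matching on the full option name rather than a bare prefix keeps -Xmoi from being mistaken for -Xmo.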

Here is an example of the entire command line options if I run a simple app:

java -verbose:init --module-path /bluebird/builds/bld_403141/jvmtest/test/SE90/functional/cmdLineTests/utils/utils.jar -m utils/org.openj9.test.ivj.Hanoi 2

Option 0 optionString="-Xoptionsfile=/team/hangshao/JVM29/jvmxa6490/lib/options.default" extraInfo=(nil) from environment variable ="N/A"
Option 1 optionString="-Xlockword:mode=default,noLockword=java/lang/String,noLockword=java/util/MapEntry,noLockword=java/util/HashMap$Entry,noLockword=org/apache/harmony/luni/util/ModifiedMap$Entry,noLockword=java/util/Hashtable$Entry,noLockword=java/lang/invoke/MethodType,noLockword=java/lang/invoke/MethodHandle,noLockword=java/lang/invoke/CollectHandle,noLockword=java/lang/invoke/ConstructorHandle,noLockword=java/lang/invoke/ConvertHandle,noLockword=java/lang/invoke/ArgumentConversionHandle,noLockword=java/lang/invoke/AsTypeHandle,noLockword=java/lang/invoke/ExplicitCastHandle,noLockword=java/lang/invoke/FilterReturnHandle,noLockword=java/lang/invoke/DirectHandle,noLockword=java/lang/invoke/ReceiverBoundHandle,noLockword=java/lang/invoke/DynamicInvokerHandle,noLockword=java/lang/invoke/FieldHandle,noLockword=java/lang/invoke/FieldGetterHandle,noLockword=java/lang/invoke/FieldSetterHandle,noLockword=java/lang/invoke/StaticFieldGetterHandle,noLockword=java/lang/invoke/StaticFieldSetterHandle,noLockword=java/lang/invoke/IndirectHandle,noLockword=java/lang/invoke/InterfaceHandle,noLockword=java/lang/invoke/VirtualHandle,noLockword=java/lang/invoke/PrimitiveHandle,noLockword=java/lang/invoke/InvokeExactHandle,noLockword=java/lang/invoke/InvokeGenericHandle,noLockword=java/lang/invoke/VarargsCollectorHandle,noLockword=java/lang/invoke/ThunkTuple" extraInfo=(nil) from environment variable ="N/A"
Option 2 optionString="-Xjcl:jclse9_29" extraInfo=(nil) from environment variable ="N/A"
Option 3 optionString="-Dcom.ibm.oti.vm.bootstrap.library.path=/team/hangshao/JVM29/jvmxa6490/lib/amd64/compressedrefs:/team/hangshao/JVM29/jvmxa6490/lib/amd64" extraInfo=(nil) from environment variable ="N/A"
Option 4 optionString="-Dsun.boot.library.path=/team/hangshao/JVM29/jvmxa6490/lib/amd64/compressedrefs:/team/hangshao/JVM29/jvmxa6490/lib/amd64" extraInfo=(nil) from environment variable ="N/A"
Option 5 optionString="-Djava.library.path=/team/hangshao/JVM29/jvmxa6490/lib/amd64/compressedrefs:/team/hangshao/JVM29/jvmxa6490/lib/amd64:/usr/local/cuda-5.5/lib64:.:/usr/lib64:/usr/lib" extraInfo=(nil) from environment variable ="N/A"
Option 6 optionString="-Djava.home=/team/hangshao/JVM29/jvmxa6490" extraInfo=(nil) from environment variable ="N/A"
Option 7 optionString="-Duser.dir=/team/hangshao/JVM29/jvmxa6490/bin" extraInfo=(nil) from environment variable ="N/A"
Option 8 optionString="-Djava.runtime.version=pxa6490ea-20170614_01" extraInfo=(nil) from environment variable ="N/A"
Option 9 optionString="-verbose:init" extraInfo=(nil) from environment variable ="N/A"
Option 10 optionString="--module-path=/bluebird/builds/bld_403141/jvmtest/test/SE90/functional/cmdLineTests/utils/utils.jar" extraInfo=(nil) from environment variable ="N/A"
Option 11 optionString="-Djdk.module.main=utils" extraInfo=(nil) from environment variable ="N/A"
Option 12 optionString="-Dsun.java.command=utils/org.openj9.test.ivj.Hanoi 2" extraInfo=(nil) from environment variable ="N/A"
Option 13 optionString="-Dsun.java.launcher=SUN_STANDARD" extraInfo=(nil) from environment variable ="N/A"
Option 14 optionString="-Dsun.java.launcher.pid=21548" extraInfo=(nil) from environment variable ="N/A"
Option 15 optionString="_org.apache.harmony.vmi.portlib" extraInfo=0x7f278000ca20 from environment variable ="N/A"

There are so many default options prepended/appended, which are not related to this feature at all. The old space and new space sizes are only two numbers, but the entire CML is such a long string. I guess it is not worth storing the whole CML. Also, there is an option -Dsun.java.launcher.pid, which will be different from run to run.

If GC is going to check for the presence of -Xmns/-Xmnx/-Xmn/-Xmos/-Xmox/-Xmo/-Xmx/-Xms/... and gc policy to decide whether to turn off this feature, storing the main class probably should be sufficient.

Just the main class isn't sufficient, other parameters such as -Xmx, -Xms and gcpolicy need to be stored as well. If they change then the hint is invalidated. We can try to find a balance between parameters that matter and parameters that don't. We can filter all the default options out of the parameter list and maybe some other specific parameters as well, but in general I think if options are modified then the behavior of the app can change and invalidate the hint.

I will save the new/old space sizes as well as the following info:
main module/main class (sun.java.command)
-Xgc and -Xgcpolicy
-Xmx
-Xms
-Xsoftmx
-Xmoi
java.class.path
jdk.module.path

Do you see any GC options that are missing here ? @amicic @dmitripivkine

As @pshipton suggested, I would not even try to recognize various -Xm? options (or any other option). If anything in the options changed, the hints would be invalidated. It would be, in general, complicated to try to interpret -Xm? options to validate that they effectively mean the same initial/total heap sizing, even though the options themselves are different.

For example, these two things are effectively same:
1) -Xmx4G -Xms4G
2) -Xmx4G -Xms4G -Xmn1G
but I would still invalidate the hints.
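A minimal sketch of exact-match keying, assuming the hint store can be modelled as a dictionary. All names here are hypothetical and the real shared cache layout is not shown in this thread.

```python
# Hypothetical in-memory stand-in for the hint records in the shared cache.
hint_store = {}

def hint_key(options):
    # Exact match: the same options in the same order, nothing interpreted.
    return tuple(options)

def store_hint(options, new_space_bytes, old_space_bytes):
    hint_store[hint_key(options)] = (new_space_bytes, old_space_bytes)

def lookup_hint(options):
    return hint_store.get(hint_key(options))

MIB = 1024 * 1024
store_hint(["-Xmx4G", "-Xms4G"], 512 * MIB, 1024 * MIB)

same = lookup_hint(["-Xmx4G", "-Xms4G"])
extra_option = lookup_hint(["-Xmx4G", "-Xms4G", "-Xmn1G"])
reordered = lookup_hint(["-Xms4G", "-Xmx4G"])
print(same, extra_option, reordered)
```

Only the first lookup finds the hint. The trade-off is simplicity over hit rate: nothing has to understand what the options mean, but any textual difference, including a reordering, counts as a different command line.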

The hints for initial heap sizing would come from internal API and have no relationship with the options. We are yet to agree what API is to be used, but effectively we need two:

  • get/set total initial heap size (what normally is controlled by -Xms option)
  • get/set Nursery/Eden initial heap size (what normally is controlled by -Xmns option).
    On the first run those would be default ones (8M/2M respectively for total/Nursery), but on subsequent, they would be higher values, as suggested by SC hints.
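A rough sketch of how such a pair of accessors might be used. The class and function names are hypothetical (the API had not been agreed at this point), and the rule of never going below the current values is an assumption, not something stated here.

```python
MIB = 1024 * 1024

class InitialHeapSizing:
    """Hypothetical stand-in for the GC-side get/set API discussed above."""

    def __init__(self):
        # First-run defaults quoted in this comment: 8M total, 2M Nursery/Eden.
        self._total = 8 * MIB
        self._nursery = 2 * MIB

    def get_total(self):
        return self._total

    def set_total(self, size_bytes):
        self._total = size_bytes

    def get_nursery(self):
        return self._nursery

    def set_nursery(self, size_bytes):
        self._nursery = size_bytes

def apply_hint(sizing, hint):
    """Raise the initial sizes to the hinted ones; hint is None on a first run."""
    if hint is None:
        return
    hinted_total, hinted_nursery = hint
    # Assumption: a hint may only grow the initial sizes, never shrink them.
    sizing.set_total(max(sizing.get_total(), hinted_total))
    sizing.set_nursery(max(sizing.get_nursery(), hinted_nursery))

sizing = InitialHeapSizing()
apply_hint(sizing, (64 * MIB, 16 * MIB))
print(sizing.get_total() // MIB, sizing.get_nursery() // MIB)
```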

I believe we need to keep GC policy as well

If anything in options changed, the hints would be invalidated.

+1 to this approach. For the initial implementation, we should just validate that the command line is the same.

I believe we need to keep GC policy as well

Yes, I should say -Xgc and -Xgcpolicy

OK. Then I am going to save the entire command line.

@hangshao0 Any update on this? We're getting close to the reality check date for the 0.12.0 milestone

Any update on this?

Still writing the code. I guess it might take me 1 - 2 more days for the first set of VM code change. Once it is reviewed and merged, GC needs to change their code. Then VM can start storing the GC hints and enable this feature.

I think we need some new tests as well.

Note the shared cache part of this #3908 is merged

GC changes are in #4168

The feature isn't enabled by default so I think we should keep this issue open, however the work is completed for the 0.12 release, in particular since the shared cache is no longer enabled by default.

What is the option to enable this feature with a v0.12 OpenJ9 build ? We would like to try it on some tests to see the behaviour.

@vijaysun-omr see #4168 for details
-XXgc:heapSizeStatupHintWeightNewValue=
-XXgc:heapSizeStatupHintConservativeFactor=
You also need to enable shared classes, either normally or via -Xshareclasses:bootClassesOnly

@vijaysun-omr In short, just add -XXgc:heapSizeStatupHintWeightNewValue=80.

Longer story:
1) the option is to be renamed soon (https://github.com/eclipse/openj9/pull/4240)
2) we'll likely introduce another option that will be documented just to enable/disable the feature
3) Balanced GC is not covered yet
4) we need to check/test if there is a race at startup between expand-due-to-hints vs first GC, and potentially take some remedies
5) we continuously update the hints on every restart. Hint values typically continue growing, and it may take a few updates to converge to a stable value (this is what we need feedback for)
6) updates acquire a global SC lock and there might be contention in case of high number of VMs being started at about the same time. If there are real life problems we may limit the number of updates or remove them altogether (just create the hint on first start and never update).
7) in generational configuration, Tenure and Nursery hints are independently maintained. There is sort of an anomaly that the Nursery hint grows aggressively (on restarts), while the Tenure hint grows on first restart but may continue to decline on subsequent ones. This is because with a large Nursery all early created objects end up staying there for a long time, making the first few Scavenges expensive and wanting to expand even more. On the other side, it takes longer before we start Tenuring and there is less need to expand Tenure. Perhaps this could be compensated by decreasing the Tenure age threshold with a larger Nursery (the initial threshold is always 10, no matter what the heap size is), but that is sort of an independent issue to investigate.

Note also that the command line options must fully match for hints to be taken into account. This also includes implicit options induced by the JCL. It's been observed that the order of these options may change (probably due to some startup race), in which case the existing hints are silently ignored and overwritten with completely fresh values. It will then take a few more restarts for them to converge again (assuming the order of options stays the same).

With multiple VMs starting at the same time, values will not converge. For the convergence to make progress, a VM run must read the values (and expand on them) after the previous run has updated them. For simultaneous VM starts, all of them read the same value and all of them update it with their new value, but effectively only the slowest/last one to update will 'win'.
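A toy model of the difference between sequential and simultaneous starts. The blending formula is an assumption made for illustration (a weighted average where the weight is a percentage given to the newly observed value); the thread does not spell out how the weight option is applied. Treating the observed size as constant is also a simplification.

```python
def blend(old_hint, observed, weight_new_percent):
    # Assumed update rule: weighted average of the stored and observed values.
    return (weight_new_percent * observed
            + (100 - weight_new_percent) * old_hint) / 100

def sequential_starts(initial, observed, weight, runs):
    """Each run reads the hint written by the previous run."""
    hint = initial
    for _ in range(runs):
        hint = blend(hint, observed, weight)
    return hint

def simultaneous_starts(initial, observed, weight, vms):
    """Every VM reads the same stored value before any of them writes."""
    updates = [blend(initial, observed, weight) for _ in range(vms)]
    return updates[-1]  # the last writer wins

after_sequential = sequential_starts(8, 100, 50, 3)
after_simultaneous = simultaneous_starts(8, 100, 50, 3)
print(after_sequential, after_simultaneous)
```

In this model three sequential restarts move the hint three steps towards the observed value, while three simultaneous starts move it only one step, because each of them started from the same stored value.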

If we are having -XX options, presumably this requires documentation? If so, please add the doc:externals label.

[EDIT: non-issue]
Another issue to think about is how to deal with heterogeneous multi JVM environment. Take a simple scenario of master and slave JVM.

They will certainly have different command line options, so even though there might be a single anonymous shared cache, the hints will not be used when starting the slave (after starting the master).

If both master and slave are restarted at some point later (and brought up in the same order - master first and slave second), again, no hints will be used at any of the restarts.

If only one of them is being restarted (more likely slave), hints will be used, but this is not very realistic scenario.

Simple solutions are:

  • to create separate caches for master and slave (which may make sense for some other reasons, too)
  • to disable the hints updating on JVMs that would not benefit very much from them (this would be the same public option mentioned a few comments before), so that only one of them (more likely slave in this scenario) is updating/consuming them.

I don't understand the following statement. Why aren't hints saved and used on restart?

If both master and slave are restarted at some point later (and brought up in the same order - master first and slave second), again, no hints will be used at any of the restarts.

Hints are saved always, but not used because of command line mismatch. When master and slave are started in this order, and slave is the last one that saved the hint and then both are restarted in the same order, then master will try to read the hints that slave saved and ignore them. Then shortly after when slave also is brought up it may not be able to read them, if master just updated them.

This isn't (or shouldn't be) the way it works. Every different set of command lines has its own hints, which can be used independently. i.e. master will use the hints for the master command line, and similarly the slave.

Agreed, it would be very nice to have that capability, but I don't think that's in place, is it?

It should already work that way. Unless you have seen different behavior?

On closer inspection it does seem to work. I apologize for the confusion.

Is there some limit on the number of different command lines that we can store GC hints for in a given SCC ? fyi @mpirvu because we were discussing this today.

@vijaysun-omr the only limit is the size of the cache. We wondered if we needed to add a limit, but not sure if it is necessary. Likely there will be something that changes the command line all the time, but otherwise there probably aren't that many variations.

It's been observed that order of these options may change (probably due to some startup race)

@amicic what options changed? Something to do with the VM that we can fix, or something to do with the application?

@pshipton we (Marius and I) were also wondering about the same thing, i.e. potential issues with storing as many distinct paths as can be fit. One case that we have seen change the path (directory where java is installed) continuously is Hadoop when the master installs workers on several machines repeatedly and uses slightly different directory names every time.

In a case like that there is no benefit to storing the hints. It would be better to either disable them, or only store a subset of the command line which doesn't change.

Do you have an example Hadoop command line?

I didn't keep the Hadoop command line but the pattern is that the master sends the jar files through the network to the slaves, and the slaves create a temporary directory and copy the jar files there (example: /data6_p2/bi18/mapred/local/taskTracker/biadmin18/jobcache/job_201403251225_0001/jars)
Hadoop was the reason we added the -Xshareclasses:restrictClassPaths option.

For this specific case, we could store a subset of the command line when -Xshareclasses:restrictClassPaths is specified, and ignore the -cp option.
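A sketch of what such a reduced key could look like, working on a raw argument list. The function name is hypothetical, and the set of class path spellings handled here (-cp, -classpath, --class-path, -Djava.class.path=) is an assumption about what would need to be ignored.

```python
CLASSPATH_FLAGS = ("-cp", "-classpath", "--class-path")

def uses_restrict_class_paths(options):
    for opt in options:
        if opt.startswith("-Xshareclasses:"):
            suboptions = opt.split(":", 1)[1].split(",")
            if "restrictClassPaths" in suboptions:
                return True
    return False

def hint_key_ignoring_classpath(options):
    """Full command line as the key, minus the class path when restricted."""
    if not uses_restrict_class_paths(options):
        return tuple(options)
    key = []
    skip_next = False
    for opt in options:
        if skip_next:
            skip_next = False  # this element is the class path value itself
            continue
        if opt in CLASSPATH_FLAGS:
            skip_next = True
            continue
        if opt.startswith("-Djava.class.path="):
            continue
        key.append(opt)
    return tuple(key)

# Two task launches that differ only in the temporary jar directory.
job1 = ["-Xshareclasses:restrictClassPaths", "-cp", "/tmp/job_0001/jars", "Main"]
job2 = ["-Xshareclasses:restrictClassPaths", "-cp", "/tmp/job_0002/jars", "Main"]
print(hint_key_ignoring_classpath(job1) == hint_key_ignoring_classpath(job2))
```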

Please open an issue at the docs repo when you have agreed on the externals for this item: https://github.com/eclipse/openj9-docs/issues/new?template=new-documentation-change.md

@amicic we'd like to ensure this gets delivered for the 0.15 milestone. Do you have an outlook for completion?

@amicic gentle ping. Can you update this with the outlook?

With some luck there is not much work left:

  • make sure everything is still in order. I already tried it briefly a few days ago and while it seemed to work most of the time, it seemed like it would intermittently lose hints. Need to investigate more.
  • introduce a clean option to enable/disable the feature (right now I control it with an internal tuning/weight parameter which is set to 0, which effectively disables the feature).
  • test race conditions between early first GC and heap expansion based on hints
  • ideally, test high JVM count simultaneous startup scenario (SC lock contention or anything else) - I'm unlikely to do anything about it, since I don't have such scenario available.

Balanced GC will not be covered in the first release.

test high JVM count simultaneous startup scenario

This is just a matter of starting a number of JVMs at approximately the same time, such as starting up 20 JVMs running the SwingSet demo.
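One way to script such a test is sketched below. The helper name is hypothetical, and the demo launches trivial Python processes as a stand-in so the snippet runs anywhere; for a real measurement the command would be a java invocation with shared classes and the hints enabled, plus whatever application is being started.

```python
import subprocess
import sys
import time

def launch_simultaneously(command, count):
    """Start `count` copies of `command` back to back, then wait for all."""
    start = time.perf_counter()
    processes = [subprocess.Popen(command) for _ in range(count)]
    exit_codes = [process.wait() for process in processes]
    elapsed = time.perf_counter() - start
    return exit_codes, elapsed

# Stand-in workload: replace with the java command line under test.
demo_command = [sys.executable, "-c", "pass"]
demo_exit_codes, demo_elapsed = launch_simultaneously(demo_command, 4)
print(demo_exit_codes, f"{demo_elapsed:.2f}s")
```

All processes are created before any of them is waited on, so their startups overlap as closely as the operating system allows.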

The option has been added for 0.15. Moving to 0.16 to consider enabling by default

I've done some basic testing to confirm everything is in order, and there are no races with first GC.

@amicic has the concern about always writing to the cache been addressed? i.e. the GC code will only attempt to write a hint if it has calculated a different value than has already been stored to the cache? If this is addressed, I suggest we enable the feature by default just after branching for the 0.15 release, to give time to identify any problems before the next release.

The hint is always updated. Values will almost always be different. I have not done anything to test if there is a problem with excessive updates.
We could still try to reduce it, blindly. Perhaps update it if 1) it differs by some margin (>10%) or 2) with low probability (10%) even if within the margin. The latter is to ensure it's updated in very gradual run-to-run load change.
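The proposed rule can be sketched as follows. The function name and parameters are hypothetical; the 10% margin and 10% probability are the figures suggested above.

```python
import random

def should_update_hint(stored, computed, margin=0.10, probability=0.10,
                       rng=random.random):
    """Decide whether to write a newly computed hint to the shared cache."""
    if stored == 0:
        return True  # nothing meaningful stored yet
    # 1) Always update when the value moved by more than the margin.
    if abs(computed - stored) / stored > margin:
        return True
    # 2) Otherwise update only occasionally, so slow drift is still tracked.
    return rng() < probability

print(should_update_hint(100, 120))                    # outside the margin
print(should_update_hint(100, 105, rng=lambda: 0.50))  # inside, not selected
print(should_update_hint(100, 105, rng=lambda: 0.05))  # inside, selected
```

The `rng` parameter exists only to make the random branch deterministic in the examples.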

We shouldn't enable the feature by default until this has been resolved. Writing to the shared cache on every startup affects starting up multiple JVMs concurrently, as each JVM will serialize on the write to the cache.

A multiJVM test... I used a fairly simple workload (single threaded, very few classes and methods). Started 50 JVMs simultaneously, on a machine with 120 threads. GC thread and compilation thread count reduced to 2 and 1 respectively.
Measured average time to complete the tests as reported by Linux time command, in seconds.

4 groups of tests, with 4-6 sequential runs within each group. Before each group, SC was wiped.

SC alone appears to regress completion time (since the test is so simple, there is probably no benefit of SC, but there is some cost of maintaining SC?). However, use of hints does not seem to introduce additional regression.

no SC
7.7104
7.08792
7.25356
8.0851

SC, no hints
9.23496
8.95598
9.325
8.82136

SC, use hints
7.41268
10.0347
7.37226
8.4999
8.2855
8.60048

SC, use hints, skip insignificant updates
7.53726
8.9249
9.76732
8.5943
10.1544
7.5088
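For easier comparison, the per-group means can be computed from the figures above. The numbers are copied verbatim; only the averaging is added.

```python
from statistics import mean

# Completion times in seconds, copied from the four groups above.
runs = {
    "no SC": [7.7104, 7.08792, 7.25356, 8.0851],
    "SC, no hints": [9.23496, 8.95598, 9.325, 8.82136],
    "SC, use hints": [7.41268, 10.0347, 7.37226, 8.4999, 8.2855, 8.60048],
    "SC, use hints, skip insignificant updates": [
        7.53726, 8.9249, 9.76732, 8.5943, 10.1544, 7.5088,
    ],
}

group_means = {name: mean(times) for name, times in runs.items()}
for name, value in group_means.items():
    print(f"{name}: mean {value:.2f} s over {len(runs[name])} runs")
```

The means come out at roughly 7.53, 9.08, 8.37 and 8.75 seconds respectively. With only 4 to 6 runs per group and individual runs ranging from about 7.4 to 10.2 seconds, the differences between the three SC groups are within the run-to-run spread.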

SC alone appears to regress completion time (since the test is so simple, there is probably no benefit of SC, but there is some cost of maintaining SC?).

Might be the same issue as https://github.com/eclipse/openj9/issues/5918. Using -Xshareclasses:noTimestampChecks can work around this issue.

@pshipton I understand the concern. There might be a real problem, but I'm reluctant to add additional logic before observing a problem and proving the logic helps. If you or somebody else try and succeed to create a problem with their own scenarios, I'm more than willing to investigate...

@amicic based on the results in https://github.com/eclipse/openj9/issues/3743#issuecomment-503670540, I agree it seems fine as-is. ~What platform did you test on?~ nm, I see Linux.

@amicic do you agree to enable -XX:+UseGCStartupHints by default next week after the branch for the 0.15 release?

@amicic We've branched for 0.15.0. Can we enable -XX:+UseGCStartupHints by default?

PR that enables it by default has just been merged.

Since https://github.com/eclipse/omr/pull/4079 is merged, I believe this can be closed.
