OpenJ9: Support multiple shared classes caches in a single JVM

Created on 12 Apr 2019 · 30 comments · Source: eclipse/openj9

Currently, one JVM instance can only start up one shared classes cache. Inside a Docker container with multiple layers, if any write operation happens on a shared cache that sits in a lower read-only layer, Docker performs a "copy-on-write" that copies the whole shared cache into the top writable layer. We end up with multiple copies of the shared cache in the Docker image, which is undesirable.

To solve the Docker issue above, one JVM instance should be able to start up multiple shared caches. The shared caches in the lower read-only layers should be started in read-only mode and the shared cache in the top writable layer should be started in read-write mode.

Currently, we use SRPs to locate resources within the same cache. However, this won't work if we need to locate resources in another cache, as we cannot guarantee that the caches are mapped at fixed addresses across runs. Instead of using SRPs, we can use something like a cache layer number plus an offset from the cache head. After all the caches are mapped, we know the address of each cache head, so we can find the correct cache head via the layer number and then find the corresponding resource using the offset.
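For illustration, a minimal C++ sketch of the (layer, offset) idea; the names (CacheRef, resolveRef) are made up and not the actual shared classes types:

```cpp
#include <cstdint>
#include <vector>

/* Hypothetical cross-cache reference: which layer the target lives in,
 * plus its offset from that layer's cache head. */
struct CacheRef {
    uint32_t layer;   /* index of the cache layer containing the target   */
    uint64_t offset;  /* byte offset of the target from that cache's head */
};

/* Once every layer is mapped, we know each cache head's address,
 * so a (layer, offset) pair can be turned back into a raw pointer. */
void *resolveRef(const std::vector<void *> &cacheHeads, const CacheRef &ref)
{
    if (ref.layer >= cacheHeads.size()) {
        return nullptr; /* unknown layer: the prerequisite cache was not started */
    }
    return static_cast<uint8_t *>(cacheHeads[ref.layer]) + ref.offset;
}
```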

The read-write cache at the top layer is built on top of all the caches in the lower read-only layers. We need to record the cache dependencies here. Also, changes to a lower-layer cache might invalidate the read-write cache built on top of it, so we need a way to record the state of the read-only caches at the time the read-write cache is built.

  • [x] Add VM support for multi-layer cache.

  • [x] Add DDR support for multi-layer cache.

  • [x] Add JCL and JVMTI support for multi-layer cache.

  • [x] Add new -Xshareclasses option to destroy all layer caches.

  • [x] Currently -Xshareclasses:printStats only shows stats of the top layer cache. You need to use -Xshareclasses:layer=<lowerLayerNumber>,printStats to show stats of a lower layer cache. Ideally -Xshareclasses:printStats should show stats of all layer caches.

  • [x] When iterating caches and setting SH_OSCache_Info.isCorrupt, we start up the cache and call readCache() in SH_CacheMap::startupForStats() to detect corruption. Currently we are doing this only for the layer 0 cache.

  • [x] Fix the shared cache APIs so that, if there is a request to update data in a read-only layer, a new data record is created in the top-level writable layer.

  • [x] Add test

All 30 comments

I opened this issue as a place to have discussions and receive feedback/suggestions in the open.

FYI: @DanHeidinga , @pshipton , @mpirvu , @vijaysun-omr

So when building the read-write cache, there are two kinds of information that need to be recorded.

  1. All the names (ids) of the read-only caches that the read-write cache is built on.
  2. The state of each read-only cache at the time the read-write cache is built. (This information should help us detect whether a read-only cache has been destroyed and recreated. A read-only cache might still be valid if new data has merely been added to it.)

There are ideas about storing such id/state info in the read-write cache (as metadata or in the cache header) or in a separate text file.

I slightly prefer keeping such info in the cache itself. Using a text file seems more error-prone to me, as we would need to handle issues like race conditions when creating the file, permissions, and command-line utilities to manage such files. Also, we might end up with many text files, and users can do unexpected things to them.

If we want to store such info in the cache, it may look like this:
If we are building a read-write cache (let's call it SCC_L2) on top of the read-only caches SCC_L1 and SCC_L0, we can store the info of both SCC_L1 and SCC_L0 in SCC_L2 as metadata. Storing it as metadata allows us to have as many id/state records as we want in one read-write cache. Or we can store just the info of SCC_L1 in the header of SCC_L2, and let the SCC_L1 header store the id/state info of SCC_L0.
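Just to illustrate, a rough sketch of what one per-prerequisite record might carry (field names are made up; the real layout would follow the existing cache metadata format):

```cpp
#include <cstdint>

/* Hypothetical metadata record stored in the read-write cache, one per
 * prerequisite read-only cache (e.g. two records in SCC_L2, for SCC_L1
 * and SCC_L0). */
struct PrereqCacheRecord {
    char     cacheName[64];   /* name/id of the prerequisite cache              */
    uint32_t layerNumber;     /* its layer number                                */
    uint64_t createTime;      /* creation timestamp, to detect destroy+recreate  */
    uint64_t metadataCount;   /* how much metadata it held when we built on it;
                               * data appended later need not invalidate us      */
};
```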

I'm fine with storing the prereq cache info directly in the cache. Agree that adding extra artifacts could cause extra problems to resolve, and perhaps allow things to get out of sync. It means searching for all the "_Lx" names (x = 0..N), but they all need to be opened anyway.

Actually, in a docker scenario I'd expect only the caches that need to be opened to be present. Outside of docker, there could be a number of independent paths of caches, so not all caches would necessarily be opened.

i.e. you could have the following dependencies
SCC_L5 -> SCC_L2 -> SCC_L0
SCC_L4 -> SCC_L1 -> SCC_L0

Maybe the above would never happen, although theoretically possible, assuming we always open the highest Lx cache as read-only when creating a new cache.

SCC_L5 -> SCC_L2 -> SCC_L0
SCC_L4 -> SCC_L1 -> SCC_L0

Is the list of cache dependencies always a linear list? Can it be a DAG?

SCC_L5 |-> SCC_L2 -> SCC_L0
       |-> SCC_L4 -> SCC_L1

The way we encode the data will be affected by the limitations we build in.

Is the list of cache dependencies always a linear list? Can it be a DAG?

It might not always be a linear list; it can be a DAG. If we want to support a DAG, storing the dependencies as metadata looks like the correct way to go.

A linear list is simple: each layer overrides the child layer and I'd expect a ROMClass from a later layer to be used instead of one from a lower layer. How will conflicting data be handled in a DAG?

Whether it is a linear list or a DAG, the prerequisite caches have to be started up first.
e.g. linear case:

SCC_L5 -> SCC_L2 -> SCC_L0 (SCC_L5 depends on SCC_L2 and SCC_L0, SCC_L2 depends on SCC_L0)

The cache startup order is: SCC_L0, SCC_L2, SCC_L5.

DAG case:

SCC_L5 |-> SCC_L2 -> SCC_L0
       |-> SCC_L4 -> SCC_L1

A legal cache startup order can be SCC_L0, SCC_L2, SCC_L1, SCC_L4, SCC_L5

For conflicting data, we can always let the one in the later cache win.
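A rough sketch of both points (all names are made up): prerequisites are started before the caches that depend on them via a post-order walk, which yields a legal order for either a chain or a DAG, and lookups search from the highest layer down so the later cache wins on conflicts:

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct CacheNode {
    std::string name;
    std::vector<CacheNode *> prereqs; /* caches this one was built on top of */
};

/* Post-order walk: every prerequisite is started before the cache that
 * depends on it, for both linear chains and DAGs. */
static void startInOrder(CacheNode *cache, std::unordered_set<CacheNode *> &started,
                         std::vector<CacheNode *> &order)
{
    if (started.count(cache)) {
        return; /* a shared prerequisite (e.g. SCC_L0) is only started once */
    }
    for (CacheNode *p : cache->prereqs) {
        startInOrder(p, started, order);
    }
    started.insert(cache);
    order.push_back(cache);
}

/* Conflict resolution: search layers from highest to lowest so an entry in a
 * later (higher) cache shadows the same entry in a lower one. */
template <typename Entry>
const Entry *lookup(const std::vector<std::unordered_map<std::string, Entry>> &layers,
                    const std::string &key)
{
    for (auto it = layers.rbegin(); it != layers.rend(); ++it) {
        auto found = it->find(key);
        if (found != it->end()) {
            return &found->second;
        }
    }
    return nullptr;
}
```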

A legal cache startup order can be SCC_L0, SCC_L2, SCC_L1, SCC_L4, SCC_L5

Right, and SCC_L1, SCC_L4, SCC_L0, SCC_L2, SCC_L5 is another legal order. How do we pick a consistent order from run to run?

SCC_L5 |-> SCC_L2 -> SCC_L0
       |-> SCC_L4 -> SCC_L1

In a cold run of SCC_L5, e.g. the user creates SCC_L5 using:

-XX:readCache:name=SCC_L0 -XX:readCache:name=SCC_L2 -XX:readCache:name=SCC_L1 -XX:readCache:name=SCC_L4 -Xshareclasses:name=SCC_L5.

We will store the info of the prerequisite caches (in the order they appear in the CML) as the first 4 metadata entries in SCC_L5. So SCC_L5 knows what its prerequisite caches are and their order.

In a warm run of SCC_L5,
we will find the metadata of its prerequisite caches first inside the startup() call. The SCC_L5 startup call needs to make sure there are 4 caches already started up and that these 4 caches are SCC_L0, SCC_L2, SCC_L1, SCC_L4. We could enforce the same order here, but I guess we don't need to.
The SCC_L2 startup call ensures that SCC_L0 is started up before it, and the SCC_L4 startup call ensures SCC_L1 is started up before it.

We are already storing the entire CML under GC hints. So all the legal CMLs that have successfully started up SCC_L5 are already stored in SCC_L5. Users can always check for them and use them to start up:

-XX:readCache:name=SCC_L0 -XX:readCache:name=SCC_L2 -XX:readCache:name=SCC_L1 -XX:readCache:name=SCC_L4 -Xshareclasses:name=SCC_L5.
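A sketch of the warm-run check described above (hypothetical names): it only verifies that every prerequisite recorded at cold-run time is already started, without enforcing a particular order:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

/* Hypothetical view of the prerequisite records read from the cache metadata. */
struct PrereqInfo {
    std::string name; /* e.g. "SCC_L0", "SCC_L2", "SCC_L1", "SCC_L4" */
};

/* Warm-run check: every prerequisite recorded at cold-run time must already
 * be started up before the dependent cache finishes its own startup. */
bool prereqsSatisfied(const std::vector<PrereqInfo> &recorded,
                      const std::unordered_set<std::string> &startedCaches)
{
    for (const PrereqInfo &p : recorded) {
        if (startedCaches.count(p.name) == 0) {
            return false; /* a prerequisite is missing: fail or run without a cache */
        }
    }
    return true;
}
```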

Aha. That's what I was missing. My model had been that the user would only need to specify the "current" cache and it would know its dependencies.

ie: java -Xshareclasses:name=SCC_L5 .... and the SCC_L5 cache would have recorded that it depends on _L2 & _L4

I prefer to always have "-XX:readCache" in the CML so that users know what they are going to open. It is more obvious (for them and for us) when debugging. Hopefully this can prevent them from forgetting the prerequisite caches and deleting them by mistake.

It needs to be easier to use. The user should only have to specify the cache name, and the rest is automatic. IMO the readCache option is for prototyping, or advanced/testing use, and shouldn't be needed in the final solution. The list of prereqs can be shown in printStats, javacore, and DDR extensions; it doesn't need to be explicit on the command line.

I don't think we should be too concerned with corner cases like a prerequisite cache being deleted. If one is missing or corrupted, the JVM can either fail to start or start without using a cache.

What about the usability of that approach? If we ship a Docker image with a default cache, anyone building on top of that image will need to know about (and specify!) that cache to use it. And if they (say an appserver) added their own cache, their end users would need to know about both caches.

In warm runs, we can start up the current cache to fetch the prerequisite caches and then start them all up in the correct order.

In a cold run, shouldn't the user tell us which caches they want to build on top of? Or do we use all the caches found in the cache directory as prerequisite caches and build a new one?

do we use all the caches found in the cache directory as prerequisite caches and build a new one?

Something like that. I do think we need a new option to determine whether a new cache should be created or not. As you mentioned earlier, we need to figure out the case where two JVMs are started to use a new layer. Something like createLayer=<level> would work, but I'd prefer if the layer # didn't need to be specified.

do we use all the caches found in the cache directory as prerequisite caches and build a new one?

Then the dependency is a linear list.

Another problem in a cold run is that we don't know the dependencies between the existing caches. We need something that tells us the dependencies so that we know the correct order to start them up.
One possibility is to automatically append something like _L0, _L1 to the cache name. Or we could start them up in the order of their creation time.

Yes, I was thinking there would be "_Lx" added to the cache name. Then the logic is to open the largest x, and from there, each cache knows either the next or all of its prereqs.
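A sketch of how the "_Lx" naming could be used to discover the layers (helpers are hypothetical): scan the cache directory, parse the layer number from each matching name, and treat the largest x as the top layer to open read-write:

```cpp
#include <filesystem>
#include <optional>
#include <string>

/* Parse a trailing "_Lx" layer suffix out of a cache name, e.g. "SCC_L5" -> 5.
 * Returns std::nullopt if the name has no such suffix. */
std::optional<int> parseLayer(const std::string &name)
{
    std::size_t pos = name.rfind("_L");
    if (pos == std::string::npos || pos + 2 >= name.size()) {
        return std::nullopt;
    }
    try {
        return std::stoi(name.substr(pos + 2));
    } catch (...) {
        return std::nullopt;
    }
}

/* Find the highest layer present in the cache directory; that cache is the
 * one to open read-write, and its recorded prereqs drive the rest. */
std::optional<int> highestLayer(const std::filesystem::path &cacheDir)
{
    std::optional<int> highest;
    for (const auto &entry : std::filesystem::directory_iterator(cacheDir)) {
        if (auto layer = parseLayer(entry.path().filename().string())) {
            if (!highest || *layer > *highest) {
                highest = layer;
            }
        }
    }
    return highest;
}
```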

Then the logic is to open the largest x, and from there, each cache knows either the next or all prereqs.

I was thinking about the opposite: starting from L0 up to the largest Lx. Each cache is started up after its prerequisite cache. (We need the addresses of all prerequisite caches when reading metadata of Lx that points outside of Lx.) The largest Lx is started up in read-write mode and the rest in read-only mode. This brings me back to the previous comment:

Actually, in a docker scenario I'd expect only the caches that need to be opened to be present. Outside of docker, there could be a number of independent paths of caches, so not all caches would necessarily be opened.

i.e. you could have the following dependencies
SCC_L5 -> SCC_L2 -> SCC_L0
SCC_L4 -> SCC_L1 -> SCC_L0

To play it safe, I assume Lx always depends on L0, L1, ... , Lx-1. We don't know where the pointers inside JIT/AOT data are pointing to.

To play it safe, I assume Lx always depends on L0, L1, ... , Lx-1. We don't know where the pointers inside JIT/AOT data are pointing to.

Seems a little backwards. If the prereq metadata is needed before loading all the other metadata, the header can point to the prereq metadata so it can be read first and the prereq caches loaded before the full metadata read.
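Something like this, as a sketch with made-up field names: the header records where the prereq records live and how many there are, so they can be read and the prereq caches started before the full metadata walk:

```cpp
#include <cstdint>

/* Hypothetical slice of a cache header. The prereq records live somewhere in
 * the metadata area; the header points straight at them so startup can read
 * them first, start the prerequisite caches, and only then read everything else. */
struct CacheHeaderPrereqInfo {
    uint64_t prereqRecordsOffset; /* offset from the cache head to the first prereq record */
    uint32_t prereqRecordCount;   /* how many prerequisite caches were recorded             */
};
```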

While looking at the code, I noticed _minimumAccessedShrCacheMetadata/_maximumAccessedShrCacheMetadata are at the SH_CacheMap level. They need to go into the SH_CompositeCacheImpl level in the multi-cache case. Even in the current scenario where there is only one cache, SH_CompositeCacheImpl is a more appropriate place for _minimumAccessedShrCacheMetadata/_maximumAccessedShrCacheMetadata. I will create a PR to fix this.

I will create a PR to fix this.

https://github.com/eclipse/openj9/pull/5713

About changing the format of the cache file name: I am thinking about adding L<LayerNumber> after the generation number. Do we want 2 digits or 1 digit for the layer number? Using 1 digit allows us to support as many as 10 layers, and 2 digits allows 100 layers.

The new cache file name would be something like: C290M4F1A64P_CC1_G37L1 or C290M4F1A64P_CC1_G37L01
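For example, a 2-digit suffix could be formatted like this (just a sketch, not the actual file-name generation code):

```cpp
#include <cstdio>

int main(void)
{
    char cacheFileName[64];
    /* "%02d" keeps the layer number at a fixed two digits, allowing layers 0-99. */
    std::snprintf(cacheFileName, sizeof(cacheFileName),
                  "C290M4F1A64P_CC1_G37L%02d", 1);
    std::printf("%s\n", cacheFileName); /* prints C290M4F1A64P_CC1_G37L01 */
    return 0;
}
```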

Do we want 2 digits or 1 digit for the layer number?

While 10 layers seem plenty today, we cannot anticipate how customers will use it in the future, so I would say 2 digits is better unless there is a disadvantage to it.

What are the prospects of getting this into the 0.16 release? Can we aim for that?

How does this feature behave when a Default SCC already exists? Does it recognize the existence of a Default SCC and place it in the chain?

Does it recognize the existence of a Default SCC and place it in the chain?

Yes

I've added a task list in the description to record the work already done and the work still to be done.

All the items in the description are done. I'm closing this.
