Description
This is about optimising how our (production) classloader, RunnerClassLoader, opens resources and releases its internal caches, so that it keeps a minimal footprint.
Implementation ideas
Vert.x has a caching layer implemented by io.vertx.core.file.impl.FileResolver: when a single resource is requested, it also loads the full list of resources of any parent directory, as it then attempts to create a cache on the filesystem.
This is an interesting idea, but it doesn't play well with the current implementation of the Quarkus RunnerClassLoader, as attempting to load e.g. META-INF/resources/ will trigger the load on all jars which have such a resource path (which means opening many jar files).
Following up on ideas discussed in #13219, some options that are IMO worth considering:
- Use io.vertx.core.file.impl.FileResolver explicitly during boot to build up the cache on the filesystem, then release all caches.
- Add an index to RunnerClassLoader, specifically for resources, to have more precision and hit only the right jar files.

We could also take this opportunity to process the resources which have been explicitly mentioned by StaticResourcesBuildItem more optimally, and fall back to a "slow mode" (existing?) for any unlisted resource.
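To make the index option a bit more concrete, here is a minimal sketch of a resource-to-jar index. It is only an illustration under assumptions: `ResourceJarIndex`, `build` and `jarsFor` are made-up names, not existing Quarkus API, and the real RunnerClassLoader keeps its own data structures.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not part of Quarkus): maps each resource name to the jars containing it.
public final class ResourceJarIndex {

    private final Map<String, List<Path>> jarsByResource = new HashMap<>();

    // Scans every jar once (for example at build time) and records the entries it holds.
    public static ResourceJarIndex build(List<Path> jars) throws IOException {
        ResourceJarIndex index = new ResourceJarIndex();
        for (Path jar : jars) {
            try (JarFile jarFile = new JarFile(jar.toFile())) {
                Enumeration<JarEntry> entries = jarFile.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    index.jarsByResource.computeIfAbsent(name, k -> new ArrayList<>()).add(jar);
                }
            }
        }
        return index;
    }

    // Returns only the jars that have to be opened for this resource; empty if it is unknown.
    public List<Path> jarsFor(String resourceName) {
        return jarsByResource.getOrDefault(resourceName, List.of());
    }
}
```

With something like this, a lookup for a single resource only opens the jars returned by `jarsFor`, and an empty result is the natural trigger for the "slow mode" fallback.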
This could be an opportunity to emit a warning whenever an unlisted resource is loaded, as such a load would fail in native mode.
Similarly, services from META-INF/services/ could be specifically optimised, and we can take advantage of having a fallback mode in JVM which would produce a warning for native mode.
To make it more obvious that there's a missing resource registration, I'd also propose having an environment flag which upgrades the warning into an exception.
This would be useful during testing; although I believe it's probably best left as an opt-in to not nag people too much - especially as they might not be interested in extreme optimisations and/or native images.
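A rough sketch of how the warning and the opt-in "strict" flag could fit together. Everything here is hypothetical: the class name, the `QUARKUS_STRICT_RESOURCE_LOOKUP` environment variable and the message text are placeholders for illustration, not existing Quarkus configuration.

```java
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch: all names are illustrative, not existing Quarkus API.
public final class UnlistedResourcePolicy {

    private final Set<String> registered;
    private final boolean strict;
    private final Consumer<String> warningSink;

    public UnlistedResourcePolicy(Set<String> registered, boolean strict, Consumer<String> warningSink) {
        this.registered = registered;
        this.strict = strict;
        this.warningSink = warningSink;
    }

    // Reads the opt-in flag from the environment; the variable name is made up.
    // Boolean.parseBoolean(null) is false, so strict mode stays off unless requested.
    public static boolean strictFromEnv() {
        return Boolean.parseBoolean(System.getenv("QUARKUS_STRICT_RESOURCE_LOOKUP"));
    }

    // Returns true when the resource was registered (fast path).
    // Otherwise warns and returns false (slow fallback), or throws in strict mode.
    public boolean checkLookup(String resourceName) {
        if (registered.contains(resourceName)) {
            return true;
        }
        String message = "Resource '" + resourceName
                + "' was not registered at build time; loading it would fail in native mode";
        if (strict) {
            throw new IllegalStateException(message);
        }
        warningSink.accept(message);
        return false;
    }
}
```

Keeping the flag off by default matches the opt-in intent: JVM-only users just see a warning, while test suites can set the flag to fail fast.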
Personally I very much like the idea of copying resources to the file system: besides the fact that at runtime we don't have to involve any jars to load the resources, the whole deployable artifact still remains self-contained, because we can place these files within the root quarkus-app directory that contains everything else the fast jar needs.
I'll take a quick look at that approach hopefully tomorrow.
I wonder what we'll have to do for duplicate resources. I can only think of ServiceLoader "files" to be valid duplicates, which we should treat in a special way, but there might be other such cases. Is a warning going to be enough, or does it actually need to be supported?
I doubt duplicate resources will be a problem. If we copy all of them to the file system and keep the proper index entries (as we already do for entries that exist in multiple jars), I don't think there will be any issues.
It remains to be seen in practice of course :)
To be clear, I'm not referring to package names and directories. Those are simple: just merge the content, indeed like we do currently.
I mean resources which are duplicated by full name. Often that's a mistake (so a warning will be fine), but there are legitimate patterns that we need to handle explicitly, such as the case of ServiceLoader resources.
I wasn't able to look into this in any detail today unfortunately. Next week hopefully :)
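For the ServiceLoader case specifically, a possible merge could look like the sketch below. It assumes the standard provider-configuration file format (one provider class name per line, `#` starting a comment); `DuplicateResourceMerger` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: decides how to treat resources duplicated by full name.
public final class DuplicateResourceMerger {

    // ServiceLoader descriptors are the one known legitimate duplicate discussed here.
    public static boolean isServiceFile(String resourceName) {
        return resourceName.startsWith("META-INF/services/");
    }

    // Merges the provider names of several copies of the same descriptor,
    // dropping comments, blank lines and repeated providers (first-seen order is kept).
    public static List<String> mergeServiceFiles(List<String> contents) {
        Set<String> providers = new LinkedHashSet<>();
        for (String content : contents) {
            for (String line : content.split("\\R")) {
                int hash = line.indexOf('#');
                String provider = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!provider.isEmpty()) {
                    providers.add(provider);
                }
            }
        }
        return new ArrayList<>(providers);
    }
}
```

Any other resource duplicated by full name would not go through `mergeServiceFiles` and could simply produce a warning.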
n.p. no rush. BTW I wonder if we should split it further; the handling of ServiceLoader-loading seems like an easy spin-off I'd like to look at too :)
Yup, that's probably the best place to start. I don't think we can do this in parallel though as I imagine there will be a lot of conflicts.
But feel free to start it if you like. Otherwise I should be able to pick it up early next week.
I might have some time over the weekend to get started. If I do, I'll let you know and open a draft PR we can discuss.
@Sanne @stuartwdouglas I had another idea I would like your thoughts on:
My basic premise is that I want to augment the RunnerClassLoader with the exact locations of (most of) the resources that are actually loaded by the application (and none other, so the index doesn't grow too large).
The way I thought we can achieve that is by doing the following:
1. During the build, the application is launched once with quarkus.runner.index set to true. This means that when -Dquarkus.runner.index=true is set, only static-init StartupTasks will be run and the application will then exit, so it should be perfectly safe.
2. When -Dquarkus.runner.index=true is set, QuarkusEntryPoint would use a wrapper of RunnerClassLoader that would capture the actual jars containing each resource that was loaded (by the static-init tasks, obviously). When the application exits (because it finished the static-init tasks), it would write those captured jars to a new data file on disk.
3. QuarkusEntryPoint would also look for this new data file and, if found, create the RunnerClassLoader using it; otherwise it would just do what it does now.

Using this, I think we can capture the exact jars that contain most of the resources loaded without making the indexes large.
The cost of this solution would be a small increase in the build time, namely the time it takes to run only the application's static-init StartupTasks.
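The recording step could be sketched roughly as below. This is not the draft PR's code and not the real RunnerClassLoader API: `RecordingClassLoader` is a hypothetical wrapper around a plain `ClassLoader`, it records the resolved URL of each resource rather than a jar handle, and the `name=location` file format is an assumption made for the example.

```java
import java.io.IOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Hypothetical wrapper illustrating the recording idea; not actual Quarkus code.
public final class RecordingClassLoader extends ClassLoader {

    // resource name -> location (URL string) where it was actually found
    private final Map<String, String> locations = new ConcurrentHashMap<>();

    public RecordingClassLoader(ClassLoader delegate) {
        super(delegate);
    }

    @Override
    public URL getResource(String name) {
        URL url = super.getResource(name);
        if (url != null) {
            // Only resources that were really found are recorded, so the data stays small.
            locations.put(name, url.toString());
        }
        return url;
    }

    public Map<String, String> recorded() {
        return Map.copyOf(locations);
    }

    // Writes one "name=location" line per recorded resource (format is an assumption).
    public void writeTo(Path dataFile) throws IOException {
        String body = locations.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("\n"));
        Files.writeString(dataFile, body);
    }
}
```

On a normal start, the entry point would read such a file back if it exists and seed the classloader with it, falling back to the current behaviour when the file is absent.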
What do you guys think?
Here is a draft PR of the idea outlined above: https://github.com/quarkusio/quarkus/pull/13365
@stuartwdouglas I was also playing around with using bytecode recording for capturing the classpath instead of using the dat file, and although it's certainly possible, the complexity it brings doesn't seem to warrant it.
I assume you agree :). If so, I won't pursue that avenue any further.
I'll reopen this as we only implemented a small amount of these ideas so far...
Let me know if I should break it up in smaller items.
I guess we can just keep it open for the time being 😎