There seems to be a significant regression in unused-code removal between GraalVM 19.3.1 and 20.0.0. The reports mentioned below are available at https://github.com/sdeleuze/graal-issues/tree/master/code-removal-regression.
The very same jafu example:
With GraalVM 19.3.1:
With GraalVM 20.0.0:
A diff of the package reports in jafu-sample-19.3.1-reports and jafu-sample-20.0.0-reports shows that GraalVM 20.0.0 is unable to detect that the following packages and related classes are not used:
- com.sun.org.apache.xalan
- com.sun.org.apache.xerces
- com.sun.xml.internal.stream
- javax.xml.datatype
- javax.xml.namespace
- javax.xml.stream
- javax.xml.transform
- jdk.xml.internal
- org.w3c.dom
- org.xml.sax.helpers

Same problem with the jafu-webmvc one:
With GraalVM 19.3.1:
With GraalVM 20.0.0:
A similar diff, with many XML-related classes, can be observed between the package reports in jafu-webmvc-sample-19.3.1-reports and jafu-webmvc-sample-20.0.0-reports.
@arodionov do you think this is related to your recent commit on xml packages?
Yes, this regression is related to the following Feature, which enables XML support: https://github.com/oracle/graal/blob/master/substratevm/src/com.oracle.svm.hosted/src/com/oracle/svm/hosted/config/JavaxXmlClassAndResourcesLoaderFeature.java
The XML-parsers support has been added in GraalVM 20.0.0.
Now it is possible to use Spring XML configuration without the agent.
As a result, if the code that loads parser via reflection is reachable, the specified parser and all related classes will be loaded too.
Spring with Java 8 triggers the _DocumentBuilderFactory_ and _SAXParserFactory_ parsers by the following branch of the call-tree: _org.springframework.boot.SpringApplication.run -> org.springframework.boot.SpringApplication.getSpringFactoriesInstances -> org.springframework.core.io.support.SpringFactoriesLoader.loadFactoryNames -> org.springframework.core.io.support.SpringFactoriesLoader.loadSpringFactories -> org.springframework.core.io.support.PropertiesLoaderUtils.loadProperties -> org.springframework.core.io.support.PropertiesLoaderUtils.fillProperties -> java.util.Properties.loadFromXML -> java.util.Properties$XmlSupport.load -> ... -> javax.xml.parsers.FactoryFinder.find_
This loads ~490 classes.
In Java 11 there is another implementation of java.util.Properties.loadFromXML, with a direct _SAXParser_ call (no reflection). So on Java 11 the image size shouldn't change between GraalVM versions.
I am now working on reducing the number of classes pulled in, by splitting them between the different parsers. The image size should decrease by 2 MB after this.
Thanks for the detailed analysis, @arodionov.
For this particular use case (looking up the spring.factories file), Spring will never actually load the properties from an XML format, since the spring.factories file name does not end in .xml. So perhaps SpringFactoriesLoader could be reworked to omit the possibility of loading an XML file when loading spring.factories files, or perhaps we find some other workaround.
Though, Spring is always going to use java.util.Properties in such use cases. So perhaps it's a moot point about trying to avoid the loading of those XML types unless we introduce a substitution for Java 8.
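To make the idea above concrete, here is a minimal sketch of a loader that only ever reads the plain key=value format and so never references `Properties.loadFromXML`. The class and method names (`PlainPropertiesLoader`, `loadPlain`) are hypothetical and not part of Spring. Whether this alone is enough to keep the XML parsers out of a native image is an assumption that would need to be checked against the call tree.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PlainPropertiesLoader {

    // Hypothetical helper (not a Spring API): it calls Properties.load only,
    // so this method itself contains no call to Properties.loadFromXML.
    static Properties loadPlain(InputStream in) throws IOException {
        Properties props = new Properties();
        props.load(in); // plain key=value format only
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative content in the style of a spring.factories entry.
        String content = "org.example.Factory=org.example.FactoryImpl\n";
        try (InputStream in = new ByteArrayInputStream(
                content.getBytes(StandardCharsets.ISO_8859_1))) {
            Properties props = loadPlain(in);
            System.out.println(props.getProperty("org.example.Factory"));
        }
    }
}
```

The trade-off is that callers that do need XML-formatted properties would have to go through a separate code path.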
@sbrannen
> unless we introduce a substitution for Java 8.
Yes, that could be a solution. On Java 11, java.util.Properties brings in only ~16 classes with SAXParser.
I have been able to restore the previous size and memory consumption for our jafu sample (the most basic one) with this substitution. That said, I still think reducing the number of compiled classes makes sense for use cases where XML parsers are actually needed.
The jafu-webmvc one still includes them even though it does not use XML converters, so I suspect another code path pulls them in. I will have a deeper look.
@arodionov Could you explain how you go about diagnosing what triggers XML parser usage? I have been using -H:+PrintAnalysisCallTree and searching the call_tree_project_*_*.txt files for occurrences of "directly calls" followed by the fully qualified name of an XML parser class, but that is not very efficient. Do you have a more efficient workflow?
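For what it's worth, the manual search described above could be partly automated. The sketch below is hypothetical, not an official GraalVM tool. It assumes each nesting level in the call tree file adds a fixed-width prefix of 4 non-letter characters (spaces or tree-drawing characters) before the entry text, and that assumption should be checked against a real file. For every line containing a search string, it prints the chain of enclosing entries.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CallTreeSearch {

    // Index of the first letter on the line; everything before it is treated as indentation.
    static int textStart(String line) {
        int i = 0;
        while (i < line.length() && !Character.isLetter(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // Returns, for each line containing 'needle', the list of ancestor entries down to that line.
    // Assumption: one nesting level == 4 characters of indentation.
    static List<List<String>> pathsTo(List<String> lines, String needle) {
        List<List<String>> result = new ArrayList<>();
        List<String> stack = new ArrayList<>(); // stack.get(d) = latest entry seen at depth d
        for (String line : lines) {
            int start = textStart(line);
            if (start == line.length()) {
                continue; // blank line or pure tree decoration
            }
            int depth = start / 4;
            while (stack.size() > depth) {
                stack.remove(stack.size() - 1); // leave deeper or sibling branches
            }
            while (stack.size() < depth) {
                stack.add("?"); // a level was skipped; keep positions aligned
            }
            stack.add(line.substring(start).trim());
            if (line.contains(needle)) {
                result.add(new ArrayList<>(stack));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CallTreeSearch <call-tree-file> <search-string>
        // Without arguments, a tiny made-up sample is used instead.
        List<String> lines = args.length >= 2
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "entry com.example.Main.main",
                        "    directly calls java.util.Properties.loadFromXML",
                        "        directly calls javax.xml.parsers.FactoryFinder.find");
        String needle = args.length >= 2 ? args[1] : "javax.xml.parsers";
        for (List<String> path : pathsTo(lines, needle)) {
            System.out.println(String.join("\n  -> ", path));
            System.out.println();
        }
    }
}
```

Searching for a package prefix such as `javax.xml.parsers` rather than a single class name gives all entry paths into the parsers in one pass.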