Currently the compilation of the generated code is done through Janino's SimpleCompiler.
Each dataset is compiled with the steps described in the ISimpleCompiler interface:
To set up an ISimpleCompiler object, proceed as follows:
Create an ISimpleCompiler-implementing object
Call any of the ICookable.cook(String, Reader) methods to scan, parse, compile and load the compilation unit into the JVM.
Call getClassLoader() to obtain a ClassLoader that you can use to access the compiled classes.
Then the class loader created is set as the Compiler's parent class loader, so that the next cook invocation can find the references classes.
This creates an issue where big pipelines with 500 or more filters create a chain of hundreds of class loaders. Measuring the time spend in cook, we can see that initial invocations take sub 100ms but after a few hundred the time increases up to 500 or 600 ms per dataset compilation.
For large pipelines of 1000 datasets this can lead to tens of minutes of compilation, which gets worse in multiple pipeline scenarios since there's only a single compiler, meaning that the Class Loader chain is the same even for distinct non connected pipelines.
A few alternatives to evaluate:
cook operation.Related:
I'm pretty confident that we _don't_ need to reference previous datasets, which would make units significantly more cacheable and potentially eliminate the need for the global lock. Preliminary tests with the below generates code that _doesn't_ reference specific datasets, and seems to work, at least with moderate complexity if/else chaining.
~~~ diff
diff --git a/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java b/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java
index e6644cca6..83013003b 100644
--- a/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java
+++ b/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java
@@ -72,6 +72,8 @@ final class VariableDefinition implements SyntaxElement {
safe = EventCondition.class;
} else if (DynamicMethod.class.isAssignableFrom(clazz)) {
safe = DynamicMethod.class;
- } else if (BaseDataset.class.isAssignableFrom(clazz)) {
- safe = BaseDataset.class;
} else {
safe = clazz;
}
~~~
-- https://github.com/elastic/logstash/pull/12038#discussion_r444397945
I've opened a WIP PR that eliminates chaining and the global lock here: https://github.com/elastic/logstash/pull/12047
I believe we can now close this as fixed by #12047 and #12060
Most helpful comment
Related: