Logstash: Avoid class loader chaining in janino compilation

Created on 23 Jun 2020  路  3Comments  路  Source: elastic/logstash

Currently the compilation of the generated code is done through Janino's SimpleCompiler.

Each dataset is compiled with the steps described in the ISimpleCompiler interface:

To set up an ISimpleCompiler object, proceed as follows:

Create an ISimpleCompiler-implementing object
Call any of the ICookable.cook(String, Reader) methods to scan, parse, compile and load the compilation unit into the JVM.
Call getClassLoader() to obtain a ClassLoader that you can use to access the compiled classes.

Then the class loader created is set as the Compiler's parent class loader, so that the next cook invocation can find the references classes.

This creates an issue where big pipelines with 500 or more filters create a chain of hundreds of class loaders. Measuring the time spend in cook, we can see that initial invocations take sub 100ms but after a few hundred the time increases up to 500 or 600 ms per dataset compilation.

For large pipelines of 1000 datasets this can lead to tens of minutes of compilation, which gets worse in multiple pipeline scenarios since there's only a single compiler, meaning that the Class Loader chain is the same even for distinct non connected pipelines.

A few alternatives to evaluate:

  1. compile the whole pipeline as a single cook operation.
    a) Pro: much faster (the whole pipeline can be compiled under 1 second)
    b) Con: no longer possible to cache dataset compilations across workers or pipelines
  2. Find a way to reuse the same class loader instead of creating a chain. Maybe using the Compiler instead of SimpleCompiler.
    a) Pro: allows maintaining the current caching strategy
    b) Con: not even sure if this is possible, we need to investigate the Compiler class
Java Execution performance improvements

Most helpful comment

Related:

I'm pretty confident that we _don't_ need to reference previous datasets, which would make units significantly more cacheable and potentially eliminate the need for the global lock. Preliminary tests with the below generates code that _doesn't_ reference specific datasets, and seems to work, at least with moderate complexity if/else chaining.

~~~ diff
diff --git a/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java b/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java
index e6644cca6..83013003b 100644
--- a/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java
+++ b/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java
@@ -72,6 +72,8 @@ final class VariableDefinition implements SyntaxElement {
safe = EventCondition.class;
} else if (DynamicMethod.class.isAssignableFrom(clazz)) {
safe = DynamicMethod.class;

All 3 comments

Related:

I'm pretty confident that we _don't_ need to reference previous datasets, which would make units significantly more cacheable and potentially eliminate the need for the global lock. Preliminary tests with the below generates code that _doesn't_ reference specific datasets, and seems to work, at least with moderate complexity if/else chaining.

~~~ diff
diff --git a/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java b/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java
index e6644cca6..83013003b 100644
--- a/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java
+++ b/logstash-core/src/main/java/org/logstash/config/ir/compiler/VariableDefinition.java
@@ -72,6 +72,8 @@ final class VariableDefinition implements SyntaxElement {
safe = EventCondition.class;
} else if (DynamicMethod.class.isAssignableFrom(clazz)) {
safe = DynamicMethod.class;

I've opened a WIP PR that eliminates chaining and the global lock here: https://github.com/elastic/logstash/pull/12047

I believe we can now close this as fixed by #12047 and #12060

Was this page helpful?
0 / 5 - 0 ratings