Logstash: Null JSON with JSON filter can halt a pipeline

Created on 5 Oct 2017  路  19Comments  路  Source: elastic/logstash

This is a regression from 5.6, and is quite an odd bug.

The reproduction case is simple (but was discovered in non-trivial configuration).

bin/logstash -e 'input { generator {  message => ""  } } filter { json {   source => "message" }} output { stdout { codec => rubydebug } }'

When run against 5.6 the generator keeps chugging along with alot of "_jsonparsefailure" tags...but when run against 6.0+ it halts the pipeline with some very long exception traces with java.lang.NullPointerException seemingly the main culprit.

I will not include those here for brevity since it is easy to reproduce. If looking at the stack traces, slightly more readable stack traces can be obtained by turning off invokedynamic (however, that flag seems unrelated to the underlying bug).

I confirmed that the same version of json filter was in use, as well confirmed this is an issue on a very early version of 6.0 before the Jackson upgrade.

Here is where it get weird....
In logstash/json.rb (which hasn't been changed for a while), the following code illustrates the bug:

 result = JrJackson::Ruby.parse(data, options)
 return result

...and the following _fixes_ it

result = JrJackson::Ruby.parse(data, options)
puts result
return result

This is a Grade A Heisenbug.

bug

Most helpful comment

Oooh I love Heisenbugs!

popcorn

All 19 comments

Oooh I love Heisenbugs!

popcorn

/cc @guyboertje

I am able to reproduce in isolation outside logstash. Will tidy up my examples and share shortly.

What I can tell so far, and this is really weird, is that the NPE is thrown when dereferencing the returned value of JrJackson::Ruby.parse.

In the former case shown ^^ by @jakelandis , result is passed down and it will get dereferenced here https://github.com/logstash-plugins/logstash-filter-json/blob/af2173050a49c85886cd74455821affddcbf31de/lib/logstash/filters/json.rb#L88 which matches the logstash stack trace.

In the latter case, puts result triggers the NPE which gets rescued at https://github.com/logstash-plugins/logstash-filter-json/blob/af2173050a49c85886cd74455821affddcbf31de/lib/logstash/filters/json.rb#L77 and then correctly handled by that block.

to reproduce, using JrJackson 0.4.3 and JRuby 9.1.13.0 (also tried with JrJackson 0.4.2, JRuby 9.1.12.0 and 1.7.23)

Gemfile

source "https://rubygems.org"
gem "jrjackson", "=0.4.3"

crash.rb

require "jrjackson"

def parse(data)
  puts("enter parse")
  o = JrJackson::Ruby.parse("", {})
  puts("after JrJackson::Ruby.parse")
  puts("in load, o = #{o.inspect}") # this throws NPE
  o
end

result = JrJackson::Ruby.parse("", {}) # this works
puts("in main, result = #{result.inspect}")

result = parse("")
puts("parse result = #{result.inspect}")
$ bundle exec ruby crash.rb

in main, result = nil
enter parse
after JrJackson::Ruby.parse
Unhandled Java exception: java.lang.NullPointerException
java.lang.NullPointerException: null
              getClass at org/jruby/runtime/callsite/CachingCallSite.java:444
                  call at org/jruby/runtime/callsite/CachingCallSite.java:125
  invokeOther8:inspect at crash.rb:7
                 parse at crash.rb:7
                  call at org/jruby/internal/runtime/methods/CompiledIRMethod.java:168
          cacheAndCall at org/jruby/runtime/callsite/CachingCallSite.java:338
                  call at org/jruby/runtime/callsite/CachingCallSite.java:163
   invokeOther16:parse at crash.rb:14
                <main> at crash.rb:14
   invokeWithArguments at java/lang/invoke/MethodHandle.java:627
                  load at org/jruby/ir/Compiler.java:95
             runScript at org/jruby/Ruby.java:828
           runNormally at org/jruby/Ruby.java:747
           runNormally at org/jruby/Ruby.java:765
           runFromMain at org/jruby/Ruby.java:578
         doRunFromMain at org/jruby/Main.java:417
           internalRun at org/jruby/Main.java:305
                   run at org/jruby/Main.java:232
                  main at org/jruby/Main.java:204

So the interesting bit here is that the first call the JrJackson in the main context works and nil is returned. But the second call within the parse method returns an object that when dereferenced is throwing an NPE.

The next step would be to identify why the JrJackson returned object in the method context end up as a null self in https://github.com/jruby/jruby/blob/8e1c11525f55ea363050c043a4ed4fa210228320/core/src/main/java/org/jruby/runtime/callsite/CachingCallSite.java#L124-L125

I noticed that I could not reproduce it while in a debugger, and found that adding this to config/jvm.options:

-Djruby.compile.mode=OFF

Will also _fix_ it... so I think this is something w/r/t to JRuby and JIT. (note this is not actually a _fix_ for the bug)

I played around with JIT settings abit, and set -w 1 to ensure only 1 pipeline worker.

When set to:

-Djruby.compile.mode=JIT
-Djruby.jit.threshold=1000

then after about 1000 (not precisely, but close to), the pipeline halts with this line:

2017-10-05T16:39:52.695-05:00 [Ruby-0-JIT-1] INFO JITCompiler : done jitting: <block> filter_CLOSURE_2.filter_CLOSURE_2 at /Users/jake/workspace/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-json-3.0.4/lib/logstash/filters/json.rb:88

if set -Djruby.jit.threshold=20 it halts after about 20.

-Djruby.jit.threshold=0 to zero shows the both lines of code in question

2017-10-05T16:47:09.425-05:00 [Ruby-0-Thread-8@[main]>worker0: :1] INFO JITCompiler : done jitting: Json LogStash::Filters::Json.filter at /Users/jake/workspace/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-json-3.0.4/lib/logstash/filters/json.rb:68
2017-10-05T16:47:09.435-05:00 [Ruby-0-Thread-8@[main]>worker0: :1] INFO JITCompiler : done jitting: Json LogStash::Json.jruby_load at /Users/jake/workspace/logstash/logstash-core/lib/logstash/json.rb:36

Setting -Djruby.jit.logging.verbose=true didn't seem to provide any additional information.

Ideally we should reproduce in isolation by also taking out JrJackson....

The following may be an acceptable workaround (in config/jvm.options) if we can't fix the root cause

-Djruby.jit.exclude=LogStash::Json#jruby_load

Merging my investigation with @colinsurprenant 's ....

Using @colinsurprenant example above ^^ (crash.rb), and pulling bundler out of the equation...

  • ensure JRuby is installed (and is the default ruby)
  • gem install jrjackson

Works:

jruby -J-Djruby.compile.mode=OFF crash.rb

Fails:

jruby -J-Djruby.compile.mode=JIT -J-Djruby.jit.threshold=0 -J-Djruby.jit.logging=true -J-Djruby.jit.logging.verbose=true crash.rb

Also Fails (but less JIT messages):

jruby -J-Djruby.compile.mode=JIT -J-Djruby.jit.threshold=100 -J-Djruby.jit.logging=true -J-Djruby.jit.logging.verbose=true crash.rb

Also, I have confirmed that I can _fix_ 6.x by putting ANY puts "whatever" just below the call to JrJackson::Ruby.parse(data, options). I suspect that if _fixes_ it by breaking the in-lining of the JIT, and also suspect that for what ever reason, 5.6 isn't getting in-lined hence not broken. (<-- alot of unproven assumptions) It doesn't seem JRuby version specific, as @colinsurprenant noted this behavior is present in JRuby 1.7.

I haven't dived deep into JRuby's JIT compiler or JrJackson yet, but am confident this is where the problem lies at the intersection of JRuby JIT and JrJackson.

I will also continue investigating this morning. At this point I don't think we should consider the JRuby JIT options as a workaround because of unknown side effects, I'd first prefer we fully understand what is going on and hopefully at least be able to create a simple/no dependencies reproduction case that we can use to involve the JRuby team or better yet find and submit a fix for the problem.

So I found the source of the problem, will submit a fix into JrJackson shortly and will create a simple reproduction case for JRuby.
Basically the problem is that the JrJackson::Ruby.parse Java extension method should return a IRubyObject but returns a Java null and this seems to trigger a edge case in JRuby when a null is returned from an Java ext method called from another method but not from main where it is correctly converted to a Ruby nil.

JrJackson fix PR in guyboertje/jrjackson#65

Great stuff @guyboertje - let me know once JrJackson v0.4.4 is out :D

JrJackson 0.4.4 is out - will submit update PR.

PR #8475

I manually tested the fix and it solves. I think we should also add a regression spec for this, WDYT @jakelandis ?

I am always +1 on more tests .... however, in this case since the issue wasn't so much our logic but rather a behavior specific to JRuby JIT compiler and a 3rd party library. Given that the library is fixed and is confirmed to fix the bug, not sure how much value a test would provide. For example, how likely is it that future developer could unintetionally regress this behavior ? (imo, pretty low = pretty low need for regression test)

Agree but at the same time this bug would have been caught if we had a spec to define the behaviour of deserializing an empty string...!

fix merged into master, 6.x and 6.0.

Was this page helpful?
0 / 5 - 0 ratings