I've found that the following code can stress JRuby into crashing with an out-of-memory error.
The code zips the bytes of two strings together. Their lengths don't matter at all; a single character each is sufficient. It then checks whether any of the bytes are nil, which should be impossible but happens. Originally I didn't check for nil, and the first indication of a problem was errors from doing things with nil where there couldn't be any nil. When I wrapped the code in a begin/rescue to see what was nil, I got an out-of-memory error instead.
```ruby
s1 = 'a'
s2 = 'b'
100000.times do
  b1 = s1.each_byte
  b2 = s2.each_byte
  bytes = b1.zip(b2).flatten
  if bytes.any? { |b| b.nil? }
    puts('this can never happen')
  end
end
```
This prints the following in JRuby 1.7.18, 1.7.19 and HEAD (and probably in all other versions too):

```
this can never happen
this can never happen
this can never happen
this can never happen
this can never happen
Error: Your application used more memory than the safety cap of 500M.
Specify -J-Xmx####m to increase it (#### = cap size in MB).
Specify -w for full OutOfMemoryError stack trace
```

The exact number of "this can never happen" lines varies from run to run.
The full stack trace of the OutOfMemoryError is:

```
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:713)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
at org.jruby.RubyEnumerator$ThreadedNexter.ensureStarted(RubyEnumerator.java:700)
at org.jruby.RubyEnumerator$ThreadedNexter.next(RubyEnumerator.java:654)
at org.jruby.RubyEnumerator.next(RubyEnumerator.java:461)
at org.jruby.RubyEnumerator$INVOKER$i$0$0$next.call(RubyEnumerator$INVOKER$i$0$0$next.gen)
at org.jruby.RubyClass.finvoke(RubyClass.java:616)
at org.jruby.runtime.Helpers.invoke(Helpers.java:593)
at org.jruby.RubyBasicObject.callMethod(RubyBasicObject.java:359)
at org.jruby.RubyEnumerable.zipEnumNext(RubyEnumerable.java:1679)
at org.jruby.RubyEnumerable$50.call(RubyEnumerable.java:1635)
at org.jruby.runtime.CallBlock.doYield(CallBlock.java:80)
at org.jruby.runtime.BlockBody.yield(BlockBody.java:82)
at org.jruby.runtime.Block.yield(Block.java:147)
at org.jruby.RubyString.enumerateBytes(RubyString.java:5468)
at org.jruby.RubyString.each_byte19(RubyString.java:5275)
at org.jruby.RubyString$INVOKER$i$0$0$each_byte19.call(RubyString$INVOKER$i$0$0$each_byte19.gen)
at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroBlock.call(JavaMethod.java:472)
at org.jruby.RubyClass.finvoke(RubyClass.java:541)
at org.jruby.runtime.Helpers.invoke(Helpers.java:589)
at org.jruby.RubyBasicObject.callMethod(RubyBasicObject.java:394)
at org.jruby.RubyEnumerator.each(RubyEnumerator.java:294)
at org.jruby.RubyEnumerator$INVOKER$i$each.call(RubyEnumerator$INVOKER$i$each.gen)
at org.jruby.RubyClass.finvoke(RubyClass.java:520)
at org.jruby.runtime.Helpers.invoke(Helpers.java:577)
at org.jruby.RubyEnumerable.callEach(RubyEnumerable.java:96)
at org.jruby.RubyEnumerable.zipCommonEnum(RubyEnumerable.java:1626)
at org.jruby.RubyEnumerable.zipCommon19(RubyEnumerable.java:1547)
at org.jruby.RubyEnumerable.zip19(RubyEnumerable.java:1491)
at org.jruby.RubyEnumerable$INVOKER$s$0$0$zip19.call(RubyEnumerable$INVOKER$s$0$0$zip19.gen)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:210)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:206)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:161)
at tmp.jruby_issue.invokeOther3:zip(tmp/jruby_issue.rb)
at tmp.jruby_issue.\=tmp\|jruby_issue\,rb_CLOSURE_1__tmp\|jruby_issue\,rb_0(tmp/jruby_issue.rb:7)
at org.jruby.runtime.CompiledIRBlockBody.commonYieldPath(CompiledIRBlockBody.java:66)
at org.jruby.runtime.IRBlockBody.yieldSpecific(IRBlockBody.java:84)
at org.jruby.runtime.Block.yieldSpecific(Block.java:116)
at org.jruby.RubyFixnum.times(RubyFixnum.java:300)
at org.jruby.RubyFixnum$INVOKER$i$0$0$times.call(RubyFixnum$INVOKER$i$0$0$times.gen)
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:303)
at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:141)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:145)
at tmp.jruby_issue.invokeOther13:times(tmp/jruby_issue.rb)
at tmp.jruby_issue.__script__(tmp/jruby_issue.rb:4)
at java.lang.invoke.MethodHandle.invokeWithArguments(MethodHandle.java:636)
at org.jruby.ir.Compiler$1.load(Compiler.java:112)
at org.jruby.Ruby.runScript(Ruby.java:827)
at org.jruby.Ruby.runScript(Ruby.java:820)
at org.jruby.Ruby.runNormally(Ruby.java:750)
at org.jruby.Ruby.runFromMain(Ruby.java:572)
at org.jruby.Main.doRunFromMain(Main.java:404)
at org.jruby.Main.internalRun(Main.java:299)
at org.jruby.Main.run(Main.java:226)
at org.jruby.Main.main(Main.java:198)
```
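The trace shows the failing thread being started by RubyEnumerator$ThreadedNexter.ensureStarted, reached via Enumerator#next on the enumerator passed to zip. A possible workaround, assuming the bytes are only needed as plain arrays, is to materialise both sequences with Enumerator#to_a (internal iteration via #each) and zip two Arrays, so that Enumerator#next should never be called. This is an untested sketch; it has not been run on the affected JRuby versions.

```ruby
s1 = 'a'
s2 = 'b'
bytes = nil

100_000.times do
  # Enumerator#to_a drains the enumerator with #each (internal iteration),
  # so Enumerator#next, and with it the threaded nexter, should not be used.
  b1 = s1.each_byte.to_a
  b2 = s2.each_byte.to_a
  # Zipping Array with Array pairs elements by index
  # (assumption: JRuby takes its Array path here, not the Enumerator#next path).
  bytes = b1.zip(b2).flatten
  puts('this can never happen') if bytes.any?(&:nil?)
end

p bytes # => [97, 98]
```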
This may just be a downside of our having to use threads for all Enumerator#next logic.
You are creating two enumerators and then zipping one against the other. This will probably create at least one thread, and possibly two. Once those threads reach the end of the data, they _should_ shut down. If you walk away from them before they're complete, they should also shut down. What you're seeing here is that too many threads have been created and not yet cleaned up (perhaps due to GC delays), so we can't create any more. That matches the "unable to create new native thread" message in the trace: it is native thread creation that fails, not the Java heap that fills up.
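To see where abandoned enumerators can come from in this repro, here is a rough pure-Ruby model of what the trace shows zip doing for non-Array arguments (zipCommonEnum walking the receiver with #each, zipEnumNext advancing each argument with Enumerator#next). `model_zip` is a hypothetical helper for illustration only; it is neither JRuby's nor MRI's implementation.

```ruby
# Simplified model of Enumerable#zip for non-Array arguments.
# model_zip is hypothetical; it only mirrors the call pattern in the trace.
def model_zip(receiver, *others)
  enums = others.map(&:to_enum)
  finished = Array.new(enums.size, false)
  result = []
  receiver.each do |x|            # internal iteration over the receiver
    row = [x]
    enums.each_with_index do |e, i|
      if finished[i]
        row << nil                # exhausted arguments are padded with nil
        next
      end
      begin
        row << e.next             # external iteration: in JRuby, a ThreadedNexter
      rescue StopIteration
        finished[i] = true
        row << nil
      end
    end
    result << row
  end
  result
end

p model_zip('a'.each_byte, 'b'.each_byte) # => [[97, 98]]
p model_zip([1, 2, 3], [4, 5])            # => [[1, 4], [2, 5], [3, nil]]
```

In this model, with two one-byte strings the receiver yields once, `next` is called once on the argument, and zip returns without ever asking that enumerator for its end. The argument enumerator is left suspended right after its last element, which is the "walk away from them before they're complete" case.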
I attempted to make it force a GC when it fails to create a new thread, but it doesn't seem to help here.
It looks like our best bet would be to finally start making non-threaded enumerator logic similar to what we already have for Array#each (RubyEnumerator.ArrayNexter for example). That should make it possible for us to handle more core-class cases without threads, which _should_ make your case work.
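For comparison, a non-threaded nexter only has to remember a position. The Ruby sketch below mimics the idea behind RubyEnumerator.ArrayNexter for anything that supports `size` and index lookup; the class name `IndexNexter` is hypothetical and this is not the Java implementation.

```ruby
# Hypothetical index-based nexter, in the spirit of RubyEnumerator.ArrayNexter.
# Its whole state is one integer, so an abandoned instance leaves nothing
# (no thread, no suspended block) behind.
class IndexNexter
  def initialize(collection)
    @collection = collection
    @index = 0
  end

  # Returns the next element, or raises StopIteration like Enumerator#next.
  def next
    value = peek
    @index += 1
    value
  end

  # Returns the next element without advancing.
  def peek
    raise StopIteration, 'iteration reached an end' if @index >= @collection.size
    @collection[@index]
  end

  def rewind
    @index = 0
    self
  end
end

n = IndexNexter.new([10, 20])
collected = []
loop { collected << n.next } # Kernel#loop rescues StopIteration
p collected # => [10, 20]
```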
Thanks for looking into this. From my casual understanding of the problem, it seems that as long as the underlying collection is sequential or otherwise externally enumerable, there should be no need to use threads for enumeration.
It looks, for example, like RubyString#each_byte creates a RubyEnumerator and passes it a size function, so could it also pass a function that enumerates the string (essentially a RubyEnumerator.Nexter)? I'm basically just trying to check whether I understand the underlying code correctly; I'm not familiar enough with it to see the whole picture or the downsides of a solution like that.
What I'm proposing is roughly what already happens for RubyArray, except that instead of RubyEnumerator checking the type of the underlying collection and picking the best "nexter" strategy, the collection itself would create the strategy when it creates the RubyEnumerator.
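A rough Ruby model of that proposal: the enumerator accepts an optional nexter factory from the collection that created it, and only falls back to a generic strategy when none is given. `SketchEnumerator`, `sketch_each_byte` and the Fiber-based fallback are hypothetical stand-ins (a Fiber is how MRI suspends `#each` for Enumerator#next; in JRuby the fallback would be the existing ThreadedNexter). Nothing here reflects JRuby's actual classes.

```ruby
# Hypothetical model: the collection passes a nexter factory alongside the
# each-function, much as String#each_byte already passes a size function.
class SketchEnumerator
  include Enumerable

  def initialize(each_fn, nexter_factory = nil)
    @each_fn = each_fn
    @nexter_factory = nexter_factory
  end

  def each(&block)
    @each_fn.call(block)
    self
  end

  def next
    @nexter ||= @nexter_factory ? @nexter_factory.call : fallback_nexter
    @nexter.call
  end

  private

  # Generic fallback: suspend #each in a Fiber (stand-in for ThreadedNexter).
  def fallback_nexter
    fiber = Fiber.new do
      each { |x| Fiber.yield(x) }
      raise StopIteration, 'iteration reached an end'
    end
    -> { fiber.resume }
  end
end

# Hypothetical String#each_byte that supplies its own index-based nexter.
def sketch_each_byte(str)
  each_fn = ->(blk) { str.bytesize.times { |i| blk.call(str.getbyte(i)) } }
  nexter_factory = lambda do
    index = 0
    lambda do
      raise StopIteration, 'iteration reached an end' if index >= str.bytesize
      byte = str.getbyte(index)
      index += 1
      byte
    end
  end
  SketchEnumerator.new(each_fn, nexter_factory)
end

e = sketch_each_byte('ab')
p e.to_a           # => [97, 98]
p [e.next, e.next] # => [97, 98]
```

The design choice is that only collections that know how to step themselves (String bytes, Array elements) supply a factory; everything else keeps working through the generic fallback.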