TruffleRuby returns the wrong arity integer for methods defined in cext_ruby.rb.
BigDecimal._loadBigDecimal._load has one argument, so we expect arity to be 1.
On MRI:
➜ ruby -v
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-darwin19]
➜ irb
irb(main):001:0> require 'bigdecimal'
=> true
irb(main):002:0> BigDecimal.method(:_load).arity
=> 1
On TruffleRuby:
➜ ruby -v
truffleruby 21.1.0-dev-66492859, like ruby 2.7.2, GraalVM CE JVM [x86_64-darwin]
➜ irb
irb(main):001:0> require 'bigdecimal'
=> true
irb(main):002:0> BigDecimal.method(:_load).arity
=> -1
Psych::Emitter#scalarPsych::Emitter#scalar takes 6 arguments, so we expect arity to be 6.
On MRI:
% chruby 2.7
% ruby -v
ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-darwin20]
% irb -rpsych
>> Psych::Emitter.instance_method(:scalar).arity
=> 6
On TruffleRuby
% chruby truffleruby-jvm-ce
% ruby -v
truffleruby 21.1.0-dev-66492859, like ruby 2.7.2, GraalVM CE JVM [x86_64-darwin]
% irb -rpsych
>> Psych::Emitter.instance_method(:scalar).arity
=> -1
These methods defined through rb_define_singleton_method and rb_define_method. Taking BigDecimal._load as an example, you can see this method being defined here using rb_define_singleton_method.
rb_define_singleton_method(rb_cBigDecimal, "_load", BigDecimal_load, 1);
The last argument in there is argc which means the arity that was passed in is correct at this point. This is where the implementation differs between MRI and TruffleRuby.
In MRI, both rb_define_singleton_method and rb_define_method in class.c eventually uses the function rb_add_method_cfunc without modifying the argc variable. We assume that argc where the Method#arity derive its value from.
On TruffleRuby, the implementation for rb_define_method() is different. Within Truffle::CExt#rb_define_method, a proc is built with *args, &block, and so the method is defined with that proc as the method body.
def rb_define_method(mod, name, function, argc)
# …
method_body = Truffle::Graal.copy_captured_locals -> *args, &block do
# args is determined based on argc
end
mod.define_method(name, method_body) # uses the modified args in method_body
end
Fix in the Ruby code where Truffle::CExt#rb_define_method is defined: Create a proc with the correct arity using eval.
Fix in Java: right now a proc-defined method inherits its arity from the proc’s sharedMethodInfo, but we could change this to take an (optional?) explicit arity for the method, and pass argc through verbatim from our implementation of #rb_define_method rather than trying to smuggle it through as part of the proc itself.
Were our assumptions correct? If so, which solution sounds more feasible? Is there another one we haven't thought of?
Thanks @tomstuart for helping with this issue!
I tried this to see what happened:
def rb_define_method(mod, name, function, argc)
if argc < -2
raise "Unsupported rb_define_method argc: #{argc}"
end
if argc < 0
method_body = -> *args, &block do
case argc
when -1
# (int argc, VALUE *argv, VALUE obj)
args = [args.size, Truffle::CExt.RARRAY_PTR(args), Primitive.cext_wrap(self)]
when -2
# (VALUE obj, VALUE rubyArrayArgs)
args = [Primitive.cext_wrap(self), Primitive.cext_wrap(args)]
end
exc = $!
Primitive.thread_set_exception(nil)
# Using raw execute instead of #call here to avoid argument conversion
# We must set block argument if given here so that the
# `rb_block_*` functions will be able to find it by walking the
# stack.
res = Primitive.cext_unwrap(Primitive.call_with_c_mutex_and_frame(function, args, block))
Primitive.thread_set_exception(exc)
res
end
else
arg_names = ('arg_0'..).take(argc)
method_body = binding.eval <<~PROC
-> #{(arg_names + ['&block']).join(',')} do
args = #{arg_names.join(',')}
# (VALUE obj); (VALUE obj, VALUE arg1); (VALUE obj, VALUE arg1, VALUE arg2); ...
args = [Primitive.cext_wrap(self), *args.map! { |arg| Primitive.cext_wrap(arg) }]
exc = $!
Primitive.thread_set_exception(nil)
# Using raw execute instead of #call here to avoid argument conversion
# We must set block argument if given here so that the
# `rb_block_*` functions will be able to find it by walking the
# stack.
res = Primitive.cext_unwrap(Primitive.call_with_c_mutex_and_frame(function, args, block))
Primitive.thread_set_exception(exc)
res
end
PROC
end
method_body = Truffle::Graal.copy_captured_locals(method_body)
mod.define_method(name, method_body)
end
But unfortunately this is what happened:
% ruby -v
truffleruby 21.1.0-dev-f18b9df5, like ruby 2.7.2, GraalVM CE JVM [x86_64-darwin]
% irb
mxbuild/truffleruby-jvm-ce/languages/ruby/lib/truffle/truffle/cext_ruby.rb:50:in `const_missing': uninitialized constant Truffle::CExt::Primitive (NameError)
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/truffle/truffle/cext_ruby.rb:50:in `systmpdir'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/tmpdir.rb:16:in `<class:Dir>'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/tmpdir.rb:14:in `<top (required)>'
from <internal:core> core/kernel.rb:234:in `gem_original_require'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/rubygems/core_ext/kernel_require.rb:92:in `require'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/tempfile.rb:9:in `<top (required)>'
from <internal:core> core/kernel.rb:234:in `gem_original_require'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/rubygems/core_ext/kernel_require.rb:92:in `require'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/reline/line_editor.rb:4:in `<top (required)>'
from <internal:core> core/kernel.rb:234:in `gem_original_require'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/rubygems/core_ext/kernel_require.rb:92:in `require'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/reline.rb:8:in `<top (required)>'
from <internal:core> core/kernel.rb:234:in `gem_original_require'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/rubygems/core_ext/kernel_require.rb:92:in `require'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/irb.rb:13:in `<top (required)>'
from <internal:core> core/kernel.rb:234:in `gem_original_require'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/mri/rubygems/core_ext/kernel_require.rb:92:in `require'
from mxbuild/truffleruby-jvm-ce/languages/ruby/lib/gems/gems/irb-1.2.6/exe/irb:9:in `<top (required)>'
from <internal:core> core/kernel.rb:400:in `load'
from <internal:core> core/kernel.rb:400:in `load'
from mxbuild/truffleruby-jvm-ce/languages/ruby/bin/irb:42:in `<main>'
Thanks for the great issue.
Yes, that analysis is correct.
Regarding solutions:
eval() is quite expensive, and doing it for every method defined in C extensions would likely be a significant overhead.rb_define_* is not always an arity number, for the case where argc is -2, but the arity of such a method is actually -1. From that arity number we need to generate a proper Arity object. The argumentDescriptors should be simply null for C methods.@tomstuart Primitive can only be used with the magic comment # truffleruby_primitives: true or in core library files.
So if you want to try that example you need # truffleruby_primitives: true as the first line of the eval'ed code.
@chrisseaton suggested we could have eval-free special cases for common arities (maybe 0, 1 and 2?) and save eval for the far less common general case. Do you think that’s a useful compromise or is it straightforwardly better to “fix it in Java”?
I think we should avoid eval entirely, it should be fairly easy to add this primitive.
BTW, there is already Primitive.proc_create_same_arity which is kind of similar, it could even be used to fix this (together with an array/hash of lambdas with the proper arguments to copy from), but something more direct seems better.
OK, thanks!
Most helpful comment
@tomstuart Primitive can only be used with the magic comment
# truffleruby_primitives: trueor in core library files.So if you want to try that example you need
# truffleruby_primitives: trueas the first line of the eval'ed code.