Truffleruby: to_enum(:scan, ...) does not set Regexp.last_match

Created on 28 Nov 2018  路  10Comments  路  Source: oracle/truffleruby

A weird effect of using to_enum with scan is that it will not set the Regexp.last_match. Using scan alone does it just fine.

In MRI and JRuby:

"wawa".to_enum(:scan, /./).map {|o| o }
 #=> ["w", "a", "w", "a"]
"wawa".to_enum(:scan, /./).map {|o| Regexp.last_match }
 #=> [#<MatchData "w">, #<MatchData "a">, #<MatchData "w">, #<MatchData "a">]

In truffleruby:

"wawa".to_enum(:scan, /./).map {|o| o }
 #=> ["w", "a", "w", "a"]
"wawa".to_enum(:scan, /./).map {|o| Regexp.last_match }
 #=> [nil, nil, nil, nil] 
Regexp.last_match
 #=> nil
# Showing that Regexp.last_match wasn't changed at all
"hello".scan('l') 
 #=> ["l", "l"]
"wawa".to_enum(:scan, /./).map {|o| o }
 #=> ["w", "a", "w", "a"]
"wawa".to_enum(:scan, /./).map {|o| Regexp.last_match }
 #=> [#<MatchData "l">, #<MatchData "l">, #<MatchData "l">, #<MatchData "l">] # We still have "l"
Regexp.last_match
 #=> #<MatchData "l"> # We still have "l"
bug

All 10 comments

Looks like an easy fix: https://github.com/oracle/truffleruby/blob/c4cda962fa9f84d524a6216218cf59fbf7bd77be/src/main/ruby/core/string.rb#L219 should set $~ in block.binding and not the caller binding.

@eregon can you make a priority with this issue if the fix is easy, please?

@deepj I'll take care of this, I forgot about it.
Please mention @chrisseaton for priority issues.

@chrisseaton Please, is this possible to make a priority, please?

Will do.

@MaxLap FWIW, a simpler and better workaround than https://github.com/deep-cover/deep-cover/commit/f3da81b384dd503c4747af24eaac3d4b38bddf47 (e.g., that doesn't work if the matcher matches an empty String) is:

# Instead of:
source.to_enum(:scan, matcher).map { Regexp.last_match }
# This works around:
match_datas = []
source.scan(matcher) { match_datas << Regexp.last_match }
match_datas

Additionally that workaround is also faster on CRuby: https://github.com/rubocop-hq/rubocop/pull/8602

The bug here is due to to_enum having a few blocks of indirection, and so when we set $~ we end up setting it on an internal block of Enumerable#map and not the user-supplied block. I'm looking at how to fix this.

Removing this from priority because it seems really hard to fix unfortunately, and there is an easy workaround (see my message just above).

One potential idea to fix this is to mark some internal methods/blocks as ignored for $~, and keep searching further.
That could work for cases where $~ is set on the caller method.
Here it is set on a block though, so maybe we'd need to mark the intermediate block as ignored for $~ and remember the original block by storing it maybe as a field of that intermediate block.

Now we've refactored how special variables are handled we can actually implement this pretty efficiently. Essentially we can do something like

def map(&block)
  sv = Primitive.proc_special_variables(block)
  each { |x|
    Primitive.regexp_last_match_set(sv, $~) if $~
    Primitive.io_last_line_set(sv, $_) if $_
    ret = yield(x)
    $~ = nil
    $_ = nil
    ret
  }
end

This isn't quite perfect, we should ideally have a marker value so we can correctly detect when $~ and $_ have been specifically changed to nil. There are a couple of places where we need to change this, but it's a fairly small change overall.

This was resolved in https://github.com/oracle/truffleruby/commit/94d8eb08c75a696cd829c6cad5f9919cd137086d using a variation of the technique I suggested above, so I will close this bug.

Was this page helpful?
0 / 5 - 0 ratings