Crystal: Bug with optional-groups in Regex processing

Created on 2 Feb 2017  路  29Comments  路  Source: crystal-lang/crystal

I believe I came across a bug in:

# brew list crystal-lang --versions
crystal-lang 0.20.5_2
# crystal --version
Crystal 0.20.5 (2017-02-01)
# On macOS 10.10.5 (Yosemite), Darwin Kernel Version 14.5.0

This came up while modifying a simple ruby script that I have so that I could compile it in crystal. In the source code shown at the bottom of this post, the key lines are:

    thisarg = "fri12:13:14fri"  # this value works
    thisarg = "12:13:14"        # this value dies
    case thisarg
    when /^ (Sun|Mon|Tue|Wed|Thu|Fri|Sat)? (\d\d):(\d\d):(\d\d) (Sun|Mon|Tue|Wed|Thu|Fri|Sat)? $/xi
      # ...
    printf "#DBG:  wday#1=%s wday#5=%s\n", $1.inspect, $5.inspect
    user_wday = $1
    user_wday = $5      unless $5.nil?

What's happening there is that I have an optional-group to match the days of the week. The intent that a day of the week could appear before the timestamp, after the timestamp, or not at all. When I run this program, I get:

# crystal build delay_to_BUG.cr && ./delay_to_BUG && echo yes
#DBG:  processing thisarg="12:13:14"
#DBG:  at 'when' pattern#1
#DBG:  hr=12 min=13 sec=14
#DBG:  processing basetime="2017-02-02 12:13:14 -0500"
Nil assertion failed (Exception)
0x1055c8f02: *CallStack::unwind:Array(Pointer(Void)) at ??
0x1055c8ea1: *CallStack#initialize:Array(Pointer(Void)) at ??
0x1055c8e78: *CallStack::new:CallStack at ??
0x1055ba651: *raise<Exception>:NoReturn at ??
0x1055ba631: *raise<String>:NoReturn at ??
0x1055e3793: *Nil#not_nil!:NoReturn at ??
0x105608c0e: *Regex::MatchData#[]<Int32>:String at ??
0x1055b9beb: __crystal_main at ??
0x1055c79d8: main at ??

(as you'll see, all those #DBG lines are printed by my program). If I change the program to use the value "fri12:13:14fri" (which has the day-of-week both before and after the timestamp), then I get:

# crystal build delay_to_BUG.cr && ./delay_to_BUG && echo yes
#DBG:  processing thisarg="fri12:13:14fri"
#DBG:  at 'when' pattern#1
#DBG:  hr=12 min=13 sec=14
#DBG:  processing basetime="2017-02-02 12:13:14 -0500"
#DBG:  wday#1="fri" wday#5="fri"
Will sleep until Fri Feb 03 12:13:14

I haven't tested that the code is actually working the way I intend it to, but at least it doesn't die somewhere in Regex processing. Here's the full source code. This is a subset of the full script, but I wanted to include enough so you could see why I'm trying to do what I'm trying to do:

# -------+---------+---------+-------- + --------+---------+---------+---------+
# Copyright (c) 2015,2016    - Garance Alistair Drosehn <[email protected]>.
# Copyright (c) 2017       for Crystal-ized version, also by Garance.
# All rights reserved.
#
# -------+---------+---------+-------- + --------+---------+---------+---------+

st = Time.now       # starting time
newhr = st.hour
newmin = st.minute
newsec = st.second
newmsec = st.millisecond   # crystal uses milliseconds, ruby uses microseconds
addsec = 0

    thisarg = "fri12:13:14fri"  # this value works
    thisarg = "12:13:14"        # this value dies

    printf "#DBG:  processing thisarg=\"%s\"\n", thisarg
    case thisarg
    when /^ (Sun|Mon|Tue|Wed|Thu|Fri|Sat)? (\d\d):(\d\d):(\d\d) (Sun|Mon|Tue|Wed|Thu|Fri|Sat)? $/xi
    printf "#DBG:  at 'when' pattern#1\n"
    newhr  = $2.to_i
    newmin = $3.to_i
    newsec = $4.to_i
    newmsec = 0
    addsec = 0
    printf "#DBG:  hr=%d min=%s sec=%s\n", newhr, newmin, newsec
    bt = Time.new(st.year, st.month, st.day, newhr, newmin, newsec, newmsec,
        Time::Kind::Local)
    printf "#DBG:  processing basetime=\"%s\"\n", bt.to_s

    #   If the user specifies the DayOfWeek both before and after the
    #   timestamp, then take the value after the timestamp.
    printf "#DBG:  wday#1=%s wday#5=%s\n", $1.inspect, $5.inspect
    user_wday = $1
    user_wday = $5      unless $5.nil?
    unless user_wday.nil?
        case user_wday.downcase #   Check the lowercase version.
        when "sun" then want_wday = 0
        when "mon" then want_wday = 1
        when "tue" then want_wday = 2
        when "wed" then want_wday = 3
        when "thu" then want_wday = 4
        when "fri" then want_wday = 5
        when "sat" then want_wday = 6
        else
        STDERR.printf "*** internal error processing day-of-week: '%s'\n",
            thisarg
        exit 1
        end
        if bt.day_of_week != want_wday
        add_days = (want_wday - bt.day_of_week.to_i)
        add_days += 7       if add_days < 0
        addsec += (add_days * 24 * 3600)
        end
    end

    #   If the given %H:%M:%S time is earlier than "now", then move
    #   it up by 24 hours so it will be tomorrow.
    sleep_sec = (bt - st).to_i + addsec
    addsec += 24 * 3600     if sleep_sec <= 0
    end


begin
    bt = Time.new(st.year, st.month, st.day, newhr, newmin, newsec, newmsec,
    Time::Kind::Local) + Time::Span.new(0, 0, addsec)
    sleep_sec = bt - st
    printf "Will sleep until %s\n", bt.to_s("%a %b %d %T")
    sleep sleep_sec
rescue  # Interrupt (how ruby would catch the control-C)
    printf "\n#...Attn! at %s\n", Time.now.to_s("%T")
    exit 2
end
exit 0
question

All 29 comments

And to cheat with a quick add-on question: How do I get crystal to capture control-C's? It doesn't seem to have the rescue Interrupt which I use in the ruby script...

Hey @drosehn! Thanks for the report. The catch here is that $1, $2, etc in Crystal regexes are of type String; so when dealing with an optional group, and attempting to access it, the nil value is tried to be coerced to String and fails.

If you want to access an optional group as a String?, you'd need to use the $1? syntax, or the match_data[1]? one if using the .match method.

Still, the error reported could be more descriptive, and I'm not sure if the $1? syntax is documented at all. I'll open new issues for those items while closing this one.

Hello! Thanks for the quick reply! But I believe there really is a problem here, and that we should take a closer look at it.

In the version that blows up, the program does successfully print out:

#DBG:  processing basetime="2017-02-02 12:13:14 -0500"

The next line after that is:

    printf "#DBG:  wday#1=%s wday#5=%s\n", $1.inspect, $5.inspect

That debug statement is not printed. And I believe that any Object.inspect should succeed. As an example, if the $1 and $5 there are replaced with nil, the print statement succeeds:

printf "#DBG:  wday#1=%s wday#5=%s\n", nil.inspect, nil.inspect

produces: #DBG: wday#1=nil wday#5=nil

If we skip over that print statement to see what's after it, the code is doing:

    user_wday = $1
    user_wday = $5      unless $5.nil?
    unless user_wday.nil?

No matter what $1 is, I expect that I should be able to save it to a new variable. And $5 will only replace that setting if it is not nil. So it seems to me that after the second statement, user_wday must be either be nil or it is a String. In the case where this fails, I am expecting it to be nil. And in the original ruby script, it is nil at this point. All of the code which expects user_wday to be a String is inside that unless user_wday.nil? conditional clause. So I do not understand why this code would run into any problem.

And if it did run into a problem, I'd expect an error message at compile time; one which mentioned String | Nil. I've certainly generated a lot of those compile-time errors when converting my ruby scripts into crystal! 馃槃 I would not expect a Nil assertion failed (Exception) at run time, where that exception occurs somewhere in Regex::MatchData#[]<Int32>:String at ??.

You are absolutely right in that any Object.inspect should work. The reason why the $1.inspect fails is that $1 itself (regardless of the inspect) is actually hiding a computation that involves a not_nil!. So you cannot save it to a new variable, since it actually fails to compute.

What you are trying to do would work exactly as you expect if you use $1? everywhere instead of $1.

Let me know if this clarifies the issue!

Well, I understand what you're saying now, but that strikes me as a really odd and unpleasant way for this to be handled. Especially if $1.inspect cannot work when $1 is an optional-capture-group.

Just make sure to use $1?.inspect, with the ?, for the cases where the capture group is optional

I would understand if I ran into problems while trying to reference $1 as a String, but it seems weird to say that $1.nil? "involves a computation", where that computation (which crystal is in control of) needs to generate an exception instead of returning "Yes, $1 is a nil".

I do understand what you're saying. I'm just wondering if crystal could handle .nil? in a friendlier way.

trying to reference $1 as a String

I think this is the core of the argument. It's not that you reference it as a particular type, it's that Crystal needs $1 to have a type regardless of how to use it, and its type is String, not String?. This makes $1.nil? pointless, since $1 is already a non-nillable String.

So, if you try to make use of $1 in a case where group 1 was not captured, Crystal simply raises, as it would if you try to access match_data[1] when match_data has no group 1.

As one additional argument backing the current behavior, if $1 could be nil, then the type would be String | Nil, then every time you want to perform some String operation with it you will need to check it is not a nil. Most of the time you are actually matching something, hence it is better if $1 is not nillable as an expression and let $1? handle the nilable case.

It is the same story when accessing ENV or a Hash, #[](index) returns a non nilable value but #[]?(index) does.

Doing something more clever with regex might seem to work with regex literals, but won't work if regex to match with is determined in runtime.

Well, I do understand what people are saying here. And I should probably leave it at that, since I don't want to sound like I'm upset and arguing the point. I just wish that the two methods of .nil? and .inspect could handle these situations with some "magic" (to use the technical term) which would be friendlier to the programmer. Particularly programmers who will find themselves constantly switching between ruby and crystal.

However, I do not know anything about the code in the area where that "magic" would need to be implemented, and I can believe that adding the magic would cause other very undesirable problems in a compiled (instead of interpreted) environment. So l'll stop.

Adding the documentation for $1? will certainly help.

So, any quick answer for rescue Interrupt? 馃槃

Hehe no worries about arguing, we're always happy to discuss about the language :-)

I think now I can see what you mean about inspect handling _magic_ cases. The catch would be for inspect to actually be able to catch any exceptions when resolving the object being inspected. This can be handled via macros, something like:

macro inspect_with_magic(x)
  begin
    {{ x }}.inspect
  rescue e
    e.message
  end
end

/(A)?(B)?C/ =~ "BC"
pp inspect_with_magic($1)
pp inspect_with_magic($2)

The output for this program is:

inspect_with_magic($~[1]) # => "Nil assertion failed"
inspect_with_magic($~[2]) # => "\"B\""

However, inspect is not designed to work like this; though I can see how an inspect_with_magic could come in handy when debugging.

Sorry about missing that one! You can use Signal#trap for that:

Signal::INT.trap do |x|
  puts "Handling INT"
end

Yet again I think this comes down to the difference of "least surprise" between crystal and ruby developers. This behaviour is not a surprise at all given that $1 has type String. However as a crystal developer calling $1.nil? and recovering true would be a huge surprise. For a ruby developer it would not.

Given that, I think catering to the principle of least surprise for crystal developers, instead of ruby developers is the correct strategy. Just as uou should not go into "js mode" when learning C, you should not go into "ruby mode" when learning crystal. I really do think that the two languages have little enough in common to make that comparison apt.

You have the advantage of having worked with the compiler enough to know what it's doing and how it is doing what it is doing. I'm blissfully ignorant of the implementation details, and am just wondering what might be helpful and less-risky for the programmer.

The magic I was thinking of would be more along the lines of:

    inspect_1 = $1.inspect  rescue "nil as (String?)"
    inspect_5 = $5.inspect  rescue "nil as (String?)"
    printf "#DBG:  wday#1=%s wday#5=%s\n", inspect_1, inspect_5
    nil_1? = $1.nil?        rescue true
    nil_5? = $5.nil?        rescue true
    printf "#DBG:  nil wday#1=%s wday#5=%s\n", nil_1?, nil_5?
    user_wday = nil
    user_wday = $1      unless nil_1?
    user_wday = $5      unless nil_5?

except, of course, that I want $1.nil? to work like nil?_1 without the clutter of temporary variables.

The only two methods where I want this magic are .nil? and .inspect. For every other method call I'm completely happy with what crystal is doing. Note that I am (naively) hoping that this could be implemented by a simple if statement somewhere. I understand that it would be absurd if crystal needed to invisibly wrap every call to .nil? on every object with an actual rescue clause in order to do this magic.

Also note that I'm coming at this as a person who will need to switch between ruby and crystal on a daily basis. I'm not writing the next replacement for Rails, where I'll be writing one huge program for the next six years. I'm writing lots of programs, most of which are under 1000 lines, and switching back-and-forth based on the emergency of the day. So, yes, it'd be nice for me when I can write the same code in crystal and in ruby. This is a case where I'd be required to use $1? in crystal, and unable to use it in ruby. And in this case, I really don't see any advantage to that requirement.

Again, I'm not upset here. I just pointing out that I work in a very different environment. If I could get a job working full-time at writing crystal, I'd be very happy to take that!

@spalladino : Signal::INT.trap works. Thanks!

Would it make sense to mention that in the documentation for Exceptions? Not a long tutorial on ruby vs crystal, but just a comment such as "Note that there are some events which ruby will handle via exceptions, but would be handled via Signal in crystal."

Just enough to get the programmer looking in the right direction.

This is a case where I'd be required to use $1? in crystal, and unable to use it in ruby.

I understand. That is a big difference with Ruby that we decided to make. It's more visible in hashes than in regexes, where hash[:missing_key] raises in Crystal instead of returning nil as in Ruby, and you need to fall back to hash[:missing_key]? for that behaviour. Hopefully it will become natural to look out for that stuff after some time working part-time with Crystal.

If I could get a job working full-time at writing crystal, I'd be very happy to take that!

Love to hear that :-)

As for the signal trap, agree. There's actually much missing from the docs, so if you'd like to send a PR with that, it'd be much welcome. However, I'd like to avoid explaining it _as opposed to Ruby_, since some Crystal programmers might have no Ruby background, so the docs should focus exclusively on Crystal. Maybe as another chapter in https://crystal-lang.org/docs/syntax_and_semantics/exception_handling.html? Though it's not exactly exceptions, so I'm not sure. Where would you expect to find it in the docs?

@drosehn the main problem with that magic is that they would be exceptional rules to the semantic. the $1 is an expression that when it's value is been computed it raises (due to #not_nil!). Allowing a method call over that expression to succeed will require some sort of laziness that it's not present in crystal.

The only workaround I can imagine for you is to use macros to _mimic_ some sort of laziness.

macro v?(n)
  begin
    !{{n}}.nil?
  rescue e
    if e.message == "Nil assertion failed"
      false
    else
      raise e
    end
  end
end

def foo
  raise "other exception"
end

/(a)?/ =~ "b"
p v?($1) # => false

/(a)?/ =~ "a"
p v?($1) # => true

p v?(foo) # => raises "other exception"

When I got the compile-time error that Interrupt was not defined (or something like that), I went to:
https://crystal-lang.org/api/Exception.html

My thinking was "crystal has a different name for this exceptional event, I wonder what it is?"

I also went to:
https://crystal-lang.org/docs/syntax_and_semantics/exception_handling.html

but only because the first one did not give me any useful pointers. I wouldn't really expect a pointer to signals to come up there. And given that I thought was looking for some specific API detail, I wouldn't have looked there first.

Long shot here - is it allowed to redefine a macro?

There's ruby.cr, a shard that tries to provide a _Ruby compatibility layer for Crystal_, and maybe _there_ it makes sense to patch $1 and the like to work as in Ruby - ie, allow .inspect and .not_nil? to succeed on a non-existent match.

I agree with both staying true to Crystal's philosophy and least surprise for Ruby programmers - in that order. Maybe this shard is the compromise we're looking for.

Sincerely, I'm not a big fan of adding switches (or shards) to change the behaviour of the language in some projects. This would make it quite difficult to share Crystal code between devs, as you'd need to be aware of what kind of redefinitions or aliases are in place, and goes against the philosophy of having a single way of doing things (ie no aliases, code formatter, etc).

That being said, it's not inspect and not_nil? that need to be changed, but$1, since the exception occurs when trying to evaluate the group itself, and not when invoking anything on it. And I'm not sure how $1 is implemented, so can't be sure whether it can be redefined.

I think it would be difficult to implement what I want in a compatibility layer. Note that my question is bigger than the variable $1, although that is the example which would come up the most (especially for ruby developers, but I suspect you'd find a similar mindset in perl and python developers). But I've spent too much time talking about this already, so I won't go into a full description. The short version is that I would want the magic to work for any variable of any type which is similar to how $1 is handled. If the magic worked only for $1, then programmers would find that to be inconsistent and annoying. (IMO, at least)

But @bcardiff 's explanation of what would be needed at the implementation level is convincing enough to me. If the magic was easy to implement, then it'd be nice to do it. But it's not worth a major re-factoring of the compiler!

Just one minor clarification: note that $1 is not a variable, but a method call: it's syntax sugar for matchdata[1].

(edit for clarity: I posted this before seeing spalladino's last comment)

Yet again I think this comes down to the difference of "least surprise" between crystal and ruby developers. This behaviour is not a surprise at all given that $1 has type String.

I'm a newcomer to Crystal who also has a Ruby background, so perhaps I'm just not grokking what you're saying, but what surprises me is that accessing $1 performs any computations at all, since we typically think of "variables" as being references to already-existing objects. This is different from the missing-hash-key case, since it's understood that the receiver of the [] method can do whatever handling it wants, including raising exceptions.

Simply changing the literature to refer to $1, $~, and friends as methods, rather than "special variables", would ameliorate this surprise.

@ezrast Yes, the whole idea of "special variables" is a surprise to me in the first place. They stand out to me as a wart in crystal's design, an unnecessary holdover from ruby. However if you do know it's a method call (which is not obvious) it becomes a lot clearer.

The $ for regex matches were kept for case to read better than noisy repetition:

case line
when /(.+):(.*)/
  {$1, $2}
when /(.+)=(.*)/
  {$1, $2}
end

if m = line.match(/(.+):(.*)/)
  {m[1], m[2]}
elsif m = line.match(/(.+)=(.*)/)
  {m[1], m[2]}
end

There would also be no possibility to access them when using the =~ and !~ operators:

if line =~ /(.+):(.*)/
  {$1, $2}
end

if m = line.match(/(.+):(.*)/)
  {m[1], m[2]}
end
Was this page helpful?
0 / 5 - 0 ratings

Related issues

oprypin picture oprypin  路  3Comments

Sija picture Sija  路  3Comments

asterite picture asterite  路  3Comments

costajob picture costajob  路  3Comments

lgphp picture lgphp  路  3Comments