Crystal: Nil assertion failed when accessing $~

Created on 1 Aug 2017  路  35Comments  路  Source: crystal-lang/crystal

When a regex match fails, accessing $~ raises an exception: https://carc.in/#/r/2gks

puts $~
Nil assertion failed (Exception)
0x5627dc75aa80: not_nil! at /usr/lib/crystal/class.cr 65:0
0x5627dc73254e: __crystal_main at /usr/lib/crystal/thread/mutex.cr 1:1
0x5627dc741989: main at /usr/lib/crystal/main.cr 12:15
0x7fc2284c44ca: __libc_start_main at ??
0x5627dc731e6a: _start at ??
0x0: ??? at ??



md5-92c0723785934ea9d164e1f739fcc327



$ crystal -v
Crystal 0.23.1 [e2a1389] (2017-07-13) LLVM 3.8.1
bug docs topicsemantic

Most helpful comment

Honestly, I wouldn't be suuuper opposed to just dropping the whole affair (including =~) and forcing users to

if match = foo.match(/bar/)
  puts match[1]
end

Yes these are nice for little one of scripts, but I don't see that as Crystal's domain anyway. For more serious software the above trumps in clarity.

All 35 comments

The documentation says:

After performing a match, the special variable $~ will be an instance of Regex::MatchData if it matched, nil otherwise.

So $~ is only valid after a match, and then it is either MatchData object or nil if it did not matched.
I find it reasonable to throw an exception if it is not accessed after a match, although it could have a specific error message.

@straight-shoota I understand, but this behavior makes impossible to use $~ whithin a condition expression after trying to match, for example: https://carc.in/#/r/2gma

"a" =~ /b/ || $~

Hmm. I think that works if you use an explicit if, but not if you're trying to use conditionals in a one-liner (as in your example). I'd generally use:

if "a" =~ /b/
    $~    # or whatever
end

This might be a situation where crystal could be smarter about the one-liner case.

@drosehn I ran up into this incompatibility when trying to port this Ruby code:

def auto_linked?(left, right)
  (left =~ AUTO_LINK_CRE[0] and right =~ AUTO_LINK_CRE[1]) or
    (left.rindex(AUTO_LINK_CRE[2]) and $' !~ AUTO_LINK_CRE[3])
end

I found a workaround, but the same happens for $1 and $1?.

IMO $~ should be nil by default, but if it's not possible, the error message should be more accurate at least. Now it gives no clue on what's happening since I'm not calling #not_nil! in my code.

I'm just another crystal user here, not a developer of the compiler. This seems to me like a situation where the compiler could do a better job, but I don't have the experience or time to investigate/fix the compiler itself. I was just suggesting one tactic you could try to work around it.

As something of an aside: Note that crystal never promises to be completely compatible with ruby. In some situations it absolutely cannot be (because the goal is to compile into very fast code). In other situations the people who are doing all the work to write crystal happen to have their own opinions on the best way for the language to work. They might agree completely with my earlier comment, or they might not. However I know that I don't have the time to write my own compiler (not even if I spent the rest of my life trying to do it!).

Cheerio.

@hugoabonizio Sring#rindex(Regex) on Crystal doesn't update $~. I guess your issue comes from this. In regard to this point, "$~ should be nil" and etc are irrelevant.

But I actually agree with the part of your opinion: "the error message should be more accurate at least."

@drosehn I understand the Crystal goals, just pointed this situation where - agreeing with you - "the compiler could do a better job". Thank you for your help! :smiley:

@MakeNowJust I understand, probably just fixing this inconsistency with other regex related methods could fix the problem. For now I found a workaround and moved on, but a better error message implemented by someone with more knoledge than me on the compiler world be nice!

It would be sweet if the compiler would somehow force you to check for a match instead of blowing up later with an NPE...since crystal in other places is "nil safe" as it were. If possible...

Given

"foo" =~ /bar/
puts $~

raises this is at least a documentation bug, though I would prefer the documented behavior too. I wonder if we can't rewrite something like

puts $~ if "foo" =~ /bar/

to something like:

"foo" =~ /bar/
__temp123 = assign_match
puts __temp123 if __temp123

to preserve the "no nil check inside if" behavior of $~ while making it nilable.

It would be sweet if the compiler would somehow force you to check for a match instead of blowing up later with an NPE...

That might actually be a valid option. The comment in https://github.com/crystal-lang/crystal/blob/master/src/compiler/crystal/semantic/main_visitor.cr#L576-L581 states that it adds not_nil! because it would be annoying to ask the user to always check for nil. That might be useful in a scripting context when you're certain that the match data result will be populated after a regex match. But as @rdp stated, this essentially contradicts Crystal's Nil safety and in introduces a null pointer exception in Crystal's core library.

not_nil! should only be used when it is safe to assume the value can't be nil for reasons the compiler is incapable to figure out or if you don't care whether the code blows up in case of an unexpected state. Neither of these reasons is valid for accessing global variables $~ and $?.

So I suppose we should remove that not_nil! call.

Maybe could introduce nn! or !! so that not_nil's aren't as...many characters to type? :) . Just brainstorming. Cheers!

No, that defeats the purpose. not_nil! is supposed to be inconvenient because it's usually to be avoided.

This can be one reason to remove the dollar sign notation (https://github.com/crystal-lang/crystal/issues/6969).

Sorry, could someone write a "tl;dr" summary of the problem and proposed solution?

$~ is nilable because when you do "foo" =~ /bar/ the match data might be nil. Now in Ruby one uses $~, $1 and others like this:

if "foo" =~ /bar/
  # Because the `if` check guarantees that $~ will be set, this will never fail
  puts $~
end

Or:

case "foo"
when /bar/
  puts $1
end

which is essentially the same.

But note that $~ can be nil, it's just that you don't usually use it outside an if that checks for a match.

That's why the compiler inserts not_nil! for you, because otherwise you'll always have to do:

case "foo"
when /bar/
  # Ugh, why am I checking for not nil if I know this is not nil?
  puts $~.not_nil!
end

So what are the proposals to improve this? (or why is this a problem in the first place? the code from OP seems like ficticious code, I never had NilAssertionFailed raised because of a misuse of $~, what about you all?)

Note it can also be reproduced with all dollar variants $0, $1 etc.
One can do this:

foo =~ /bar/
puts $~

Reversing the order, by mistake, returns an error.

Another issue is this variable get leaked:

if "foo" =~ /foo/
  puts $~
end

puts $~

Logically, the variable should be scoped, and not available before using a regex match.
One goal of the language is to catch the errors early at the compilation time.

You're right, these global variables have some special use cases and one could consider them usually guarded by a regex match. I think I somehow mixed up $~ and $1. $~ is always available when a regex match was successful, but $1 etc. only if such a capture group exists.

The issue discussed here is that when not used in a guarded branch, $~ might be nil and immediately raises. The error message is confusing to the user because it is not clear where the nil assertion comes from (it's added by the compiler and doesn't exist in code).

As long as you only use $~ as it is supposed to, that nil assertion should never fail. But it can easily happen that something goes wrong, especially if you're not familiar with the exact semantics like being visible only one scope upwards. Refactoring may also break things.

In case it goes wrong, the user should be presented with a meaningful error message.
So the suggested improvement would be to replace .not_nil! with a customized exception such as No match data available.

As mentioned in https://github.com/crystal-lang/crystal/issues/4776#issuecomment-518645780 the documentation currently states that $~ would be nil if there is no match.

So it would be an option to actually change the implementation to what is documented. But given the special nature of these global variables, I suppose this is not a good idea as per @asterite's comment. In any case, the documentation should reflect the actual behaviour.

Improving the error messages is a good idea 馃憤

I never had any issue. I always access $ variables after verifying that there was a match. I'd be very annoyed if suddenly I had nilable $ match variables.

Now, the injected not nil check message could be more explicit about having no match (unexpected). Similar to how [] accessors raise an explicit message / exception.

At least, $~.as Regex::MatchData and $0.as String would be better.

@j8r cast from Nil to Regex::MatchData failed is not much of an improvement.

You know it's about a regex syntax @straight-shoota , compared to nothing at all - it helps.

It might help a bit, but it's not enough. If we're changing it, we should change it to a good error message, not a slightly less shitty one.

Honestly, I wouldn't be suuuper opposed to just dropping the whole affair (including =~) and forcing users to

if match = foo.match(/bar/)
  puts match[1]
end

Yes these are nice for little one of scripts, but I don't see that as Crystal's domain anyway. For more serious software the above trumps in clarity.

I'd also support removing that feature entirely an alternative option. This was discussed before in #6969 and decided to keep them (though not a strong decision).

When matching a regex as a when condition it would be more complicated to rewrite without $~.

I wouldn't mind dropping them and having to write it like in https://github.com/crystal-lang/crystal/issues/6969#issuecomment-431613793 :

case
when match = line_to_parse.match(/host=(.*)/)
  hostname = match[1]
when match = line_to_parse.match(/credentials=(.*?):(.*)/)
  user = match[1]
  password = match[2]
end

vs.

case line_to_parse
when /host=(.*)/
  hostname = $1
when /credentials=(.*?):(.*)/
  user = $1
  password = $2
end

it's a bit more verbose but: no magic, simpler language, simpler compiler. But it almost reduces the main use case of ===.

@asterite It's going to be even more verbose when the when branches not only match on regular expressions, but also do comparisons, method calls etc. on the target.

But it almost reduces the main use case of ===.

Oh yes. It might not be a bad move if we could remove that as well... It's always difficult to grasp the difference between === and related operators and that it looks similar to equality comparison but is actually not commutative.
Helping to get rid of === would actually be supporting argument for removing $~.

But if we want to continue in this direction, we should have a dedicated discussion about that, and especially look closer at the consequences such a change would have.

I really tried to find an user friendly alternative, there is this option.
It's a bit like IO, the result is passed around by reference.

I think ruby uses === for range's but...that doesn't mean it has to stay I suppose...

Sorry, I meant === is pretty useful for regex and combining it with $~, but === will stay in the language forever. I don't think that needs to be discussed.

Can someone actually detail what the issue is and code to reproduce it here? Is it to do with the compiler flow checking regex matches and replacing $~ with $~.not_nil!? Is the code good or bad?

Either way the error message is bad, but i'm not sure if it's occurring when it should or not? And if it's occuring in sane code?

Yes, it's the implicit .not_nil! inserted by the compiler that causes unexpected errors. This typically happens when the code is faulty, but it's a mistake that can happen easily. Especially since the documentation states that $~
returns nil instead of raising.

A simple reproducible sample already mentioned above is "a" =~ /b/; puts $~.

@straight-shoota is $~ ever typed as nil??

AFAIK the special variable $~ is actually nilable, but only internally. The main visitor replaces each access to that variable with $~.not_nil!, so its not nilable when used in user code.

what the fuck

Haha, yeah, there's a bit of magic going on there. The $~ variable is parsed as a global variable, and then when it's typed it's replaced with $~.not_nil!, but $~ in that expression is actually a local variable.

But I don't think it matters how it's implemented as long as it behaves correctly. I do think we can and should improve the error message. I started working on this but stopped because I have limited time, but once I have more time I can fix it.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

asterite picture asterite  路  3Comments

will picture will  路  3Comments

RX14 picture RX14  路  3Comments

oprypin picture oprypin  路  3Comments

ArthurZ picture ArthurZ  路  3Comments