Crystal: Remove special dollar sign variables in Regex

Created on 20 Oct 2018 · 13 Comments · Source: crystal-lang/crystal

Special dollar sign variables like $~ or $1 aren't needed for Regex: captured groups can already be accessed through Regex::MatchData with brackets, like md[1].
I don't know of any other place where the dollar sign is used in the language since global variables were removed in https://github.com/crystal-lang/crystal/issues/4715.
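A minimal sketch of the bracket-based access described above, using only Regex#match and Regex::MatchData#[]. The input string and the group names user and password are illustrative assumptions, not taken from the issue:

```crystal
line = "credentials=host:test"

# Regex#match returns a Regex::MatchData, or nil when nothing matches,
# so the `if` narrows md to a non-nil MatchData inside the branch.
if md = /credentials=(?<user>.*?):(?<password>.*)/.match(line)
  puts md[0]          # whole match
  puts md[1]          # first capture group, by index
  puts md["password"] # named capture group, by name
end
```

Named groups are still numbered, so md[1] and md["user"] refer to the same capture here.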

All 13 comments

There's also still a global variable used in system.

AFAIK, $~ is the only way to access the match data when using a Regex in a when statement. I wonder if a more general way to get at the results of the === comparison in a when block would be useful.

edit for clarification: this is the kind of code I want to preserve:

```crystal
case line_to_parse
when /host=(.*)/
  hostname = $1
when /credentials=(.*?):(.*)/
  user = $1
  password = $2
end
```

Yes, the above example is the only reason worth keeping $1 in the language.

That said, I wouldn't mind removing them, and also removing $? (but for that we'll have to remove the backtick method too). You can write that with a series of if-else. It's uglier and more code to write, but it's semantically the same and it works:

```crystal
if match = line_to_parse.match(/host=(.*)/)
  hostname = match[1]
elsif match = line_to_parse.match(/credentials=(.*?):(.*)/)
  user = match[1]
  password = match[2]
end
```

We can use class variables to replace the global scope of dollar variables. For example, here is a rather naive and clumsy implementation:

```crystal
class String
  @@match : Regex::MatchData? = nil
  def self.match : Regex::MatchData
    @@match.not_nil!
  end

  def m(regex : Regex)
    @@match = self.match regex
  end
end
line_to_parse = "credentials=host:test"
case line_to_parse
when .m /host=(.*)/
  hostname = String.match[1]
when .m /credentials=(.*?):(.*)/
  _, user, password = String.match
end
p user, password # => "host" "test"
```

There are currently no global variables used, but you are suggesting replacing this with global variables. That definitely won't do.

This variant, using an instance variable instead, fails with an invalid memory access:

```crystal
class String
  @match : Regex::MatchData? = nil
  def match : Regex::MatchData
    @match.not_nil!
  end

  def m(regex : Regex)
    @match = self.match regex
  end
end

# Invalid memory access (signal 11) at address 0x562dd12ba3f0

line_to_parse = "credentials=host:test"
case line_to_parse
when .m /host=(.*)/
  hostname = line_to_parse.match[1]
when .m /credentials=(.*?):(.*)/
  _, user, password = line_to_parse.match
end
p user, password
```

If Regex#=== returned the MatchData object, case could be changed to optionally store the result of the comparison, and we wouldn't need $1:

```crystal
# Hypothetical syntax, not valid Crystal:
case line_to_parse and_keep_result_as |md|
when /host=(.*)/
  hostname = md[1]
when /credentials=(.*?):(.*)/
  user = md[1]
  password = md[2]
end
```

That might allow for some other interesting things to be done with === overloads.
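The proposal above hinges on what Regex#=== returns today. A minimal sketch of the current behaviour, assuming the standard library's Regex#=== (which returns a Bool and assigns $~ in the calling method); host_from is a hypothetical helper name:

```crystal
# Hypothetical helper, used only to show how Regex#=== behaves today.
def host_from(line : String) : String?
  # Regex#=== (what `when /.../` calls) only answers true or false...
  return nil unless /host=(.*)/ === line
  # ...the MatchData itself is only reachable through $~,
  # which === assigned in this method's scope.
  $~.try { |md| md[1] }
end

puts host_from("host=example.org") # => example.org
p host_from("user=alice")          # => nil
```

This is why a when branch currently has to go through $~ (or $1) to get at the captures.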

@ezrast That looks way too complicated to be worth thinking about :D
Then it would be better to just use if with match: less complexity in the language and only a little more code to write.

However, I don't see a real reason to remove these dollar accessors for regex groups. They might look a bit off, especially since there are no global variables. But in the end, I think I'd prefer to keep them because they're easy to use and I don't think there are any real issues with them.

Don't fix what's not broken, as they say :)

Well, it does make the implementation of the compiler more complex. And if you use them without a match you get an ugly exception, and this is not trivial to fix. So yeah, removing these will at least simplify the compiler implementation.
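A minimal sketch of the "ugly exception" case, assuming a method whose regex does not match its input; first_number is a hypothetical name. Since $~ ends up nil, reading $1 fails at runtime with a generic nil-assertion error instead of a message about the missing match:

```crystal
# Hypothetical example method: for "abc" the regex below finds no digits.
def first_number(text : String) : String
  text =~ /(\d+)/ # on no match, String#=~ sets $~ to nil
  $1              # reading a group from a nil $~ raises at runtime
end

begin
  first_number("abc")
rescue ex
  # The message is a generic nil assertion, not "no match".
  puts ex.message
end
```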

Unless there are typing / safety issues or future issues with incremental/modular compilation, I don't see a reason to remove them.

It's true that they are not essential, but they serve well for matching and extracting IMO.

@j8r note that $1, etc. are not global variables, so substituting them with class variables is not accurate.

They're not beautiful, but they're limited to being passed only a single call up the call stack and they're safe. There have been discussions about removing them before (started by me) and we arrived at the same conclusion as this thread. Closing.

If there are still bugs surrounding them and bad error messages, that should be their own issues.
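The "single call up the call stack" rule can be sketched as follows, assuming the standard library's String#=~ (which assigns $~ in the method that calls it); extract_port and outer are hypothetical names. Because extract_port never assigns $~ itself, its match should stay in its own scope and not reach outer:

```crystal
# String#=~ assigns $~ in the scope of its caller (here, extract_port),
# so $1 is readable right after the match.
def extract_port(address : String) : String?
  if address =~ /:(\d+)$/
    $1
  end
end

def outer : {String?, String}
  "name=crystal" =~ /name=(\w+)/ # sets $~ in outer's scope
  port = extract_port("localhost:3000")
  # extract_port does not assign $~ itself, so outer's $~ still
  # refers to the "name=crystal" match.
  {port, $1}
end

p outer
```

A method that wants to pass its match further up has to assign $~ explicitly, one level at a time.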

What about this solution?

```crystal
class Result
  property match_data : Regex::MatchData { raise "no match" }
  def initialize
  end
  forward_missing_to match_data
end

class String
  def match(regex : Regex, result : Result)
    if result && (match_data = match regex)
      result.match_data = match_data
    end
  end
end

result = Result.new
case "abcd0123"
when .match(/([0-9]+)/, result)
  puts result[1] # => 0123
end
```

Removing the dollar-sign variables would solve the cryptic "nil assertion failed" error (https://github.com/crystal-lang/crystal/issues/4776) and bring a more consistent style.
