In Ruby, String and Regexp both have a match? method which returns true or false depending on whether there is a match, rather than a MatchData.
This is quite useful, as often all you want to know is _if_ there is a match, rather than the details.
So for example, if I want to know whether all the letters in a content string are in caps (i.e. "yelling"), but don't need the details,
I'd just like to use match? and get back true or false,
e.g.
[6] pry(main)> 'THE dog IS NOT IN CAPS'.match?(/^[^a-z]*[A-Z][^a-z]*$/)
=> false
[7] pry(main)> 'THE CAT IS IN CAPS'.match?(/^[^a-z]*[A-Z][^a-z]*$/)
=> true
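As a small sketch of the difference in Ruby, the snippet below wraps the pattern from the pry session in a predicate and contrasts it with match, which returns a MatchData (or nil) and also sets $~. The names YELLING, yelling?, last_match_after_predicate and last_match_after_match are made up for this illustration and are not part of any library.

```ruby
# Pattern from the pry session above: at least one capital letter
# and no lowercase letters on the line.
YELLING = /^[^a-z]*[A-Z][^a-z]*$/

# Hypothetical helper: only answers "does it match?".
def yelling?(text)
  text.match?(YELLING)
end

# Hypothetical helper: match? leaves the last-match variable $~ untouched.
def last_match_after_predicate(text)
  text.match?(YELLING)
  $~
end

# Hypothetical helper: match builds a MatchData and stores it in $~.
def last_match_after_match(text)
  text.match(YELLING)
  $~
end

puts yelling?('THE dog IS NOT IN CAPS')                    # false
puts yelling?('THE CAT IS IN CAPS')                        # true
p last_match_after_predicate('THE CAT IS IN CAPS')         # nil
puts last_match_after_match('THE CAT IS IN CAPS').class    # MatchData
```

When only the yes/no answer is needed, the predicate form skips building the MatchData entirely.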
At the moment, Crystal's String and Regex only support match : MatchData?, not match?.
Achieving the same in Crystal ("spare me the details - just tell me if it matches") is possible with a double bang...
$ cat issue.cr
puts !!"THE dog IS NOT IN CAPS".match(/^[^a-z]*[A-Z][^a-z]*$/)
puts !!"THE CAT IS IN CAPS".match(/^[^a-z]*[A-Z][^a-z]*$/)
$ crystal issue.cr
false
true
...but I think it would be more intuitive (and readable) if there were a match? : Bool.
I think this would be nice to have, though I guess it should be named matches?.
The advantage would be avoiding the creation of a MatchData, so it would be slightly faster. But we should check whether it's really faster: MatchData in Crystal is a struct, though some memory is allocated by PCRE, and I don't know if that can be avoided.
If it's really faster I'd be okay with it; if not, I think MatchData? is just fine, since nil is falsey in any boolean expression.
man pcreapi says:
If neither the actual string matched nor any captured substrings are of interest,
pcre_exec() may be called with ovector passed as NULL and ovecsize as zero.
So we can at least avoid the memory allocation for PCRE substring captures. However, I don't know how much this affects performance.
I implemented matches? naively:
patch:
diff --git a/src/regex.cr b/src/regex.cr
index db5ad35cf..d6aeee121 100644
--- a/src/regex.cr
+++ b/src/regex.cr
@@ -481,8 +481,7 @@ class Regex
ovector_size = (@captures + 1) * 3
ovector = Pointer(Int32).malloc(ovector_size)
- ret = LibPCRE.exec(@re, @extra, str, str.bytesize, byte_index, (options | Options::NO_UTF8_CHECK), ovector, ovector_size)
- if ret > 0
+ if internal_matches?(str, byte_index, options, ovector, ovector_size)
match = MatchData.new(self, @re, str, byte_index, ovector, @captures)
else
match = nil
@@ -491,6 +490,26 @@ class Regex
$~ = match
end
+ def matches?(str, pos = 0, options = Regex::Options::None) : Bool
+ if byte_index = str.char_index_to_byte_index(pos)
+ matches_at_byte_index?(str, byte_index, options)
+ else
+ false
+ end
+ end
+
+ def matches_at_byte_index?(str, byte_index = 0, options = Regex::Options::None) : Bool
+ return false if byte_index > str.bytesize
+
+ internal_matches?(str, byte_index, options, nil, 0)
+ end
+
+ private def internal_matches?(str, byte_index, options, ovector, ovector_size)
+ ret = LibPCRE.exec(@re, @extra, str, str.bytesize, byte_index, (options | Options::NO_UTF8_CHECK), ovector, ovector_size)
+ # TODO: when `ret < -1`, it means PCRE error. It should handle correctly.
+ ret >= 0
+ end
+
# Returns a `Hash` where the values are the names of capture groups and the
# keys are their indexes. Non-named capture groups will not have entries in
# the `Hash`. Capture groups are indexed starting from `1`.
benchmark script:
require "benchmark"
# This regular expression comes from marked.js.
# https://github.com/markedjs/marked/blob/master/lib/marked.js#L466-L469
regex = /^!?\[((?:\[[^\[\]]*\]|\\.|`[^`]*`|[^\[\]\\`])*?)\]\(\s*(<(?:\\[<>]?|[^\s<>\\])*>|[^\s\x00-\x1f]*)(?:\s+("(?:\\"?|[^"\\])*"|'(?:\\'?|[^'\\])*'|\((?:\\\)?|[^)\\])*\)))?\s*\)/
README = File.read("README.md")
Benchmark.ips do |x|
x.report("match") { README.size.times { |i| regex.match(README, i) } }
x.report("matches?") { README.size.times { |i| regex.matches?(README, i) } }
end
Compiling the benchmark script with --release and running it under the crystal/crystal repo gives:
match 3.43k (291.45µs) (± 5.38%) 201kB/op 1.73× slower
matches? 5.95k (168.01µs) (± 6.00%) 0.0B/op fastest
I'll open a new PR if needed.
Yes, please!