Regexp.compile seems to ignore multiLine=true.
Source Code:

    use Regexp;
    config var multiLine = true;
    const reLeadingWhitespace = compile("^a$", multiLine=multiLine);
    {
      // single line, no trailing newline
      const s = "a";
      var indents = reLeadingWhitespace.matches(s);
      writeln('indents.size:', indents.size);
    }
    {
      // trailing newline: in multi-line mode, $ should also match before "\n"
      const s = "a\n";
      var indents = reLeadingWhitespace.matches(s);
      writeln('indents.size:', indents.size);
    }

Output of running the above program:

    ./example --multiLine=true
    1
    0
    ./example --multiLine=false
    1
    0

Expected output:

    ./example --multiLine=true
    1
    1
    ./example --multiLine=false
    1
    0

See the associated regexr example as a correctness reference.
chpl --version: chpl version 1.23.0 pre-release (188e1b3772)

I chatted a bit about this issue with @e-kayrakli and @mppf offline.
Some next steps to consider:

- opts.multiline == 'm' in the re2 wrapper code?

I dug a bit deeper and I think this is a documentation issue (re2 itself has pretty limited documentation).
multiLine has no effect (it is hardwired to false) unless you use POSIX syntax, which is not the default. The workaround is either to use POSIX syntax by passing posix=true to compile, or to use the m flag in the regular expression itself. So the above code can be rewritten as:

    use Regexp;
    config var multiLine = true;
    var regexpString = "^a$";
    // re2 ignores multiLine= unless posix=true, so set the m flag in the pattern
    if multiLine then
      regexpString = "(?m:"+regexpString+")";
    else
      regexpString = "(?:"+regexpString+")";
    const reLeadingWhitespace = compile(regexpString);
    {
      const s = "a";
      var indents = reLeadingWhitespace.matches(s);
      writeln('indents.size:', indents.size);
    }
    {
      const s = "a\n";
      var indents = reLeadingWhitespace.matches(s);
      writeln('indents.size:', indents.size);
    }

where the m flag is applied to the group.
See https://github.com/google/re2/blob/e48b461c1e3e09574300587672c2498b77bc24dc/re2/re2.h#L579-L585
Things we can do:
We should definitely do 1.
I think we can also do 2, but it would have to be a runtime warning, since these flags are not params. We could add param overloads that issue compilerWarnings, but then only those calls would be warned at compile time, and I am not sure that asymmetry is a good idea. Since this probably won't happen in performance-critical code, runtime warnings shouldn't be a big deal beyond the annoyance. We could also consider adding a CHPL_REGEXP_QUIET (or something similar) to control the behavior.
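A minimal sketch of option 2 as a runtime check, assuming it would be called from inside compile. `checkRegexpFlags` and the `quietRegexpWarnings` config const are hypothetical names (the config const stands in for the CHPL_REGEXP_QUIET idea); this is not the Regexp module's actual code. It uses Chapel's built-in `warning()`, which prints to stderr and lets execution continue.

```chapel
// Hypothetical switch standing in for the CHPL_REGEXP_QUIET idea;
// pass --quietRegexpWarnings=true at run time to silence the warning.
config const quietRegexpWarnings = false;

// Hypothetical check that compile() could call. The flags are ordinary
// runtime values (not params), so the warning is issued at run time.
// Returns true if a warning was printed.
proc checkRegexpFlags(posix: bool, multiLine: bool): bool {
  if multiLine && !posix && !quietRegexpWarnings {
    // per the discussion, re2 ignores this setting unless POSIX syntax is on
    warning("Regexp: multiLine=true has no effect unless posix=true; " +
            "consider using the (?m) flag in the pattern instead");
    return true;
  }
  return false;
}

checkRegexpFlags(posix=false, multiLine=true);   // warns
checkRegexpFlags(posix=true, multiLine=true);    // no warning
```

A param-overload variant could use compilerWarning instead, but only for calls whose flags are known at compile time, which is the asymmetry mentioned above.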
@e-kayrakli, do you know of a reason why we can't just put this

    if multiLine then regexpString = "(?m:"+regexpString+")";

in the Regexp module itself, based on the already existing multiLine option?
I don't, and it may be an option.
But I don't have experience passing flags to ~capture~ groups like this.
If the user-provided regexpString already contains groups, would wrapping them in (?m: ... ) be problematic? I don't know how nested groups work with re2.
I believe that the (?m:) syntax does not create a new capture group (but if it does there is other syntax to ask it not to do that).
> I believe that the (?m:) syntax does not create a new capture group (but if it does there is other syntax to ask it not to do that).
That's right, fixed my comment above.
Nonetheless, I just wanted to express my uncertainty about introducing a nesting level with that approach. @ben-albrecht captured that concern much better than I did.
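A minimal sketch of the module-side idea, assuming the Regexp module would rewrite the pattern string before handing it to re2. `applyMultiLine` and `wrapMultiLine` are hypothetical helpers, not part of the Regexp module. According to the re2 syntax documentation, both `(?m:re)` and the bare `(?m)` form are non-capturing, so neither shifts the numbering of the user's capture groups. The `(?m)` prefix sets the flag for the rest of the pattern without adding a nesting level, which may sidestep the nesting concern; the wrapping form is shown for comparison.

```chapel
// Hypothetical helper: prefix the pattern with (?m), which sets the
// multi-line flag for the rest of the pattern and does not capture.
proc applyMultiLine(pattern: string, multiLine: bool): string {
  if multiLine then
    return "(?m)" + pattern;
  return pattern;
}

// Hypothetical helper: the wrapping approach proposed above, which adds
// one non-capturing group around the whole pattern.
proc wrapMultiLine(pattern: string, multiLine: bool): string {
  if multiLine then
    return "(?m:" + pattern + ")";
  return pattern;
}

writeln(applyMultiLine("^a$", true));  // prints (?m)^a$
writeln(wrapMultiLine("^a$", true));   // prints (?m:^a$)
```

Either result would then be passed to compile in place of the original pattern.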