Sdk: Bring Dart's RegExp support in line with JavaScript: lookbehinds, property escapes, and named groups.

Created on 25 Oct 2018 · 19 Comments · Source: dart-lang/sdk

So, it appears that JavaScript now supports positive lookbehinds (e.g. "(?<=foo)"), Unicode property escapes (e.g. "\p{Script=Greek}"), and named capture groups (e.g. "(?<foo>.*)"), at least according to the next release of the ECMA-262 specification. They are already supported in Chrome 64 and later.

https://github.com/tc39/proposal-regexp-named-groups
https://github.com/tc39/proposal-regexp-lookbehind
https://github.com/tc39/proposal-regexp-unicode-property-escapes
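For reference, here is a minimal sketch of the three constructs as they would be written in Dart. It assumes an SDK whose native RegExp engine accepts them (Dart 2.4 or later, going by the later comments in this thread). The helper names are made up for illustration.

```dart
// Assumption: Dart 2.4+ (native engine supports these constructs).
// All function names below are hypothetical helpers, not SDK APIs.

/// Returns the digits that directly follow a '$' sign (positive lookbehind).
String? amountAfterDollar(String input) =>
    RegExp(r'(?<=\$)\d+').firstMatch(input)?.group(0);

/// Returns true if [input] contains a Greek-script character.
/// Property escapes require `unicode: true`.
bool containsGreek(String input) =>
    RegExp(r'\p{Script=Greek}', unicode: true).hasMatch(input);

/// Extracts the year from a "YYYY-MM" string via a named capture group.
String? yearOf(String date) => RegExp(r'(?<year>\d{4})-(?<month>\d{2})')
    .firstMatch(date)
    ?.namedGroup('year');

void main() {
  print(amountAfterDollar('price: \$42')); // 42
  print(containsGreek('\u03B1\u03B2')); // true (Greek alpha, beta)
  print(yearOf('2018-10')); // 2018
}
```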

Unfortunately, the Dart VM/AOT compiler version of RegExp (i.e. when not compiled to JS) doesn't seem to support them, so the statement "Dart regular expressions have the same syntax and semantics as JavaScript regular expressions" isn't really true anymore. Some regular expressions work on DartPad, but not in Flutter, for instance.

Dart should implement these too: they're also super useful.

area-library area-vm type-bug


All 19 comments

We have a discrepancy now. Dart compiled for the web does support these features, native Dart (including Flutter) does not. It's possible for people to write libraries, e.g., using DDC, that are intended to be portable, but which fail when used on Flutter.

There have been Stack Overflow answers suggesting look-behind, and the authors even tested them in DartPad first. I can't fault them, but it's not a valid answer for a Flutter question.

We should address this in some way. Either:

  • Say that it's deliberate, and do nothing. Web users get to use the full power of ECMAScript RegExps.
  • Make a plan to update the VM RegExp engine.
  • Do something to catch VM-invalid RegExps in browser code (e.g., a debug-mode only syntax check).

Since most RegExps are string literals, the analyzer could already warn when a RegExp is not a valid ES5.1 RegExp.
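To make the "syntax check" idea concrete, here is a rough sketch of what a debug-mode check might look for. It is a naive textual heuristic, not an actual analyzer or SDK feature: it does not parse the pattern, so it can misfire on escaped sequences or character classes. The name `usesPostEs5Syntax` is hypothetical.

```dart
// Hypothetical helper: NOT part of the Dart SDK or the analyzer.
// Flags pattern sources that appear to use syntax newer than ES5.1:
//   (?<= / (?<!   lookbehind
//   (?<name>      named capture group
//   \k<name>      named back-reference
//   \p{..} \P{..} Unicode property escape
final RegExp _postEs5Syntax =
    RegExp(r'\(\?<[=!]|\(\?<[A-Za-z_$]|\\k<|\\[pP]\{');

/// Heuristic only: scans the raw pattern text without parsing it.
bool usesPostEs5Syntax(String pattern) => _postEs5Syntax.hasMatch(pattern);

void main() {
  print(usesPostEs5Syntax(r'(?<=foo)bar')); // true
  print(usesPostEs5Syntax(r'(?<year>\d{4})')); // true
  print(usesPostEs5Syntax(r'(?:foo)bar')); // false
}
```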

Also, with the announcement of Hummingbird, the story gets even more muddled: someone could implement a full working Flutter example and even share a link to it on SO, and be confused as to why it fails on Flutter when compiled for mobile.

Yes, there could be an analyzer check that warns users building for mobile that they can't use features meant for the web, but that feels like more of an excuse than a real solution.

I think this fractures Flutter's "write once, run anywhere" magic that Dart powers so well for virtually everything else.

Just to clarify: We have announced Hummingbird as an experimental project, not as an officially complete & supported product.

But, yes, we'll need to work through this problem, and resolve it to preserve good compatibility across our various host targets.

Moving to @kevmoo to track Hummingbird requirements. Maybe we need a label for these.

Yes, of course, it's just a tech preview. But Hummingbird isn't the only reason to want this: if someone shares business logic between AngularDart and mobile, they'll have similar issues.

Agreed!

I think this is a great midsize project for somebody getting familiar with the VM. Assigning to @sstrickl

We might also want to support the dotAll flag (/s) which makes . match any character, not just non-newline characters. It's not strictly necessary. Since it requires a flag that we don't pass to the JS RegExp, the current Dart RegExp doesn't support this feature on the web.
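As an illustration of what the flag changes: Dart releases after this discussion expose it as a `dotAll` named parameter on the `RegExp` constructor. The sketch below assumes such an SDK; `matchesAcrossNewline` is a made-up helper name.

```dart
// Assumption: an SDK where RegExp accepts `dotAll:` (added after this thread).

/// Returns whether `first.second` matches text whose two words are
/// separated by a newline, with or without the dotAll flag.
bool matchesAcrossNewline({required bool dotAll}) =>
    RegExp(r'first.second', dotAll: dotAll).hasMatch('first\nsecond');

void main() {
  print(matchesAcrossNewline(dotAll: false)); // false: '.' skips '\n'
  print(matchesAcrossNewline(dotAll: true)); // true: '.' matches '\n'
}
```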

The same can be said for the Unicode flag (/u), but it's a useful feature, so we want to provide access to it. It's not urgent, but it is important.
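A small sketch of one visible effect of the Unicode flag, assuming an SDK where `RegExp` accepts `unicode:`. Without the flag, `.` consumes a single UTF-16 code unit; with it, `.` consumes a whole code point. `dotMatchLength` is a hypothetical helper.

```dart
// Assumption: an SDK where RegExp accepts `unicode:`.

/// Length, in UTF-16 code units, of what a single '.' matches at the
/// start of [input]. [input] must be non-empty and not start with a
/// line terminator.
int dotMatchLength(String input, {required bool unicode}) =>
    RegExp(r'.', unicode: unicode).firstMatch(input)!.group(0)!.length;

void main() {
  const emoji = '\u{1F600}'; // one code point, two UTF-16 code units
  print(dotMatchLength(emoji, unicode: false)); // 1 (half a surrogate pair)
  print(dotMatchLength(emoji, unicode: true)); // 2 (the whole code point)
}
```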

The sticky (/y) flag is not something we need directly; we already support that with our matchAsPrefix operation on Pattern. The JS compiled version of that function could use the sticky flag if it's available (can we feature detect that?).
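For readers unfamiliar with it, `Pattern.matchAsPrefix` only succeeds if the match begins exactly at the given index, which is the behavior a sticky regexp provides in JavaScript. A minimal sketch (`numberAt` is a hypothetical helper):

```dart
/// Returns the run of digits starting exactly at [index], or null if the
/// character at [index] is not a digit. Unlike firstMatch, this never
/// scans forward to find a later match.
String? numberAt(String input, int index) =>
    RegExp(r'\d+').matchAsPrefix(input, index)?.group(0);

void main() {
  print(numberAt('abc123', 3)); // 123
  print(numberAt('abc123', 0)); // null
  print(RegExp(r'\d+').firstMatch('abc123')?.group(0)); // 123 (scans ahead)
}
```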

Features enabled entirely by RegExp syntax are the big problem here, because those features may already be available when running on the web, and they then fail when run on Flutter or the VM.

I think adding Unicode property group support (which I've been working on) definitely requires appropriate handling of non-BMP characters (and thus the Unicode flag) if you want to ensure that dart2js and the Dart VM match here. For one, JavaScript only allows them in /u mode anyway. In addition, certain Unicode property groups that seem like they'd see fairly common use, like Decimal_Number, contain non-BMP characters. Without that support, a regexp that uses \p{Decimal_Number} would have different matching semantics in dart2js and the Dart VM if we tried to allow it in BMP mode (i.e., restricting the property groups to only their BMP characters and throwing out the others).
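To illustrate the Decimal_Number point: the mathematical bold digits (U+1D7CE onward) are decimal digits that live outside the BMP, so a BMP-only implementation would reject input that a full implementation accepts. A sketch assuming an SDK with `unicode: true` support; `isAllDecimalDigits` is a made-up name.

```dart
// Assumption: an SDK where RegExp supports `unicode: true` and \p{...}.

/// True if [input] is non-empty and consists only of characters in the
/// Decimal_Number general category.
bool isAllDecimalDigits(String input) =>
    RegExp(r'^\p{Decimal_Number}+$', unicode: true).hasMatch(input);

void main() {
  print(isAllDecimalDigits('2018')); // true
  // MATHEMATICAL BOLD DIGIT ZERO and ONE: non-BMP decimal digits.
  print(isAllDecimalDigits('\u{1D7CE}\u{1D7CF}')); // true
  print(isAllDecimalDigits('20a')); // false
}
```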

Wait, since \p (and \P) is only allowed in Unicode mode, does dart2js just always use the Unicode flag with regular expressions, or does it attempt to detect regexps that use /u-only features and turn it on if they're found? If the former, that's an even bigger issue than I thought, as there are a fair number of changes in semantics of RegExp parsing/matching when that's enabled.

dart2js RegExp has not changed substantially since it was first written, well before ES2015. It completely ignores the possibility of /u. I'd call it a big issue.
Another issue is browser compatibility. We are trying to deprecate IE11, but we have not done so yet.

There are two issues here:

  • Current incompatibility between VM and dart2js,
  • Lack of features compared to JS.

The former is the more urgent issue. Users can write code that acts differently when compiled for the VM or JavaScript. That issue is restricted to non-flag based changes to RegExp grammar: look-behinds and named capture groups. Those features work in JS RegExps with no new flags, so Dart compiled to JS can use them, and Dart compiled for the VM cannot.

There are other new features in JavaScript regexps that require flags to enable. Since the Dart JS RegExp code does not pass those flags, the features are not available.
This includes unicode regexps (/u and the syntax changes enabled by that flag) and dot-all (/s).
We will want to catch up with those features as well, but that requires changing the RegExp object (in order to add isUnicode and isDotAll getters like we have for the other flags). We could make UnicodeRegExp a new class, but that still only handles one of the flags, and there is no design reason to do it.

Arguably, we should make named capture groups available from the Match object by name, but we can choose to not do that yet (they are also available as positional groups). We can even stealth-introduce it by letting our RegExp implementation return a RegExpMatch with more features, without changing the RegExp interface. Then only users aware of this would be able to down-cast the Match to a RegExpMatch and access the new features. We can then combine that with the other changes above for when we need to change the RegExp interface, and only break things once (if anyone implements RegExp, which they probably shouldn't).
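A sketch of the down-cast approach described above, assuming an SDK that ships a `RegExpMatch` type with `namedGroup` (as later Dart releases do). Code that only knows about `Match` keeps working through positional groups. `yearFrom` is a hypothetical helper.

```dart
// Assumption: an SDK that provides RegExpMatch.namedGroup.

final RegExp _date = RegExp(r'(?<year>\d{4})-(?<month>\d{2})');

/// Reads the year by name when the match is a RegExpMatch, and falls
/// back to the positional group otherwise.
String? yearFrom(Match match) {
  if (match is RegExpMatch) return match.namedGroup('year');
  return match.group(1);
}

void main() {
  final Match match = _date.firstMatch('2018-10')!;
  print(yearFrom(match)); // 2018
  print(match.group(1)); // 2018 (same group, by position)
  print(match.group(2)); // 10
}
```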

Reopening due to revert in 9238e253055b0028, checking into issue that caused revert now.

Did this ever get implemented?

Yes, it was part of the 2.4 release.

Also the Unicode part? I'm not seeing \p{Lu} detecting uppercase letters.

@larssn Did you pass unicode: true to the RegExp constructor?

No I did not! 😄

Perfect, it works. Thank you.
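For anyone landing here with the same question, a minimal sketch of the difference the flag makes (`hasUppercase` is a made-up helper; assumes Dart 2.4 or later):

```dart
// Assumption: Dart 2.4+, where \p{...} is recognized in unicode mode.

/// True if [input] contains an uppercase letter (general category Lu).
/// With `unicode: false`, \p is not treated as a property escape, so
/// ordinary uppercase letters are not detected.
bool hasUppercase(String input, {bool unicode = true}) =>
    RegExp(r'\p{Lu}', unicode: unicode).hasMatch(input);

void main() {
  print(hasUppercase('dart Sdk')); // true
  print(hasUppercase('dart sdk')); // false
  print(hasUppercase('dart Sdk', unicode: false)); // false
}
```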

Hi @larssn and @mraleph. I am using p{Myanmar} to detect the Burmese language in my Flutter app, but it doesn't work. I also pass unicode: true to the constructor. Would you mind helping me? I am new to Flutter.

This issue is closed, so you are more likely to get a timely response if you post your question as a new issue.

In this case, I believe there are two issues with the code.
First, the pattern should be a raw string, to avoid the \p being converted to just p. Second, the Myanmar script is a value for the Script property, not a category by itself, so I believe the pattern should be declared as:

Pattern pattern = r'\p{Script=Myanmar}';

With that change, it should work.
Running the following code on DartPad or on the native VM works for me:

  var re = RegExp(r"\p{Script=Myanmar}+", unicode: true);
  print(re.firstMatch("မြန်မာဘာသာ")?[0]);  // prints မြန်မာဘာသာ