STL: <regex>: vNext overhaul

Created on 22 Dec 2019 · 8 Comments · Source: microsoft/STL

Just a question: it seems an ABI-breaking release is on the (distant) horizon, and I was wondering whether you'll use the opportunity to overhaul std::regex or whether you have practically given up on it due to the vicious cycle of a small user base and bad performance.

Also tracked by Microsoft-internal VSO-110128 / AB#110128 and VSO-177627 / AB#177627.

vNext note: Resolving this issue will require breaking binary compatibility. We won't be able to accept pull requests for this issue until the vNext branch is available. See #169 for more information.

Labels: decision needed, enhancement, vNext

All 8 comments

We want to overhaul regex because this will be our only opportunity to fix its longstanding correctness and performance issues for the next N years, but first we need to decide how to do so. We could:

  • Write a new implementation from scratch, with community input.
  • Use an existing Standard-compatible implementation; my vague understanding is that libc++'s implementation is reasonably correct but not especially high performance, while Boost's implementation may have accumulated more performance improvements but also has non-Standard behavior and Boost dependencies that would need to be excised.
  • Adapt a high-performance engine to the Standard's semantics and conventions.

Glad to hear it. I can't really give an informed opinion on which strategy would be the best. I'm just a user.

The C++ standard derives its default regex grammar from the ECMAScript standard, so adapting the regex code from a JavaScript engine (like V8 or Chakra), and extending it for the C++ additions and the other supported grammars, sounds like a viable option.

Note that licensing is important; we can consume code under both the Boost Software License and the Apache License v2.0 with LLVM Exception. Before attempting to consume code from a JavaScript engine, we would need to investigate its license for viability.

V8 is BSD-licensed; Chakra is MIT-licensed (Chakra being a Microsoft project).

The following Microsoft-internal bugs are associated with this issue:

  • VSO-110128
  • VSO-177627
  • VSO-186671 "User Story: RegEx Refactoring (Bin Compat Breaking Changes)"

Did I hear correctly that the committee wants to deprecate std::regex? In that case it probably doesn't make sense to invest too much time in regex beyond fixing bugs.

Even if <regex> ends up deprecated in the Standard, there's still a lot of code out there using it that doesn't deserve to be bitten by (1) our multiline nonconformance, (2) our 100x+ perf penalties, and (3) other bugs.
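
For readers who want to see what a multiline difference can look like in practice, here is a minimal probe. It assumes (this is my reading, not something stated in the thread) that the nonconformance concerns whether `^` matches after a line terminator when no multiline option is requested. The helper name `count_matches` is hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts non-overlapping matches of `pattern` in `text`,
// using the default ECMAScript grammar with no extra flags.
std::ptrdiff_t count_matches(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    return std::distance(std::sregex_iterator(text.begin(), text.end(), re),
                         std::sregex_iterator());
}
```

Calling `count_matches("^\\w+", "one\ntwo\nthree")` should show which behavior a given library has: a result of 1 would mean `^` matched only at the start of the input, while 3 would mean it also matched after each newline.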

> The C++ standard derives its default regex grammar from the ECMAScript standard, so adapting the regex code from a JavaScript engine (like V8 or Chakra), and extending it for the C++ additions and the other supported grammars, sounds like a viable option.

Sadly, ECMAScript is but one of the 6 (yes, 6) grammars supported in the Standard. :(
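
To make the grammar point concrete: as far as I can tell, `std::regex_constants` names six grammar flags (`ECMAScript`, `basic`, `extended`, `awk`, `grep`, `egrep`), and the same pattern text can mean different things under each. A small sketch follows; the helper name `matches_with` is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `text` fully matches `pattern`
// when the pattern is compiled under the given grammar flag.
bool matches_with(const std::string& pattern, const std::string& text,
                  std::regex_constants::syntax_option_type grammar) {
    const std::regex re(pattern, grammar);
    return std::regex_match(text, re);
}
```

For example, `a+` is "one or more a" under `ECMAScript` and `extended`, but under `basic` (POSIX basic regular expressions) `+` is an ordinary character, so the pattern only matches the literal text `a+`. An engine borrowed from a JavaScript implementation would cover only the first of these.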
