PowerShell: switch -regex, unlike -match and -replace, doesn't respect the options of predefined [regex] instances

Created on 21 Feb 2019 · 9 Comments · Source: PowerShell/PowerShell

-match and -replace accept a pre-constructed [regex] instance in lieu of a _string_, respecting whatever options (or absence thereof) the instance was constructed with.

By contrast, switch -regex does not, which has two implications:

  • the specific options (or absence thereof) baked into the [regex] instance are _lost_ during matching,

  • presumably because the regex is converted to a _string_ first and then _recreated_ using PowerShell's _default_ options (i.e., IgnoreCase, CultureInvariant)

Aside from exhibiting unexpected _behavior_, performance suffers as well, due to the unnecessary recreation.
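The presumed stringify-and-recreate step can be imitated outside of switch. The following is only a sketch of that assumption, not the engine's actual code path; `$recreated` is a hypothetical stand-in for whatever switch -regex builds internally.

```powershell
# A [regex] built without options is case-sensitive (RegexOptions.None).
$regex = [regex]::new('A')
$regex.Options

# ASSUMPTION: stand-in for the presumed internal behavior of switch -Regex:
# take the pattern string of the instance and rebuild it case-insensitively.
$recreated = [regex]::new($regex.ToString(), 'IgnoreCase')

$regex.IsMatch('a')       # False: the original instance is case-sensitive
$recreated.IsMatch('a')   # True: the rebuilt instance has lost that property
```

Calling the instance's own `.IsMatch()` method takes PowerShell's operators out of the picture, which makes it a convenient baseline for what "as-is" matching should return.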

Steps to reproduce

Run the following Pester tests:

Describe "Support for predefined [regex] instances" {
  BeforeAll {
    # Create a case-SENSITIVE regex
    $regex = [regex]::new('A') 
  }
  It "-match uses the [regex] instance as-is" {
    'a' -match $regex | Should -Be $False
    'A' -match $regex | Should -Be $True
  }
  It "-replace uses the [regex] instance as-is" {
    'a' -replace $regex, 'replaced' | Should -Be 'a'
    'A' -replace $regex, 'replaced' | Should -Be 'replaced'
  }
  It "-switch uses the [regex] instance as-is" {
    $(switch -Regex ('a') { $regex { 'matched' } }) | Should -BeNullOrEmpty
    $(switch -Regex ('A') { $regex { 'matched' } }) | Should -Be 'matched'
  }
}

Expected behavior

All tests should pass.

Actual behavior

The switch -regex test fails as follows:

[-] -switch uses the [regex] instance as-is 21ms
      Expected $null or empty, but got matched.

That is, switch -Regex ('a') { $regex { 'matched' } } unexpectedly returned 'matched', even though 'a' would not have matched had the case-sensitivity of the $regex instance been respected.
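Until the behavior is settled, one possible way to sidestep the problem is to avoid -Regex altogether and call the instance from a script-block condition. This is a sketch of a workaround, not something proposed in the issue itself.

```powershell
# Case-sensitive instance, as in the Pester tests above.
$regex = [regex]::new('A')

# Without -Regex, a script-block condition is evaluated for each input ($_),
# so the instance's own options are the only ones in play.
switch ('a', 'A') {
  { $regex.IsMatch($_) } { "matched: $_" }
}
# Output: matched: A
```

The trade-off is that this form bypasses PowerShell's regex handling, so the automatic $Matches variable is not populated; use `$regex.Match($_)` if capture groups are needed.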

Environment data

PowerShell Core v6.2.0-preview.4 on macOS 10.14.2
PowerShell Core v6.2.0-preview.4 on Ubuntu 18.04.1 LTS
PowerShell Core v6.2.0-preview.4 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.471)
Windows PowerShell v5.1.17134.407 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.471)

Labels: Issue-Discussion, Issue-Enhancement, WG-Engine

All 9 comments

This only occurs when there's a mismatch between the case-sensitivity of the switch and the [regex] instance.

To illustrate what this means in practice, see this example:

# [regex] case-sensitive, ignores whitespace in pattern
$TwoLLNoSpace = [regex]::new('\p{Ll} \p{Ll}', 'IgnorePatternWhitespace')

# Case-sensitivity is the same, will work as expected
switch -Regex -CaseSensitive ('a b', 'ab') {
  $TwoLLNoSpace {
    $_
  }
}

# Case-sensitivity mismatch: the regex is recreated with IgnoreCase as its only option,
# so IgnorePatternWhitespace is lost and the wrong input matches ('a b' instead of 'ab')
switch -Regex ('a b', 'ab') {
  $TwoLLNoSpace {
    $_
  }
}

At the very least we should copy the existing Options value and only add/subtract the IgnoreCase flag as dictated by -CaseSensitive, something like this perhaps... but I'm not sure I agree that individual cases should be allowed to override the behavior of the parameter.
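The copy-and-adjust idea can be sketched in PowerShell itself. `Get-RegexForSwitch` is a hypothetical helper invented for illustration; it is not part of PowerShell and is not the linked prototype.

```powershell
# HYPOTHETICAL helper: returns a [regex] whose IgnoreCase flag follows -CaseSensitive,
# while every other option of the input instance is preserved.
function Get-RegexForSwitch {
  param(
    [Parameter(Mandatory)] [regex] $Regex,
    [switch] $CaseSensitive
  )
  $flag    = [int] [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
  $current = [int] $Regex.Options
  $wanted  = if ($CaseSensitive) {
    $current -band (-bnot $flag)   # clear IgnoreCase, keep everything else
  } else {
    $current -bor $flag            # set IgnoreCase, keep everything else
  }
  # Reuse the instance when nothing changes; rebuild it only on a mismatch.
  if ($wanted -eq $current) { return $Regex }
  [regex]::new($Regex.ToString(), [System.Text.RegularExpressions.RegexOptions] $wanted)
}

$TwoLLNoSpace = [regex]::new('\p{Ll} \p{Ll}', 'IgnorePatternWhitespace')
$adjusted = Get-RegexForSwitch -Regex $TwoLLNoSpace
$adjusted.Options          # IgnoreCase, IgnorePatternWhitespace
$adjusted.IsMatch('ab')    # True
$adjusted.IsMatch('a b')   # False
```

With the options carried over, the whitespace in the pattern is still ignored after the case-sensitivity adjustment, so 'ab' matches and 'a b' does not.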

Thanks for the sleuthing, @IISResetMe - you've found the crux of the inconsistency:

  • switch -regex [-casesensitive] makes its own (implied) case-sensitivity override the options of the predefined [regex] instance.

  • whereas -match (== -imatch) / -cmatch and -replace (== -ireplace) / -creplace do not: they respect whatever options are built into the [regex] instance.

To wit (applies to -replace too):

PS> 'a' -cmatch ([regex]::new('A', 'IgnoreCase'))
True # The 'c' in -cmatch was ignored in favor of the options built into the [regex]
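The -replace counterpart of the same check might look as follows. It is a sketch based on the claim above and on the Pester tests in the issue; the variable name `$insensitive` is only illustrative.

```powershell
# Case-INsensitive instance, used with the nominally case-sensitive operator.
$insensitive = [regex]::new('A', 'IgnoreCase')

# If the instance's options take precedence, as the issue reports, the 'c' in
# -creplace has no effect and the lowercase input is replaced.
'a' -creplace $insensitive, 'replaced'
```

If the operator honored its own case-sensitivity instead, the input would come back unchanged as 'a'.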

I created this issue based on this expectation, given that I'd always seen -match and -replace operate that way (also in Windows PowerShell).

While you could argue that the inconsistency should be resolved the way that switch -regex works, doing so would be a breaking change. [_Update_: of course, so would changing the behavior of switch -regex.]

While there is undoubtedly a contradiction, I think it is defensible - though it requires documentation - to give precedence to the options baked into a predefined [regex] instance.

Predefining a [regex] is an advanced technique anyway, whose primary purpose to me is as a _performance optimization_ with _high iteration counts_, so as to use the given instance _as-is_, without incurring the cost of regex construction.

However, note that even with matching options the code doesn't run any faster, at least in PowerShell _Core_ - see #8976.
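A rough way to check the performance claim locally is to time both variants. The input list, pattern, and iteration count below are arbitrary values chosen for illustration, and absolute timings will vary by machine and PowerShell version.

```powershell
# ASSUMPTION: arbitrary sample data and pattern, picked only for this sketch.
$inputs  = 1..100000 | ForEach-Object { "item$_" }
$pattern = '^item\d+5$'

# IgnoreCase aligns the instance with the default case-insensitivity of switch -Regex.
$regex = [regex]::new($pattern, 'IgnoreCase')

$withString = Measure-Command {
  switch -Regex ($inputs) { $pattern { $null = $_ } }
}
$withInstance = Measure-Command {
  switch -Regex ($inputs) { $regex { $null = $_ } }
}

'string pattern:  {0:N0} ms' -f $withString.TotalMilliseconds
'[regex] instance: {0:N0} ms' -f $withInstance.TotalMilliseconds
```

Running the comparison several times and discarding the first run helps to separate a real difference from warm-up noise.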

> However, note that even with matching options the precompiled / predefined [regex] instance is _not_ used as-is, at least in PowerShell _Core_ - see #8976.

I think that's inaccurate - when case-sensitivity is aligned, the passed-in [regex] instance is indeed used: https://github.com/PowerShell/PowerShell/blob/5d54f1aa3871a826409496437e25856dc263ccc4/src/System.Management.Automation/engine/runtime/Operations/MiscOps.cs#L2306-L2309

Thanks, @IISResetMe - I had misread the source code; I've updated my previous comment accordingly; that said, the problem of the optimization _not being effective_ remains - any idea why?

Please discuss the problem in #8976.

@iSazonov: No, this should remain a separate issue, because its focus is (now) on the _inconsistent behavior_ between -match / -replace on the one hand, and switch -regex on the other.

To resolve this, we need to decide on:

  • whether to change existing behavior - which would be a breaking change either way (changing switch -regex's behavior to that of -match / -replace or vice versa).

    • Sticking with switch -regex's current behavior would still require _fixing_ it, though, given that, as @IISResetMe demonstrated above, options _other_ than case-sensitivity aren't being honored (copied over when recreating the regex); @IISResetMe has prototyped that fix here.
  • whether to simply _document_ the inconsistency.

@mklement0 My comment was about "any idea why?".

Got it, @iSazonov, thanks; on the bright side, my misinterpretation of your comment got me to summarize _this_ issue...

> my misinterpretation of your comment got me to summarize this issue...

:-) I am ready to add more comments for you.
