-match and -replace accept a pre-constructed [regex] instance in lieu of a _string_, respecting whatever options (or absence thereof) the instance was constructed with.
By contrast, switch -regex does not, which has two implications:
- the specific options (or absence thereof) baked into the [regex] instance are _lost_ during matching, presumably because the regex is converted to a _string_ first and then _recreated_ using PowerShell's _default_ options (i.e., IgnoreCase, CultureInvariant);
- aside from the unexpected _behavior_, performance suffers as well, due to the unnecessary recreation.
Run the following Pester tests:
Describe "Support for predefined [regex] instances" {
    BeforeAll {
        # Create a case-SENSITIVE regex
        $regex = [regex]::new('A')
    }
    It "-match uses the [regex] instance as-is" {
        'a' -match $regex | Should -Be $False
        'A' -match $regex | Should -Be $True
    }
    It "-replace uses the [regex] instance as-is" {
        'a' -replace $regex, 'replaced' | Should -Be 'a'
        'A' -replace $regex, 'replaced' | Should -Be 'replaced'
    }
    It "-switch uses the [regex] instance as-is" {
        $(switch -Regex ('a') { $regex { 'matched' } }) | Should -BeNullOrEmpty
        $(switch -Regex ('A') { $regex { 'matched' } }) | Should -Be 'matched'
    }
}
All tests should pass.
The switch -regex test fails as follows:
[-] -switch uses the [regex] instance as-is 21ms
Expected $null or empty, but got matched.
That is, switch -Regex ('a') { $regex { 'matched' } } unexpectedly returned 'matched', even though it shouldn't have matched had it respected the case-sensitivity of the $regex instance.
Environment data:
- PowerShell Core v6.2.0-preview.4 on macOS 10.14.2
- PowerShell Core v6.2.0-preview.4 on Ubuntu 18.04.1 LTS
- PowerShell Core v6.2.0-preview.4 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.471)
- Windows PowerShell v5.1.17134.407 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.471)
This only occurs when the case-sensitivity of the switch statement (i.e., whether -CaseSensitive is specified) differs from that of the [regex] instance.
To illustrate what this means in practice, see this example:
# [regex] case-sensitive, ignores whitespace in pattern
$TwoLLNoSpace = [regex]::new('\p{Ll} \p{Ll}', 'IgnorePatternWhitespace')
# Case-sensitivity is the same: the instance is used as-is, so only 'ab' is output
switch -Regex -CaseSensitive ('a b', 'ab') {
    $TwoLLNoSpace {
        $_
    }
}
# Case-sensitivity mismatch: the regex is recreated with IgnoreCase as its only option,
# so IgnorePatternWhitespace is lost and the wrong input ('a b') matches
switch -Regex ('a b', 'ab') {
    $TwoLLNoSpace {
        $_
    }
}
At the very least, we should copy the existing Options value and only add or remove the IgnoreCase flag as dictated by -CaseSensitive (something like this, perhaps), but I'm not sure I agree that individual cases should be allowed to override the behavior of the parameter.
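The option handling proposed above can be sketched in PowerShell itself, purely to illustrate the flag arithmetic; the actual fix would live in the engine's C# code. The helper name Get-SwitchRegexOptions is hypothetical: it starts from the instance's existing Options and sets or clears only the IgnoreCase bit, so that options such as IgnorePatternWhitespace survive the recreation.

```powershell
# Hypothetical helper (not part of PowerShell): computes the options a
# re-created regex should use under switch -Regex, preserving every option
# of the predefined instance except IgnoreCase, which follows -CaseSensitive.
function Get-SwitchRegexOptions {
    param(
        [Parameter(Mandatory)] [regex] $Regex,
        [switch] $CaseSensitive
    )
    $ignoreCaseBit = [int] [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
    $bits = [int] $Regex.Options
    if ($CaseSensitive) {
        $bits = $bits -band (-bnot $ignoreCaseBit)   # clear only IgnoreCase
    }
    else {
        $bits = $bits -bor $ignoreCaseBit            # set only IgnoreCase
    }
    [System.Text.RegularExpressions.RegexOptions] $bits
}

# Re-create the regex from the example above without losing IgnorePatternWhitespace.
$TwoLLNoSpace = [regex]::new('\p{Ll} \p{Ll}', 'IgnorePatternWhitespace')
$recreated = [regex]::new($TwoLLNoSpace.ToString(), (Get-SwitchRegexOptions -Regex $TwoLLNoSpace))
$recreated.Options   # IgnoreCase, IgnorePatternWhitespace
```

Because the whitespace option is kept, the re-created regex still matches 'ab' but not 'a b', just like the original instance.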
Thanks for the sleuthing, @IISResetMe - you've found the crux of the inconsistency:
- switch -regex [-casesensitive] makes its own (implied) case-sensitivity override the options of the predefined [regex] instance,
- whereas -match (= -imatch) / -cmatch and -replace (= -ireplace) / -creplace do not: they respect whatever options are built into the [regex] instance.
To wit (applies to -replace too):
PS> 'a' -cmatch ([regex]::new('A', 'IgnoreCase'))
True # The 'c' in -cmatch was ignored in favor of the options built into the [regex]
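For -replace, the counterpart of the -cmatch example above looks as follows; per the behavior described in this thread, the IgnoreCase option built into the instance should win over the case-sensitive -creplace operator, so the lowercase 'a' is replaced. The variable names are illustrative only.

```powershell
# A predefined [regex] that is explicitly case-INsensitive.
$ciRegex = [regex]::new('A', 'IgnoreCase')

# -creplace requests case-sensitive matching, but the options built into
# $ciRegex take precedence, so the lowercase 'a' is still replaced.
$replaced = 'a' -creplace $ciRegex, 'X'
$replaced
```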
I created this issue based on this expectation, given that I'd always seen -match and -replace operate that way (also in Windows PowerShell).
While you could argue that the inconsistency should be resolved in favor of how switch -regex works (i.e., by changing -match and -replace), doing so would be a breaking change. [_Update_: Of course, so would changing the behavior of switch -regex.]
While there is undoubtedly a contradiction, I think it is defensible (though it requires documentation) to give precedence to the options baked into a predefined [regex] instance.
Predefining a [regex] is an advanced technique anyway, whose primary purpose, to me, is to serve as a _performance optimization_ with _high iteration counts_: the given instance is used _as-is_, without incurring the cost of regex construction on every use.
However, note that even with matching options, using the precompiled / predefined [regex] instance doesn't make the code run any faster, at least in PowerShell _Core_; see #8976.
> However, note that even with matching options the precompiled / predefined [regex] instance is _not_ used as-is, at least in PowerShell _Core_ - see #8976.
I think that's inaccurate: when case-sensitivity is aligned, the passed-in [regex] instance is indeed used; see https://github.com/PowerShell/PowerShell/blob/5d54f1aa3871a826409496437e25856dc263ccc4/src/System.Management.Automation/engine/runtime/Operations/MiscOps.cs#L2306-L2309
Thanks, @IISResetMe: I had misread the source code, and I've updated my previous comment accordingly. That said, the problem of the optimization _not being effective_ remains; any idea why?
Please discuss the problem in #8976.
@iSazonov: No, this should remain a separate issue, because its focus is (now) on the _inconsistent behavior_ between -match / -replace on the one hand, and switch -regex on the other.
To resolve this, we need to decide on:
- whether to change existing behavior, which would be a breaking change either way (changing switch -regex's behavior to that of -match / -replace, or vice versa), or
- whether to simply _document_ the inconsistency.
  - switch -regex's current behavior would still require _fixing_, though, given that, as @IISResetMe demonstrated above, options _other_ than case-sensitivity aren't being honored (they aren't copied over when the regex is recreated); @IISResetMe has prototyped that fix here.
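Until the inconsistency is resolved or documented, one possible workaround (an assumption, not something proposed in the thread itself) is to bypass -Regex matching and use a script-block condition that calls the instance's own IsMatch() method, so that its options are honored as-is regardless of -CaseSensitive. Note that $Matches is not populated this way; call $regex.Match($_) instead if capture groups are needed.

```powershell
# Case-SENSITIVE predefined regex, as in the Pester test above.
$regex = [regex]::new('A')

# A script-block condition evaluates $regex.IsMatch($_) for each input,
# so the instance's own options apply; no -Regex switch is involved.
$result = switch ('a', 'A') {
    { $regex.IsMatch($_) } { "matched: $_" }
}
$result
```

Only 'A' matches, which is the outcome the failing switch -regex test expects.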
@mklement0 My comment was about "any idea why?".
Got it, @iSazonov, thanks; on the bright side, my misinterpretation of your comment got me to summarize _this_ issue...
> my misinterpretation of your comment got me to summarize this issue...
:-) I am ready to add more comments for you.