To be clear: the optimization in question is rarely needed in real life, but it may matter when processing large data sets.
This is a regression from Windows PowerShell, unlike related issue #8977.
Note that there's a related problem: a [regex] instance is _intentionally_ recreated when its options don't match (see #8946). The test commands below have been crafted to avoid that problem.
[_Updated_] Even though the source code shows that the precompiled / predefined [regex] instance _is_ used as-is (currently only if its case-sensitivity option matches), performance doesn't improve.
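A quick way to check the condition described above is to inspect the instance's Options property. This is a minimal sketch: it assumes, per this thread, that reuse depends only on the case-sensitivity option, and relies on switch -regex matching case-insensitively unless -CaseSensitive is specified.

```powershell
# Same instance as in the test commands: note the IgnoreCase option.
$re = [regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')

# switch -regex is case-insensitive by default, so (per this thread)
# the instance can only be reused as-is if it also ignores case.
$ignoreCase = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
$re.Options.HasFlag($ignoreCase)     # True

# A case-sensitive instance would not qualify for reuse.
$reCs = [regex]::new('f?(o)', 'Compiled, CultureInvariant')
$reCs.Options.HasFlag($ignoreCase)   # False
```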
```powershell
# switch -regex with string literal
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command {
  switch -regex (, 'foo' * 1e6) { 'f?(o)' { $true } }
}).TotalSeconds

# switch -regex with precompiled [regex] instance
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command {
  $re = [Regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
  switch -regex (, 'foo' * 1e6) { $re { $true } }
}).TotalSeconds
```
The 2nd Measure-Command should be _faster_, due to the use of a precompiled [regex] instance.
Sample timings from macOS 10.14.3:
4.0568081
4.4071708
The 2nd Measure-Command is _slower_ in PowerShell _Core_, on all platforms.
By contrast, it _is_ faster in _Windows PowerShell_ (about 25% in my tests).
PowerShell Core v6.2.0-preview.4 on macOS 10.14.2
PowerShell Core v6.2.0-preview.4 on Ubuntu 18.04.1 LTS
PowerShell Core v6.2.0-preview.4 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.471)
I see in the debugger that the second case is processed in the line
Perhaps it should be:
```powershell
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
$re = [Regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
(Measure-Command {
  switch -regex (, 'foo' * 1e6) { $re { $true } }
}).TotalSeconds
```
If regex compilation is slow, it's a .NET Core 2 problem, and we need to measure against .NET Core 3.0.
@iSazonov: The problem isn't the regex's compilation time: the variant command you suggest shows roughly the same timing as the original test. (In fact, moving the $re = ... line out of the Measure-Command call slows things down even further, perhaps because the variable must then be located in a different scope?)
The source-code line you link to (which is also in the OP) shows the real problem: instead of using the predefined [regex] instance directly, the static [Regex]::Match() method is called with the _stringified value_ of the [regex] instance.
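Beyond performance, calling the static method with the stringified value would also lose information: [regex]'s ToString() returns only the pattern, so any options the instance was created with are dropped. A minimal illustration (the pattern 'FOO' is chosen purely for demonstration):

```powershell
# An instance created with IgnoreCase matches 'foo' against the pattern 'FOO'.
$re = [regex]::new('FOO', 'IgnoreCase')
$re.IsMatch('foo')                          # True

# ToString() yields just the pattern text; the options are not part of it.
$re.ToString()                              # FOO

# The static method therefore applies default (case-sensitive) options.
[regex]::IsMatch('foo', $re.ToString())     # False
```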
@iSazonov: @IISResetMe pointed out that I had misread the source code: when the case-sensitivity option matches, a predefined [regex] instance _is_ used, yet it doesn't improve performance. I've already asked @IISResetMe in #8946 whether he has an explanation, but perhaps you have an idea too.
The best way is to measure the .NET Core 3.0 implementation. You could create a simple test in C#. I expect the result will be better.
@mklement0: I'm not sure it's actually slower; I think variable lookup in dot-sourced code (Measure-Command doesn't create a new scope) could be affecting those results.
```powershell
# switch -regex with string literal
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command {
  switch -regex (, 'foo' * 1e6) { 'f?(o)' { $true } }
}).TotalSeconds

# switch -regex with precompiled [regex] instance
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command {
  $re = [Regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
  switch -regex (, 'foo' * 1e6) { $re { $true } }
}).TotalSeconds

# switch -regex with precompiled [regex] instance within a new scope
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command {
  & {
    $re = [Regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
    switch -regex (, 'foo' * 1e6) { $re { $true } }
  }
}).TotalSeconds
```
4.6255024
5.1760229
3.4601631
It's possible that the caching done by Regex.Match has received some performance improvements, making the two approaches comparable in speed.
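The caching mentioned here applies to .NET's _static_ Regex methods: they keep recently used patterns in an internal cache whose size is exposed as [regex]::CacheSize (15 by default), whereas instance methods use the [regex] object directly. The following is a rough sketch for comparing the two calling styles in isolation, without switch; the iteration count of 1e5 is arbitrary, timings will vary by machine, and each loop is wrapped in & { ... } to sidestep the dot-sourced variable-lookup overhead mentioned above.

```powershell
# Number of patterns the static Regex methods keep in their internal cache.
[regex]::CacheSize

# Static method: the pattern string (plus options) is looked up in the cache on each call.
$static = (Measure-Command {
  & {
    $options = [System.Text.RegularExpressions.RegexOptions]'Compiled, IgnoreCase, CultureInvariant'
    foreach ($s in (, 'foo' * 1e5)) { $null = [regex]::IsMatch($s, 'f?(o)', $options) }
  }
}).TotalSeconds

# Instance method: the precompiled [regex] object is reused directly.
$instance = (Measure-Command {
  & {
    $re = [regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
    foreach ($s in (, 'foo' * 1e5)) { $null = $re.IsMatch($s) }
  }
}).TotalSeconds

"static: $static s; instance: $instance s"
```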
Why is & used only in the last test?
Good find, thanks, @SeeminglyScience: so it does improve performance, though not drastically. Indeed, in general I've found that in .NET _Core_ (unlike the .NET Framework), repeated Regex.Match() use with pattern strings incurs relatively little overhead, thanks to automatic caching of compiled regexes.
@iSazonov: By using & { ... } (a child scope), @SeeminglyScience is demonstrating that the slow-down comes from the variable lookups that are performed in dot-sourced code, as discussed in #8911.
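The scoping difference itself is easy to observe: Measure-Command runs its script block in the caller's scope (as if dot-sourced), whereas & { ... } runs the block in a new child scope. A minimal sketch ($x is just an illustrative variable):

```powershell
$x = 1

# Measure-Command does not create a new scope: the assignment changes the caller's $x.
$null = Measure-Command { $x = 2 }
$x    # 2

# & { ... } creates a child scope: the assignment defines a new local $x there instead.
$null = Measure-Command { & { $x = 3 } }
$x    # still 2
```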
I'll close this.
P.S., @SeeminglyScience and @iSazonov: I'd love to hear your thoughts on the (updated) #8977.