PowerShell: Performance regression: use of precompiled [regex] instances with switch -regex doesn't result in better performance

Created on 25 Feb 2019 · 10 Comments · Source: PowerShell/PowerShell

To be clear: the optimization in question is rarely needed in real life, but it may matter when processing large data sets.

This is a regression from Windows PowerShell, unlike related issue #8977.

Note that there's a related problem of a [regex] instance _intentionally_ getting recreated based on an options mismatch - see #8946 - but the test commands below have been crafted to avoid that problem.

[_Updated_] Even though you can tell here that the precompiled / predefined [regex] instance _is_ used as-is (currently only if the case-sensitivity option matches), performance doesn't improve.

Steps to reproduce

# switch -regex with string literal
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command { 
  switch -regex (, 'foo' * 1e6) { 'f?(o)' { $true } } 
}).TotalSeconds

# switch -regex with precompiled [regex] instance
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command { 
  $re = [Regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
  switch -regex (, 'foo' * 1e6) { $re { $true } }
}).TotalSeconds

Expected behavior

The 2nd Measure-Command should be _faster_, due to use of a precompiled [regex] instance.

Actual behavior

Sample timings from macOS 10.14.3:

4.0568081
4.4071708

The 2nd Measure-Command is _slower_ in PowerShell _Core_, on all platforms.

By contrast, it _is_ faster in _Windows PowerShell_ (about 25% in my tests).

Environment data

PowerShell Core v6.2.0-preview.4 on macOS 10.14.2
PowerShell Core v6.2.0-preview.4 on Ubuntu 18.04.1 LTS
PowerShell Core v6.2.0-preview.4 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.471)
Issue-Question Resolution-Answered WG-Engine-Performance

All 10 comments

In the debugger, I see that the second command is processed at the linked source line.

Perhaps it should be:

[GC]::Collect(); [GC]::WaitForPendingFinalizers()
$re = [Regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
(Measure-Command { 
  switch -regex (, 'foo' * 1e6) { $re { $true } }
}).TotalSeconds

If regex compilation is slow, that is a .NET Core 2 problem, and we need to measure against .NET Core 3.0.

@iSazonov: The problem isn't the compilation time of the regex - the variant command you suggest shows roughly the same result as the original test. (In fact, moving the $re = ... line out of the Measure-Command slows things down even further, perhaps because the variable must then be located in a different scope?)

The source-code line you link to (which is also in the OP) shows the real problem: instead of using the predefined [regex] instance directly, the static [Regex]::Match() method is called with the _stringified value_ of the [regex] instance.

@iSazonov, @IISResetMe pointed out that I had misread the source code: with the case-sensitivity option matching, a predefined [regex] instance _is_ used - yet it doesn't result in improved performance; I've already asked @IISResetMe in #8946 if he happens to have an explanation, but perhaps you have a sense too.
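To make the distinction discussed in the two comments above concrete, here is a minimal sketch of the two call forms: the static [Regex]::Match() overload, which takes the pattern as a string, and the Match() method of an already constructed instance. The variable names are illustrative assumptions, and the sketch does not reproduce what the engine's switch implementation does internally.

```powershell
# Illustrative only: variable names are assumptions, not taken from the engine source.
$options = [System.Text.RegularExpressions.RegexOptions] 'Compiled, IgnoreCase, CultureInvariant'
$re = [regex]::new('f?(o)', $options)

# Stringifying a [regex] instance yields only its pattern; the options are not part of the string.
$pattern = $re.ToString()

# Static form: the pattern is passed as a string and resolved through .NET's internal regex cache.
$staticMatch = [regex]::Match('foo', $pattern, [System.Text.RegularExpressions.RegexOptions]::IgnoreCase)

# Instance form: the predefined object is used directly.
$instanceMatch = $re.Match('foo')

$staticMatch.Value    # fo
$instanceMatch.Value  # fo
```

Both forms find the same match; they differ only in whether a [regex] object is looked up (or constructed) from the pattern string on each call.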

The best way is to measure the .NET Core 3.0 implementation. You could create a simple test in C#. I expect that the result will be better.

@mklement0 I'm not sure it's actually slower, I think dot-source variable lookup (Measure-Command doesn't create a new scope) could be affecting those results.

# switch -regex with string literal
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command { 
  switch -regex (, 'foo' * 1e6) { 'f?(o)' { $true } } 
}).TotalSeconds

# switch -regex with precompiled [regex] instance
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command { 
  $re = [Regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
  switch -regex (, 'foo' * 1e6) { $re { $true } }
}).TotalSeconds

# switch -regex with precompiled [regex] instance within a new scope
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command { 
  & {
    $re = [Regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
    switch -regex (, 'foo' * 1e6) { $re { $true } }
  }
}).TotalSeconds
4.6255024
5.1760229
3.4601631

It's possible that the caching done by Regex.Match has received some performance updates, making the two methods comparable in speed.

Why is & used only in the last test?

Good find, thanks, @SeeminglyScience - so it does improve performance (though not drastically; indeed, in general I've found that in .NET _Core_ (unlike with the .NET Framework) repeated Regex.Match() use with strings incurs relatively little overhead thanks to automatic caching of compiled regexes).
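One way to look at the caching claim independently of switch is to time the two .NET call forms directly. The following is a rough sketch rather than a rigorous benchmark: the iteration count and variable names are arbitrary choices, and the timings will vary by runtime and machine. The code runs inside & { ... } so that its variables live in a child scope.

```powershell
$timings = & {
  $count = 100000   # arbitrary iteration count, chosen only to keep the run short
  $ignoreCase = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
  $re = [regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')

  # Static method: the pattern string is resolved via .NET's regex cache on each call.
  $sw = [System.Diagnostics.Stopwatch]::StartNew()
  for ($i = 0; $i -lt $count; $i++) { $null = [regex]::IsMatch('foo', 'f?(o)', $ignoreCase) }
  $staticSeconds = $sw.Elapsed.TotalSeconds

  # Instance method: the predefined [regex] object is reused.
  $sw.Restart()
  for ($i = 0; $i -lt $count; $i++) { $null = $re.IsMatch('foo') }
  $instanceSeconds = $sw.Elapsed.TotalSeconds

  [pscustomobject]@{ StaticSeconds = $staticSeconds; InstanceSeconds = $instanceSeconds }
}
$timings
```

Comparing the two numbers on .NET Core and on the .NET Framework gives a rough sense of how much overhead the static method adds on each runtime.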

@iSazonov: By using & { ... } (a child scope), @SeeminglyScience is demonstrating that the slow-down comes from the variable lookups that are performed in dot-sourced code, as discussed in #8911.
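The scoping difference itself can be observed without any timing. This small sketch uses hypothetical variable names and relies on the behavior noted earlier in the thread, namely that Measure-Command doesn't create a new scope for its script block.

```powershell
# Measure-Command runs its script block in the caller's scope (as if dot-sourced),
# so a variable assigned inside it is still defined afterwards.
$null = Measure-Command { $insideMeasure = 'visible' }
$definedAfterMeasure = Test-Path Variable:insideMeasure

# & { ... } runs the block in a child scope; its variables are discarded on exit.
$null = Measure-Command { & { $insideChild = 'hidden' } }
$definedAfterChild = Test-Path Variable:insideChild

$definedAfterMeasure  # True
$definedAfterChild    # False
```

Wrapping a benchmark body in & { ... } therefore makes the measured code behave more like code inside a function or script, where variables are local.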

I'll close this.

P.S., @SeeminglyScience and @iSazonov: I'd love to hear your thoughts on the (updated) #8977
