To be clear: the optimization in question is rarely needed in real life, but it may matter when processing large data sets.
Note:

- The tests below use `-match`, but the same applies to `-replace`, although the potential performance win seems to be much smaller with `-replace`.
- The commands are enclosed in `& { ... }` to ensure that they're run in a child scope, so as to rule out the effects from #8911.
```powershell
# -match with string-literal regex
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command {
  & {
    foreach ($s in , 'foo' * 1e6) { $s -match 'f?(o)' }
  }
}).TotalSeconds
```

```powershell
# -match with precompiled [regex]
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command {
  & {
    $re = [Regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
    foreach ($s in , 'foo' * 1e6) { $s -match $re }
  }
}).TotalSeconds
```

```powershell
# precompiled [regex] with .Match()
[GC]::Collect(); [GC]::WaitForPendingFinalizers()
(Measure-Command {
  & {
    $re = [Regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')
    foreach ($s in , 'foo' * 1e6) { $re.Match($s).Success }
  }
}).TotalSeconds
```
The second and third commands should exhibit comparable performance, and both should be noticeably faster than the first command, due to the use of a precompiled [regex] instance.
Sample timings on macOS 10.14.3:

```
4.3153778 # string-literal regex with -match
3.6992489 # precompiled regex with -match
0.4902724 # precompiled regex with .Match()
```
That is, while using a precompiled [regex] with its .Match() method cut execution time to almost 1/10th, using it with -match only provided a modest speed increase.
PowerShell Core v6.2.0-preview.4 on macOS 10.14.2
PowerShell Core v6.2.0-preview.4 on Ubuntu 18.04.1 LTS
PowerShell Core v6.2.0-preview.4 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.471)
Windows PowerShell v5.1.17134.407 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.471)
I don't think the second and third examples will ever be comparable. The main difference iirc is that the -match operator will always build a hashtable and save it to $matches.
It'd be nice if $Matches could be generated on demand, like a new PSVariable implementation that builds the hashtable in the Value getter (assuming it hasn't already been set externally). Not sure how feasible (or worth it) that is.
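For illustration only (this is a user-land sketch, not the proposed engine change, and the pattern is arbitrary): deferring the hashtable until it is actually consumed looks roughly like this:

```powershell
# Sketch of the on-demand idea in user-land terms: keep the cheap Match
# object around and only materialize a hashtable when it's consumed.
$re = [regex]::new('f?(o)')
$m  = $re.Match('foo')            # cheap: no hashtable is built here

if ($m.Success) {
    # Built lazily, mirroring roughly what $Matches would contain
    # (group name -> captured value).
    $table = @{}
    foreach ($g in $m.Groups) { $table[$g.Name] = $g.Value }
}
$table['0']   # full match: 'fo'
$table['1']   # capture group 1: 'o'
```

The engine-level version would presumably stash the `Match` object and run the loop above only when `$Matches` is first read.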
Good point about the extra effort required to construct the $Matches hashtable, @SeeminglyScience, but do you think that alone explains _how much_ slower -match is (a factor of almost 8)?
(And your suggestion of populating $Matches only on demand is also worth considering).
What is the `// TODO: replace this with faster code` source-code comment trying to tell us?
> do you think that alone explains _how much_ slower `-match` is (a factor of almost 8)?
At that scale yeah it wouldn't surprise me. That's a whole lot of unused hashtables. You could always comment out that code and see if it's closer.
> What is the `// TODO: replace this with faster code` source-code comment trying to tell us?
I would guess that you wouldn't see any performance benefit for this scenario from that snippet. The loop will already be fully compiled after a certain number of iterations (I think somewhere around 32?). Keep in mind that the same message also appears above the expression creation for -like and -join (and probably elsewhere as well).
What optimization are you asking for? You already performed a manual optimization by moving the regex creation out of the loop. Such an automatic optimization would belong in a C# compiler, not in PowerShell. Or do you want to always compile regexes? Perhaps we could use the static [regex] methods and benefit from the .NET Core regex cache.
Your hunch was correct, @SeeminglyScience: the construction of $Matches is the major factor in the slow-down (this is easy to verify even without source-code changes: wrapping the LHS in an array, as in `, $s -match $re`, makes -match operate on a collection and bypasses $Matches); it also explains why -replace is much less affected.
With $Matches out of the picture, the slow-down goes from about 8-fold down to about 30% in my tests, which is much more reasonable.
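As a sketch of that verification (pattern and iteration count are arbitrary, and absolute timings will vary by machine), the array-wrapping workaround can be benchmarked like this:

```powershell
$re = [regex]::new('f?(o)', 'Compiled, IgnoreCase, CultureInvariant')

# Scalar LHS: -match returns a Boolean and populates $Matches on every hit.
$withMatches = (Measure-Command {
  & { foreach ($i in 1..1e5) { 'foo' -match $re } }
}).TotalSeconds

# Array LHS: -match filters the collection and never touches $Matches.
$withoutMatches = (Measure-Command {
  & { foreach ($i in 1..1e5) { , 'foo' -match $re } }
}).TotalSeconds

'{0:N3}s with $Matches, {1:N3}s without' -f $withMatches, $withoutMatches
```

Note that the array form changes the semantics: it returns the matching elements rather than a Boolean, which is exactly why the $Matches bookkeeping is skipped.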
I definitely like the idea of on-demand construction of $Matches - though I doubt that anyone cares enough to pursue this.
@iSazonov: Based on the initial benchmarks, I wrongly concluded that precompiled regexes weren't used as-is, but they are.
If you use a string-literal regex with switch, the static [regex]::Match() is called in every iteration, but in .NET Core the automatic caching of compiled-behind-the-scenes regexes is quite efficient, so that using a precompiled regex yields only a modest speed improvement (it's much more pronounced in Windows PowerShell).
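That static-method cache can be observed directly; [regex]::CacheSize is a real .NET property (it defaults to 15 entries), while the pattern below is just an example:

```powershell
# The static [regex] methods cache the Regex instances they build behind the
# scenes, keyed by pattern and options; instance methods bypass this cache.
[regex]::CacheSize                     # default cache capacity (15)
[regex]::Match('foo', 'f?(o)').Value   # first call populates the cache -> 'fo'
[regex]::Match('foo', 'f?(o)').Value   # subsequent calls reuse the cached instance
```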
Anyway, I'll close this, but I'll create a new, generic issue as a reminder that there may be performance gains to be had across the board by addressing the `// TODO: replace this with faster code` comments.
@mklement0 I remember Jason commenting to me that we could remove the TODO comments, because there may not be any direct and simple performance optimizations there. So I think there's no need to create a new meta-issue for this. It would make sense to create a new tracking issue for on-demand construction of $Matches.
@iSazonov: Oops! Too late: #8989 - but I'll update the issue with your comments.
@iSazonov:

> It would make sense to create a new tracking issue for on-demand construction of $Matches.
After mulling this over some more, I'm no longer convinced it is a worthwhile effort, so I'll let it go (though, obviously, feel free to pick it up yourself).