Powershell: ForEach-Object operation statements are unexpectedly slower than script blocks; both are much slower than filter functions

Created on 4 Sep 2018 · 5 Comments · Source: PowerShell/PowerShell

Steps to reproduce

_Update: Added a filter function._

Compare the performance of extracting the .Name property values from 1 million objects:

    # PSv3+ operation statement (using parameters directly rather than a script block)
    (Measure-Command { , (Get-Item /) * 1e6 | ForEach-Object Name }).TotalSeconds
    # Script block
    (Measure-Command { , (Get-Item /) * 1e6 | ForEach-Object { $_.Name } }).TotalSeconds
    # Filter function
    filter Name { $_.Name }
    (Measure-Command { , (Get-Item /) * 1e6 | Name }).TotalSeconds

Note: (, (Get-Item /) * 1e6).Name - i.e., member enumeration - is the fastest by far, but this issue is about comparing _streaming_ approaches.

Expected behavior

At the very least comparable performance.

Presumably, not having to evaluate a script block in each iteration has the potential to be faster.

I would expect a filter function to perform similarly to the script-block solution.

Actual behavior

The operation statement is noticeably _slower_ than the script block.

Sample timings (averaged across 10 runs) on Windows, macOS, and Ubuntu show varying ratios: the operation statement took 1.24, 1.58, and 1.85 times as long as the script block, respectively.

The factor for Windows PowerShell was 1.40.

The filter function is _by far the fastest_, by a factor of about _10_ on macOS.
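
The averaged comparison above can be reproduced with a small harness along the following lines. This is only a sketch: the helper name `Measure-Average` is hypothetical, the input is built once up front (unlike the one-liners above), and the object count is reduced to 1e5 to keep the run short, so absolute numbers will differ from those reported here.

```powershell
# Hypothetical helper: average the wall-clock time of a script block over several runs.
function Measure-Average {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $Count = 10
    )
    $total = 0.0
    foreach ($i in 1..$Count) {
        $total += (Measure-Command $ScriptBlock).TotalSeconds
    }
    $total / $Count
}

# Assumption: build the input once so array construction is not part of the timing.
$objects = , (Get-Item /) * 1e5

filter Name { $_.Name }

$opStatement = Measure-Average { $objects | ForEach-Object Name }
$scriptBlock = Measure-Average { $objects | ForEach-Object { $_.Name } }
$filterFunc  = Measure-Average { $objects | Name }

[pscustomobject]@{
    OperationStatement = $opStatement
    ScriptBlock        = $scriptBlock
    Filter             = $filterFunc
    OpVsScriptBlock    = $opStatement / $scriptBlock   # ratio discussed above
    ScriptBlockVsFilter = $scriptBlock / $filterFunc
}
```

The two ratios at the end correspond to the factors quoted above; they can be expected to vary between runs and platforms.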

Environment data

PowerShell Core v6.1.0-rc.1 on macOS 10.13.6
PowerShell Core v6.1.0-rc.1 on Ubuntu 16.04.5 LTS
PowerShell Core v6.1.0-rc.1 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.165)
Windows PowerShell v5.1.17134.165 on Microsoft Windows 10 Pro (64-bit; Version 1803, OS Build: 17134.165)
Area-Cmdlets-Core Issue-Enhancement WG-Engine-Performance

All 5 comments

I'm seeing very different results.

First, my first few runs of the test had wildly varying results (on PS5.1), but for the life of me I don't know why. After some experimentation I ended up with this, which gave more consistent results:

    $big = , (Get-Item /) * 1e6

    # PSv3+ operation statement (using parameters directly rather than a script block)
    (Measure-Command { $big | ForEach-Object Name }).TotalSeconds
    # Script block
    (Measure-Command { $big | ForEach-Object { $_.Name } }).TotalSeconds

On Windows PowerShell 5.1, direct property access takes me 12-13 seconds, whereas the scriptblock takes 6-7.

On PowerShell Core 6.1.0-rc.1 (official release; not one I built myself) (windows amd64), direct property access still takes 12-13 seconds... but scriptblock now takes 9 seconds.

So I do observe a slowdown from PS5.1 to PS6... but it's in scriptblock processing.

Oh... wait... you are saying that you expect direct prop access to be not slower than scriptblock...

Well, the two have very different code paths... and it's not hard to pick out a thing or two that seem like they would make things way slower, such as in the prop access path, we format a "should process" string, and then call ShouldProcess for every single input object.

    // should process
    string propertyAction = String.Format(CultureInfo.InvariantCulture,
        InternalCommandStrings.ForEachObjectPropertyAction, resolvedPropertyName);

    if (ShouldProcess(_targetString, propertyAction))
    {
        try
        {
            WriteToPipelineWithUnrolling(_propGetter.GetValue(InputObject, resolvedPropertyName));
        }

Whereas for the scriptblock path, we just blast through and execute the script for every input.

At the very least you could try lifting out the propertyAction string to avoid the String.Format for every input, but note that you can't just lift it straight out--the resolvedPropertyName could theoretically be different for every input object (but not very likely...)--so you would have to cache it, for strictly equivalent behavior.

But I don't know how much difference that would make... maybe ShouldProcess is more expensive. :/
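
To make the caching idea concrete, here is a rough sketch of the logic, written in PowerShell rather than in the cmdlet's C#. Everything in it is an assumption for illustration: the template text stands in for the real `InternalCommandStrings.ForEachObjectPropertyAction` resource (whose wording I haven't checked), and `Get-PropertyAction` and `$cache` are made-up names. It only shows that the string gets reformatted when the resolved name changes, not how much time that would save.

```powershell
# Hypothetical stand-in for the ForEachObjectPropertyAction resource string.
$template = 'Retrieve the value for property "{0}"'

# Cache of the last resolved name and its formatted action string.
$cache = @{ Name = $null; Action = $null; FormatCalls = 0 }

function Get-PropertyAction {
    param([string] $ResolvedPropertyName)
    # Reformat only when the resolved name differs from the previous input's.
    if ($ResolvedPropertyName -cne $cache.Name) {
        $cache.Name   = $ResolvedPropertyName
        $cache.Action = [string]::Format(
            [cultureinfo]::InvariantCulture, $template, $ResolvedPropertyName)
        $cache.FormatCalls++
    }
    $cache.Action
}

# Five inputs, but only two distinct resolved names in a row.
$resolvedNames = 'Name', 'Name', 'Name', 'Length', 'Length'
$actions = foreach ($n in $resolvedNames) { Get-PropertyAction $n }

$cache.FormatCalls   # 2: one format per change of name, not one per input
```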

Anyway, has anybody else noticed/filed the slowdown of the scriptblock path from 5.1 to 6?

This is the relevant code that handles member resolution and method/property access for ForEach-Object:

https://github.com/PowerShell/PowerShell/blob/759c4abde811aff1490dec92e438d61e341c3181/src/System.Management.Automation/engine/InternalCommands.cs#L317-L596

It's long, it's convoluted, and a significant portion of it is long blocks of comments. All of this leads me to believe that there should be a much simpler solution available. I'm not certain if I have the background know-how in terms of improving it significantly, but it certainly looks like it could use some improvement.

It would be a mistake to make this just about member resolution. I mean, yes, the member resolution needs to be faster, but even scriptblock execution is an order of magnitude _slower_ than just the call operator with a scriptblock.

Obviously the fastest way (by far) to get a single member is just to use member unrolling (i.e., wrap the expression in parentheses and write the property name), which is several times faster than even a filter...

| Duration | CommandLine |
| -------- | ----------- |
| 6.91423 | $Names = $lots \| % Name |
| 4.41294 | $Names = $lots \| % { $_.Name } |
| 0.83100 | $Names = foreach($f in $lots) { $f.Name } |
| 0.64787 | $Names = $lots \| nameFilter |
| 0.55108 | $Names = $lots \| &{process{$_.ToString()}} |
| 0.20108 | $Names = $lots.Name |

My point is that while we should be concerned about the speed of % Name versus % { $_.Name }, this cmdlet is slower at everything it does than any other way of doing those things!

We should also be _very_ concerned about the speed of % { $_.Name } versus &{process{ $_.Name }} or even foreach($f in $lots) { $f.Name } ...

Most people think of (...) | ForEach-Object { ... } as basically the same as foreach(...){ ... } but it actually takes 5x as long!
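
For anyone who wants to rerun a comparison like the table above, something along these lines should work. It is a sketch under assumptions: the comment doesn't say how `$lots` was built or how `nameFilter` was defined, so both are guesses here, the input is smaller, and the call-operator variant reads `$_.Name` (the table row uses `$_.ToString()`) so that all six do the same work.

```powershell
# Assumed input; the original comment does not show how $lots was created.
$lots = , (Get-Item /) * 1e5

# Assumed definition of the filter named in the table.
filter nameFilter { $_.Name }

# The six approaches from the table, keyed by a short label.
$candidates = [ordered]@{
    'ForEach-Object Name'        = { $lots | ForEach-Object Name }
    'ForEach-Object { $_.Name }' = { $lots | ForEach-Object { $_.Name } }
    'foreach statement'          = { foreach ($f in $lots) { $f.Name } }
    'filter function'            = { $lots | nameFilter }
    '& { process { $_.Name } }'  = { $lots | & { process { $_.Name } } }
    'member enumeration'         = { $lots.Name }
}

# Time each approach once; output inside Measure-Command is discarded.
$results = foreach ($entry in $candidates.GetEnumerator()) {
    [pscustomobject]@{
        Approach = $entry.Key
        Seconds  = (Measure-Command $entry.Value).TotalSeconds
    }
}

$results | Sort-Object Seconds -Descending | Format-Table -AutoSize
```

Each approach is timed only once here, so the ordering of the closer entries may flip between runs.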

@Jaykul Not that it fully explains the slowdown, but if you look at #9408 there's some definite _weirdness_ happening with the scriptblocks in this cmdlet.

Weirdness? There's a few ways you can pass them in, but that's all just to support calling it without explicitly naming the script blocks (and still allow infinitely many process blocks).

I doubt that's going to account for the slowdown. I'm guessing most of it is due to trying to cheat scope.
