PowerShell: "dir -r" is slow (not just) on macOS

Created on 16 Apr 2017 · 14 Comments · Source: PowerShell/PowerShell

Steps to reproduce

Type 'dir -r >/tmp/dir-output' at the top of a directory subtree containing around 5K files.

Expected behavior

Command completes quickly.

The comparable Unix command

find . -print0 | xargs -0 ls -al  >/tmp/dir-output

consistently takes less than 300ms to complete.

Actual behavior

The command takes between 3.7 and 4.1 seconds to complete, measured both by wall clock and with

  Measure-Command { dir -r >/tmp/dir-output }

Environment data

$PSVersionTable

For me, this returns no result.

Issue-Discussion WG-Engine-Performance WG-Engine-Providers

All 14 comments

@mpw Thanks for your report!

Native utilities on Unix and Windows use low-level APIs and work with raw data. PowerShell works on top of those APIs and adds extra layers of code (converting everything to PSObjects), so it will always be slower. Of course, we try to make things faster.

Hmmm...an order of magnitude is rather significant, especially considering that the Unix version was doing significantly more work, splitting the task between two processes, communicating paths between the processes, parsing strings and therefore also doing full path evaluation for each file.

My bad, I was actually going for something different when I hit upon this by accident. The more efficient Unix version would of course be ls -lR, and that executes in

real 0m0.062s
user 0m0.029s
sys 0m0.028s

So PowerShell is actually more than 50 times slower. I realize layers can add overhead, but should simple CPU processing be able to add almost 2 orders of magnitude to filesystem/kernel operations?

Sample attached. Surprisingly informative despite all the jitted code not showing up.

powershell_2017-04-21_002049_AH6L.sample.txt

Which PowerShell version did you use for testing?
We have now moved to .NET Core 2.0, and since alpha.19 you can test builds based on it.

I just tried the same command with PowerShell on Windows (running emulated in Parallels) and it also took 3 seconds. While there is likely some virtualization overhead, it almost certainly isn't 50x.

This was a fresh install of Windows + VS downloaded just yesterday.

There are many layers of abstraction involved. PSDrives, PSProviders, Get-Childitem with generalized filtering, a generic, programmable, formatting system, an object pipeline with extremely powerful (and expensive) parameter binding and all this on top of the CLR. So we are comparing apples to oranges.

That said: it's too slow. We are in need of a performance push for the core scenarios, and ls -R is definitely one of those.

I was actually profiling this last night, as I also feel the pain.

CoreFX has performance issues with the file system too.

It is also interesting to see how much of it is formatting.
What times do you get if you pipe the output to Out-Null?

gci -r | out-null

That should give us the raw powershell provider filesystem times.

[io.directory]::EnumerateFileSystemEntries($pwd.ProviderPath, "*.*", [IO.SearchOption]::AllDirectories)

Is also something to compare against. Just getting the filenames, without stat-ing each file.

I think .NET is quite inefficient when doing this, stat-ing each file instead of getting that at the same time as enumerating the files.
Maybe this should be fixed in System.IO.Directory, with another method that enumerates items returning FileSystemInfo instead of string.
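The comparisons suggested above can be combined into one measurement. The following is a minimal sketch, not a rigorous benchmark: `Measure-TreeWalk` is a hypothetical helper name, and the third timing uses `DirectoryInfo.EnumerateFileSystemInfos`, which was not mentioned in the thread but yields `FileSystemInfo` objects rather than strings. It assumes every directory under the root is readable, since the .NET enumerations throw on access-denied errors, and it uses `'*'` rather than `'*.*'` as the pattern to avoid platform differences in wildcard matching.

```powershell
# Hypothetical helper (not part of PowerShell): times three ways of walking a tree.
function Measure-TreeWalk {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    # Resolve to a full file system path so the .NET calls see the same root.
    $root = (Resolve-Path -LiteralPath $Path).ProviderPath
    $all  = [IO.SearchOption]::AllDirectories

    # 1. FileSystem provider: objects are created, but formatting is skipped.
    $provider = Measure-Command {
        Get-ChildItem -LiteralPath $root -Recurse | Out-Null
    }

    # 2. .NET enumeration that yields path strings only.
    $names = Measure-Command {
        foreach ($entry in [IO.Directory]::EnumerateFileSystemEntries($root, '*', $all)) { }
    }

    # 3. .NET enumeration that yields FileSystemInfo objects.
    $infos = Measure-Command {
        $dir = [IO.DirectoryInfo]::new($root)
        foreach ($info in $dir.EnumerateFileSystemInfos('*', $all)) { }
    }

    # Report all three timings in milliseconds.
    [pscustomobject]@{
        ProviderMs = [math]::Round($provider.TotalMilliseconds, 1)
        NamesMs    = [math]::Round($names.TotalMilliseconds, 1)
        InfosMs    = [math]::Round($infos.TotalMilliseconds, 1)
    }
}
```

Calling `Measure-TreeWalk -Path $pwd.ProviderPath` a few times and discarding the first run reduces the effect of a cold file system cache. Note that `Get-ChildItem` skips hidden items unless `-Force` is given, so the three walks do not necessarily visit identical sets of entries.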

I wonder if we can do some optimization here: when we know the command is being run interactively and not piped to something else, we could take a faster path that doesn't use objects and just outputs the text.

@SteveL-MSFT given that one can play with default parameter values to automatically capture output, that may be considered detrimental in some cases...

$PSDefaultParameterValues["Out-Default:OutVariable"] = 'LastOut'
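To make the concern concrete, here is a minimal sketch of how that setting is typically used. It assumes an interactive session in which the host passes each command's output to Out-Default implicitly. The variable name `LastOut` comes from the comment above, and the follow-up pipeline is only an illustration.

```powershell
# Capture whatever the host sends to Out-Default into $LastOut
# (variable name taken from the comment above).
$PSDefaultParameterValues['Out-Default:OutVariable'] = 'LastOut'

# In an interactive session, run a command as usual:
#   Get-ChildItem -File
# Then, at the next prompt, reuse the captured objects instead of parsing text:
#   $LastOut | Sort-Object Length -Descending | Select-Object -First 3 Name, Length
```

Because the captured values are objects rather than rendered text, a text-only fast path for interactive output would leave nothing useful in `$LastOut`.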

@vexx32 excellent point! If we wanted to pursue this, it would have to be an opt-in feature.

I believe such optimizations are tracked by PSMore #7857; it is among the best strategic investments.

I have a branch where this is so fast that the console output is what limits everything. I'm working to wrap that in an experimental feature. But if we can only get some of the PRs that I've made in this area through, I don't think we will have such a big perf issue with the file system.

Also happens on Windows vs. cmd's dir /s

If I target my operating system disk, pwsh is so slow that I gave up waiting for it to finish after ~10 minutes. I think it runs into lots of access-denied errors there, which trigger exceptions, and those are rather slow.

See also #12817
