Powershell: Possible Memory Leak in Foreach -Parallel, version 7.rc1

Created on 3 Jan 2020  Â·  20Comments  Â·  Source: PowerShell/PowerShell

Steps to reproduce

$messageids | ForEach-Object -Parallel {$mid = $_; $using:log | 
    ? messageid -eq $mid | 
    Sort-Object ExactTimeStamp | 
    Select-Object -first 1} | 
    Export-Csv mtl_midunique.csv -Encoding utf8 -NoTypeInformation

$log held ~70000 lines imported from a csv holding the export of get-messagetrackinglog, $messageids had ~60000 lines imported from a csv, holding unique messageids
"ExactTimeStamp" was build from the TimeStamp property, format 'yyyyMMddHHmmssfff'

Expected behavior

quicker iteration through the date than single threaded, steady memory usage

Actual behavior

memory usage of the process building up to several GBs, slowing down the iteration to much less than this single threaded code:

 &{foreach ($mid in $messageids) {&{foreach ($entry in $log) {if ($entry.messageid -eq $mid) {$entry}}} | 
    Sort-Object ExactTimeStamp | 
    Select-Object -first 1}} | 
    Export-Csv mtl_midunique.csv -Encoding utf8 -NoTypeInformation

Environment data

Name Value
---- -----
PSVersion 7.0.0-rc.1
PSEdition Core
GitCommitId 7.0.0-rc.1
OS Microsoft Windows 10.0.18363
Platform Win32NT
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
WSManStackVersion 3.0

Issue-Question Resolution-Answered

Most helpful comment

That is an awesome article. And now I am re-thinking how foreach -parallel is implemented. Currently a new runspace is created for each iteration, but it is clear that the .Net GC has trouble keeping up and manual collection is needed. One solution may be @iSazonov idea of having foreach -parallel perform regular GC for the user. My main concern with that is possible perf degradation that the user can't control. Also, I didn't originally go with the runspace pool because I was concerned with state leakage between iterations. But that may be the way to go. I'll reopen this Issue.

All 20 comments

What's in $log? If that's a logfile with a few hundred or a few thousand lines, keep in mind that in order to be parallelized in this manner, that data has to be _copied_ to each runspace before it can be used.

So yeah, I'd expect some extra memory usage for that. I daresay this would be the kind of task you should _not_ parallelize, since you have one data input source that you're now copying probably quite a few times.

I would guess that you could make this a lot quicker by simply skipping the internal foreach loop on your single-threaded code:

$messageids = [System.Collections.Generic.List[string]]$messageids
$uniqueEntries = foreach ($entry in $log) {
    if ($entry.messageid -in $messageids) {
        # remove the message ID we found from our list so we only get the first item of that ID
        $messageids.Remove($entry.messageid) > $null
        $entry
    }

    # If we find one of all our IDs, we can exit here
    if ($messageids.Count -eq 0) {
        break
    }
}

$uniqueEntries | Export-Csv -Path mtl_midunique.csv -Encoding UTF8

There's a lot of unnecessary piping and re-iterating the same large collection both in single and multithreaded versions of your code. In either case, it's not going to be particularly efficient. For the multithreaded case, though, copying a large log file to multiple runspaces is always going to be rather tricky to do with any kind of efficiency.

Perhaps you could create and use $Global:log. Also I'd create a (global) hash for messageid-s from the log before parallelize.

This issue has been marked as answered and has not had any activity for 1 day. It has been closed for housekeeping purposes.

I'm going to agree that there seems to be a memory leak. I ran the below code and pwsh.exe got up to 8 GB private working set before dropping down to 2 GB - around 7,500 items in. After the code completed it was back to 6 GB and did not return the memory - manual garbage collection had to be run at which point the memory quickly dropped to 120 MB.

1..10000 | Foreach-Object -Parallel {
    $guid = New-Guid
}

I tried this without going parallel and was able to assign 10,000 GUIDs to a hash table using only 12 MB of memory.

$hash = @{}
1..10000 | Foreach-Object { 
    $hash[$_] = New-Guid
}

PSVersionTable output

Name                           Value
----                           -----
PSVersion                      7.0.0-rc.2
PSEdition                      Core
GitCommitId                    7.0.0-rc.2
OS                             Microsoft Windows 10.0.17134
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

I certainly would expect that the memory is returned... interesting. It's never really clear exactly when the garbage collector does run, unfortunately. @paulhigin is it worth looking into whether we want to manually call garbage collection during EndProcessing() or Dispose() for ForEach-Object -Parallel invocations to avoid this memory-leak-like behaviour?

No, I don't think we should perform garbage collection for the user. I don't like to second guess the user because it seems to always end up doing the wrong thing. Garbage collection can really slow a system down. For general behavior, we need to rely on CLR garbage collection algorithms. A user knows more about what is going on, and can add manual garbage collection step if needed.

To confirm a memory leak you should run your _whole_ test _script_ (not only commands in the cycle body!) many times and measure diffs.

After the code completed it was back to 6 GB and did not return the memory - manual garbage collection had to be run at which point the memory quickly dropped to 120 MB.

In PowerShell 6 we saw the behavior in many scenarios. It is .Net Core 2.1 behavior. Now in PowerShell 7.0 we use .Net Core 3.1 and garbage collector works great in most scenarios. You can report in .Net Core repo if you see an issue with garbage collector.

As a script writer I would expect the memory usage to be more related to the number of parallel threads I'm running and not the total number of items I've piped through the Foreach-Object -Parallel. I'm not sure what the technology being used for the parallel implementation is but using a runspace pool to go through the same scriptblock I mentioned above introduced almost no memory overhead (analysis done with task manager). I'm not a developer and don't have any background on optimizing memory usage, but overall this just doesn't feel right. Garbage collection takes care of the memory usage if it's run after the scriptblock is complete but I can't figure out how to make garbage collection release the memory while the scriptblock is still being executed.

Maybe it's just a usage pattern thing? Maybe Foreach-Object -Parallel works better for a small number of long running tasks and not a large number of short running tasks?

Maybe Foreach-Object -Parallel works better for a small number of long running tasks and not a large number of short running tasks?

Yes, runspaces is not lightweight. See https://devblogs.microsoft.com/powershell/powershell-foreach-object-parallel-feature/

PowerShell
PowerShell ForEach-Object Parallel Feature PowerShell 7.0 Preview 3 is now available with a new ForEach-Object Parallel Experimental feature. This feature is a great new tool for parallelizing work, but like any tool, it has its uses and drawbacks. This article describes this new feature,

This is a great blog article illustrating the memory problems with Foreach-Object -Parallel. I had similar results - even if I was trying to do manual garbage collection it wouldn't clear things out like I would have expected.

https://xkln.net/blog/powershell-7-foreach-object-parallel-memory-consumption/

I’m undertaking a project where one of of the tasks involves processing some 350,000 XML files. If each file takes one second to process that’s just over 4 days…

That is an awesome article. And now I am re-thinking how foreach -parallel is implemented. Currently a new runspace is created for each iteration, but it is clear that the .Net GC has trouble keeping up and manual collection is needed. One solution may be @iSazonov idea of having foreach -parallel perform regular GC for the user. My main concern with that is possible perf degradation that the user can't control. Also, I didn't originally go with the runspace pool because I was concerned with state leakage between iterations. But that may be the way to go. I'll reopen this Issue.

Thank you! The Parallel option is probably the most talked about feature and I would hate to see adoption fall off because of something like this. If you need some testing I'm happy to help.

@awickham10 Great! I'll make some changes and we can test on your benchmark. I think I'll start with runspace pooling since that should give us the best perf/memory improvement.

That is an awesome article. And now I am re-thinking how foreach -parallel is implemented. Currently a new runspace is created for each iteration, but it is clear that the .Net GC has trouble keeping up and manual collection is needed.

I wrote the linked piece, that explanation brings some clarity to what I was seeing... I was a little confused because I incorrectly thought runspace pools were being used under the hood.

Also happy to help out with any testing and can update the post with new results if changes get implemented.

I agree that using a pooling may be great.
Also I found latest GC is very robust. If it is not true for the scenario I think we should investigate why, apply GC options and use this with the pooling.

@awickham10 , @mdjx I have a PR that implements runspace reuse (https://github.com/PowerShell/PowerShell/pull/12122).

This change greatly affects memory use and perf (especially for loops that do a small amount of work). My main concern is side effects due to state leakage between loop iterations, but hopefully there won't be any or they will be insignificant.

The PR is currently marked as WIP. Can you clone my fork and build the branch (foreach-parallel-rsreuse), and test?

@PaulHigin I'm not particularly familiar with this part of the codebase, so I'm curious -- I notice in the code you're not using RunspacePool at all. Is there a reason for that? Typically when I see parallelisation code, they're using that in one fashion or another.

@vexx32 RunspacePool is overkill for my needs here and I can go with a much simpler solution. Also I want to control the health and lifetime of the runspace objects myself and ensure the throttle limit is honored.

@PaulHigin I had a chance to build and test today, hopefully I did it right as the built $PSVer shows v6.x, but I had access to -Parallel, so I assume it's correct (first time doing something like this, sorry :)).

$PSVersionTable

Name                           Value
----                           -----
PSVersion                      6.0.0-alpha.9
PSEdition                      Core
GitCommitId                    6.0.0-alpha.9-4077-gbb145502f10c6dca37d1a714d8be58e487a78546
OS                             Microsoft Windows 10.0.17763
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

I tested using the same files and method as the blog post, here are the results.

2 Threads
TaskPool runspace pool count: 2

  • Peak memory: 0.23 GB
  • Exec time: ~45 seconds

6 Threads
TaskPool runspace pool count: 6

  • Peak memory: 0.33 GB
  • Exec time: ~ 30 seconds

6 Threads w/ Remove-Variable
TaskPool runspace pool count: 6

  • Peak memory: 0.26 GB
  • Exec time: ~30 seconds

Memory usage looks really good, execution time is still a little faster using the old way of creating runspace pools (~10 secs vs ~30).

Also did a test with 6 threads and 100,000 files:

6 Threads, 100,000 files
TaskPool runspace pool count: 6

  • Peak memory: 0.71 GB
  • Exec time: ~730 seconds

@mdjx Thanks for performing these tests! This generally agrees with my own simpler tests. But all of my tests were noticeably faster than before, since much less time is spent creating and initializing runspaces, so I wonder if there are other factors causing yours to run slightly slower.

BTW the 'TaskPool runspace pool count: x' console output is temporary and will be removed if these changes are accepted.

Was this page helpful?
0 / 5 - 0 ratings