PowerShell: Provide option in ForEach-Object -Parallel to transfer current runspace state

Created on 1 Apr 2020 · 15 Comments · Source: PowerShell/PowerShell

Summary of the new feature/enhancement

ForEach-Object -Parallel currently runs each loop iteration in a runspace initialized to its default state. But users might reasonably expect that runspace's state to match the state of the runspace running the ForEach-Object cmdlet.

This feature idea is to add a new -UseCurrentState (or something) switch that transfers the current runspace state to each parallel loop iteration runspace.

This would ensure that any modules imported or defined functions are available to each foreach -parallel script.

It is not clear how much of the current runspace state can or should be transferred to the loop runspaces. For example, defined global variables are problematic since they are likely not thread safe.

Example:

New-PSDrive -Name ZZ -PSProvider FileSystem -Root c:\temp
1..1 | ForEach-Object -UseCurrentState -Parallel {
    # Should have access to ZZ drive
    dir ZZ:
}

Import-Module -Name c:\temp\Modules\MyModule.psd1
1..1 | ForEach-Object -UseCurrentState -Parallel {
    # Should have access to MyModule functions
    Get-MyInfo -Name Hello
}
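
For comparison, here is a minimal sketch of how state can be carried over by hand today, without the proposed switch. It assumes PowerShell 7 or later; `Get-MyInfo` is defined inline as a hypothetical stand-in for a module command, and `$greeting` is an illustrative variable. The function body is passed as a string and re-created in each iteration's runspace, and the variable is read through the `$using:` scope modifier. A module could likewise be loaded by calling Import-Module inside the script block.

```powershell
# Hypothetical function standing in for a command from an imported module.
function Get-MyInfo {
    param([string]$Name)
    "Info for $Name"
}

# Capture the function body as text so it can cross the runspace boundary.
$funcDef  = ${function:Get-MyInfo}.ToString()
$greeting = 'Hello'   # illustrative caller-scope variable

$results = 1..3 | ForEach-Object -Parallel {
    # Re-create the function inside this iteration's runspace.
    ${function:Get-MyInfo} = $using:funcDef

    # Read the caller's variable through the $using: scope modifier.
    Get-MyInfo -Name "$($using:greeting) $_"
}

# Output order is not guaranteed for parallel iterations, so sort before use.
$results | Sort-Object
```

This covers functions and simple variables only; it does not address drives, classes, or other session state discussed in this issue.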

This needs to be opt-in since it will be a performance hit. Some users will likely be unhappy with the performance degradation in simple scenarios.

Proposed technical implementation details (optional)

Area-Cmdlets-Core Issue-Enhancement

All 15 comments

This is a response to other issues created against foreach -parallel, where there is a reasonable expectation that the script block is run in the current state.

#12239

Perhaps the current state should be the default, with a new switch "-UseDefaultRunspaceState".

May I suggest that if the -Parallel switch is specified, the expectation is that only one thing changes: ForEach-Object becomes parallel. The code block should not start working differently.

@somescout That would be ideal, but is not possible. There are too many issues. One is performance; replicating runspace state is time and resource intensive, and many times not even necessary. Another is multi-threaded issues. Much single thread code simply doesn't work in multi-threaded environments.

Like anything, there are trade offs to be made. I feel the best solution is to provide options along with guidance to help the user know and understand the optional benefits and trade offs.

> One is performance; replicating runspace state is time and resource intensive, and many times not even necessary

Premature optimization. If there are significant performance issues, they can be solved by an "-Isolated" switch. But if isolation is the default, it will cause too much unpredictable behavior in parallelized code (you add the -Parallel switch and your code silently produces a wrong result, which is the worst possible behavior).

> Another is multi-threaded issues. Much single thread code simply doesn't work in multi-threaded environments.

Then there should be a separate parallelization cmdlet, like Start-Job.

> I feel the best solution is to provide options along with guidance to help the user know and understand the optional benefits and trade offs.

Is it really worth adding another feature whose problems can be found only by trial and error? (People Don't Read Instructions)

Linking in related issue: #12313

Per #12378, we should also consider allowing script block variables to be passed in if the client runspace state is replicated. The script block would be re-created/re-parsed within the new runspace so that it has affinity to that runspace and runs correctly. We still have potential threading issues with any shared variable referenced in the script block, but variables and multi-threading are potential problems anyway, which a user needs to be aware of.

So we could have three options to cover all scenarios:

  • clean runspace
  • clean runspace with parameters
  • replicated runspace with parameters
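
The "re-created/re-parsed within the new runspace" idea mentioned above can be approximated by hand today. The following is only a sketch, assuming PowerShell 7 or later; `$transform` and `$transformText` are illustrative names. The script block is passed as text and rebuilt inside each parallel runspace with `[scriptblock]::Create`, so the new block has no affinity to the caller's runspace.

```powershell
# Illustrative script block defined in the calling runspace.
$transform = { param($x) $x * 2 }

# Pass the source text rather than the script block object itself.
$transformText = $transform.ToString()

$doubled = 1..4 | ForEach-Object -Parallel {
    # Re-parse the text so the script block belongs to this runspace.
    $localTransform = [scriptblock]::Create($using:transformText)
    & $localTransform $_
}

# Parallel output order is not guaranteed, so sort before comparing.
$doubled | Sort-Object
```

Because only the text is transferred, any variables the original block captured from the caller's scope are not carried along and would need to be passed separately.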

There will also be issues with PowerShell classes defined in the current runspace. They will not exist in the new runspaces. Hard problem, for sure.

@iSazonov and @somescout - I've been thinking about the default sunspace: should ForEach-Object -Parallel by default use the current (dirty) runspace or the default (clean) runspace? It seems to me that as an admin, I would expect my current (dirty) runspace to be available when I use the -Parallel switch. While this makes intuitive sense for an admin (it just works), I wonder if the performance impact (which may be incurred for no real reason) would be so negative that admins would no longer see a benefit to -Parallel. With this in mind, I tend to agree with @PaulHigin that the sunspace should be clean by default. Thoughts?

I think I like sunspace better than runspace :). But this is a great question. The current runspace running ForEach-Object -Parallel could have accumulated a lot of state that is not needed for running the parallel script blocks, and transferring that state to each parallel runspace could have a major performance impact. So my inclination is to make state transfer opt-in. I haven't yet thought about exactly what state should be transferred.

I think opt-in is the right direction. We could have options for how much state to transfer, from the fastest (a clean runspace from a pool) to the slowest (one with full features).
As a side note, we have an issue for runspace creation performance without any progress, but we could investigate this too.

@theJasonHelmick, I actually don't know what to think now: it's not just a runspace problem. There is also the issue of non-thread-safe objects that can't be easily replicated to a runspace:

$obj = New-Object -ComObject Some.Object
$array | % {
  $obj.DoSomething($_)
}

If -Parallel is used, it will break the script regardless of whether a dirty or clean runspace is used.

Maybe it is not a good idea in the first place to implement this as part of ForEach-Object. It could be a separate cmdlet or, maybe, Start-Job with -AsRunspace and something to throttle jobs. Everyone who has used Start-Job is familiar with how it handles local variables, so there will be no broken expectations.

@somescout there's already Start-ThreadJob with all of that, it's been available since 6.x iirc. 🙂
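
A rough sketch of that approach, assuming the ThreadJob module that ships with PowerShell 7 is available; `$prefix` and the loop values are illustrative. Each job receives its input through -ArgumentList, reads a caller variable through `$using:`, and -ThrottleLimit caps how many jobs run at once.

```powershell
$prefix = 'item'   # illustrative caller-scope variable

# Start one thread job per input, running at most two at a time.
$jobs = 1..5 | ForEach-Object {
    Start-ThreadJob -ThrottleLimit 2 -ArgumentList $_ -ScriptBlock {
        param($n)
        # $using: copies the caller's value into the job, as with Start-Job.
        '{0} {1}' -f $using:prefix, $n
    }
}

# Wait for all jobs, collect their output, and remove the finished jobs.
$output = $jobs | Receive-Job -Wait -AutoRemoveJob
$output | Sort-Object
```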

@somescout Yeah, we have been aware of multi-threading issues with ForEach-Object -Parallel since the beginning, but ultimately felt it was still worth doing. But script writers need to have some knowledge and use best practices, such as thread-safe types if objects are shared across script blocks running in parallel.
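
As one illustration of the thread-safe-types practice, the sketch below shares a .NET `ConcurrentDictionary` across parallel iterations. It assumes PowerShell 7 or later; `$squares` is an illustrative name. The dictionary object itself is shared by reference through `$using:`, and `TryAdd` is safe to call from several threads at once.

```powershell
# A thread-safe collection shared by all parallel iterations.
$squares = [System.Collections.Concurrent.ConcurrentDictionary[int,int]]::new()

1..10 | ForEach-Object -Parallel {
    # $using: hands each runspace a reference to the same dictionary object.
    $dict = $using:squares
    # TryAdd is atomic; discard its boolean result to keep the output clean.
    $null = $dict.TryAdd($_, $_ * $_)
}

# All ten entries are present regardless of the order the iterations ran in.
$squares.Count
```

An ordinary hashtable or generic List used the same way would not be safe for concurrent writes.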
