Powershell: Address Dangerous $using: in Foreach-Object -Parallel

Created on 8 Sep 2019  ยท  15Comments  ยท  Source: PowerShell/PowerShell

According to the documentation, Foreach -Parallel passes variables by _reference_ with $using: while the rest of PowerShell passes variables by _value_.
This is a dangerous feature, having the exact same syntax for such different mechanics can cause unintended effects, and could even slip by experienced developers.

If these values are to pass by reference, then I suggest the syntax be changed to $ref:
That will force people to do it with intention, and know exactly what they're looking at at all times. This would be similar to how it is done in C#.

Area-Cmdlets-Core Committee-Reviewed Issue-Enhancement

Most helpful comment

@PaulHigin I don't know if I can clarify this better than @pearsonsjp already has, but it seems like you might be misunderstanding the issue?

In Invoke-Command, the $using: variables are values copied to the remote machine. No data is exchanged between individual target machines.

In the new ForEach-Object -Parallel, $using: is (as you point out in the article) not thread-safe, as the actual reference to the data can and is shared between threads.

I think what @pearsonsjp is getting at is that existing expectations of how $using: behaves in Invoke-Command will almost certainly carry over to ForEach-Object -Parallel, creating confusion. Their proposed solution is to distinguish the difference in syntax.

I agree that if the behaviour remains as it currently stands, we should probably attempt to distance the similarity from Invoke-Command so as to conform to existing expectations of how $using: variables behave. I think the proposed $ref:variable syntax are a reasonable compromise, as it both communicates "this is kinda-similar to $using:variable" but _also_ "this is fundamentally different from how $using:variable operates" clearly and succinctly.

@Jaykul is often one to say we should attempt to engineer the language towards creating a "pit of success" (rather than "pitfalls" to be anxiously skirted around). I think this is one of those cases.

All 15 comments

This is a feature that was brought in the early days of PowerShell to be able to pass to Jobs and remote commands as listed and is documented in https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_remote_variables?view=powershell-6#using-local-variables and is required for the Foreach -Parallel to function at all.

I'm not sure I can agree with you that it's either dangerous or would slip past experienced developers plus PowerShell isn't C# so those coming from C# _may_ find they get tripped up by this but those that are experienced PowerShell Developers will know this and have used it excessively in the past

I was referring to Foreach-Object (Title updated) which nobody has used excessively in the past, because it is a new feature. It does not behave the same as previous iterations of $using because in the past it has always passed by _value_ not by _reference_
This is a fundamental change in how it works.

@pearsonsjp

while the rest of PowerShell passes variables by value.

This is not actually correct. In fact, PowerShell always passes by reference both with reference types and boxed value types.

With regards to using $using: to pass references - yes you can get into trouble but the performance
cost of the alternative is high. In PowerShell WorkFlow, which supported foreach -parallel, we marshalled objects across runspaces. It was abysmally slow. To get any kind of acceptable performance, marshalling had to be turned off.

/cc @PaulHigin for information.

Just to be clear, it is the ForEach-Object -Parallel parameter set that is new and the behavior of the existing ForEach-Object parameter sets has not changed. This new parameter set implements parallelism and really cannot be expected to work the same as the existing serial processing.

Much of these concerns have already been debated in the RFC for this feature (https://github.com/PowerShell/PowerShell-RFC/pull/194).

It was felt that adding this new -Parallel parameter set was the best way to provide parallelism at the cmdlet level. It can be dangerous if used incorrectly. But then there are a lot of ways to hurt yourself with PowerShell. Rather than diluting the feature I feel it is better make users aware so that they can use it safely and effectively.

Official documentation is coming, but in the mean time I have written a blog to describe and clarify this new feature:
https://devblogs.microsoft.com/powershell/powershell-foreach-object-parallel-feature/

PowerShell
PowerShell ForEach-Object Parallel Feature PowerShell 7.0 Preview 3 is now available with a new ForEach-Object Parallel Experimental feature. This feature is a great new tool for parallelizing work, but like any tool, it has its uses and drawbacks. This article describes this new feature,

This blog is actually where I got my information:

" But there is a big difference when using the $using: keyword in ForEach-Object -Parallel. And that is for remoting, the variable being passed is a copy sent over the remoting connection. But with ForEach-Object -Parallel, the actual object reference is being passed from one script to another, violating normal isolation restrictions. So it is possible to have a non thread-safe variable used in two scripts running on different threads, which can lead to unpredictable behavior." <=- That is passing by value, not reference.

This is why I made the suggestion to use $ref: in this feature rather than $using.

I agree that there are a lot of ways to hurt yourself with PowerShell and that education is good; this just seemed like asking for disaster. But it sounds like this has already been discussed at length.

@PaulHigin I don't know if I can clarify this better than @pearsonsjp already has, but it seems like you might be misunderstanding the issue?

In Invoke-Command, the $using: variables are values copied to the remote machine. No data is exchanged between individual target machines.

In the new ForEach-Object -Parallel, $using: is (as you point out in the article) not thread-safe, as the actual reference to the data can and is shared between threads.

I think what @pearsonsjp is getting at is that existing expectations of how $using: behaves in Invoke-Command will almost certainly carry over to ForEach-Object -Parallel, creating confusion. Their proposed solution is to distinguish the difference in syntax.

I agree that if the behaviour remains as it currently stands, we should probably attempt to distance the similarity from Invoke-Command so as to conform to existing expectations of how $using: variables behave. I think the proposed $ref:variable syntax are a reasonable compromise, as it both communicates "this is kinda-similar to $using:variable" but _also_ "this is fundamentally different from how $using:variable operates" clearly and succinctly.

@Jaykul is often one to say we should attempt to engineer the language towards creating a "pit of success" (rather than "pitfalls" to be anxiously skirted around). I think this is one of those cases.

Yes! Very much this.
Thank you vexx for putting my thoughts into words so eloquently.

Oh ok, I get it now. Yes I agree that having a $ref keyword instead of $using is a very reasonable request, and would make the issues of passing by reference more apparent.

@PowerShell/powershell-committee discussed this.

PSWorkFlow had already gone through this discussion where the semantics were different depending on whether it was in-proc or out-of-proc but the decision was to use $using: everywhere for consistency and not having to learn specific syntax based on situation even if the semantics were different. Introducing $ref: would be unique here making it more difficult to discover.

$using: means use this variable from caller space into new execution context. Expectation is that documentation for specific cmdlets supporting $using: clarify the semantics. So the PS-Committee decision is to stay with $using:.

Hmm. I can't say I think the comparison to workflow is particularly relevant. We're talking very distinct cmdlets and a specialised parameter set here (ForEach-Object doesn't have a $using: in its normal parameter sets, for example).

Personally I think it valuable to make a clear distinction in syntax, if only to be very clear about the potential pitfalls. If the PS team prefer to handle that burden in documentation, I suppose that's a viable option, though I feel it likely to increase misunderstandings.

If we're worried about discoverability... Well, given its constrained usage, would't it be fairly easy to just document it in the cmdlets' help documentation, and then perhaps an about topic that references $using for comparison's sake?

That said... Yeah, I guess in some ways it's "easier" to use the same naming system. I don't think it's a great way to go, though, since the features do behave significantly differently.

Just my 2 cents! :sparkling_heart:

The PSWorkFlow mention is that this whole discussion has already happened before within the team when it was developing PSWorkFlow. Although PowerShellCore/7 doesn't include PSWorkFlow, some existing PSWorkFlow users may recall $using: there and had to keep the semantics in mind depending on how they used PSWorkFlow. So that was a precedent already set. I think they key thing is that $using: was never meant to mean a copy, it was literally just "using this variable from caller context into a new context". Introducing a new prefix adds more confusion (in our opinion) and makes it less discoverable as $using: is a known concept.

Aye. And while I understand and accept that this is probably better for folks who already have used $using: variables extensively, I can't help but think it's yet another thing where users have to memorise that now we have two very clearly distinct types of $using:, rather than being able to tell the two kinds at a glance while skimming through code. ๐Ÿ™‚

I guess I feel like this is a good opportunity to make a clean break from the old behaviour to clarify the purpose of the variable usage, and would help new users / those not as familiar with $using: from the PSWorkflow days to more easily know the difference.

With $using: sort of pulling double duty as it were, it's much easier to draw the wrong conclusion from reading code. Much as I love documentation, I am painfully aware that there will always be a significant portion of the userbase that doesn't read the docs extensively. I'm constantly pointing people in the community to bits of the documentation who've been around for several years, and never saw or bothered to read more than one or two about_* help topics, if that. For those folks, the fact that $using: can behave quite unexpectedly in Start-ThreadJob or ForEach-Object -Parallel compared to, say Invoke-Command, which a great many users are familiar with.

The familiarity of $using: could easily be a double-edged sword for us here, I think. But I've said my piece; perhaps I'm worrying over nothing. ๐Ÿ˜

If you were worrying about nothing, then it wouldn't be an established
practice to separate the two in some languages. Most notably as I
referenced originally, another language developed by the same company
utilizes this separation in C#. (Yes, I am all tok familiar with the chasms
of separation between orgs inside MS)
They did that because passing variables by reference is a known danger,
they created the language in such a way that you have to explicitly state
that you want to pass by reference.

I don't think discoverability is an issue here. If a user wants to use
$using and it doesn't work, they're going to look for answers. Likewise if
a user doesn't understand the implications and uses it as-is, expecting a
particular behavior that they are used to...bad things can happen.

Mayne I'm beating a dead horse, but I feel like we are trading safety for
convenience. We should strive to be proactive in creating a strong,
reliable product, even when that reliability causes a very minor
inconvenience.

On Wed, Oct 23, 2019, 5:30 PM vexx32 notifications@github.com wrote:

Aye. And while I understand and accept that this is probably better for
folks who already have used $using: variables extensively, I can't help but
think it's yet another thing where users have to memorise that now we have
two very clearly distinct types of $using: ๐Ÿ™‚

I guess I feel like this is a good opportunity to make a clean break from
the old behaviour to clarify the purpose of the variable usage, and would
help new users / those not as familiar with $using: from the PSWorkflow
days to more easily know the difference.

With $using: sort of pulling double duty as it were, it's much easier to
draw the wrong conclusion from reading code. Much as I love documentation,
I am painfully aware that there will always be a significant portion of the
userbase that doesn't read the docs extensively. I'm constantly pointing
people in the community to bits of the documentation who've been around for
several years, and never saw or bothered to read more than one or two
about_* help topics, if that. For those folks, the fact that $using: can
behave quite unexpectedly in Start-ThreadJob or ForEach-Object -Parallel
compared to, say Invoke-Command, which a great many users are familiar
with.

The familiarity of $using: could easily be a double-edged sword for us
here, I think. But I've said my piece; perhaps I'm worrying over nothing.
๐Ÿ˜

โ€”
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/PowerShell/PowerShell/issues/10499?email_source=notifications&email_token=ADR57JUDJXC73E6H6DHCBFTQQDUC5A5CNFSM4IUT77Y2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECDJI7A#issuecomment-545690748,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ADR57JVRXQ7OPOE37BYOAOLQQDUC5ANCNFSM4IUT77YQ
.

The conclusion should be documented if not done yet.

Was this page helpful?
0 / 5 - 0 ratings