Powershell: @() (array-subexpression operator) should not be optimized away when a cast to an array type is involved

Created on 18 Jul 2017 · 31Comments · Source: PowerShell/PowerShell

^{This is a follow-up to the discussion involving @daxian-dbw and @PetSerAl in #4257}

Context:

PSv5.1 introduced a generally helpful optimization that optimizes @() away in cases where an array is _directly_ constructed inside of it.
- I presume that the motivation for this optimization was the widespread, but redundant and inefficient practice of enclosing ,-constructed arrays ("array literals") in @(), such as in @(1, 2), where 1, 2 would not only suffice, but, in PSv5-, @() would actually (effectively) _clone_ the array.
Unfortunately, this optimization is also applied when a _cast_ to an array type (e.g, [int[]]) is involved, which can have unpleasant side effects - see below.
_Not_ optimizing in this scenario makes for conceptually clearer, predictable behavior.
While a change would technically be a breaking one, it is hard to imagine anyone having relied on the optimized behavior _with casts_ - a typical Bucket 3: Unlikely Grey Area change.
- Also note that when the optimization was _introduced_ in PSv5.1, it was technically a breaking change too - though I suspect it wasn't announced as such or even at all.

Steps to reproduce

@([int[]] (1, 2)).GetType().Name

$arr = 1, 2; @([int[]] $arr).GetType().Name

Expected behavior

Object[]
Object[]

Actual behavior

Int32[]
Int32[]

Note that versions up to and including v5 - before the optimization was introduced - behave as expected.

Why is the current behavior problematic?

^{The examples in this section are courtesy of @PetSerAl.}

All examples are undoubtedly edge cases, but the larger point is that this behavior should never have been introduced, and, given its "stealth status", this is an opportunity to get rid of it.

It subverts the fundamental assumption that @() always creates regular PowerShell arrays (System.Object[]):

> @([int[]] (1, 2))[-1] = 'foo'   # BREAKS, because the array is [int[]], not [object[]]
Cannot convert value "foo" to type "System.Int32".  ...
...

@PetSerAl points out that this fundamental assumption is even still reflected in other places in the source code: ArrayExpressionAst.StaticType (a @(...) expression is of type [System.Management.Automation.Language.ArrayExpressionAst]).

It subverts the expectation that @() always _clones_ an array:

> $a = [int[]] (1, 2); $b = @([int[]] $a); [object]::ReferenceEquals($a, $b)
True    # $a and $b unexpectedly reference the same array.

Note that the accidental reference equality only happens ~if the source type and the cast type are identical~ if the cast type is either the same as the source type or a type that the source type is (directly or indirectly) derived from (covariance).

Thus, if the _cast_ type is [object[]] and the source type is a reference type too, the problem always surfaces.

Environment data

PowerShell Core v6.0.0-beta.4 on macOS 10.12.5
PowerShell Core v6.0.0-beta.4 on Ubuntu 16.04.2 LTS
PowerShell Core v6.0.0-beta.4 on Microsoft Windows 10 Pro (64-bit; v10.0.15063)
Windows PowerShell v5.1.15063.413 on Microsoft Windows 10 Pro (64-bit; v10.0.15063)

Issue-Discussion Resolution-Answered WG-Language

Source

mklement0

All 31 comments

Note that the accidental reference equality only happens if the source type and the cast type are identical.

It not necessary to be identical due to array covariance in .NET:

$a = [string[]]@()
$b = @([object[]]$a)
[object]::ReferenceEquals($a, $b)

PetSerAl on 18 Jul 2017

@PetSerAl: Thanks for the correction; I've fixed the original post.

mklement0 on 18 Jul 2017

Moving the discussions about this issue from #4257. Starting from this comment

From @mklement0:

Given the potentially problematic behavior of something like @([int[]] (1, 2)) (the only case where @() doesn't output an [object[]] array), is it worth creating a new issue for that?

It's hard to imagine that anyone would have relied on that behavior, and eliminating it makes for a more predictable environment.

daxian-dbw on 18 Jul 2017

From @daxian-dbw:

Here is the code that handles this special case. In my understanding, this happens only in one of the following two conditions:

the expression is a ConvertExpressionAst or a ParenExpression that wraps a ConvertExpressionAst, and the convert-to type is an array.
the expression is an ArrayLiteralAst or a ParenExpression that wraps an ArrayLiteralAst.

In the first case, since the conversion is explicitly specified, it's very unlikely the resulting array will again be used as object[]. In the second case, the resulting array from ArrayLiteralAst is already object[] so it doesn't matter. Plus, altering this behavior would be a breaking change. Given the above, I prefer to not make a behavior change. But feel free to notify the PowerShell-Committee if you want a further discussion.

daxian-dbw on 18 Jul 2017

From @mklement0:

I can't imagine much existing code breaking, but, conversely, I can't imagine many people running into a problem with the current behavior in real-world scenarios, so, personally, I'm happy to leave it at that; perhaps @PetSerAl feels differently.

daxian-dbw on 18 Jul 2017

From @PetSerAl:

@daxian-dbw, how about this:

$a = 1..3
$b = @([object[]]$a)
[object]::ReferenceEquals($a, $b)
$b[1] = 123
$a[1]

Now $b is not copy of $a but is $a. Also if current behavior will be kept, is not ArrayExpressionAst.StaticType should be adjusted to match current behavior?

daxian-dbw on 18 Jul 2017

From @daxian-dbw:

Now $b is not copy of $a but is $a

Yes, $b and $a point to the same array in this case. But I don't think the array expression in powershell language is defined to always return a new array object.

Is not ArrayExpressionAst.StaticType should be adjusted to match current behavior?

Good catch. It is inconsistent and should be fixed. But IMHO, the fix should be to change ArrayExpressionAst.StaticType to always return System.Array (also a breaking change :)). StaticType is by definition not accurate, because sometimes the actual type can be known only at runtime, for example, BinaryExpressionAst.StaticType returns System.Object when it's -bor, and the actual type may vary at runtime.

daxian-dbw on 18 Jul 2017

From @PetSerAl:

@daxian-dbw But where I can see that definition of array expression? Before now I always expect it to return new array object each time. I myself use $b = @($a) (without cast although) as array copy operator, and I really do not like if it stop copying array at some point in the future.

daxian-dbw on 18 Jul 2017

From @daxian-dbw:

You can find the Windows PowerShell Langauge Specification Version 3.0 here. It hasn't been updated for a while, so new language features like DSC or PowerShell Class are not in it, but I believe the content in the specification should still apply to PowerShell 6.0.

daxian-dbw on 18 Jul 2017

Thanks for moving the discussion, @daxian-dbw.

The migrated comment where I say that I'm happy to _not_ make a change is obsolete, however, as evidenced by my having created _this_ issue (the reference equality issue convinced me that a change is warranted).

I don't think the array expression in powershell language is defined to always return a new array object.

It is certainly not _documented_ as such (in help topics about_Arrays and about_Operators), and you can argue that the copying is an _implementation detail_.

^{As an side: about_Arrays states "For more information about the array sub-expression operator, see about_Operators", but, if anything, about_Operators contains _less_ information.}

I we look at the _spec_, the message is muddled (from "7.1.7 @(…) operator"; emphasis added):

any objects written to the pipeline as part of the evaluation are collected in an unconstrained 1-dimensional array, in order.

This certainly suggests creation of a _new_ array.

In the examples in the same chapter, the language is looser - though understandably so, given that in most cases PowerShell users needn't worry about reference equality:

$a = @(2,4,6) # result is array of 3

@($a) # result is the same array of 3

Either way, users may have come to rely on the copying behavior, either unwittingly or - as in @PetSerAl's case - intentionally, so the currently implemented optimization arguably constituted a Bucket 2: Reasonable Grey Area change back when it was introduced in PSv5.1 (which, from what I can tell, happened quietly).

mklement0 on 18 Jul 2017

The current behavior of @() is by design and has been so since V1 of PowerShell. The @() operator only guarantees that the result of the enclosed expression is an indexable collection. If the result is already a collection then it is returned unchanged. The primary purpose for the operator is to provide a way to ensure that the result of a pipeline execution is always an indexable collection. Somewhere along the way people got the idea that it was some kind of array literal but that has never been the case and was certainly not the intent.

BrucePay on 19 Jul 2017

@BrucePay: I thought that the initial post contained sufficient context, but let me try again:

The premise of this issue is that the behavior _did_ change in v5.1:

To verify, compare the output from@([int[]] (1, 2)).GetType().Name in v5.1+ vs. v5-

The - speculative - conclusion was that the change in behavior was an optimization intended to _indulge_ the very misconception that you reference: the mistaken belief that @(...) is needed to construct an array.

I'm sure that you or someone from your team can clarify.

This well-meaning optimization succeeded only _partially_:

It works well for ,-constructed arrays _not involving a cast_.
As an aside: whether the misconception should have been _indulged_ - as opposed to _educating users_ - is a separate issue.
- It introduced _treacherous behavior_ for arrays constructed _via a cast_, as demonstrated here.

This issue is a plea to _roll back the treacherous part of the optimization_.

mklement0 on 19 Jul 2017

Please clarify. If we change:

$a = 1..3
$b = @([object[]]$a)
[object]::ReferenceEquals($a, $b)
$b[1] = 123
$a[1]

(here we have the same array)
to:

$a = 1..3
$b = @($a)
[object]::ReferenceEquals($a, $b)
$b[1] = 123
$a[1]

-- here we have a new array. Why? Is it related:

Also if current behavior will be kept, is not ArrayExpressionAst.StaticType should be adjusted to match current behavior?

iSazonov on 19 Jul 2017

@BrucePay v3 specification say:

The result is the (possibly empty) unconstrained 1-dimensional array.

So, it is not just some indexable collection. It guarantied to be an array ([object[]]).

PetSerAl on 19 Jul 2017

👍1

@([object[]]$null).GetType()

PetSerAl on 19 Jul 2017

👍1

@([object[]]$null).GetType()
You cannot call a method on a null-valued expression.
At line:1 char:1
+ @([object[]]$null).GetType()
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

This is a deal breaker and definitely needs to be fixed. And this one

@([System.Collections.Generic.List[object]]$null)
Object reference not set to an instance of an object.
At line:1 char:1
+ @([System.Collections.Generic.List[object]]$null)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (:) [], NullReferenceException
    + FullyQualifiedErrorId : System.NullReferenceException

So, it is not just some indexable collection. It guarantied to be an array ([object[]]).

~~Maybe it should be System.Array, not necessarily System.Object[]. I think ArrayExpressionAst.StaticType should be changed to System.Array, but that would be a breaking change.~~
Never mind, the spec says it's unconstrained, so should be object[].

daxian-dbw on 19 Jul 2017

👍1

@daxian-dbw:

Glad to hear it.

Re @([System.Collections.Generic.List[object]]$null), out of curiosity:
Since the cast involved here is not an _array_ cast, what is the cause of the problem? Is this related to the same optimization, and, if so, how?

@iSazonov:

My understanding is limited, but I hope it's accurate enough to say that in your example - due to the presence of array cast [object[]] - the (problematic part of the) PSv5.1+ optimization effectively behaves as if @(...) hadn't been specified at all; in other words: @([object[]]$a) and [object[]]$a behave identically in your scenario, and $b = [object[]]$a simply copies the array reference (in other words $b and $a reference the same array).

By contrast, with a purely , constructed array (no cast), this behavior is not problematic, because , creates an [object[]] instance that by definition has no independent existence outside of @(...), and skipping the enumeration-and-repackaging step that @(...) normally entails actually speeds up legacy code that mistakenly uses @() to initialize array literals.

A @(...) expression is parsed as type ArrayExpressionAst and ArrayExpressionAst.StaticType - which was never changed - still indicates [object[]], even though the problematic optimization introduced @(...) expressions whose type could be different (e.g., @([int[]] (1, 2)).GetType().Name yields Int32[]).

mklement0 on 19 Jul 2017

Re @([System.Collections.Generic.List[object]]$null), out of curiosity

It's resulted in by the code here. A null check should be done.

daxian-dbw on 19 Jul 2017

👍1

It seems we should fix:

$a = 1..3
$b = $a
[object]::ReferenceEquals($a, $b)

$a = 1..3
$b = @($a)
[object]::ReferenceEquals($a, $b)

iSazonov on 20 Jul 2017

@iSazonov:

No, that (still) works as intended:

The first statement creates another variable pointing to the same array.
The second statement creates a _new_ array, by enumerating the result of expression $a and packaging that as a new [object[]] array.

The latter behavior is in line with @()'s documented intent, and while the aspect of _always_ creating a _new_ array (effectively, a copy (shallow clone), if the input array is also [object[]]) is not explicitly stated, to me it is implied by the spec, has always worked that way, and people may have come to _rely_ on this behavior, as evidenced by @PetSerAl's habit of using @() as an array-cloning shortcut.

mklement0 on 20 Jul 2017

Thanks for clarify.
Continue 😄

$a = 1..3
$b = @($a = 1..4;[object[]]$a)
[object]::ReferenceEquals($a, $b)

$a = 1..3
$b = @([object[]]$a)
[object]::ReferenceEquals($a, $b)

iSazonov on 20 Jul 2017

👍1

@daxian-dbw Please clarify about "Review - Committee" label in the Issue.

iSazonov on 3 Aug 2017

@mklement0 @SteveL-MSFT
Also should we add "Documentation Need" label?

iSazonov on 3 Aug 2017

@iSazonov: I think that a mention in the release notes should be sufficient, given that the original optimization was never documented and many people probably never noticed it.

mklement0 on 3 Aug 2017

👍1

This has been a good discussion, but I need a summary here. Is there anything that the committee still needs to review? Since @daxian-dbw fixed some aspects of this discussion, I think the only thing left is a regression introduced by an optimization in 5.x? If so, then I think Documentation Needed is sufficient.

SteveL-MSFT on 4 Aug 2017

The thing that committee needed to review was this:

With PSv5.1, @(...) returns an array of a specific type in some scenarios, for example @([int[]] (1, 2)).GetType().Name. However, according to PowerShell Language Specification Version 3.0, as quoted: "_The result is the (possibly empty) unconstrained 1-dimensional array_", @(...) should only return object[] array.

The PR #4296 addressed this, and Jason approved, so I assume the committee is fine with the change to make @(...) always return object[] array. If so, we can remove the 'Review Committee' label.

daxian-dbw on 4 Aug 2017

Just for future consideration, I'd like to address @lzybkr's comments on the linked PR:

I should add that the intent of @() is not to create a new array, but to ensure the result is an array, and not necessarily an object[].

While I agree re the overall intent of @(), I feel that it is better to not contravene two likely user expectations (which happen to be consistent with what the 3.0 spec says):

that the array returned is unconstrained, i.e., of type [object[]], i.e., a _regular_ PS array. (Most PowerShell users probably don't even think about array-element typing and probably expect arrays to be able to hold any mix of data types)
that @() always returns a _new_ array, without having to worry about reference equality (which is probably not even on the radar of most PowerShell users).

In other words, I think we could be open to changing what the language specification says to allow for better performance.

Passing strongly typed arrays through strikes me as a rare use case that I don't think warrants optimization, given that it would contravene the expectations above.

By contrast, optimizing the _typical_ use case is well worth considering: passing the [object[]] output of a _single command_ through (optimizing the @() away if the enclosed single command happens to return a [object[]] array already).

I can't speak to the technical feasibility, but here's an example where the @() could be optimized away (without contravening the assumptions above), which makes a noticeable difference with large output sets:

# Could be optimized to: Write-Output (1..10000000)
# Currently, the output is collected and a new array is constructed.
@(Write-Output (1..10000000))

mklement0 on 5 Aug 2017

@mklement0 Do you mean that it should return true?

$a = 1..3
$b = @($a)
[object]::ReferenceEquals($a, $b)

true

iSazonov on 6 Aug 2017

@iSazonov:

No, that's one of the user expectations I do _not_ want to contravene.

Conversely, if we _commit_ to _never_ optimizing @() away in this scenario, we can advertise that @() applied to a _variable_ has a secondary use as a shallow-clone-any-array-or-collection-to-a-regular PS array operator (which it has been restored to with this PR).

What I was suggesting is: make @() return a _new_ array _except_ in the following 2 scenarios - and no others - in which @() can be _optimized away_:

already implemented: an array literal _without a cast_; e.g., @(1, 2, 3)
suggested improvement: a single command that outputs an [object[]] array already; e.g., @(Get-ChildItem -Recurse); of course, if a _scalar_ is situationally returned, it still needs to be wrapped in an (invariably new) array - that is the primary purpose of @(), after all.

In both scenarios reference equality is a moot point, given that no variable referencing a preexisting array is involved. (I suppose in roundabout ways a command's output could hypothetically reference an array stored in a PS variable, but I don't think we need to worry about that).

The first optimization is really just a _courtesy_ to improve the performance in cases where users mistakenly apply @() to (unconstrained) array literals, even though it is pointless.

The 2nd one is by far the most common use case for @(), if I were to guess, and many scripts could benefit from it. Again, I can't speak to technical feasibility / effort.

mklement0 on 6 Aug 2017

@mklement0 Thanks for clarify.

Above I see Bruce and Jason said "the intent of @() is not to create a new array". I expect that behavior.
We need conclusion from @powershell-committee for this point too.

iSazonov on 7 Aug 2017

@iSazonov:

Indeed, array-cloning is not @()'s purpose, and maybe what I'm proposing is too convoluted (I like the conceptual simplicity of "make this an array if it isn't; otherwise use it as-is" - which guarantees neither cloning nor being unconstrained).

My thought was that since changing the always-create-a-new-array behavior would (a) break existing code and (b) applying @() to a _variable_ is an exotic use case to begin with, why not turn it into a documented virtue that gives us a convenient, PS-idiomatic way to clone arrays?

However, my suggestion to optimize the @(<single-command>) use case is independent of the above, and strikes me as a worthwhile, given that it's probably the most common use of @() by far.
I should say that I've inferred that this use case is currently _not_ optimized solely from timing commands - I haven't looked at the source code.

mklement0 on 7 Aug 2017

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Can't Uninstall/Remove AzureRM module from PowerShell on Mac OS

Michal-Ziemba · 3Comments

Feature Request: Add conditional colorization to Format-Table command

pcgeek86 · 3Comments

New-PSSession error "subsystem request failed on channel 0" when using SSH key authentication

pcgeek86 · 3Comments

Support for $MaximumFunctionCount and other limits should be removed

lzybkr · 3Comments

Referencing System.Management.Automation in .NET Core / VS Code

concentrateddon · 3Comments