This is a follow-up to the discussion involving @daxian-dbw and @PetSerAl in #4257
Context:
PSv5.1 introduced a generally helpful optimization that optimizes @()
away in cases where an array is _directly_ constructed inside of it.
,
-constructed arrays ("array literals") in @()
, such as in @(1, 2)
, where 1, 2
would not only suffice, but, in PSv5-, @()
would actually (effectively) _clone_ the array.Unfortunately, this optimization is also applied when a _cast_ to an array type (e.g, [int[]]
) is involved, which can have unpleasant side effects - see below.
_Not_ optimizing in this scenario makes for conceptually clearer, predictable behavior.
While a change would technically be a breaking one, it is hard to imagine anyone having relied on the optimized behavior _with casts_ - a typical Bucket 3: Unlikely Grey Area change.
@([int[]] (1, 2)).GetType().Name
$arr = 1, 2; @([int[]] $arr).GetType().Name
Object[]
Object[]
Int32[]
Int32[]
Note that versions up to and including v5 - before the optimization was introduced - behave as expected.
The examples in this section are courtesy of @PetSerAl.
All examples are undoubtedly edge cases, but the larger point is that this behavior should never have been introduced, and, given its "stealth status", this is an opportunity to get rid of it.
@()
always creates regular PowerShell arrays (System.Object[]
):> @([int[]] (1, 2))[-1] = 'foo' # BREAKS, because the array is [int[]], not [object[]]
Cannot convert value "foo" to type "System.Int32". ...
...
@PetSerAl points out that this fundamental assumption is even still reflected in other places in the source code: ArrayExpressionAst.StaticType
(a @(...)
expression is of type [System.Management.Automation.Language.ArrayExpressionAst]
).
@()
always _clones_ an array:> $a = [int[]] (1, 2); $b = @([int[]] $a); [object]::ReferenceEquals($a, $b)
True # $a and $b unexpectedly reference the same array.
Note that the accidental reference equality only happens ~if the source type and the cast type are identical~ if the cast type is either the same as the source type or a type that the source type is (directly or indirectly) derived from (covariance).
Thus, if the _cast_ type is [object[]]
and the source type is a reference type too, the problem always surfaces.
PowerShell Core v6.0.0-beta.4 on macOS 10.12.5
PowerShell Core v6.0.0-beta.4 on Ubuntu 16.04.2 LTS
PowerShell Core v6.0.0-beta.4 on Microsoft Windows 10 Pro (64-bit; v10.0.15063)
Windows PowerShell v5.1.15063.413 on Microsoft Windows 10 Pro (64-bit; v10.0.15063)
Note that the accidental reference equality only happens if the source type and the cast type are identical.
It not necessary to be identical due to array covariance in .NET:
$a = [string[]]@()
$b = @([object[]]$a)
[object]::ReferenceEquals($a, $b)
@PetSerAl: Thanks for the correction; I've fixed the original post.
Moving the discussions about this issue from #4257. Starting from this comment
From @mklement0:
Given the potentially problematic behavior of something like @([int[]] (1, 2)) (the only case where @() doesn't output an [object[]] array), is it worth creating a new issue for that?
It's hard to imagine that anyone would have relied on that behavior, and eliminating it makes for a more predictable environment.
From @daxian-dbw:
Here is the code that handles this special case. In my understanding, this happens only in one of the following two conditions:
ConvertExpressionAst
or a ParenExpression that wraps a ConvertExpressionAst
, and the convert-to type is an array.ArrayLiteralAst
or a ParenExpression that wraps an ArrayLiteralAst
.In the first case, since the conversion is explicitly specified, it's very unlikely the resulting array will again be used as object[]
. In the second case, the resulting array from ArrayLiteralAst
is already object[]
so it doesn't matter. Plus, altering this behavior would be a breaking change. Given the above, I prefer to not make a behavior change. But feel free to notify the PowerShell-Committee if you want a further discussion.
From @mklement0:
I can't imagine much existing code breaking, but, conversely, I can't imagine many people running into a problem with the current behavior in real-world scenarios, so, personally, I'm happy to leave it at that; perhaps @PetSerAl feels differently.
From @PetSerAl:
@daxian-dbw, how about this:
$a = 1..3
$b = @([object[]]$a)
[object]::ReferenceEquals($a, $b)
$b[1] = 123
$a[1]
Now $b
is not copy of $a
but is $a
. Also if current behavior will be kept, is not ArrayExpressionAst.StaticType
should be adjusted to match current behavior?
From @daxian-dbw:
Now
$b
is not copy of$a
but is$a
Yes, $b
and $a
point to the same array in this case. But I don't think the array expression in powershell language is defined to always return a new array object.
Is not ArrayExpressionAst.StaticType should be adjusted to match current behavior?
Good catch. It is inconsistent and should be fixed. But IMHO, the fix should be to change ArrayExpressionAst.StaticType
to always return System.Array
(also a breaking change :)). StaticType
is by definition not accurate, because sometimes the actual type can be known only at runtime, for example, BinaryExpressionAst.StaticType
returns System.Object
when it's -bor
, and the actual type may vary at runtime.
From @PetSerAl:
@daxian-dbw But where I can see that definition of array expression? Before now I always expect it to return new array object each time. I myself use $b = @($a)
(without cast although) as array copy operator, and I really do not like if it stop copying array at some point in the future.
From @daxian-dbw:
You can find the Windows PowerShell Langauge Specification Version 3.0
here. It hasn't been updated for a while, so new language features like DSC or PowerShell Class are not in it, but I believe the content in the specification should still apply to PowerShell 6.0.
Thanks for moving the discussion, @daxian-dbw.
The migrated comment where I say that I'm happy to _not_ make a change is obsolete, however, as evidenced by my having created _this_ issue (the reference equality issue convinced me that a change is warranted).
I don't think the array expression in powershell language is defined to always return a new array object.
It is certainly not _documented_ as such (in help topics about_Arrays
and about_Operators
), and you can argue that the copying is an _implementation detail_.
As an side: about_Arrays
states "For more information about the array sub-expression operator, see about_Operators
", but, if anything, about_Operators
contains _less_ information.
I we look at the _spec_, the message is muddled (from "7.1.7 @(…) operator"; emphasis added):
any objects written to the pipeline as part of the evaluation are collected in an unconstrained 1-dimensional array, in order.
This certainly suggests creation of a _new_ array.
In the examples in the same chapter, the language is looser - though understandably so, given that in most cases PowerShell users needn't worry about reference equality:
$a = @(2,4,6) # result is array of 3
@($a) # result is the same array of 3
Either way, users may have come to rely on the copying behavior, either unwittingly or - as in @PetSerAl's case - intentionally, so the currently implemented optimization arguably constituted a Bucket 2: Reasonable Grey Area change back when it was introduced in PSv5.1 (which, from what I can tell, happened quietly).
The current behavior of @() is by design and has been so since V1 of PowerShell. The @() operator only guarantees that the result of the enclosed expression is an indexable collection. If the result is already a collection then it is returned unchanged. The primary purpose for the operator is to provide a way to ensure that the result of a pipeline execution is always an indexable collection. Somewhere along the way people got the idea that it was some kind of array literal but that has never been the case and was certainly not the intent.
@BrucePay: I thought that the initial post contained sufficient context, but let me try again:
The premise of this issue is that the behavior _did_ change in v5.1:
@([int[]] (1, 2)).GetType().Name
in v5.1+ vs. v5-The - speculative - conclusion was that the change in behavior was an optimization intended to _indulge_ the very misconception that you reference: the mistaken belief that @(...)
is needed to construct an array.
This well-meaning optimization succeeded only _partially_:
It works well for ,
-constructed arrays _not involving a cast_.
As an aside: whether the misconception should have been _indulged_ - as opposed to _educating users_ - is a separate issue.
This issue is a plea to _roll back the treacherous part of the optimization_.
Please clarify. If we change:
$a = 1..3
$b = @([object[]]$a)
[object]::ReferenceEquals($a, $b)
$b[1] = 123
$a[1]
(here we have the same array)
to:
$a = 1..3
$b = @($a)
[object]::ReferenceEquals($a, $b)
$b[1] = 123
$a[1]
-- here we have a new array. Why? Is it related:
Also if current behavior will be kept, is not ArrayExpressionAst.StaticType should be adjusted to match current behavior?
@BrucePay v3 specification say:
The result is the (possibly empty) unconstrained 1-dimensional array.
So, it is not just some indexable collection. It guarantied to be an array ([object[]]
).
@([object[]]$null).GetType()
@([object[]]$null).GetType()
You cannot call a method on a null-valued expression.
At line:1 char:1
+ @([object[]]$null).GetType()
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
This is a deal breaker and definitely needs to be fixed. And this one
@([System.Collections.Generic.List[object]]$null)
Object reference not set to an instance of an object.
At line:1 char:1
+ @([System.Collections.Generic.List[object]]$null)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], NullReferenceException
+ FullyQualifiedErrorId : System.NullReferenceException
So, it is not just some indexable collection. It guarantied to be an array ([object[]]).
Maybe it should be System.Array
, not necessarily System.Object[]
. I think ArrayExpressionAst.StaticType
should be changed to System.Array
, but that would be a breaking change.
Never mind, the spec says it's unconstrained, so should be object[]
.
@daxian-dbw:
Glad to hear it.
Re @([System.Collections.Generic.List[object]]$null)
, out of curiosity:
Since the cast involved here is not an _array_ cast, what is the cause of the problem? Is this related to the same optimization, and, if so, how?
@iSazonov:
My understanding is limited, but I hope it's accurate enough to say that in your example - due to the presence of array cast [object[]]
- the (problematic part of the) PSv5.1+ optimization effectively behaves as if @(...)
hadn't been specified at all; in other words: @([object[]]$a)
and [object[]]$a
behave identically in your scenario, and $b = [object[]]$a
simply copies the array reference (in other words $b
and $a
reference the same array).
,
constructed array (no cast), this behavior is not problematic, because ,
creates an [object[]]
instance that by definition has no independent existence outside of @(...)
, and skipping the enumeration-and-repackaging step that @(...)
normally entails actually speeds up legacy code that mistakenly uses @()
to initialize array literals.A @(...)
expression is parsed as type ArrayExpressionAst
and ArrayExpressionAst.StaticType
- which was never changed - still indicates [object[]]
, even though the problematic optimization introduced @(...)
expressions whose type could be different (e.g., @([int[]] (1, 2)).GetType().Name
yields Int32[]
).
Re @([System.Collections.Generic.List[object]]$null), out of curiosity
It's resulted in by the code here. A null check should be done.
It seems we should fix:
$a = 1..3
$b = $a
[object]::ReferenceEquals($a, $b)
$a = 1..3
$b = @($a)
[object]::ReferenceEquals($a, $b)
@iSazonov:
No, that (still) works as intended:
The first statement creates another variable pointing to the same array.
The second statement creates a _new_ array, by enumerating the result of expression $a
and packaging that as a new [object[]]
array.
The latter behavior is in line with @()
's documented intent, and while the aspect of _always_ creating a _new_ array (effectively, a copy (shallow clone), if the input array is also [object[]]
) is not explicitly stated, to me it is implied by the spec, has always worked that way, and people may have come to _rely_ on this behavior, as evidenced by @PetSerAl's habit of using @()
as an array-cloning shortcut.
Thanks for clarify.
Continue 😄
$a = 1..3
$b = @($a = 1..4;[object[]]$a)
[object]::ReferenceEquals($a, $b)
$a = 1..3
$b = @([object[]]$a)
[object]::ReferenceEquals($a, $b)
@daxian-dbw Please clarify about "Review - Committee" label in the Issue.
@mklement0 @SteveL-MSFT
Also should we add "Documentation Need" label?
@iSazonov: I think that a mention in the release notes should be sufficient, given that the original optimization was never documented and many people probably never noticed it.
This has been a good discussion, but I need a summary here. Is there anything that the committee still needs to review? Since @daxian-dbw fixed some aspects of this discussion, I think the only thing left is a regression introduced by an optimization in 5.x? If so, then I think Documentation Needed
is sufficient.
The thing that committee needed to review was this:
With PSv5.1,
@(...)
returns an array of a specific type in some scenarios, for example@([int[]] (1, 2)).GetType().Name
. However, according to PowerShell Language Specification Version 3.0, as quoted: "_The result is the (possibly empty) unconstrained 1-dimensional array_",@(...)
should only returnobject[]
array.
The PR #4296 addressed this, and Jason approved, so I assume the committee is fine with the change to make @(...)
always return object[]
array. If so, we can remove the 'Review Committee'
label.
Just for future consideration, I'd like to address @lzybkr's comments on the linked PR:
I should add that the intent of @() is not to create a new array, but to ensure the result is an array, and not necessarily an object[].
While I agree re the overall intent of @()
, I feel that it is better to not contravene two likely user expectations (which happen to be consistent with what the 3.0 spec says):
that the array returned is unconstrained, i.e., of type [object[]]
, i.e., a _regular_ PS array. (Most PowerShell users probably don't even think about array-element typing and probably expect arrays to be able to hold any mix of data types)
that @()
always returns a _new_ array, without having to worry about reference equality (which is probably not even on the radar of most PowerShell users).
In other words, I think we could be open to changing what the language specification says to allow for better performance.
Passing strongly typed arrays through strikes me as a rare use case that I don't think warrants optimization, given that it would contravene the expectations above.
By contrast, optimizing the _typical_ use case is well worth considering: passing the [object[]]
output of a _single command_ through (optimizing the @()
away if the enclosed single command happens to return a [object[]]
array already).
I can't speak to the technical feasibility, but here's an example where the @()
could be optimized away (without contravening the assumptions above), which makes a noticeable difference with large output sets:
# Could be optimized to: Write-Output (1..10000000)
# Currently, the output is collected and a new array is constructed.
@(Write-Output (1..10000000))
@mklement0 Do you mean that it should return true
?
$a = 1..3
$b = @($a)
[object]::ReferenceEquals($a, $b)
true
@iSazonov:
No, that's one of the user expectations I do _not_ want to contravene.
Conversely, if we _commit_ to _never_ optimizing @()
away in this scenario, we can advertise that @()
applied to a _variable_ has a secondary use as a shallow-clone-any-array-or-collection-to-a-regular PS array operator (which it has been restored to with this PR).
What I was suggesting is: make @()
return a _new_ array _except_ in the following 2 scenarios - and no others - in which @()
can be _optimized away_:
already implemented: an array literal _without a cast_; e.g., @(1, 2, 3)
suggested improvement: a single command that outputs an [object[]]
array already; e.g., @(Get-ChildItem -Recurse)
; of course, if a _scalar_ is situationally returned, it still needs to be wrapped in an (invariably new) array - that is the primary purpose of @()
, after all.
In both scenarios reference equality is a moot point, given that no variable referencing a preexisting array is involved. (I suppose in roundabout ways a command's output could hypothetically reference an array stored in a PS variable, but I don't think we need to worry about that).
The first optimization is really just a _courtesy_ to improve the performance in cases where users mistakenly apply @()
to (unconstrained) array literals, even though it is pointless.
The 2nd one is by far the most common use case for @()
, if I were to guess, and many scripts could benefit from it. Again, I can't speak to technical feasibility / effort.
@mklement0 Thanks for clarify.
Above I see Bruce and Jason said "the intent of @() is not to create a new array". I expect that behavior.
We need conclusion from @powershell-committee for this point too.
@iSazonov:
Indeed, array-cloning is not @()
's purpose, and maybe what I'm proposing is too convoluted (I like the conceptual simplicity of "make this an array if it isn't; otherwise use it as-is" - which guarantees neither cloning nor being unconstrained).
My thought was that since changing the always-create-a-new-array behavior would (a) break existing code and (b) applying @()
to a _variable_ is an exotic use case to begin with, why not turn it into a documented virtue that gives us a convenient, PS-idiomatic way to clone arrays?
However, my suggestion to optimize the @(<single-command>)
use case is independent of the above, and strikes me as a worthwhile, given that it's probably the most common use of @()
by far.
I should say that I've inferred that this use case is currently _not_ optimized solely from timing commands - I haven't looked at the source code.