PowerShell: Surprising behavior of @() (array subexpression operator) with arrays/collections created with New-Object

Created on 14 Jul 2017 · 16 Comments · Source: PowerShell/PowerShell

tl;dr:

This issue is written based on the following, currently unfulfilled expectation:

  • When you wrap a New-Object call that outputs an array / collection in @(), it should not create an array wrapper around it.

@PetSerAl disagrees with my expectation (quotes from the comments on this SO post that inspired this issue, part of which is reprinted below):

What is unexpected about this behavior? New-Object writes a single element to the pipeline and @() wraps it in an array.

On the tangentially related issue that @() preserves the specific array type:

Also, IMHO, @([int[]] (1, 2)).GetType().Name [returning Int32[]] is a bug (over-optimization; it returns Object[] in v2)


As of Windows PowerShell v5.1 / PowerShell Core v6.0.0-beta.4, @() unexpectedly wraps arrays / collections instantiated directly as .NET types with the New-Object cmdlet in an outer, single-element array; in other words: it doesn't recognize that the results already _are_ array-valued:

> @(New-Object 'Object[]' 2).Count; @(New-Object 'Object[]' 2)[0].Count
1  # !! The array was unexpectedly wrapped in an outer single-item array.
2  # !! Element [0] contains the original array.

> @(New-Object 'System.Collections.ArrayList').Count; @(New-Object 'System.Collections.ArrayList')[0].Count
1  # !! The array list was unexpectedly wrapped in an outer single-item array.
0  # !! Element [0] contains the original (empty) array list.

To contrast the surprising New-Object behavior above with commands that _should_ be equivalent, but work as expected:

> @((New-Object 'Object[]' 2)).Count
2 # OK - !! Simply enclosing the New-Object call in (...) made the difference.

> @([int[]] (1, 2)).Count
2 # OK - using a cast in lieu of New-Object

> @([System.Collections.ArrayList]::new()).Count
0 # OK - using the static ::new() method in lieu of New-Object

Environment data

PowerShell Core v6.0.0-beta.4 on macOS 10.12.5
PowerShell Core v6.0.0-beta.4 on Ubuntu 16.04.2 LTS
PowerShell Core v6.0.0-beta.4 on Microsoft Windows 10 Pro (64-bit; v10.0.15063)
Windows PowerShell v5.1.15063.413 on Microsoft Windows 10 Pro (64-bit; v10.0.15063)
Issue-Discussion Resolution-Answered WG-Language

All 16 comments

Let me describe how I understand @() to work.

Suppose you have this:

$Result = @(
    some statements here
)

It is more or less equivalent to:

. {
    some statements here
} | ForEach-Object -Begin { $Temp = [Collections.ArrayList]::new() } -Process { [void]$Temp.Add($_) } -End { $Result = $Temp.ToArray() }

So @() invokes the enclosed statements, collects all of their pipeline output, and creates an array from it. Note that it does not care at all what kind of objects (scalar, array, or some other collection) you write to the pipeline, and that is where the great power of @() lies.

_Note a special case: @(<array literal>). I think that in this case @() is eliminated as an optimization, but that leads to an incorrect result, IMHO, when <array literal> is typed: @([int[]] (1, 2)).GetType().Name returns Int32[] instead of Object[]._

How is that useful? It is useful when you work with arrays of arrays (jagged arrays in .NET terms). Suppose you have the following command:

$Result = Get-Content File -ReadCount 3

If you have 9 lines in File, then the result will be [[1, 2, 3], [4, 5, 6], [7, 8, 9]] (an array of arrays). But if you have only 3 lines, then the result will be [1, 2, 3], not [[1, 2, 3]]. What if you always want an array of arrays, regardless of the number of lines in File? A simple solution:

$Result = @(Get-Content File -ReadCount 3)
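The difference can be checked with a small, self-contained sketch. The temp file name below is hypothetical and exists only for this demonstration; the expected counts assume the batching behavior of -ReadCount described above.

```powershell
# Hypothetical temp file used only for this demonstration.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.txt'
Set-Content -Path $file -Value 'line1', 'line2', 'line3'

$plain   = Get-Content $file -ReadCount 3      # the single batch is assigned as-is
$wrapped = @(Get-Content $file -ReadCount 3)   # always an array of batches

$plain.Count        # 3 (the three lines; there is no outer array)
$wrapped.Count      # 1 (one batch)
$wrapped[0].Count   # 3 (the lines in that batch)

Remove-Item $file
```

With @(), code that loops over batches sees the same shape whether the file has 3 lines or 9.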

Another example: suppose you want to filter an array of rows (arrays) by some condition, and then find out how many rows actually satisfy it.

$Array = (1, 2, 3), (4, 5, 6), (7, 8, 9)
$Result = $Array | Where-Object { $_[1]%2 }
$Result.Count # 3

This is how to do it right:

$Array = (1, 2, 3), (4, 5, 6), (7, 8, 9)
$Result = @($Array | Where-Object { $_[1]%2 })
$Result.Count # 1

The same applies to any other transformation of an array of rows, for example sorting. @() makes these cases work precisely because it wraps a single array written to the pipeline in another, single-element array, which other methods, such as casting to [Array], do not do.

So, how does @(New-Object 'Object[]' 2) differ from @($Array | Where-Object { $_[1]%2 }) above? The answer: it doesn't. Both write an array to the pipeline as a single object, and @() wraps that object in an array, because that is what the operator does.

One thing everyone should understand about @() is that it always treats its content as statements, not as an expression. That means that the standard collection-unrolling behavior applies when an expression is supplied as the first (or only) element of a pipeline.

$a = [int[]](1, 2)
@(
    $a; # write two objects into pipeline
).Count # 2
$AutomationNull = [System.Management.Automation.Internal.AutomationNull]::Value
@(
    $AutomationNull; # write zero objects into pipeline, do not work same way in v2
).Count # 0
@(
    'Scalar'; # write single object into pipeline
).Count # 1
function WriteArrayAsSingleObject { ,(1..5) } # nothing special in New-Object, after all
@(
    WriteArrayAsSingleObject; # write single object into pipeline
).Count # 1
@(
    (WriteArrayAsSingleObject); # write five objects into pipeline, do not work same way in v2
).Count # 5

@PetSerAl: Thanks for the background info and the helpful examples.

First, let me say that my example is obviously contrived and this issue may rarely, if ever, arise in the real world - of course, if you're explicitly constructing an array / collection, wrapping it in @() is pointless.

In the following examples I'll use something _other_ than an array for illustration, namely an _empty_ [System.Collections.ArrayList] instance.

If I understand you correctly, you're saying that:

The equivalent of this _cmdlet_-based statement:

# Yields an [object[]] array whose 1 item is the empty array list.
@(New-Object System.Collections.ArrayList)   

is the following _expression_-based statement:

# To get the same result with an expression, the [System.Collections.ArrayList]::new() 
# call must be wrapped in an array(!).
@(, [System.Collections.ArrayList]::new())

This equivalence in itself is puzzling - why wouldn't the equivalent be just @([System.Collections.ArrayList]::new()), without the wrapper array?

It seems to me that there's no good _conceptual_ reason for New-Object to wrap the instance created in an outer, single-element array when writing to the pipeline - but, if I understand you correctly, doing so is a _technical necessity_, because the collection instance would otherwise invariably be converted to System.Object[] on output, correct?

Here's another example of where the distinction is puzzling, at least to those not intimately familiar with the inner workings of PowerShell, and taking that perspective (as someone who has _some_ understanding of them) was the reason for creating this issue:

# Yields *1*, because the *wrapped* empty array list is sent through the pipeline.
New-Object System.Collections.ArrayList | Measure-Object | % Count 

# Yields *0*, because the empty array list itself is enumerated:
[System.Collections.ArrayList]::new() | Measure-Object | % Count

The fact that (...) causes unwrapping of _pipeline_ output is also far from obvious:

# Now yields *0*, because the (...) forces unwrapping.
(New-Object System.Collections.ArrayList) | Measure-Object | % Count 

Incidentally, that makes inspecting the true output generated by New-Object System.Collections.ArrayList all but impossible:

# -InputObject must be used to inspect collections (using the pipeline would unwrap),
# but the unavoidable use of (...) unwraps too.
Get-Member -InputObject (New-Object System.Collections.ArrayList)

# Storing in a variable doesn't help, because the assignment unwraps too.
$al = New-Object System.Collections.ArrayList
Get-Member -InputObject $al

Given that this seemingly comes down to the fundamentals of pipeline behavior, there's probably nothing that can be done - except perhaps consider including the findings in a collection of yet-to-be-created advanced help topics.

Unless someone thinks that an actionable change is warranted here, I'll close this issue soon.

P.S.:

Re:

Note a special case: @(<array literal>). I think that in this case @() is eliminated as an optimization, but that leads to an incorrect result, IMHO, when <array literal> is typed: @([int[]] (1, 2)).GetType().Name returns Int32[] instead of Object[].

I think what you mean is an _expression_ that creates an array (PowerShell has no array literals, only the , operator for array construction, and here a cast (class instantiation) is involved as well).

I have no real opinion, but why do you think it is incorrect?

Because @() should _consistently_ create System.Object[] arrays?
What's the harm in keeping the specific type?

Something like the following still works, for instance (and _then_ yields [System.Object[]]):

@([int[]] (1, 2)) + 'three' 

As a minor consideration, arrays typed as value types perform better (though (shallow) _cloning_ of the array still happens).

@PetSerAl:

I think I finally wrapped my head around what you were saying, including why the type-preserving optimization may be problematic.

Thanks for your help.

I've updated my SO answer, which hopefully now contains a correct and comprehensive description of how @() works.

The equivalent of this cmdlet-based statement:

# Yields an [object[]] array whose 1 item is the empty array list.
@(New-Object System.Collections.ArrayList)

is the following expression-based statement:

# To get the same result with an expression, the [System.Collections.ArrayList]::new()
# call must be wrapped in an array(!).
@(, [System.Collections.ArrayList]::new())

This equivalence in itself is puzzling - why wouldn't the equivalent be just @([System.Collections.ArrayList]::new()), without the wrapper array?

Because PowerShell only unwraps the results of expressions, not the results of commands:

{
    New-Object System.Collections.ArrayList
}.Ast.EndBlock.Statements[0].PipelineElements[0].GetType().Name # CommandAst
{
    [System.Collections.ArrayList]::new()
}.Ast.EndBlock.Statements[0].PipelineElements[0].GetType().Name # CommandExpressionAst

So, when you use an expression, then you need something to prevent unwrapping.

It seems to me that there's no good conceptual reason for New-Object to wrap the instance created in an outer, single-element array when writing to the pipeline - but, if I understand you correctly, doing so is a technical necessity, because the collection instance would otherwise invariably be converted to System.Object[] on output, correct?

PowerShell does not automatically unwrap collections written by compiled cmdlets unless explicitly asked to. You can see this by using $PSCmdlet.WriteObject in an advanced function, because that is what cmdlets use to write objects to the pipeline:

function Test-WriteObject {
    [CmdletBinding()]param() 
    $Result = [System.Collections.ArrayList]::new()
    $PSCmdlet.WriteObject($Result)
}
Test-WriteObject | Measure-Object | % Count   # 1: the ArrayList travels as a single object
(Test-WriteObject).GetType().Name             # ArrayList

So, New-Object simply does not ask for the unwrapping behavior in the first place. It does not wrap the ArrayList in anything, because that is just not necessary for cmdlets.

Here's another example of where the distinction is puzzling, at least to those not intimately familiar with the inner workings of PowerShell, and taking that perspective (as someone who has some understanding of them) was the reason for creating this issue:

# Yields *1*, because the *wrapped* empty array list is sent through the pipeline.
New-Object System.Collections.ArrayList | Measure-Object | % Count

# Yields *0*, because the empty array list itself is enumerated:
[System.Collections.ArrayList]::new() | Measure-Object | % Count

Same distinction: command vs. expression:

{
    New-Object System.Collections.ArrayList | Measure-Object | % Count
}.Ast.EndBlock.Statements[0].PipelineElements[0].GetType().Name # CommandAst
{
    [System.Collections.ArrayList]::new() | Measure-Object | % Count
}.Ast.EndBlock.Statements[0].PipelineElements[0].GetType().Name # CommandExpressionAst

The fact that (...) causes unwrapping of pipeline output is also far from obvious:

# Now yields *0*, because the (...) forces unwrapping.
(New-Object System.Collections.ArrayList) | Measure-Object | % Count

Parentheses (...) do not cause unwrapping. They transform a command into an expression:

{
    (New-Object System.Collections.ArrayList) | Measure-Object | % Count
}.Ast.EndBlock.Statements[0].PipelineElements[0].GetType().Name # CommandExpressionAst

And because that expression is used as the first element of a pipeline, its result is unwrapped.

Incidentally, that makes inspecting the true output generated by New-Object System.Collections.ArrayList all but impossible:

In my opinion, the true output of New-Object System.Collections.ArrayList is an ArrayList instance, and both of your commands show its members.
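One way to look at the object as it travels through the pipeline, without (...) or an assignment of the New-Object call itself, is to ask each pipeline item for its type. This is only a sketch; the variable names are illustrative.

```powershell
# Each object that New-Object writes to the pipeline arrives in $_ as-is.
$seenInPipeline = New-Object System.Collections.ArrayList |
    ForEach-Object { $_.GetType().FullName }
$seenInPipeline   # System.Collections.ArrayList

# The single element collected by @() is the same kind of object.
$collected = @(New-Object System.Collections.ArrayList)[0].GetType().FullName
$collected        # System.Collections.ArrayList
```

Both report the ArrayList itself, with no intermediate wrapper object.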

PowerShell has no array literals

{ 1, 2, 3 }.Ast.EndBlock.Statements[0].PipelineElements[0].Expression.GetType().Name

I have no real opinion, but why do you think it is incorrect?

Because @() should consistently create System.Object[] arrays?

Yes, I think it should be always System.Object[]. Also:

{ @([int[]] (1, 2, 3)) }.Ast.EndBlock.Statements[0].PipelineElements[0].Expression.StaticType

What's the harm in keeping the specific type?

@([int[]] (1, 2, 3))[1] = 'Not int'

@PetSerAl:

That's all very illuminating and I appreciate that you took the time to explain it.

I have updated the SO answer yet again, and I'd appreciate it if you could take another look so I don't make any incorrect claims there.


Re array literals:

> { 1, 2, 3 }.Ast.EndBlock.Statements[0].PipelineElements[0].Expression.GetType().Name
ArrayLiteralAst

I guess it's just a matter of terminology - there is no disagreement in substance here.

I based my statement on this quote from PowerShell in Action, 2nd Edition (emphasis added):

Here's how array literals are defined in PowerShell: They're not. There's no array literal notation in PowerShell. […] instead of having array literals, there's a set of operations that create collections as needed.

P.S.: The situation changes with a cast:

> { [int[]] (1, 2, 3) }.Ast.EndBlock.Statements[0].PipelineElements[0].Expression.GetType().Name
ConvertExpressionAst

Great discussions!!!

@PetSerAl is right, PowerShell unravels collection results from expressions, but not commands.
Commands themselves decide whether a result written to the pipeline should be unraveled or not (Cmdlet.WriteObject(object) and Cmdlet.WriteObject(object, enumerateCollection) for C#, and Write-Output and Write-Output -NoEnumerate for script).
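The two WriteObject overloads can be compared from script through $PSCmdlet. In this minimal sketch, the function name Test-WriteObjectEnumerate and its -Enumerate switch are hypothetical, introduced only for illustration.

```powershell
function Test-WriteObjectEnumerate {
    [CmdletBinding()]
    param([switch] $Enumerate)

    $list = [System.Collections.ArrayList]::new()
    [void] $list.Add('a')
    [void] $list.Add('b')

    # Second argument = enumerateCollection: $false sends the list as one object,
    # $true sends its elements one by one.
    $PSCmdlet.WriteObject($list, $Enumerate.IsPresent)
}

$asOneObject = (Test-WriteObjectEnumerate | Measure-Object).Count
$asElements  = (Test-WriteObjectEnumerate -Enumerate | Measure-Object).Count

$asOneObject   # 1
$asElements    # 2
```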

As for the quote, here is more context on it:

Most programming languages have some kind of array literal notation similar to the PowerShell hash literal notation (@{...}), where there's a beginning character sequence followed by a list of values, followed by a closing character sequence. Here's how array literals are defined in PowerShell: They're not. There's no array literal notation in PowerShell. […] instead of having array literals, there's a set of operations that create collections as needed.

I think the point is that you don't need a notation (beginning char sequence + closing char sequence) to create an array in PowerShell, for example, 1,2,3,4 is defining an array. But PowerShell does have ArrayLiteralRule in parser and ArrayLiteralAst.

The situation changes with a cast

The ArrayLiteralAst is embedded in the ConvertExpressionAst:

{ [int[]] (1, 2, 3) }.Ast.EndBlock.Statements[0].PipelineElements[0].Expression.Child.Pipeline.PipelineElements[0].Expression.GetType().Name
ArrayLiteralAst

Thanks, @daxian-dbw, that's great to know.

Given the potentially problematic behavior of something like @([int[]] (1, 2)) (the only case where @() doesn't output an [object[]] array), is it worth creating a new issue for that?

It's hard to imagine that anyone would have relied on that behavior, and eliminating it makes for a more predictable environment.

Here is the code that handles this special case. In my understanding, this happens only in one of the following two conditions:

  • the expression is a ConvertExpressionAst or a ParenExpression that wraps a ConvertExpressionAst, and the convert-to type is an array.
  • the expression is an ArrayLiteralAst or a ParenExpression that wraps an ArrayLiteralAst.

In the first case, since the conversion is explicitly specified, it's very unlikely the resulting array will again be used as object[]. In the second case, the resulting array from ArrayLiteralAst is already object[] so it doesn't matter. Plus, altering this behavior would be a breaking change. Given the above, I prefer to not make a behavior change. But feel free to notify the PowerShell-Committee if you want a further discussion.
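To see which of these shapes a given @(...) has, the AST can be inspected at parse time. The helper below is a rough sketch: Get-ArrayExpressionContent is a hypothetical name, and it only reports what the parser produced. It does not show whether a particular PowerShell version applies the special case at runtime.

```powershell
function Get-ArrayExpressionContent {
    param([scriptblock] $ScriptBlock)

    # Locate the first @(...) in the script block (parse-time only; nothing is executed).
    $arrayExpr = $ScriptBlock.Ast.Find(
        { param($ast) $ast -is [System.Management.Automation.Language.ArrayExpressionAst] },
        $false)

    # First pipeline element of the first statement inside @(...).
    $first = $arrayExpr.SubExpression.Statements[0].PipelineElements[0]

    if ($first -is [System.Management.Automation.Language.CommandExpressionAst]) {
        $first.Expression.GetType().Name   # an expression: report its AST type
    } else {
        $first.GetType().Name              # a command: CommandAst
    }
}

Get-ArrayExpressionContent { @([int[]] (1, 2)) }          # ConvertExpressionAst
Get-ArrayExpressionContent { @(1, 2) }                    # ArrayLiteralAst
Get-ArrayExpressionContent { @($a) }                      # VariableExpressionAst
Get-ArrayExpressionContent { @(New-Object 'Object[]' 2) } # CommandAst
```

Only the first two shapes match the conditions listed above; a variable or a command inside @() does not.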

Thanks, @daxian-dbw.

I can't imagine much existing code breaking, but, conversely, I can't imagine many people running into a problem with the current behavior in real-world scenarios, so, personally, I'm happy to leave it at that; perhaps @PetSerAl feels differently.

@daxian-dbw, how about this:

$a = 1..3
$b = @([object[]]$a)
[object]::ReferenceEquals($a, $b)
$b[1] = 123
$a[1]

Now $b is not a copy of $a but is $a itself. Also, if the current behavior is kept, shouldn't ArrayExpressionAst.StaticType be adjusted to match it?

@PetSerAl:

Excellent points - I now agree that this should be fixed, so I've created #4280.

Now $b is not a copy of $a but is $a itself.

Yes, $b and $a point to the same array in this case. But I don't think the array expression in the PowerShell language is defined to always return a new array object.

Shouldn't ArrayExpressionAst.StaticType be adjusted to match the current behavior?

Good catch. It is inconsistent and should be fixed. But IMHO, the fix should be to change ArrayExpressionAst.StaticType to always return System.Array (also a breaking change :)). StaticType is by definition not accurate, because sometimes the actual type can be known only at runtime, for example, BinaryExpressionAst.StaticType returns System.Object when it's -bor, and the actual type may vary at runtime.
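The -bor case mentioned above can be checked directly. This short sketch compares the parse-time StaticType with the runtime type of the same kind of expression; the variable names are illustrative.

```powershell
# Parse-time view: the AST only knows that -bor yields "some object".
$borAst = { 1 -bor 2 }.Ast.EndBlock.Statements[0].PipelineElements[0].Expression
$static = $borAst.StaticType.FullName
$static        # System.Object

# Runtime view: the actual type depends on the operands.
$runtimeInt  = (1 -bor 2).GetType().FullName
$runtimeLong = ([long] 1 -bor 2).GetType().FullName
$runtimeInt    # System.Int32
$runtimeLong   # System.Int64
```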

@daxian-dbw But where can I see that definition of the array expression? Until now I always expected it to return a new array object each time. I myself use $b = @($a) (without a cast, though) as an array copy operator, and I would really not like it if it stopped copying the array at some point in the future.
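A minimal sketch of the copy idiom described above (a plain variable inside @(), no cast). The .Clone() call is shown as an alternative that does not depend on how @() is implemented; that suggestion is an addition here, not something proposed in the thread.

```powershell
$a = 1..3

$b = @($a)        # $a is enumerated and its elements are collected into a new array
$c = $a.Clone()   # explicit shallow copy via System.Array.Clone()

$b[1] = 123       # modify only the copy

$a[1]                                # 2 (the original is untouched)
[object]::ReferenceEquals($a, $b)    # False
[object]::ReferenceEquals($a, $c)    # False
```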

You can find the Windows PowerShell Language Specification Version 3.0 here. It hasn't been updated for a while, so new language features like DSC or PowerShell classes are not in it, but I believe the content in the specification should still apply to PowerShell 6.0.
