PowerShell: Introduce some kind of sanity to coercions

Created on 16 Oct 2018 · 9 comments · Source: PowerShell/PowerShell

In order to manipulate some config files, I need to deserialize some JSON that represents an array of strings, apply a transformation to each string, and serialize the resulting array back to JSON.

Most of my time on this task has been spent discovering the many wonderful surprises in PowerShell's system of coercions between arrays of various lengths, non-arrays, and nulls.

#!/usr/bin/env pwsh
# Let's try an array with two elements
$x = '["foo", "bar"]'
$result = ConvertTo-Json ((ConvertFrom-Json $x) | % { "Hello, $_" })
echo $result
# [
#  "Hello, foo",
#  "Hello, bar"
# ]

# So far so good

# But what about a 1-element array?
$x = '["foo"]'
$result = ConvertTo-Json ((ConvertFrom-Json $x) | % { "Hello, $_" })
echo $result
# "Hello, foo"

# So apparently a 1-element array of strings just turns into a string

# 0-element arrays?
$x = '[]'
$result = ConvertTo-Json ((ConvertFrom-Json $x) | % { "Hello, $_" })
echo $result
# <no output>

# A 0-element array turns into $null, which in turn can't be serialized at all

I appreciate that a scripting language has slightly different concerns than a full blown programming language, but this is basically bizarro-world. I don't know what the solution is exactly, but a sense of consistency and type-safety in the primitive data structures is sorely missing when trying to use PowerShell for non-trivial tasks.

Area-Cmdlets-Utility Issue-Question

All 9 comments

It's well known that pipelines unwrap arrays and enumerate their contents. If you want to operate on the array itself, you will need to... not use the pipeline -- enumerating is what it's meant to do.

In your second example, if you force the data back into an array, it will convert to json as you expect:

PS> $x = '["foo"]'
PS> ConvertTo-Json @((ConvertFrom-Json $x) | % { "Hello, $_" })
[
  "Hello, foo"
]

And yes, an empty array when enumerated doesn't do anything. The same is true if you enumerate such an item in a more conventional manner.

If you wish to retain the original array structure you will need to manually enumerate the contents, just like you would for any other programming language. You wouldn't expect the enumerator variable in a foreach loop to contain an array value unless each individual entry in the array is itself an array -- the same is true of PowerShell's pipeline.

To add to @vexx32's comments:

In a nutshell, the price of using PowerShell's pipeline - both on input and output processing - is that you lose the distinction between:

  • a single item and a single-element array
  • nothing / $null and an empty array
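
To make those two collapses concrete, here is a hypothetical JavaScript model of the pipeline's collect-and-unwrap behavior (the `unwrap` function below is illustrative only, not how PowerShell is actually implemented):

```javascript
// Illustrative model only: "unwrap" mimics what you observe when pipeline
// output is collected into a variable.
function unwrap(value) {
  // The pipeline enumerates arrays and treats $null as "nothing"...
  const items = Array.isArray(value) ? value
              : value === null      ? []
              : [value];
  // ...and collecting the results collapses 0 items to null (i.e. $null)
  // and 1 item to a bare scalar.
  if (items.length === 0) return null;
  if (items.length === 1) return items[0];
  return items;
}

console.log(unwrap(["foo"]));  // "foo" -- same as unwrap("foo")
console.log(unwrap([]));       // null  -- same as unwrap(null)
```

Once a value has gone through `unwrap`, there is no way to tell which of the collapsed inputs produced it -- that is exactly the information the pipeline discards.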

It is a slight irony that ConvertFrom-Json is _not_ a good pipeline citizen in that it actually sends arrays as a _single object_ through the pipeline as opposed to _enumerating its elements_, the way cmdlets usually do:

PS>  '[]' | ConvertFrom-Json | ConvertTo-Json
[]  # !! empty array was preserved

Because % (ForEach-Object) was also involved in your case, however, you saw the usual enumeration behavior; note that even just enclosing a command in (...) forces enumeration:

PS>  ('[]' | ConvertFrom-Json) | ConvertTo-Json
 # !! No output - (...) forced enumeration, so no element was sent to ConvertTo-Json

While you could argue that ConvertFrom-Json's current behavior is valuable for its ability to round-trip without loss of information, that ability is easy to disrupt, as demonstrated here. Overall it is more important for ConvertFrom-Json to exhibit _standard_ behavior - though a change may not be made, for the sake of backward compatibility - see https://github.com/PowerShell/PowerShell/issues/3424

@vexx32 has already mentioned @(...), the array-subexpression operator that forces interpretation of a result as an array, unless it already is one.

ConvertTo-Json has recently gained a similar ability with its -AsArray switch:

PS> 1 | ConvertTo-Json -AsArray -Compress
[1]

Applied to your command:

PS> ConvertFrom-Json $x | % { "Hello, $_" } | ConvertTo-Json -AsArray
[
  "Hello, foo"
]

While this is not automatic round-trip behavior, at least you can predictably output JSON arrays if you know that they're expected.

@vexx32

You wouldn't expect the enumerator variable in a foreach loop to contain an array value unless each individual entry in the array is itself an array -- the same is true of PowerShell's pipeline.

In the example: $result = @("foo") | % { "Hello, $_" }, the enumerator variable is $_, not $result. I'm not asking $_ to be an iterable container, but for $result to be an iterable container, which is the behavior you see when you transform the elements of some container in basically every other programming language.

Here are some examples for you to try out for yourself:

C#:

```c#
var input = new[] { "foo" };
var result = input.Select(x => $"Hello, {x}");
```

JS:

```js
const input = ["foo"]
const result = input.map(x => `Hello, ${x}`)
```

Python:

```python
input = ["foo"]
result = ["Hello, " + x for x in input]
```

Haskell:

```haskell
input = ["foo"]
result = (\x -> "Hello, " <> x) <$> input
```

PHP, Java, Ruby, Clojure, etc. are left as an exercise to the interested reader.

And yes, an empty array when enumerated doesn't do anything. The same is true if you enumerate such an item in a more conventional manner.

Simply enumerating an array never "does" anything, regardless of how many items it contains. We're talking about two totally orthogonal things here. Iterating over an array and performing some side effects is just that: a sequence of side effects. It has no result value. Mapping an array using some transformation produces a transformed array and no side effects.

We can of course encode mapping of the array using incremental mutation of an output array. We'd implement this in let's say JS, like so:

```js
const input = ["foo"]

const result = []
for (const x of input)
{
  result.push(`Hello, ${x}`)
}

// The result is:
// ["Hello, foo"]
```

The output is still, quite straightforwardly, an array. To precisely replicate the PowerShell behavior using a loop in another language takes a considerable amount of effort on our part:

```js
const input = ["foo"]

let result = null
for (const x of input)
{
  // If this is the first element, we want to produce it as the result
  // directly
  if (result === null)
  {
    result = `Hello, ${x}`;
    continue;
  }

  // If this is the second element, we want to insert the first element and
  // the current element into an array
  if (!Array.isArray(result))
  {
    result = [result, `Hello, ${x}`];
    continue;
  }

  // If this is any other element, we want to append it to the array
  result.push(`Hello, ${x}`);
}

console.log(result)

// The result is:
// "Hello, foo"
```

In all your examples from other languages, this is only doable on values that are already collections at every part of the sequence.

PowerShell fundamentally isn't like that with its pipeline. Collections are broken apart, and there is fundamentally no difference between one item and a 1-length array.

Consider:

$Value = 1
$Result = $Value | ForEach-Object {$_}

$Value = @(1)
$Result = $Value | ForEach-Object {$_}

What you're suggesting is for both of these to result in arrays, even when the original item is not an array.

C#, Haskell, none of the other languages do this.

PowerShell, frankly, isn't particularly comparable to them in this regard. It is not C#, not F#, not Haskell. All languages have their nuances, and this simply is one peculiar to PowerShell.

If you require an array to be returned for whatever reason, then PowerShell allows you to specify this in the ways discussed above.

$Value = 1
$Result = $Value | ForEach-Object {$_}
...

What you're suggesting is for both of these to result in arrays, even when the original item is not an array.

Wrong. I am not suggesting that 1 | ForEach-Object {$_} should be an array. I am suggesting that @(1) | ForEach-Object {$_} should be an array. In other words, whenever it is the case that the output of $x.GetType() is:

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array

then the output of $x | % {$_} should similarly have type Array. You can apply this check yourself to @(1) and 1 to see the difference.

In all your examples from other languages, this is only doable on values that can be confirmed to be collections at every part of the sequence.

This sounds like nonsense. There's no repeated confirmation of the input value being an array in any of the examples I posted above. [].map(x => x * 2).map(x => x * 3) works identically to [1].map(x => x * 2).map(x => x * 3) works identically to [1, 2, 3].map(x => x * 2).map(x => x * 3): all three examples produce an array.
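
That claim is easy to check; the following is runnable as-is in Node or a browser console:

```javascript
const double = x => x * 2;
const triple = x => x * 3;

// map returns an array of the same length as its input, every time;
// the input's length never changes the *type* of the result.
console.log([].map(double).map(triple));        // []
console.log([1].map(double).map(triple));       // [6]
console.log([1, 2, 3].map(double).map(triple)); // [6, 12, 18]
```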

Correct. All of those methods, however, are only valid on collections themselves. You cannot take a scalar value as you can in PowerShell and apply such a method.

PowerShell intentionally doesn't keep track of whether the original item is an array or not. I don't think there is a solution to your quandary, save keeping track of it yourself. 😄

To make this more precise, currently neither $Result in:

$Value = 1
$Result = $Value | ForEach-Object {$_}

$Value = @(1)
$Result = $Value | ForEach-Object {$_}

is an array. The second should be, the first should not. Since this isn't the case, it should at least be possible to construct your own map function to provide the desired behavior as @(1) ?? map(...) ?? map(...). If we try to use | we're foiled again, since PowerShell won't leave arrays alone.

So we should perhaps have some other operator that acts as simple function application to allow working around this behavior in userland libraries.
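
In JavaScript terms the escape hatch is trivial, because plain function application never unwraps intermediate results (`map` below is a hypothetical curried helper, not an existing PowerShell feature):

```javascript
// A userland map applied as an ordinary function: array-ness survives
// every step because nothing between the calls flattens the value.
const map = f => xs => xs.map(f);

const result = map(x => x * 3)(map(x => x * 2)([1]));
console.log(result);                // [6] -- still a 1-element array
console.log(Array.isArray(result)); // true
```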

There is such a thing, sort of, and it's quite close to hand.

You're looking for .ForEach{} and .Where{} -- PowerShell magic methods. They can be applied to literally any object in PowerShell, just like their pipeline cousins. However, they too are not perfect.

In contrast to the pipeline, using these methods will always result in a collection -- a generic collection, from what I can see.

Example:

PS> @(1).ForEach{$_}.GetType().FullName
System.Collections.ObjectModel.Collection`1[[System.Management.Automation.PSObject, System.Management.Automation, Version=6.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]

PS> (1).ForEach{$_}.GetType().FullName
System.Collections.ObjectModel.Collection`1[[System.Management.Automation.PSObject, System.Management.Automation, Version=6.1.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35]]

Although both cases require parentheses due to how PS handles parsing of decimal numerals (yes, that also is weird, and strangely not exhibited by hexadecimal literals, but that's a story for another time), only the first input is actually an array.

And yes, both result in a generic Collection[PSObject]. Effectively the inverse of the pipeline methods.

In this case, perhaps you could argue that PS could remember that the value was scalar and return a scalar value, but it's also not possible to know, until the script block is actually executed, whether the .ForEach{} body returns multiple values for each input... so, I suppose, the most elegant solution is to simply return a collection.
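
An analogy from JavaScript, if it helps: `.ForEach{}` is closer to `flatMap` than to `map`, and `flatMap` cannot know until the callback runs how many outputs each input yields -- so a collection result is the only honest return type:

```javascript
// Each input may yield zero, one, or many outputs, so the overall result
// must be a collection regardless of the input's shape.
console.log([1].flatMap(x => [x, x])); // [1, 1] -- one input, two outputs
console.log([1].flatMap(x => []));     // []     -- one input, zero outputs
```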

You're looking for .ForEach{} and .Where{} -- PowerShell magic methods. They can be applied to literally any object in PowerShell

Small note here, they can also be applied to $null (e.g. $null.ForEach{}.GetType().FullName returns the same)

Also they can't be applied to objects that already have a Where or ForEach method (e.g. using ForEach on List<> will call List<T>.ForEach(Action<T>) instead of the magic method)
