PowerShell: Restrict use of [ref] to variables

Created on 3 May 2018 · 22 Comments · Source: PowerShell/PowerShell

For the ultimate resolution, see https://github.com/PowerShell/PowerShell-Docs/issues/2402

From what I understand, use of [ref] only makes sense when applied to a _variable_ or _parameter_ [variable].

Assuming this holds, perhaps nonsensical uses such as [ref] 'foo' or [ref] $hashtable.key1 could be flagged as syntax errors.

The confusion that not preventing such pointless uses can create is exemplified by this SO question, in which the OP thought they could create a persistent reference to a specific hashtable entry as follows (simplified):

$Tree = @{ TextValue = "main"; Children = @() }
# Mistaken attempt to create a "pointer" to a specific hashtable entry
$Pointer = [ref] $Tree.Children  # This should be prevented.
# Mistaken attempt to indirectly append to $Tree.Children
$Pointer.Value += $Item

Environment data

Written as of:

PowerShell Core v6.0.2
Issue-Discussion Resolution-External

All 22 comments

[ref] works just fine with data structures:

PS[1] (584) > $x = [pscustomobject] @{a=@{b=2; c=3}}
PS[1] (585) > $r = [ref] $x
PS[1] (586) > $r.Value.a.abc = 123
PS[1] (587) > $x
a
-
{c, abc, b}

It does exactly what you would expect from other languages: it creates a _durable_ reference to a _specific_ instance. The SO item in question was using [ref] when they didn't need to, not understanding that they already had a reference to the parent object, and also not understanding how array concatenation is done in PowerShell. If they had assigned to an _element_ of the array, it would have worked fine. But by appending an element, they created a new object, which, of course, did not update the reference.
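To make the element-assignment versus append distinction concrete, here is a minimal sketch; the names $h and $p are made up for illustration and are not from the SO question:

```powershell
# Hypothetical names; a hashtable entry holding a two-element array.
$h = @{ Children = @(1, 2) }
$p = [ref] $h.Children     # wraps the array object currently stored in the entry

$p.Value[0] = 10           # element assignment mutates the shared array
$h.Children[0]             # 10

$p.Value += 3              # += builds a NEW array and stores it only in $p.Value
$p.Value.Count             # 3
$h.Children.Count          # 2 (the hashtable entry still holds the old array)
```

After the +=, the wrapper and the hashtable entry point at different arrays, which is the behavior the SO question ran into.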

Thanks, @BrucePay, but this issue is not about how [ref] functions when it is used _as intended_, it is about the syntax not preventing _nonsensical uses_.

[ref] works just fine with data structures:

What you're demonstrating is not per se about _data structures_ (that aspect is incidental), you're demonstrating use with a _variable_, i.e., effectively creating a variable _alias_ (this is also covered in my answer to the SO question).

If they had assigned to an element of the array

Yes, it would have worked - but it also would have been _pointless_. Pointing to a piece of _data_ (a) only makes sense with instances of _reference types_ and (b) you can use a _regular_ variable to do that - using [ref] in this scenario adds nothing and only complicates matters.

Thus, my point was that using it with a [parameter] _variable_ is the _only_ use that makes sense
and that users can be spared confusion if the language itself prevents other uses.

Or am I missing other legitimate uses of [ref]?

Since I'm the OP of the item in question, I thought I'd provide a little more insight into what I was trying to do and how I got to the bad [ref] use. First, here's a simplified version of the code I was trying to use:

 $List = @()
 while (!($Result.EOF)) {
     $Pointer = [ref] $List
     foreach ($Field in $Result.Fields) {
         $Pointer.Value += @{ DataValue = $Field.Value; Children = @() }
         $Pointer = [ref] ($Pointer.Value[$Pointer.Value.Count - 1].Children)
     }
     $Result.MoveNext()
 }

I hope this code explains why I was trying to point to an array element instead of just using the parent variable containing the array.

Second, I did understand that there might be an issue with array concatenation, that is why I first attempted to test with a simple [ref] to an array variable. Seeing that it worked (but not knowing how the aliases worked), I made an incorrect assumption that I could create a persistent pointer to an array member. When that didn't work, I looked for an explanation at SO, which I received.

So, now I know that creating references to a piece of data has no practical uses, and I should be fine using [ref] going forward as long as I know what is just data and what is a reference to data (and also keeping in mind aliases and how += works on arrays).

@the-CPU1: Thanks for the explanation - what you did is an understandable thing to try, especially given that the language doesn't prevent you from doing so.

So, if there's consensus that applying [ref] only ever makes sense when applied to a _variable_, perhaps trying anything else can be flagged as a syntax error.
That would spare future users the confusion over why their code doesn't work as expected.

@mklement0

you're demonstrating use with a variable,

Bad example - how about this :-)

PS[2] (611) > $r = [ref] ([system.collections.generic.list[object]]::new())
PS[2] (612) > $r.Value.Add(1)
PS[2] (613) > $r.Value.Add(2)
PS[2] (614) > $r.Value
1
2

or this

PS[2] (626) > $rerr =  [ref] ([System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]::new())
PS[2] (627) > $null = [system.management.automation.psparser]::Tokenize("2 2 2", $rerr)
PS[2] (628) > $rerr.value

Token                                Message
-----                                -------
System.Management.Automation.PSToken Unexpected token '2' in expression or statement.
System.Management.Automation.PSToken Unexpected token '2' in expression or statement.

@the-CPU1

I could create a persistent pointer to an array member

You get a persistent pointer to the _data_ stored in the array member. Getting a pointer to a specific location in memory is not supported in PowerShell.

I should be fine using [ref] going forward as long as I know what is just data and what is a reference to data

Everything in PowerShell is already a pointer (object reference) so the set of circumstances where [ref] is _needed_ is very small - basically with APIs that have In/Out/Ref parameters. COM APIs in particular tend to have out parameters, but, as the example above shows, it can be necessary even with PowerShell APIs.

@BrucePay:

$r = [ref] ([system.collections.generic.list[object]]::new())

That's an example of _pointless_ use of [ref], because just using the reference _directly_ gives you the same functionality, and does so more simply:

$r = [system.collections.generic.list[object]]::new() 
$r.Add(1)
$r.Add(2)
$r  # prints the list

Unless I'm missing something, there is _no good reason_ to use [ref] in this scenario.


$rerr = [ref] ([System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]::new())

The purpose of [ref] here is to type the _variable_ - $rerr - so it is better written as follows:

[ref] $rerr = ([System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]::new())

Or, to localize the by-reference passing:

```powershell
# Declare as regular variable.
$err = ([System.Collections.ObjectModel.Collection[System.Management.Automation.PSParseError]]::new())

# Pass with ad-hoc [ref] cast
$null = [system.management.automation.psparser]::Tokenize("2 2 2", [ref] $err)

# $err - still a regular variable - was assigned a value in the method call.
$err
```

Note that this idiom is also the form found in the v3.0 language spec.

@BrucePay:

Everything in PowerShell is already a pointer (object reference) so the set of circumstances where [ref] is needed is very small - basically with APIs that have In/Out/Ref parameters.

This is what got me confused initially. Suppose we have a function:

function foo([ref] $a) { $a.value += 1 }

I was trying to see if something like this would work:

$b = @(0); foo ([ref] $b); $b

And it did. But then this call didn't work:

$c = @(@(0)); foo ([ref] $c[0]); $c[0]

And this doesn't work either:

$c = @(@(0)); $d = $c[0]; foo ([ref] $d); $c[0]

I thought that if the first worked, so would the second one, and vice versa. I do understand why it didn't work in the second call - that's the way arrays and += work.

If I were to make a guess, I'd think that the first call was specifically coded for by the developers (aka aliases), since there might be a need for a user to pass an array to a function and then modify the size of that array. I'd also guess that the second call behaves "normally", as one would expect if there were no aliases.

I think you misunderstand the way the @() operator works. It does not always wrap the content in a new array. What it does is create an array if the content is a scalar value (or $null). If the content is already an array, @() is a no-op.
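A quick way to see the difference, as a small sketch (the variable names are arbitrary):

```powershell
# @() around something that is already an array does not add a nesting level.
$a = @(@(0))
$a.Count             # 1
$a[0] -is [array]    # False: $a[0] is the integer 0

# The unary comma operator does wrap its operand in a new array.
$b = , , 0
$b.Count             # 1
$b[0] -is [array]    # True: $b[0] is itself a one-element array
$b[0][0]             # 0
```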

@the-CPU1:

@rkeithhill correctly points out that you have a misconception about @(), which, in short, is not an array _constructor_, but an array _guarantor_.

However, even if we construct the nested array the way you intended - i.e., using , , 0 instead of @(@(0)), your commands cannot work:

$c = , , 0; foo ([ref] $c[0]); $c[0]

The problem here is again that [ref] is being used with a _value_, not a _variable_.
You're passing a reference to the inner _array_, and _not_ a reference to the _location of that array within the value of $c_ - the latter cannot be done, because it would require an _additional_ level of indirection (and, as @BrucePay states, getting a pointer to a specific location in memory is not supported).

$c = , , 0; $d = $c[0]; foo ([ref] $d); $c[0]

This is basically the same scenario as above, except that the by-reference passing works for the _intermediate_ variable $d, because there [ref] _is_ used on a _variable_ - but, again, it cannot refer to the location of the nested array inside $c.
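A sketch of that second scenario with the results spelled out, reusing the foo function from the question:

```powershell
function foo([ref] $a) { $a.Value += 1 }

$c = , , 0        # one-element outer array holding a one-element inner array
$d = $c[0]        # $d now refers to the inner array
foo ([ref] $d)    # += inside foo assigns a new two-element array to $d

$d.Count          # 2
$c[0].Count       # 1 (the slot inside $c still holds the original inner array)
```

The variable $d is updated because [ref] wraps the variable itself; nothing links that variable back to the slot in $c.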

I'd think that the first call was specifically coded for by the developers (aka aliases), since there might be a need for a user to pass an array to a function and then modify the size of that array.

Note that [ref] is not about _arrays_ specifically.
It's about passing _any_ variable _by reference_ to a method or function, typically so that the callee can modify it.

As @BrucePay states, you need [ref] to call .NET methods that have ref or out or in parameters - see the docs - or to call PowerShell functions that declare [ref] parameters, but that is rare.

And while you _can_ use [ref] to create an effective alias of another variable outside the context of parameter passing (e.g., $v = 1; $vAlias = [ref] $v; $vAlias.Value++; $v), that is even rarer.
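For the .NET case, a minimal sketch using [int]::TryParse, whose second parameter is an out parameter (this example is mine, not from the thread; $n and $ok are arbitrary names):

```powershell
# Regular variable; cast to [ref] only at the call site.
$n = 0
$ok = [int]::TryParse('42', [ref] $n)

$ok    # True
$n     # 42 (assigned by the method through the out parameter)
```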


@rkeithhill:

Just to be clear: @() with an array operand is _conceptually_ a no-op, but not _technically_: It actually _clones_ something that already is an array (the only exception being array _literals_ (explicitly enumerated elements such as 1, 2, 3), an optimization introduced in v5.1 - see #4280).

As an aside: While maintaining reference equality is rarely a concern in PowerShell, this cloning is problematic from a performance perspective.

An attempt to summarize and clear (at least my) conceptual fog (arrived at without source-code analysis; do let me know if and where I'm wrong):

  • The purpose of [ref] ([System.Management.Automation.PSReference]) is to enable passing PowerShell _variables_ _by reference_ to .NET method _parameters_ marked as ref / out / in or, rarely, to PowerShell function parameters typed as [ref]

    • When used as such, a regular PowerShell variable is _directly cast_ to [ref]; the variable is then wrapped so that modifying the [ref] instance's .Value property is equivalent to assigning to the variable directly (the docs suggest that [ref] essentially wraps a [psvariable] instance in this case).

    • This indirect access to a variable _only_ works with a _direct cast_:

      • [ref] $var # OK
      • [ref] ($var) # !! Does NOT work
      • [ref] $ref = $var # !! Does NOT work
    • The conceptually cleanest idiom is:

      • Define a _regular_ PowerShell variable.
      • Cast it to [ref] _as part of the invocation only_; e.g.:
        [System.Management.Automation.PSParser]::Tokenize('foo', [ref] $err)
      • That way, the by-reference passing is a localized aspect of the given invocation; this mirrors C# usage, where you _must_ use the ref / out / in keywords on invocation.
    • Outside of this use, there's no good reason to use [ref]:

      • There is no point in using [ref] with a _value_ rather than a _variable_ (something other than ultimately a [psvariable]) - see below.

      • If you really want an alias variable [wrapper], use Get-Variable:
        $v = 666; $vObj = (Get-Variable v); $vObj.Value++; $v # -> 667
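The three cast forms listed above can be compared side by side. This is a sketch based on the behavior described in this summary, not on source-code analysis; $var, $r1, $r2 and $r3 are arbitrary names:

```powershell
$var = 1

$r1 = [ref] $var      # direct cast: wraps the variable itself
$r1.Value = 2
$var                  # 2

$r2 = [ref] ($var)    # parenthesized: wraps only the current value
$r2.Value = 99
$var                  # 2 (unchanged)

[ref] $r3 = $var      # type-constrains $r3; again wraps only the value
$r3.Value = 77
$var                  # 2 (unchanged)
```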


Why [ref] should not be used with _values_ (non-variables)

Note: There is one edge case: [ref] $null is useful for cases where you don't care about what the target method/function returns via the by-reference parameter; that said, you can conceive of $null as a _variable_ too (it certainly is that _syntactically_).

When you use [ref] with a _value_:

  • It obviously doesn't work with the cast-to-[ref]-on-invocation idiom, whose purpose is to pass a _variable_.
  • If you save a [ref] <non-variable> expression in a variable (e.g., $ref = [ref] (1, 2, 3)), you're effectively creating a more cumbersome analog to a regular PowerShell variable in that you must then use .Value to access the enclosed value.

    • While you _can_ then pass $ref to a ref / out / in parameter directly - in which case you _mustn't_ use [ref] on invocation - it leaves you with having to access the value returned via $ref.Value.

      • Again: the cast-to-[ref]-on-invocation idiom is superior in every respect: $var = 1, 2, 3 on _initialization_, then [ref] $var _on invocation_.

      • Outside the context of by-reference parameter passing, use of [ref] is pointless:

        • $ref = [ref] (1, 2, 3) # pointless; just use the expression result directly
        • It can lead you to mistakenly think that it's possible to create a reference to _properties inside other objects_ (the confusion that prompted creation of this issue).

Therefore, my preference is to disallow [ref] with a non-variable operand, but given that it _technically_ works, it would be a breaking change.


Get-Help about_Ref is currently a mixed bag:

  • It commendably shows only the cast-to-[ref]-on-invocation idiom.

  • The type's primary purpose - use with .NET ref / out / in parameters is not mentioned.

  • The description is confusing.


Re improving the documentation: please see https://github.com/PowerShell/PowerShell-Docs/issues/2402.

@mklement0

Note that [ref] is not about arrays specifically.
It's about passing any variable by reference to a method or function, typically so that the callee can modify it.

That is what I remember from my C days. In my first example above, inside the callee foo any manipulation of variable a is affecting the caller's variable b, including moving it from one memory location to another. I think I have a better picture now.

@mklement0

It can lead you to mistakenly think that it's possible to create a reference to properties inside other objects

One more question on this:

$a = @{ Children = New-Object System.Collections.ArrayList }
$b = [ref] $a.Children
$b.Value.Add(1)

I understand that I can simply reference $a directly here, but for my purposes I was trying to create a function to create (and another one to traverse) a series of nested array lists (a tree-like structure). Even though this sort of scenario would be rare, and can probably be done without nesting, it was a quick and easy solution for me.

@the-CPU1:

In my first example above, inside the callee foo any manipulation of variable a is affecting caller's variable b

Indeed: [ref] $var is special in that it truly creates a reference to the variable _object_ behind the scenes, not its _present value_.


I understand the intent behind

$a = @{ Children = New-Object System.Collections.ArrayList }
$b = [ref] $a.Children
$b.Value.Add(1)

but the point is that the use of [ref] here creates a _pointless wrapper_. You can simply assign $a.Children directly to a regular variable and get the same effect:

$a = @{ Children = New-Object System.Collections.ArrayList }
$b = $a.Children  # No need for [ref] - obtain a reference to the array list
$b.Add(1)         # Operate on the array list directly.

Again, note that this only works because the value of the Children entry is an instance of a .NET _reference type_.
If it were an instance of a _value type_ (e.g., 666), this approach _fundamentally_ wouldn't work - whether or not you use the [ref] wrapper.
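A sketch of the value-type case for contrast; the Number entry is made up for illustration:

```powershell
$a = @{ Children = [System.Collections.ArrayList]::new(); Number = 666 }

# Reference type: a plain variable shares the same ArrayList instance.
$b = $a.Children
$null = $b.Add(1)
$a.Children.Count     # 1

# Value type: a plain variable gets a copy of the number...
$n = $a.Number
$n++
$a.Number             # 666

# ...and so does the [ref] wrapper.
$r = [ref] $a.Number
$r.Value++
$r.Value              # 667
$a.Number             # 666
```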

That last example explains a lot.

There really isn't a good reason to use [ref] outside of referenced parameters.

I sometimes use ref to force a value type to be a reference type. For example capturing a value from a child scope.

$innerVar = [ref] 0
& { $innerVar.Value = 10 }
$innerVar.Value
# 10

You could use a bunch of other things here like Nullable<> or even just wrapping it in a PSObject. But ref is nice and short, can be cast from anything, can hold anything, and the name fits.

@SeeminglyScience:

That's a good example in principle, but note that it isn't about _value_ types - it's about (conveniently) _modifying a variable in a parent scope_.

Without the scoping issue involved, again a simple variable will do - note the use of . rather than &, which creates _no_ child scope:

$var = 0
. { $var = 10 }
$var # 10

Because & creates a child scope, you do need an indirect reference, as demonstrated in your example.

Implementing the same thing without [ref] would indeed be clunky (though it has the advantage of not needing .Value afterwards):

$var = 0
& { (Get-Variable var -Scope 1).Value = 10 }
$var # 10

So, as long as about_Ref is updated to properly frame the two - disparate - cases in which [ref] makes sense - use with APIs, and use as a convenient Get-Variable alternative - perhaps that's all we need.

That's a good example in principle, but note that it isn't about value types - it's about (conveniently) modifying a variable in a parent scope.

Well, yes and no. This is just semantics but you aren't modifying the variable. The variable in the child scope is a different variable but it holds a reference to the same object or the value of a value type. I mentioned value types because if the variable instead held a reference type you could adjust it as you would in the parent scope (with the exception of replacing it entirely)

But with a value type you need to either change the value of the variable from the previous scope (like your example) or place it into a reference type.

More specifically I'd say it's useful for creating an explicit reference to an object.

@SeeminglyScience:

I see what you're saying and "it's useful for creating an explicit reference to an object" is a good summary.

As the whole discussion here shows, users need guidance with respect to the primary purpose of [ref] and the secondary one that you describe.

This guidance is missing from the documentation, so let's try to summarize in preparation for updating it:

  • The primary purpose of [ref] ([System.Management.Automation.PSReference]) is to enable passing PowerShell _variables_ _by reference_ to .NET method _parameters_ marked as ref / out / in or, rarely, to PowerShell function parameters typed as [ref].

    • In this usage, [ref] is applied to a _variable_, and the resulting [ref] instance can be used to indirectly modify that variable's value.
  • Secondarily, you may also use [ref] as a general-purpose object holder.

    • In this usage, [ref] is applied to a _value_ (data) - typically an instance of a _value type_.
    • In many scenarios you can use a regular variable or parameter instead, but this approach is useful as a concise way to modify a (value-type) value in a descendant scope - without having to _explicitly_ pass a value holder (such as via a [ref] _parameter_).
      This technique is useful in scenarios where passing an explicit value holder is undesired (for brevity) or not possible (e.g., in script-block parameter values - see below).

Does that sound correct and comprehensive to you?


Your example inspired me to rethink a scenario in which I did use Get-Variable (clunkily) in the past:

If you use script-block parameter values, such as for calculating the value of Rename-Item's -NewName parameter from each pipeline input object, such script blocks run in a _child_ scope, so modifying a variable in the _caller_'s scope directly is not an option (and neither is passing _arguments_ to the script block in this context).

I solved that problem with Get-Variable as follows (in this case, an index (sequence number) needed to be maintained in the caller's context):

$i = 0; $iVar = Get-Variable -Name i
Get-ChildItem -File $setPath | Rename-Item -NewName { ... $iVar.Value++ ...  }

But your technique enables a more elegant solution:

$iRef = [ref] 0
Get-ChildItem -File $setPath | Rename-Item -NewName { ... $iRef.Value++ ...  }

@mklement0

Does that sound correct and comprehensive to you?

Yes that is an excellent summary 👍

Thanks, @SeeminglyScience.

I've transferred the relevant information to https://github.com/PowerShell/PowerShell-Docs/issues/2402, so we can close this.

SORT returns bogus results with [REF]:

, (
([REF] 0, [REF] 1), ([REF] 1, [REF] 0) |
 % { , ($_, { $_ | SORT -T:1 | % VALUE }) } |
 % { , ($_[0]) | % $_[1] }
) |
 % { $_[0] | SHOULD -BE $_[1] }

Explanation for hoomans (in case any come around):

  1. Create two equal but differently ordered sequences of integers, wrapping each element into a reference!
  2. Pair each sequence with an instruction to extract the value of each of the smallest elements of the result!
  3. Execute said instruction on each pair!
  4. Verify that both values are equal!

Is this a problem with SORT or a problem with [REF]?
Workaround:

, (
([REF] 0, [REF] 1), ([REF] 1, [REF] 0) | % { , ($_, { $_ | SORT VALUE -T:1 | % VALUE }) } | % { , ($_[0]) | % $_[1] }
) |
 % { $_[0] | SHOULD -BE $_[1] }

The workaround means to explicitly sort by value.
OTOH, if I replace SORT -T:1 with MEASURE -MIN, MEASURE correctly fails.

@yecril71pl ref just isn't sortable. It's probably doing ToString which will result in the same string for all ref's of the same type.
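A shorter way to look at this, as a sketch: the ToString explanation above is a guess ("probably"), so the snippet only prints the string forms for comparison and does not assert what they are. $refs and $sorted are arbitrary names.

```powershell
$refs = [ref] 1, [ref] 0

# Print each wrapper's string form for comparison.
$refs | ForEach-Object { $_.ToString() }

# Sorting explicitly by the Value property compares the wrapped integers.
$sorted = $refs | Sort-Object Value | ForEach-Object Value
$sorted    # 0, 1
```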
