PowerShell: Set-Variable / New-Variable shouldn't wrap numeric literal arguments in [psobject] instances that preserve the original string representation

Created on 18 Mar 2019 · 23 comments · Source: PowerShell/PowerShell

Primarily for the benefit of calling _external programs_, PowerShell automatically wraps unquoted literal arguments that initially parse as _numbers_ in a [psobject] instance that preserves the original argument text via its .psobject.ToString() method, so that the argument can be passed as-is to external programs.
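
For illustration, a minimal sketch (not part of the original report; it relies only on behavior demonstrated later in this thread) showing both faces of such a wrapped value when it is bound to an untyped parameter:

PS> & { param($p) $p.GetType().Name; $p.psobject.ToString() } 0xa
Int32  # the value really is a number
0xa    # but the wrapper's ToString() still returns the original token text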

[_Update_: REVISION PENDING - THIS ISSUE DISCUSSES PROBLEMATIC BEHAVIOR, BUT WHAT IT ASKS FOR IS MISGUIDED.]

By contrast, this behavior is unexpected and confusing in the context of creating variables with Set-Variable / New-Variable.

For background, see this Stack Overflow answer.

Distantly related: #5579

Specifically, one would expect the following two statements to be equivalent, but they're not:

$num = 0xa
Set-Variable num 0xa

Steps to reproduce

$num1 = 0xa
Set-Variable num2 0xa
$num1; $num2
'{0:N1}' -f $num1; '{0:N1}' -f $num2 

Expected behavior

10
10
10.0
10.0

Actual behavior

10
0xa
10.0
0xa

That is, the Set-Variable call created a _mostly_ invisible [psobject] wrapper around the $num2 value that preserved the exact representation of the (implied) unquoted, literal -Value argument, and that wrapper unexpectedly affects later stringification of the value.
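
As an aside, a sketch of a possible workaround (not part of the original report): the wrapper can be bypassed by re-typing the value or by reaching for the underlying object directly, as later examples in this thread also do:

Set-Variable num2 0xa
'{0:N1}' -f [int]$num2                  # -> 10.0 - the cast discards the wrapper
'{0:N1}' -f $num2.psobject.BaseObject   # -> 10.0 - the raw [int] underneath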

Environment data

PowerShell Core 6.2.0-rc.1
Windows PowerShell v5.1 
Issue-Question

Most helpful comment

I don't think there's anything wrong with making that change. After all, it would align Write-Output's behaviour more closely to the raw $PSCmdlet.WriteObject() which I think is a good thing as it means less nuances to worry about between cmdlets and functions.

All 23 comments

Should one expect that these two are equivalent?

$num = (write 0xa)
Set-Variable num (write 0xa)

Yes, @PetSerAl - and they currently are, from what I can tell.

Unfortunately, they are equivalent in the wrong way: $num = (Write-Output 0xa) also preserves the half-invisible [psobject] wrapper, so it looks like the problem goes deeper.

The problem is Write-Output itself, which outputs a [psobject]-wrapped instance.

I would expect the following to work the same:

0xa                 # -> 10
Write-Output 0xa    # -> 0xa !!

More generally, the problem affects all functions / cmdlets that pass arguments through that were bound to untyped or [object] / [psobject] parameters:

& { param($foo) $foo } 0xa  # -> 0xa !!
& { param([int] $foo) $foo } 0xa  # -> 10 - due to explicit (re)-typing

Seems like the issue comes from the parameter binding code. You should be able to verify that with Trace-Command, I would think?
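
For reference, a sketch of how such a Trace-Command check might look (the trace output is verbose and not reproduced here):

Trace-Command -Name ParameterBinding -PSHost -Expression {
    & { param($p) $p } 0xa
}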

@vexx32 Numeric literals within command ASTs are wrapped in a PSObject when they are compiled.

https://github.com/PowerShell/PowerShell/blob/e91d6dcf56e4bbd406a144771979f77d13046856/src/System.Management.Automation/engine/parser/Compiler.cs#L3585-L3590

My understanding is that this is done because the compiler can't tell if a parameter is actually typed as a numeric. So if the parameter is typed as a string, for instance, it would be unexpected to receive the string 15 instead of 0xf. The problem we're seeing here is that when the parameter is typed PSObject, it's just passing the wrapper made by the compiler.

Maybe parameter binding should clear PSObject.TokenText when passing to a parameter typed as PSObject
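
To make that concrete, a quick sketch (not from the original comment) of how the same literal fares against differently typed parameters:

& { param([string]   $s) $s } 0xf   # -> 0xf - a [string] parameter receives the token text
& { param([int]      $i) $i } 0xf   # -> 15  - a numeric parameter receives the parsed value
& { param([psobject] $p) $p } 0xf   # -> 0xf - a [psobject] parameter passes the compiler's wrapper through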

Thanks, @vexx32 and @SeeminglyScience.

While this preserve-the-original-string-representation behavior is arguably useful when calling _external programs_ - where arguments, of technical necessity, must be passed as _strings_ (e.g., bash -c 'printf %s $1' - 0xa) - I think that in the realm of calling _PowerShell_ commands this behavior ultimately creates more confusion than it is helpful.

The behavior is most likely to surface with Write-Output - instantly, if you output to the console - and Set-Variable / New-Variable, where such a wrapped argument is output / stored _as-is_.

It's especially tricky with Set-Variable / New-Variable, where the surprising stringification behavior may not surface until much later, but it can also affect custom functions with untyped / [object] / [psobject] parameters, as shown.

A clean separation would be to only ever apply the behavior to external-program invocations.

Do you think that's feasible, @SeeminglyScience?

The caveat is that while I don't think existing code would rely on the behavior in the context of Set-Variable / New-Variable, changing Write-Output behavior is a much more public change: Write-Output 0xa, when printed to the console, would then yield 10, and no longer 0xa.

@mklement0

A clean separation would be to only ever apply the behavior to external-program invocations.
Do you think that's feasible, @SeeminglyScience?

Pretty significant risk imo. I think any of these are likely to come up as an issue somewhere:

function Write-String {
    param([string] $Arg)
    $Arg
}

Write-String 0xfeed
Write-String 10l
Write-String 10ul
Write-String 10u
Write-String 10us
Write-String 10s
Write-String 0b011111111 # Granted this one depends on a pending PR from @vexx32

I think it would be better to clear PSObject.TokenText at the time of a successful parameter binding - if the parameter is typed as anything other than string.

Edit Also:

I think that in the realm of calling PowerShell commands this behavior ultimately creates more confusion than it is helpful.

I think it's significantly more likely that the user is not aware of hex / explicit-type-suffix / etc. syntax and is instead using it accidentally when trying to pass a string.

Thanks, @SeeminglyScience - tying the wrapping behavior to [string]-typed parameters makes sense and gives us a simple rule to remember.

It would change the behavior of Write-Output, though, correct?
I'm personally fine with that, but not everyone may be.

I don't think there's anything wrong with making that change. After all, it would align Write-Output's behaviour more closely to the raw $PSCmdlet.WriteObject() which I think is a good thing as it means less nuances to worry about between cmdlets and functions.

Good point, @vexx32.

Now that we've worked out the crux of the issue, it's worth opening a new issue focused on it and closing this one.

I'm happy to do it - unless you prefer to, @SeeminglyScience, given that you can provide more technical detail.

@mklement0 - external programs have little to do with the implemented behavior. In fact, it was Exchange that asked for unquoted tokens to be passed through as-is, so e.g. if you wanted to add a user, Add-User 0xC00L ..., you aren't adding user 3072.

@SeeminglyScience - I agree any change here is risky. For example, you've ignored [object] parameters that might eventually convert the value to a string.

As I see it, there are limited options:

  • Require all strings to use quotes - a horrible shell experience.
  • Maintain the token text for any token that might not round-trip - potentially very expensive for limited gain.
  • Compromise and accept some limited confusion.

Thanks, @lzybkr.

  • To me, the current behavior makes the most sense for external programs, where PowerShell has no way of knowing what the target type is.

  • Exchange cmdlets, at least in their current incarnation, don't seem to be using [object] (or [psobject]) parameters; e.g., New-ADUser's -Name parameter is [string]-typed, and with @SeeminglyScience's proposed fix (keep the current behavior only for [string]-typed parameters), these would continue to work.

    • I haven't looked at all cmdlets, however, so I presume there are - or at least used to be - [object] parameters somewhere, otherwise they wouldn't have asked for it.
  • Generally, given that an [object] / [psobject]-typed parameter retains its _inferred_ type, I think it's reasonable to make it exhibit _only_ that behavior. So, if we didn't have to worry about backward compatibility, limiting the retain-the-string-representation behavior to [string]-typed parameters in PowerShell, and keeping it for all external programs, would make the most sense to me.

With backward compatibility a concern, I agree that the options you mention (apart from doing nothing) are impractical.

However, how about the following ones, going back to the surprising behavior that prompted creation of this issue?

  • Special-case Set-Variable to discard the cached string representation, so as to make its behavior consistent with direct assignment.

    • I think it's reasonable to assume that, in the context of Set-Variable, users are in more of a programming-language mindset than a shell mindset.
  • and/or ignore the cached string representation in expression contexts such as with -f, which would be consistent with its already being ignored in an explicit .ToString() call and during string interpolation.

Special-case Set-Variable to discard the cached string representation, so as to make its behavior consistent with direct assignment.

Returning to

$num = (write 0xa)
Set-Variable num (write 0xa)

case. How should Set-Variable know that it should not unwrap here to be consistent with direct assignment? Or, the more general case:

$num = $a
Set-Variable num $a

Set-Variable is already consistent with direct assignment: if you start with the same value, then you will get the same value assigned to the variable. Can you describe a change to Set-Variable that will not break this?

If you start with different values, due to the different parsing modes, then it is reasonable, IMHO, that you will end up with different results. And the difference between expression and argument parsing modes is not going anywhere, so you always have to keep it in mind. Adding additional exceptions on top of it just makes the whole system more complex: harder for newcomers to grasp, more corner cases to handle.
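
To put that concretely (a sketch; the script block is only there to produce a wrapped value):

$a = & { param($p) $p } 0xa   # $a now carries the [psobject] wrapper
$num1 = $a
Set-Variable num2 $a
'{0:N1}' -f $num1             # -> 0xa
'{0:N1}' -f $num2             # -> 0xa - same input, same result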

Also, for general unwrapping of [object] and [psobject] parameters, should all of these change their behavior?

ForEach-Object { [string]$_ } -InputObject 0xa
Invoke-Command { param([string]$s) $s } -ArgumentList 0xa
Start-Job { param([string]$s) $s } -ArgumentList 0xa | Receive-Job -Wait -AutoRemoveJob
Set-Item Env:Var -Value 0xa
Set-ItemProperty HKCU:\Software Var -Value 0xa -Type String
function ToString { param([string]$s) $s }; function ToStringWrapper { ToString @args }; ToStringWrapper 0xa

A quick aside, @PetSerAl: Please don't use alias write for Write-Output in examples, because it isn't defined on Unix-like platforms.

Leaving backward compatibility aside for a moment, @PetSerAl:

How should Set-Variable know that it should not unwrap here to be consistent with direct assignment?

It shouldn't have to know, nor should anyone else _after parameter binding_.

Adding additional exceptions on top of it just makes the whole system more complex: harder for newcomers to grasp, more corner cases to handle.

True: the awkward crutch of parsing something as a number first and then having it _half-pretend_ - situationally and obscurely - that it's a string should be ditched as early as possible.

The solution is to once-and-for-all decide at the time of parameter binding what type a given argument is and _discard the cached string representation_ once that decision is made.

Also, for general unwrapping of [object] and [psobject] parameters, should all of these change their behavior?

Based on the proposal to keep the current behavior (only) with [string]-typed parameters, none of these would change.

Similarly, with that proposal the asymmetry between $var = ... and Set-Variable var would automatically go away:

Write-Output and Set-Variable would ditch the cached string representation on binding to their [psobject] / [object]-typed parameters (shout-out to #5551), and $var would receive the "pure" number in all 3 scenarios:

# Would all behave the same if only [string]-typed parameters cached the 
# string representation.
$var = 0xa
$var = (Write-Output 0xa)
Set-Variable var 0xa

Based on the proposal to keep the current behavior (only) with [string]-typed parameters, none of these would change.

ForEach-Object -InputObject is [psobject]
Invoke-Command -ArgumentList is [object[]]
Start-Job -ArgumentList is [object[]]
Set-Item -Value is [object]
Set-ItemProperty -Value is [object]
$args is [object]

None of them are [string].
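
For reference (not in the original comment), these types can be verified from each command's parameter metadata, e.g.:

(Get-Command ForEach-Object).Parameters['InputObject'].ParameterType.FullName   # System.Management.Automation.PSObject
(Get-Command Set-Item).Parameters['Value'].ParameterType.FullName               # System.Object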

@PetSerAl: All the specific uses of these in your previous examples expressly used [string], and it was these uses I was referring to.

But by the time you reach the [string] point, the value would already be unwrapped, so it would not be possible to see the original string 0xa instead of the integer 10.

Good point, @PetSerAl, I hadn't considered that.

I have been learning nearly every day that when a value needs to be numeric and it's not part of the 'expression mode' logic, it's best to wrap it in parentheses. It's a simple enough rule.

So these are equivalent (guaranteed; internal or external won't matter):

$num = 0xa
Set-Variable num (0xa)

Consider if you changed the input to:

$num = -10
Set-Variable num -10

Is -10 a parameter name, a string value, or a numeric value? For cmdlets and functions that do not bind to a numeric value type, it will be a string (as it cannot be a parameter name for a cmdlet or function, as best I can determine). But put parentheses around it and it's guaranteed to be a numeric value - not a parameter name or a string - unless it binds to a string, in which case it will be reformatted by .ToString().

Sample of external access: (Windows 'more')

more 10.000
# tries to find file '10.000'
more (10.000)
# tries to find file '10'

Also consider whether you would want the same behavior for numeric formats such as +10. It's a numeric in expression mode, but a string in argument mode. Also note that numeric formats such as 10kb are affected in argument mode, too. Again, as long as it's bound to a numeric type, it will be resolved as numeric, but otherwise it stays as a string; (10kb), however, is guaranteed to be converted to a numeric.

Imagine the following:

function bob ([string]$a) {$a}
function hello ($a) {bob $a}
hello 10kb

PowerShell will not know at the call to 'hello' that the value will be treated as a string, but later, when 'hello' calls 'bob', it's finally able to unwrap it to a string type. It's funny, though, that if you put quotes around it, {bob "$a"}, it converts it to a numeric value, removing the original string content. This is more alarming to me!
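
For reference, here is what the above prints today, per the behavior described in this thread (hello2 is a hypothetical variant of hello that interpolates instead of passing $a through):

hello 10kb                                        # -> 10kb  - the token text survives all the way to [string]$a
function hello2 ($a) { bob "$a" }; hello2 10kb    # -> 10240 - string interpolation uses the underlying number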

True, @msftrncs, wrapping in parentheses can always be used to disambiguate.

Again, as long as it's bound to a numeric type, it will be resolved as numeric, but otherwise it stays as a string,

No: a literal token that would become a number in expression mode also becomes a number in argument mode, _even if it is bound to an untyped / [object] / [psobject] parameter_, and that's where the confusion starts.

The only difference is that such a number parsed in argument mode _stringifies_ to its original form.

Note the docs also incorrectly suggest that unquoted arguments in argument mode are always _strings_; from about_Parsing, emphasis added:

In argument mode, each value is treated _as an expandable string_

To demonstrate that this is not true:

# Script block with untyped parameter
PS> & { param($p) $p + 1 } 1kb
1025  # 1kb became [int] 1024

My (limited) understanding is that the to-number conversion happens _before_ the target command's parameters are even consulted - and that's where the need for the "dual nature" of such arguments comes from: it is now a number that must _situationally_ act as if it were a string in its original form.

There may be a good reason for this up-front typing, but it's not obvious to me.
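
One way to see the up-front typing is a sketch inspecting the parsed AST (foo is just a placeholder command name):

{ foo 0xa }.Ast.EndBlock.Statements[0].PipelineElements[0].CommandElements[1].StaticType.FullName
# System.Int32 - the argument is already a typed number at parse time, before any command is resolved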


if you put quotes around it, {bob "$a"}, it converts it to a numeric value, removing the original string content. This is more alarming to me!

Indeed: the stringification currently doesn't work the way it should _in PowerShell code_, as my previous examples also showed:

  • "$var" stringifies to the _default_ number representation, as you point out, not the cached string representation.
  • so does $var.ToString()

Except via the formatting system (by default to-console output or via Out-* / Format-* calls), the only way to get the cached representation in an expression is to call .psobject.ToString() rather than .ToString():

PS> & { param($p) "$p"; $p.ToString(); $p.psobject.ToString() } 1kb
1024  # !!
1024  # !!
1kb

Conversely, the cached string representation does kick in with -f, as shown in the OP:

& { param($p) "{0:N1}`n{1:N1}" -f $p, $p.psobject.baseobject } 1kb
1kb  # !! number formatting ignored; cached string representation used.
1,024.0  

If you call [string]::Format() directly instead of -f, the variable again acts as a number:

PS> & { param($p) [string]::Format('{0:N1}', $p) } 1kb
1,024.0 

In short:

  • In _PowerShell code_, the cached string representation is currently more of a hindrance than a feature.

  • By contrast, in _cmdlets_, both object and PSObject-typed parameters (which both effectively receive a PSObject instance) do consistently see the cached string representation with .ToString().

@lzybkr

@SeeminglyScience - I agree any change here is risky. For example, you've ignored [object] parameters that might eventually convert the value to a string.

Yup, I incorrectly assumed the psobject would be unwrapped, similarly to when it is used for CLR method parameters typed as object. It does make sense that that logic wouldn't extend to command parameter binding, though.

That definitely complicates things; I no longer recommend my original suggestion. Instead, maybe it should just be documented.

Thanks for the discussion - it cleared some conceptual fog for me.

Thanks to @PetSerAl's illuminating examples, it's clear that we can't change the current behavior without breaking things.

However, arguably the following _fixes_ to the behavior of PowerShell code are called for, for consistency - though I suspect no one feels any urgency to make them:

  • "$v" and $v.ToString() should use the cached string representation.
  • Set-Content should use the cached string representation (the way Out-File / > and the formatting system in general already do) - see the sketch below.

The behavior with -f is debatable - either behavior is defensible.
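
A sketch of the asymmetry behind the Set-Content bullet above (the file names are illustrative; the outputs follow from the behavior described earlier in this thread):

$f1 = Join-Path ([IO.Path]::GetTempPath()) out-file.txt
$f2 = Join-Path ([IO.Path]::GetTempPath()) set-content.txt
& { param($p) $p | Out-File $f1; Set-Content $f2 $p } 1kb
Get-Content $f1   # -> 1kb  - Out-File / > go through the formatting system, which uses the cached text
Get-Content $f2   # -> 1024 - Set-Content currently stringifies the underlying number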


In the spirit of #6745 and @PetSerAl's comment:

Adding additional exceptions on top of it just makes the whole system more complex: harder for newcomers to grasp, more corner cases to handle.

In that vein I wish what the documentation states _were_ true:

In argument mode, each value is treated as an expandable string

That is, a simple rule would be that everything unquoted (that isn't expression-based) is a _string_, _until instructed otherwise_, either by eventually explicitly binding to a specifically typed parameter or implicitly through the usual type conversions in expressions.

As an _optimization_ technique only, so as not to duplicate parsing effort, it may make sense to _reverse_ the logic and package the string with a cached number instance, if it was found to be a potential number literal during parsing, to be used if and when to-number conversion is called for. But this would have to be a mere _implementation detail_ that must never peek from behind the curtain.

The current behavior amounts to an odd blurring of the lines between argument mode and expression mode.

As an aside: Somewhat ironically, if things worked this way, it would amount to the opposite of what I (misguidedly) originally asked for; that is, to then truly create an [int]-typed value with Set-Variable, the argument would have to be enclosed in (...): Set-Variable num (0xa); otherwise $num would receive a _string_ value.
