PowerShell: Is there a good reason not to allow direct access to properties whose name starts with a digit?

Created on 28 May 2018 · 11 Comments · Source: PowerShell/PowerShell

Currently, you cannot access an object's property if that property's name happens to start with a (decimal) _digit_:

$obj = [pscustomobject] @{ '1a' = '1a''s value' }

 # !! BREAKS, because the property name starts with a digit.
$obj.1a

Note: It does work with an _all-digit name_, such as 111, as explained below.

The error message you get is:

...
Missing property name after reference operator.
...
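Because this is a parse-time error, it can be inspected without running the statement. The following is a minimal sketch, assuming the [System.Management.Automation.Language.Parser] API that ships with PowerShell 3.0 and later; the variable names are illustrative, and the exact message wording may differ between versions:

```powershell
# Parse the statement without executing it and collect any parse errors.
$tokens = $null
$errors = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput('$obj.1a', [ref] $tokens, [ref] $errors)

# Each entry describes one syntax problem and where in the input it was found.
$errors | ForEach-Object { '{0} (at offset {1})' -f $_.Message, $_.Extent.StartOffset }
```

Since nothing is executed, $obj does not need to exist for this check.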

The current _workarounds_ are to either use _quoting_ or to _indirectly_ reference the property via a _variable_ containing the name:

# OK: Quote the property name
$obj.'1a'
$obj."1a"

# OK: Access the property via  a *variable*
$propName = '1a'
$obj.$propName

Note: $obj.{1a}, as mentioned by @george-chakhidze below, happens to work too, but it is an accidental by-product of script-block stringification.

Is there a good reason _not_ to allow direct use of the literal property name?
In other words: why shouldn't $obj.1a work as-is?

(By contrast, there are property names that do invariably require quoting, such as
([pscustomobject] @{ 'a.b' = 'a.b''s value' }).'a.b')

Environment data

Written as of:

PowerShell Core v6.1.0-preview.2
Issue-Question Resolution-Answered WG-Engine WG-Language

All 11 comments

$obj.{1a}

is OK too.

@george-chakhidze $obj.{1a} works because the string value of the script block {1a} is '1a'. So it's essentially equivalent to $obj.'1a'.

@mklement0 It's not allowed because it's sufficiently anomalous that it's probably a mistake. Cats walking on keyboards should not produce valid programs. If you _really_ want to do it, then you have to make it explicit with quoting, as in the case of this Stack Overflow question (I assume this is what prompted the question?)

Now as it happens, $foo.123 works fine. This is because we allow expressions on the RHS of . and, of course, 123 is a legal expression.
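The difference between an all-digit name and a merely digit-prefixed one can be seen side by side. A small sketch, using a hypothetical $foo object that is not part of the original thread:

```powershell
# Hypothetical object with one all-digit and one digit-prefixed property name.
$foo = [pscustomobject] @{ '123' = 'all digits'; '1a' = 'digit first' }

$foo.123      # 123 is a number literal; its string form '123' names the property  → all digits
$foo.'123'    # quoting reaches the same property                                  → all digits
$foo.'1a'     # '1a' is not a number literal, so it has to be quoted               → digit first
```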

@BrucePay:

Good to know that $obj.{1a} only works _accidentally_, as a by-product of script-block stringification (I've updated the original post).

... as in the case of this StackOverflow question. I assume this is what prompted the question?

Yes.
And it is tangible evidence that not only cats walking on keyboards produce such property names, especially in the world of JSON.

It's not allowed because it's sufficiently anomalous that it's probably a mistake.

If that is truly the intent - and I don't think it should be - please provide a more meaningful error message.

Now as it happens, $foo.123 works fine. This is because we allow expressions on the RHS of . and, of course, 123 is a legal expression.

Yes, as it _happens_ - and I suggest not relying on things that _happen_.

Let me ask the opposite question: with respect to _property access_ on _objects_, what good reason is there to interpret what's to the right of . as anything other than a property _name_, i.e., a _string_ (leaving property _paths_ aside)?
(Hashtable keys are a different story).

@mklement0 The issue in the Stack Overflow question was not difficulty in figuring out that quoting the anomalous property name was required. The problem was that there was a _typo_ in that name. Quoted or not made no difference since the property didn't exist.

especially in the world of JSON.

Interestingly, JavaScript doesn't allow property names to start with a number either. You have to use indexing instead. That would seem to discourage the use of such names in JSON.

If that is truly the intent - and I don't think it should be - please provide a more meaningful error message

As one of the people making that decision and the person who wrote the original code, I can say - yes, that is the intent. The error message is currently that the property is missing. I suppose we could change it to "missing or invalid".

Yes, as it happens - and I suggest not relying on things that happen

Poor phrasing on my part. It's not an accident, it's the expected consequence of the explicit decision to allow expressions on the RHS of '.', which in turn was done, in part, to deal with scenarios where objects might have non-conforming property names. As is the case in the Stack Overflow question.

@BrucePay:

The problem was that there was a _typo_ in that name.

Yes, the issue turned out to be a typo, but the fact remains that (a) digit-prefixed property names do occur, and (b) you do need explicit quoting to access them, and that may be surprising.

Interestingly, JavaScript doesn't allow property names

True, but JSON has long since outgrown use in the context of JavaScript _only_.

And, given that var o = JSON.parse('{ "1a": "value" }') works just fine even in JavaScript, you could similarly ask why JavaScript doesn't allow o.1a and instead requires o['1a'] - but that's obviously a separate debate.
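The PowerShell counterpart of that JavaScript snippet behaves the same way: the name can be defined, but not accessed as a bareword. A minimal sketch, with a made-up JSON document for illustration:

```powershell
# Hypothetical JSON input with one conventional and one digit-prefixed name.
$o = ConvertFrom-Json '{ "a1": "other", "1a": "value" }'

$o.a1                          # bareword access works for a letter-prefixed name   → other
$o.'1a'                        # the digit-prefixed name requires quoting           → value
$o.psobject.Properties.Name    # both names exist on the resulting object
```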

The error message is currently that the property is missing. I suppose we could change it to "missing or invalid".

That would certainly help, though to be truly helpful it would have to say something along the lines of "when in doubt, _quote_ the name".

which in turn was done, in part, to deal with scenarios where objects might have non-conforming property names. As is the case in the stackoverflow question.

Yet, typo aside, it is that very decision - treating what's to the right of . as an _expression_ ONLY - that _prevents_ the use of the non-conforming property name at hand.

Note that _falling back to [string] interpretation_ - if interpretation as an _expression_ doesn't work - makes perfect sense, as _that is already happening for non-digit-prefixed barewords_ a.k.a regular property access:

$co.foo # .foo is *implicitly* treated like .'foo', because foo by itself is not an expression
$co.1a # Why shouldn't this be treated like .'1a', if it can't be interpreted as an expression?

With property access, the advantage of being able to use an expression to the right of . is _solely_ to calculate a _property name_ - so that (ultimately) treating what's to the right as anything other than a string makes no sense.

that (a) digit-prefixed property names do occur

Off the top of my head, I can't think of a single language that allows property names to start with a digit.

Note that falling back to [string] interpretation - if interpretation as an expression doesn't work - makes perfect sense, as that is already happening for non-digit-prefixed barewords a.k.a regular property access

That's not what happens. The parser looks for either a propertyname token or a unary expression. If a propertyname token is found, it is used to look up the property. If an expression is found, that expression is evaluated, the result is converted to a string by calling .ToString() and that string is used to look up the property.
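The two paths described here can be sketched as follows; the object and its property names are hypothetical and chosen only to show that an expression's result is stringified before the lookup:

```powershell
# Hypothetical object; '100' is deliberately an all-digit property name.
$obj = [pscustomobject] @{ Name = 'demo'; '100' = 'hundred' }

$obj.Name             # property-name token, used for the lookup as-is          → demo
$obj.('Na' + 'me')    # expression that yields the string 'Name'                → demo
$obj.(50 + 50)        # expression that yields [int] 100, stringified to '100'  → hundred

$propName = '100'
$obj.$propName        # a variable reference is an expression too               → hundred
```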

that (ultimately) treating what's to the right as anything other than a string makes no sense.

So you're saying that in $x.abc+5 the property name should be abc+5? That would make things rather awkward.

I can't think of a single language that allows property names to start with a digit.

We need to be careful to distinguish between _allowing the definition of_ a property with a name that starts with a digit _in principle_ on the one hand and allowing that name to be _accessed with a bareword_ on the other hand.

As the JavaScript example showed, JavaScript supports the former, but not the latter, and that is currently also true of PowerShell.

The overall question, however, is whether there is a good reason for this asymmetry.

PowerShell is already unusual in allowing _expressions_ in lieu of name literals as accessors, the way you describe.

So you're saying that in $x.abc+5 the property name should be abc+5

No, it should be parsed as $x.abc + 5, which is indeed what happens already.

$x.(abc+5) - explicit use of (...) - is a different story, and there the current behavior makes sense too: abc+5 is interpreted as a _command_ and causes an error (unless there is a _command_ literally named abc+5).
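The first case is easy to confirm with a hypothetical $x whose abc property holds a number:

```powershell
# Hypothetical object with a numeric property named abc.
$x = [pscustomobject] @{ abc = 10 }

$x.abc+5      # the member name ends at '+', so this is ($x.abc) + 5   → 15
$x.abc + 5    # same expression with whitespace                        → 15
```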

The parser looks for either a propertyname token or a unary expression

Well, 1a is, by the current rules, _neither_, so the question is what behavior makes sense in that case:

Since the result of the expression is converted to a _string_, as you state, so as to be interpreted as a _property name_, it seems to me that _giving_ up on something that looks like a unary numeric expression at first glance but then cannot be evaluated as such is unhelpful (especially with the current error message).

Use of a unary _numeric_ expression as a _property_ accessor seems unlikely to be helpful in most cases and _it's far from obvious that this kind of interpretation does take place_ - in contrast with more readily recognizable expressions such as $obj.$propName or $obj.('foo' + 2).

I see the following ways to address this (there may be subtleties I'm missing):

  • (a) expand the definition of a property-name token to allow a leading digit.

  • (b) fall back to interpretation as a string literal if expression evaluation fails.

  • (c) report a _meaningful_ error (it seems that we've already found common ground there).

Even with (b) in place the behavior could still be surprising, however:

$o = [pscustomobject] @{ '1e2' = 'hi' }
$o.1e2  # !! $null, because `1e2` is interpreted as [double] 100

What muddies the waters is that for _hashtable keys_, which can be of _any type_, it _does_ make sense to distinguish between @{ '1e2' = 'hi' } ([string] key) and @{ 1e2 = 'hi' } ([double] key, interpretation as unary expression that is a _number literal_), although you could argue that @{ 1a = 'hi' } - currently a different, but similarly obscure error (Missing closing '}' in statement block or type definition) - too should fall back to interpreting 1a as a _string_.
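The key-type distinction for hashtable literals can be checked directly. A short sketch with illustrative variable names:

```powershell
$stringKey = @{ '1e2' = 'hi' }    # quoted: the key is the string '1e2'
$numberKey = @{ 1e2 = 'hi' }      # unquoted: the key is the number literal 1e2

$stringKey.Keys | ForEach-Object { $_.GetType().Name }    # → String
$numberKey.Keys | ForEach-Object { $_.GetType().Name }    # → Double

$stringKey['1e2']    # → hi
$numberKey[100.0]    # → hi (100.0 is a [double] equal to 1e2)
$numberKey['1e2']    # no output: there is no string key in this table
```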

With hashtables too, if I were to guess, many users are unaware that keys are parsed as _expressions_ if they don't start with a letter.

The ship has sailed, but I'd say that defaulting to _string-literal_ interpretation with _explicit opt-in_ to expression evaluation - similar to how $ and ( at the start of tokens in argument mode signal a non-literal argument - would have made for a model that's easier to grasp and remember.
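For comparison, this is how the opt-in works in argument mode today: a token is taken literally unless it starts with $ or (. A minimal sketch:

```powershell
$n = 2

Write-Output 1+2      # literal argument: the string '1+2'
Write-Output (1+2)    # ( opts in to expression evaluation   → 3
Write-Output $n       # $ opts in to a variable reference    → 2
```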

I'm inclined to agree with that, really. It should default to string interpretation unless explicitly cast otherwise. However, it becomes tricky to distinguish there between what is intended to be a variable cast and what is simply part of a string.

If you allow expressions to be recognised as strings in hashtable literals, then how is it supposed to treat type casts correctly?

$TypeHash = @{
     [int]32 = "hello!"
}
# should this work?
$TypeHash.'[int]32'
# because it's not so simple to ensure that that's not the case if we just 
# say 'parse everything as strings, unless... unless what, exactly?'
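For reference, this is how the example behaves under the current rules, where the key is stored as an [int] rather than as a string. A sketch reusing the $TypeHash table from above:

```powershell
$TypeHash = @{ [int]32 = 'hello!' }

$TypeHash.Keys | ForEach-Object { $_.GetType().Name }    # → Int32
$TypeHash[32]            # index with the int key                     → hello!
$TypeHash.([int] 32)     # explicit expression to the right of .      → hello!
$TypeHash.'[int]32'      # no output: no string key '[int]32' exists
```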

Thanks, @vexx32.

Yes, bringing hashtables into the picture complicates the situation, especially since PowerShell - generally commendably - makes it so easy to treat them the same syntactically and even to convert one type to the other.

As for syntax: The ambiguity would go away if you:

  • default to _string_ interpretation to the right of .

  • _except_ if the _1st char_ is either $ or (, in which case the variable reference / subexpression / expression should be evaluated and the _result as-is_ be used as a hashtable _key_ / the _stringified result_ be used as a _property name_.

Applied to your example:

$hash = @{ [int]32 = "hello!" }; $hash.([int] 32) # OK

$obj = (Get-Item /); $obj.('Na' + 'me')  # OK

In fact, all of the above _already works_, except that in the absence of either $ or (, a _digit_ as the 1st char. to the right of . _also_ causes evaluation as an _expression_, and that's where the trouble starts.

Why? Because the parser only considers a token that starts with a _letter_ a _member-name_ (property-name) token, as the following example shows:

PS> (([system.management.automation.psparser]::Tokenize('$obj.a1', [ref] $null)).Type)[-1]
Member # OK, recognized as a member name: 'a1' starts with a *letter*

PS> (([system.management.automation.psparser]::Tokenize('$obj.1a', [ref] $null)).Type)[-1]
CommandArgument # NOT recognized as member name: '1a' starts with a *digit*.

In the latter case - and for _any_ token that doesn't parse as Member - PowerShell then evaluates the token as a _unary expression_ (if I understood @BrucePay correctly).
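The same distinction can be observed in the AST rather than the legacy token stream. This is a rough sketch, assuming the [System.Management.Automation.Language.Parser] API (PowerShell 3.0 and later); $kinds and the sample inputs are illustrative. It records which kind of AST node the parser produced for whatever follows the dot:

```powershell
# Map each sample input to the AST node type of its member part.
$kinds = [ordered] @{}
foreach ($code in '$obj.a1', '$obj.123', '$obj.$p', '$obj.(1 + 2)') {
    # Parse only; nothing is executed, so $obj and $p need not exist.
    $ast = [System.Management.Automation.Language.Parser]::ParseInput($code, [ref] $null, [ref] $null)
    # Locate the member-access node and note the type of its Member child.
    $member = $ast.Find({ param($node) $node -is [System.Management.Automation.Language.MemberExpressionAst] }, $true)
    $kinds[$code] = $member.Member.GetType().Name
}
$kinds
```

A bareword such as a1 should show up as a string constant, while the other inputs are represented by expression nodes of different kinds.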

In the case of _property_ access, that currently gives us counter-intuitive behaviors in two scenarios:

$obj = [pscustomobject] @{ '1a' = 'one A'; '1e2' = 'one E two'; '100' = 'hundred' }

# As detailed in the original post:
# An unexpected and confusing error, because 1a is interpreted as an *expression* and *fails*.
PS> $obj.1a
... Missing property name after reference operator. ...

PS> $obj.1e2
hundred # !! because $obj.1e2 turned into $obj.'100', 
        # !! given that 1e2 as an expression becomes [double] 100, which stringifies to '100'

In the case of _hashtable-key_ access, interpretation as a _numeric_ expression makes comparatively more sense, but you could argue that even there, for any _non-string_ key, you should use:

  • either: $hash.([int] 32), as shown (explicit expression use with (...))
  • or: $hash[[int] 32], given that [...] is an always-expression-only context

Limiting the syntax to that makes sense, especially considering that .-based hashtable-entry access is _syntactic sugar borrowed from PROPERTY access_ for the unambiguous original [...] syntax - the latter explicitly and unambiguously allows _only_ expressions.


In conclusion:

Changing the parsing to the right of . to require _explicit opt-in_ to _expression_ evaluation via $ or (:

  • is _probably fine_ for _property_ access, because it's hard to imagine anyone having relied on $obj.1e2 accessing property '100'.

  • is _problematic_ for _hashtable-key_ access, because existing code _does_ rely on being able to specify _numeric_ keys - see below.

Given the latter, the question is whether introducing divergent behavior (changing only _property_ access) is ultimately more confusing than helpful.

Here's an example where existing code relies on numeric keys with .:

$null = 'abc' -match '(b)'; $Matches.1
b  # OK  - currently the same as: $Matches[1]

$Matches is a hashtable with _integer_ keys containing the overall regex match (key 0), the 1st capture group (1), ...

If we changed the interpretation of .1 to refer to a _string_ key '1', the above code would break (you'd have to use $Matches[1] or $Matches.(1) instead).
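The integer keys, and the access forms that depend on them, can be verified with a short sketch that reuses the regex from the example above:

```powershell
$null = 'abc' -match '(b)'

$Matches.Keys | ForEach-Object { $_.GetType().Name }    # Int32 for each key (0 and 1)

$Matches[1]      # index syntax with an int key         → b
$Matches.1       # dot syntax, 1 evaluated as a number  → b
$Matches.(1)     # dot syntax with an explicit (...)    → b
$Matches.'1'     # no output: the key is [int] 1, not the string '1'
```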

I always wondered how $Matches managed to work almost like an array, but still function as a hashtable. Very interesting... I'll have to keep that in mind.

All said, I personally have no qualms with how it currently functions. It could be improved a little, perhaps, but it's pretty consistent as it is, and changing the behaviour for, say, a specific use case seems unwise and liable to cause other problems.

I don't really see an elegant solution here.

@vexx32

I don't really see an elegant solution here.

Yes, unfortunately; arguably, the . syntactic sugar for hashtable keys should have applied _string-only_ interpretation from the start (and required differently typed keys to be accessed with [...] or even .(...)), but that ship has sailed.

say, a specific use case

It's more than a specific use case, but it is not common.

The best we can do under the circumstances is to provide a meaningful error message - see #6959
