PowerShell: Negative hex literal int16 throws parse error

Created on 10 Feb 2020 · 12 comments · Source: PowerShell/PowerShell

Steps to reproduce

0xACEDs

Expected behavior

-21267

Actual behavior

ParserError:
Line |
   1 |  0xACEDs
     |         ~
     | The numeric constant 0xACEDs is not valid.
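
For reference, the expected value corresponds to reading the 16-bit pattern 0xACED as a signed [int16]; as noted later in the thread, the equivalent .NET call already produces it (shown here as a sketch of the expected behaviour, not the engine's code path):

PS> [int16]::Parse('ACED', 'AllowHexSpecifier')
-21267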

Environment data

Name                           Value
----                           -----
PSVersion                      7.0.0-rc.2
PSEdition                      Core
GitCommitId                    7.0.0-rc.2
OS                             Microsoft Windows 10.0.18362
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0
Labels: Issue-Bug, Resolution-Fixed, WG-Engine

Most helpful comment

Yeah, sorry about pasting the wrong result in my previous comment; fixed now.

> I wouldn't call it surprising though,

It's surprising if you come from a C# background (or JavaScript, or Perl, or Bash, or ...), where hex literals are always _positive_ numbers, with _unsigned_ types chosen as needed (0x80000000 -> 2147483648 ([uint32])).

But that was really just an aside, given that that ship has sailed a long time ago.

As is hopefully clear from my previous comment, I fully agree that parsing something like 0xACEDs should work and should return -21267.

All 12 comments

The same in 7.0.0-preview.1.
I wonder how we lost the test.

I could be misremembering, but to my memory, this behaviour was asked for by the PS team when I was implementing this feature.

I didn't agree with it then, I don't agree with it now, and I'd very much like to have this work. IIRC the reasoning back then was effectively "we shouldn't have hex parsing differ based on the type suffix", or something along those lines. It's been a while, but that's what I can remember at the moment. The current parsing behaviour for hex literals mimics the pre-existing hex parsing, where originally the int.Parse() and long.Parse() methods were used, which came with an implicit width restriction/assumption.

Given that yes, these will cause parse errors otherwise, I'm more than happy to go back and fix it if that's something folx want. It's certainly a more useful feature that way. 🙂
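
As a rough illustration of that implicit width restriction (a sketch of the underlying .NET Parse behaviour, not the engine's actual code path), parsing with AllowHexSpecifier reads the digits as a raw bit pattern for the target type and rejects patterns wider than the type:

PS> [int]::Parse('7FFFFFFF', 'AllowHexSpecifier')
2147483647
PS> [int]::Parse('80000000', 'AllowHexSpecifier')
-2147483648   # the 32-bit pattern fits, with the sign bit set
PS> [int]::Parse('100000000', 'AllowHexSpecifier')
# throws an overflow error: nine hex digits are wider than Int32
PS> [long]::Parse('100000000', 'AllowHexSpecifier')
4294967296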

@vexx32 Can you point the PR or comment?

Would have been in #7993 somewhere, I'd imagine. 🙂

Historically (and somewhat counter-intuitively), suffix-less hex literals are always parsed into _signed_ types, automatically chosen as either [int] or [long], depending on whether the _bit pattern_ implied by the hex digits fits, which leads to surprising behavior such as the following:

PS> 0x7fffffff
 2147483647  # [int]

PS> 0x80000000
-2147483648  # bit pattern still fits into [int], but the result is now a *negative* number

Now, if I'm _explicitly asking_ for a _signed_ type with suffix S, I see no reason why the same bit-pattern logic should _not_ apply:

PS> 0x7fffS
32767

PS> 0x8000S  # !! BOOM - even though the bit pattern clearly fits. 
The numeric constant 0x8000S is not valid.

0x8000S should be the equivalent of [int16]::Parse('8000', 'AllowHexSpecifier'), which correctly yields -32768 (and [int16]::Parse('ACED', 'AllowHexSpecifier') yields -21267, as requested in the OP).

> Historically (and somewhat counter-intuitively), suffix-less hex literals are always parsed into _signed_ types, automatically chosen as either [int] or [long], depending on whether the _bit pattern_ implied by the hex digits fits, which leads to surprising behavior such as the following:
>
> PS> 0x7fffffff
>  2147483647  # [int]
>
> PS> 0x80000000
> -1  # bit pattern still fits into [int], but the result is now a *negative* number

The latter returns -2147483648, FYI. 0xFFFFFFFF is -1. I wouldn't call it surprising, though; all literals are signed by default.

Edit: Marking off-topic since it isn't really related to the issue.

Yeah, sorry about pasting the wrong result in my previous comment; fixed now.

> I wouldn't call it surprising though,

It's surprising if you come from a C# background (or JavaScript, or Perl, or Bash, or ...), where hex literals are always _positive_ numbers, with _unsigned_ types chosen as needed (0x80000000 -> 2147483648 ([uint32])).

But that was really just an aside, given that that ship has sailed a long time ago.

As is hopefully clear from my previous comment, I fully agree that parsing something like 0xACEDs should work and should return -21267.

I have an implementation-based question... Currently whether a literal is considered signed or unsigned is literally a single boolean toggle switch when parsing the number.

Do you think it makes sense to just assume a given hex literal is unsigned if the target type (specified by the type suffix) is an unsigned type (byte, ushort, ulong)?
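
For what it's worth, that rule would match how the corresponding .NET Parse overloads already treat the same digits (shown here as an illustration, not as the engine's implementation):

PS> [byte]::Parse('80', 'AllowHexSpecifier')    # unsigned target: plain magnitude
128
PS> [sbyte]::Parse('80', 'AllowHexSpecifier')   # signed target: bit pattern, sign bit set
-128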

That makes sense to me, @vexx32; not having looked into the current implementation, I'm asking innocently: wouldn't <targetType>.Parse(<input>, System.Globalization.NumberStyles.AllowHexSpecifier) give us that desired behavior? (In the case of U you'd have to try UInt32 first, and then retry with UInt64).
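
A minimal sketch of that "try UInt32, then retry with UInt64" idea for the U suffix; ConvertFrom-UnsignedHex is a hypothetical name used purely for illustration and is not part of the engine:

function ConvertFrom-UnsignedHex {
    # Hypothetical helper: parse the hex digits of a 'u'-suffixed literal by trying
    # UInt32 first and falling back to UInt64 when the pattern is too wide.
    param([string] $HexDigits)

    try   { [uint32]::Parse($HexDigits, 'AllowHexSpecifier') }
    catch { [uint64]::Parse($HexDigits, 'AllowHexSpecifier') }
}

PS> ConvertFrom-UnsignedHex '80000000'
2147483648              # [uint32]
PS> ConvertFrom-UnsignedHex 'FFFFFFFFFFFFFFFF'
18446744073709551615    # [uint64]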

Yep, but on the whole, when working in that area of the code I found it was much easier to handle everything via BigInteger (where you have full control over whether a given literal is considered signed or not) than to shuffle values into different variables constantly, because, as you alluded to, there are some cases where you have to automatically figure out the right type to use; when there's no suffix specified, it can be a bit tricky to juggle everything in separate types. 🙂
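
For context, a small example of the control BigInteger gives here (an illustration of the .NET behaviour, not the tokenizer's actual code): the hex digits are read as a two's-complement bit pattern of arbitrary width, so a set high bit yields a negative value, while a leading '0' forces the same digits to be read as unsigned.

PS> [bigint]::Parse('ACED', 'AllowHexSpecifier')    # high bit set: signed interpretation
-21267
PS> [bigint]::Parse('0ACED', 'AllowHexSpecifier')   # leading zero: unsigned interpretation
44269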

I have submitted #11844, which will adjust the hex parsing to allow literals as described. 🙂

🎉 This issue was addressed in #11844, which has now been successfully released as v7.1.0-preview.2. 🎉

