0xACEDs

Expected output:
-21267

Actual output:
ParserError:
Line |
   1 |  0xACEDs
     |  ~
     |  The numeric constant 0xACEDs is not valid.
Name                      Value
----                      -----
PSVersion                 7.0.0-rc.2
PSEdition                 Core
GitCommitId               7.0.0-rc.2
OS                        Microsoft Windows 10.0.18362
Platform                  Win32NT
PSCompatibleVersions      {1.0, 2.0, 3.0, 4.0…
PSRemotingProtocolVersion 2.3
SerializationVersion      1.1.0.1
WSManStackVersion         3.0
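For comparison, the suffix-less form of the same digits parses fine in the same build; reinterpreting that bit pattern at 16-bit width is what gives the expected -21267:
PS> 0xACED
44269    # parsed as a positive [int]; the same 16-bit pattern read as [int16] is -21267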
The same in 7.0.0-preview.1.
I wonder how we lost the test.
I could be misremembering, but to my memory, this behaviour was asked for by the PS team when I was implementing this feature.
I didn't agree with it then, I don't agree with it now, and I'd very much like to have this work. IIRC the reasoning back then was effectively "we shouldn't have hex parsing differ based on the type suffix" or something along those lines. It's been a while, but that's what I can remember at the moment.

The current parsing behaviour for hex literals mimics the pre-existing behaviour for parsing hex, where originally the int.Parse() and long.Parse() methods were used, which came with an implicit width restriction/assumption.

Given that yes, these will cause parse errors otherwise, I'm more than happy to go back and fix it if that's something folx want. It's certainly a more useful feature that way. 🙂
@vexx32 Can you point to the PR or comment?
Would have been in #7993 somewhere, I'd imagine. 🙂
Historically (and somewhat counter-intuitively), suffix-less hex literals are always parsed into _signed_ types, automatically chosen as either [int] or [long], depending on whether the _bit pattern_ implied by the hex digits fits, which leads to surprising behavior such as the following:
PS> 0x7fffffff
2147483647 # [int]
PS> 0x80000000
-2147483648 # bit pattern still fits into [int], but the result is now a *negative* number
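To make the automatic width selection visible (a quick check in a 7.x session; results as I understand the current behavior):
PS> (0x7fffffff).GetType().Name
Int32
PS> (0x80000000).GetType().Name
Int32    # same 32-bit pattern, sign bit set, hence the negative value above
PS> (0x100000000).GetType().Name
Int64    # the pattern no longer fits 32 bits, so [long] is chosen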
Now, if I'm _explicitly asking_ for a _signed_ type with suffix S, I see no reason why the same bit-pattern logic should _not_ apply:
PS> 0x7fffS
32767
PS> 0x8000S # !! BOOM - even though the bit pattern clearly fits.
The numeric constant 0x8000S is not valid.
0x8000S should be the equivalent of [int16]::Parse('8000', 'AllowHexSpecifier'), which correctly yields -32768 (and [int16]::Parse('ACED', 'AllowHexSpecifier') yields -21267, as requested in the OP).
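For reference, checking those two calls directly in a session (results as stated above):
PS> [int16]::Parse('8000', 'AllowHexSpecifier')
-32768
PS> [int16]::Parse('ACED', 'AllowHexSpecifier')
-21267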
Historically (and somewhat counter-intuitively), suffix-less hex literals are always parsed into _signed_ types, automatically chosen as either [int] or [long], depending on whether the _bit pattern_ implied by the hex digits fits, which leads to surprising behavior such as the following:
PS> 0x7fffffff
2147483647 # [int]
PS> 0x80000000
-1 # bit pattern still fits into [int], but the result is now a *negative* number
The latter returns -2147483648, FYI. 0xFFFFFFFF is -1. I wouldn't call it surprising though; all literals are signed by default.
Edit: Marking off-topic since it isn't really related to the issue.
Yeah, sorry about pasting the wrong result in my previous comment; fixed now.
I wouldn't call it surprising though,
It's surprising if you come from a C# background (or JavaScript, or Perl, or Bash, or ...), where hex literals are always _positive_ numbers, with _unsigned_ types chosen as needed (0x80000000 -> 2147483648, i.e. [uint32]).
But that was really just an aside, given that that ship has sailed a long time ago.
As is hopefully clear from my previous comment, I fully agree that parsing something like 0xACEDs should work and should return -21267.
I have an implementation-based question... Currently whether a literal is considered signed or unsigned is literally a single boolean toggle switch when parsing the number.
Do you think it makes sense to just assume a given hex literal is unsigned if the target type (specified by the type suffix) is an unsigned type (byte, ushort, ulong)?
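For example, under that rule the user-visible results would be something like this (illustrative; these are the proposed results, not current behaviour):
PS> 0xFFFFus    # unsigned target [uint16]: digits read as an unsigned magnitude
65535
PS> 0xFFFFs     # signed target [int16]: same bit pattern reinterpreted as signed
-1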
That makes sense to me, @vexx32; not having looked into the current implementation, I'm asking innocently: wouldn't <targetType>.Parse(<input>, System.Globalization.NumberStyles.AllowHexSpecifier) give us that desired behavior? (In the case of U you'd have to try UInt32 first, and then retry with UInt64.)
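For instance, the retry for U could look something like this at the PowerShell level (just a sketch of the idea, not the parser's code):
PS> $digits = 'FFFFFFFFFF'    # too wide for [uint32]
PS> $styles = [System.Globalization.NumberStyles]::AllowHexSpecifier
PS> try { [uint32]::Parse($digits, $styles) } catch { [uint64]::Parse($digits, $styles) }
1099511627775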
Yeap, but on the whole, when working in that area of the code I found it was much easier to handle everything via BigInteger (where you have full control of whether a given literal is considered signed or not) than shuffling into different variables constantly, because, as you alluded to, there are some cases where you have to automatically figure out the right type to use -- and when there's no suffix specified, it can be a bit tricky to juggle everything in separate types. 🙂
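To illustrate the core trick with BigInteger (a rough PowerShell-level sketch with made-up names; the real parser does this on the token text in C#): parse the digits as an unsigned magnitude, then wrap into the negative range when the target type is signed and its sign bit is set.
function ConvertFrom-HexBitPattern {
    param([string] $Digits, [int] $BitWidth, [switch] $Unsigned)
    # A leading 0 stops BigInteger from treating the first digit's high bit as a sign bit,
    # so we always start from the raw unsigned magnitude of the hex digits.
    $value = [System.Numerics.BigInteger]::Parse("0$Digits", 'AllowHexSpecifier')
    if (-not $Unsigned -and $value -ge [System.Numerics.BigInteger]::Pow(2, $BitWidth - 1)) {
        # Signed target with the sign bit set: two's-complement wrap into the negative range.
        $value -= [System.Numerics.BigInteger]::Pow(2, $BitWidth)
    }
    $value
}
ConvertFrom-HexBitPattern -Digits 'ACED' -BitWidth 16              # -21267
ConvertFrom-HexBitPattern -Digits 'ACED' -BitWidth 16 -Unsigned    # 44269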
I have submitted #11844, which will adjust the hex parsing to allow literals as described. 🙂
:tada: This issue was addressed in #11844, which has now been successfully released as v7.1.0-preview.2. :tada: