This was previously requested in a comment on #216 (and I independently viewed that thread precisely to see if it was already valid).
With VS15 Preview 3, we have:
Valid:
var x = 0b1010_0000;
var y = 0x1234_abcd;
Not valid:
var x = 0b_1010_0000;
var y = 0x_1234_abcd;
I find the latter more readable than the former. While I can see why a digit separator in front of bare digits isn't valid (e.g. _1 is already a valid identifier), the leading 0x or 0b already prevents the token from being an identifier.
// cc @zippec
This is a language request, not a compiler request. The compiler is behaving per its specification.
/cc @khyperia FYI
I implemented this a while back, so I figured I'd chime in with what I know. While I'm not sure where the exact spec is hiding (it likely is equivalent to the compiler), this is what the compiler does:
Any "string of digits" (e.g. 0-9 for decimals, plus fullwidth for VB) in any literal (decimal, hex, binary, float, double) can contain any number of underscores at any place between the first and last digit (i.e. cannot start nor end with an underscore).
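The rule described above can be sketched as a small check. This is only an illustration of the rule as stated in this comment, not the compiler's actual code: DigitSeparatorRule, IsValidDigitString, IsHexDigit and IsBinaryDigit are hypothetical names, and the helper assumes any prefix such as 0x or 0b has already been stripped from the digit string.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not Roslyn's implementation.
public static class DigitSeparatorRule
{
    // Returns true when the digit string of a literal (prefix already removed)
    // places its underscores legally: any number of them, anywhere between
    // the first and last digit, but never first or last.
    public static bool IsValidDigitString(string digits, Func<char, bool> isDigit)
    {
        if (string.IsNullOrEmpty(digits)) return false;

        // The first and last characters must be real digits.
        if (!isDigit(digits[0]) || !isDigit(digits[digits.Length - 1])) return false;

        // Everything else is either a digit or an underscore.
        return digits.All(c => c == '_' || isDigit(c));
    }

    public static bool IsHexDigit(char c) =>
        (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');

    public static bool IsBinaryDigit(char c) => c == '0' || c == '1';
}
```

Under this sketch, "1010_0000" and "1234_abcd" pass, while "_1234_abcd" (what 0x_1234_abcd leaves once the prefix is removed) and "2_" are rejected, which matches the behaviour reported at the top of the thread.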
There are additional cases that might be interesting to consider when discussing whether 0x_2 should be allowed, mostly relating to floats ("reasonable" means "easy to design without breaking changes"). I've also listed cases that cannot be parsed, or are difficult to parse, without a breaking change. (All of these are impossible under today's rules.)
0x_2 -- the original proposal
0b_10 -- same
0x2_ -- reasonable
_1.2e3 -- might be technically possible, but involves lookahead to see a digit or the e (and breaks in unintuitive ways)
1_.2e3 -- reasonable
1._2e3 -- same as earlier, but even more unintuitive (e.g. 1._2 is impossible to be resolved in the parser, it needs the exponent syntax to be possible)
1.2_e3 -- reasonable
1.2e_3 -- reasonable (this is an odd one - prefixing an underscore to the digit sequence isn't simple to do in the other two cases)
1.2e3_ -- reasonable
Additionally, 0_x2 might be considered, but I don't see how that makes sense at all.
Note that if any rule is changed, we would also want to update VB, and possibly F# as well - I helped out with a PR implementing digit separators in F#, and they ended up following the same rules.
Edit from half a year later (2017-02-11): Don't know what I was thinking with _1.2e3 or 1._2e3 being technically possible to be resolved in the parser, they're definitely not. My personal opinion is that 0x_2 is the only truly useful change, but I figured I'd correct the above for potential future discussion.
1._2e3 -- same as earlier, but even more unintuitive (e.g. 1._2 is impossible to be resolved in the parser, it needs the exponent syntax to be possible)
So the compiler will have to wait until it knows whether _2e3 is a valid extension method on int or not to choose between 1 and 1.2e3? I don't know if that's worth it.
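To make the ambiguity concrete, here is a minimal sketch of code that compiles under today's rules. IntExtensions, Demo and the extension method _2e3 are hypothetical names chosen only to collide with the proposed literal syntax.

```csharp
using System;

public static class IntExtensions
{
    // Hypothetical extension method whose name looks like the tail of a float literal.
    public static int _2e3(this int value) => value + 2000;
}

public static class Demo
{
    public static int Run()
    {
        // Under today's rules this is the integer literal 1, a '.', and the
        // identifier _2e3 followed by a call, i.e. a member access on an int.
        return 1._2e3();
    }
}
```

If 1._2e3 were also accepted as a floating-point literal, the same characters would have two readings, and the lexer could not tell them apart without information it normally doesn't have.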
I think the proposed grammar was
Literal ::= Prefix ( Sep? Digit )* Digit
I'm closing this and letting the LDM decide before doing any work. See https://github.com/dotnet/csharplang/issues/65
Note:
I'd be very wary about allowing a prefix of _ as _1 is already a legal identifier.
@CyrusNajmabadi The prefixes meant here are those defined by the language specification.
Prefix ::= HexPrefix | BinPrefix | ...
HexPrefix ::= "0x" | "0X"
BinPrefix ::= "0b" | "0B"
If the prefix is missing, the character sequence can match a (possibly legal) identifier when the underscore separator comes first. That is unlikely to matter here, though, because the prefix is required in this context of digit separators.
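A short sketch of the identifier concern, using hypothetical names (IdentifierDemo, Run): without a prefix, _1 is an ordinary identifier, whereas a token that begins with 0x can never be one because identifiers cannot start with a digit.

```csharp
public static class IdentifierDemo
{
    public static int Run()
    {
        int _1 = 5;          // _1 is a perfectly legal local variable name
        int sum = _1 + 0x1;  // identifier _1 plus the hex literal 0x1
        return sum;
    }
}
```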