Being able to group digits in large numeric literals would have great readability impact and no significant downside.
Adding binary literals (#215) would increase the likelihood of numeric literals being long, so the two features enhance each other.
We would follow Java and others, and use an underscore `_` as a digit separator. It could occur anywhere in a numeric literal (except as the first or last character), since different groupings may make sense in different scenarios and especially for different numeric bases:
c#
int bin = 0b1001_1010_0001_0100;
int hex = 0x1b_a0_44_fe;
int dec = 33_554_432;
int weird = 1_2__3___4____5_____6______7_______8________9;
double real = 1_000.111_1e-1_000;
Any sequence of digits may be separated by underscores, possibly more than one underscore between two consecutive digits. They are allowed in decimals as well as exponents, but following the previous rule, they may not appear next to the decimal point (`10_.0`), next to the exponent character (`1.1e_1`), or next to the type specifier (`10_f`). When used in binary and hexadecimal literals, they may not appear immediately following the `0x` or `0b`.
The syntax is straightforward, and the separators have no semantic impact - they are simply ignored.
This has broad value and is easy to implement.
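The placement rules above can be sketched as a small validator. This is an illustrative Python sketch, not compiler code: the names `is_valid_literal` and `strip_separators` are hypothetical, the suffix sets are a simplified assumption, and forms such as `.5` are deliberately left out.

```python
import re

# A digit run: digits with any number of underscores strictly between two digits.
DEC = r"[0-9](?:_*[0-9])*"
HEX = r"[0-9a-fA-F](?:_*[0-9a-fA-F])*"
BIN = r"[01](?:_*[01])*"

# Simplified suffix sets, assumed for illustration only.
INT_SUFFIX = r"(?:[uU][lL]?|[lL][uU]?)?"
REAL_SUFFIX = r"[fFdDmM]?"

LITERAL = re.compile(
    rf"0[xX]{HEX}{INT_SUFFIX}"      # no separator directly after 0x
    rf"|0[bB]{BIN}{INT_SUFFIX}"     # no separator directly after 0b
    rf"|{DEC}{INT_SUFFIX}"
    rf"|{DEC}(?:\.{DEC})?(?:[eE][+-]?{DEC})?{REAL_SUFFIX}"
)

def is_valid_literal(text):
    """Return True if `text` obeys the separator placement rules sketched above."""
    return LITERAL.fullmatch(text) is not None

def strip_separators(text):
    """Separators carry no meaning, so the value is read from the text without them."""
    return text.replace("_", "")

for sample in ["33_554_432", "1_000.111_1e-1_000", "10_.0", "1.1e_1", "10_f", "0x_1b"]:
    print(sample, is_valid_literal(sample))
```

Because every digit run must start and end with a digit, the "not next to the decimal point, exponent character, type specifier, or prefix" restrictions all follow from a single rule.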
Does this apply to real literals as well? For example, would `1_0_._5_e_-_1_6_m_` be valid?
I have no idea if this would be useful, just curious.
:+1:
Don't shoot me, but would it be too hard to parse "space" as a separator? Or does that make the grammar ambiguous?
c#
int two = 0b 10;
short max = 0x ffff;
long oneMillion = 1 000 000;
Just thinking out loud.
Digit separators were included in VB.net (vNext CTP). Would it be beneficial to also describe what was allowed in VB? I think `1_000` was allowed but `1__000` wasn't.
Comma usage could be an issue as it would clash with array literals: you couldn't tell what was part of a number and what was a separate array element.
I agree, I like space more than underscore. It's generally easier to type and makes it easier when working with something like hex numbers, e.g. `0x8080 8080 8080 8080UL` is so much easier to read and make sure I've filled all the slots vs. something like `0x8080808080808080UL`, where I have to sit and count to see if I got 16 characters or I only typed 14 or something. How about `'` as well?
I don't see how you could use a space as the separator because numeric literals would then potentially consist of not one but several tokens. This would make them very difficult to parse.
The underscore seems the best choice of separator to me, particularly as it's already used by several other languages.
I'm not so keen on allowing multiple consecutive underscores but I suppose it does no harm.
This grammar wouldn't allow consecutive separators.
digit ::= '0' - '9'
sep ::= '_'
prefix ::=
literal ::= prefix (sep? digit)+
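As a rough check of what this production accepts, here is a Python regex sketch. The thread leaves `prefix` unspecified, so the `prefix` and `digits` parameters (and the name `grammar_regex`) are assumptions made only for illustration.

```python
import re

def grammar_regex(prefix="", digits="0-9"):
    """Translate `literal ::= prefix (sep? digit)+` into a compiled regex.

    `prefix` and `digits` are hypothetical parameters; the grammar above
    does not say what prefixes exist.
    """
    return re.compile(rf"{re.escape(prefix)}(?:_?[{digits}])+")

decimal = grammar_regex()
hexadecimal = grammar_regex("0x", "0-9a-fA-F")

print(bool(decimal.fullmatch("1_000_000")))   # True
print(bool(decimal.fullmatch("1__000")))      # False: consecutive separators
print(bool(decimal.fullmatch("1_000_")))      # False: trailing separator
print(bool(hexadecimal.fullmatch("0x_ff")))   # True: separator right after the prefix
```

Read literally, the production also admits a separator before the very first digit, so a real grammar would probably need an extra restriction when the prefix is empty.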
I think spaces could also be possible
digit ::= '0'-'9'
separator ::= ' '
literal ::= digit (separator? digit)*
I think it would be very hard to use spaces.
I haven't looked at the parser, but it's probably doing something like breaking the text at white space, parentheses, braces, whatever, and analyzing the tokens from there. Assuming that the rest of a numeric literal might follow a space is doable, but I don't think it is worth the cost.
And what next? This?
var a = 1111
1111
1111
1111;
Or this?
var a = 1111 // comment
1111 // comment
1111 // comment
1111; // comment
Although it might be an itsy bitsy bit harder to type on most keyboard layouts, the semantic break of the numeric literal is the same with the `_`, and I would argue that it's even better because it gives both separation and cohesion.
Wonder if the parser supports significant whitespace?
The VB implementation of digit group separators prototyped last year actually supported three different separators originally: underscore, back tick, and space. So you could write &B1111 0010 or 1_000_000 or 3`600. We quickly decided that back tick didn't make enough sense to anyone and cut it. The VB preview still supported both underscores and spaces. The biggest motivation for spaces was binary literals, another feature prototyped at the same time, because binary numbers are conventionally separated with spaces.
As to implementation, it's not hard at all really - at least in VB, particularly when you don't allow multiple consecutive separators. Normally the scanner encounters a digit and starts scanning an integral literal one character at a time until it encounters a character that's not a digit for the base being used (decimal, hex, octal), then it stops. We changed it so that if the non-digit character were an underscore or space it would peek one more character ahead, and if that character were a digit it would keep scanning it as a single token. There are some corner cases you have to put extra recovery around but it's not very complicated, particularly because in VB it's not valid to have two integer literals follow one another, so it's non-breaking to interpret `1 1` as `11`. I think C# is the same here, though in C# we were pretty settled that underscore would be the sole separator.
I think the biggest concern about that is that tools would be confused thinking the space was a word boundary (not VS, the editor is smart enough in VS to handle space) and we just couldn't foresee what havoc spaces would be unleashing on the world (if any).
Another more minor concern was complexity - would users benefit more from having a single consistent separator used everywhere? If we decided to pick one it would likely be the underscore so space was only a possibility if we were ok with having two separators which was an open question.
-ADG
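The one-character lookahead described in the comment above can be sketched roughly as follows. This is a Python illustration of the idea, not the actual VB scanner; `scan_integer` and `SEPARATORS` are hypothetical names, and the corner-case recovery mentioned above is omitted.

```python
# The two separators the VB preview is described as supporting.
SEPARATORS = {"_", " "}

def scan_integer(text, start=0, digits="0123456789"):
    """Scan one integer literal beginning at text[start].

    Returns (digits_without_separators, index_just_past_the_token).
    """
    i = start
    out = []
    while i < len(text):
        ch = text[i]
        if ch in digits:
            out.append(ch)
            i += 1
        elif ch in SEPARATORS and out and i + 1 < len(text) and text[i + 1] in digits:
            # A single separator followed by a digit: peek succeeded,
            # so skip the separator and stay inside the same token.
            i += 1
        else:
            # Any other character (including a second separator) ends the token.
            break
    return "".join(out), i

print(scan_integer("1_000_000;"))  # ('1000000', 9)
print(scan_integer("1111 0010"))   # ('11110010', 9)
print(scan_integer("1__000"))      # ('1', 1)
```

Peeking only one character ahead is what keeps consecutive separators out of a token: the second separator is not a digit, so scanning stops at the first one.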
Using space as a separator would probably be a bad idea, because it would cause hard-to-spot mistakes. For instance, `int[] numbers = { 1 2 }` _looks_ like an array with the numbers 1 and 2, but it would actually be an array with only the number 12. Forgetting a comma would silently change the meaning of the code, instead of causing an error.
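To make the pitfall concrete, here is a small Python sketch that extracts integer tokens from a source string with and without space as a separator. The helper `int_tokens` is hypothetical and only models the tokenization, not a real C# lexer.

```python
import re

def int_tokens(source, allow_space):
    """Return the integer literals found in `source`.

    With allow_space=True, a single space between digits is treated as a
    digit separator, as in the suggestion being discussed.
    """
    sep = "[_ ]" if allow_space else "_"
    pattern = rf"[0-9](?:{sep}?[0-9])*"
    return [int(re.sub(r"[_ ]", "", token)) for token in re.findall(pattern, source)]

print(int_tokens("{ 1, 2 }", allow_space=True))   # [1, 2]
print(int_tokens("{ 1 2 }", allow_space=True))    # [12]
print(int_tokens("{ 1 2 }", allow_space=False))   # [1, 2]
```

With underscore as the only separator, the missing comma still leaves two adjacent number tokens, which a parser can report as an error instead of silently producing 12.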
@thomaslevesque that is a very good point, before I suggested it I quickly tried to think of places where two numbers would follow each other, but I had totally missed this obvious one. I think that is probably a deal breaker.
Seems generally people are not for using space, and I think I have come to agree with this point. Still don't like how "1_000" looks, but it might be the best and easiest option.
Isn't this proposal about digit separators for literals that have a prefix?
@AdamSpeight2008, no, it's for all numeric literals.
@AdamSpeight2008, we did consider restricting space in particular to its most obvious use case - binary literals. It would be unusual, but I think it's worth considering if it gives us more confidence in the feature.
@thomaslevesque, @chrisaut, I find that developers tend to bias negatively on what would confuse other developers and how often. Just about every feature ever proposed or introduced has someone saying "this will cause hard to spot mistakes for everyone ever". There are also features which at first seem harmless - then later turn out to be pits of failure. Fortunately, with "Roslyn" and a managed code base it's much easier to quickly prototype language features - even the scary ones and experiment and make decisions after making observations. I think that will give us the most room to explore the full potential of the language without being committed to doing or not doing a feature a particular way too early. It's still very _very_ early in the design of VB15 (this idea has 0% chance of making it into C#) and given how often space has been proposed or preferred by different VB users we've spoken to I'd hate to cut the idea down prematurely if it could actually produce a better experience for those users.
Regards,
-ADG
I'd say ' or ` are better choices than _:
@mikedn
They're easier to type (single keystroke instead of combination)
Sadly this holds true only for the US keyboard layout. At least in the German layout all three require two keystrokes. Only space is one keystroke here, too.
Agreed ` or ' are undesirable for the reasons already mentioned. I actually don't mind using _ as a separator at all, and, frankly, anything here is better than nothing :)
Using space seems like a recipe for conflicts all over the place, and I don't see it adding that much value. I dislike the idea of allowing multiple, alternative separators: while anyone reusing Roslyn wouldn't care, other tools doing their own lexing of C# code would have to do much more work.
`'` is used for a comment in VB.net.
In VB.net `_` is also used as a line continuation.
Would that cause a misread of the user's intent?
@mikedn, @tomasr, `` ` `` or `'` is good only for decimal digits. Let's see other cases:
cs
int bin = 0b`1001`1010`0001`0100;
int hex = 0x1b`a0`44`fe;
int dec = 33`554`432;
int weird = 1`2``3```4````5`````6``````7```````8````````9`````````;
I think `_` is better because it is more universal.
@ViIvanov ` and ' make it look like the numbers are indicating degrees, or feet and inches.
@AdamSpeight2008, in VB the explicit line continuation is actually
I agree that ` and ' look more like units of measurement. _ has a precedent in identifiers as a chunk separator.
I haven't seen a good scenario for multiple consecutive separators yet and am likely to advocate disallowing them.
Just to reinforce what @AnthonyDGreen and @d-kr said: the Portuguese keyboard layout requires me to type **[SHIFT]**+**[`]** followed by [SPACE] if the following character is a vowel.
You couldn't possibly imagine how hard it was for me to type code in markdown.
I like this proposal. But I don't know why this restriction is necessary: "When used in binary and hexadecimal literals, they may not appear immediately following the `0x` or `0b`."
I feel like
int bin = 0b_1001_1010_0001_0100;
is much better than
int bin = 0b1001_1010_0001_0100;
and I can't imagine any problem with allowing this.
@jveselka Me too, especially since the general grammar would be `literal ::= prefix (sep? digit)+`
@gafter So the final decision is disallowing separators immediately after prefixes?
@CnSimonChan I think it is implemented in the `Future` branch, but it needs the feature flag to be set (or the language version to be VB15). Not sure if these features are available by default in that version (15) of the language.
@zippec: Completely agree. @jaredpar should we break out the feature request for `0x_1001_1000` to be valid into a separate issue?
@jskeet yes let's use a separate issue since this feature is implemented as spec'd here. We can use the new issue to track changing to allow that syntax.
Would be nice. Although space feels less C#ish, I still vote for spaces. I mean, can it go wrong as long as we're expecting a `;`?
Anyway, I think it should only be allowed in binary/hex/~~oct~~ etc.?
@weitzhandler, I think that changing C# 7 and Visual Studio for tomorrow is, most probably, out of the question. 😄
so `var a = 1 0;` is actually just ten?
@alrz, that's no worse than
var a = 1______________________________________________________________________________________________________________________________________________________________________________________________________________________________________0;
The greater issue here is that, in this particular case and only in this particular case, space is a special case for white spaces. And that's bad. Very bad.
@paulomorgado
No, the space is worse because it's invisible. In your example it's impossible to overlook the zero because the literal goes on and on and on.
Limit to single space (surely no line breaks :rage:).
`_` is definitely more C#ish anyway.
And separation only makes sense in binary/hex.
@weitzhandler No it doesn't. C# doesn't mind how many spaces you are using between tokens at all.
We should keep the discussion here.
@weitzhandler I think you mean discussion has moved here.
@alrz, Visual Studio can make white space visible. But I still think that would be the least of the problems.
Discussion for this feature has been moved here.