Original PR: https://github.com/dotnet/roslyn/pull/28927
The VirtualChar system can properly handle almost all C# escapes (including \n, \u0001, \xff, etc.). All of these escapes map down to a single 'char'.
One thing it can't handle is \UXXXXXXXX when the escaped value is larger than 16 bits and so doesn't fit in a single 'char'. It's unclear whether we should bother supporting that case, but this issue tracks the limitation anyway.
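To make the limitation concrete, here is a minimal Python sketch (not Roslyn code) of what a \UXXXXXXXX escape expands to in UTF-16. The helper names `utf16_units` and `decode_big_u_escape` are hypothetical. Values up to U+FFFF come out as one 16-bit unit, which is the case VirtualChar already handles. Anything from U+10000 to U+10FFFF needs a surrogate pair, which is how the C# spec represents such characters in string literals.

```python
import string

MAX_CODE_POINT = 0x10FFFF


def utf16_units(code_point: int) -> list[int]:
    """Encode one code point as UTF-16 code units (16-bit values stored as ints)."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"U+{code_point:X} is outside the Unicode range")
    if code_point <= 0xFFFF:
        # Fits in a single 16-bit char: the case VirtualChar already handles.
        return [code_point]
    # Needs a surrogate pair: subtract 0x10000 to get a 20-bit value,
    # then split it into high and low 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


def decode_big_u_escape(escape: str) -> list[int]:
    """Decode a C#-style \\UXXXXXXXX escape (hypothetical helper) into UTF-16 units."""
    digits = escape[2:]
    if (len(escape) != 10 or not escape.startswith("\\U")
            or not all(c in string.hexdigits for c in digits)):
        raise ValueError(f"not a \\UXXXXXXXX escape: {escape!r}")
    return utf16_units(int(digits, 16))
```

For example, \U0001F600 decodes to the pair [0xD83D, 0xDE00], so no single 'char' can carry it.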
Is there a BCL data type available to Roslyn that we should use, or is int/uint the way to go?
I couldn't find a BCL type (pity). I'm using uint myself.
@CyrusNajmabadi The BCL uses int internally; the valid range is U+0000..U+10FFFF.
https://github.com/dotnet/runtime/blob/16e325d2f8f0aa0f7ab390275525b569edac7d1f/src/libraries/Common/tests/CoreFx.Private.TestUtilities.Unicode/System/Text/Unicode/CodePoint.cs#L15

Lol... sigh.
They use a signed value... to represent a character... Thus also making it a super PITA to go between chars and CodePoints... sigh...
I'm not sure what you mean. char is a different concept than a code point. You can't just cast a char to int to get a code point.
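A small sketch of why casting each char to an int doesn't give you code points: surrogate pairs have to be combined first. This is illustrative Python that models UTF-16 code units as ints; `code_points` is a hypothetical helper, and rejecting unpaired surrogates is an assumption made for this sketch (a real decoder might substitute U+FFFD instead).

```python
def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


def code_points(units: list[int]) -> list[int]:
    """Decode UTF-16 code units (ints) into code points; hypothetical helper."""
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high_surrogate(unit):
            if i + 1 < len(units) and is_low_surrogate(units[i + 1]):
                low = units[i + 1]
                # Two 10-bit halves plus the 0x10000 offset give the code point.
                result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
                i += 2
                continue
            # Assumption for this sketch: unpaired surrogates are errors.
            raise ValueError(f"unpaired high surrogate at index {i}")
        if is_low_surrogate(unit):
            raise ValueError(f"unpaired low surrogate at index {i}")
        # Outside the surrogate range, a code unit is its own code point.
        result.append(unit)
        i += 1
    return result
```

Casting unit by unit on [0xD83D, 0xDE00] would yield two values in the surrogate range, while `code_points` returns the single code point 0x1F600.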
I mean that there is no sense in which these are signed. There's no negative code point, just like there's no negative char. My sigh is about using an inappropriate type (likely for CLS reasons) to represent this instead of matching what char does and having a sane 0..N representation.
.NET commonly uses int to store non-negative numbers that fit in 31 bits.
You could also argue that uint is not a good representation because it can represent numbers that are larger than 0x10ffff, which is the max value of a code point.
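Either representation still needs a range check. Here is a hedged Python sketch that emulates 32-bit values with masking: with a signed int the check takes two comparisons, while reinterpreting the bits as unsigned (what an unchecked (uint) cast does in C#) turns negative inputs into huge values, so a single comparison covers both ends. The function names are hypothetical. Note that this checks the code point range stated above; it does not exclude the surrogate range U+D800..U+DFFF.

```python
MAX_CODE_POINT = 0x10FFFF


def is_valid_signed(value: int) -> bool:
    """Range check for a code point stored as a signed 32-bit int."""
    return 0 <= value <= MAX_CODE_POINT


def as_uint32(value: int) -> int:
    """Reinterpret a signed 32-bit value as unsigned, like an unchecked (uint) cast."""
    return value & 0xFFFFFFFF


def is_valid_unsigned(value: int) -> bool:
    """Range check after the unsigned reinterpretation: one comparison suffices."""
    return as_uint32(value) <= MAX_CODE_POINT
```

Both checks agree on every 32-bit input; the unsigned form just folds the "is it negative?" test into the upper-bound comparison.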
Yes. But it's the best type we have, and also makes the math a ton simpler.