Roslyn: VirtualChar system cannot handle 32bit wide characters

Created on 9 Aug 2018 · 10 Comments · Source: dotnet/roslyn

Original PR: https://github.com/dotnet/roslyn/pull/28927

The VirtualChar system can properly handle almost all C# escapes (including \n, \u0001, \xff, etc.). All these escapes map back down to a single 'char'.

One thing it can't handle is \UXXXXXXXX when the escape maps to a character larger than 16 bits. It's unclear whether we should bother supporting that case, but this issue tracks the limitation anyway.
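To make the limitation concrete, here is a minimal sketch (in Python, for brevity) of how a code point above 0xFFFF has to be split into two 16-bit UTF-16 code units, which is why such an escape cannot map back to a single 'char'. The function name `to_utf16_units` is hypothetical and is not part of Roslyn or the VirtualChar system; the arithmetic is the standard UTF-16 surrogate encoding.

```python
def to_utf16_units(code_point):
    """Split a Unicode code point into a tuple of 16-bit UTF-16 code units.

    Hypothetical helper for illustration only; not a Roslyn API.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a valid Unicode code point")
    if code_point <= 0xFFFF:
        # Fits in one 16-bit unit, like \n, \u0001 or \xff.
        return (code_point,)
    # Above 0xFFFF: encode as a high/low surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return (high, low)


# \U0001F600 needs two code units, so it cannot be a single 'char'.
print([hex(u) for u in to_utf16_units(0x1F600)])
```

Under these assumptions, \u0001 and \xff yield one unit each, while \U0001F600 yields two.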

Area-IDE Bug Tenet-Localization help wanted

All 10 comments

Is there a BCL data type that Roslyn already references and that should be used here, or is int/uint the path to take?

I couldn't find a BCL type (pity). I'm using uint myself.

image

Lol... sigh.

They use a signed value... to represent a character... Thus also making it a super PITA to go between chars and CodePoints... sigh...

I'm not sure what you mean. char is a different concept than code point. You can't just cast char to int to get a code point.

I mean that there is no sense in which these are signed. There's no negative code point, just like there's no negative char. My sigh is about using an inappropriate type (likely for CLS reasons) to represent this instead of matching what char does and having a sane 0-N representation.
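As a rough illustration of why a single char cannot simply be cast to a code point, the sketch below (Python, hypothetical names, not a BCL or Roslyn API) reads a code point from a sequence of UTF-16 code units. A high surrogate followed by a low surrogate has to be combined; any other unit stands for itself.

```python
def code_point_at(units, index):
    """Return (code_point, units_consumed) for the UTF-16 units at index.

    Hypothetical helper for illustration only.
    """
    unit = units[index]
    if 0xD800 <= unit <= 0xDBFF and index + 1 < len(units):
        nxt = units[index + 1]
        if 0xDC00 <= nxt <= 0xDFFF:
            # Combine the two 10-bit halves and add back the 0x10000 offset.
            combined = 0x10000 + ((unit - 0xD800) << 10) + (nxt - 0xDC00)
            return combined, 2
    # A BMP character or an unpaired surrogate: the unit is returned unchanged.
    return unit, 1


cp, consumed = code_point_at([0xD83D, 0xDE00], 0)
print(hex(cp), consumed)
```

Casting only the first unit (0xD83D) would give a surrogate value, not the code point 0x1F600 that the pair encodes.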

.NET commonly uses int to store non-negative numbers that fit in 31 bits.
You could also argue that uint is not a good representation because it can represent numbers that are larger than 0x10ffff, which is the max value of a code point.

Yes. But it's the best type we have, and also makes the math a ton simpler.
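Whichever 32-bit type is chosen, signed or unsigned, it can hold values outside the code point range, so a range check is needed somewhere. A minimal sketch in Python follows; the function names are hypothetical, and 0x10FFFF is the maximum code point mentioned above.

```python
MAX_CODE_POINT = 0x10FFFF  # largest Unicode code point


def is_valid_code_point(value):
    """True if value lies in the Unicode code point range 0..0x10FFFF."""
    return 0 <= value <= MAX_CODE_POINT


def is_scalar_value(value):
    """True if value is a code point that is not a UTF-16 surrogate."""
    return is_valid_code_point(value) and not (0xD800 <= value <= 0xDFFF)


# 0xFFFFFFFF fits in a uint but is not a code point.
print(is_valid_code_point(0x10FFFF), is_valid_code_point(0xFFFFFFFF))
```

With an unsigned type only the upper bound needs checking, which is one way the math gets simpler; a signed type also has to rule out negative values.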
