Crystal: CharLiteral#ord / binary char syntax

Created on 15 Oct 2020 · 4 Comments · Source: crystal-lang/crystal

Currently (0.35.1) there are some inconsistencies and areas of the language where it is not possible to access Char codepoints ~at compile time~ without external / manual conversion.

```crystal
# Works
A = 'A'.ord

# Does not
enum Example
  A = 'A'.ord
end

# Also does not
B = {{'B'.ord}}
```
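
For contrast, the "external / manual conversion" mentioned above amounts to writing the codepoints out by hand. A minimal sketch, assuming the same `Example` enum; the `raise` checks are a hypothetical safeguard and can only run at run time, because nothing can compare the literal values against the chars during compilation:

```crystal
# Manual workaround: codepoints converted by hand and written as integers.
enum Example
  A = 0x41 # 'A'.ord, converted by hand
  B = 0x42 # 'B'.ord, converted by hand
end

# The hand-written values can only be checked against the chars at run time.
raise "Example::A out of sync" unless Example::A.value == 'A'.ord
raise "Example::B out of sync" unless Example::B.value == 'B'.ord
```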

Further information on the use case for this is available on the forums.

While there may be some value in addressing the enum inconsistency, introducing CharLiteral#ord is only one approach.

From some digging through past issues, it looks like there has been brief previous discussion of a `b'a'`-style syntax that would allow expressing `UInt8` values as their char equivalent. This seems like the more elegant approach for general use.

Is this something of interest, and if so, is there a preferred approach between these two options? I'm happy to look at the implementation, but it would be good to discuss the design (including whether this should be done at all) before doing so.

Comments

Just a note: `A = 'A'.ord` is not evaluated at compile time.

Constants are runtime values.

Looking at the current form of the compiler, the cause of the enum inconsistency is that expressions there are evaluated by the `Crystal::MathInterpreter`. Adding support for evaluating a `Char#ord` call upstream of that would likely be extremely hacky, error-prone, or both.
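
To make the point about enum values concrete, here is a toy model of a constant-folding interpreter that only understands integer literals and a few operators. Everything in it (`Node`, `IntLit`, `BinOp`, `CharCall`, `interpret`, the chosen operators and the error text) is hypothetical and far simpler than the real `Crystal::MathInterpreter`, which works on the compiler's own AST; it only shows why a call like `'A'.ord` falls outside such an evaluator unless it is special-cased:

```crystal
# Toy AST for enum-value expressions (hypothetical, not the compiler's AST).
abstract class Node
end

# An integer literal such as `0x41`.
class IntLit < Node
  getter value : Int64

  def initialize(@value : Int64)
  end
end

# A binary expression such as `1 << 4`.
class BinOp < Node
  getter op : String
  getter left : Node
  getter right : Node

  def initialize(@op : String, @left : Node, @right : Node)
  end
end

# A method call on a char literal, such as `'A'.ord`.
class CharCall < Node
  getter char : Char
  getter name : String

  def initialize(@char : Char, @name : String)
  end
end

# Folds a node to an integer, or raises for anything it does not understand.
def interpret(node : Node) : Int64
  case node
  when IntLit
    node.value
  when BinOp
    l = interpret(node.left)
    r = interpret(node.right)
    case node.op
    when "+"  then l + r
    when "-"  then l - r
    when "*"  then l * r
    when "|"  then l | r
    when "<<" then l << r
    else           raise "unsupported operator #{node.op}"
    end
  else
    # 'A'.ord ends up here: supporting it would mean special-casing
    # Char method calls inside this purely numeric evaluator.
    raise "cannot evaluate #{node.class} at compile time"
  end
end

# 1 | (1 << 4)
puts interpret(BinOp.new("|", IntLit.new(1_i64), BinOp.new("<<", IntLit.new(1_i64), IntLit.new(4_i64)))) # => 17
```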

Some good points were also raised on the forum regarding the expansion of macros increasingly leading to two separate languages that need to be maintained, so it seems that adding `CharLiteral#ord` is worth avoiding.

With the above in mind, expanding the lexer to support a 'char as codepoint' syntax looks to be a good option _if_ this is something of interest.

Q's...

  1. _Is_ this something of interest?
  2. If so, is there a preferred syntax?

On syntax, some options:

`b'a'`, as previously suggested. This mirrors some other languages, notably Rust's byte literals. It is worth noting that there is also the concept of a byte string literal, which would map neatly to Crystal's `Bytes`.
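
Until such a literal exists, the values it would denote can only be produced with ordinary runtime calls. A minimal sketch for comparison; the `b'a'` and byte string forms appear only in comments, since they are not valid Crystal:

```crystal
# Runtime stand-ins for the proposed literals.
byte = 'a'.ord.to_u8   # the UInt8 that b'a' would denote (0x61)
bytes = "abc".to_slice # the Bytes (Slice(UInt8)) a byte string literal would denote

puts byte       # => 97
puts bytes.size # => 3
```

In recent Crystal versions `to_u8` raises `OverflowError` for any char whose codepoint is above 255 (for example `'◆'.ord.to_u8`), so a byte-sized literal only covers the 0-255 range.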

`0ca`, which extends the existing syntax for expressing integers in binary, octal and hexadecimal. This would also be quick to implement thanks to the existing `scan_zero_number`.
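
As a rough illustration of the `0c` idea, the sketch below treats `c` as one more prefix after a leading zero, alongside `0b`, `0o` and `0x`. The function `scan_zero_c` is hypothetical and much simpler than the compiler's real lexer: it handles only the unescaped form and returns the index just past the literal, as a lexer would need:

```crystal
# Hypothetical sketch of lexing `0c<char>`; NOT the compiler's scan_zero_number.
# Returns {codepoint, index just past the literal}, or nil if there is no match.
def scan_zero_c(src : String, pos : Int32 = 0) : Tuple(Int32, Int32)?
  # Indices are char indices, so a multi-byte char still counts as one.
  return nil unless pos + 2 < src.size
  return nil unless src[pos] == '0' && src[pos + 1] == 'c'
  {src[pos + 2].ord, pos + 3}
end

if result = scan_zero_c("0ca")
  codepoint, next_pos = result
  puts codepoint # => 97
  puts next_pos  # => 3
end
```

With this naive rule `scan_zero_c("0cab")` returns `{97, 3}` and leaves `b` for whatever token the lexer reads next; a delimited form makes the end of the literal explicit.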

@KimBurgess I love the syntax ideas; I think both `b'a'` and `0ca` could be of great value. I can also see `b'a'` working well with Unicode, since we could put it into a `Bytes` instead of just a single integer. I'm not sure `0ca` could support Unicode, though; that specific form could be an issue. Maybe `0c'a'` instead, to keep the single-quote syntax for chars?

I'm not sure supporting Unicode chars -> `Bytes` would be best with the `b'a'` syntax, as this would introduce ambiguity about the kind of output. It could, however, be used to provide the codepoint as an appropriate unsigned integer type, mirroring the behaviour of the existing binary, hex and octal number literals.

```
b'a' == 0x61
b'◆' == 0x25c6
```
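
One way to read "an appropriate unsigned integer type" is to pick the narrowest unsigned type that fits the codepoint, so `b'a'` would be a `UInt8` and `b'◆'` a `UInt16`. The helper `codepoint_value` below is a hypothetical sketch of that rule, not an agreed design:

```crystal
# Hypothetical typing rule: narrowest unsigned integer type holding the codepoint.
def codepoint_value(c : Char) : UInt8 | UInt16 | UInt32
  cp = c.ord
  if cp <= 0xFF
    cp.to_u8
  elsif cp <= 0xFFFF
    cp.to_u16
  else
    cp.to_u32 # all remaining codepoints (up to 0x10FFFF) fit in UInt32
  end
end

puts codepoint_value('a')       # => 97
puts codepoint_value('◆')       # => 9670
puts codepoint_value('◆').class # => UInt16
```

A fixed result type (always `UInt32`, say) would be the simpler alternative; the sketch only illustrates the narrowest-type reading.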

I do, however, _really_ like the `0c'a'` syntax, as it is a much neater match for the existing ways of expressing number literals.
