`String#codepoint_at` simply delegates to `char_at(index).ord`. Instead of being a shortcut, `codepoint_at(index)` is actually even longer (by one character) than calling `char_at(index).ord` directly.
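For illustration, a minimal sketch of the equivalence (assuming the current stdlib behavior; the string literal is just an example):

```crystal
s = "héllo"

s.codepoint_at(1) # => 233 (the codepoint of 'é')
s.char_at(1).ord  # => 233 (the call chain the method delegates to)
```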
Even if it technically were functioning as a shortcut, such a thing only makes sense when it is at least somewhat useful. Currently, I could find only two real usages of this method, both in markd, when searching all of GitHub's Crystal repositories. All other occurrences seem to be mirrors or re-implementations of the stdlib and a few specs.

I suggest deprecating and removing this method. It's not worth maintaining a shortcut that defeats its own purpose and isn't really used.
Just my opinion, but I think `codepoint` is more descriptive than `string[index].ord`. Doesn't mean we couldn't remove it.
I'd rather rename `Char#ord` to `#codepoint`, then.
Yes! We copied that from Ruby. It's not bad because it's short, but it's also cryptic. Then we have `1.chr`, which is also a bit cryptic, but I don't know.
`#codepoint` looks much more meaningful. A small con is that with `#ord` gone, one more method would be missing for those of us coming from Ruby. But it would probably be for the better... 🤷‍♂️ 😄
> Then we have `1.chr`, which is also a bit cryptic, but I don't know.
So, for symmetry, could `ord` / `chr` become something like `codepoint` / `character` (or `char`)?
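For reference, a minimal sketch of how the existing pair behaves (current methods only, not the proposed names):

```crystal
'a'.ord # => 97  (Char to codepoint)
97.chr  # => 'a' (codepoint back to Char)
```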
`1.to_char` and `'a'.codepoint` would be what I'd design in a cleanroom. But changing `chr` -> `to_char` isn't really worth it.
@RX14 If it's `1.to_char`, why isn't it `'a'.to_codepoint` then? :)
Because in `'a'.codepoint` you are asking for the codepoint of a char, but in `1.char` you are not asking for the char of 1 (that doesn't make sense), you are asking to convert it to a char. But that doesn't quite make sense either. `chr` is a bit cryptic, but it means "the char for a given codepoint", so it might be fine. Maybe `1.considering_it_as_a_codepoint_to_char`, but that's super verbose :-P
Yeah, good points. Reminds me of the `to_i` vs `to_int` difference in Ruby.
It's just `to_x` because you're converting `Int` to `Char`. `Char#to_i` is defined the same as `Char#codepoint`.
Except it's not. `Char#to_i` converts `'1'` to `1`. So why shouldn't `1.to_char` return `'1'`, not `'\u0001'`? They both operate on only a (tiny) range of their input values!
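A small sketch of the asymmetry being discussed, using only existing methods (`to_char` is hypothetical and not shown):

```crystal
'1'.to_i # => 1   (parses the digit)
'1'.ord  # => 49  (the codepoint of '1')
49.chr   # => '1' (the inverse of ord, not of to_i)
1.chr    # => '\u0001' (a control character, not '1')
```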
It doesn't have to be the end of the world, but if someone can come up with some neat API symmetry here, it would be preferable.
`Int#to_codepoint` might make sense: convert the number to a codepoint, and the codepoint is represented as a `Char`.
I would suggest maybe:
```crystal
1.to_s        # => "1"
'1'.to_i32    # => 1
1.char        # => '\u0001'
'1'.codepoint # => 49
```
It's because we have `Char#bytes`. `to_` is more like a converting method, which means it involves parsing or human-friendly conversion instead of low-level casting.
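For comparison, a quick sketch of that distinction (assuming current stdlib behavior), showing the byte representation versus the codepoint:

```crystal
'é'.bytes # => [195, 169] (the UTF-8 bytes)
'é'.ord   # => 233        (the codepoint)
```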
@straight-shoota I actually feel `Int#as_codepoint` works better than `#to_codepoint`...
@RX14 Yeah, I can follow that.
Regarding the original topic: `String` also has `#each_codepoint` and `#codepoints` methods, which essentially just wrap `#each_char` with `#ord`. When we remove `#codepoint_at`, we should also consider removing these related collection methods. But they might be more useful and act as actual shortcuts. So maybe they should stay? And in turn, this could also extend to `#codepoint_at`, because they essentially form a family of related methods for treating a `String` as a collection of codepoints.
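A minimal sketch of what those collection methods do today (assuming current stdlib behavior):

```crystal
str = "héllo"

str.codepoints       # => [104, 233, 108, 108, 111]
str.chars.map(&.ord) # => [104, 233, 108, 108, 111] (the same thing, spelled out)

# each_codepoint is likewise just each_char plus ord:
str.each_codepoint { |codepoint| print codepoint, ' ' } # prints: 104 233 108 108 111
```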
This can be closed since the decision was made not to remove `String#codepoint_at`. See #8902.