`String#codepoint_at` simply delegates to `char_at(index).ord`. Instead of being a shortcut, `codepoint_at(index)` is actually even longer (by one character) than calling `char_at(index).ord` directly.
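For illustration, a minimal sketch of the equivalence (assuming the current stdlib behavior; the string literal is just an example):

```crystal
s = "héllo"

s.codepoint_at(1) # => 233 (the codepoint of 'é')
s.char_at(1).ord  # => 233 (the call chain the method delegates to)
```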
Even if it technically were functioning as a shortcut, such a thing only makes sense when it is at least somewhat useful. Currently, I could find only two real usages of this method, both in markd, when searching all of GitHub's Crystal repositories. All other occurrences seem to be mirrors or re-implementations of the stdlib and a few specs.

I suggest deprecating and removing this method. It's not worth maintaining a shortcut that defeats its own purpose and isn't really used.
Just my opinion, but I think `codepoint` is more descriptive than `string[index].ord`. Doesn't mean we couldn't remove it.
I'd rather rename `Char#ord` to `#codepoint`, then.
Yes! We copied that from Ruby. It's not bad because it's short, but it's also cryptic. Then we have `1.chr`, which is also a bit cryptic, but I don't know.
`#codepoint` looks much more meaningful. A small con is that with `#ord` gone, one more method would be missing for those of us coming from Ruby. But it would probably be for the better... 🤷‍♂️ 😄
> Then we have `1.chr`, which is also a bit cryptic, but I don't know.
So, for symmetry, could `ord` / `chr` become something like `codepoint` / `character` (or `char`)?
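For reference, a minimal sketch of how the existing pair behaves (current methods only, not the proposed names):

```crystal
'a'.ord # => 97  (Char to codepoint)
97.chr  # => 'a' (codepoint back to Char)
```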
`1.to_char` and `'a'.codepoint` would be what I'd design in a cleanroom. But changing `chr` -> `to_char` isn't really worth it.
@RX14 If it's `1.to_char`, why isn't it `'a'.to_codepoint` then? :)
Because in `'a'.codepoint` you are asking for the codepoint of a char, but in `1.char` you are not asking for the char of 1 (that doesn't make sense), you are asking to convert it to a char. But that doesn't quite make sense either. `chr` is a bit cryptic, but it means "the char for a given codepoint", so it might be fine. Maybe `1.considering_it_as_a_codepoint_to_char`, but that's super verbose :-P
Yeah, good points. Reminds me of the `to_i` vs `to_int` difference in Ruby.
It's just `to_x` because you're converting `Int` to `Char`. `Char#to_i` is defined the same as `Char#codepoint`.
Except it's not. `Char#to_i` converts `'1'` to `1`. So why shouldn't `1.to_char` return `'1'`, not `'\u0001'`? They both operate on only a (tiny) range of their input values!
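A small sketch of the asymmetry being discussed, using only existing methods (`to_char` is hypothetical and not shown):

```crystal
'1'.to_i # => 1   (parses the digit)
'1'.ord  # => 49  (the codepoint of '1')
49.chr   # => '1' (the inverse of ord, not of to_i)
1.chr    # => '\u0001' (a control character, not '1')
```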
It doesn't have to be the end of the world, but if someone can come up with some neat API symmetry here, it would be preferable.
`Int#to_codepoint` might make sense: convert the number to a codepoint, and the codepoint is represented as a `Char`.
I would suggest maybe:
```crystal
1.to_s        # => "1"
'1'.to_i32    # => 1
1.char        # => '\u0001'
'1'.codepoint # => 49
```
It's because we have `Char#bytes`. `to_` is more like a converting method, which means it involves parsing or human-friendly conversion instead of low-level casting.
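For comparison, a quick sketch of that distinction (assuming current stdlib behavior), showing the byte representation versus the codepoint:

```crystal
'é'.bytes # => [195, 169] (the UTF-8 bytes)
'é'.ord   # => 233        (the codepoint)
```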
@straight-shoota I actually feel `Int#as_codepoint` works better than `#to_codepoint`...
@RX14 Yeah, I can follow that.
Regarding the original topic: `String` also has `#each_codepoint` and `#codepoints` methods, which essentially just wrap `#each_char` with `#ord`. When we remove `#codepoint_at`, we should also consider removing these related collection methods. But they might be more useful and act as actual shortcuts. So maybe they should stay? And in turn, this could also extend to `#codepoint_at`, because they essentially form a family of related methods for treating a `String` as a collection of codepoints.
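A minimal sketch of what those collection methods do today (assuming current stdlib behavior):

```crystal
str = "héllo"

str.codepoints       # => [104, 233, 108, 108, 111]
str.chars.map(&.ord) # => [104, 233, 108, 108, 111] (the same thing, spelled out)

# each_codepoint is likewise just each_char plus ord:
str.each_codepoint { |codepoint| print codepoint, ' ' } # prints: 104 233 108 108 111
```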
This can be closed since the decision was made not to remove `String#codepoint_at`. See #8902.