Crystal: Remove String#codepoint_at

Created on 7 Nov 2019 · 16 Comments · Source: crystal-lang/crystal

String#codepoint_at simply delegates to char_at(index).ord. Instead of being a shortcut, codepoint_at(index) is actually even longer (by one character) than calling char_at(index).ord directly.
Even if it technically did function as a shortcut, such a thing only makes sense when it is at least somewhat useful. Currently, I could only find two real usages of this method (in markd) while looking through all of GitHub's Crystal repositories. All other occurrences seem to be mirrors or re-implementations of the stdlib and a few specs.
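
For illustration, the equivalence in question looks like this (a minimal sketch; the index and output values are just an ASCII example):

s = "hello"
s.codepoint_at(1) # => 101 (delegates to char_at(1).ord)
s.char_at(1).ord  # => 101 (same result, and one character shorter to type)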

I'm suggesting to deprecate and remove this method. It's not good to maintain a shortcut that defeats its own purpose and isn't really used.

Labels: newcomer, feature, topic:text

Most helpful comment

I'd rather rename Char#ord to #codepoint then.

All 16 comments

Just my opinion, but I think codepoint is more descriptive than string[index].ord. Doesn't mean we couldn't remove it.

I'd rather rename Char#ord to #codepoint then.

Yes! We copied that from Ruby. It's not bad because it's short, but it's also cryptic. Then we have 1.chr which is also a bit cryptic, but I don't know.
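
For context, this is roughly how the current Ruby-inherited pair behaves (a sketch; output values assume ASCII):

'a'.ord # => 97  (Char to its codepoint)
97.chr  # => 'a' (codepoint back to a Char)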

The #codepoint looks much more meaningful. A small con is that with #ord gone, one more method would be missing for those of us coming from Ruby.

But it would probably be for good... 🤷‍♂️ 😄

Then we have 1.chr which is also a bit cryptic, but I don't know.

So ord / chr for symmetry can be something like codepoint / character (or char)?

1.to_char and 'a'.codepoint would be what I'd design in a cleanroom. But changing chr -> to_char is not really worth it.

@RX14 If it's 1.to_char, why isn't it 'a'.to_codepoint then? :)

Because with 'a'.codepoint you are asking for the codepoint of a char, but with 1.char you are not asking for the char of 1 (that doesn't make sense); you are asking to convert it to a char. But that doesn't quite make sense either. chr is a bit cryptic, but it's the char for a given codepoint, so it might be good. Maybe 1.considering_it_as_a_codepoint_to_char, but that's super verbose :-P

Yeah, good points. Reminds me about to_i vs to_int difference in Ruby.

It's just to_x because you're converting Int to Char. Char#to_i is defined the same as Char#codepoint.

Except it's not. Char#to_i converts '1' to 1. So why shouldn't 1.to_char return '1' instead of '\u0001'? They both operate on only a (tiny) range of their input values!
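
A sketch of the asymmetry being pointed out (to_char is the hypothetical method from this thread; output values assumed):

'1'.to_i # => 1        (parses the digit character)
49.chr   # => '1'      (treats 49 as a codepoint, the inverse of '1'.ord)
1.chr    # => '\u0001' (what a codepoint-based 1.to_char would presumably return)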

Doesn't have to be the end of the world, but if someone can make some neat API symmetry here it would be preferable.

Int#to_codepoint might make sense: Convert the number to a codepoint. And the codepoint is represented as Char.

I would suggest maybe:

1.to_s # => "1"
'1'.to_i32 # => 1

1.char # => '\u0001'
'1'.codepoint # => 49

It's because we have Char#bytes. to_ is more of a conversion method, which implies parsing or human-friendly conversion rather than low-level casting.
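
For example, Char#bytes and Char#ord expose two different views of the same character (a sketch; values assume UTF-8 source encoding):

'あ'.ord   # => 12354           (Unicode codepoint)
'あ'.bytes # => [227, 129, 130] (UTF-8 byte sequence)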

@straight-shoota I actually feel Int#as_codepoint works better than #to_codepoint...

@RX14 Yeah, I can follow that.

Regarding the original topic: String also has #each_codepoint and #codepoints methods which essentially just wrap #each_char with #ord. When we remove #codepoint_at, we should also consider removing these related collection methods. But they might be more useful and act as actual shortcuts. So maybe they should stay? In turn, this could also extend to #codepoint_at, because together they form a family of related methods for treating a String as a collection of codepoints.
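
For reference, a sketch of how that family of methods relates (output values assumed):

"abc".codepoints       # => [97, 98, 99]
"abc".chars.map(&.ord) # => [97, 98, 99] (the same thing, spelled out)
"abc".each_codepoint { |codepoint| puts codepoint } # prints 97, 98, 99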

Can be closed since the decision was made not to remove String#codepoint_at. See #8902.

