After some thought about this issue to move Array index methods to Indexable. What about String?
There are lots of methods that are mirrored from Indexable like each_char, each_char_with_index, #[], #[]?... Same for method related to bytes each_bytes and lines each_line.
Another problem is having "str".each_char and "str".chars.each. Several Indexable methods like map can only be used by creating an intermediate array.
Including Indexable(Char) into String isn't a proper solution , because a string can be, by nature, represented as bytes, chars, codepoints.
On the high level API, the changes will be:
"str".each_char => "str".chars.each"str".chars => "str".chars.to_a"str".each_char_with_index => "str".chars.each_with_index"str".count => "str".chars.count"str".blank? => "str".chars.all? &.whitespace?"str".codepoints => "str".chars.map &.ord"str".each_codepoint=> "str".chars.each { |c| c.ord }"str".lines => "str".lines.to_a"str".each_line => "str".lines.each"str".bytes => "str".bytes.to_a"str".each_byte => "str".bytes.eachThat's a preview, with possibly even more method changes.
The codepoints changes are less user-friendly, but being quite low-level and rarely used, on my opinion it's fine.
Better to concentrate first on the proposal's API, but if you wonder what type String#chars can be and how this can be implemented by using structs: String::Chars, String::Bytes and String::Lines. CharIterator and LineIterator can be moved to String::Chars::Iterator and String::Lines::Iterator.
This proposal could superset https://github.com/crystal-lang/crystal/pull/7275
The idea doesn't sound too bad, but what actual problem does it solve?
@straight-shoota remove duplicates with Indexable, improve the API (no more each_char vs chars.each), avoid creating intermediate arrays to use Enumerable methods like map, reject or select.
Just note that in my comment I proposed having views as another way to do things a bit more efficiently. I wouldn't remove any of the existing methods. They are convenient.
And note that you can do string.each_char.map...to_a
@asterite ok, this uses an intermediate Iterator instead of an intermediate Array :smile:
With your proposal and not removing any method we will got String#char_view.map, String#char_view.each.map, String#chars.map and String#each_char.map. It will brings even more confusion for beginners when deciding which one to choose.
So the benefit over the already existing iterators would be that it implements Indexable? Maybe we can just include Indexable in CharIterator etc. and combine the APIs?
@straight-shoota that's what I said:
CharIterator and LineIterator can be moved to String::Chars::Iterator and String::Lines::Iterator.
This will yield to "str".chars.each : String::Chars::Iterator
When including Indexable, a each returning an Iterator must be implemented for Iterable.
~Therefore including Indexable into an Iterator isn't a good idea.~
What's the point of Iterable over using each directly?
One goal I forgot is to simplify this documentation section.
After some research, Rust has this iterators: Struct std::str::Chars Struct std::str::Bytes and Struct std::str::Lines accessed by str methods chars, bytes and lines.
I think it's OK to include iterator into String::Chars, but not indexable into the existing CharsIterator.
IMHO the String class is a bit too full featured with various helpers related to chars, bytes, lines and codepoints. This raises questions (that I had too) because it resembles of an Indexable of Chars without being one.
The main goal is to better organize the String API.
Looking at the proposed API changes, I think I disagree with all of them. I fail to see any improvement or benefit to the developer.
I think we should first try to answer the question of what's hard or inconvenient to do with the existing String API. From my side the answer is: nothing. I was just thinking of other ways to access the string contents, but it might not be worth it.
The main problem I see at the end is there are multiple non optimal ways to do the same thing: https://github.com/search?l=Crystal&q=chars.{each,map}&type=Code
@asterite like you, I have a tendency to use String#chars instead of String#each_char - and we aren't alone looking at the GitHub search :sweat_smile:
require "benchmark"
STR = "string"
Benchmark.ips do |x|
x.report("String#chars.map") do
STR.chars.map { |c| }
end
x.report("String#each_char.map : CharIterator") do
STR.each_char.map { |c| }
end
end
String#chars.map 12.78M ( 78.24ns) (卤 7.01%) 112 B/op 1.84脳 slower
String#each_char.map : CharIterator 23.56M ( 42.45ns) (卤 7.38%) 80 B/op fastest
After noticing that most of the time, users often use String#chars for iterator operations, I suggest:
String#chars : Array(String) to String#chars : CharIterator (and suggest using String#chars.to_a to have an Array)String#each_char : CharIteratorWDYT?
I disagree: it's not clear what chars returns, whereas each_{item} is a common idiom in crystal where the block version yields each item, and the non-block version returns an {Item}iterator.
I'd rather remove chars than change it to each_char and remove each_char..
I think detecting things like .chars.each etc should be easy to do for a semantic linter and I would prefer a tool like that to educate people about this rather than designing the stdlib into fool-proof but ugly.
The benchmark is flawed, you must call to_a after map. It will be slower than chars, but not by too much (I think)
@asterite The benchmark also uses a very short string which is clearly in favour of #chars.
Ok, else does anyone has a idea how to achieve this level of performance with the actual stdlib then?
https://gist.github.com/j8r/c4b7d24257914b55cfad3acaf04cc7d8
String#chars.map 9.16M (109.16ns) (卤 9.44%) 112 B/op 2.31脳 slower
String#each_char.to_a.map 3.97M ( 251.8ns) (卤15.06%) 209 B/op 5.32脳 slower
String#chars2.map 21.14M ( 47.31ns) (卤 6.35%) 48 B/op fastest
Big String#chars.map 19.02k ( 52.58碌s) (卤30.32%) 24064 B/op 10.34脳 slower
Big String#each_char.to_a.map 16.69k ( 59.92碌s) (卤21.55%) 50032 B/op 11.78脳 slower
Big String#chars2.map 196.65k ( 5.09碌s) (卤 7.66%) 128 B/op fastest
IMHO the API is nice.
EDIT: update according @bew suggestion of using a big String.
Why are you benchmarking each_char.to_a.map vs chars2.map, it doesn't make sense to me.. The equivalent to chars2.map is each_char.map.
And using a bigger string like "string" * 1000, I get very different results:
https://gist.github.com/bew/3e44bf01b27db15da0f1ff14be38b450
Edit: actually forget me, each_char.map doesn't do anything.... my bad
Again: each_char.map is a no-op without later invoking to_a
chars is faster than char2.map because chars knows the resulting size of the Array. I think Rust has an optional size_hint getter for this, but that's a separate discussion.
Most helpful comment
I think we should first try to answer the question of what's hard or inconvenient to do with the existing String API. From my side the answer is: nothing. I was just thinking of other ways to access the string contents, but it might not be worth it.