I think a method should be added to String
that returns the first character and should be a faster alternative to [0]
. The strategy is to start an iteration through the string but stop at the first character. That's faster than [0]
. Reason for this is for example the each_byte
on ASCII-only optimization in String#each_char
.
require "benchmark"
Benchmark.ips do |bm|
bm.report "each_char" do
"hello".each_char do |char|
break char
end
end
bm.report "[0]" do
"hello"[0]
end
end
each_char 394.43M ( 2.54ns) (卤 5.82%) 0.0B/op fastest
[0] 196.71M ( 5.08ns) (卤 5.03%) 0.0B/op 2.01脳 slower
The first is also faster for non-ASCII characters.
This optimization is for example already used here: string.cr:4069
.
The method could perhaps be called first
or first_char
.
can we just special case String#[]
with the optimization if index == 0
?
I don't know, it looks like the difference boils down to the added bounds checking and support for negative indexes in byte_at
, otherwise the codepaths look pretty identical (char_at
is already optimized for the ascii only case).
require "benchmark"
class String
def first_char
c = each_char do |char|
break char
end
raise IndexError.new unless c
c
end
def at2(index)
if index == 0
first_char
else
at(index)
end
end
end
Benchmark.ips do |bm|
bm.report "first_char" do
"hello".first_char
end
bm.report "[0]" do
"hello"[0]
end
bm.report "at2" do
"hello".at2(0)
end
end
first_char 552.12M ( 1.81ns) (卤 2.14%) 0.0B/op fastest
[0] 386.78M ( 2.59ns) (卤 1.67%) 0.0B/op 1.43脳 slower
at2 547.62M ( 1.83ns) (卤 3.85%) 0.0B/op 1.01脳 slower
Try with a string that's not hardcoded (for example reading from ARGV). The difference is miniscule:
require "benchmark"
str = ARGV[0]
Benchmark.ips do |bm|
bm.report "each_char" do
str.each_char do |char|
break char
end
end
bm.report "[0]" do
str[0]
end
end
each_char 292.54M ( 3.42ns) (卤 3.93%) 0.0B/op fastest
[0] 273.98M ( 3.65ns) (卤 6.93%) 0.0B/op 1.07脳 slower
And if you change the order you get that the second one is fastest.
Can we stop worrying about sub-nanosecond performance differences?
Thank you!
All these examples are not really worth anything as @asterite shows. That's because #char_at
is super efficient for ascii-only strings because it only reads the byte at index n and converts it to Char
(=> N(1)
).
But with strings containing non-ascii characters, there could be a significant difference and optimizing for 0
might actually be worth it. Maybe even for other small indices, where each_char
+ break
would be faster than the current char_at
implementation.
Feel free to continue this.
Most helpful comment
Try with a string that's not hardcoded (for example reading from ARGV). The difference is miniscule:
And if you change the order you get that the second one is fastest.
Can we stop worrying about sub-nanosecond performance differences?
Thank you!