Crystal: Add a method to String that returns the first character

Created on 29 May 2019 · 5Comments · Source: crystal-lang/crystal

I think a method should be added to String that returns the first character and should be a faster alternative to [0]. The strategy is to start an iteration through the string but stop at the first character. That's faster than [0]. Reason for this is for example the each_byte on ASCII-only optimization in String#each_char.

require "benchmark"

Benchmark.ips do |bm|
  bm.report "each_char" do
    "hello".each_char do |char|
      break char
    end
  end

  bm.report "[0]" do
    "hello"[0]
  end
end

each_char 394.43M (  2.54ns) (± 5.82%)  0.0B/op        fastest
      [0] 196.71M (  5.08ns) (± 5.03%)  0.0B/op   2.01× slower

The first is also faster for non-ASCII characters.
This optimization is for example already used here: string.cr:4069.
The method could perhaps be called first or first_char.

Source

r00ster91

👎2

Most helpful comment

Try with a string that's not hardcoded (for example reading from ARGV). The difference is miniscule:

require "benchmark"

str = ARGV[0]

Benchmark.ips do |bm|
  bm.report "each_char" do
    str.each_char do |char|
      break char
    end
  end
  bm.report "[0]" do
    str[0]
  end
end

each_char 292.54M (  3.42ns) (± 3.93%)  0.0B/op        fastest
      [0] 273.98M (  3.65ns) (± 6.93%)  0.0B/op   1.07× slower

And if you change the order you get that the second one is fastest.

Can we stop worrying about sub-nanosecond performance differences?

Thank you!

asterite on 29 May 2019

👍6 ❤1

All 5 comments

can we just special case String#[] with the optimization if index == 0?

I don't know, it looks like the difference boils down to the added bounds checking and support for negative indexes in byte_at, otherwise the codepaths look pretty identical (char_at is already optimized for the ascii only case).

jhass on 29 May 2019

👍3

require "benchmark"

class String
  def first_char
    c = each_char do |char|
      break char
    end
    raise IndexError.new unless c
    c
  end

  def at2(index)
    if index == 0
      first_char
    else
      at(index)
    end
  end
end

Benchmark.ips do |bm|
  bm.report "first_char" do
    "hello".first_char
  end

  bm.report "[0]" do
    "hello"[0]
  end

  bm.report "at2" do
    "hello".at2(0)
  end
end

first_char 552.12M (  1.81ns) (± 2.14%)  0.0B/op        fastest
       [0] 386.78M (  2.59ns) (± 1.67%)  0.0B/op   1.43× slower
       at2 547.62M (  1.83ns) (± 3.85%)  0.0B/op   1.01× slower