Crystal: Performance: while loop significantly faster than ranges with steps

Created on 21 Dec 2019  路  5Comments  路  Source: crystal-lang/crystal

I noticed that the optimizer seems to have difficulties with ranges:

(start..end).step(size).each { ... }

Maybe it is known, then feel free to close. For me, the slowdown compared to a while loop was unexpected, so I wanted to share the test program that I wrote.

First, the original (slow) version:

require "bit_array"

def sieve(n : Int32)
  return if n < 2

  arr = BitArray.new(n)
  arr[0] = true
  pos = 1
  while pos < n
    if !arr[pos]
      prime = pos + 1
      yield prime
      (prime + prime..n).step(prime).each { |x| arr[x-1] = true }
    end
    pos += 1
  end
end

n = ARGV.empty? ? 100 : ARGV[0].to_i
sieve(n) { |x| puts(x) }
$ crystal build --release version1.cr
$ time ./version1 1000000
...
999961
999979
999983

real    0m20.583s
user    0m20.264s
sys 0m0.280s

Replacing the line

(prime + prime..n).step(prime).each { |x| arr[x-1] = true }

by an explicit loop resulted in a significant speedup (about 50x):

require "bit_array"

def sieve(n : Int32)
  return if n < 2

  arr = BitArray.new(n)
  arr[0] = true
  pos = 1
  while pos < n
    if !arr[pos]
      prime = pos + 1
      yield prime
      i = prime + prime
      while i < n
        arr[i-1] = true
        i += prime
      end
    end
    pos += 1
  end
end

n = ARGV.empty? ? 100 : ARGV[0].to_i
sieve(n) { |x| puts(x) }
$ crystal build --release version2.cr
$ time ./version2 1000000
...
999961
999979
999983

real    0m0.392s
user    0m0.065s
sys 0m0.327s

With bigger input sizes, the effect is more dramatic. With 1000000 (10 times more), the fast version takes about 10x longer (3.2 seconds), while the slow version took 80x longer (27m 46s). One aspect that I noticed was that it became a lot faster once the range became empty, so I assume the performance bug is when it steps through a non-empty range.

I'm using Arch Linux:

$ crystal -v
Crystal 0.32.1 (2019-12-18)

LLVM: 9.0.0
Default target: x86_64-pc-linux-gnu

All 5 comments

(start..end).step(size).each creates an iterator, and iterates using that, which is likely where the performance slowdown is coming from. start.step(to: end, by: size) do |i| is the idiomatic way to do this for numbers.

The optimizer is LLVM, we don't have much if any control over that.

Also that Iterator is a class so it allocates memory, which is significantly slower than not allocating memory.

I really don't think there's anything to do here or to continue discussing. Just don't use iterators in tight loops.

Also that Iterator is a class so it allocates memory

It also doesn't use any Int-specific optimizations.

Perhaps we could optimize this cause, but Int#step is the right way to do this.

Just tried this one:

(prime + prime).step(to: n, by: prime) { |x| arr[x-1] = true }

The performance is identical to the handwritten while loop.

Can we close this? I'm not sure we're gonna improve this further

Was this page helpful?
0 / 5 - 0 ratings

Related issues

oprypin picture oprypin  路  3Comments

nabeelomer picture nabeelomer  路  3Comments

lgphp picture lgphp  路  3Comments

asterite picture asterite  路  3Comments

relonger picture relonger  路  3Comments