Crystal: Performance: while loop significantly faster than ranges with steps

Created on 21 Dec 2019 · 5Comments · Source: crystal-lang/crystal

I noticed that the optimizer seems to have difficulties with ranges:

(start..end).step(size).each { ... }

Maybe it is known, then feel free to close. For me, the slowdown compared to a while loop was unexpected, so I wanted to share the test program that I wrote.

First, the original (slow) version:

require "bit_array"

def sieve(n : Int32)
  return if n < 2

  arr = BitArray.new(n)
  arr[0] = true
  pos = 1
  while pos < n
    if !arr[pos]
      prime = pos + 1
      yield prime
      (prime + prime..n).step(prime).each { |x| arr[x-1] = true }
    end
    pos += 1
  end
end

n = ARGV.empty? ? 100 : ARGV[0].to_i
sieve(n) { |x| puts(x) }

$ crystal build --release version1.cr
$ time ./version1 1000000
...
999961
999979
999983

real    0m20.583s
user    0m20.264s
sys 0m0.280s

Replacing the line

(prime + prime..n).step(prime).each { |x| arr[x-1] = true }

by an explicit loop resulted in a significant speedup (about 50x):

require "bit_array"

def sieve(n : Int32)
  return if n < 2

  arr = BitArray.new(n)
  arr[0] = true
  pos = 1
  while pos < n
    if !arr[pos]
      prime = pos + 1
      yield prime
      i = prime + prime
      while i < n
        arr[i-1] = true
        i += prime
      end
    end
    pos += 1
  end
end

n = ARGV.empty? ? 100 : ARGV[0].to_i
sieve(n) { |x| puts(x) }

$ crystal build --release version2.cr
$ time ./version2 1000000
...
999961
999979
999983

real    0m0.392s
user    0m0.065s
sys 0m0.327s

With bigger input sizes, the effect is more dramatic. With 1000000 (10 times more), the fast version takes about 10x longer (3.2 seconds), while the slow version took 80x longer (27m 46s). One aspect that I noticed was that it became a lot faster once the range became empty, so I assume the performance bug is when it steps through a non-empty range.

I'm using Arch Linux:

$ crystal -v
Crystal 0.32.1 (2019-12-18)

LLVM: 9.0.0
Default target: x86_64-pc-linux-gnu

Source

philipp-classen

👀1

All 5 comments

(start..end).step(size).each creates an iterator, and iterates using that, which is likely where the performance slowdown is coming from. start.step(to: end, by: size) do |i| is the idiomatic way to do this for numbers.

The optimizer is LLVM, we don't have much if any control over that.

RX14 on 21 Dec 2019

Also that Iterator is a class so it allocates memory, which is significantly slower than not allocating memory.

I really don't think there's anything to do here or to continue discussing. Just don't use iterators in tight loops.

asterite on 21 Dec 2019

Also that Iterator is a class so it allocates memory

It also doesn't use any Int-specific optimizations.

Perhaps we could optimize this cause, but Int#step is the right way to do this.

RX14 on 21 Dec 2019

👍1

Just tried this one:

(prime + prime).step(to: n, by: prime) { |x| arr[x-1] = true }

The performance is identical to the handwritten while loop.

philipp-classen on 21 Dec 2019

❤1

Can we close this? I'm not sure we're gonna improve this further

RX14 on 21 Dec 2019

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Allow type annotation + initialization `@var : Type = value`

oprypin · 3Comments

Inline Assembly in Crystal

nabeelomer · 3Comments

how do i pass a function type to a function

lgphp · 3Comments

Rewrite sigfault.c in Crystal

asterite · 3Comments

Can't use Time in NamedTuple with .to_h conversion

relonger · 3Comments