Crystal: `Array#join` is slower in Crystal than in Ruby

Created on 1 Sep 2018 · 6 Comments · Source: crystal-lang/crystal

Ubuntu 16.04.1
1GB RAM (Free memory 400MB+)
On VPS machine

Each test was run 3 times:

ruby -v
Ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]

crystal -v
Crystal 0.26.1 [391785249] (2018-08-27)
LLVM: 4.0.0
Default target: x86_64-unknown-linux-gnu

$ crystal run --release 1.rb
00:00:00.033265433

$ ruby 1.rb
0.015399671000011494

1.rb

require "benchmark"

cols = 10
rows = 1000
data = Array.new(rows) { Array.new(cols) { "x"*1000 } }

time = Benchmark.realtime do
  csv = data.map { |row| row.join(",") }.join("\n")
end

puts time

I'm surprised: can unoptimized code really make Crystal run slower than Ruby?
Is it because LLVM 4.0 on Ubuntu is too old, whereas LLVM 6.0 from Homebrew on macOS is newer?
I tested the latest Crystal 0.26.1 on macOS Mojave and it is definitely slower than Ruby 2.5, as others have reported.

performance stdlib

All 6 comments

Have you tried compiling it and running it separately? I'm not really sure why this would make a difference, but it produces completely different results for me.
With cols = 100 and rows = 10000:

$ ruby wat.cr
22.24634099297691
$ crystal run --release wat.cr
00:00:23.799129445

$ crystal build --release wat.cr

$ ./wat
00:00:07.604624759

@DestyNova Super strange, that shouldn't produce a difference.

@proyb6 You are totally right, Crystal is performing worse than Ruby here. At first I thought it was because maybe Ruby's memory allocator was faster than Crystal's. Then I got curious and checked how join is implemented. I was sure join was defined in Enumerable, but just in case I checked whether it's defined in Array too. And it is! I wondered why...

It turns out there's a possible optimization to do if the array consists entirely of strings. In the general case, join will use String.build and successively append the elements and the separators, reallocating memory if needed. But if all the elements are strings (and we convert the separator to a string if it's not a string already), we can compute the total memory needed for the final string: separator.bytesize * (array.size - 1) + array.sum(&.bytesize).
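The size calculation above can be sketched as follows. This is only an illustration written in Ruby (the benchmark file runs under both languages), not Crystal's actual stdlib code: `join_strings` is a hypothetical helper name, and it assumes every element is already a String, UTF-8 encoded, on Ruby 2.4 or newer (for `String.new(capacity:)` and `Array#sum`).

```ruby
# Illustrative sketch only; join_strings is a hypothetical helper,
# not part of Ruby's or Crystal's standard library.
def join_strings(strings, separator)
  return "" if strings.empty?

  # Convert the separator to a string if it is not one already.
  sep = separator.to_s

  # Exact byte size of the result: all separators plus all elements.
  total = sep.bytesize * (strings.size - 1) + strings.sum(&:bytesize)

  # Reserve the whole buffer up front so appending never has to grow it.
  # Assumption: the elements are UTF-8 strings.
  result = String.new(capacity: total, encoding: Encoding::UTF_8)
  strings.each_with_index do |s, i|
    result << sep if i > 0
    result << s
  end
  result
end

# One row from the benchmark: 10 columns of 1000 bytes, 9 one-byte commas.
row = Array.new(10) { "x" * 1000 }
puts join_strings(row, ",").bytesize # 1 * 9 + 10 * 1000 = 10009
```

For the full benchmark the same formula gives 1000 rows of 10009 bytes plus 999 newlines, so a single allocation of 10,009,999 bytes replaces the repeated buffer growth of the general case.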

I will send a PR with this optimization soon.

@proyb6 Thank you so much for reporting this! These are the kind of problems that I enjoy most, and it brings performance improvements to all programs out there :-)

@DestyNova I have compiled it with crystal build and get the same timing.

@asterite No problem! It would be helpful if we could compare benchmarks based on the source code from Ruby Performance Optimization:
https://pragprog.com/book/adrpo/ruby-performance-optimization

Assuming Rubyists and newcomers will evaluate the results from that source code, I hope we get better and more reliable results!

I always keep adding ❤️ to Ruby; it's incredible how many tweaks, optimizations and great ideas there are in the entire codebase. All of this usually goes unnoticed...

@asterite

Super strange, that shouldn't produce a difference.

Yeah it does seem counterintuitive. It might be because my laptop only has 8 GB of RAM and that's nearly all used by a couple of Elixir programs and Firefox and Chromium. Perhaps the compiler uses enough RAM that the system starts using swap memory, and a few seconds are needed to go back to normal... if I add a sleep 10 to the program before the benchmark starts, it performs better (but still not as fast as running the executable created by crystal build --release).

Interestingly, if I add GC.disable to the beginning of the program, then both crystal build --release wat.cr && ./wat and crystal run --release wat.cr show similar performance.
