Crystal: Is Crystal slower than Ruby in generating randoms?

Created on 6 Jul 2018 · 40Comments · Source: crystal-lang/crystal

Hey there. I've just got a question...

Is Crystal slower than Ruby in generating randoms?

While playing around with Crystal I noticed that Ruby is quite faster than Crystal in generating randoms.

I wrote a Crystal script that performs 10 million randoms (hex, base64, rand and random_bytes) and measures the times.

I did the same in Ruby and found that the Ruby script is significantly faster than Crystal.

Is this expected? Are Crystal randoms more secure and thus more expensive?

| Generator | Amount | Time in Crystal | Time in Ruby |
|:--------------- |-------:|----------------:|-------------:|
| hex | 10M | ~ 47.4s | ~ 8.8s |
| base64 | 10M | ~ 52.1s | ~ 10.6s |
| rand | 10M | ~ 43.8s | ~ 6.1s |
| random_bytes | 10M | ~ 47.8s | ~ 5.8s |

(In Crystal I used Random, in Ruby I used SecureRandom)

$ crystal -v
Crystal 0.25.1 (2018-06-29)

LLVM: 5.0.2
Default target: x86_64-apple-macosx

$ ruby -v
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]

Environment

macOS High Sierra 10.13.5 (17F77)
1.4 GHz Intel Core i7
8 GB 1867 MHz LPDDR3

(This description and all scripts are available at https://github.com/lxxxvi/ruby-vs-crystal-random-performance )

Source

lxxxvi

Most helpful comment

Don't bother tweaking Random::Secure. It doesn't have to be fast, it has to be a CSPRNG, meant for cryptography and secure usages such as generating secret keys, or seeding a PRNG. Free performance is great, but it musn't impact the cryptography quality. It's _already_ the best we can achieve today, secure & performance wise —except for FreeBSD 12 where we could leverage getrandom(2).

If you need _high quality_ random numbers, for example to generate UUIDs, and never expose the numbers (or don't care that the seed can be reversed), you can use the default rand (PCG32) or preferably another PRNG with more state (e.g. xoshiro256*) if you generate lots of them. They're *much faster than Random::Secure, but they're unsuitable for cryptography and secure usages.

ysbaddaden on 17 Nov 2019

👍5

All 40 comments

You must use --release to benchmark anything, this is especially true for CPU-heavy computations.
You must compare what's comparable. For example the global rand function or SecureRandom (Ruby) vs Random::Secure (Crystal).

ysbaddaden on 6 Jul 2018

did you compile it with --release? for me 0.25.1 is 10-20% faster than ruby.

kostya on 6 Jul 2018

Comparing what's comparable, namely the default random and the secure random source (/dev/urandom):

# ruby
require "benchmark"
require "securerandom"

N = 1_000_000
RANGE = 0...999_999

Benchmark.bm do |x|
  x.report("rand()") { N.times { rand(RANGE) } }
  x.report("SecureRandom.rand") { N.times { SecureRandom.rand(RANGE) } }
  x.report("SecureRandom.random_bytes") { N.times { SecureRandom.random_bytes(16) } }
end

# crystal
require "benchmark"
require "random/secure"

N = 1_000_000
RANGE = 0...999_999

Benchmark.bm do |x|
  x.report("rand()") { N.times { rand(RANGE) } }
  x.report("Random::Secure.rand") { N.times { Random::Secure.rand(RANGE) } }
  x.report("Random::Secure.random_bytes") { N.times { Random::Secure.random_bytes(16) } }
end

No comment:

```console
$ ruby bench.rb
user system total real
rand() 0.134299 0.000000 0.134299 ( 0.134408)
SecureRandom.rand 1.159312 2.414552 3.573864 ( 3.580610)
SecureRandom.random_bytes 0.782822 3.070608 3.853430 ( 3.860538)

$ crystal run --release bench.cr
user system total real
rand() 0.010000 0.000000 0.010000 ( 0.012934)
Random::Secure.rand 0.130000 1.200000 1.330000 ( 1.337170)
Random::Secure.random_bytes 0.190000 1.990000 2.180000 ( 2.128027)

ysbaddaden on 6 Jul 2018

@ysbaddaden Thanks for helping me comparing it properly, I appreciate that.

I ran your benchs, but used N = 10_000_000 and got different results.

$ ruby bench.rb

                                  user     system      total        real
rand()                        1.485554   0.004912   1.490466 (  1.497553)
SecureRandom.rand             6.764457   0.008698   6.773155 (  6.783643)
SecureRandom.random_bytes     4.735023   0.004706   4.739729 (  4.745405)


$ crystal run --release bench.cr

                                  user      system       total         real
rand()                        0.050000    0.000000    0.050000 (   0.053843)
Random::Secure.rand           5.330000    8.670000   14.000000 (  14.022774)
Random::Secure.random_bytes   5.900000   18.360000   24.260000 (  24.136340)

What could I be doing wrong now?

lxxxvi on 6 Jul 2018

@lxxxvi What's your OS? I'm on OSX and my results are:

# ruby
                               user     system      total        real
rand()                     1.860000   0.000000   1.860000 (  1.875924)
SecureRandom.rand         15.350000   0.030000  15.380000 ( 15.447070)
SecureRandom.random_bytes 14.920000   0.040000  14.960000 ( 15.049611)

# crystal
                                  user     system      total        real
rand()                        0.090000   0.000000   0.090000 (  0.089923)
Random::Secure.rand           5.400000   9.760000   15.160000 (  15.251411)
Random::Secure.random_bytes   6.340000   21.350000   27.690000 (  27.635482)

So on OSX Crystal is definitely slower than Ruby, at least for random_bytes.

asterite on 6 Jul 2018

Hmm, is there some sort of precision cutoff going on for the crystal values? The values seem suspiciously round.

yxhuvud on 6 Jul 2018

@yxhuvud Indeed!

It seems Ruby 2.5.1 uses libc getrusage instead of times. Before that, Ruby used times and it gave the same bad precision. Maybe we should use getrusage, but it seems it's not always defined.

asterite on 6 Jul 2018

@lxxxvi Also, what version of Ruby are you using? It seems 2.5.1 now uses urandom if available, previoulsy it used openssl... I'm installing 2.5.1 right now to run the benchmark again.

asterite on 6 Jul 2018

I used the very slow random_bytes(Int) which allocates a 16 bytes slice on the HEAP on each call, which will trigger the GC at some point, because there is no Ruby equivalent to Crystal's recommendation to fill a Slice (HEAP or stack allocated):

N = 10_000_000
bytes = Bytes.new(16)
N.times { Random::Secure.random_bytes(bytes) }

$ crystal run --release bench.cr
                                  user     system      total        real
Random::Secure.random_bytes   1.060000   19.240000   20.300000 (  20.316772)

(can't compare with Ruby anymore)

ysbaddaden on 6 Jul 2018

@asterite

I use(d) these versions (see the issue's description)

$ crystal -v
Crystal 0.25.1 (2018-06-29)

LLVM: 5.0.2
Default target: x86_64-apple-macosx

$ ruby -v
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-darwin17]

Environment

macOS High Sierra 10.13.5 (17F77)
1.4 GHz Intel Core i7
8 GB 1867 MHz LPDDR3

lxxxvi on 6 Jul 2018

It seems Ruby is using arc4random_buf, which might be faster than reading from urandom:

https://github.com/ruby/ruby/blob/trunk/random.c#L472-L479

So maybe that's the reason Ruby is faster?

/cc @ysbaddaden

asterite on 6 Jul 2018

👍1

(I wonder how can we detect if a C function is present, when compiling... otherwise I don't know how can we conditionally use arc4random_buf if it's available)

asterite on 6 Jul 2018

Doing this in Crystal:

lib LibC
  fun arc4random_buf(buf : UInt8*, bytes : SizeT)
end

time = Time.now
10_000_000.times do
  bytes = Bytes.new(16)
  LibC.arc4random_buf(bytes, 16)
end
puts Time.now - time

Takes 0.24 seconds. Assuming Ruby uses that (I think so), doing it in Ruby:

require "securerandom"

time = Time.now
10_000_000.times do
  SecureRandom.random_bytes(16)
end
puts Time.now - time

Takes 13.8 seconds.

So here's our chance to optimize :-)

By the way, arc4random_buf seems to be cryptographically secure, according to the man:

These functions use a cryptographic pseudo-random number generator to generate high quality random bytes very quickly. One data pool is used for all consumers in a process, so that consumption under program flow can act as additional stirring. The subsystem is re-seeded from the kernel random number subsystem on a regular basis, and also upon fork(2).

asterite on 6 Jul 2018

👍2

@asterite That looks promising 👍

What could be the reason that Random::Secure.rand isn't faster?

Looking at your benchmark results, the times for SecureRandom.rand vs Random::Secure.rand is quite interesting too... Crystal is faster in user but slower in system, overall it's about as fast (or slow) as Ruby.

lxxxvi on 6 Jul 2018

@lxxxvi Random::Secure.rand also uses urandom in Crystal

asterite on 6 Jul 2018

@asterite - arc4random and friends were originally built by the folks at OpenBSD. Its designed to be a easier to use, better, and faster source of cryptographically secure random. MacOS has had it for ages though they moved away from rc4 and use yarrow internally (OpenBSD also moved from rc4 to ChaCHa internally). The code for this should already be in the stdlib as the OpenBSD Random::Secure (implementation) already uses it.

Adding:

{% elsif flag?(:darwin) %}
  require "./unix/arc4random"

to this and changing a few other things in lib c should cover it.

I would also like to see arc4random_uniform like methods be added to the stdlib as its free of modulo bias an easy mistake often made. Perhaps when I have a bit more time I will open a PR.

arc4random for anyone thats interested:

DaRaadt HackFest 2014 Slides
OpenBSD Man Pages
How not to generate random numbers (Ted Unangst intentionally signs his own certs rolls eyes)

chris-huxtable on 6 Jul 2018

👍5

Well, arc4random is only deemed secure on OpenBSD (because ChaCha, and slow). See libsodium sysrandom, which I trust the most.

Different systems have different secure sources. Linux 4.x added the getrandom syscall, which may be faster than going through /dev/urandom for example (already implemented in Crystal).

Anyway:

A secure random source doesn't have to be fast, but of cryptography quality (hence slow);
In most use cases, non crytographic, a good PRNG (PCG, xoshiro**, siphash, ...) seeded from Random::Secure is enough, and will boost performances.

Of course if /dev/urandom and arc4random are identical on macOS and the later is faster, I could revise, but I need proof of it, for all supported macOS releases :smile:

@chris-huxtable we already have uniformisation, for any PRNG. See https://github.com/crystal-lang/crystal/blob/master/src/random.cr

ysbaddaden on 7 Jul 2018

👍5

@ysbaddaden - Syscalls (arc4random,getrandom) also have the added benefit of working when in a chroot and when you run out of file descriptors (which can be done intentionally by an attacker).

chris-huxtable on 7 Jul 2018

Sure, but the main issue remains: is arc4random the same as urandom on all supported macOS versions ?

That being said, we only ever open a single fd to urandom.

ysbaddaden on 7 Jul 2018

@ysbaddaden - macOS has had arc4random for quite some time. Unfortunately the online man pages have been scrubbed with the Swift overhaul of the ADC. As such I have found nothing from Apple. From third parties one could deduce that arc4random has been on they system since macOS 10.6 (2009) and arc4random_buf since macOS 10.7 (2011). Additionally they now (since 10.12) use an AES cipher (according to man arc4random).

With regards to arc4random equivalence with /dev/urandom/. The man pages say:

The same random data is also available from getentropy(2). Using the getentropy(2) system call interface will provide resiliency to file descriptor exhaustion, chroot, or sandboxing which can make /dev/random unavailable. Additionally, the arc4random(3) API provides a fast userspace random number generator built on the random data source and is preferred over directly accessing the system's random device.
[...]
The random device implements the Yarrow pseudo random number generator algorithm and maintains its entropy pool. The kernel automatically seeds the algorithm with additional entropy during normal execution.

I would note that man getentropy says:

However, it should be noted that getentropy() is primarily intended for use in the construction and seeding of userspace PRNGs like arc4random(3) [...].

and man random says:

Applications requiring cryptographic quality randomness should use arc4random(3).

While this doesn't explicitly say it is based on how OpenBSD does it and how I remember reading something to effect a number of years ago (I could be mistaking the source/context), I think its a safe bet.

chris-huxtable on 7 Jul 2018

I spent too much time already on this issue. I don't place bets on a CSPRNG.

Maybe macOS 10.12+ is fine, but older releases are very bad.
Maybe we can drop support for macOS < 10.12, after all Apple only supports its latest shiny release.
Maybe a userspace CSPRNG is secure enough (or maybe not).

What's sure is that the "kernel automatically seeds the algorithm with additional entropy during normal execution" for the random device (i.e. /dev/urandom). What about arc4random? Is it seeded once or is its entropy regularly updated? Who knows? :cry:

ysbaddaden on 9 Jul 2018

According to source it is reseeded regularly, after each 6.4Mb of random data. It uses genentropy, but this basically is the same as /dev/urandom.

konovod on 9 Jul 2018

👍1

6.4MiB of random is a LOT of random

RX14 on 9 Jul 2018

@konovod sadly, the most recent version of the file is dated of 2008 and specifically refers to RC4 :sob:

ysbaddaden on 9 Jul 2018

👍1

https://opensource.apple.com
shows the list of OSX versions. Following the link to version you can find files that match corresponding version.
As for 2008 and RC4 - well, comments are outdated. The interesting part is under #if defined(__APPLE__) && !defined(VARIANT_STATIC) and it includes AES.

konovod on 9 Jul 2018

Is there any John Apple we can ask?

asterite on 9 Jul 2018

Apple running hideously old versions of open source code is sadly not uncommon :(

yxhuvud on 9 Jul 2018

👍1

I may be wrong here but, it also appears that it is reseeded after a fork.
source

in_arc4_fork_child:

if (rng_state != NULL) {
    arc4_count = 0;

in arc4random and arc4random_buf:

if (arc4_count <= 0) {
    arc4_stir();
}

chris-huxtable on 9 Jul 2018

@ysbaddaden - Unfortunately when Apple updated the ADC for Swift a number of years ago they killed off a ton of c and systems documentation. :(

chris-huxtable on 9 Jul 2018

For Crystal: N = 1_000_000
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ crystal build --release bench.cr
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ ./bench
user system total real
rand() 0.000000 0.000000 0.000000 ( 0.007522)
Random::Secure.rand 0.280000 1.280000 1.560000 ( 1.560843)
Random::Secure.random_bytes 0.360000 1.150000 1.510000 ( 1.515625)

For Ruby
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ ruby bench.rb
user system total real
rand() 0.156250 0.000000 0.156250 ( 0.146042)
SecureRandom.rand 0.781250 1.265625 2.046875 ( 2.049006)
SecureRandom.random_bytes 0.484375 1.234375 1.718750 ( 1.733307)

N = 10_000_000
Ruby:
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ ruby bench.rb
user system total real
rand() 1.375000 0.000000 1.375000 ( 1.374348)
SecureRandom.rand 7.031250 12.984375 20.015625 ( 20.036401)
SecureRandom.random_bytes 4.453125 12.796875 17.250000 ( 17.265022)

Crystal:
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ crystal build --release bench.cr
saleen@TESTAROSSA:/mnt/c/Users/MaraTom/Desktop$ ./bench
user system total real
rand() 0.070000 0.000000 0.070000 ( 0.070488)
Random::Secure.rand 2.770000 12.530000 15.300000 ( 15.299145)
Random::Secure.random_bytes 2.890000 12.150000 15.040000 ( 15.056012)

Crystal seems to be the winner here.... it cut through the rand() like butter.

System: Intel I5, Windows 10, the tests were conducted using the WSL Linux Substem for both Ruby and Crystal.

98-f355-f1 on 17 Jul 2018

@98-f355-f1 yeah thats expected on linux systems, osx benchmark results will be slower

RX14 on 17 Jul 2018

All I get is SecureRandom.randtest.rb:10:inblock (3 levels) in

': private method rand' called for SecureRandom:Module (NoMethodError) :)

rdp on 15 Nov 2019

Closing. Security trumps performance on this topic.

Also, Ruby made changes, and:

restricted arc4random_buf(3) to a few BSD (OpenBSD, NetBSD, FreeBSD 12) — see https://github.com/ruby/ruby/commit/b120f5e38d9c9817a18723d8002665e6dd29f7a6#diff-a962031992b7b5934db41bf925b3487e;
leverages SecRandomCopyBytes on macOS 10.7+ which wraps /dev/random under the hood — see https://github.com/ruby/ruby/commit/68f9d7b444cf7f870f2163c00a491e224a2a77a2#diff-a962031992b7b5934db41bf925b3487e;
uses getrandom(2) when available —which may include FreeBSD 12?

Note that libsodium now uses getrandom(2) on FreeBSD 12.

ysbaddaden on 15 Nov 2019

👍4

for those following this issue if you are interested here are the 2 most active libsodium libraries that include multiple ChaCha algorithms and other cryspographic algorithms.

https://github.com/didactic-drunk/sodium.cr
https://github.com/watzon/nacl

wontruefree on 15 Nov 2019

👍1

Just poking around, this change seems to effect about a 2x speedup?

diff --git a/src/crystal/system/unix/urandom.cr b/src/crystal/system/unix/urandom.cr
index f2998b772..82e9688d4 100644
--- a/src/crystal/system/unix/urandom.cr
+++ b/src/crystal/system/unix/urandom.cr
@@ -11,7 +11,7 @@ module Crystal::System::Random
     return unless urandom.info.type.character_device?

     urandom.close_on_exec = true
-    urandom.read_buffering = false
+    urandom.read_buffering = true
     @@urandom = urandom
   end

benchs (OS X, git master)

$ ./bench.old 
Random::Secure
                                  user     system      total        real
rand()                        0.011046   0.000009   0.011055 (  0.011061)
Random::Secure.rand           0.272780   0.349765   0.622545 (  0.622883)
Random::Secure.random_bytes   0.266101   0.633775   0.899876 (  0.881925)
$ ./bench.new
Random::Secure
                                  user     system      total        real
rand()                        0.011446   0.000007   0.011453 (  0.011462)
Random::Secure.rand           0.020545   0.075565   0.096110 (  0.096123)
Random::Secure.random_bytes   0.021922   0.364581   0.386503 (  0.368568)

Though I don't understand fully what's going on...doesn't seem to affect startup time, can be synchronized with a mutex and doesn't really drop in speed. FWIW...

rdp on 15 Nov 2019

Please see other pull requests about why we don't buffer urandom.

ysbaddaden on 16 Nov 2019

👍1

A comment in the source code might have been nice...

asterite on 16 Nov 2019

👍4

For followers, here's the PR referred to: https://github.com/crystal-lang/crystal/pull/5848

Here's another attempt to speedup, by using OpenSSL's rand function (which I noticed in the ruby source). No speedup in linux, but OS X:

$ ./bench.urandom
                                  user     system      total        real
rand()                       0.019039   0.000025   0.019064 (  0.019074)
Random::Secure.rand          0.743042   1.436121   2.179163 (  2.203013)
Random::Secure.random_bytes  0.795708   3.055914   3.851622 (  3.904940)

$ ./bench.openssl
                                  user     system      total        real
rand()                        0.019009   0.000080   0.019089 (  0.019695)
Random::Secure.rand           0.202165   0.000737   0.202902 (  0.208178)
Random::Secure.random_bytes   0.517045   0.003730   0.520775 (  0.526210)

Implementation:

diff --git a/src/crystal/system/random.cr b/src/crystal/system/random.cr
index e03cc6ca6..08c9d00ac 100644
--- a/src/crystal/system/random.cr
+++ b/src/crystal/system/random.cr
@@ -15,7 +15,11 @@ end
 {% elsif flag?(:openbsd) %}
   require "./unix/arc4random"
 {% elsif flag?(:unix) %}
-  require "./unix/urandom"
+  {% if flag?(:without_openssl) %}
+    require "./unix/urandom"
+  {% else %}
+    require "./unix/openssl"
+  {% end %}
 {% elsif flag?(:win32) %}
   require "./win32/random"
 {% else %}
diff --git a/src/crystal/system/unix/openssl.cr b/src/crystal/system/unix/openssl.cr
new file mode 100644
index 000000000..547f8884b
--- /dev/null
+++ b/src/crystal/system/unix/openssl.cr
@@ -0,0 +1,18 @@
+require "openssl"
+
+module Crystal::System::Random
+
+  def self.random_bytes(buf : Bytes) : Nil
+      out = LibCrypto.rand_bytes(buf.to_unsafe.as(UInt8*), buf.size)
+      if out != 1
+        raise OpenSSL::Error.new("generating cryptographic rand_bytes failed")
+      end
+  end
+
+  def self.next_u : UInt8
+      buf = uninitialized UInt8[1]
+      random_bytes(buf.to_slice)
+      buf.unsafe_as(UInt8)
+ end
+
+end

Hard to exactly detect the startup penalty but it seems to be around 0.003s on my local box. FWIW...

And here's the long discussion of why ruby switched away from this kind of OpenSSL random default (blog links at top are interesting) https://bugs.ruby-lang.org/issues/9569 for followers...

rdp on 16 Nov 2019

No, please. There are _years_ of arguments against OpenSSL's rand function. Ruby finally changed to use proper secure sources, and only kept a fallback on OpenSSL that shall never happen in practice —that we didn't even bother to implement, on purpose.

ysbaddaden on 17 Nov 2019