How should parallel RNG be addressed? Codes that do parallel RNG:
- SPRNG, an MPI code that does not seem to be maintained: http://sprng.cs.fsu.edu/
- Dynamic Creator, a derivative of Mersenne Twister: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html
At the very least, when `randomize()` is called, it should include the processor's number along with the time to select a different seed on every processor.
I don't fully understand what needs to be done here. By default, processors start with different seeds. To seed each process you can do `@bcast srand(seeds[myid()])`.
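For illustration, a minimal sketch of that idea in today's Distributed API (the original `@bcast srand(...)` predates it; the `seeds` vector here is just freshly drawn entropy, not part of any proposal in this thread):

```julia
using Distributed, Random

# Sketch: push a distinct, precomputed seed to each worker.
seeds = rand(RandomDevice(), UInt64, nworkers())
for (i, p) in enumerate(workers())
    remotecall_wait(Random.seed!, p, seeds[i])
end
```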
I think that will work for an initial cut. The real issue is making sure that entire distributed simulations/computations are deterministically reproducible from a single initial seed value. That means the computation can't depend on race conditions regarding which processor gets to some point first, or even on how many processors there are. For now we can definitely punt on this.
I guess we need a hook for when processors are added so you can do custom initialization on the machine. And if we do checkpointing we need to save the random state.
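On the checkpointing part, a minimal sketch of saving and restoring generator state, assuming a `MersenneTwister` stream:

```julia
using Random, Serialization

# Sketch: a MersenneTwister object carries its full state, so serializing
# the generator is enough to checkpoint and resume its stream exactly.
rng = MersenneTwister(1234)
randn(rng, 3)                        # ... some work consumes the stream ...
serialize("rng_state.jls", rng)      # save alongside the checkpoint

rng2 = deserialize("rng_state.jls")  # restore on resume
@assert randn(rng2, 3) == randn(rng, 3)  # both continue identically
```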
Isn't it good enough that they read from /dev/urandom for now?
The hard part isn't getting distributed simulations to be random, it's getting them to be repeatable. Using /dev/urandom helps make things really random, but helps repeatability not at all. Let's worry about this when we have an actual use case. I have done simulations like this in the past, so the problem makes sense to me, but I don't necessarily know what a good solution is.
Well you can make them repeatable by just saving the seed. Doesn't seem to be difficult.
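In the single-process case that is indeed just a few lines; a sketch:

```julia
using Random

# Sketch: draw a fresh seed from OS entropy, record it, then seed with it.
seed = rand(RandomDevice(), UInt64)
println("seed for this run: ", seed)   # log it so the run can be replayed
Random.seed!(seed)
```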
A related issue is that if you have a bunch of virtual machines being booted up for your computation, is it likely that their entropy pools are similar?
We already seed the RNG from /dev/urandom or system time. Hence, this is not really a major issue for now.
This is not at all a resolved issue. The problem with parallel RNG is _not_ making all the machines behave differently, but to make pseudorandom behavior across many machines exactly repeatable. This won't get addressed until we have an actual example to see why it's a problem, but we shouldn't close it because it will be an issue at some point.
You can set a different seed on each machine for repeatable behavior, which is what you have to do in the single-processor case as well. How is this specific to parallel RNG?
Because in a complex distributed computation the same exact work won't always happen on the same machines. Trust me, this is a more complicated issue. If you must close it, fine, but we will encounter this problem at some point. I'll pose a simple example: let's say we want to do a distributed parallel Buffon's needle that can be repeatably run on any number of machines. How do you do that?
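One common answer, sketched below purely for illustration (it is not proposed anywhere in this thread), is to tie RNG streams to a fixed decomposition into tasks rather than to workers, so the result depends only on the master seed and the task count, never on how many machines run the tasks. The chunking scheme and hash-based per-task seeding here are assumptions:

```julia
using Distributed
@everywhere using Random

# Per-task streams: task t's RNG depends only on (master_seed, t),
# never on which worker happens to run it.
@everywhere function buffon_hits(t, n, master_seed)
    rng = MersenneTwister(hash((master_seed, t)))
    hits = 0
    for _ in 1:n
        x = rand(rng) / 2        # center-to-line distance, uniform on [0, 1/2)
        θ = rand(rng) * π        # needle angle, uniform on [0, π)
        hits += x <= sin(θ) / 2  # unit-length needle crosses a unit-spaced line
    end
    return hits
end

ntasks, n_per_task, seed = 64, 10_000, 42
# (+) is order-independent, so the total is identical for any worker count.
hits = @distributed (+) for t in 1:ntasks
    buffon_hits(t, n_per_task, seed)
end
println("π ≈ ", 2 * ntasks * n_per_task / hits)
```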
I agree that when you change the number of nodes, it is a problem. But for the same number of nodes, there is no issue. My only point was that for the immediate purposes of this issue, it was resolved. I am quite sure that even SPRNG and others do not offer this feature.
I suggest that we create a new issue when we run into this problem, or reopen this one then.
Ok, here's a proposition. The user calls a routine called sdrand (for set distributed random seeds). The argument to it is the maximum number of processors that the user ever expects to use - say 100 or 1000 or whatever. The routine then seeds the RNG uniformly within the RNG's interval for that many processors. Now, if the user uses up to the specified number of processors, reproducible results are possible.
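In rough code, the proposal might look like the sketch below (`sdrand` is the name proposed above; everything else is an assumption about how it could be wired up):

```julia
using Distributed, Random

# Sketch of the proposed sdrand: derive one seed per potential processor
# from a single master seed, then hand seeds out by process index. Runs
# on up to `maxprocs` processors can then be reproduced from the master
# seed alone.
function sdrand(master_seed::Integer, maxprocs::Integer)
    @assert nprocs() <= maxprocs
    rng = MersenneTwister(master_seed)
    seeds = rand(rng, UInt64, maxprocs)
    for (i, p) in enumerate(procs())
        remotecall_wait(Random.seed!, p, seeds[i])
    end
    return seeds
end
```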
This doesn't work because if the data distribution is different, the same calls to the RNG will be on different nodes and therefore change the results. Why can't we just leave this issue open until we find ourselves in a position to tackle it, hopefully with a real-world use case to drive it?
I think that just using different seeds for each node (even if reproducible) might not be enough, as the streams on different workers might fall into step. Parallel RNG has been addressed in the "parallel" package in R; see p. 5 of this PDF:
http://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf
This could be useful:
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/JUMP/dsfmt-jump.html
Yeah, we should definitely include that. Very cool.
I wrote the dSFMTjump package, which works with the MersenneTwister type.
I have been wanting to write that translation for a while. I would love to have this in Base, since it is a tiny amount of code and very useful in a number of situations. Could you submit a PR?
Sure, I'd like to see this in Base as well.
Seems to be addressed by #12498.
I think it is too early to close this one. The jump function is one step, but we should have a complete setup for parallel rng in place before we close here.
How current threading and RNG work may not be ideal in terms of reproducibility, because `Random.seed!(seed)` is not enough to control all thread-local RNG states:

```julia
using Random
Random.seed!(1)
Threads.@threads for _ in 1:Threads.nthreads()
    @show Threads.threadid(), rand()
end
```
Users need to do something like the following to initialize all RNGs:

```julia
using Random, Future
Random.seed!(1)
for i in 2:Threads.nthreads()
    Random.THREAD_RNGs[i] = Future.randjump(Random.THREAD_RNGs[i - 1], big(10)^20)
end
```
I wonder if it makes sense to invoke the above code with `seed!` on `GLOBAL_RNG` by default. It should make code using static scheduling reproducible. On the other hand, maybe it does not make sense, as Julia uses dynamic scheduling, so most code relying on thread-local state would lose reproducibility anyway?
We explicitly stated that we did not want `seed!` to do that by default, as it conflicts with numerous other design goals. We'd perhaps consider accepting a PR that added it as a helper function, but no promises. Note that the original issue was about Distributed (i.e. processes, not threads), though any helper function now should probably take both into account.
Closing as "won't fix", though, as there are, at least, no changes planned to the default behavior.