We currently don't have an example that runs multiple chains. However, I think most examples should actually do this. Maybe we could change the tutorials in this respect?
Running multiple chains is currently possible using:
chains = mapreduce(c -> sample(model_fun, sampler), vcat, 1:num_chains)
which will create a Turing.Chains object that contains samples for all chains.
I feel like we should instead just add a keyword argument (or something) to sample where sample(model, sampler, chains=3) returns that number of chains. Is that a poor idea? I suspect we'd have to touch a lot of samplers due to the places we have sample defined.
Ideally, we can share the same sample function implementation for all samplers. We can probably do this a bit later, for example after we finish the refactoring in https://github.com/TuringLang/Turing.jl/issues/689
I'll document this behavior for the moment until we get something more permanent.
I noticed that #582 was closed. Does this plan include running the chains in parallel?
Currently this is not possible.
You cannot set a random seed for Hamiltonian based samplers for now. However, this will very soon be possible and you will be able to sample chains in parallel if only Hamiltonian based samplers are used.
The Distributions package does not implement all rand functions in Julia but used R as a backend. Therefore, it is not possible to seed the random number generator for all distributions making parallel particle based sampling difficult at the moment.
This will show up in the guide shortly. I'll be revisiting some of the tutorials soon after #695 goes through, and I'll add in this functionality to better show how to run multiple chains.
Related issue: https://github.com/TuringLang/Turing.jl/issues/65
Why do mapreduce and reduce...pmap respond differently to @everywhere?
Although both approaches generate multiple chains only the first one uses parallel processes:
chns = reduce(chainscat, pmap(x->sample(model,sampler), 1:num_chains))
(from https://github.com/TuringLang/Turing.jl/pull/710/commits/bdf5d8cf52970896fb236d9c29e3a5e22e01d8c0)
chains = mapreduce(c -> sample(model, sampler), chainscat, 1:num_chains)
(from docs)
I'm not sure what you're asking, exactly. pmap (docs) is a mapping function that uses whatever worker processes are available, while mapreduce (docs) is explicitly serial.
The second case is slightly more numerically accurate, since I believe there's some cases where random numbers are not independent when sampling from Distributions. Not sure if that's been fixed yet.
@cpfiffer Thank you for your fast reply and explanation. I'm new to Julia and thought I'm making an error when working with the @everywhere macro. So, pmapstems from the Distrubuted-package and thus works with all workers where mapreduce is base-Julia and hence serial.
Yes, that's correct --- we're hoping soon to make it unnecessary to worry about mapreduce and pmap, but at the moment it's a bit clumsy.
Both functions are from Julia Base. pmap is from the Distributed module of Base. You can read more on this in the Julia documentation.
Closed in favour of https://github.com/TuringLang/Turing.jl/issues/746