Turing.jl: Add example illustrating how to run multiple chains.

Created on 25 Feb 2019  路  13Comments  路  Source: TuringLang/Turing.jl

We currently don't have an example that runs multiple chains. However, I think most examples should actually do this. Maybe we could change the tutorials in this respect?

Running multiple chains is currently possible using:

chains = mapreduce(c -> sample(model_fun, sampler), vcat, 1:num_chains)

which will create a Turing.Chains object that contains samples for all chains.

doc

All 13 comments

I feel like we should instead just add a keyword argument (or something) to sample where sample(model, sampler, chains=3) returns that number of chains. Is that a poor idea? I suspect we'd have to touch a lot of samplers due to the places we have sample defined.

Ideally, we can share the same sample function implementation for all samplers. We can probably do this a bit later, for example after we finish the refactoring in https://github.com/TuringLang/Turing.jl/issues/689

I'll document this behavior for the moment until we get something more permanent.

I noticed that #582 was closed. Does this plan include running the chains in parallel?

Currently this is not possible.

  1. You cannot set a random seed for Hamiltonian based samplers for now. However, this will very soon be possible and you will be able to sample chains in parallel if only Hamiltonian based samplers are used.

  2. The Distributions package does not implement all rand functions in Julia but used R as a backend. Therefore, it is not possible to seed the random number generator for all distributions making parallel particle based sampling difficult at the moment.

This will show up in the guide shortly. I'll be revisiting some of the tutorials soon after #695 goes through, and I'll add in this functionality to better show how to run multiple chains.

Why do mapreduce and reduce...pmap respond differently to @everywhere?
Although both approaches generate multiple chains only the first one uses parallel processes:

chns = reduce(chainscat, pmap(x->sample(model,sampler), 1:num_chains))
(from https://github.com/TuringLang/Turing.jl/pull/710/commits/bdf5d8cf52970896fb236d9c29e3a5e22e01d8c0)

chains = mapreduce(c -> sample(model, sampler), chainscat, 1:num_chains)
(from docs)

I'm not sure what you're asking, exactly. pmap (docs) is a mapping function that uses whatever worker processes are available, while mapreduce (docs) is explicitly serial.

The second case is slightly more numerically accurate, since I believe there's some cases where random numbers are not independent when sampling from Distributions. Not sure if that's been fixed yet.

@cpfiffer Thank you for your fast reply and explanation. I'm new to Julia and thought I'm making an error when working with the @everywhere macro. So, pmapstems from the Distrubuted-package and thus works with all workers where mapreduce is base-Julia and hence serial.

Yes, that's correct --- we're hoping soon to make it unnecessary to worry about mapreduce and pmap, but at the moment it's a bit clumsy.

Both functions are from Julia Base. pmap is from the Distributed module of Base. You can read more on this in the Julia documentation.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

skanskan picture skanskan  路  5Comments

fredcallaway picture fredcallaway  路  5Comments

yebai picture yebai  路  6Comments

mohamed82008 picture mohamed82008  路  4Comments

trappmartin picture trappmartin  路  3Comments