I have scanned through the consume-produce pattern use in Turing.jl and I noticed the following:
Trace, aka Particle, is nothing but a model wrapper that calls it with inputs vi and spl when "consumed".
ParticleContainer is a collection of Particles and their results when consumed, i.e. the model function is run once for each particle.
When a ParticleContainer is consumed, it iterates over its Particles and consumes them one by one.
The consume-produce pattern is used.
Given the above, can't we just make ParticleContainer an iterator and instead of calling
https://github.com/TuringLang/Turing.jl/blob/6e6197a6afe7e03ea91847f50804cea362736bee/src/samplers/pgibbs.jl#L70,
we can do:
for p in particles
    vi = p.vi
    ....
end
Notice that the above for loop will call iterate(::ParticleContainer, state), which we can define to do what consume (https://github.com/TuringLang/Turing.jl/blob/master/src/core/container.jl#L76) currently does, returning p, which holds the result of the model call in p.vi. state can hold all the state variables currently used inside consume(::ParticleContainer). More conveniently, we can also define consume as a ResumableFunction and still call it with the for loop syntax above. Did I miss anything?
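To make the iterator idea concrete, here is a minimal sketch of the Base.iterate protocol applied to a particle collection. ToyParticle, ToyContainer and run! are hypothetical stand-ins invented for illustration; they are not Turing's actual types, and the real consume logic is not reproduced here.

```julia
# Hypothetical stand-ins; these are NOT Turing's actual Trace/ParticleContainer.
mutable struct ToyParticle
    model::Function        # model function, called as model(vi)
    vi::Vector{Float64}    # stand-in for the particle's variable info
end

struct ToyContainer
    particles::Vector{ToyParticle}
end

# Run the model once for a particle and store the result in p.vi.
function run!(p::ToyParticle)
    p.vi = p.model(p.vi)
    return p
end

# Iteration protocol: `state` is the index of the next particle to run.
function Base.iterate(pc::ToyContainer, state::Int = 1)
    state > length(pc.particles) && return nothing
    p = run!(pc.particles[state])
    return (p, state + 1)
end

Base.length(pc::ToyContainer) = length(pc.particles)
Base.eltype(::Type{ToyContainer}) = ToyParticle

# Three particles whose toy "model" adds 1 to the stored value.
particles = ToyContainer([ToyParticle(vi -> vi .+ 1.0, [Float64(i)]) for i in 1:3])

results = Float64[]
for p in particles
    vi = p.vi              # result of the model call for this particle
    push!(results, vi[1])
end
```

In this sketch every pass over the container runs the model again, so iterating twice would update each p.vi twice. Whether that matches the intended semantics of consume is an open design question.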
I like it. The control flow is a little more palatable via an iterator. What would a common use for the resumable functionality look like? Just consume(particles), which consumes the next particle?
A ResumableFunction is just a fancier iterator that lets you put more complicated logic in the iterator, e.g. multiple yielding points and break conditions, which is sometimes a more natural way to think about the problem. A common use case for me is iterating over an unknown number of elements, e.g. traversing a recursive structure.
In our case, we actually know the length of the iterator, there is no recursion, and there is only one point of "yielding" in the function, so a ResumableFunction may be overkill; a simple iterator would be just as easy. However, I need to think about how this will play with Libtask and test it. According to @yebai, using Libtask is necessary, and I think it may be usable alongside the iterator idea we have here. I will play with this a bit in a couple of weeks.
The ParticleContainer is the main entry point of any particle-based sampler in Turing. So whatever changes we make to it should be made carefully.
That said, are we expecting significant performance improvements when changing the ParticleContainer as you suggested? If so, could you elaborate on why?
More comments follow.
> That said, are we expecting significant performance improvements when changing the ParticleContainer as you suggested? If so, could you elaborate on why?
Oh well, it is mostly a hunch. Tasks involve expensive context switching, whereas an iterator is a very basic component of Julia, optimized by Keno in the new compiler, so I expect it to be fast. Benchmarks will prove me right or wrong.
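A rough way to check this hunch is to time the same loop fed by a Task-backed Channel and by a plain iterator. The sketch below uses only Base (assuming Julia 1.3 or later for the Channel{T}(f) constructor); channel_squares, iterator_squares and total are hypothetical helpers, and the workload is a toy that says nothing definitive about Turing's samplers.

```julia
# Task-based producer: every value is handed over through a Channel,
# which involves switching between the producer and consumer tasks.
channel_squares(n) = Channel{Int}() do ch
    for i in 1:n
        put!(ch, i^2)
    end
end

# Plain iterator: a lazy generator built on Base.iterate, no tasks involved.
iterator_squares(n) = (i^2 for i in 1:n)

# The same consumer loop is used for both producers.
function total(itr)
    s = 0
    for x in itr
        s += x
    end
    return s
end

n = 10_000

# Warm up both paths so that compilation time is not measured.
total(channel_squares(10))
total(iterator_squares(10))

t_channel = @elapsed total(channel_squares(n))
t_iterator = @elapsed total(iterator_squares(n))
println("Channel: ", t_channel, " s; iterator: ", t_iterator, " s")
```

For anything beyond a quick look, a dedicated benchmarking package and a realistic model would be more trustworthy than a single @elapsed call.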
> The ParticleContainer is the main entry point of any particle-based sampler in Turing. So whatever changes we make to it should be made carefully.
I will try to be careful, and we can do many tests :)
Also if I remember correctly, this was the reason why the Channel-based PyGen was abandoned by its author in favor of the iterator-based ResumableFunctions.
Currently, we require LibTask for particle-based samplers. This means that each Particle is a Task that has to be resumable and that can be duplicated in its current state. As a result, the ParticleContainer needs to use the producer-consumer pattern for now.
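The "resumable Task" part of this requirement can be illustrated with Base alone. In the sketch below, toy_model is a hypothetical model invented for illustration that hands back a running log-weight at each observation and stays suspended until the consumer asks for more. Duplicating the suspended task is deliberately not shown: it is the capability the thread attributes to Libtask, and this sketch does not attempt to reproduce it.

```julia
# Hypothetical toy model: it produces a value at each observation point
# and then waits until the consumer asks for the next one.
function toy_model(ch::Channel{Float64})
    logw = 0.0
    for obs in (0.5, 1.5, 2.5)
        logw -= obs^2 / 2    # stand-in for a log-likelihood term
        put!(ch, logw)       # yield point: blocks until the value is taken
    end
end

# The particle is backed by its own Task, bound to an unbuffered Channel.
particle = Channel{Float64}(toy_model)

w1 = take!(particle)   # run the model up to the first observation
w2 = take!(particle)   # resume it up to the second observation
```

After the second take!, the model is paused partway through its loop. A particle sampler would need to copy it in exactly that state when resampling, which is where plain iterators and Base tasks fall short on their own.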
After reading my post I think we should add more tests as you suggested. ;)
I'll try to explain the concept in more detail on a GitHub page which we can use to discuss how it works and what the alternatives would be.
Let's not focus on this issue for now.
Relevant, but I have yet to find the time to read it: http://proceedings.mlr.press/v32/paige14.pdf
As far as I remember, the current implementation of Turing is based on this paper + some further extensions to make it more flexible.
Ok, after reading the paper, I see what you mean. I still think an iterator can be used alongside LibTask to simplify the code a little. Let's discuss this further in a PR.
I should manage to write the promised GitHub page during next week. Let's discuss alternative approaches there once I have it done.
> Given the above, can't we just make ParticleContainer an iterator and instead of calling
> https://github.com/TuringLang/Turing.jl/blob/6e6197a6afe7e03ea91847f50804cea362736bee/src/samplers/pgibbs.jl#L70,
This is a good idea. We should try to do it.
> More conveniently, we can also define consume as a ResumableFunction and still call it with the for loop syntax above. Did I miss anything?
Libtask basically implements delimited continuations (see wiki and Continuations.jl) in Julia. ResumableFunction is mostly designed for finite state machines (FSMs). Since certain use cases of task copying cannot be efficiently represented as FSMs, I'm not sure whether ResumableFunction is flexible/general enough to replace Libtask.