Turing.jl: Completely eliminating `Real` from the model and pre-allocating

Created on 29 Jan 2019 · 24Comments · Source: TuringLang/Turing.jl

In this issue, I discuss a performance improvement that can be made after #660 goes in. Consider the following model:

@model vdemo() = begin
    x = Vector{Real}(undef, N)
    for i = 1:N
        x[i] ~ Normal(0, sqrt(4))
    end
end

There are 2 things in this model that can be improved:

Every time this model is called, the vector x is allocated. This is unnecessary in the case where the length of the vector does not change, and the memory is not modified elsewhere.
The eltype of x is not a concrete type.

As I explained in #660, in that PR, the need for Real inside TypedVarInfo was eliminated by creating a new TypedVarInfo of the correct autodiff eltypes before passing it to the model. However, #660 doesn't do anything to the Real in the model definition above which is passed by the user. This means that a call to the model is still not fully type stable which limits the speedup achievable.

My proposal for solving the 2 problems above is to have a new macro @init, such that the model definition can optionally be:

@model vdemo() = begin
    @init x = Vector{Real}(undef, N)
    for i = 1:N
        x[i] ~ Normal(0, sqrt(4))
    end
end

If the user does the above, what this signals to us is that:

The LHS is a parameter.
The length of x is constant for all the runs.
This line only needs to run in the initial model call with UntypedVarInfo input.
In the later runs, we can simply define x as x = vi.vis.x.vals, which will be a vector of the correct and concrete eltype.

The above lets us:

Allocate the vector only once when UntypedVarInfo is passed, and
Piggyback on the TypedVarInfo workaround to eliminate the use of Real inside the model entirely.

This can all be done at no run-time overhead since we can branch on the type of the input vi, which will be compiled away. Notice that this will be an optional feature, so it will be a non-breaking change, just a pure optimization feature. Feedback is appreciated.

Source

mohamed82008

Most helpful comment

The same technique above can be used to specialize Array types to TArray when a particle sampler is used. So we can have a very generic definition of the model that works correctly for HMC, particle and every other sampler!

mohamed82008 on 20 Jun 2019

🎉3

All 24 comments

Seems like a nice-to-have for the performance sensitive Turing user. Assuming this is built in, are there ways we can determine that x is fixed _without_ the @info macro, to get an occasional performance improvement?

cpfiffer on 29 Jan 2019

I don't understand the question @cpfiffer. What do you mean by fixed? This PR will mostly apply to Hamiltonian variables which IIUC need to be of fixed length. Variables sampled with non-Hamiltonian algorithms can be easily concretely typed since they only admit floats or integers. It is also currently possible to pre-allocate the parameter vectors by simply closing over the vector which can be defined before the model. Something like:

x = Vector{Float64}(undef, 10)
@model ....
    x[1] ~ Normal()
end

So really the main value from the improvement proposed in this issue is that Real will be replaced with the concrete type in all but the first sampling iteration.

Note: it might be possible to relax the fixed length constraint, but I need to play around with it to see.

mohamed82008 on 29 Jan 2019

👍1

Nice idea. As @init is mainly for dealing with the unstable Real for AD types, how about just rename it as @diff or something, and just let users write @diff x = Vector(undef, N) (i.e. further removing the Real in the syntax). I feel by doing so this syntax is more likely to be used by people and also make the modelling syntax easy to read (since differentiable parameters are explictedly declared)

xukai92 on 29 Jan 2019

I am open to syntax suggestions, but I also like the one word change that magically give us more performance. Also, even though pre-allocating is possible now, I think @init or the likes will make it more usable, since it requires no changes to the model except a single word. The @xyz approach also doesn't litter the including scope with names of pre-allocated parameters, so it will be more self-contained.

mohamed82008 on 29 Jan 2019

I don't understand the question @cpfiffer. What do you mean by fixed? This PR will mostly apply to Hamiltonian variables which IIUC need to be of fixed length.

I meant, in the case where we have a model like

@model vdemo() = begin
    x = Vector{Real}(undef, N)
    for i = 1:N
        x[i] ~ Normal(0, sqrt(4))
    end
end

Is there a way, given that @info (or @diff or something) is implemented, we can say that x is always of type Float64 and simply impose the same functionality as what is provided by @info.

It seems that it might be a nice thing if the model could identify places where the @info macro would help, and either just throw the functionality in or inform the user that it may be worthwhile to add it themselves.

cpfiffer on 29 Jan 2019

I am open to syntax suggestions, but I also like the one word change that magically give us more performance. Also, even though pre-allocating is possible now, I think @init or the likes will make it more usable, since it requires no changes to the model except a single word. The @xyz approach also doesn't litter the including scope with names of pre-allocated parameters, so it will be more self-contained.

It would be nice if we can do everything with just @model : ) Of course, this would need a deeper dive into Julia's AST.

cc @cscherrer

yebai on 29 Jan 2019

@yebai This may be possible if we can detect all expressions x = ... where x is a parameter, and replace it with:

x = vi isa TypedVarInfo && spl isa Sampler{<:Hamiltonian} && in(:x, getspace(spl)) ? vi.vis.x.vals : ...

But I am in favor of making this an opt-in behavior with a macro, with a clear documentation of the assumptions and consequences of using the macro.

Doing the above by default for the Hamiltonian is safe. Other algorithms which support variable length parameters may be less safe. In the latter case, the user may start with an empty vector and push to it during the sampling. If such a thing is supported (I remember seeing it somewhere) then the automatic transformation will be limited to Hamiltonian samplers. With the @init approach however, the user can use this trick for variables of any sampler if they satisfy the macro's contract.

@cpfiffer Any model where the whole vector is allocated once and never extended is a candidate for pre-allocation. Real is only essential now for the case of Hamiltonian variables to admit AD number types. Even if the idea proposed here is implemented, Real will still be used in the first iteration, then we can reduce it to Float64, Float32, etc in the second iteration onwards. I don't think the fixed length property will be easy to detect automatically. So I think documentation is the best way to educate people about such a feature if introduced.

mohamed82008 on 29 Jan 2019

@cpfiffer What we can do is have a debug mode that does extra checks like eltype(copy.(x)) === eltype(x) for all variables x which are not Hamiltonian variables, printing a warning message saying that x is abstractly typed and can be tightened further or something along those lines. For Hamiltonian variables, we can print the warning if @init is not used which we can detect by checking if vi isa TypedVarInfo && x !== vi.vis.x.vals. Note that the first check comes at extra runtime cost, so it should only be enabled by the user's request to "debug" the model.

mohamed82008 on 29 Jan 2019

How about the following?

 x = Parameters(Vector{Float64}(undef, N)) # detect `= Parameters()` in model macro

yebai on 30 Jan 2019

@mohamed82008

Every time this model is called, the vector x is allocated. This is unnecessary in the case where the length of the vector does not change, and the memory is not modified elsewhere.

These optimisations are not Turing specific and can be applied to all Julia programs. In general, I think we should refrain from optimising the code within Turing's compiler. This should be done at a language level (i.e. Julia), instead of a DSL level. So, I would suggest that we

encourage the user to take full advantage of Julia's power to write efficient functions.
talk to the JuliaLang team if some specific optimisation is beneficial

The eltype of x is not a concrete type.

I think this can be solved by allowing parametric types in model definitions, a similar way to parametric types in normal Julia functions? For example:

@model vdemo(;valType::T=1.0) where {T<:Real} = begin
    x = Vector{T}(undef, N)
    for i = 1:N
        x[i] ~ Normal(0, sqrt(4))
    end
end

# when Dual type is required
vdemo(valType=DualType(1.0))

# when Float type is required
vdemo(valType=1.0)

This can be automated by the @model macro to ensure easy use.

yebai on 15 Jun 2019

I like this direction of thinking. The allocation problem can be eliminated if the user uses StaticArrays.MArray instead of Base.Array, so I am not too worried about that. The type parameter solution is neat. Alternatively, T can be simply just passed in as a hyperparameter. I suggest we have another field in Model for hyperparameters much like data and params.

So the Turing user can define

@model vdemo(;T = Float64) = begin
    x = Vector{T}(undef, N)
    for i = 1:N
        x[i] ~ Normal(0, sqrt(4))
    end
end

where model = vdemo() will have a field model.hyper that is a named tuple of all the kwargs. The hyperparameters that are not types will be just unpacked inside the inner function. While the ones that are types and <: AbstractFloat will be passed to another function get_number_type(sampler, T). get_number_type will get the AD version of T for samplers requiring AD, otherwise it will return T. The AD number type can be known at compile time, if and only if the unpacking if statements and get_number_type are resolvable at compile time. Since we encode the AD backend in the HMC sampler type, this should be possible.

So the inner function of the model can look like:

function inner_function(sampler, vi, model)
    if model.hyper.T isa Type{<:AbstractFloat}
        T = get_number_type(sampler, model.hyper.T)
    else
        T = model.hyper.T
    end
    # The if statement above is repeated for all kwargs
    x = Vector{T}(undef, N)
    ....
end

Comments?

mohamed82008 on 16 Jun 2019

Both approaches look good to me. I think the type parameter approach is easier to implement and maintain, while passing T as a hyperparameter might be more flexible.

yebai on 16 Jun 2019

I have a preference for the type parameter method. It's a bit more Julian. It would also make the link between the types in the model more obvious for modelers who might have tried to allocate Vector{Float64}, which is always a bit of a confusing error for people to get.

cpfiffer on 16 Jun 2019

I think there are 2 distinct decisions to be made:

Do we want to pass numbers or types?
Do we want to pass them as positional or keyword arguments?

I don't think we should pass them as positional arguments not be confused with data that are typically associated with a line of ~ and become parameters when missing. Now as a kwarg, I don't see the merit of passing a value which doesn't matter since we are only interested in its type. So the type passing is more direct in my opinion. And if we are having the user pass the type directly, there is no point in having a type parameter as it just adds unnecessary complexity to the @model definition.

But how will any of these approaches affect the compiler? I think either way we have to operate on 2 levels:

Model construction through the outer function defined with @model, and
Model calling through the inner function wrapped in Turing.Model.

In the construction, we need to take T as input from the user through a value or directly. This can be Float64 or Float32 for example. But then this type needs to be specialized in the inner function when calling the Turing.Model with different samplers as input argument. Some samplers can use the T<:AbstractFloat directly if they require no AD and others require a transformed T. In fact, we can go one step further, and have the user pass TV <: AbstractArray{<:AbstractFloat} which is the type of vector parameter. This can then be specialized to use TrackedArray for example as opposed to Array{TrackedReal}. If we consider the array case, it makes even less sense to me to have the user pass a dummy array that we are only interested in its type.

If we decide to go the positional arguments way, we can have a type parameter approach that looks like this:

@model vdemo(::Type{T} = Float64) where {T} = begin
    x = Vector{T}(undef, N)
    for i = 1:N
        x[i] ~ Normal(0, sqrt(4))
    end
end

That's also possible but it adds some extra parsing complexity not to be confused with data variables that become parameters when missing. So as the most likely maintainer of this part of the code, I am not in favor of this option! We would also still need to operate on the 2 levels mentioned above, so that didn't go anywhere.

So that was my thought process when I recommended the new syntax. Comments?

mohamed82008 on 17 Jun 2019

On second thought, treating T as data or a positional argument could also work smoothly.

mohamed82008 on 17 Jun 2019

mohamed82008 on 20 Jun 2019

🎉3

I agree that passing types is sufficient here, but if we want to support parameter initialization (see https://github.com/TuringLang/Turing.jl/issues/571) and address missing values ( https://github.com/TuringLang/Turing.jl/issues/571), then passing value together with types provide a unified solution.

@model vdemo(
    w :: TV # model or hyper parameters
    x = Vector{TP}(undef, N) # uninitialised parameter if `TP<:Turing.ParameterType`
    z = Vector{TP}(Vector([1. 0. 1. 1.])) # initialised parameter when `TP<:Turing.ParameterType`
    m = Vector{TM}(undef, N) # missing values if `TM<:Turing.MissingValueType`
    y = Vector{TD}(Vector(zeros(4))) # trate y as data if `TD<:Turing.DataType`
) where {TP, TD, TM, TV} = begin
    # model block
    d = Vector{TP}(undef, N) # additional uninitialised parameter can be defined inside model def
end

This is slightly different from our current syntax, where parameters and data are specified implicitly. However, I suspect the above explicit syntax is beneficial for the longer term. In summary, this design uses Julia type system and multi-dispatching to handle the following issues:

sampling state initialization https://github.com/TuringLang/Turing.jl/issues/571
specify which variable contains missing data https://github.com/TuringLang/Turing.jl/issues/786 https://github.com/TuringLang/Turing.jl/issues/544 https://github.com/TuringLang/Turing.jl/issues/792
specify which variable contains data
eliminate the use of Real https://github.com/TuringLang/Turing.jl/issues/665
pre-allocate parameter arrays (implicitly via default values) https://github.com/TuringLang/Turing.jl/issues/665
specialize Array types to TArray when a particle sampler is used https://github.com/TuringLang/Turing.jl/issues/718#issuecomment-471795616
allow more sanity checking in the compiler (e.g. ~ macro can make type assertion of LHS)
improve consistency between Turing.Model and Distributions.Distribution.
a better resemblance with Stan's parameter, data blocks.

This is a relatively large chunk of changes and can be implemented gradually in a concerted effort.

Comments?

yebai on 20 Jun 2019

cc'ing @trappmartin @scheidan @cpfiffer @xukai92 @marcoct @willtebbutt

yebai on 20 Jun 2019

🎉1

This is a massively breaking change that will totally change the way people use Turing. It will allow more features, but it will require a significant amount of work to shake off all the edge cases. So before JuliaCon, I suggest we go with the previous proposal for the performant release, then we can work on this proposal, or its modified version after criticism, for the new features it brings. For initializing HMC, we can allow the user to define kwargs with the same name as the parameter variable. When this is done, we can treat this as the initial value of this parameter.

Now, the first thing I would change in your proposal is to have ParameterType as a wrapper of the vector of values as opposed to having it as the number type. This will make it easier to embed Turing models in larger programs since there will be no need for conversion by copying or reinterpreting, only doing Turing.Parameter(rand(2)) which is a light-wrapper over the values.

Secondly, making users explicitly and compulsorily specify the parameter variables in the model will make our models more clunky than they are now, and perhaps for the majority of use-cases this is a step down the usability ladder. We can think about which of these features can be made optional to bridge the gap between the current Turing modeling API and the one this proposal brings.

I prefer the separation of data and parameters for readability, so perhaps one as args and one as kwargs? I find it easier to read the model when I can see a ~ line, forget if this is a parameter or data variable, look up at the args of the model figure it out in an instant, then continue reading. This may be just me though.

I will ponder some more over this and let you know if I have more comments. This is probably befitting of a different issue to avoid mixing the discussions of this proposal with the discussion of the quick performance fix proposed earlier, that also happens to solve the TArray problem.

mohamed82008 on 21 Jun 2019

So before JuliaCon, I suggest we go with the previous proposal for the performant release, then we can work on this proposal, or its modified version after criticism, for the new features it brings.

Sorry for throwing so many ideas into this issue. Yes, we can implement these ideas separately and can focus on performance issues before JuliaCon. However, I thought it's useful to have some bigger picture in mind when addressing performance issues.

Secondly, making users explicitly and compulsorily specify the parameter variables in the model will make our models more clunky than they are now, and perhaps for the majority of use-cases this is a step down the usability ladder. We can think about which of these features can be made optional to bridge the gap between the current Turing modeling API and the one this proposal brings.

I didn't mean to introduce breaking changes to current Turing syntax. We can allow the user to define parameters and data in an implicit way, like before. But, say if the user wants to write more efficient programs, or want to initialise parameters in a certain way, or want to use missing values, then they have to use the syntax described above.

I prefer the separation of data and parameters for readability, so perhaps one as args and one as kwargs? I find it easier to read the model when I can see a ~ line, forget if this is a parameter or data variable, look up at the args of the model figure it out in an instant, then continue reading. This may be just me though.

We can encourage the user to place data as kwargs but probably shouldn't make this a rule. Since by doing that, the user would need to re-write the model when the user wants to treat certain parameters as data or vice versa.

yebai on 21 Jun 2019

@mohamed82008
With a second thought, the following design you proposed in https://github.com/TuringLang/Turing.jl/issues/665#issuecomment-502538150 might be a nice solution to the performance issue.

@model vdemo(::Type{T} = Float64) where {T} = begin
    x = Vector{T}(undef, N)
    for i = 1:N
        x[i] ~ Normal(0, sqrt(4))
    end
end

This keeps the convention that data should be in the parentheses but not model parameters. This design also requires minimal changes to the compiler. The ideas that I described in https://github.com/TuringLang/Turing.jl/issues/665#issuecomment-504124166 might be helpful for the long run. But it does require serious re-work of the compiler, which I now think needs more discussions before putting them into action. Let's do that through a group hackathon after JuliaCon.

yebai on 24 Jun 2019

👍2

@yebai all the proposals above require similar amounts of work on the compiler. The choice of syntax is irrelevant to the need to operate on the 2 levels mentioned above in https://github.com/TuringLang/Turing.jl/issues/665#issuecomment-502538150.

mohamed82008 on 24 Jun 2019

It is not too hard though, I think I can finish it in time. The PR should hopefully be up in 1-2 weeks.

mohamed82008 on 24 Jun 2019

I’m in favour of the last syntax proposal.

trappmartin on 24 Jun 2019

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Improve Turing website - turing.ml

yebai · 6Comments

Incorrect log evidence estimate

marcoct · 6Comments

Parameter values initialization does not speed up NUTS

ClaudMor · 4Comments

Program bug is silent for coroutines

xukai92 · 3Comments

Compiler performance

mohamed82008 · 4Comments