Turing.jl: Compiler performance

Created on 13 Sep 2018 · 4 Comments · Source: TuringLang/Turing.jl

I just finished reading the amusingly complicated and very dynamic compiler code. I am sure a lot of thought has gone into the current compiler design, but I have a few concerns and questions. So I am opening this issue to start a discussion on ways to improve the performance of Turing starting from its core, the compiler.

First, here is how I think the compiler works:

  • The DSL is used to define a model where all the positional arguments of the model are assumed to be data.

  • The macro expands into an "outer" function definition, with all the model's positional arguments as the outer function's positional arguments and with 2 additional keyword arguments: data::Dict and compiler.

  • The data can be passed to the outer function using the positional arguments or using the data Dict keyword argument. If a value is passed through both, the one in the data Dict is used. All other passed positional arguments are added to the data Dict. If one of the model's positional arguments is passed neither positionally nor through the data keyword argument, the function errors.

  • The outer function also defines an "inner" function which accepts vi::VarInfo and smpl::Union{Nothing, Sampler} as inputs.

  • This "inner" function is really defined globally with some fallback methods also defined globally.

  • Additionally, a global _compiler_ variable is defined to store information about the outer and inner functions. This information includes:
    1) The outer function name, which is also the model name,
    2) The inner function body,
    3) The fallback method bodies,
    4) The dvars, which are all the positional arguments of the model that are used on the LHS of a ~, and
    5) The pvars, which are all the variables used on the LHS of a ~ that are not positional arguments of the model.

  • The inner function body defines all keys of data as variables at the beginning.

  • Since it is possible to redefine a model with the same name and overwrite the global "inner" function and fallback methods, the sample function has to call this "inner" function through the type-unstable invokelatest. So one cannot simply do inner_function_name(vi, smpl).

  • The ~ macro is unwrapped inside the inner function body. The dvars and pvars are also determined inside the body of the inner function. This is one place where the global _compiler_ variable is used to share the vector of dvars and pvars.

  • The global _compiler_ variable is also used inside the sample functions to make sure that the space of the sampler is a subset of the pvars of the model.
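To make the flow above concrete, here is a rough sketch in plain Julia of two of the steps as I understand them: merging positional arguments into the data Dict, and calling an inner function that was defined with eval through invokelatest. The names outer_model, inner_model and define_and_call are made up for illustration, `nothing` is used as an assumed stand-in for "argument not passed", and none of this is the actual macro expansion.

```julia
# Hypothetical stand-in for the macro-generated outer function (not Turing's real expansion).
# `nothing` marks a positional argument that was not passed; this is an assumption.
function outer_model(x=nothing, y=nothing; data::Dict{Symbol,Any}=Dict{Symbol,Any}())
    merged = copy(data)
    for (name, value) in ((:x, x), (:y, y))
        # An entry already present in `data` takes precedence over the positional value.
        haskey(merged, name) && continue
        value === nothing && error("Data `$name` was not provided.")
        merged[name] = value
    end
    return merged
end

# Hypothetical stand-in for defining and calling the global inner function.
function define_and_call(vi)
    # `@eval` defines the method at global scope while this function is already running.
    f = @eval inner_model(vi) = vi + 1
    # The new method is "too new" for the running caller, so a plain `f(vi)` may raise a
    # world-age MethodError; `invokelatest` avoids that at the cost of a dynamic call.
    return Base.invokelatest(f, vi)
end
```

The dynamic dispatch in invokelatest is what prevents the compiler from inferring the return type at the call site, which is the type-stability concern raised in the list above.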

So assuming my understanding above is correct, here are a few things that don't make sense to me and I think can be improved.

  • I don't see a need to define the inner function globally. One can simply use the function handle returned by the outer function and pass that to the sampler. I don't think this inner function is used elsewhere in the code base. This will avoid the use of invokelatest inside the sample functions so they can become fully type stable.

  • I don't see a reason why the @~ macro exists and is then unwrapped inside the inner function as opposed to just unwrapping the whole thing as part of the @model macro. This will remove the use of the global _compiler_ in this place.

  • I think the use of _compiler_ in the sample functions can also be eliminated if pvars are made available as part of the inner function. For instance, one can define a callable struct like this:

struct CallableModel{F<:Function}
    f::F
    pvars::Vector{Symbol}
end
(c::CallableModel)(args...; kwargs...) = c.f(args...; kwargs...)

And instead of returning the inner function f from the outer function, we can return CallableModel(f, compiler[:pvars]). Then inside the sample function, we can replace Turing._compiler_[:pvars] with model.pvars. So with this and the previous solution, the use of _compiler_ can be eliminated entirely.

  • I think instead of defining the keys of the data Dict as variables in the inner body and evaling it in global scope, we can add the data field to the CallableModel struct and use that directly inside the inner function.
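As a minimal sketch of the last two points combined, assuming the data is stored as a NamedTuple and the inner function receives the model as an extra argument (both are my assumptions, not an existing Turing API), the struct could carry the data and support the sampler-space check directly. The names CallableModelWithData, inner and space_is_valid are hypothetical.

```julia
# Hypothetical variant of CallableModel that also stores the data.
struct CallableModelWithData{F<:Function,D<:NamedTuple}
    f::F
    pvars::Vector{Symbol}
    data::D
end

# Forward the call and hand the model itself to `f`, so `f` can read `model.data`
# instead of relying on variables defined through eval in global scope.
(m::CallableModelWithData)(vi, smpl) = m.f(vi, smpl, m)

# Hypothetical inner function: `vi` is just a number here, not a real VarInfo.
function inner(vi, smpl, model)
    x = model.data.x          # data is read from the struct, not from a global
    return vi + sum(x)
end

# Replacement for the check against the global: the sampler space must be a subset of pvars.
space_is_valid(model, space) = issubset(space, model.pvars)
```

With this layout, a call like model(vi, smpl) is an ordinary method call on a concretely typed struct, so no invokelatest is needed and the field types are known to the compiler.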

I hope these questions and ideas are useful! Did I miss something?

discussion refactoring speed-up


All 4 comments

I'm currently working on a major refactoring of the compiler, see PR #513.

I had some similar ideas on how to improve the current design, but they would go too far for the current refactoring. Therefore, I think we should have further discussions after #513 is finished and merged into master. The refactored compiler will make it easier to make changes and write unit tests.

Yes, I took a very brief look at that PR, but I think it doesn't tackle the main performance issues: defining global variables, defining the inner function using eval, and the other things mentioned above. So I think most of the points here are relevant even after #513 lands. I can do an experimental Compiler 2.0 after you finish your PR and see how much speed we can gain by just applying the changes proposed above. I reckon a lot. Currently, neither the inner function nor the sample functions are fully type stable because of the global variables, so we are very far from top Julia speeds.
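To illustrate the point about global variables in general terms (this is a toy example, not Turing code, and the names scale_factor, scaled_global and scaled_arg are made up), type inference gives up on a non-const global but succeeds when the same value is passed as an argument:

```julia
# Non-const global: its type may change at any time, so the compiler cannot specialize on it.
scale_factor = 2.0
scaled_global(x) = scale_factor * x

# Same computation with the value passed in as an argument.
scaled_arg(x, s) = s * x

# Ask the compiler which return types it can infer for each version.
inferred_global = Base.return_types(scaled_global, (Float64,))
inferred_arg = Base.return_types(scaled_arg, (Float64, Float64))
```

Here first(inferred_arg) is Float64, while the version that reads the global is not inferred as Float64. Running @code_warntype on either function shows the same thing interactively.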

I completely agree!

@mohamed82008 I completely agree regarding the CallableModel struct and using something like it to remove the need for the global _compiler_ variable. Will make more comments later :)
