Turing.jl: Improving loading and first execution times

Created on 26 Dec 2020 · 6 Comments · Source: TuringLang/Turing.jl

I am opening this issue in response to a discussion we had on Slack regarding loading and first execution times in Turing. Although absolute loading times have improved in newer versions of Julia, Turing still takes about 2-3 times as long to load as Plots.

Loading Times

Julia 1.5.3

@time using Plots
  6.639685 seconds (12.61 M allocations: 797.037 MiB, 4.42% gc time)
@time using Turing
 16.071072 seconds (40.04 M allocations: 2.060 GiB, 4.11% gc time)

The absolute load times are better on nightly, but Turing is still relatively slow.

Nightly

@time using Plots
  2.841848 seconds (5.80 M allocations: 429.191 MiB, 5.07% gc time, 25.36% compilation time)
@time using Turing
  9.306305 seconds (16.28 M allocations: 988.606 MiB, 3.80% gc time, 54.65% compilation time)

First Execution Time

Here are typical first execution times for the coinflip model described in the documentation.
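For reference, the setup behind these timings can be sketched as follows. The issue does not state the data or the sampler settings, so `p_true`, the data length, `ϵ`, and `τ` below are assumptions modelled on the coin-flip tutorial; only `iterations = 1000` is implied by the 1000-row chain in the output.

```julia
using Turing
using Random

Random.seed!(12)

# Assumed values: the issue does not list them.
p_true = 0.5                          # assumed true probability of heads
data = rand(Bernoulli(p_true), 100)   # assumed: 100 simulated coin flips
iterations = 1000                     # matches the 1000-row chain shown in the issue
ϵ = 0.05                              # assumed leapfrog step size
τ = 10                                # assumed number of leapfrog steps

# Coin-flip model: a Beta(1, 1) prior on p and Bernoulli observations.
@model function coinflip(y)
    p ~ Beta(1, 1)
    for n in eachindex(y)
        y[n] ~ Bernoulli(p)
    end
end

# The first call includes compilation, which is what the issue measures.
@time chain = sample(coinflip(data), HMC(ϵ, τ), iterations; progress=false)
```

Running the `@time` line a second time in the same session gives the steady-state cost, which makes the compilation share of the first call easy to see.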

Julia 1.5.3

@time chain = sample(coinflip(data), HMC(ϵ, τ), iterations; progress=false)
 10.453679 seconds (42.26 M allocations: 1.876 GiB, 5.53% gc time)
Chains MCMC chain (1000×10×1 Array{Float64,3}):

Nightly

@time chain = sample(coinflip(data), HMC(ϵ, τ), iterations; progress=false)
 18.583285 seconds (74.80 M allocations: 4.053 GiB, 6.13% gc time, 93.57% compilation time)

Nearly all of the first execution time is attributable to compilation. Note that the improvement in loading time on nightly is offset by the increased first execution time (about 18.6 seconds versus 10.5 seconds on Julia 1.5.3). There is also a lag of about 6-7 seconds before the first chain is printed, but this seems to have improved quite a bit in recent versions.

Reducing these times would be greatly appreciated, even if it is only by 2-3 seconds.

Versions

(@v1.5) pkg> st Turing Plots
Status `~/.julia/environments/v1.5/Project.toml`
  [91a5bcdd] Plots v1.9.1
  [fce5fe82] Turing v0.15.7

All 6 comments

I agree, this should definitely be improved. However, it's a bit difficult to comment without more detailed benchmarks and without knowing which parts cause these excessive compilation times. I assume a large part is caused by DistributionsAD and its hacks and type piracy. It would be interesting to see if it causes invalidations (maybe some other parts cause them as well?). Can you check the compilation and loading times of DistributionsAD?

IIRC in the Slack discussion it was mentioned that dropping DynamicHMC and switching to a more lightweight MLE/MAP implementation would allow us to drop Requires. That is actually not true, since Requires is also used to optionally load AD support for Zygote and ReverseDiff. The optimization backend is supposed to be changed from Optim to GalacticOptim to make it more generally applicable, but I am not sure how much this will help: we will have to depend at least on DiffEqBase, which itself depends on quite a lot of packages (there is an issue in DiffEqBase to fix this and/or extract some parts to a more lightweight package: https://github.com/SciML/DiffEqBase.jl/issues/618). And even if Requires could be removed from Turing, other packages such as DistributionsAD still make heavy use of it...

Regardless of the loading times, maybe it would be good to move some additional parts (such as DynamicHMC integration) to separate packages to make the setup more modular.

Hi devmotion-

Here are the requested load times for DistributionsAD:

Julia 1.5.3

```
@time using DistributionsAD
2.845870 seconds (5.87 M allocations: 326.353 MiB, 1.62% gc time)
```

Nightly

```
@time using DistributionsAD
2.632951 seconds (5.31 M allocations: 325.930 MiB, 2.58% gc time, 66.49% compilation time)
```

The improvement for nightly is small. I am not sure if that is expected. Let me know if there is any more info I can provide. Thanks for looking into this.

Thanks!

I think it would be even more realistic to check how much time it takes to load DistributionsAD + Tracker (+ ForwardDiff), since Turing loads both AD backends but DistributionsAD only depends on ForwardDiff and puts Tracker support in a @require block. IIRC Requires mostly affects loading times if the optional dependency is loaded.

Is this what you are looking for?

Julia 1.5.3

@time using DistributionsAD, Tracker, ForwardDiff
  9.621758 seconds (22.21 M allocations: 1.217 GiB, 3.77% gc time)

Nightly

@time using DistributionsAD, ForwardDiff, Tracker
  6.568642 seconds (14.62 M allocations: 916.992 MiB, 5.09% gc time, 59.66% compilation time)

Yes, so it seems most of the time is spent loading and compiling DistributionsAD and the AD backends, which can't be fixed in Turing itself.
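Since packages that are already loaded cost nothing to load again, comparisons like the ones above are only meaningful when each combination is timed in a fresh session. A minimal sketch of how that could be automated with the standard library alone; `load_time` is a hypothetical helper, not part of any package, and the figures it returns include Julia's own startup time.

```julia
# Hypothetical helper: time `using <pkgs...>` in a fresh Julia process,
# so packages loaded earlier in this session cannot skew the measurement.
# The child process uses the default environment unless JULIA_PROJECT is set.
function load_time(pkgs::Vector{String})
    code = "using " * join(pkgs, ", ")
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    t0 = time()
    ok = success(cmd)            # false if, e.g., a package is not installed
    elapsed = time() - t0
    return ok ? elapsed : NaN    # NaN marks a failed load
end

# Package combinations discussed in this thread.
combos = [
    ["DistributionsAD"],
    ["DistributionsAD", "ForwardDiff"],
    ["DistributionsAD", "ForwardDiff", "Tracker"],
]

for combo in combos
    println(join(combo, " + "), ": ", round(load_time(combo); digits=2), " s")
end
```

Subtracting the time of an empty session (for example a run with `-e ""`) would give a rough estimate of the load time alone.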

I had a quick look at how to analyze method invalidations but I am not sure how to interpret and fix the results yet. On Julia nightly I ran

julia> ] add SnoopCompileCore SnoopCompile Tracker ForwardDiff DistributionsAD#master

julia> using SnoopCompileCore

julia> invalidations = @snoopr begin
           using DistributionsAD, ForwardDiff, Tracker
       end
1701-element Vector{Any}:
...

julia> using SnoopCompile

julia> length(uinvalidated(invalidations))
510

julia> trees = invalidation_trees(invalidations)
17-element Vector{SnoopCompile.MethodInvalidations}:
...

That is, as far as I can see, loading DistributionsAD, ForwardDiff, and Tracker invalidates 510 unique method instances. I'll try to debug this more closely. Unfortunately, it was not possible to perform the same analysis with Turing, since Libtask_jll does not support Julia >= 1.6.
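A possible next step is to look at which method definitions trigger the most invalidations. The following is a minimal sketch, assuming the SnoopCompile API used in the session above (`@snoopr`, `invalidation_trees`) plus `filtermod`; later SnoopCompile releases may have renamed parts of it. It has to run in a fresh session, because invalidations are only recorded while the packages are being loaded.

```julia
using SnoopCompileCore

# Record invalidations while the packages load (fresh session required).
invalidations = @snoopr using DistributionsAD, ForwardDiff, Tracker

using SnoopCompile

trees = invalidation_trees(invalidations)

# SnoopCompile documents the list as sorted so that the last entry is the
# method definition responsible for the most invalidations.
worst = trees[end]
println("Most consequential method definition: ", worst.method)

# Restrict the list to methods defined in DistributionsAD itself.
own = filtermod(DistributionsAD, trees)
println(length(own), " of ", length(trees), " trees come from DistributionsAD")
```

If most trees come from other modules, the invalidations would have to be fixed upstream rather than in DistributionsAD.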

Tim Holy's tutorial on precompilation might be helpful. In case it is, you can find it here.
