Excerpt from slack discussions
@goedman
Not sure how often this channel is used but I wonder if Turing has a way to retrieve the model portion passed into the
@model ...macro. Before actually running the sampler I would like to compute the maximum a posterior (as is done in McElreath鈥檚 book StatisticalRethinking, I would use e.g. Optim.jl).
@mohamed82008
using Turing
@model gdemo(x, y) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x ~ Normal(m, sqrt(s))
y ~ Normal(m, sqrt(s))
return s, m
end
x = 1.5
y = 2.0
model = gdemo(x, y)
vi = Turing.VarInfo()
model(vi, Turing.SampleFromPrior())function nlogp(sm)
vi.vals .= sm
model(vi, Turing.SampleFromPrior())
-vi.logp
end
using Optim
sm_0 = Float64.(vi.vals)
lb = [0.0, -Inf]
ub = [Inf, Inf]
result = optimize(nlogp, lb, ub, sm_0, Fminbox())
Related: #421 #72
@cpfiffer could you add some doc (perhaps under internals section) on this topic?
Hi @yebai I hesitated when I asked the question to switch to Slack. If you prefer that folks ask these questions using Github issues, please let me know.
@goedman No worries, either Slack / GitHub works! I moved this discussion here because it's something we want to solve in the near future.
I'll add a little section about this behavior.
I have typed up something very quick, which mostly just liberally copies @mohamed82008's great little hack. Once I send up a pull request I'll probably stick it at the bottom of the Advanced section. Any comments to add in here?
Maximum a Posteriori Estimation
Turing does not currently have built-in methods for calculating the maximum a posteriori (MAP) for a model. This is a goal for Turing's implementation, but for the moment, we present here a method for estimating the MAP using Optim.jl.
using Turing # Define the simple gdemo model. @model gdemo(x, y) = begin s ~ InverseGamma(2,3) m ~ Normal(0,sqrt(s)) x ~ Normal(m, sqrt(s)) y ~ Normal(m, sqrt(s)) return s, m end # Define our data points. x = 1.5 y = 2.0 # Set up the model call, sample from the prior. model = gdemo(x, y) vi = Turing.VarInfo() model(vi, Turing.SampleFromPrior()) # Define a function to optimize. function nlogp(sm) vi.vals .= sm model(vi, Turing.SampleFromPrior()) -vi.logp end # Import Optim.jl. using Optim # Create a starting point, call the optimizer. sm_0 = Float64.(vi.vals) lb = [0.0, -Inf] ub = [Inf, Inf] result = optimize(nlogp, lb, ub, sm_0, Fminbox())
Thank you Cameron ( @cpfiffer ).
I have been experimenting a bit with creating a function for the lower part. Would you mind taking a quick peek as a sanity check?
using Turing, Optim
function maximum_a_posteriori(model, lower_bound, upper_bound)
vi = Turing.VarInfo()
model(vi, Turing.SampleFromPrior())
# Define a function to optimize.
function nlogp(sm)
vi.vals .= sm
model(vi, Turing.SampleFromPrior())
-vi.logp
end
start_value = Float64.(vi.vals)
optimize((v)->nlogp(v), lower_bound, upper_bound, start_value, Fminbox())
end
# Define the simple gdemo model.
@model gdemo(x, y) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x ~ Normal(m, sqrt(s))
y ~ Normal(m, sqrt(s))
return s, m
end
# Define our data points.
x = 1.5
y = 2.0
# Set search bounds
lb = [0.0, -Inf]
ub = [Inf, Inf]
model = gdemo(x, y)
result = maximum_a_posteriori(model, lb, ub)
println()
result.minimizer
which returns:
julia> include("/Users/rob/.julia/dev/StatisticalRethinking/chapters/02/map/map05.jl")
loaded
[ Info: Assume - `s` is a parameter
[ Info: Assume - `m` is a parameter
[ Info: Observe - `x` is an observation
[ Info: Observe - `y` is an observation
2-element Array{Float64,1}:
0.9074074076939405
1.1666666666666472
I assume the "loaded" comes from the Turing compiler?
Looks sane to me! The loaded line I believe comes from Julia when it loads some package.
Looks like this was removed from the docs. Has MAP been added to Turing yet? Cool to see how easy it is to implement with Optim though.
Did you use it? I can easily put it back into the TuringModels.jl repo. Having CmdStan, Mamba, DynamicHMC and Turing all in a single project became simply too slow to work on. Let me know.
I typically use a log likelihood function + Optim. But getting it as rock-solid as in quap() in rethinking is tricky w.r.t. the initial values, in e.g. 04/clip-24-29s.jl and 10/m10.02d1.
In fact, I should put it back!
Strange. I seem to recall having put the MAP guide into the docs, but the history shows it never actually made it in. I'll add it (back?) to the docs section, at least until we have a more formal implementation for MAP estimation.
Hey @goedman, I hadn't used it before. I've been going through McElreath's book with R and thought it would be fun to reproduce it in Julia. Just stumbled across this issue after some searching. I think the DynamicHMC examples are really interesting to see.
Hi @joshualeond, thank you.
I too have been very impressed by the performance and how to formulate the models in DynamicHMC. In fact I've started to use the logpdf formulations more often in CmdStan. At least one model in DynamicHMC do still run into (AD?) problems (m10.4d.jl is not correct).
I can't wait until (maybe early next year?) there are 2 additional (DynamicHMC and Turing) pure Julia options available for MCMC!
I did add the maximum_a_posterior example back in m2.1t.jl and in the master docs for TuringModels. I hadn't realized Cameron had documented it as well.
Most helpful comment
I'll add a little section about this behavior.