Turing.jl: Error when running example on missing values

Created on 19 Nov 2020  路  30Comments  路  Source: TuringLang/Turing.jl

Question on: https://github.com/TuringLang/Turing.jl/edit/master/docs/_docs/using-turing/guide.md

Using the exact code from the docs:

@model function gdemo(x)
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))
    end
end

# x[1] is a parameter, but x[2] is an observation
model = gdemo([missing, 2.4])
c = sample(model, HMC(0.01, 5), 500)

I get this error:

BoundsError: attempt to access 0-element Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1} at index [1]

Stacktrace:
 [1] getindex at ./array.jl:809 [inlined]
 [2] macro expansion at /home/brian/.julia/packages/DynamicPPL/YDJiG/src/varinfo.jl:705 [inlined]
 [3] _link!(::NamedTuple{(:s, :m, :x),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:s,Tuple{}},Int64},Array{InverseGamma{Float64},1},Array{DynamicPPL.VarName{:s,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:m,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}}, ::DynamicPPL.VarInfo{NamedTuple{(:s, :m, :x),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:s,Tuple{}},Int64},Array{InverseGamma{Float64},1},Array{DynamicPPL.VarName{:s,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:m,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}, ::NamedTuple{(:s, :m, :x),Tuple{Array{DynamicPPL.VarName{:s,Tuple{}},1},Array{DynamicPPL.VarName{:m,Tuple{}},1},Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1}}}, ::Val{()}) at /home/brian/.julia/packages/DynamicPPL/YDJiG/src/varinfo.jl:699
 [4] link!(::DynamicPPL.VarInfo{NamedTuple{(:s, :m, :x),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:s,Tuple{}},Int64},Array{InverseGamma{Float64},1},Array{DynamicPPL.VarName{:s,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:m,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64}, ::DynamicPPL.Sampler{HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.UnitEuclideanMetric},Turing.Inference.HMCState{DynamicPPL.VarInfo{NamedTuple{(:s, :m, :x),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:s,Tuple{}},Int64},Array{InverseGamma{Float64},1},Array{DynamicPPL.VarName{:s,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:m,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},AdvancedHMC.StaticTrajectory{AdvancedHMC.EndPointTS,AdvancedHMC.Leapfrog{Float64}},AdvancedHMC.Adaptation.NoAdaptation,AdvancedHMC.PhasePoint{Array{Float64,1},AdvancedHMC.DualValue{Float64,Array{Float64,1}}}}}) at /home/brian/.julia/packages/DynamicPPL/YDJiG/src/varinfo.jl:697
 [5] sample_init!(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#64#65",(:x,),(),(),Tuple{Array{Union{Missing, Float64},1}},Tuple{}}, ::DynamicPPL.Sampler{HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.UnitEuclideanMetric},Turing.Inference.HMCState{DynamicPPL.VarInfo{NamedTuple{(:s, :m, :x),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:s,Tuple{}},Int64},Array{InverseGamma{Float64},1},Array{DynamicPPL.VarName{:s,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:m,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},AdvancedHMC.StaticTrajectory{AdvancedHMC.EndPointTS,AdvancedHMC.Leapfrog{Float64}},AdvancedHMC.Adaptation.NoAdaptation,AdvancedHMC.PhasePoint{Array{Float64,1},AdvancedHMC.DualValue{Float64,Array{Float64,1}}}}}, ::Int64; verbose::Bool, resume_from::Nothing, init_theta::Nothing, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/brian/.julia/packages/Turing/yvxUH/src/inference/hmc.jl:137
 [6] sample_init! at /home/brian/.julia/packages/Turing/yvxUH/src/inference/hmc.jl:122 [inlined]
 [7] mcmcsample(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#64#65",(:x,),(),(),Tuple{Array{Union{Missing, Float64},1}},Tuple{}}, ::DynamicPPL.Sampler{HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.UnitEuclideanMetric},Turing.Inference.HMCState{DynamicPPL.VarInfo{NamedTuple{(:s, :m, :x),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:s,Tuple{}},Int64},Array{InverseGamma{Float64},1},Array{DynamicPPL.VarName{:s,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:m,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},AdvancedHMC.StaticTrajectory{AdvancedHMC.EndPointTS,AdvancedHMC.Leapfrog{Float64}},AdvancedHMC.Adaptation.NoAdaptation,AdvancedHMC.PhasePoint{Array{Float64,1},AdvancedHMC.DualValue{Float64,Array{Float64,1}}}}}, ::Int64; progress::Bool, progressname::String, callback::AbstractMCMC.var"#20#23", chain_type::Type{T} where T, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/brian/.julia/packages/AbstractMCMC/iOkTf/src/sample.jl:74
 [8] sample(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#64#65",(:x,),(),(),Tuple{Array{Union{Missing, Float64},1}},Tuple{}}, ::DynamicPPL.Sampler{HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.UnitEuclideanMetric},Turing.Inference.HMCState{DynamicPPL.VarInfo{NamedTuple{(:s, :m, :x),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:s,Tuple{}},Int64},Array{InverseGamma{Float64},1},Array{DynamicPPL.VarName{:s,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:m,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},AdvancedHMC.StaticTrajectory{AdvancedHMC.EndPointTS,AdvancedHMC.Leapfrog{Float64}},AdvancedHMC.Adaptation.NoAdaptation,AdvancedHMC.PhasePoint{Array{Float64,1},AdvancedHMC.DualValue{Float64,Array{Float64,1}}}}}, ::Int64; chain_type::Type{T} where T, resume_from::Nothing, progress::Bool, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/brian/.julia/packages/Turing/yvxUH/src/inference/Inference.jl:179
 [9] sample(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#64#65",(:x,),(),(),Tuple{Array{Union{Missing, Float64},1}},Tuple{}}, ::DynamicPPL.Sampler{HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.UnitEuclideanMetric},Turing.Inference.HMCState{DynamicPPL.VarInfo{NamedTuple{(:s, :m, :x),Tuple{DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:s,Tuple{}},Int64},Array{InverseGamma{Float64},1},Array{DynamicPPL.VarName{:s,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:m,Tuple{}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:m,Tuple{}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}},DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}}},Float64},AdvancedHMC.StaticTrajectory{AdvancedHMC.EndPointTS,AdvancedHMC.Leapfrog{Float64}},AdvancedHMC.Adaptation.NoAdaptation,AdvancedHMC.PhasePoint{Array{Float64,1},AdvancedHMC.DualValue{Float64,Array{Float64,1}}}}}, ::Int64) at /home/brian/.julia/packages/Turing/yvxUH/src/inference/Inference.jl:178
 [10] sample(::Random._GLOBAL_RNG, ::DynamicPPL.Model{var"#64#65",(:x,),(),(),Tuple{Array{Union{Missing, Float64},1}},Tuple{}}, ::HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.UnitEuclideanMetric}, ::Int64; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/brian/.julia/packages/Turing/yvxUH/src/inference/Inference.jl:165
 [11] sample at /home/brian/.julia/packages/Turing/yvxUH/src/inference/Inference.jl:165 [inlined]
 [12] #sample#1 at /home/brian/.julia/packages/Turing/yvxUH/src/inference/Inference.jl:155 [inlined]
 [13] sample(::DynamicPPL.Model{var"#64#65",(:x,),(),(),Tuple{Array{Union{Missing, Float64},1}},Tuple{}}, ::HMC{Turing.Core.ForwardDiffAD{40},(),AdvancedHMC.UnitEuclideanMetric}, ::Int64) at /home/brian/.julia/packages/Turing/yvxUH/src/inference/Inference.jl:155
 [14] top-level scope at In[24]:11
 [15] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091

So I suppose either missing value handling is broken or the docs are out of date?

Or maybe my Julia version is too new?

Turing v.0.14.11
Julia 1.5.3

All 30 comments

This is really strange. I'm on Julia 1.5.1 and the above _works_ but doesn't actually produce samples for x[1]. Have we dropped support for mixing observations and missing recently? @devmotion @cpfiffer

Let me rephrase that: DynamicPPL.jl is doing the right thing, e.g.

julia> # x[1] is a parameter, but x[2] is an observation
       model = gdemo([missing, 2.4]);

julia> var_info = Turing.VarInfo(model);

julia> var_info.metadata.x
DynamicPPL.Metadata{Dict{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},Int64},Array{Normal{Float64},1},Array{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},1},Array{Float64,1},Array{Set{DynamicPPL.Selector},1}}(Dict{DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}},Int64}(x[1] => 1), DynamicPPL.VarName{:x,Tuple{Tuple{Int64}}}[x[1]], UnitRange{Int64}[1:1], [-1.3576106773798338], Normal{Float64}[Normal{Float64}(渭=0.8828819868130773, 蟽=1.7668934766490565)], Set{DynamicPPL.Selector}[Set()], [0], Dict{String,BitArray{1}}("del" => [0],"trans" => [0]))

But Turing.jl messes up when bundling the chains, it seems:

julia> names(sample(model, HMC(0.01, 5), 500), :parameters)
2-element Array{Symbol,1}:
 :m
 :s

@torfjelde What Turing version are you on? I'll try initializing a new Julia environment to match yours to see if the issue is 1.5.3 or not.

(TestTuring) pkg> st
Project TestTuring v0.1.0
Status `/tmp/TestTuring/Project.toml`
  [fce5fe82] Turing v0.14.11

So should be the same as you.

Excuse me, what?

julia> using Turing

julia> @model function gdemo(x)
           s ~ InverseGamma(2, 3)
           m ~ Normal(0, sqrt(s))
           for i in eachindex(x)
               x[i] ~ Normal(m, sqrt(s))
           end
       end
gdemo (generic function with 1 method)

julia> # x[1] is a parameter, but x[2] is an observation
       model = gdemo([missing, 2.4]);

julia> keys(Turing.VarInfo(model).metadata) # Yes
(:s, :m, :x)

julia> keys(Turing.VarInfo(model).metadata) # NO!
(:s, :m)

julia> keys(Turing.VarInfo(model).metadata) # NO!
(:s, :m)

julia> keys(Turing.VarInfo(model).metadata) # NO!
(:s, :m)

There's something very strange going on here.

Ah, think I'm on to something here:

julia> model
DynamicPPL.Model{var"#1#2",(:x,),(),(),Tuple{Array{Union{Missing, Float64},1}},Tuple{}}(:gdemo, var"#1#2"(), (x = Union{Missing, Float64}[-3.0877422804870696, 2.4],), NamedTuple())

It seems like we end up mutating the argument rather than a copy of it. So in subsequent calls, no elements of x will be missing. That is not correct.

I am pretty sure this was correct before. I was calling deepcopy.

Yeah it's definitively a recent introduction.

Try a deepcopy here in DPPL.

function matchingvalue(sampler, vi, value)
    T = typeof(value)
    if hasmissing(T)
        return get_matching_type(sampler, vi, T)(value)
    else
        return value
    end
end

You mean like

function matchingvalue(sampler, vi, value)
    T = typeof(value)
    if hasmissing(T)
        return get_matching_type(sampler, vi, T)(value)
    else
        return deepcopy(value)
    end
end

?

get_matching_type(sampler, vi, T)(deepcopy(value))

You beautiful man @mohamed82008 :heart:

julia> using Turing
[ Info: Precompiling Turing [fce5fe82-541a-59a6-adf8-730c64b5f9a0]

julia> @model function gdemo(x)
           s ~ InverseGamma(2, 3)
           m ~ Normal(0, sqrt(s))
           for i in eachindex(x)
               x[i] ~ Normal(m, sqrt(s))
           end
       end
gdemo (generic function with 1 method)

julia> # x[1] is a parameter, but x[2] is an observation
       model = gdemo([missing, 2.4]);

julia> keys(Turing.VarInfo(model).metadata) # Yes
(:s, :m, :x)

julia> keys(Turing.VarInfo(model).metadata) # NO!
(:s, :m)

julia> keys(Turing.VarInfo(model).metadata) # NO!
(:s, :m)

julia> function DynamicPPL.matchingvalue(sampler, vi, value)
           T = typeof(value)
           if DynamicPPL.hasmissing(T)
               return convert(DynamicPPL.get_matching_type(sampler, vi, T), deepcopy(value))
           else
               return value
           end
       end

julia> # x[1] is a parameter, but x[2] is an observation
       model = gdemo([missing, 2.4]);

julia> keys(Turing.VarInfo(model).metadata) # Yes
(:s, :m, :x)

julia> keys(Turing.VarInfo(model).metadata) # Yes!
(:s, :m, :x)

julia> keys(Turing.VarInfo(model).metadata) # YES!
(:s, :m, :x)

Now the strange thing is that the only recent change we made is this one: https://github.com/TuringLang/DynamicPPL.jl/commit/8c9004eca82633751af1c8b0bd9fd6fea78cd266#diff-365fc3b77837a3d3be094efbadc976064a26a40534c0adfb9ca65dc98819134e.

It seems like the error occured because:

julia> a = [1.0, 2.0]
2-element Array{Float64,1}:
 1.0
 2.0

julia> T = typeof(a)
Array{Float64,1}

julia> b = T(a)
2-element Array{Float64,1}:
 1.0
 2.0

julia> a[1] = 3.
3.0

julia> a
2-element Array{Float64,1}:
 3.0
 2.0

julia> b
2-element Array{Float64,1}:
 1.0
 2.0

versus

julia> a = [1.0, 2.0]
2-element Array{Float64,1}:
 1.0
 2.0

julia> b = convert(typeof(a), a)
2-element Array{Float64,1}:
 1.0
 2.0

julia> a[1] = 3.
3.0

julia> b
2-element Array{Float64,1}:
 3.0
 2.0

Very subtle stuff. So it's actually a bit unclear whether the change in the above commit is a good idea? @devmotion @mohamed82008 Thoughts?

So @bgroenks96 if you copy-paste this:

function DynamicPPL.matchingvalue(sampler, vi, value)
    T = typeof(value)
    if DynamicPPL.hasmissing(T)
        return convert(DynamicPPL.get_matching_type(sampler, vi, T), deepcopy(value))
    else
        return value
    end
end

after using Turing, then you should be good for now. We'll get a new version out of DynamicPPL.jl ASAP.

We can just revert that change and write a comment to explain this.

Hmm I see. We can define a _convert that copies for non-TArrays then.

DynamicPPL should not depend on Libtask.

Then let's convert and check if the output is === the input. If it's the same, copy/deepcopy the output.

Hmm it feels a bit suboptimal, doesn't it? Often it should be fine to not copy the input.

No we always need to copy in the missing branch.

Ah wait, this branch is just used if some missing is involved.

BTW looking at this implementation and thinking about our desire for a simpler API: is it actually useful to have both matching_type and matching_value? Wouldn't it be sufficient to implement only matching_value in downstream packages such as Turing and define the defaults matching_value(spl, vi, value) = value and matching_value(spl, vi, value::AbstractArray{>:Missing}) = deepcopy(value) in DynamicPPL?

@torfjelde Thank you! :clap: This seems to do the trick.

I'm not sure why I got the error and you didn't... but the issue is resolved either way!

Wouldn't it be sufficient to implement only matching_value in downstream packages such as Turing and define the defaults matching_value(spl, vi, value) = value and matching_value(spl, vi, value::AbstractArray{>:Missing}) = deepcopy(value) in DynamicPPL?

We need to convert the type for AD. matchignvalue is internal so Turing should not define it. The current API here seems fine to me.

The suggestion was to define a default implementation in DynamicPPL and to specialize it in Turing for samplers owned by Turing if needed. Basically like currently done for matching_type.

This won't simplify the API, it will just change it.

It's only one method where currently we have two?

But only 1 of them is overloaded. The other is internal.

So @bgroenks96 if you copy-paste this:

function DynamicPPL.matchingvalue(sampler, vi, value)
    T = typeof(value)
    if DynamicPPL.hasmissing(T)
        return convert(DynamicPPL.get_matching_type(sampler, vi, T), deepcopy(value))
    else
        return value
    end
end

after using Turing, then you should be good for now. We'll get a new version out of DynamicPPL.jl ASAP.

This fix indeed works, I have to use it now too otherwise I get the same error in my model with missing values it seems.

@mohamed82008 @devmotion Any updates on getting this fix into the official package?

Was this page helpful?
0 / 5 - 0 ratings