Turing.jl: Running model without data returns error

Created on 15 Sep 2018 · 31Comments · Source: TuringLang/Turing.jl

using Turing

@model gdemo(x) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x[1] ~ Normal(m, sqrt(s))
  x[2] ~ Normal(m, sqrt(s))
  return s, m
end

g = gdemo([1.5, 2]) # This works
g()           # returns: (0.8660317849113424, 0.820409281237518)

g = gdemo() # Returns an error.
g()

Here is the error message:

julia> g = gdemo()
┌ Warning: Data `x` not provided, treating as parameter instead.
└ @ Main ~/.julia/dev/Turing/src/core/compiler.jl:326
[ Info:  Assume - `s` is a parameter
[ Info:  Assume - `m` is a parameter
[ Info:  Assume - `x` is a parameter
gdemo_model (generic function with 4 methods)

julia> g()
ERROR: UndefVarError: x not defined
Stacktrace:
 [1] macro expansion at /Users/hg344/.julia/dev/Turing/src/core/compiler.jl:78 [inlined]
 [2] gdemo_model(::Turing.VarReplay.VarInfo, ::Turing.SampleFromPrior) at /Users/hg344/.julia/packages/MacroTools/4AjBS/src/utils.jl:297 (repeats 2 times)
 [3] top-level scope at none:0

@trappmartin

compiler enhancement

Source

yebai

👍1

Most helpful comment

It makes a lot of sense to further push the compiler design towards

compositional modelling and
improved consistency with standard Julia syntax

For this particular case, I think we can leverage the missing value type. For example,

using Turing

@model gdemo(x) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x[1] ~ Normal(m, sqrt(s))
  x[2] ~ Normal(m, sqrt(s))
  return s, m
end

g = gdemo([1.5, 2]) # standard version

g = gdemo([missing, missing]) # treat x as parameter and draw from prior

This would allow us the both

simplify the compiler by removing function alias (e.g. no more default value for x)
still, allow a user to re-use the same model definition (e.g. drawing from the prior)

I think the burden of ensuring dimentionality match between arguments and actual use should be left on the user, for example, if the user writes

using Turing

@model gdemo(x) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x ~ Normal(m, sqrt(s)) # no broadcasting syntax
  return s, m
end

g = gdemo(1.0)
g = gdemo(missing)

Then, the user shouldn't be doing g = gdemo([1.0 2.0]) or g = gdemo([missing, missing]). That said, I do agree we can have a nice broadcasting syntax for defining IID variables.

yebai on 17 Sep 2018

👍3

All 31 comments

I’ll have a look. Does it work if x is a scalar?

trappmartin on 15 Sep 2018

I’ll have a look. Does it work if x is a scalar?

Probably not, since we are indexing x (i.e. x[1] ~ Normal).

yebai on 15 Sep 2018

This problem is related to the fact that we do not know the dimensions and type of x if we draw x from the prior.

Running the code as:

julia> using Turing, Distributions
julia> @model gdemo(x, y) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x ~ Normal(m, sqrt(s))
  y ~ Normal(m, sqrt(s))
  return x, y
end
julia> g = gdemo()
gdemo_model (generic function with 4 methods)
julia> g()
(0.685690547873451, -1.1972706455914328)

works fine with the current version of Turing.
@cpfiffer Could you please add the scalar version into the documentation?

However, idealy the user should be able to do something in the lines of

julia> @model gdemo1(N, x) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x ~ [Normal(m, sqrt(s)) for n in 1:N]
  return x
end
julia> g = gdemo(2)
julia> g() # currently doesn't work as we need x in assume, see below

or alternativelly something like:

julia> @model gdemo2(x) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  N = length(x)
  x[1] ~ Normal(m, sqrt(s))
  x[2] ~ Normal(m, sqrt(s))
  return x
end
julia> g = gdemo(Array{Float64}(undef, 2))
julia> g() # currently doesn't work as we don't check if x is defined or not

in which the user has to ensure that x can actually be indexed at index 1 and 2.

@yebai I think gdemo1 would be the cleaner approach but is more involved as assume for a vector of distributions has the variable x as an argument. The variable is used to check the size and indexing scheme, which is decided based on the variable type the type of distribution.
https://github.com/TuringLang/Turing.jl/blob/ce780b87fa7ac64a4a516c8bfa1e32fe8152f55d/src/samplers/sampler.jl#L76-L97
This is related to the broadcasting issue described in #476 and will only be possible once #476 can be resolved.

The second solution should be possible but is a slightly hackish solution to the actual problem.

trappmartin on 17 Sep 2018

Potentially controversial opinion: I think that @yebai 's example _should_ break. As @trappmartin says, nothing about x is known when the function is invoked, other than that we're assigning to it. More fundamentally, x is an input to that model, and I really don't think that we should be able to run a model that requires arguments without providing them (it's just inconsistent with what one generally expects from a function-like thing), consequently I also think that @trappmartin 's scalar example _should_ break. I do, however, think that we should be able to provide observational data as arguments when appropriate, so here's a proposal that would work nicely, should be consistent with function-like things, and wouldn't really diverge too much from our current approach:

make compositional modelling work, so that I can say some variables in one model bar are distributed according to some other Turing model foo that I've already defined.
If a model requires arguments, then they must be provided (as with any other function)
To get code re-use, a good design pattern is to split the modelling process into two parts - specify a prior model as a Turing model that _doesn't_ take arguments that are used as observational data (foo in our example above), and specify a separate observation model (bar in our example). Then if I want to sample from the prior I just run foo, but if I want to condition on data I use bar. If you try to run bar without providing data, it produces the error that @yebai found no matter what is going on internally.

This would resolve the inconsistent / complicated behaviour in the current design whereby sometimes you can run a model that has arguments without providing any arguments, but not always, and whether you can or not depends on exactly what types you have used in your model -- if a model has arguments, you must provide them (there's no reason not to allow default values in the same way that regular functions do though). Moreover, we would still be able to write a stdlib of "prior" models that make no / few assumption about what RVs might be observed, and compose these together to make larger models. We would also be able to provide "observation" models that are a really thin wrapper around prior models that could provide standard scenarios eg. for mixture models one could specify a prior model that returns all of the random variables, and a separate observation model that requires observations are provided but not mixture components.

willtebbutt on 17 Sep 2018

🎉3

I think I generally agree. This would also simply the current compiler code and would be potentially much cleaner.

I’m just not quite sure what would be a good syntax.

julia> @model gmodel() = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x ~ Normal(m, sqrt(s))
  return x
end
julia> @model gdemo(x) = begin
  N = length(x)
  x ~ [gmodel() for n in 1:N]
end

trappmartin on 17 Sep 2018

There are two separate things going on here though: the way we handle sampling from the prior vs providing data, and how to specify vectors of random variables. A scalar version of your example could be

@model prior() = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, sqrt(s))
    return x
end
@model observation(x) = begin
    x ~ prior()
end

which I think is completely fine. As regards the specification of vectors of RVs, I think that there are a few ways to go about this. I suspect that we should try to avoid upgrading our @~ macro (or whatever succeeds it) to handle stuff other than x ~ distribution-like statements. It seems to me that this will probably lead to an unbounded number of weird edge cases to support. The options as I see them are

stuff like

@model observation(x) = begin
    for n in eachindex(x)
        x[n] ~ prior()
    end
end

is total fine to my mind, if a little bit verbose and should already work, if I'm not mistaken.

as discussed in #476 it would be really nice if broadcasting worked properly, and models behave like distributions, such that so that we could write stuff like

x .~ [prior() for n in eachindex(x)]

The goal here would be to completely avoid having to customise any broadcasting behaviour, so that this would just naturally work. This is a whole separate kettle of fish though.

in certain situations, such as the one described here, it might be really helpful to introduce a product distribution-like construct which represents a collection of independent RVs (see https://github.com/JuliaStats/Distributions.jl/pull/722 for my thoughts on this). This isn't a general solution though. This would let you write things like

@model observation(x) = begin
    x ~ Product([prior() for _ in eachindex(x)])
end

(the precise name of the distribution is quite reasonable to have some discussion over).

So my thoughts on what we should actually do are refactor the compiler a bit, and worry about vectors of RVs separately.

willtebbutt on 17 Sep 2018

I've put the current sampling methodology in the documentation for now, but I'll update if/when it changes.

cpfiffer on 17 Sep 2018

👍3

It makes a lot of sense to further push the compiler design towards

compositional modelling and
improved consistency with standard Julia syntax

For this particular case, I think we can leverage the missing value type. For example,

using Turing

@model gdemo(x) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x[1] ~ Normal(m, sqrt(s))
  x[2] ~ Normal(m, sqrt(s))
  return s, m
end

g = gdemo([1.5, 2]) # standard version

g = gdemo([missing, missing]) # treat x as parameter and draw from prior

This would allow us the both

simplify the compiler by removing function alias (e.g. no more default value for x)
still, allow a user to re-use the same model definition (e.g. drawing from the prior)

I think the burden of ensuring dimentionality match between arguments and actual use should be left on the user, for example, if the user writes

using Turing

@model gdemo(x) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x ~ Normal(m, sqrt(s)) # no broadcasting syntax
  return s, m
end

g = gdemo(1.0)
g = gdemo(missing)

Then, the user shouldn't be doing g = gdemo([1.0 2.0]) or g = gdemo([missing, missing]). That said, I do agree we can have a nice broadcasting syntax for defining IID variables.

yebai on 17 Sep 2018

👍3

I do like your missing value proposal @yebai -- I think the ideal would be to support both proposals as they're not in conflict.

willtebbutt on 17 Sep 2018

I agree that the broadcasting issue and the issue regarding draws from a prior are two independent issues.

At the moment the solution of @yebai is the most tangible from my point of view. For the compositional model proposal, we would need to construct something that behaves like a distribution. I see the appeal for that and I think it's the way to go on the long run. Maybe let us create an additional issue to discuss a possible redesign of the compiler.

trappmartin on 17 Sep 2018

👍2

Yeah, we've definitely discussed this in other issues (I'm not sure which ones). I do have a habit of bringing the "models should just behave like distributions" thing up on a weekly basis, so apologies for that everyone.

willtebbutt on 17 Sep 2018

😄1

After discussions with @yebai, this issue will be resolved in #613 by allowing the user to specify a default value for the inputs when defining the model for when they are treated as parameters. For example, the following will be equivalent:

@model gdemo(x = Vector{Real}(undef, 2)) = begin
    s ~ InverseGamma(2,3)
    m ~ Normal(0,sqrt(s))
    x[1] ~ Normal(m, sqrt(s))
    x[2] ~ Normal(m, sqrt(s))
    return s, m
end
g = gdemo()
g()

and

@model gdemo() = begin
    x = Vector{Real}(undef, 2)
    s ~ InverseGamma(2,3)
    m ~ Normal(0,sqrt(s))
    x[1] ~ Normal(m, sqrt(s))
    x[2] ~ Normal(m, sqrt(s))
    return s, m
end
g = gdemo()
g()

In other words, when a data variable is not input, it is treated as a parameter. If a default value is assigned, it is used, otherwise it is assigned to nothing which is the current behavior. nothing works for scalars but not vectors. This is analogous to the default value syntax in Julia. Let me know if you have any objections.

mohamed82008 on 15 Dec 2018

👍1

So it seems that a few people were unhappy about the above solution for the following reason if I understood correctly. If you want to set a default "data" value for x above while keeping it as dvar as opposed to just treating it as a pvar, then it is not possible with the above syntax. So how can we allow the following:

Treat missing vector data variables as parameters.
Allow for default data value.

One proposal that was discussed today was to use Vector{Missing} to signal to the compiler that we want to treat x as a parameter, while letting us know the length of x so we can define our own Vector{Real}(undef, n) inside the model body. So any value that is not Vector{Missing} will keep x as data. The Vector{Missing} can be set in the model definition as the default value of x or it can be passed as input when constructing the model using the outer function.

At first glance, this seems like a reasonable solution but it will most likely bite us in the neck later when we want to be able to infer the type of x to use a type stable VarInfo. Just knowing the length is not enough to know the type of vi.vals for this variable, our best guess is Real. I thought of a few alternatives but each seems to have at least one problem. Perhaps one of the best alternatives from a compiler POV may be:

@model gdemo(x::Vector{Float64}=[missing, missing])
    ....
end

So using the above syntax, the user can specify if x is made of Float64s, or Ints allowing for VarInfo specialization when treating it as a parameter. And at the same time, the default value can be assigned to be [missing, missing] in which case x will be treated as a parameter. Or the default value can be any other meaningful default value in which case x will be treated as data. To make the above more Julian, we should probably make the user write ::Union{Vector{Missing}, Vector{Float64}}. What I don't like about this solution is that it is pretty lengthy given that we only want 2 pieces of information: 1) the length of x, and 2) the eltype of x if treated as parameter. Alternatively, we can use the following syntax:

@model gdemo(x=Param{Vector{Float64}, 2}())
    ....
end

Param{Vector{Float64}, 2}() can be made to be short for Vector{Float64}(undef, 2). So using the above, we know the eltype and length of x and it is clear from a user POV, that x will be a Parameter by default of this type and length. If we go with this, we won't need the Vector{Missing} anymore; nothing will do as we would already know the length and eltype from the model definition.

mohamed82008 on 20 Dec 2018

If this change turns out to be controversial, I will remove it from the compiler PR and we can discuss it later.

mohamed82008 on 20 Dec 2018

I'm partial to the x=Param{Vector{Float64}, 2}() solution, mostly due to the lengthiness of x::Vector{Float64}=[missing, missing].

Maybe this has already been discussed, but is there a reason we can't just do something basic like this:

@model gdemo(x = Param(300, Float64))
   . . . .
end

Just a minor thought. I am still a fan of the x=Param{Vector{Float64}, 2}() syntax, though.

cpfiffer on 20 Dec 2018

I'm generally in favour of using missing because it automatically allows us to handle missing values, which I believe is important in many domains. And it's semantically clear what happens.

I understand the appeal for something like x = Param{Vector{Float64}, 2}(). But I would consider this to be an additional syntax that we might support. The syntax using missing values seems more relevant to me.

trappmartin on 20 Dec 2018

👍1

So, if we were to use missing syntax, could I pass in a vector like [2,missing,5] and have the middle value in that vector be an assumption? Or would that not work?

Perhaps that's a poor example. I don't know why someone would have that kind of data.

cpfiffer on 20 Dec 2018

I think that’s where we should be heading. Reasons for missing data can be diverse, in the simplest case think about sensory data where a sensor doesn’t provide data at time t. Using probabilistic modelling in such cases is attractive as it allows to handle missing data.

trappmartin on 20 Dec 2018

@cpfiffer Param(300, Float64) is also possible and we default to Vector.

I'm generally in favour of using missing

@trappmartin if we want to allow this fine granularity of missing data, then we will have to check if each variable is observed or assumed at run-time. I am ok with this, if we want to go that way. So basically, the partially data variable can be of type Vector{Union{T, Missing}} and we can assume any missing value. Union{T, Missing} is efficient in Julia, so that may not hurt performance much. But note that this doesn't solve the main problem addressed above, which is what happens when the whole vector is missing. We still need to know the eltype and length of vector.

mohamed82008 on 21 Dec 2018

👍1

so that may not hurt performance much

Actually, it should not hurt the performance at all for full parameters and full data variables because if use some if statement like this:

if x[i] isa Missing
    ...
else
   ...
end

and x is of type Vector{Float64} then the Julia compiler should be able to compile this if statement away. So this may be a pure gain if done right.

mohamed82008 on 21 Dec 2018

🎉1

As said, I’m fine with the syntax you propose and I’m also fine with the more clumsy syntax using missing values. And inserting those if statements would be a great feature in my opinion.

But let’s keep @yebai and @willtebbutt in the loop to find a sensible solution everyone agrees on.

trappmartin on 21 Dec 2018

I'm actually fairly opposed to Turing automatically doing something with arrays comprising a mix of Missing and observed data. Whatever implementation we eventually use will necessarily add complexity to the compiler, and it's not clear to me whether or not this kind of feature is actually that useful. So I guess that I have a few thoughts:

Have we received any indication that this kind of feature is in high demand?
If so, is there a solution that we can suggest that doesn't involve us baking it into the language? i.e. some design pattern we can suggest when writing programmes to make it easy for users to work around missing data?

willtebbutt on 21 Dec 2018

I’m not aware of a feature request on missing values.

Regarding solutions, in a Bayesian framework the obvious solution is to treat missing values as unknown parameters. Alternativ solutions are for example to exclude observations with missing values.

I personally think it would be good to handle missing values such as other PPLs do. Especially as the overhead in the compiler would be minimal. But I’ll not try to push it if there are reasons not to support missing values.

trappmartin on 21 Dec 2018

I personally think it would be good to handle missing values such as other PPLs do.

@trappmartin Could you point me towards an example of this please?

in a Bayesian framework the obvious solution is to treat missing values as unknown parameters

I don't disagree. My point is that there is more than one way to realise this in practice. For example, if as a user I have vector of data and I know that a subset of the data is missing, then it's usually quite straightforward to chop it up into two vectors by hand, one containing missing stuff and the other observed stuff. You can then treat these as two separate vectors in the model, and continue as usual. I would argue that this is a straightforward solution to this problem, which doesn't require any work on our side. It also means that there's one language feature for the user to concern themselves with, and won't usually add a huge amount of work on their part.

There are also some non-trivial technical challenges. For example, if

x ~ MvNormal(m, C)

and we want to provide x as data with some Missing elements then it's clear how to proceed conceptually, but I don't really know what we would need to do from an implementation perspective. It somehow feel different to the

for n in ...
    x[n] ~ Foo
end

situation, where my impression is that we have a better idea of how to implement stuff.

I'm open to being convinced about this, but the crux of my argument against doing it is basically that

It's not a crucial feature (straightforward workarounds exist),
It's not trivial to implement / maintain, and
we have other priorities

@yebai where do you stand on this?

willtebbutt on 21 Dec 2018

Hm, I’m not sure I understand your approach. 😅

Most cases of missing values I have encountered where those in which a sensor would sometimes not provide a measurement. Meaning that you would usually observe measurements from multiple sensors at a time and that some sensor sometimes lacks information due to whatever reason, e.g. out of range values. How would I proceed with your approach?

I’m open to everything btw.

Regarding other PPLs, Stan and pymc can handle missing values to some extent.

trappmartin on 21 Dec 2018

https://docs.pymc.io/notebooks/getting_started.html

See: Case study 2: Coal mining disasters

trappmartin on 21 Dec 2018

Hmm, I raised this issue of replacing undef with missing yesterday. My primary motivation is to make semantics of Mohamed’s syntax more explicit, ie, they are missing values, not any data. I can see @trappmartin point for support missing values, I think it is important, but should be done carefully in a separate effort. I lean towards @willtebbutt idea for keeping this out of the compiler: mixing missing values and data at a compiler level might lead to unintended confusions.

@mohamed82008 perhaps let’s focus on the dimension issue in the compiler PR, ie, to replace undef with missing and always treat them as parameters.

yebai on 21 Dec 2018

👍1

@yebai may you write down some test cases and expected behavior? There are a few ideas thrown around in this thread and some are incompatible with others, so I want to nail down what is actually required.

mohamed82008 on 21 Dec 2018

I think the 2 main use cases I have in mind are:

@model gdemo(x) = begin
    s ~ InverseGamma(2,3)
    m ~ Normal(0,sqrt(s))
    x[1] ~ Normal(m, sqrt(s))
    x[2] ~ Normal(m, sqrt(s))
    return s, m
end
g = gdemo([missing, missing]) # treats x as a parameter of length 2 and eltype Real

and

g = gdemo(Union{Missing, Float64}[missing, 1.0]) # what should this be?

If we decide to treat x in the second case as a full parameter of length 2 and eltype Float64, then @trappmartin 's use case of partially missing data will be blocked AFAICT. If we introduce a new syntax for defining parameters of specific length and eltype like Param(2, Float64), then we can still accommodate Martin's use case. All of this is orthogonal to the default value of x which can be assigned using the syntax:

@model gdemo(x = default_value) = begin
    ....
end

This default value could be meaningful data or the missing vectors from above signalling that x is a parameter.

mohamed82008 on 21 Dec 2018

Since the specifications of this change that everyone agrees on are still not clear to me, I will remove it from the compiler PR and do it in a separate PR.

mohamed82008 on 22 Dec 2018

Hi @mohamed82008, sorry for the confusion. Let's only support the following syntax in the compiler PR (for now):

@model gdemo(x = default_value) = begin
    ....
end

to provide a way to initialize data variables, i.e.

variable x is always treated as a parameter in sample(gdemo()), regardless whether default_value contains a missing value or a float64.
variable x is always treated as observed data in sample(gdemo([0.0 1.0])

The following syntax should be forbidden before we explicitly support missing values during inference.

g = gdemo([missing, missing])               # returns error at runtime
g = gdemo(Union{Missing, Float64}[missing, 1.0]) # return error at runtime

yebai on 22 Dec 2018

👍1

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Variational Bayes inference?

skanskan · 5Comments

Log probability of an observation

trappmartin · 6Comments

Distributions not compatible with AD (ForwardDiff/Flux.Tracker)

yebai · 5Comments

Doc: more informative visualisation about MCMC

yebai · 4Comments

`MethodError: no method matching (::BallTreeDensity)(::Array{ForwardDiff.Dual....` when trying to use a KernelDensityEstimate BallTreeDensity as prior

ClaudMor · 3Comments