using Turing
@model gdemo(x) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x[1] ~ Normal(m, sqrt(s))
x[2] ~ Normal(m, sqrt(s))
return s, m
end
g = gdemo([1.5, 2]) # This works
g() # returns: (0.8660317849113424, 0.820409281237518)
g = gdemo() # Returns an error.
g()
Here is the error message:
julia> g = gdemo()
┌ Warning: Data `x` not provided, treating as parameter instead.
â”” @ Main ~/.julia/dev/Turing/src/core/compiler.jl:326
[ Info: Assume - `s` is a parameter
[ Info: Assume - `m` is a parameter
[ Info: Assume - `x` is a parameter
gdemo_model (generic function with 4 methods)
julia> g()
ERROR: UndefVarError: x not defined
Stacktrace:
[1] macro expansion at /Users/hg344/.julia/dev/Turing/src/core/compiler.jl:78 [inlined]
[2] gdemo_model(::Turing.VarReplay.VarInfo, ::Turing.SampleFromPrior) at /Users/hg344/.julia/packages/MacroTools/4AjBS/src/utils.jl:297 (repeats 2 times)
[3] top-level scope at none:0
@trappmartin
I’ll have a look. Does it work if x is a scalar?
I’ll have a look. Does it work if x is a scalar?
Probably not, since we are indexing x (i.e. x[1] ~ Normal).
This problem is related to the fact that we do not know the dimensions and type of x if we draw x from the prior.
Running the code as:
julia> using Turing, Distributions
julia> @model gdemo(x, y) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x ~ Normal(m, sqrt(s))
y ~ Normal(m, sqrt(s))
return x, y
end
julia> g = gdemo()
gdemo_model (generic function with 4 methods)
julia> g()
(0.685690547873451, -1.1972706455914328)
works fine with the current version of Turing.
@cpfiffer Could you please add the scalar version into the documentation?
However, idealy the user should be able to do something in the lines of
julia> @model gdemo1(N, x) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x ~ [Normal(m, sqrt(s)) for n in 1:N]
return x
end
julia> g = gdemo(2)
julia> g() # currently doesn't work as we need x in assume, see below
or alternativelly something like:
julia> @model gdemo2(x) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
N = length(x)
x[1] ~ Normal(m, sqrt(s))
x[2] ~ Normal(m, sqrt(s))
return x
end
julia> g = gdemo(Array{Float64}(undef, 2))
julia> g() # currently doesn't work as we don't check if x is defined or not
in which the user has to ensure that x can actually be indexed at index 1 and 2.
@yebai I think gdemo1 would be the cleaner approach but is more involved as assume for a vector of distributions has the variable x as an argument. The variable is used to check the size and indexing scheme, which is decided based on the variable type the type of distribution.
https://github.com/TuringLang/Turing.jl/blob/ce780b87fa7ac64a4a516c8bfa1e32fe8152f55d/src/samplers/sampler.jl#L76-L97
This is related to the broadcasting issue described in #476 and will only be possible once #476 can be resolved.
The second solution should be possible but is a slightly hackish solution to the actual problem.
Potentially controversial opinion: I think that @yebai 's example _should_ break. As @trappmartin says, nothing about x is known when the function is invoked, other than that we're assigning to it. More fundamentally, x is an input to that model, and I really don't think that we should be able to run a model that requires arguments without providing them (it's just inconsistent with what one generally expects from a function-like thing), consequently I also think that @trappmartin 's scalar example _should_ break. I do, however, think that we should be able to provide observational data as arguments when appropriate, so here's a proposal that would work nicely, should be consistent with function-like things, and wouldn't really diverge too much from our current approach:
bar are distributed according to some other Turing model foo that I've already defined.foo in our example above), and specify a separate observation model (bar in our example). Then if I want to sample from the prior I just run foo, but if I want to condition on data I use bar. If you try to run bar without providing data, it produces the error that @yebai found no matter what is going on internally.This would resolve the inconsistent / complicated behaviour in the current design whereby sometimes you can run a model that has arguments without providing any arguments, but not always, and whether you can or not depends on exactly what types you have used in your model -- if a model has arguments, you must provide them (there's no reason not to allow default values in the same way that regular functions do though). Moreover, we would still be able to write a stdlib of "prior" models that make no / few assumption about what RVs might be observed, and compose these together to make larger models. We would also be able to provide "observation" models that are a really thin wrapper around prior models that could provide standard scenarios eg. for mixture models one could specify a prior model that returns all of the random variables, and a separate observation model that requires observations are provided but not mixture components.
I think I generally agree. This would also simply the current compiler code and would be potentially much cleaner.
I’m just not quite sure what would be a good syntax.
julia> @model gmodel() = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x ~ Normal(m, sqrt(s))
return x
end
julia> @model gdemo(x) = begin
N = length(x)
x ~ [gmodel() for n in 1:N]
end
There are two separate things going on here though: the way we handle sampling from the prior vs providing data, and how to specify vectors of random variables. A scalar version of your example could be
@model prior() = begin
s ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s))
x ~ Normal(m, sqrt(s))
return x
end
@model observation(x) = begin
x ~ prior()
end
which I think is completely fine. As regards the specification of vectors of RVs, I think that there are a few ways to go about this. I suspect that we should try to avoid upgrading our @~ macro (or whatever succeeds it) to handle stuff other than x ~ distribution-like statements. It seems to me that this will probably lead to an unbounded number of weird edge cases to support. The options as I see them are
@model observation(x) = begin
for n in eachindex(x)
x[n] ~ prior()
end
end
is total fine to my mind, if a little bit verbose and should already work, if I'm not mistaken.
x .~ [prior() for n in eachindex(x)]
The goal here would be to completely avoid having to customise any broadcasting behaviour, so that this would just naturally work. This is a whole separate kettle of fish though.
@model observation(x) = begin
x ~ Product([prior() for _ in eachindex(x)])
end
(the precise name of the distribution is quite reasonable to have some discussion over).
So my thoughts on what we should actually do are refactor the compiler a bit, and worry about vectors of RVs separately.
I've put the current sampling methodology in the documentation for now, but I'll update if/when it changes.
It makes a lot of sense to further push the compiler design towards
For this particular case, I think we can leverage the missing value type. For example,
using Turing
@model gdemo(x) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x[1] ~ Normal(m, sqrt(s))
x[2] ~ Normal(m, sqrt(s))
return s, m
end
g = gdemo([1.5, 2]) # standard version
g = gdemo([missing, missing]) # treat x as parameter and draw from prior
This would allow us the both
x)I think the burden of ensuring dimentionality match between arguments and actual use should be left on the user, for example, if the user writes
using Turing
@model gdemo(x) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x ~ Normal(m, sqrt(s)) # no broadcasting syntax
return s, m
end
g = gdemo(1.0)
g = gdemo(missing)
Then, the user shouldn't be doing g = gdemo([1.0 2.0]) or g = gdemo([missing, missing]). That said, I do agree we can have a nice broadcasting syntax for defining IID variables.
I do like your missing value proposal @yebai -- I think the ideal would be to support both proposals as they're not in conflict.
I agree that the broadcasting issue and the issue regarding draws from a prior are two independent issues.
At the moment the solution of @yebai is the most tangible from my point of view. For the compositional model proposal, we would need to construct something that behaves like a distribution. I see the appeal for that and I think it's the way to go on the long run. Maybe let us create an additional issue to discuss a possible redesign of the compiler.
Yeah, we've definitely discussed this in other issues (I'm not sure which ones). I do have a habit of bringing the "models should just behave like distributions" thing up on a weekly basis, so apologies for that everyone.
After discussions with @yebai, this issue will be resolved in #613 by allowing the user to specify a default value for the inputs when defining the model for when they are treated as parameters. For example, the following will be equivalent:
@model gdemo(x = Vector{Real}(undef, 2)) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x[1] ~ Normal(m, sqrt(s))
x[2] ~ Normal(m, sqrt(s))
return s, m
end
g = gdemo()
g()
and
@model gdemo() = begin
x = Vector{Real}(undef, 2)
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x[1] ~ Normal(m, sqrt(s))
x[2] ~ Normal(m, sqrt(s))
return s, m
end
g = gdemo()
g()
In other words, when a data variable is not input, it is treated as a parameter. If a default value is assigned, it is used, otherwise it is assigned to nothing which is the current behavior. nothing works for scalars but not vectors. This is analogous to the default value syntax in Julia. Let me know if you have any objections.
So it seems that a few people were unhappy about the above solution for the following reason if I understood correctly. If you want to set a default "data" value for x above while keeping it as dvar as opposed to just treating it as a pvar, then it is not possible with the above syntax. So how can we allow the following:
One proposal that was discussed today was to use Vector{Missing} to signal to the compiler that we want to treat x as a parameter, while letting us know the length of x so we can define our own Vector{Real}(undef, n) inside the model body. So any value that is not Vector{Missing} will keep x as data. The Vector{Missing} can be set in the model definition as the default value of x or it can be passed as input when constructing the model using the outer function.
At first glance, this seems like a reasonable solution but it will most likely bite us in the neck later when we want to be able to infer the type of x to use a type stable VarInfo. Just knowing the length is not enough to know the type of vi.vals for this variable, our best guess is Real. I thought of a few alternatives but each seems to have at least one problem. Perhaps one of the best alternatives from a compiler POV may be:
@model gdemo(x::Vector{Float64}=[missing, missing])
....
end
So using the above syntax, the user can specify if x is made of Float64s, or Ints allowing for VarInfo specialization when treating it as a parameter. And at the same time, the default value can be assigned to be [missing, missing] in which case x will be treated as a parameter. Or the default value can be any other meaningful default value in which case x will be treated as data. To make the above more Julian, we should probably make the user write ::Union{Vector{Missing}, Vector{Float64}}. What I don't like about this solution is that it is pretty lengthy given that we only want 2 pieces of information: 1) the length of x, and 2) the eltype of x if treated as parameter. Alternatively, we can use the following syntax:
@model gdemo(x=Param{Vector{Float64}, 2}())
....
end
Param{Vector{Float64}, 2}() can be made to be short for Vector{Float64}(undef, 2). So using the above, we know the eltype and length of x and it is clear from a user POV, that x will be a Parameter by default of this type and length. If we go with this, we won't need the Vector{Missing} anymore; nothing will do as we would already know the length and eltype from the model definition.
If this change turns out to be controversial, I will remove it from the compiler PR and we can discuss it later.
I'm partial to the x=Param{Vector{Float64}, 2}() solution, mostly due to the lengthiness of x::Vector{Float64}=[missing, missing].
Maybe this has already been discussed, but is there a reason we can't just do something basic like this:
@model gdemo(x = Param(300, Float64))
. . . .
end
Just a minor thought. I am still a fan of the x=Param{Vector{Float64}, 2}() syntax, though.
I'm generally in favour of using missing because it automatically allows us to handle missing values, which I believe is important in many domains. And it's semantically clear what happens.
I understand the appeal for something like x = Param{Vector{Float64}, 2}(). But I would consider this to be an additional syntax that we might support. The syntax using missing values seems more relevant to me.
So, if we were to use missing syntax, could I pass in a vector like [2,missing,5] and have the middle value in that vector be an assumption? Or would that not work?
Perhaps that's a poor example. I don't know why someone would have that kind of data.
I think that’s where we should be heading. Reasons for missing data can be diverse, in the simplest case think about sensory data where a sensor doesn’t provide data at time t. Using probabilistic modelling in such cases is attractive as it allows to handle missing data.
@cpfiffer Param(300, Float64) is also possible and we default to Vector.
I'm generally in favour of using missing
@trappmartin if we want to allow this fine granularity of missing data, then we will have to check if each variable is observed or assumed at run-time. I am ok with this, if we want to go that way. So basically, the partially data variable can be of type Vector{Union{T, Missing}} and we can assume any missing value. Union{T, Missing} is efficient in Julia, so that may not hurt performance much. But note that this doesn't solve the main problem addressed above, which is what happens when the whole vector is missing. We still need to know the eltype and length of vector.
so that may not hurt performance much
Actually, it should not hurt the performance at all for full parameters and full data variables because if use some if statement like this:
if x[i] isa Missing
...
else
...
end
and x is of type Vector{Float64} then the Julia compiler should be able to compile this if statement away. So this may be a pure gain if done right.
As said, I’m fine with the syntax you propose and I’m also fine with the more clumsy syntax using missing values. And inserting those if statements would be a great feature in my opinion.
But let’s keep @yebai and @willtebbutt in the loop to find a sensible solution everyone agrees on.
I'm actually fairly opposed to Turing automatically doing something with arrays comprising a mix of Missing and observed data. Whatever implementation we eventually use will necessarily add complexity to the compiler, and it's not clear to me whether or not this kind of feature is actually that useful. So I guess that I have a few thoughts:
I’m not aware of a feature request on missing values.
Regarding solutions, in a Bayesian framework the obvious solution is to treat missing values as unknown parameters. Alternativ solutions are for example to exclude observations with missing values.
I personally think it would be good to handle missing values such as other PPLs do. Especially as the overhead in the compiler would be minimal. But I’ll not try to push it if there are reasons not to support missing values.
I personally think it would be good to handle missing values such as other PPLs do.
@trappmartin Could you point me towards an example of this please?
in a Bayesian framework the obvious solution is to treat missing values as unknown parameters
I don't disagree. My point is that there is more than one way to realise this in practice. For example, if as a user I have vector of data and I know that a subset of the data is missing, then it's usually quite straightforward to chop it up into two vectors by hand, one containing missing stuff and the other observed stuff. You can then treat these as two separate vectors in the model, and continue as usual. I would argue that this is a straightforward solution to this problem, which doesn't require any work on our side. It also means that there's one language feature for the user to concern themselves with, and won't usually add a huge amount of work on their part.
There are also some non-trivial technical challenges. For example, if
x ~ MvNormal(m, C)
and we want to provide x as data with some Missing elements then it's clear how to proceed conceptually, but I don't really know what we would need to do from an implementation perspective. It somehow feel different to the
for n in ...
x[n] ~ Foo
end
situation, where my impression is that we have a better idea of how to implement stuff.
I'm open to being convinced about this, but the crux of my argument against doing it is basically that
@yebai where do you stand on this?
Hm, I’m not sure I understand your approach. 😅
Most cases of missing values I have encountered where those in which a sensor would sometimes not provide a measurement. Meaning that you would usually observe measurements from multiple sensors at a time and that some sensor sometimes lacks information due to whatever reason, e.g. out of range values. How would I proceed with your approach?
I’m open to everything btw.
Regarding other PPLs, Stan and pymc can handle missing values to some extent.
https://docs.pymc.io/notebooks/getting_started.html
See: Case study 2: Coal mining disasters
Hmm, I raised this issue of replacing undef with missing yesterday. My primary motivation is to make semantics of Mohamed’s syntax more explicit, ie, they are missing values, not any data. I can see @trappmartin point for support missing values, I think it is important, but should be done carefully in a separate effort. I lean towards @willtebbutt idea for keeping this out of the compiler: mixing missing values and data at a compiler level might lead to unintended confusions.
@mohamed82008 perhaps let’s focus on the dimension issue in the compiler PR, ie, to replace undef with missing and always treat them as parameters.
@yebai may you write down some test cases and expected behavior? There are a few ideas thrown around in this thread and some are incompatible with others, so I want to nail down what is actually required.
I think the 2 main use cases I have in mind are:
@model gdemo(x) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x[1] ~ Normal(m, sqrt(s))
x[2] ~ Normal(m, sqrt(s))
return s, m
end
g = gdemo([missing, missing]) # treats x as a parameter of length 2 and eltype Real
and
g = gdemo(Union{Missing, Float64}[missing, 1.0]) # what should this be?
If we decide to treat x in the second case as a full parameter of length 2 and eltype Float64, then @trappmartin 's use case of partially missing data will be blocked AFAICT. If we introduce a new syntax for defining parameters of specific length and eltype like Param(2, Float64), then we can still accommodate Martin's use case. All of this is orthogonal to the default value of x which can be assigned using the syntax:
@model gdemo(x = default_value) = begin
....
end
This default value could be meaningful data or the missing vectors from above signalling that x is a parameter.
Since the specifications of this change that everyone agrees on are still not clear to me, I will remove it from the compiler PR and do it in a separate PR.
Hi @mohamed82008, sorry for the confusion. Let's only support the following syntax in the compiler PR (for now):
@model gdemo(x = default_value) = begin
....
end
to provide a way to initialize data variables, i.e.
x is always treated as a parameter in sample(gdemo()), regardless whether default_value contains a missing value or a float64. x is always treated as observed data in sample(gdemo([0.0 1.0])The following syntax should be forbidden before we explicitly support missing values during inference.
g = gdemo([missing, missing]) # returns error at runtime
g = gdemo(Union{Missing, Float64}[missing, 1.0]) # return error at runtime
Most helpful comment
It makes a lot of sense to further push the compiler design towards
For this particular case, I think we can leverage the
missingvalue type. For example,This would allow us the both
x)I think the burden of ensuring dimentionality match between arguments and actual use should be left on the user, for example, if the user writes
Then, the user shouldn't be doing
g = gdemo([1.0 2.0])org = gdemo([missing, missing]). That said, I do agree we can have a nice broadcasting syntax for defining IID variables.