I'm curious after our other discussion - how do you handle heteroskedasticity? When the noise level is inferred, I'm assuming you treat the problem as homoskedastic, because otherwise the whole problem starts to seem under-specified. But when the sem is passed in explicitly, what's your modeling approach? Can you point to a paper on this?
Hit send too soon as I think maybe I found the answer: https://botorch.org/api/models.html#heteropskedasticsingletaskgp ? So is this effectively doing a multi-task GP where one output is the objective's mean and the other is the objective's variance?
So if the sem is passed in, then in Ax we use a FixedNoiseGP model, where we literally just use the sem directly: instead of K(X, X) + sigma^2 * I we use K(X, X) + diag(sem(X)^2).
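To make the difference concrete, here is a minimal NumPy sketch of the two noise terms. It is not the Ax or BoTorch implementation: the RBF kernel, the helper names `rbf_kernel` and `gp_posterior_mean`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.1):
    # Squared-exponential kernel (an assumed choice for this sketch).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior_mean(X, y, X_new, noise_var):
    """Posterior mean of a zero-mean GP.

    noise_var is either a scalar (homoskedastic, sigma^2 * I) or a
    1-D array of per-point variances (fixed noise, diag(sem(X)^2)).
    """
    noise = np.broadcast_to(np.asarray(noise_var, dtype=float), (len(X),))
    K_noisy = rbf_kernel(X, X) + np.diag(noise)
    return rbf_kernel(X_new, X) @ np.linalg.solve(K_noisy, y)

# Toy data: five points, with the middle observation shifted and very noisy.
X = np.linspace(0, 1, 5)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
y[2] += 1.0
sem = np.array([0.01, 0.01, 0.5, 0.01, 0.01])

mean_homo = gp_posterior_mean(X, y, X, noise_var=0.01**2)  # K + sigma^2 * I
mean_fixed = gp_posterior_mean(X, y, X, noise_var=sem**2)  # K + diag(sem^2)
```

With the per-point variances, the model trusts the middle observation less, so the posterior mean is pulled toward it less strongly than in the homoskedastic fit.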
The HeteroskedasticSingleTaskGP that you found is an advanced model that uses a second GP to model the log-variances and plugs that model's predictions in as the noise term, so you'd have K(X, X) + diag(exp(GP_logvar(X))). This noise GP and the outcome GP are fit jointly. Compared to the FixedNoiseGP, the benefit of this modeling approach is twofold:
Note however, that we currently don't expose HeteroskedasticSingleTaskGP in Ax. So if sems are passed we use FixedNoiseGP, if not we infer a homoskedastic noise level.
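As a rough illustration of the log-variance noise term described above, the sketch below builds K(X, X) + diag(exp(logvar(X))) in NumPy. The function `toy_logvar` is a hypothetical stand-in for the noise GP's predicted log-variance; in the real model that prediction comes from a GP fit jointly with the outcome GP, which this sketch does not attempt.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    # Squared-exponential kernel (an assumed choice for this sketch).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def toy_logvar(X):
    # Hypothetical stand-in for the noise GP's predicted log-variance:
    # noise grows as x goes from 0 to 1.
    return -6.0 + 4.0 * X[:, 0]

X = np.linspace(0, 1, 6)[:, None]

# Exponentiating the log-variance keeps every noise variance positive.
noise_var = np.exp(toy_logvar(X))
K_noisy = rbf_kernel(X, X) + np.diag(noise_var)

# The Cholesky factor is what a GP solver would use for inference.
L = np.linalg.cholesky(K_noisy)
```

Modeling the log of the variance rather than the variance itself means the noise model can output any real number and still yield a valid, positive noise level.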
Finally, there is also a way of inferring heteroskedastic noise levels. One relatively simple approach is the "most likely heteroskedastic GP". There is a long-standing PR #250; we have cleaned that up internally and should be able to merge it in the near future (cc @jelena-markovic).
Thanks for the detailed explanation.
I'm afraid I'm not following the most-likely-heteroskedastic-GP approach. PR #250 seems to be about Raytune - is that the right PR?
Ah sorry, the PR is on the botorch repo: https://github.com/pytorch/botorch/pull/250
Much better. I'm gonna close this as I've learned what I wanted and don't see anything more actionable. Thanks!