Ax: How is heteroskedasticity handled?

Created on 6 May 2020 · 5 Comments · Source: facebook/Ax

I'm curious after our other discussion - how do you handle heteroskedasticity? When the noise level is inferred, I'm assuming you treat the problem as homoskedastic, because otherwise the whole problem starts to seem under-specified. But when the sem is passed in explicitly, what's your modeling approach? Can you point to a paper on this?

Most helpful comment

So if the sem is passed in, then in Ax we use a FixedNoiseGP model, where we literally just use the sem directly: instead of K(X, X) + sigma^2 I we use K(X, X) + diag(sem(X)^2).
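To make the difference concrete, here is a small pure-Python sketch (no Ax or BoTorch involved) that computes a GP posterior mean under both noise terms. The RBF kernel with unit lengthscale and outputscale, the data, the sem values, and all helper names are assumptions made up for illustration.

```python
import math


def rbf(a, b, lengthscale=1.0):
    # Squared-exponential kernel on scalar inputs (assumed fixed hyperparameters).
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(A):
    # Lower-triangular L with A = L L^T; A must be symmetric positive definite.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    # Solve L x = b by forward substitution.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    # Solve L^T x = b by back substitution.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, noise_vars, x_star):
    # Latent posterior mean/variance at x_star when Cov(y) = K(X, X) + diag(noise_vars).
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise_vars[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward(L, forward(L, y))
    k_star = [rbf(x_star, xi) for xi in X]
    v = forward(L, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var


X = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 0.0]
sems = [0.1, 1.0, 0.1]  # made-up standard errors; the point at x=1 is much noisier

# Homoskedastic: K(X, X) + sigma^2 I with one shared noise level
homo_mean, _ = gp_posterior(X, y, [0.1 ** 2] * len(X), 1.0)
# Fixed noise: K(X, X) + diag(sem(X)^2), using the reported sems directly
fixed_mean, _ = gp_posterior(X, y, [s ** 2 for s in sems], 1.0)

print(f"homoskedastic posterior mean at x=1: {homo_mean:.3f}")
print(f"fixed-noise posterior mean at x=1:   {fixed_mean:.3f}")
```

With the single small noise level, the posterior mean at x=1 nearly interpolates the observed 1.0; with the fixed per-point noise, the observation reported with a large sem is down-weighted and the mean is pulled toward its low-noise neighbors.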

The HeteroskedasticSingleTaskGP that you found is a more advanced model: it uses a second GP to model the log-variances and plugs that GP's predictions into the noise term, so you'd have K(X, X) + diag(exp(GP_logvar(X))). This noise GP and the outcome GP are fit jointly. Compared to the FixedNoiseGP, this modeling approach has two benefits:

  1. The GP model performs regularization/shrinkage, so it is helpful if the noise observations themselves are noisy.
  2. It provides out-of-sample noise predictions. This is important if you want to actively reason about the noise level at points you have not observed yet. For instance, this is something that is used in the KnowledgeGradient acquisition function.
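The following is a simplified, two-stage sketch of the log-variance idea, not BoTorch's HeteroskedasticSingleTaskGP: BoTorch fits the two GPs jointly, whereas here the log-variance GP is fit once on observed variances with fixed, made-up hyperparameters (unit RBF lengthscale, a constant mean, and an assumed `shrinkage_var` of 0.5 for its own noise). The helper names are hypothetical. It illustrates both benefits listed above: the smoothed log-variances span a narrower range than the raw ones (shrinkage), and the model returns noise levels at inputs that were never observed.

```python
import math


def rbf(a, b, lengthscale=1.0):
    # Squared-exponential kernel on scalar inputs (assumed fixed hyperparameters).
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(A):
    # Lower-triangular L with A = L L^T; A must be symmetric positive definite.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    # Solve L x = b by forward substitution.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    # Solve L^T x = b by back substitution.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, noise_vars, x_star):
    # Latent posterior mean/variance at x_star when Cov(y) = K(X, X) + diag(noise_vars).
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise_vars[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward(L, forward(L, y))
    k_star = [rbf(x_star, xi) for xi in X]
    v = forward(L, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var


def fit_log_variance_gp(X, observed_vars, shrinkage_var=0.5):
    # Two-stage stand-in for the log-variance GP: GP regression on
    # z = log(observed variance) with a constant mean and its own assumed noise
    # level. predict_noise recomputes the factorization per call (fine for tiny data).
    z = [math.log(v) for v in observed_vars]
    m = sum(z) / len(z)
    resid = [zi - m for zi in z]

    def predict_noise(x):
        g, _ = gp_posterior(X, resid, [shrinkage_var] * len(X), x)
        return math.exp(m + g)

    return predict_noise


X = [0.0, 1.0, 2.0, 3.0]
observed_vars = [0.01, 0.02, 0.5, 0.4]  # made-up, themselves noisy, variance estimates
predict_noise = fit_log_variance_gp(X, observed_vars)

# Benefit 1, shrinkage: smoothed log-variances span a narrower range than the raw ones
raw = [math.log(v) for v in observed_vars]
smoothed = [math.log(predict_noise(x)) for x in X]
print("raw range:", max(raw) - min(raw), "smoothed range:", max(smoothed) - min(smoothed))

# Benefit 2, out-of-sample noise predictions at inputs never observed
print("noise at x=0.5:", predict_noise(0.5), "noise at x=2.5:", predict_noise(2.5))

# Outcome GP with noise term diag(exp(g(X))) instead of sigma^2 I
y = [0.1, -0.1, 0.8, -0.5]  # made-up outcomes
noise_diag = [predict_noise(x) for x in X]
mean, var = gp_posterior(X, y, noise_diag, 1.5)
```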

Note, however, that we currently don't expose HeteroskedasticSingleTaskGP in Ax. So if sems are passed, we use FixedNoiseGP; if not, we infer a homoskedastic noise level.
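A rough sketch of that dispatch, with hypothetical helper names (this is not Ax's actual code path): if sems are supplied, each point gets noise variance sem^2; otherwise a single sigma^2 is chosen by maximizing the GP log marginal likelihood. A coarse grid over sigma^2 stands in here for the gradient-based hyperparameter fitting a real GP library performs, and the kernel hyperparameters are held fixed at assumed values.

```python
import math


def rbf(a, b, lengthscale=1.0):
    # Squared-exponential kernel on scalar inputs (assumed fixed hyperparameters).
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(A):
    # Lower-triangular L with A = L L^T; A must be symmetric positive definite.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    # Solve L x = b by forward substitution.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    # Solve L^T x = b by back substitution.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def log_marginal_likelihood(X, y, noise_var):
    # log N(y | 0, K(X, X) + noise_var * I)
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward(L, forward(L, y))
    data_fit = -0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
    half_log_det = sum(math.log(L[i][i]) for i in range(n))  # equals 0.5 * log|K|
    return data_fit - half_log_det - 0.5 * n * math.log(2.0 * math.pi)


def noise_variances(X, y, sems=None, grid=(1e-3, 1e-2, 1e-1, 1.0, 10.0)):
    # Hypothetical helper: fixed per-point noise if sems are given,
    # otherwise one inferred homoskedastic sigma^2 shared by all points.
    if sems is not None:
        return [s ** 2 for s in sems]
    best = max(grid, key=lambda s2: log_marginal_likelihood(X, y, s2))
    return [best] * len(X)


X = [0.0, 0.0, 1.0, 1.0]  # replicated inputs
y_scattered = [-1.0, 1.0, -1.0, 1.0]
y_tight = [-0.1, 0.1, -0.1, 0.1]

print(noise_variances(X, y_scattered))  # inferred, homoskedastic
print(noise_variances(X, y_tight))  # inferred, homoskedastic
print(noise_variances(X, y_tight, sems=[0.1, 0.2, 0.1, 0.2]))  # sem^2 per point
```

Replicated inputs make the effect easy to see: the spread between replicates can only be explained by noise, so the scattered dataset selects a larger sigma^2 than the tight one.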

Finally, there are also ways of inferring heteroskedastic noise levels; one relatively simple approach is the "most likely heteroskedastic GP". There is a long-standing PR #250 for this; we have cleaned it up internally and should be able to merge it in the near future (cc @jelena-markovic).
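For reference, the most likely heteroskedastic GP (Kersting et al., 2007) alternates between a GP on the outcomes and a GP on log noise levels. The sketch below follows that loop in pure Python under several assumptions: fixed kernel hyperparameters, an assumed initial noise of 0.1, a fixed number of iterations instead of a convergence check, and a closed-form expectation in place of the paper's sample-based estimate of the empirical noise at each training point. It is not the code from PR #250, and the helper names are hypothetical.

```python
import math


def rbf(a, b, lengthscale=1.0):
    # Squared-exponential kernel on scalar inputs (assumed fixed hyperparameters).
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def cholesky(A):
    # Lower-triangular L with A = L L^T; A must be symmetric positive definite.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    # Solve L x = b by forward substitution.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def backward(L, b):
    # Solve L^T x = b by back substitution.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X, y, noise_vars, x_star):
    # Latent posterior mean/variance at x_star when Cov(y) = K(X, X) + diag(noise_vars).
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise_vars[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = backward(L, forward(L, y))
    k_star = [rbf(x_star, xi) for xi in X]
    v = forward(L, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var


def fit_log_noise(X, z, shrinkage_var=0.5):
    # Second GP: regression on log-noise targets z with a constant mean.
    m = sum(z) / len(z)
    resid = [zi - m for zi in z]
    return lambda x: math.exp(m + gp_posterior(X, resid, [shrinkage_var] * len(X), x)[0])


def most_likely_hetero_gp(X, y, init_noise=0.1, iterations=3):
    # Step 1: start from a homoskedastic GP.
    noise = [init_noise] * len(X)
    predict_noise = None
    for _ in range(iterations):
        # Step 2: empirical noise level at each training input. The original
        # method estimates this from samples of the current GP; this sketch uses
        # the expectation 0.5 * ((y_i - mu_i)^2 + var_i) under the latent posterior.
        z = []
        for xi, yi in zip(X, y):
            mu, var = gp_posterior(X, y, noise, xi)
            z.append(math.log(0.5 * ((yi - mu) ** 2 + var)))
        # Step 3: fit a second GP to the log empirical noise levels.
        predict_noise = fit_log_noise(X, z)
        # Step 4: the next outcome GP uses exp(g(x_i)) as per-point noise.
        noise = [predict_noise(xi) for xi in X]
    return noise, predict_noise


X = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y = [0.0, 0.02, 0.0, -0.02, 1.0, -1.0, 1.0, -1.0]  # made-up: noisy right half
noise, predict_noise = most_likely_hetero_gp(X, y)
print([round(v, 3) for v in noise])
```

On this made-up data the right half alternates between +1 and -1 while the left half is nearly flat, so the inferred noise ends up much larger on the right, and `predict_noise` also gives noise levels between the training inputs.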

All 5 comments

Hit send too soon, as I think I may have found the answer: https://botorch.org/api/models.html#heteropskedasticsingletaskgp ? So is this effectively a multi-task GP where one output is the objective's mean and the other is the objective's variance?

Thanks for the detailed explanation.

I'm afraid I'm not following the most-likely-heteroskedastic-GP approach. PR #250 seems to be about Ray Tune; is that the right PR?

Ah sorry, the PR is on the botorch repo: https://github.com/pytorch/botorch/pull/250

Much better. I'm gonna close this as I've learned what I wanted and don't see anything more actionable. Thanks!