Ax does not actually recommend the best parameters

Created on 3 Feb 2021 · 16 Comments · Source: facebook/Ax

see discussion here: https://github.com/facebookresearch/hydra/issues/1352

the basic issue is that the best parameters recommended by ax (as exposed by the hydra integration) are often not, in fact, the best: other runs during optimization achieved a better objective value, yet ax recommends parameters from a worse run. this is at the very least confusing, requires manual parameter selection by logging and examining the parameters and objectives seen during tuning, and makes me wonder whether ax actually explores the parameter search space in a way that will find near-optimal parameters

please let us know if we somehow misinterpreted something or if hydra integration is not correct. thank you!

in progress upstream issue

All 16 comments

I am not particularly familiar with the hydra integration, but we certainly strive to explore the search space in a way that finds near-optimal parameters. We will take a look at the details of the integration.

the parameters ax recommends are actually exactly the same as one of the runs that was executed during optimization. but, there are other parameters that resulted in a lower target value. why would ax recommend exact parameters that are objectively worse than another exact set of parameters for which it knows the result?

Regarding the discrepancies between "suggested best" and "observed best" values: We use a model to fit the response surface and then use that to quantify the value of a configuration. This can help regularize observations in case the observations are noisy. The fact that Ax doesn't return the observed best value means one of two things: (i) Ax considers the observations noisy and the model regularizes the results significantly, or (ii) there is some bug.
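To make the "suggested best" versus "observed best" distinction concrete, here is a minimal toy sketch. It is not Ax's actual model: it assumes a plain RBF-kernel smoother with a fixed lengthscale and a hand-picked noise variance, and all function names and data points are invented for illustration. With a large assumed noise level, the isolated low value at x = 1.0 is smoothed toward its higher neighbours, so the fitted surface prefers a point in the flat region around x = 4 to 5. With a near-zero noise level, the fit follows the data and the observed best is recovered.

```python
import math


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def posterior_mean_at_observations(xs, ys, noise_var):
    """Smoothed value at each observed x: K (K + noise_var * I)^-1 y."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    K_noisy = [
        [K[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    alpha = solve(K_noisy, ys)
    return [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]


def argmin(values):
    return min(range(len(values)), key=lambda i: values[i])


# Invented data: one sharp minimum at x = 1.0, plus a flat, moderately low region.
xs = [0.0, 0.5, 1.0, 1.5, 4.0, 4.5, 5.0]
ys = [1.0, 1.0, -1.0, 1.0, -0.5, -0.5, -0.5]

observed_best = argmin(ys)
noisy_best = argmin(posterior_mean_at_observations(xs, ys, noise_var=1.0))
near_noiseless_best = argmin(posterior_mean_at_observations(xs, ys, noise_var=1e-6))

print("observed best x:", xs[observed_best])
print("model best x (noise_var=1.0):", xs[noisy_best])
print("model best x (noise_var=1e-6):", xs[near_noiseless_best])
```

The same mechanism would explain case (i): if the model believes the observations are noisy, a single very good run can be treated as partly luck and ranked below a consistently good neighbourhood.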

So my first question is: In your use case, do you have noisy observations (i.e. if you evaluate the same configuration twice, will you get different results)?

Also, can you share the script that you ran for us to have a full repro? If not, could you provide us with the configurations and observed objective values so we can see what's going on with the model?

Either way, we should have a way to indicate in the hydra configuration whether observations should be considered noiseless.

hi! my use case here is that i am trying to use ax to find the best parameters for language model fusion for ASR decoding. there are typically two parameters to tune: language model weight and a silence weight. these are dependent on each other, meaning that the optimal lm weight for a particular silence weight may not be optimal for a different silence score.

it is also fully deterministic, meaning you will always get exactly the same result for the same parameter values.

unfortunately i can't provide a repro example as i encountered this as part of a new research project that has not yet been released. i can maybe share some logs if that would help or try to come up with a stand alone repro but that may take some time.

under what circumstances would ax consider observations noisy (aside from not being deterministic)? is it possible to disable this assumption - since it can vary based on the problem being solved?

providing more info to the user would be great! getting recommendations that turn out not to be the best (or even in the top 5) after examining logs can be frustrating!

Hi @alexeib! I'm suspicious that this might be a bug, actually. After looking at the code for the plugin, I don't actually think that Ax is considering the observations to be noisy. So it would be great if you could help us investigate the issue further. Even if you can't provide a full repro, maybe you can provide some (anonymized) data that contains the parameter values that were tested and the observed objective for each?

under what circumstances would ax consider observations noisy (aside from not being deterministic)? is it possible to disable this assumption - since it can vary based on the problem being solved?

So in our standard API (not via hydra) this is achieved by passing a zero sem for each observation (In Ax, observations are (mean, sem) tuples). You can also pass NaN in which case we infer a noise level, or some other sem value if you happen to have access to that. So the user basically can indicate whether observations are noisy or not.
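As a rough illustration of the (mean, sem) convention, the sketch below pairs a scalar objective with an SEM before reporting it. `to_raw_data` and `RecordingClient` are hypothetical names; the stand-in class only mimics the `complete_trial(trial_index=..., raw_data=...)` call shape of Ax's service API so the example runs without Ax installed. It uses `None` for the unknown-noise case, which is an assumption of this sketch rather than something confirmed here.

```python
from typing import Dict, Optional, Tuple

RawData = Tuple[float, Optional[float]]


def to_raw_data(mean: float, deterministic: bool) -> RawData:
    """Pair a scalar objective with an SEM.

    SEM of 0.0 marks the observation as noiseless; SEM of None is assumed
    to mean "noise level unknown, please infer it".
    """
    return (mean, 0.0 if deterministic else None)


class RecordingClient:
    """Hypothetical stand-in for an Ax client; it just records what it is given."""

    def __init__(self) -> None:
        self.completed: Dict[int, RawData] = {}

    def complete_trial(self, trial_index: int, raw_data: RawData) -> None:
        self.completed[trial_index] = raw_data


client = RecordingClient()
# Deterministic objective: report the value with a zero SEM.
client.complete_trial(trial_index=0, raw_data=to_raw_data(12.3, deterministic=True))
# Noisy objective with unknown noise level: leave the SEM unspecified.
client.complete_trial(trial_index=1, raw_data=to_raw_data(11.8, deterministic=False))
print(client.completed)
```

With a real client, the same tuples would be passed as `raw_data`; if an SEM estimate is available from repeated runs, it can be supplied in place of 0.0 or None.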

@ldworkin do you mind pointing me to the place in hydra where this is handled?

since we are all at fb might be quicker if we coordinate over workchat and i can provide a repro there? will ping you tomorrow!

@Balandat I'm looking at this and making the (possibly wrong) assumption that rets[i].return_value is a single scalar, rather than a (mean, sem) tuple.

If complete_trial is called with a single scalar, I believe we assume noiseless setting.

One thing that I often like to chime in with is that it’s generally not a good idea to set the noise levels to 0 or super close to 0. Having a positive noise (hyper)parameter for the GP helps improve robustness if the GP model is misspecified.

If complete_trial is called with a single scalar, I believe we assume noiseless setting.

I don't think that's true: https://github.com/facebook/Ax/blob/5e5947db41285713fbf3e50e112235e581f05364/ax/service/utils/instantiation.py#L599-L600

we return None as the sem if only a scalar is provided, which should be interpreted as noisy with unknown noise level (so we infer it)

Oh ok! My mistake.

The return value from the function is passed as is so in principle it should be possible for the function to return a (loss, 0) to achieve noiseless interpretation.

This will not work with other HPO plugins though, as they are expecting a scalar result.
If the above achieves the desired behavior, we can add a configuration flag to the Ax Sweeper to indicate if the function should be considered deterministic or not:

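val: Any = rets[idx].return_value
assert isinstance(val, (float, tuple))
if isinstance(val, float):
    # A bare scalar carries no noise information, so attach an SEM explicitly.
    if cfg.deterministic:
        val = (val, 0)  # SEM of 0: treat the observation as noiseless
    else:
        val = (val, None)  # SEM of None: let Ax infer the noise level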

Absolutely, that makes sense. Investigating now to see whether switching to noiseless setting actually fixes the problem.

I’d caution against making the noise level exactly 0, as this will lead to numerical issues / failures in cases where there are a lot of points and the data doesn’t perfectly follow a GP. Max, is there some other default that we might want to suggest?

@omry, this was indeed the issue, so exposing this configuration flag would be the right fix! Is that something Shagun could work on?

@Balandat, re @eytan's comment above, do you have any thoughts on whether we should use 0 or something else for the noiseless case?

We already clamp to 1e-7 internally (this is in the standardized output space): https://github.com/facebook/Ax/blob/0846dc0b60f745b04fb4f65f94522f39424933ef/ax/models/torch/botorch_defaults.py#L44
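A minimal sketch of what such a floor could look like, under stated assumptions: the floor is applied to the observation variance (SEM squared) after outputs are standardized by their sample standard deviation. The names `MIN_NOISE_VARIANCE`, `output_std`, and `clamped_standardized_variances` are hypothetical, and the linked Ax source is the authority on what is actually clamped and where.

```python
import math

# Assumed floor, using the 1e-7 value quoted above.
MIN_NOISE_VARIANCE = 1e-7


def output_std(ys):
    """Sample standard deviation of the observed objective values."""
    mean = sum(ys) / len(ys)
    variance = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
    return math.sqrt(variance)


def clamped_standardized_variances(ys, sems, floor=MIN_NOISE_VARIANCE):
    """Rescale SEMs into the standardized output space, then floor the variances."""
    std = output_std(ys)
    variances = [(sem / std) ** 2 for sem in sems]
    return [max(v, floor) for v in variances]


ys = [1.0, 2.0, 3.0]    # observed objective values (sample std is 1.0)
sems = [0.0, 0.0, 0.1]  # user-supplied SEMs, two of them exactly zero
clamped = clamped_standardized_variances(ys, sems)
print(clamped)
```

Under these assumptions, a user-supplied SEM of exactly 0 never reaches the model as a zero variance, which is why passing 0 for a deterministic objective need not cause the numerical failures raised earlier.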

A configuration flag has been added to Hydra for this purpose! https://github.com/facebookresearch/hydra/commit/eae4b4733d6e05c203304fc41607241d950f2294

Ax Hydra users can also return (score, 0) instead of score from their main function (even before the plugin change) to get a deterministic interpretation of the optimized function.
