Ax: [Question] Mixing Continuous and Discrete Spaces

Created on 12 Feb 2021  ·  10 Comments  ·  Source: facebook/Ax

Hey, I'd like to use Ax for hyperparameter tuning for an RL problem I'm working on. Some values are continuous, some are chosen from a list of options, and some are integer ranges.

Many other Bayesian optimization libraries can handle all of these spaces at once, but if I'm reading the documentation correctly, Ax cannot. In that case, should I just discretize my continuous spaces and use bandits? If so, are there any best practices for the discretization (e.g. how many discrete values do you usually want to pick)?

documentation question

All 10 comments

Hi @justinkterry! What part of our documentation makes you think we can't handle this kind of search space? We should definitely be able to! We'll use BayesOpt to model the continuous numeric parameters (that will work regardless of whether they are floats or ints). The parameters that are lists of options won't be optimized via BayesOpt, but they'll still be included in the search space and explored.

Your bandit optimization page explicitly says bandits can only handle discrete cases. After rereading, you're right that the Bayesian optimization page doesn't say that, though a sentence stating which parameter domains it supports would be useful.

Also, could you please elaborate on what you mean by "The parameters that are lists of options won't be optimized via BayesOpt, but they'll still be included in the search space and explored."?

Yes, we should definitely clarify on the Bayesian optimization page that as long as you have one continuous numeric parameter, BayesOpt will work!

For your second question, let me tag in @Balandat, who can better explain how our models currently handle mixed search spaces (and can also describe some active/ongoing work we have to improve this).

We currently handle discrete parameters in the following way:

  1. Ordered discrete parameters (e.g. integers): we perform a continuous relaxation, model and optimize in this relaxed space, then round to the available values.
  2. Categorical parameters (e.g. lists of choices): we one-hot encode these in the model and also perform continuous optimization of the one-hot encoded model.
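To make the two treatments above concrete, here is a minimal pure-Python sketch. It is not Ax's internal code; the helper names `round_to_allowed`, `one_hot`, and `decode_one_hot` are hypothetical and only illustrate the idea of optimizing in a relaxed space and mapping the result back to a valid value.

```python
def round_to_allowed(x, allowed):
    """Snap a relaxed (continuous) value to the nearest allowed discrete value."""
    return min(allowed, key=lambda v: abs(v - x))


def one_hot(choice, choices):
    """Encode a categorical choice as a one-hot vector of floats."""
    return [1.0 if c == choice else 0.0 for c in choices]


def decode_one_hot(vec, choices):
    """Map a relaxed one-hot vector back to the choice with the largest entry."""
    best_index = max(range(len(vec)), key=lambda i: vec[i])
    return choices[best_index]


# Ordered discrete parameter: the optimizer proposes 57.6 in the relaxed space,
# which is rounded to the nearest integer in 0..125.
rounded = round_to_allowed(57.6, range(0, 126))

# Categorical parameter: the optimizer proposes a relaxed one-hot vector,
# which is decoded to the category with the largest weight.
activations = ["relu", "tanh", "elu"]  # hypothetical categories
encoded = one_hot("relu", activations)
decoded = decode_one_hot([0.2, 0.7, 0.1], activations)
```

The rounding step is where the drawback for small ordered spaces comes from: many distinct relaxed points collapse onto the same discrete value, so the model can spend effort distinguishing candidates that are identical after rounding.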

Both approaches for 1. and 2. have drawbacks, but as long as the cardinality of the discrete space is not huge they will work reasonably well. We also have other approaches in the works to deal with higher-cardinality discrete spaces, including things like continuous embeddings, custom kernels, custom optimization approaches etc. (cc @sdaulton, @dme65).

"Both approaches for 1. and 2. have drawbacks, but as long as the cardinality of the discrete space is not huge they will work reasonably well."

I have a 10D problem with a 0-125 int space, a 0-4092 int space, a [2,4,6,8] space, and 7 very large continuous spaces. I assume that's fine for the current approaches?

cc @Balandat

Yeah, I think that should be OK. The two large int spaces have so many values that the continuous relaxation should work very well for them.
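For reference, a search space like the one described could be written in the dictionary format used by the Ax Service API roughly as follows. This is a sketch under assumptions: the parameter names are hypothetical, the float bounds are placeholders (the thread does not give the real ones), and the exact set of accepted keys may differ between Ax versions.

```python
# Two integer range parameters and one small ordered choice parameter.
parameters = [
    {"name": "int_small", "type": "range", "bounds": [0, 125], "value_type": "int"},
    {"name": "int_large", "type": "range", "bounds": [0, 4092], "value_type": "int"},
    {
        "name": "even_choice",
        "type": "choice",
        "values": [2, 4, 6, 8],
        "value_type": "int",
        "is_ordered": True,
    },
]

# Seven float range parameters; the bounds here are placeholders only.
parameters += [
    {"name": f"x{i}", "type": "range", "bounds": [0.0, 1.0], "value_type": "float"}
    for i in range(7)
]

# With Ax installed, the list would be passed along these lines:
#   from ax.service.ax_client import AxClient
#   ax_client = AxClient()
#   ax_client.create_experiment(name="rl_tuning", parameters=parameters)
num_range = sum(1 for p in parameters if p["type"] == "range")
num_choice = sum(1 for p in parameters if p["type"] == "choice")
```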

What API are you planning to use? @lena-kashtelyan we'll want to make sure this doesn't get defaulted to using Sobol somehow.

@Balandat, I actually just checked, and it won't: GPEI will be selected by default.

@justinkterry, what @Balandat is referring to there is the automatic choice of optimization model made in the Service and Loop APIs. We decide whether to use Bayesian optimization or quasi-random generation based on properties of the search space, since Bayesian optimization performs well when the search space is mostly continuous.

Currently this logic works as follows:

  • it calculates the number of discrete values (the sum of the lengths of the value lists for all choice parameters, just 1 in your case I believe),
  • it calculates the number of range parameters (9 in your case: 2 int-valued ones and 7 float-valued ones),
  • it uses BayesOpt if the number of range parameters >= the number of discrete values; for you 9 >= 1, so the system will use BayesOpt.
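The rule in the list above can be sketched in a few lines. This is an illustration of the heuristic as described in this comment, not the actual code in `ax.modelbridge.dispatch_utils`; the function name `choose_strategy` and the dictionary layout are assumptions made for the example.

```python
def choose_strategy(parameters):
    """Return "BayesOpt" or "Sobol" following the rule described above."""
    # Sum of the lengths of the value lists over all choice parameters.
    num_discrete_values = sum(
        len(p["values"]) for p in parameters if p["type"] == "choice"
    )
    # Count of range parameters, int-valued and float-valued alike.
    num_range = sum(1 for p in parameters if p["type"] == "range")
    return "BayesOpt" if num_range >= num_discrete_values else "Sobol"


# Mostly continuous space: 9 range parameters and one 4-value choice parameter.
mostly_continuous = [{"type": "range"} for _ in range(9)] + [
    {"type": "choice", "values": [2, 4, 6, 8]}
]

# Mostly discrete space: 1 range parameter and one 5-value choice parameter.
mostly_discrete = [{"type": "range"}] + [
    {"type": "choice", "values": ["a", "b", "c", "d", "e"]}
]

result_continuous = choose_strategy(mostly_continuous)
result_discrete = choose_strategy(mostly_discrete)
```

Note that if [2,4,6,8] is declared as a choice parameter, summing value-list lengths gives 4 rather than 1; the comparison 9 >= 4 still selects BayesOpt.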

It seems like the only outstanding item here is to make it clearer in our docs for BayesOpt that it does support choice parameters, but works better when at least one range parameter is present and best when there are more range parameters than discrete values.

@lena-kashtelyan

Regarding the SOBOL vs GPEI concerns here, I've been getting this printout during my various optimization tests:

ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+GPEI', steps=[Sobol for 11 trials, GPEI for subsequent trials]). Iterations after 11 will take longer to generate due to model-fitting.

Could you please confirm that this is the desired behavior? At minimum, it seems odd to switch after 11 trials rather than 10. (I'm optimizing over 7 floats and 3 ints and using the Service API with ray.tune.)
