Ax: High-dimensional discrete search space

Created on 6 Nov 2020 · 9Comments · Source: facebook/Ax

Is it possible to set up a multi-objective Bayesian optimization model where the search space is given by discrete points in a high-dimensional Euclidean space?
To be more specific: I have a data set X of about 10000 samples with each sample being a point in the unit cube [0,1]^30. I have two objective functions whose domain is given by the discrete set X. Is it possible to define the search space to be the set X for a multiobjective optimization routine?
Following the tutorials and the documentation I was only able to figure out how to define the search space as the whole unit cube but not how to restrict it to discrete points therein.

Thanks in advance!

question

Source

grmaier

All 9 comments

Is your discrete set the cross product of discrete sets of values for each dimension of X?
I.e. Is the following true?
If x = [x_0, ..., x_d] is a configuration and X_0, ... X_d are discrete sets of values for each dimension, then in if x_i in X_i for i=0, ..., d, then x is a feasible configuration.

Or, is the discrete set a more restricted subset?

sdaulton on 6 Nov 2020

Thanks for your quick response @sdaulton!
Unforunately, the discrete set is in fact a more restrictive subset with no product structure as you described.

However, I thought about one issue which could resolve my problem: I could enumerate all discrete points from 0 to 10000 (say) and use the numbers 0,...,10000 as fixed parameters to build the search space. Then I would also change the domain of my objectives accordingly. Is it true that when optimizing the acquisition function on a discrete search space, its value is evaluated at every single discrete point and the point with the highest value of the acquisiton function is chosen as the next point for evaluating the objectives? If this were true, I think this would give me a valid multi-objective optimization routine for my purposes.

grmaier on 6 Nov 2020

I'd recommend specifying the search space as continuous in the 30d-space, so that the GP accounts for the ordinal structure of each parameter (it is ordinal right?). Then you could define a custom acqf_optimizer that takes the acquisition function (currently it would have to be qEHVI) and evaluates it at every possible point and then returns the candidate(s) with the maximum acquisition value(s): https://github.com/facebook/Ax/blob/master/ax/models/torch/botorch_moo.py#L198. You'd have to manually generate random initial configurations to evaluate before starting BO

sdaulton on 6 Nov 2020

👍1

Thanks for this suggestion!
Sorry, could you explain to me what you mean by ordinal structure? I am not sure if I understand it correctly and why my suggested strategy above may not be the best to recommend.

grmaier on 6 Nov 2020

Ah I just mean was just distinguishing that if the values of a parameter (single dimension here) are ordered (e.g. 0.1 > 0.02 > 0.01), then making that parameter ordered in Ax (e.g. a FLOAT), will give that information to the model. In contrast if a parameter are not ordered and the values are simply different categories, then the model would one-hot encode that parameter.

sdaulton on 6 Nov 2020

@sdaulton , @grmaier , I'm trying to understand this problem. Let me pretend d = 1, so we have a single dimension, and for simplicity X = {0, 0.5, 1}.

Is it correct that f(0) (where f is your objective) is closer to f(0.5) than it is to f(1)? Like, is there underlying smoothness here that a model can exploit?

Is it also correct that f(0.25) (where 0.25 is not in X) is completely undefined? Or is it defined, but we don't want to suggest this point as a winning candidate for some reason?

ldworkin on 6 Nov 2020

Is it correct that f(0) (where f is your objective) is closer to f(0.5) than it is to f(1)? Like, is there underlying smoothness here that a model can exploit?

Even if the function is smooth, this is not necessarily true, but if there is an ordinal relationship 0 < 0.5 < 1, then we should exploit that---as opposed to X = {"0", "0.5", "1"} being simply different categories

Is it also correct that f(0.25) (where 0.25 is not in X) is completely undefined?

I'll let @grmaier comment on this

sdaulton on 6 Nov 2020

👍1

@ldworkin I don't really have much information about the objective functions f1, f2 and their regularity. I may assume that they continuous or maybe C^2 as they describe real-world ouput data of (expensive) experiments. But I actually only know their values at the discrete points in X. Outside of X, the functions are undefined. But the points in X (which are tuples of floats) have no particular order.

Basically, I want to test if multiobjective Bayesian optimization is a suitable method to quickly identify the (known) pareto front of the set {(f1(x), f2(x)): x\in X} with X a subset of [0,1]^30. For this I want to start with a subset Y of X to initialize the optimization and then I want to fit two Gaussian processes corresponding to f1 and f2 to the points from Y and the points from X which are subsequently suggested by the acquisition functions.
I think I understand why my initial strategy described above would fail as the Gaussian process would not be on the domain [0,1]^30. I think the strategy described by @sdaulton could work. I will give it a try. If there are any other suggestions I would really appreciate it if you could share them with me.

Thanks for your help so far!

grmaier on 6 Nov 2020

👍1

Will close for now then, let us know if you have any other questions!

ldworkin on 11 Nov 2020

Was this page helpful?

0 / 5 - 0 ratings