Ax: Contextual Variables in BO?

Created on 6 Mar 2020 · 4 Comments · Source: facebook/Ax

Hi Ax, when working with physical systems, I get extra measurements, distinct from the metric I'm optimizing, that I would like to pass to the GP model during training but not sample from when proposing the next arms of the experiment.

Is there a framework for contextual BO in this case, where the models are always trained on the additional measurements? Or can I fold them into a parameter that the acquisition function ignores?

question

All 4 comments

There are a few different types of contextual BO, but it sounds like (correct me if I'm wrong!) your setting is that:

  • you specify x to evaluate
  • you get back an observation of the metric at points (x, a), (x, b), (x, c). So, you observe the metric at x under the three contexts a, b, and c.

Is that right? And now the goal is to jointly model x and context?

You can certainly add the context to the search space and fit a model. When generating new points, though, you'll have to choose which context you want to generate for. I.e., you'd be asking for the best x for context a. You can get this using the fixed_features input to the model's gen method, which allows you to fix some set of variables to particular values during the optimization:
https://github.com/facebook/Ax/blob/6452114cd7684587e6fdbe92ce0dfe7ea9ddcbbb/ax/modelbridge/base.py#L550

So it would look like

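```python
from ax.core.observation import ObservationFeatures

# m is a fitted ModelBridge whose search space includes the context parameter 'context_param'.
# Generate 3 candidates while holding 'context_param' fixed at 'a'.
new_points = m.gen(n=3, fixed_features=ObservationFeatures(parameters={'context_param': 'a'}))
```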
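Ax handles this internally, but to make the idea of "fixing a feature" concrete without depending on Ax, here is a self-contained NumPy toy: a GP is fit jointly over a scalar design x and a one-hot-encoded context, and the acquisition is then maximized over a grid of x values only, with the context held at a single value. Everything here (the contexts a/b/c, the RBF kernel, the UCB acquisition, the grid search, and the synthetic objective) is an illustrative assumption, not how Ax implements fixed_features.

```python
import numpy as np

CONTEXTS = ["a", "b", "c"]  # hypothetical context values

def encode(x, context):
    """Concatenate a scalar design x with a one-hot encoding of the context."""
    onehot = np.zeros(len(CONTEXTS))
    onehot[CONTEXTS.index(context)] = 1.0
    return np.concatenate([[x], onehot])

def rbf(A, B, lengthscale=0.3):
    """Squared-exponential kernel (unit prior variance) between rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(X_train, y_train, X_test, noise_var=1e-6):
    """Exact GP posterior mean and standard deviation with a zero prior mean."""
    K = rbf(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf(X_test, X_train)
    mean = K_s @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))
    return mean, np.sqrt(np.clip(var, 0.0, None))

def best_x_for_context(X_train, y_train, context, beta=2.0):
    """Maximize a UCB acquisition over x alone, with the context held fixed."""
    grid = np.linspace(0.0, 1.0, 201)
    X_test = np.stack([encode(x, context) for x in grid])
    mean, std = gp_posterior(X_train, y_train, X_test)
    return grid[np.argmax(mean + beta * std)]

# Toy data: each x is observed under all three contexts.
shift = {"a": 0.2, "b": 0.5, "c": 0.8}  # made-up per-context optimum
X_train, y_train = [], []
for x in np.linspace(0.0, 1.0, 6):
    for ctx in CONTEXTS:
        X_train.append(encode(x, ctx))
        y_train.append(-(x - shift[ctx]) ** 2)
X_train, y_train = np.array(X_train), np.array(y_train)

# With beta=0 (pure exploitation) the answer should land near each context's optimum.
print(best_x_for_context(X_train, y_train, "a", beta=0.0))
print(best_x_for_context(X_train, y_train, "c", beta=0.0))
```

The model is shared across contexts, but each call answers a different question ("best x for context a" vs. "best x for context c"), which is exactly the choice described above.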

If you give some more specifics about your setting / problem I can probably give more suggestions!

Let me see if I can clarify. Consider a design problem where we modify three variables (a, b, c), and after manufacturing we observe a variable (d). The information in (d) will help the model's accuracy, but when choosing new design parameters, we must only use (a, b, c).

This is similar to adding another objective, but we don't necessarily want to maximize or minimize (d); we just want to use it to help train the model. Does your post above still hold?

So if I understand correctly, the variable here isn't something that you can control at all, but rather it's just something that you measure as part of your function evaluation, in addition to the objective that you're trying to minimize. So, the thing you measure, d, is a function of the design parameters (a, b, c), and you think that incorporating the observation of d into the model could allow it to better estimate the objective. You're not interested in trying to control the value of d, you just think it is useful side information. Is that correct?

In that case, (d) is really a second outcome rather than a parameter. You can get it into the model just by adding it to the data as a second metric, alongside the objective. For instance, Section 5 of this tutorial shows what the data look like when modeling two outcomes: https://ax.dev/tutorials/gpei_hartmann_developer.html . There the outcomes are hartmann6 and l2norm; here they would be the objective and (d) (whatever its actual name is). By default, however, this will fit separate models for the objective and for (d), so it will not borrow strength from (d) for the objective as you wish. It is possible to use a multi-task model for that.
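As a minimal sketch of what "adding (d) as a second metric" means for the data, assume an evaluation function that returns both outcomes as a dict of metric name to (mean, sem), the same shape the evaluation functions in the Ax tutorials return. In the long format shown in the tutorial's dataframe (columns arm_name, metric_name, mean, sem, trial_index), (d) is then just another row with its own metric_name. The names run_design, to_long_format, "objective" and "d", and both formulas, are hypothetical placeholders for your real measurements.

```python
import math

def run_design(params):
    """Hypothetical stand-in for manufacturing design (a, b, c) and measuring it.

    Returns both outcomes as (mean, sem) pairs. The formulas are made up, and
    sem is 0.0 only because these stand-ins are deterministic; real
    measurements would carry their own standard errors.
    """
    a, b, c = params["a"], params["b"], params["c"]
    d = a * b + c                      # side measurement (modeled, not optimized)
    objective = math.sin(a) - d ** 2   # objective that happens to depend on d
    return {"objective": (objective, 0.0), "d": (d, 0.0)}

def to_long_format(trial_index, arm_name, results):
    """One row per (arm, metric), mirroring the tutorial's dataframe columns."""
    return [
        {"trial_index": trial_index, "arm_name": arm_name,
         "metric_name": name, "mean": mean, "sem": sem}
        for name, (mean, sem) in results.items()
    ]

rows = to_long_format(0, "0_0", run_design({"a": 0.1, "b": 0.5, "c": 0.3}))
for row in rows:
    print(row)
```

Only the objective would appear in the optimization config; (d) is simply tracked and modeled alongside it.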

Before going into that, though, I'm wondering how useful this will be. Why do you expect (d) to be informative about the objective? Is it highly correlated with the objective? Is the noise level high in the objective but lower in (d)? The way to borrow strength across metrics in a GP is to use a multi-task kernel. The main benefit of a multi-task kernel comes in settings where you have many more observations of (d) than of the objective; that doesn't sound like your case, does it?
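To see why the number of (d) observations matters so much, here is a toy NumPy calculation of the objective's posterior variance under a multi-task GP. It assumes an intrinsic coregionalization (ICM) kernel with inter-task correlation rho, a shared RBF kernel with a fixed lengthscale, equal known noise on both outcomes, and known hyperparameters; it is an illustration, not Ax's multi-task model, and it ignores the cost of inferring the extra hyperparameters.

```python
import numpy as np

def rbf(x1, x2, lengthscale=0.3):
    """Squared-exponential kernel (unit variance) between 1-D input arrays."""
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale**2)

def objective_posterior_var(x_obj, x_d, x_test, rho, noise_var=0.01):
    """Posterior variance of the objective at x_test under an ICM multi-task GP.

    Cov(outcome i at x, outcome j at x') = B[i, j] * rbf(x, x'), with
    B = [[1, rho], [rho, 1]]; outcome 0 is the objective, outcome 1 is d.
    rho = 0 makes d irrelevant, which recovers the single-task model.
    """
    X = np.concatenate([x_obj, x_d])
    task = np.concatenate([np.zeros(len(x_obj), dtype=int), np.ones(len(x_d), dtype=int)])
    B = np.array([[1.0, rho], [rho, 1.0]])
    K = B[task][:, task] * rbf(X, X) + noise_var * np.eye(len(X))
    k_star = B[0, task][None, :] * rbf(x_test, X)  # test points are objective values
    return 1.0 - np.einsum("ij,ji->i", k_star, np.linalg.solve(K, k_star.T))

x_obj = np.array([0.0, 1.0])   # few objective observations
x_test = np.array([0.5])       # where we want to predict the objective

single = objective_posterior_var(x_obj, np.array([]), x_test, rho=0.0)
# (d) observed only at the same points as the objective: very little gain.
same_pts = objective_posterior_var(x_obj, x_obj, x_test, rho=0.9)
# (d) observed at many more points, including near x_test: large gain.
many_d = objective_posterior_var(x_obj, np.linspace(0.0, 1.0, 11), x_test, rho=0.9)
print(single, same_pts, many_d)
```

In this toy setup, observing (d) at the same two points barely moves the objective's variance at x = 0.5, while observing (d) densely (including at x = 0.5) cuts it by well over half.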

If (d) and the objective are observed only at the same points, multi-task modeling will bring very limited benefit. How much benefit there is depends on the noise levels and on the linear correlation between (d) and the objective. In the best-case scenario, where (d) and the objective are perfectly correlated (so if you know (d), you can compute the objective) and equally noisy, a multi-task model will reduce your predictive uncertainty by a factor of at most 1/sqrt(2) (so, about 30% lower predictive standard deviation). As the linear correlation between (d) and the objective decreases, that benefit drops off rapidly. If they have no linear correlation, there is no benefit to multi-task modeling. If you have noiseless observations, there is no benefit to multi-task modeling. If the noise levels are unknown, we currently don't support multi-task modeling. Even in these settings with limited benefit, however, there is a cost: additional model parameters must be inferred, which makes inference more difficult and computationally costly. For that reason I do not think it is a good idea to fit multi-task models unless you are in the setting where you have many more observations of (d) than of the objective.
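A back-of-the-envelope check of the best case, under simplifying assumptions: a single point x0 where both outcomes are observed once, prior variance s^2 for the objective, the same noise variance sigma^2 on both outcomes, and perfect correlation, so that the observation of (d) acts as a second independent noisy reading of the objective. The posterior variance of the objective at x0 is then

$$
v_{\text{single}} = \left(\frac{1}{s^2} + \frac{1}{\sigma^2}\right)^{-1} = \frac{s^2\sigma^2}{s^2 + \sigma^2},
\qquad
v_{\text{multi}} = \left(\frac{1}{s^2} + \frac{2}{\sigma^2}\right)^{-1} = \frac{s^2\sigma^2}{2s^2 + \sigma^2},
$$

$$
\frac{v_{\text{multi}}}{v_{\text{single}}} = \frac{s^2 + \sigma^2}{2s^2 + \sigma^2} \ge \frac{1}{2}
\quad\Longrightarrow\quad
\sqrt{\frac{v_{\text{multi}}}{v_{\text{single}}}} \ge \frac{1}{\sqrt{2}} \approx 0.71 .
$$

So under these assumptions the variance at best halves and the standard deviation shrinks by at most a factor of 1/sqrt(2), approaching that bound only when the noise is small relative to the prior variance. With sigma^2 = 0 both variances are already 0, matching the noiseless case.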

If you are in that setting, it is exactly what we describe in this paper: http://jmlr.org/papers/volume20/18-225/18-225.pdf , and I can point you toward how to set it up.

Thank you for your detailed description, @bletham. It sounds like the information added to the model most likely won't be a huge help (because (d) is not observed more often than the objective, nor is it expected to be very strongly correlated with the preferred metric). I will definitely refer back to this response, and will let you know if I fall into the more advanced setting and need help with the setup.
