GPyOpt: Using custom initialization values

Created on 29 Jun 2018 · 6 Comments · Source: SheffieldML/GPyOpt

It would be nice if we could specify values to initialize with. I'm using the BayesianOptimization method to optimize the hyperparameters for a machine learning model on a very large dataset. As a result, each probe of the function takes a really long time. If something happens that causes training to fail, the entire optimization fails. I'm saving the parameter combinations and evaluation scores as optimization goes on. It would be nice if I could restart optimization and initialize with the saved parameter and eval results from the previous attempts.

For example, suppose I have two parameters _P_1_ and _P_2_. Let's say I get through two experiments (i.e. sample _f_ twice with two combinations of values for the parameters and get back two respective values, call them _V_1_ and _V_2_). Then something happens on the third experiment that causes it to crash. I would like to be able to restart BayesianOptimization and pass it the results from the first two experiments as initial values so that I can pick up where I left off.

1) Is this currently possible?
2) If not, could we add it as a feature?

All 6 comments

This is absolutely possible. The top-level API takes arrays X and Y, which serve as the initial values. By default they are both None, but you can pass in any number of values you have already collected.
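
For reference, here is a minimal sketch of passing previously collected evaluations straight into the constructor; it assumes a GPyOpt version whose BayesianOptimization accepts X and Y keyword arguments, and the objective, bounds, and saved arrays below are made-up examples.

import numpy as np
import GPyOpt

# toy objective over two continuous variables; GPyOpt passes a 2D array of points
def f(x):
    return np.sum(np.square(x), axis=1, keepdims=True)

bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (-1, 1)},
          {'name': 'var_2', 'type': 'continuous', 'domain': (-1, 2)}]

# two evaluations saved from an earlier (crashed) run
prev_X = np.array([[0.5, -0.2],
                   [-0.3, 1.1]])   # shape (2, 2)
prev_Y = f(prev_X)                 # shape (2, 1)

# hand the previous results to the optimizer as its initial data
myBopt = GPyOpt.methods.BayesianOptimization(f, domain=bounds, X=prev_X, Y=prev_Y)
myBopt.run_optimization(max_iter=10)
print(myBopt.x_opt, myBopt.fx_opt)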

Just to elaborate on how you'd start a BayesianOptimization from initial values, or from a custom design of experiments...

When you initialize your optimization object, set initial_design_numdata=0

# skip the random initial design; we will supply our own evaluations
myBopt = BayesianOptimization(my_obj, domain=bounds, model_type='GP',
                              initial_design_numdata=0)

Then you can use the API to set the X and Y arrays. X is the 2D array of your previous design points, while Y is the 2D array of your previous objective function values.

myBopt.X = my_prev_X
myBopt.Y = my_prev_Y

And then you can run the optimization

myBopt.run_optimization(max_iter=max_iter)
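
A side note for the crash-recovery use case in the original question: the optimizer keeps every point it has evaluated in .X and .Y, so one possible pattern (a sketch using np.save/np.load with made-up file names) is to persist those arrays periodically and reload them after a restart.

import numpy as np

# after (or during) a run, persist everything evaluated so far
np.save('bo_X.npy', myBopt.X)
np.save('bo_Y.npy', myBopt.Y)

# ... later, after a crash, rebuild the optimizer from the saved data
my_prev_X = np.load('bo_X.npy')
my_prev_Y = np.load('bo_Y.npy')
myBopt = BayesianOptimization(my_obj, domain=bounds, model_type='GP',
                              initial_design_numdata=0)
myBopt.X = my_prev_X
myBopt.Y = my_prev_Y
myBopt.run_optimization(max_iter=max_iter)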

Thanks @cjekel for the extra info. I had a slightly different approach in mind, but that code would also work.

Hi cjekel,
Thanks for your explanation, but I still have a problem doing this.
I did what you said, but an error occurred.
Here is the simple code:

import numpy as np
import GPy
import GPyOpt

def myf(x1,x2):
    y = x1-1 + x2 ** 2
    return y

def f(x):
    for _x in x:
        y = myf(x1 = _x[0],x2=_x[1])
    return y

bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (-1,1)},
          {'name': 'var_2', 'type': 'continuous', 'domain': (-1,2)}]

myProblem = GPyOpt.methods.BayesianOptimization(f,bounds,initial_design_numdata=0,acquisition_type='EI')

array_x = np.array([1,0])
array_y = np.array([0])

print(array_x)
myProblem.X = array_x
myProblem.Y = array_y

max_iter = 15

myProblem.run_optimization(max_iter)

myProblem.x_opt

The error says IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed.
Is something wrong with the 2D array?

Thank you

@Seal-o-O you are very close; you just need to make sure that .X is a 2D array of shape (number_of_function_evaluations, number_of_dimensions). So in your case the shape of .X should be (1, 2).

Also, .Y must be a 2D array of shape (number_of_function_evaluations, 1). I put the reshape call in the code below because these values are often stored as a 1D array.
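
As a general aside (a sketch with hypothetical variable names), parameter combinations and scores logged as plain Python lists can be stacked into those shapes like this:

import numpy as np

# e.g. two logged evaluations of a 2-variable problem (scores computed with myf above)
logged_params = [[1.0, 0.0], [0.5, -0.2]]
logged_scores = [0.0, -0.46]

X = np.array(logged_params)                  # shape (2, 2)
Y = np.array(logged_scores).reshape(-1, 1)   # shape (2, 1)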

So here is a working version of your code.

import numpy as np
import GPy
import GPyOpt

def myf(x1,x2):
    y = x1-1 + x2 ** 2
    return y

def f(x):
    for _x in x:
        y = myf(x1 = _x[0],x2=_x[1])
    return y

bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (-1,1)},
          {'name': 'var_2', 'type': 'continuous', 'domain': (-1,2)}]

myProblem = GPyOpt.methods.BayesianOptimization(f,bounds,initial_design_numdata=0,acquisition_type='EI')

array_x = np.array([[1,0]])              # shape (1, 2): one evaluation of two variables
array_y = np.array([0]).reshape(-1, 1)   # shape (1, 1): one objective value

print(array_x)
myProblem.X = array_x
myProblem.Y = array_y

max_iter = 15

myProblem.run_optimization(max_iter)

myProblem.x_opt

PS: Don't expect magic results when you pass only one function evaluation into the optimization problem. What will happen is that the first few evaluations will basically maximize the minimum distance from your existing points, which is not much different from Latin hypercube sampling.
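
If you want a better-seeded restart than a single point, one option (a sketch, assuming your GPyOpt version ships the Design_space and experiment_design.initial_design helpers, and that pyDOE is installed for the 'latin' design) is to pad the saved point with a few Latin hypercube samples before optimizing:

import numpy as np
from GPyOpt.core.task.space import Design_space
from GPyOpt.experiment_design import initial_design

space = Design_space(space=bounds)
extra_X = initial_design('latin', space, 4)   # 4 Latin hypercube points in the domain
extra_Y = np.array([f(np.atleast_2d(x)) for x in extra_X]).reshape(-1, 1)

myProblem.X = np.vstack([array_x, extra_X])
myProblem.Y = np.vstack([array_y, extra_Y])
myProblem.run_optimization(max_iter)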

@cjekel

Thanks! That helps a lot!
Your explanations always clear things up for me.
