It would be nice if we could specify values to initialize with. I'm using the BayesianOptimization
method to optimize the hyperparameters for a machine learning model on a very large dataset. As a result, each probe of the function takes a really long time. If something happens that causes training to fail, the entire optimization fails. I'm saving the parameter combinations and evaluation scores as optimization goes on. It would be nice if I could restart optimization and initialize with the saved parameter and eval results from the previous attempts.
For example, suppose I have two parameters *P1* and *P2*. Let's say I get through two experiments (i.e. sample *f* twice with two combinations of values for the parameters and get back two respective values, call them *V1* and *V2*). Then something happens on the third experiment that causes it to crash. I would like to be able to restart BayesianOptimization and pass it the results from the first two experiments as initial values so that I can pick up where I left off.
1) Is this currently possible?
2) If not, could we add it as a feature?
This is absolutely possible. The top-level API takes arrays X and Y, which serve as the initial values. By default they are both set to None, but you can give it any number of values you have already collected.
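As a sketch of that approach, passing X and Y at construction time (the values and the objective below are made up for illustration; the try/except just skips the run if GPyOpt isn't installed):

```python
import numpy as np

# Two evaluations collected before the crash (illustrative numbers).
X_init = np.array([[0.2, -0.5],
                   [0.8,  1.0]])   # shape (n_evaluations, n_parameters)
Y_init = np.array([[0.31],
                   [1.59]])        # shape (n_evaluations, 1)

try:
    import GPyOpt
    bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (-1, 1)},
              {'name': 'var_2', 'type': 'continuous', 'domain': (-1, 2)}]
    # X/Y passed to the constructor are used as the initial design.
    myProblem = GPyOpt.methods.BayesianOptimization(
        f=lambda x: np.sum(np.square(x), axis=1, keepdims=True),
        domain=bounds, X=X_init, Y=Y_init)
    myProblem.run_optimization(max_iter=5)
except ImportError:
    pass  # GPyOpt not available; the array shapes above are the key point
```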
Just to elaborate on how you'd start a BayesianOptimization from initial values, or from a custom design of experiments: when you initialize your optimization object, set initial_design_numdata=0
myBopt = BayesianOptimization(my_obj, domain=bounds, model_type='GP',
                              initial_design_numdata=0)
Then you can use the API to set the X and Y arrays. X is the 2D array of your previous design points, while Y is the 2D array of your previous objective function values.
myBopt.X = my_prev_X
myBopt.Y = my_prev_Y
And then you can run the optimization
myBopt.run_optimization(max_iter=max_iter)
Thanks @cjekel for more info. I had a slightly different thing in mind, but that code would also work.
Hi @cjekel,
Thanks for your explanation, but I still have a problem doing this. I did what you said, but an error occurred. Here is the simple code:
import numpy as np
import GPy
import GPyOpt
def myf(x1, x2):
    y = x1 - 1 + x2 ** 2
    return y
def f(x):
    for _x in x:
        y = myf(x1=_x[0], x2=_x[1])
    return y
bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (-1,1)},
{'name': 'var_2', 'type': 'continuous', 'domain': (-1,2)}]
myProblem = GPyOpt.methods.BayesianOptimization(f,bounds,initial_design_numdata=0,acquisition_type='EI')
array_x = np.array([1,0])
array_y = np.array([0])
print(array_x)
myProblem.X = array_x
myProblem.Y = array_y
max_iter = 15
myProblem.run_optimization(max_iter)
myProblem.x_opt
The error says: IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
Is there something wrong with my 2D array?
Thank you
@Seal-o-O you are very close, you just need to make sure that .X is a 2D array. This array is of shape number_of_function_evaluations by number_of_dimensions, so in your case the shape of .X should be (1, 2). Also, .Y must be a 2D array. I put the reshape command in the following because we may generally store this as a 1D array.
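To make the shapes concrete, here's a quick NumPy check using the single evaluation from your snippet:

```python
import numpy as np

# One evaluation of a 2-parameter problem.
X = np.array([[1, 0]])            # 2D: shape (1, 2) = (n_evaluations, n_dims)
Y = np.array([0]).reshape(-1, 1)  # reshape turns the 1D scores into a column

print(X.shape)  # (1, 2)
print(Y.shape)  # (1, 1)
```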
So here is a working version of your code.
import numpy as np
import GPy
import GPyOpt
def myf(x1, x2):
    y = x1 - 1 + x2 ** 2
    return y
def f(x):
    for _x in x:
        y = myf(x1=_x[0], x2=_x[1])
    return y
bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (-1,1)},
{'name': 'var_2', 'type': 'continuous', 'domain': (-1,2)}]
myProblem = GPyOpt.methods.BayesianOptimization(f,bounds,initial_design_numdata=0,acquisition_type='EI')
array_x = np.array([[1,0]])
array_y = np.array([0]).reshape(-1, 1)
print(array_x)
myProblem.X = array_x
myProblem.Y = array_y
max_iter = 15
myProblem.run_optimization(max_iter)
myProblem.x_opt
PS. Don't expect magic results when you only pass one function evaluation into the optimization problem. What's going to happen is that the first few evaluations will basically maximize the minimum distance from your existing points, which is not too different from Latin-hypercube sampling.
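Going back to the original crash-recovery use case, one simple way to persist the evaluations between runs is to dump .X and .Y to disk after each run and reload them on restart (the file names and values below are just illustrative, not part of GPyOpt):

```python
import os
import tempfile
import numpy as np

# Evaluations collected so far (e.g. copied from myProblem.X / myProblem.Y).
X = np.array([[1.0, 0.0],
              [0.5, 0.5]])
Y = np.array([[0.0],
              [-0.25]])

ckpt_dir = tempfile.mkdtemp()  # stand-in for a real checkpoint folder
np.savetxt(os.path.join(ckpt_dir, 'X.txt'), X)
np.savetxt(os.path.join(ckpt_dir, 'Y.txt'), Y)

# After a crash, reload and assign to myProblem.X / myProblem.Y
# before calling run_optimization. The reshape calls guarantee the
# 2D shapes discussed above even when only one row was saved.
X_resumed = np.loadtxt(os.path.join(ckpt_dir, 'X.txt')).reshape(-1, 2)
Y_resumed = np.loadtxt(os.path.join(ckpt_dir, 'Y.txt')).reshape(-1, 1)
```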
@cjekel
Thanks! It helps a lot! Your explanation always clears things up for me.