Keras: Hyper-parameter Optimization with keras?

Created on 29 Jan 2016 · 20 Comments · Source: keras-team/keras

Hi,

I've just built a DNN with Keras. To improve the model, I need to optimize all of its hyperparameters (number of neurons, number of layers, learning rate, etc.). I intend to use both grid search and random search. I've seen some examples in scikit-learn, but it seems Keras isn't compatible with it.
Do I need to implement everything myself?
Or does anyone have another idea for how to do this?

Thanks a lot.

stale

hanlianlu
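
For readers weighing the two strategies named in the question, here is a minimal sketch of grid search and random search in plain Python. The `toy_score` function is a hypothetical stand-in for "train a Keras model with these settings and return its validation score"; the parameter names and ranges are illustrative assumptions, not values from this thread.

```python
import itertools
import random

# Hypothetical stand-in for "build, train and evaluate a model".
# It peaks at units=256, layers=2, lr=0.01 so the search has something to find.
def toy_score(units, layers, lr):
    return -abs(units - 256) / 256 - abs(layers - 2) - abs(lr - 0.01) * 10

def grid_search(grid, score_fn):
    """Evaluate every combination in the grid and return the best one."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def random_search(samplers, score_fn, n_iter, seed=0):
    """Draw n_iter random configurations and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: draw(rng) for name, draw in samplers.items()}
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"units": [64, 256, 1024], "layers": [1, 2, 3], "lr": [0.1, 0.01, 0.001]}
samplers = {
    "units": lambda rng: rng.randint(64, 1024),
    "layers": lambda rng: rng.randint(1, 3),
    "lr": lambda rng: 10 ** rng.uniform(-3, -1),  # log-uniform learning rate
}

grid_best, grid_score = grid_search(grid, toy_score)
rand_best, rand_score = random_search(samplers, toy_score, n_iter=50)
print("grid search best:", grid_best, grid_score)
print("random search best:", rand_best, rand_score)
```

Grid search cost grows multiplicatively with each added parameter, while random search keeps a fixed budget (`n_iter`) regardless of how many parameters are sampled.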

Most helpful comment

Here's a little code using hyperopt for optimization of a few parameters of a basic MLP. Adapt or improve as desired!

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.metrics import roc_auc_score
import sys

# Placeholders: replace with NumPy arrays holding your training and validation data
X = []
y = []
X_val = []
y_val = []

space = {'choice': hp.choice('num_layers',
                    [{'layers': 'two'},
                     {'layers': 'three',
                      'units3': hp.uniform('units3', 64, 1024),
                      'dropout3': hp.uniform('dropout3', .25, .75)}
                    ]),

         'units1': hp.uniform('units1', 64, 1024),
         'units2': hp.uniform('units2', 64, 1024),

         'dropout1': hp.uniform('dropout1', .25, .75),
         'dropout2': hp.uniform('dropout2', .25, .75),

         'batch_size': hp.uniform('batch_size', 28, 128),

         'nb_epochs': 100,
         'optimizer': hp.choice('optimizer', ['adadelta', 'adam', 'rmsprop']),
         'activation': 'relu'
        }

def f_nn(params):
    # Uses the Keras API of the time (output_dim, init, nb_epoch, predict_proba)
    from keras.models import Sequential
    from keras.layers.core import Dense, Dropout, Activation

    print('Params testing: ', params)
    model = Sequential()
    # hp.uniform samples floats, so layer sizes and batch size are cast to int
    model.add(Dense(output_dim=int(params['units1']), input_dim=X.shape[1]))
    model.add(Activation(params['activation']))
    model.add(Dropout(params['dropout1']))

    model.add(Dense(output_dim=int(params['units2']), init="glorot_uniform"))
    model.add(Activation(params['activation']))
    model.add(Dropout(params['dropout2']))

    # The third layer exists only when the 'three' branch of the space was sampled
    if params['choice']['layers'] == 'three':
        model.add(Dense(output_dim=int(params['choice']['units3']), init="glorot_uniform"))
        model.add(Activation(params['activation']))
        model.add(Dropout(params['choice']['dropout3']))

    model.add(Dense(1))
    model.add(Activation('sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=params['optimizer'])

    model.fit(X, y, nb_epoch=params['nb_epochs'], batch_size=int(params['batch_size']), verbose=0)

    pred_auc = model.predict_proba(X_val, batch_size=128, verbose=0)
    acc = roc_auc_score(y_val, pred_auc)
    print('AUC:', acc)
    sys.stdout.flush()
    # fmin minimizes 'loss', so the AUC is negated in order to maximize it
    return {'loss': -acc, 'status': STATUS_OK}


trials = Trials()
best = fmin(f_nn, space, algo=tpe.suggest, max_evals=50, trials=trials)
print('best: ')
print(best)

jacobzweig on 30 Jan 2016

👍37 ❤11

All 20 comments

In case you're interested, with _hyperas_ you can use jinja-style templates directly within your keras model, instead of having to define the space separately. I've been using this little wrapper for a while and like to think it's pretty useful for quick experiments:

https://github.com/maxpumperla/hyperas

maxpumperla on 20 Feb 2016

👍2

Thank you for the tips, I will give both a try :)

hanlianlu on 20 Feb 2016

@jacobzweig I've tried to adapt the example you offered to my own model, but I ran into a small problem:

def space():
    space = {'num_layer' : hp.choice('num_layer', [{'layers':'add1'}, {'layers':'add2'},
                                                   {'layers':'add3'}, {'layers':'add4'}]),

             'activation' : hp.choice('activation',['ELU(alpha=1.0)','Activation(tanh)']),
             'optimizer' : hp.choice('optimizer',['SGD(lr=0.03, decay=1e-7, momentum=0.15, nesterov=True)','RMSprop','Adadelta','Adam']),
             'dropout1' : hp.uniform('dropout1',0.25,0.75),
             'dropout2' : hp.uniform('dropout2',0.05, 0.5),
             'nb_epochs' :  150,
             #'units' : hp.quniform('units', 800,1400,2),
             'units' : hp.choice('units', [1024,1512,2048,2560]),
             'regularizer' : hp.choice('regularizer',['l2','activity_l2']),
             }

def model(space,X_train,Y_train,X_test,Y_test):
    model = Sequential()
    model.add(Dense(output_dim=space['units'], input_dim=X_train.shape[1], init='he_uniform', W_regularizer=l2(l=0.0001)))
    print('it is ok add layer')
    ......

When I run this, it always returns the following error:

File "mtrand.pyx", line 220, in mtrand.cont2_array_sc (numpy/random/mtrand/mtrand.c:2902)

TypeError: an integer is required

It seems that the error occurs when add(Dense) is called.
Thinking the cause might be that units is not an int, I tried defining units with hp.choice, hp.quniform, and hp.uniform, but none of them solved it.
Could you give me a hint about the cause, please?

hanlianlu on 8 Mar 2016

Sorry - not really sure what your error is. Looks like mtrand is something with numpy... perhaps try updating your numpy installation?

jacobzweig on 8 Mar 2016

#1842 lets you use the sklearn grid_search API to tune both model parameters and hyperparameters, but it is not in a released version yet.

ipod825 on 8 Mar 2016

http://machinelearningmastery.com/use-keras-deep-learning-models-scikit-learn-python/

vabatista on 17 Jun 2016

👍1

Thanks for the link - It'd be helpful to add an example like this to the docs too.

jacobzweig on 17 Jun 2016

With the scikit-learn wrapper, how would you guide the search based on 'best validation score' within a given number of epoch runs? For example, say nb_epoch=100 fixed, but a configuration achieved best validation error at 30, and another configuration achieved it at 50. It seems GridSearchCV will score the model only after the 100 epoch runs.

chanshing on 17 Aug 2016

👍1
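
One way to approach this question is to record the validation score after every epoch and rank each configuration by its best epoch rather than its last. The sketch below uses plain Python with made-up validation-loss curves standing in for real training runs; in Keras, the per-epoch values would typically come from the `History` object returned by `model.fit` when validation data is supplied.

```python
# Made-up validation-loss curves (one value per epoch) for two hypothetical
# configurations; real values would come from actual training runs.
val_loss_curves = {
    "config_a": [0.90, 0.70, 0.55, 0.45, 0.58, 0.66],  # overfits after epoch 4
    "config_b": [0.95, 0.80, 0.65, 0.56, 0.52, 0.49],  # still improving
}

def best_epoch(curve):
    """Return (1-based epoch, loss) at which validation loss was lowest."""
    index = min(range(len(curve)), key=curve.__getitem__)
    return index + 1, curve[index]

summary = {name: best_epoch(curve) for name, curve in val_loss_curves.items()}

# Rank configurations by their best validation loss ...
winner_by_best = min(summary, key=lambda name: summary[name][1])
# ... versus by the loss after the final epoch.
winner_by_final = min(val_loss_curves, key=lambda name: val_loss_curves[name][-1])

for name, (epoch, loss) in summary.items():
    print(name, "best epoch:", epoch, "val_loss:", loss)
print("selected by best epoch:", winner_by_best)
print("selected by final epoch:", winner_by_final)
```

With these made-up numbers the two criteria pick different configurations, which is the situation the question describes.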

@jacobzweig Great example code. Question though: it seems that you are optimizing on the AUC of your validation data:

    pred_auc = model.predict_proba(X_val, batch_size = 128, verbose = 0)
    acc = roc_auc_score(y_val, pred_auc)

Does this not mean that in fact you are training the hyper parameters to learn the correct answer, rather than to predict it? It seems to me that the validation set has now become part of your training data through optimization of the hyper parameters?

jdelange on 29 Dec 2016

Hey @jdelange - you're correct - this is assuming that you have a separate unseen test set. It would be incorrect to report your validation set accuracy after any form of hyperparameter optimization - even slight manual tweaking.

jacobzweig on 29 Dec 2016

👍1
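
The point made above can be sketched as a three-way split: hyperparameters are selected on the validation set, and the held-out test set is scored once at the end. This is plain Python; the helper name `three_way_split` and the 60/20/20 proportions are assumptions for illustration, and the index lists would be used to slice real arrays.

```python
import random

def three_way_split(n_samples, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle sample indices and split them into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = three_way_split(1000)

# Workflow:
#   1. fit each candidate configuration on train_idx
#   2. compare candidates using val_idx only
#   3. score the single chosen model on test_idx and report that number
print(len(train_idx), len(val_idx), len(test_idx))
```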

@maxpumperla Hi Max, is it possible to use hyperas with a model that is trained with data-parallelism across multiple GPUs? (i.e. I send separate batches to different GPUs, train the same model, and concatenate the outputs)

dylanrandle on 12 Jun 2017

👍1

Hi @dylanrandle, do you want to move this to hyperas? Short answer: it depends on what precisely you are doing. As hyperas is just a wrapper for hyperopt, which has a distributed mode using MongoDB, this use case is generally covered. In fact, I would recommend using plain hyperopt for this.

maxpumperla on 13 Jun 2017

👍1

@maxpumperla Is it possible to make hyperas compatible with Keras 2.x?

kaushalshetty on 6 Sep 2017

Hi @jacobzweig,
best = fmin(f_nn, space, algo=tpe.suggest, max_evals=50, trials=trials)
print('best: ')
In the code above, you are minimizing the f_nn function, which returns a roc_auc_score. I was just wondering whether the roc_auc_score should be increased or decreased.

kaushalshetty on 7 Sep 2017

👍1
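
On the question above: hyperopt's `fmin` minimizes whatever the objective returns as `'loss'`, and the example returns `-acc`, so a lower loss corresponds to a higher ROC AUC. The plain-Python sketch below illustrates the same sign convention with a hypothetical `minimize` helper and a made-up score function; neither is part of hyperopt.

```python
import random

# Made-up score that we want to be as HIGH as possible (peaks at x = 0.3).
def score(x):
    return 1.0 - (x - 0.3) ** 2

# The objective handed to a minimizer returns the NEGATED score.
def objective(x):
    return -score(x)

def minimize(fn, n_trials=200, seed=0):
    """Hypothetical random-search minimizer over x in [0, 1]."""
    rng = random.Random(seed)
    candidates = [rng.random() for _ in range(n_trials)]
    return min(candidates, key=fn)

best_x = minimize(objective)
print("best x:", best_x, "score:", score(best_x), "loss:", objective(best_x))
```

Minimizing the negated score lands near the point where the score itself is highest, which is why the original example returns `-acc` rather than `acc`.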

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale[bot] on 6 Dec 2017

@hanlianlu Hi, is there any way to optimize the optimizer and its params at the same time? Say,
'optimizer': ['SGD: lr=[0.01-0.08], decay=1e-7, momentum=[0.01-0.15], nesterov=True',
              'RMSprop': ....,
              'Adadelta': .....,
              'Adam': ....]

tonywangcn on 28 Feb 2018
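
A common pattern for this is a conditional search space: first pick the optimizer, then sample only the parameters that belong to it. The first example in this thread does the same thing for the optional third layer by nesting dictionaries inside `hp.choice`. The sketch below shows the idea in plain Python; the SGD ranges come from the question, while the ranges for the other optimizers are illustrative assumptions.

```python
import random

# Each optimizer name maps to a function that samples ITS OWN parameters.
# SGD ranges follow the question; the RMSprop and Adam ranges are assumptions.
OPTIMIZER_SPACE = {
    "SGD": lambda rng: {
        "lr": rng.uniform(0.01, 0.08),
        "decay": 1e-7,
        "momentum": rng.uniform(0.01, 0.15),
        "nesterov": True,
    },
    "RMSprop": lambda rng: {"lr": 10 ** rng.uniform(-4, -2)},
    "Adam": lambda rng: {"lr": 10 ** rng.uniform(-4, -2)},
}

def sample_optimizer(rng):
    """Pick an optimizer, then sample only the parameters it accepts."""
    name = rng.choice(sorted(OPTIMIZER_SPACE))
    return {"name": name, "params": OPTIMIZER_SPACE[name](rng)}

rng = random.Random(0)
for _ in range(3):
    print(sample_optimizer(rng))
```

The sampled dictionary can then be turned into an optimizer instance inside the objective function, so each trial only ever sees parameters that apply to its optimizer.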

Hello,

If you are talking about hyperparameters within the optimizer, it can be done in the same way as with other params. Back then I was using "hyperas" with an early version of Keras.

But Keras today should already have hyperparameter optimization implemented; you could search the docs a bit.

Best
Hanlian Lyu
