Keras: Best practices question: decreasing learning rates between epochs

Created on 26 Oct 2015 · 26 Comments · Source: keras-team/keras

Howdy,

In published papers I often see that the learning rates are decreased after some hundreds of epochs when learning stalls. What is the best way to do this in Keras? Thus far, I have been recompiling, but (not knowing if there is a better way), that seems foolish.

An example:

First, I build some model and train it.

model = Sequential()
# insert model here
optimizer = adagrad(lr=0.01)
model.compile(optimizer=optimizer)
model.fit(X,y,nb_epoch=50)

UPDATE -- the following works without having to recompile

import keras.backend as K
K.set_value(model.optimizer.lr, 0.001)
model.fit(X, y, nb_epoch=50)

Thank you to @EderSantana for the quick reply.

All 26 comments

@sergeyf check the solution in this https://github.com/fchollet/keras/issues/888
Please leave this issue open, even if it solves the problem for you. This is the second time we've gotten this question, which means we need better documentation. Since I'm already working on something else, would anybody else volunteer to write the documentation? We should close this after somebody writes the docs.

Thank you very much! I will leave this open.

This code may work, but it is untested:

from keras.callbacks import Callback
import keras.backend as K

class LrReducer(Callback):
    def __init__(self, patience=0, reduce_rate=0.5, reduce_nb=10, verbose=1):
        super(LrReducer, self).__init__()
        self.patience = patience        # epochs without improvement to tolerate
        self.wait = 0
        self.best_score = -1.
        self.reduce_rate = reduce_rate  # factor applied to the learning rate
        self.current_reduce_nb = 0
        self.reduce_nb = reduce_nb      # reductions allowed before stopping
        self.verbose = verbose

    def on_epoch_end(self, epoch, logs=None):
        current_score = (logs or {}).get('val_acc')
        if current_score is None:
            return  # no validation accuracy was logged this epoch
        if current_score > self.best_score:
            self.best_score = current_score
            self.wait = 0
            if self.verbose > 0:
                print('---current best val accuracy: %.3f' % current_score)
        else:
            if self.wait >= self.patience:
                self.current_reduce_nb += 1
                if self.current_reduce_nb <= self.reduce_nb:
                    # K.get_value/K.set_value avoid the Theano-only lr.get_value()
                    lr = K.get_value(self.model.optimizer.lr)
                    K.set_value(self.model.optimizer.lr, lr * self.reduce_rate)
                else:
                    if self.verbose > 0:
                        print("Epoch %d: early stopping" % (epoch))
                    self.model.stop_training = True
            self.wait += 1

Thanks @jiumem! This might be a good pull request to Keras?
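
The decision logic in the LrReducer callback above can be tried out without Keras. The following is only a sketch of one variant: the class name `PlateauTracker`, its `update` method, and the returned action strings are hypothetical, and unlike the callback it resets the wait counter after each reduction.

```python
class PlateauTracker:
    """Tracks a validation score and says when to cut the learning rate."""

    def __init__(self, patience=0, max_reductions=10):
        self.patience = patience              # epochs without improvement to tolerate
        self.max_reductions = max_reductions  # reductions allowed before stopping
        self.best = float("-inf")
        self.wait = 0
        self.reductions = 0

    def update(self, score):
        """Return 'improved', 'wait', 'reduce', or 'stop' for this epoch's score."""
        if score > self.best:
            self.best = score
            self.wait = 0
            return "improved"
        self.wait += 1
        if self.wait <= self.patience:
            return "wait"
        self.wait = 0  # start a fresh patience window after acting
        if self.reductions < self.max_reductions:
            self.reductions += 1
            return "reduce"
        return "stop"


tracker = PlateauTracker(patience=1, max_reductions=1)
actions = [tracker.update(s) for s in [0.50, 0.60, 0.59, 0.58, 0.57, 0.56]]
print(actions)
```

In a callback, "reduce" would map to K.set_value(model.optimizer.lr, ...) and "stop" to model.stop_training = True. Later Keras releases also ship a ReduceLROnPlateau callback, which may make a hand-written version unnecessary.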

@sergeyf I just saw this thread and thought I'd throw in the function I wrote to address this. I always use nb_epoch=1 because I'm interested in generating text:

def set_learning_rate(hist, learning_rate=0, activate_halving_learning_rate=False,
                      new_loss=0, past_loss=0, counter=0, save_model_dir=''):
    if activate_halving_learning_rate and learning_rate >= 0.0001:
        if counter == 0:
            new_loss = hist.history['loss'][0]
            if new_loss >= past_loss:  # halve the rate when the loss did not decrease
                learning_rate = float(learning_rate) / 2
                print('you readjusted the learning rate to', learning_rate)
                message = 'For next Iteration, Learning Rate has been reduced to ' + str(learning_rate) + '\n\n'
                # record the change in both report files
                with open('models/' + save_model_dir + '/' + 'history_report.txt', 'a+') as history_file:
                    history_file.write(message)
                with open('history_reports/' + save_model_dir + '_' + 'history_report.txt', 'a+') as history_file:
                    history_file.write(message)

        past_loss = new_loss
    return (learning_rate, new_loss, past_loss)

Awesome, thanks!

I have a problem using the solution like

model.optimizer.lr.set_value(0.01)
model.fit(X, y, nb_epoch=50)

with the TensorFlow backend.
I can't call set_value and get_value as discussed here and in another thread:

return model_systole.optimizer.lr.get_value()
AttributeError: 'Tensor' object has no attribute 'get_value'

Any suggestions please?

@Rusianka I just found that one can do this to get and set value:

import keras.backend as K
sgd = SGD(lr=0.1, decay=0, momentum=0.9, nesterov=True)
K.set_value(sgd.lr, 0.5 * K.get_value(sgd.lr))

@entron when does the .set_value() happen? after every epoch?

@ShuaiW you can put the line K.set_value(sgd.lr, 0.5 * K.get_value(sgd.lr)) inside the epoch loop to set lr at each epoch.

@entron thanks for your response.

my model doesn't have an epoch loop; instead it's like this:

N_EPOCH = 100

model = Model(..)

model.compile(...)

model.fit(X, y, batch_size=64, nb_epoch=N_EPOCH, verbose=1, shuffle=True, callbacks=...)

Is there any way to fit it in?

Maybe you can set N_EPOCH=1 and loop outside.
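
A rough sketch of that outer loop, written so it runs without Keras. The helper name `train_with_halving` and the stand-in functions are hypothetical. With a real model, the three callables would be roughly `lambda: model.fit(X, y, batch_size=64, nb_epoch=1)`, `lambda: K.get_value(model.optimizer.lr)`, and `lambda v: K.set_value(model.optimizer.lr, v)`.

```python
def train_with_halving(fit_one_epoch, get_lr, set_lr, n_epochs, halve_every):
    """Run n_epochs single-epoch fits, halving the rate every `halve_every` epochs."""
    used = []
    for epoch in range(n_epochs):
        if epoch > 0 and epoch % halve_every == 0:
            set_lr(0.5 * get_lr())
        used.append(get_lr())  # record the rate this epoch trains with
        fit_one_epoch()
    return used


# Stand-ins for the model and its optimizer (assumptions for illustration only).
state = {"lr": 0.1, "epochs_run": 0}

def fake_fit():
    state["epochs_run"] += 1

rates = train_with_halving(
    fake_fit,
    get_lr=lambda: state["lr"],
    set_lr=lambda value: state.update(lr=value),
    n_epochs=6,
    halve_every=2,
)
print(rates)
```

The optimizer state is kept between successive fit calls on the same compiled model, so looping this way does not require recompiling.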

So I had problems with the model.fit_generator function, so I decided to use model.fit instead and put it inside a for loop like so:

for x, y in generate_arrays_from_file():
    x = model.fit(x, y, batch_size=16, nb_epoch=1, verbose=1)

Here are my questions:

1) I am using Adam for my optimizer, and I saw on another thread that it is impossible to directly get the current learning rate. You have to calculate it yourself indirectly.
http://stackoverflow.com/questions/37091751/keras-learning-rate-not-changing-despite-decay-in-sgd
Since nb_epoch is only 1 in the above function, and model.fit is inside a loop, is the learning rate guaranteed to decrease (that's my understanding of how Adam works) or do I have to write a separate function to manually decrease the learning rate myself?

2) My loss initially decreases quite rapidly, but then appears to fluctuate and stop decreasing even after several days of training. This is true both when I train on single images and multi-channel images. Since the loss is fluctuating after every call of model.fit, will that screw up Adam's calculations since it relies on the number of iterations and previous losses, or is all of this taken care of by Theano?

I am unfamiliar with Theano, and I do not have time at this point to learn about it so any information you can provide on both these questions is much appreciated.

Thanks.

@sergeyf @ShuaiW A simpler solution for decaying the rate every fixed number of epochs:

from keras.callbacks import Callback
import keras.backend as K

class decay_lr(Callback):
    '''
        n_epoch = number of epochs between successive decays.
        decay = factor the learning rate is multiplied by.
    '''
    def __init__(self, n_epoch, decay):
        super(decay_lr, self).__init__()
        self.n_epoch = n_epoch
        self.decay = decay

    def on_epoch_begin(self, epoch, logs=None):
        # K.get_value/K.set_value replace the Theano-only lr.get_value()
        old_lr = K.get_value(self.model.optimizer.lr)
        if epoch > 1 and epoch % self.n_epoch == 0:
            K.set_value(self.model.optimizer.lr, self.decay * old_lr)


decaySchedule = decay_lr(10, 0.95)

You can use this directly without the epoch loop.

In my case, the learning rate was supposed to decay every fixed number of iterations. With the Theano backend, I can do this using the following code:

import numpy as np
from keras.callbacks import Callback

class SGDLearningRateTracker(Callback):
    def on_batch_begin(self, batch, logs=None):
        optimizer = self.model.optimizer
        rate = 0.95       # multiplicative decay factor
        iteration = 5000  # decay once every this many iterations
        lr = optimizer.lr.get_value()  # Theano shared variable
        iterations = optimizer.iterations.get_value()
        if iterations % iteration == 1:
            lr_now = np.array(lr * rate, dtype='float32')
            optimizer.lr.set_value(lr_now)
            print('lr reduced from %f to %f' % (lr, lr_now))

But when I switched to the TensorFlow backend, I could not use the code above, because optimizer.lr is a TensorFlow variable and has no get_value(). So I changed the code as follows:

class SGDLearningRateTracker(Callback):
    def on_batch_begin(self, epoch, logs={}):
        optimizer = self.model.optimizer
        rate= 0.95
        iteration=5
        lr = optimizer.lr 

        init_op = tf.initialize_all_variables()
        sess = tf.Session()
        sess.run(init_op)
        iterations=optimizer.iterations 
        lr_ori=sess.run(lr)
#        print('iter:', sess.run(iterations))  # this always prints 0 
        if iterations % iteration == 0:
            a= np.array(lr*rate, dtype='float32')
            optimizer.lr.set_value(a) 
            lr_now=sess.run(optimizer.lr)             
            print('Ir reduced from %f to %f' % (lr_ori, lr_now))

But it did not work. I found that optimizer.iterations was always 0, so the learning rate never changes. Could someone help me solve this?
Thanks!

Carol
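
A plausible reading of why the TensorFlow version above misbehaves: a fresh tf.Session with re-initialized variables does not see the values held in the session Keras trains in, and `iterations % iteration == 0` compares a tensor rather than a number. One way around both is to count batches in plain Python and go through the backend-neutral getter and setter. The sketch below is not tested against any particular Keras version; the class name `IterationDecay` and the injected `get_lr`/`set_lr` callables are hypothetical, chosen so the logic runs without Keras.

```python
class IterationDecay:
    """Multiply the learning rate by `rate` once every `every` batches."""

    def __init__(self, get_lr, set_lr, rate=0.95, every=5000):
        self.get_lr = get_lr
        self.set_lr = set_lr
        self.rate = rate
        self.every = every
        self.seen = 0  # batches seen so far, counted in plain Python

    def on_batch_end(self, batch=None, logs=None):
        self.seen += 1
        if self.seen % self.every == 0:
            self.set_lr(self.get_lr() * self.rate)


# Stand-in for the optimizer's learning rate (an assumption for illustration).
holder = {"lr": 1.0}
decay = IterationDecay(
    get_lr=lambda: holder["lr"],
    set_lr=lambda value: holder.update(lr=value),
    rate=0.5,
    every=3,
)
for batch in range(7):
    decay.on_batch_end(batch)
print(holder["lr"])
```

Inside a Keras Callback subclass, the getter and setter would be K.get_value(self.model.optimizer.lr) and K.set_value(self.model.optimizer.lr, value), which other comments in this thread report working on the TensorFlow backend.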

Another option could be to use the LearningRateScheduler that
I found in the Keras documentation:
https://keras.io/callbacks/

You can use the schedule function that best fits your needs

@sergeyf please update the answer in your initial question, because model.optimizer.lr.set_value() is no longer valid.
The current method is model.optimizer.lr.assign(your_learning_rate).
This also solves @Rusianka's problem.

@FedericoMuciaccia Thanks, I did as you suggested.

With TF backend, I did this (for inception-V3)

from keras.callbacks import LearningRateScheduler

def scheduler(epoch):
    if epoch%2==0 and epoch!=0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr*.9)
        print("lr changed to {}".format(lr*.9))
    return K.get_value(model.optimizer.lr)

lr_decay = LearningRateScheduler(scheduler)

model.fit_generator(train_gen, (nb_train_samples//batch_size)*batch_size,
                  nb_epoch=100, verbose=1,
                  validation_data=valid_gen,    nb_val_samples=val_size,
                  callbacks=[lr_decay])

EDIT

I'm happy it helped.
What I use now is the following :

from keras.callbacks import LearningRateScheduler

def lr_decay_callback(lr_init, lr_decay):
    def step_decay(epoch):
        return lr_init * (lr_decay ** (epoch + 1))
    return LearningRateScheduler(step_decay)

lr_decay =  lr_decay_callback(lr_init, lr_decay)

# callback=[lr_decay, ]
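
To see what this schedule produces, the inner function can be evaluated on its own. This is a small sketch with made-up values (lr_init=0.01, lr_decay=0.5) and a hypothetical helper name, `exponential_schedule`. It assumes the epoch index passed to the schedule is zero-based.

```python
def exponential_schedule(lr_init, lr_decay):
    """Return a schedule function mapping an epoch index to a learning rate."""
    def schedule(epoch):
        return lr_init * (lr_decay ** (epoch + 1))
    return schedule


schedule = exponential_schedule(lr_init=0.01, lr_decay=0.5)
rates = [schedule(epoch) for epoch in range(3)]
print(rates)
```

Because of the `epoch + 1` exponent, the very first epoch already trains at lr_init * lr_decay; using `lr_decay ** epoch` instead would start at lr_init.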

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@FedericoMuciaccia @sergeyf
I think the syntax has changed again (using TF backend).

    adam = ks.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    model.optimizer.lr.assign(100)
    # -> still trains perfectly, lr is not changed 

Keras doesn't throw an exception, but the lr doesn't change anyways.

Fortunately, the backend method still works:

K.set_value(model.optimizer.lr, 100)

Edit:

By the way, is it also possible to change the learning rate _during_ an epoch (e.g. after 1000 batches) while looping over fit_generator?
E.g. like

from keras.callbacks import LearningRateScheduler
def scheduler(batch_number):
    if batch_number % 1000 == 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr*.9)
        print("lr changed to {}".format(lr*.9))
    return K.get_value(model.optimizer.lr)

lr_decay = LearningRateScheduler(scheduler)

while 1:
    model.fit_generator(gen, epochs=1, steps_per_epoch=int(filesize/batchsize), callbacks=[lr_decay])
    # do some other stuff in between epochs

Would be very useful when the data sample is large and/or the network is deep such that one epoch takes about 24h.

The "decay" option in the optimizer seems to be designed for learning rate decay, but I did not see any suggestion to use it in this discussion. Could someone please comment on whether to use the "decay" option?
adam = ks.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

It does learning rate decay:
https://github.com/fchollet/keras/blob/master/keras/optimizers.py#L420

This functionality was introduced roughly one year ago in this commit:
https://github.com/fchollet/keras/commit/b2e8d5ab7c476fbed088ebee27ec3373e508af47
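
Assuming the decay argument behaves as in the linked optimizers.py of that era, the rate applied at each update is lr / (1 + decay * iterations), where iterations counts batch updates rather than epochs. The function below is a sketch of that formula only; the name `decayed_lr` is hypothetical and not part of Keras.

```python
def decayed_lr(lr, decay, iterations):
    """Effective learning rate after `iterations` updates under 1/t decay."""
    return lr / (1.0 + decay * iterations)


no_decay = decayed_lr(0.1, decay=0.0, iterations=1000)
halved = decayed_lr(0.1, decay=0.01, iterations=100)
print(no_decay, halved)
```

Under this reading, the stored lr variable is not overwritten by the decay, which is likely why reading it back with K.get_value shows the initial value even while the effective rate shrinks.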

When using rate decay in SGD, does optimizer.iterations reset at each epoch?

@marc-moreaux
Why did you change your code? Does the version from before your edit still work correctly? I don't fully understand the later one.

FYI updated issue and much simpler solution at https://github.com/keras-team/keras/issues/5724#issuecomment-614590419
