Keras: Best practices question: decreasing learning rates between epochs

Created on 26 Oct 2015 · 26Comments · Source: keras-team/keras

Howdy,

In published papers I often see the learning rate decreased after some hundreds of epochs, once learning stalls. What is the best way to do this in Keras? So far I have been recompiling the model, but that seems wasteful if there is a better way.

An example:

First, I build some model and train it.

model = Sequential()
# insert model here
optimizer = adagrad(lr=0.01)
model.compile(optimizer=optimizer)
model.fit(X,y,nb_epoch=50)

UPDATE -- the following works without having to recompile

K.set_value(model.optimizer.lr, 0.001)
model.fit(X, y, nb_epoch=50)

Thank you to @EderSantana for the quick reply.

sergeyf

👍21 🎉4 😄1

All 26 comments

@sergeyf check the solution in https://github.com/fchollet/keras/issues/888
Please leave this issue open, even if that solves the problem for you. This is the second time we have gotten this question, which means we need better documentation. Since I'm already working on something else, would anybody else volunteer to write it? We should close this once somebody writes the docs.

EderSantana on 26 Oct 2015

Thank you very much! I will leave this open.

sergeyf on 26 Oct 2015

This code may work; it still needs to be tested.

import keras.backend as K
from keras.callbacks import Callback

class LrReducer(Callback):
    """Reduce the learning rate when val_acc stops improving; stop after reduce_nb reductions."""
    def __init__(self, patience=0, reduce_rate=0.5, reduce_nb=10, verbose=1):
        super(LrReducer, self).__init__()
        self.patience = patience
        self.wait = 0
        self.best_score = -1.
        self.reduce_rate = reduce_rate
        self.current_reduce_nb = 0
        self.reduce_nb = reduce_nb
        self.verbose = verbose

    def on_epoch_end(self, epoch, logs=None):
        current_score = (logs or {}).get('val_acc')
        if current_score is None:
            return  # no validation accuracy was logged this epoch
        if current_score > self.best_score:
            self.best_score = current_score
            self.wait = 0
            if self.verbose > 0:
                print('---current best val accuracy: %.3f' % current_score)
        else:
            if self.wait >= self.patience:
                self.current_reduce_nb += 1
                if self.current_reduce_nb <= self.reduce_nb:
                    # K.get_value / K.set_value work with both backends
                    lr = K.get_value(self.model.optimizer.lr)
                    K.set_value(self.model.optimizer.lr, lr * self.reduce_rate)
                else:
                    if self.verbose > 0:
                        print("Epoch %d: early stopping" % (epoch))
                    self.model.stop_training = True
            self.wait += 1

jiumem on 27 Oct 2015

👍16

Thanks @jiumem! This might be a good pull request to Keras?

sergeyf on 27 Oct 2015

@sergeyf I just saw this thread, and I thought I'd throw in the function I wrote to address this. I always use nb_epoch=1 because I'm interested in generating text:

def set_learning_rate(hist, learning_rate=0, activate_halving_learning_rate=False,
                      new_loss=0, past_loss=0, counter=0, save_model_dir=''):
    if activate_halving_learning_rate and learning_rate >= 0.0001:
        if counter == 0:
            new_loss = hist.history['loss'][0]
            if new_loss >= past_loss:  # the loss did not decrease compared to the previous iteration
                learning_rate = float(learning_rate) / 2
                print('you readjusted the learning rate to %s' % learning_rate)
                message = 'For next Iteration, Learning Rate has been reduced to ' + str(learning_rate) + '\n\n'
                # append the same note to both history reports
                with open('models/' + save_model_dir + '/' + 'history_report.txt', 'a+') as history_file:
                    history_file.write(message)
                with open('history_reports/' + save_model_dir + '_' + 'history_report.txt', 'a+') as history_file:
                    history_file.write(message)
            past_loss = new_loss
    return (learning_rate, new_loss, past_loss)

NickShahML on 6 Nov 2015

Awesome, thanks!

sergeyf on 6 Nov 2015

I have a problem using a solution like

model.optimizer.lr.set_value(0.01)
model.fit(X, y, nb_epoch=50)

with the TensorFlow backend.
The set_value and get_value calls discussed here and in another thread are not available:

return model_systole.optimizer.lr.get_value()
AttributeError: 'Tensor' object has no attribute 'get_value'

Any suggestions please?

Rusianka on 23 Feb 2016

👍4

@Rusianka I just found that one can do this to get and set value:

import keras.backend as K
sgd = SGD(lr=0.1, decay=0, momentum=0.9, nesterov=True)
K.set_value(sgd.lr, 0.5 * K.get_value(sgd.lr))

entron on 1 Apr 2016

👍34 😕1 🎉1

@entron when does the .set_value() call happen? After every epoch?

ShuaiW on 11 Jun 2016

@ShuaiW you can put the line K.set_value(sgd.lr, 0.5 * K.get_value(sgd.lr)) inside the epoch loop to set lr at each epoch.

entron on 11 Jun 2016

@entron thanks for your response.

My model doesn't have an epoch loop; instead it looks like this:

N_EPOCH = 100
model = Model(..)
model.compile(...)
model.fit(X, y, batch_size=64, nb_epoch=N_EPOCH, verbose=1, shuffle=True, callbacks=...)

Is there any way to fit it in?

ShuaiW on 11 Jun 2016

Maybe you can set N_EPOCH=1 and loop outside.

entron on 12 Jun 2016
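
The suggestion above (train one epoch per call and loop outside) could be sketched roughly as follows. This is only a minimal sketch, not code from the thread: run_epochs, fit_one_epoch, get_lr and set_lr are hypothetical names, and a plain dict stands in for the model so the loop runs without Keras. With a real model, the three callables would presumably wrap model.fit(X, y, nb_epoch=1), K.get_value(model.optimizer.lr) and K.set_value(model.optimizer.lr, value).

```python
def run_epochs(fit_one_epoch, get_lr, set_lr, n_epochs, factor=0.5, every=10):
    """Train one epoch at a time, scaling the learning rate every `every` epochs."""
    for epoch in range(n_epochs):
        # Scale the rate before epochs `every`, 2 * `every`, ... (never before epoch 0).
        if epoch > 0 and epoch % every == 0:
            set_lr(get_lr() * factor)
        fit_one_epoch()


# Stand-in for a compiled model: it only records the rate used for each epoch.
state = {"lr": 0.1, "history": []}

run_epochs(
    fit_one_epoch=lambda: state["history"].append(state["lr"]),
    get_lr=lambda: state["lr"],
    set_lr=lambda value: state.update(lr=value),
    n_epochs=25,
)
```

Keeping the loop independent of the model makes the schedule easy to check on its own before wiring it to a long training run.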

I had problems with the model.fit_generator function, so I decided to use model.fit instead and put it inside a for loop, like so:

for x, y in generate_arrays_from_file():
    hist = model.fit(x, y, batch_size=16, nb_epoch=1, verbose=1)

Here are my questions:

1) I am using Adam for my optimizer, and I saw on another thread that it is impossible to directly get the current learning rate. You have to calculate it yourself indirectly.
http://stackoverflow.com/questions/37091751/keras-learning-rate-not-changing-despite-decay-in-sgd
Since nb_epoch is only 1 in the above function, and model.fit is inside a loop, is the learning rate guaranteed to decrease (that's my understanding of how Adam works) or do I have to write a separate function to manually decrease the learning rate myself?

2) My loss initially decreases quite rapidly, but then appears to fluctuate and stop decreasing even after several days of training. This is true both when I train on single images and multi-channel images. Since the loss is fluctuating after every call of model.fit, will that screw up Adam's calculations since it relies on the number of iterations and previous losses, or is all of this taken care of by Theano?

I am unfamiliar with Theano, and I do not have time at this point to learn about it so any information you can provide on both these questions is much appreciated.

Thanks.

greg-robinson on 6 Jul 2016

👍4

@sergeyf @ShuaiW A simpler solution for decaying the learning rate every specified number of epochs:

import keras.backend as K
from keras.callbacks import Callback

class decay_lr(Callback):
    '''
        n_epoch = number of epochs between decays
        decay = factor the learning rate is multiplied by
    '''
    def __init__(self, n_epoch, decay):
        super(decay_lr, self).__init__()
        self.n_epoch = n_epoch
        self.decay = decay

    def on_epoch_begin(self, epoch, logs=None):
        # K.get_value / K.set_value work with both backends
        if epoch > 0 and epoch % self.n_epoch == 0:
            old_lr = K.get_value(self.model.optimizer.lr)
            K.set_value(self.model.optimizer.lr, self.decay * old_lr)

# multiply the learning rate by 0.95 every 10 epochs
decaySchedule = decay_lr(10, 0.95)

You can use this directly without the epoch loop.

ishank26 on 8 Sep 2016

👍8

In my case, the learning rate was supposed to decay after a specific number of iterations. With the Theano backend, I can do this with the following code:

import numpy as np
from keras.callbacks import Callback

class SGDLearningRateTracker(Callback):
    def on_batch_begin(self, batch, logs=None):
        optimizer = self.model.optimizer
        rate = 0.95       # multiplicative decay factor
        iteration = 5000  # decay once every 5000 iterations
        lr = optimizer.lr.get_value()  # Theano shared variable
        iterations = optimizer.iterations.get_value()
        if iterations % iteration == 1:
            lr_now = np.array(lr * rate, dtype='float32')
            optimizer.lr.set_value(lr_now)
            print('lr reduced from %f to %f' % (lr, lr_now))

But when I switched to the TensorFlow backend, I could not use the code above, because optimizer.lr is a TensorFlow variable and has no get_value(). So I changed the code as follows:

class SGDLearningRateTracker(Callback):
    def on_batch_begin(self, epoch, logs={}):
        optimizer = self.model.optimizer
        rate= 0.95
        iteration=5
        lr = optimizer.lr 

        init_op = tf.initialize_all_variables()
        sess = tf.Session()
        sess.run(init_op)
        iterations=optimizer.iterations 
        lr_ori=sess.run(lr)
#        print('iter:', sess.run(iterations))  # this always prints 0 
        if iterations % iteration == 0:
            a= np.array(lr*rate, dtype='float32')
            optimizer.lr.set_value(a) 
            lr_now=sess.run(optimizer.lr)             
            print('lr reduced from %f to %f' % (lr_ori, lr_now))

But it did not work. I found that optimizer.iterations was always 0, so the learning rate never changes. Could someone help me solve this?
Thanks!

Carol

anewlearner on 21 Oct 2016
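
One likely reason the counter always reads 0 in the TensorFlow attempt above is that every batch creates a fresh tf.Session and re-initializes all variables in it, so the values read there are not the ones being trained. A backend-agnostic alternative is to go through K.get_value and K.set_value, as suggested elsewhere in this thread, and to count batches in the callback itself. The following is a minimal sketch under those assumptions, targeting the Keras 1.x/2.x API used in this thread; IterationDecay and due are hypothetical names, and the import fallback exists only so the counting logic can be run without Keras installed.

```python
try:
    from keras import backend as K
    from keras.callbacks import Callback
except ImportError:
    # Fallback so the counting logic below can be exercised without Keras.
    K, Callback = None, object


class IterationDecay(Callback):
    """Multiply the learning rate by `rate` once every `every` batches.

    Hypothetical helper: it counts batches itself rather than reading
    optimizer.iterations.
    """

    def __init__(self, every=5000, rate=0.95):
        super().__init__()
        self.every = every
        self.rate = rate
        self.seen = 0  # batches started so far

    def due(self):
        """Advance the batch counter; return True on every `every`-th batch."""
        self.seen += 1
        return self.seen % self.every == 0

    def on_batch_begin(self, batch, logs=None):
        if self.due():
            # K.get_value / K.set_value use the session that Keras manages.
            lr = float(K.get_value(self.model.optimizer.lr))
            K.set_value(self.model.optimizer.lr, lr * self.rate)
```

It would be passed like any other callback, for example callbacks=[IterationDecay(every=5000, rate=0.95)].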

Another option could be to use the LearningRateScheduler callback that I found in the Keras documentation:
https://keras.io/callbacks/

You can use the schedule function that best fits your needs.

alalbiol on 30 Oct 2016

👍6
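
A schedule function for LearningRateScheduler takes the 0-based epoch index and returns the learning rate to use for that epoch. As a minimal sketch of one such function (make_step_schedule and its default values are hypothetical, chosen only for illustration), a step schedule that halves the rate every 10 epochs could look like this:

```python
def make_step_schedule(initial_lr=0.01, drop=0.5, epochs_per_drop=10):
    """Build a schedule that multiplies the rate by `drop` every `epochs_per_drop` epochs."""
    def schedule(epoch):
        # Integer division counts how many drops have happened so far.
        return initial_lr * drop ** (epoch // epochs_per_drop)
    return schedule


schedule = make_step_schedule()

# Rates for a few sample epochs; epochs 0-9 share the initial rate.
sample_rates = [schedule(epoch) for epoch in (0, 9, 10, 25)]
```

The resulting function would then be wrapped as LearningRateScheduler(schedule) and passed to fit through callbacks=[...].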

@sergeyf please update the answer in your initial question, because model.optimizer.lr.set_value() is no longer valid.
The current method should be model.optimizer.lr.assign(your_learning_rate).
This also solves @Rusianka's problem.

FedericoMuciaccia on 21 Jan 2017

🎉3

@FedericoMuciaccia Thanks, I did as you suggested.

sergeyf on 21 Jan 2017

With TF backend, I did this (for inception-V3)

from keras import backend as K
from keras.callbacks import LearningRateScheduler

def scheduler(epoch):
    # every second epoch (except epoch 0), multiply the learning rate by 0.9
    if epoch % 2 == 0 and epoch != 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr * .9)
        print("lr changed to {}".format(lr * .9))
    # return a plain Python float
    return float(K.get_value(model.optimizer.lr))

lr_decay = LearningRateScheduler(scheduler)

model.fit_generator(train_gen, (nb_train_samples // batch_size) * batch_size,
                    nb_epoch=100, verbose=1,
                    validation_data=valid_gen, nb_val_samples=val_size,
                    callbacks=[lr_decay])

EDIT

I'm happy it helped.
What I use now is the following:

from keras.callbacks import LearningRateScheduler

def lr_decay_callback(lr_init, lr_decay):
    def step_decay(epoch):
        # multiply the initial rate by lr_decay once per epoch
        return lr_init * (lr_decay ** (epoch + 1))
    return LearningRateScheduler(step_decay)

# lr_init and lr_decay are set elsewhere
lr_scheduler = lr_decay_callback(lr_init, lr_decay)

# callbacks=[lr_scheduler, ]

marc-moreaux on 13 Mar 2017

👍26

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale[bot] on 11 Jun 2017

@FedericoMuciaccia @sergeyf
I think the syntax has changed again (using TF backend).

    adam = ks.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
    model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
    model.optimizer.lr.assign(100)
    # -> still trains perfectly, lr is not changed

Keras doesn't throw an exception, but the lr doesn't change anyways.

Fortunately, the backend method still works:

K.set_value(model.optimizer.lr, 100)

Edit:

By the way, is it also possible to change the learning rate _during_ an epoch (e.g. after 1000 batches) while looping over fit_generator?
E.g. like

from keras.callbacks import LearningRateScheduler
def scheduler(batch_number):
    if batch_number % 1000 == 0:
        lr = K.get_value(model.optimizer.lr)
        K.set_value(model.optimizer.lr, lr*.9)
        print("lr changed to {}".format(lr*.9))
    return K.get_value(model.optimizer.lr)

lr_decay = LearningRateScheduler(scheduler)

while 1:
    model.fit_generator(gen, epochs=1, steps_per_epoch=int(filesize/batchsize), callbacks=[lr_decay])
    # do some other stuff in between epochs

Would be very useful when the data sample is large and/or the network is deep such that one epoch takes about 24h.

ViaFerrata on 13 Sep 2017

👍2

The "decay" option in the optimizer seems to be designed for learning rate decay, but I did not see any suggestion to use it in this discussion. Could someone please comment on whether or not to use the "decay" option?
adam = ks.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
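
For reference, here is a small sketch of what the "decay" argument appears to do, assuming the time-based rule used by the Keras 1.x/2.x optimizers, where the rate applied at each update is lr / (1 + decay * iterations) and iterations counts batch updates rather than epochs. The function name effective_lr is hypothetical. Under this assumption, decay=0.0 as in the line above leaves the rate unchanged, and a nonzero value gives a smooth per-update decay rather than a drop triggered when learning stalls.

```python
def effective_lr(lr, decay, iterations):
    """Rate applied after `iterations` batch updates under time-based decay."""
    return lr / (1.0 + decay * iterations)


# With decay=0.0 the rate never changes.
unchanged = effective_lr(0.001, 0.0, 10000)

# With decay=1e-4 the rate is roughly halved after 10000 updates.
halved = effective_lr(0.001, 1e-4, 10000)
```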