I'm confused about the decay parameter in Keras's SGD() optimizer. How does it work concretely? For example, if I use a learning rate of lr=1.0 and set decay=1e-6, what exactly happens? Can anyone help me? Thanks in advance!
The learning rate (lr) is updated according to the following code:

```python
def get_updates(self, params, constraints, loss):
    grads = self.get_gradients(loss, params)
    # time-based decay: scale the base lr by the number of updates so far
    lr = self.lr * (1. / (1. + self.decay * self.iterations))
    # increment the iteration counter as part of the update step
    self.updates = [(self.iterations, self.iterations + 1.)]
```

Thank you, but I'm confused about self.iterations. Does it count a batch update or an epoch? For example, if I train an RNN on 20000 training samples with batch_size=20, nb_epoch=3, an initial learning rate of 0.1, and decay=1e-6, how will the decay be applied?
A batch update.
A batch update, meaning it decays the lr every batch (i.e., the decay is applied 1000 times in a single epoch)?
Yes, I believe so, since get_updates is called on each batch.
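To make this concrete with the numbers above, here is a minimal sketch (plain Python, independent of Keras) assuming the decay is applied once per batch exactly as in the get_updates snippet. The 1000 batches per epoch come from 20000 samples / batch_size=20:

```python
# Sketch of Keras-style time-based decay, applied once per batch update.
# Numbers follow the example above: 20000 samples, batch_size=20,
# nb_epoch=3, initial lr=0.1, decay=1e-6.
initial_lr = 0.1
decay = 1e-6
batches_per_epoch = 20000 // 20   # 1000 updates per epoch
nb_epoch = 3

iterations = 0
for epoch in range(nb_epoch):
    for batch in range(batches_per_epoch):
        # same formula as in get_updates above
        lr = initial_lr * (1. / (1. + decay * iterations))
        iterations += 1
    print("epoch %d ends with lr = %.10f" % (epoch + 1, lr))
```

With a decay this small, the learning rate only drops from 0.1 to roughly 0.0997 over the 3000 updates, which is why a tiny decay value has little visible effect over just a few epochs.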