Hi,
I'm using Keras to train a deep convnet for a regression problem.
The issue is that the loss decreases during an epoch, but at the beginning of each new epoch the loss jumps back up:

There is a similar issue mentioned here by hadi-ds, but it was never answered.
I'm using a data generator to feed the data; the data is produced in a loop and isn't exactly the same for each epoch. I'm also using a custom loss function.
I can't think of anything that would explain this behavior. Any ideas?
Thanks
Perhaps the nature of your data is changing over the epoch? I.e. the data at the beginning follows a different pattern from what your model sees towards the end of an epoch.
You say your data is not the same for each epoch, but perhaps you can shuffle it 'more'. You could also experiment with the minibatch size.
Thank you for your reply @ibenes
I'm positive the nature of my data isn't changing over the epoch because of the way I generate it: I'm drawing RGB frames from sequences in a loop, so for such a drastic change in loss, each epoch would have to end exactly at the end of a sequence, which isn't possible.
But the sequences you draw your samples from -- are they fixed, or do you shuffle them for each epoch? I can imagine that if those are frames from a video, then whatever your target is, it likely only changes continuously in time...
Sorry for the delay @ibenes
No, I don't shuffle my data, but I don't see how that's relevant; I don't seem to get your point. Would you explain a bit?
Hi! By default, you should shuffle your data between epochs, as it generally helps the learning.
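For reference, a minimal sketch of what per-epoch shuffling inside a generator could look like (the arrays x and y and the function name are illustrative, not from this thread):

import numpy as np

def shuffling_generator(x, y, batch_size):
    # Reshuffle the sample order at the start of every pass over the data.
    n = len(x)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            yield x[batch], y[batch]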
As for your case, it is really difficult to tell, since you gave so little detail about both your architecture and your data. But let's assume you are predicting a 'class' of animal, as one of four-legged / human or ape / bird / fish. And that your data are labelled videos from a made-up BBC series 'Life in the jungle, life in a village: how are they different?', where we first see some problem solving in a village (humans, pigs, chickens, goldfish, etc.) and then a similar situation in a jungle (gorillas, tigers, ostriches, piranhas, etc.).
So your model first sees the village life and improves its classification performance there, but then, for the second half of the epoch, adapts to the jungle environment. So when your data returns to the village, it is a little bit confused.
I will try out your suggestion and shuffle my data, thanks. But that still doesn't explain my case:
my problem isn't classification but regression, and using your example, the categories at the end of one epoch and the beginning of the next are the same, so there shouldn't be a sudden change in loss, right?
Sorry for the classification/regression mistake. Nevertheless, I do not think it is really an issue here. The intended point of the example is that although the prior distribution of targets is stationary over the epoch, the input data differs greatly and changes only slowly, allowing your model to adapt to the input 'domain' and later be confused by its sudden change.
It would be really helpful if you gave some more details on what your data and task are. Otherwise it is up to you to experiment, as any advice will only be very general.
I'm trying to learn frame-to-frame ego-motion, meaning regressing rotation and translation from two input RGB frames. My network is Siamese-style with GoogLeNet as each branch; the two branches are merged and followed by some FC layers.
I'm using consecutive frames of the KITTI sequences as input. As I said, the end of one epoch and the beginning of the next almost always fall within the same data sequence; there is no shuffling or randomness in picking the two frames, I'm simply taking them in their original order.
Hope this clears things up.
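For context, a rough sketch of that kind of two-branch setup in the Keras functional API; the small convolutional branch here is only a stand-in for GoogLeNet, and all layer sizes are illustrative assumptions, not the actual network from this thread:

from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense, concatenate
from keras.models import Model

def build_branch(input_shape=(224, 224, 3)):
    # Stand-in for the GoogLeNet branch, shared between the two frames.
    inp = Input(shape=input_shape)
    x = Conv2D(32, (3, 3), activation='relu')(inp)
    x = GlobalAveragePooling2D()(x)
    return Model(inp, x)

branch = build_branch()                      # one branch, applied to both frames (shared weights)
frame_t = Input(shape=(224, 224, 3))
frame_t1 = Input(shape=(224, 224, 3))
merged = concatenate([branch(frame_t), branch(frame_t1)])
fc = Dense(128, activation='relu')(merged)
pose = Dense(6)(fc)                          # e.g. 3 rotation + 3 translation components
model = Model(inputs=[frame_t, frame_t1], outputs=pose)
model.compile(optimizer='adam', loss='mse')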
I'm observing the same problem: The loss decreases over the course of an epoch, but then jumps back up a bit at the beginning of each new epoch. I'm also using the fit_generator and a custom loss function, and I've tested the behavior of the loss function thoroughly, so I believe it's unlikely that the loss function itself causes this behavior. In my case, I'm training an SSD with a 2D bounding box regression and classification multitask loss function.
I am shuffling my data before each epoch and performing random online data augmentation during training, so structural differences in the data between the beginning and end of a given epoch are not possible.
So far I, too, am puzzled by what causes this behavior. Could it have to do with anything in Keras itself? Right now I can't think of any plausible cause. Any suggestions would be much appreciated.
@pierluigiferrari I am having the same problem. I am doing a simple MLP regression with 2 hidden layers. I can train on the data fine using raw TensorFlow and scikit-learn, but training with Keras shows an increase in loss with each iteration.
My model setup looks something like:
import keras

model = keras.models.Sequential()
layer_sizes = [num_features] + [50, 10] + [1]  # num_features ~= 25
for layer in range(1, len(layer_sizes)):
    in_size = layer_sizes[layer - 1]
    out_size = layer_sizes[layer]
    if layer < len(layer_sizes) - 1:
        # hidden layers: ReLU activation with L2 weight regularization
        model.add(keras.layers.Dense(out_size, input_shape=(in_size,), activation='relu',
                                     bias_initializer='glorot_uniform',
                                     kernel_regularizer=keras.regularizers.l2(0.0001)))
    else:
        # linear output layer for the regression target
        model.add(keras.layers.Dense(1, kernel_initializer='glorot_uniform'))
opt = keras.optimizers.Adam(lr=0.001)
# coeff_determination is a custom R^2 metric defined elsewhere
model.compile(loss='mse', optimizer=opt, metrics=[keras.metrics.mae, coeff_determination])
Training with a batch size of 200 and 500,000 samples leads to an increase in the loss.
I hit something similar with TensorFlow, and it had to do with the dimensions of the tensors going into the loss calculation. y_pred had dimensions of (N,), while y_true had (N, 1). The mean squared error was being calculated on an NxN matrix, not an Nx1 matrix or an N-length vector, since y_pred - y_true produces an NxN matrix. When I matched the dimensions, TensorFlow started working. I can't seem to plumb through Keras, though, to see if it is the same issue.
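To illustrate the shape issue described above, here is a small standalone NumPy reproduction (the broadcasting behavior is the same in the TensorFlow/Keras backends):

import numpy as np

y_true = np.arange(4, dtype=float).reshape(4, 1)   # shape (4, 1)
y_pred = np.arange(4, dtype=float)                 # shape (4,)
diff = y_pred - y_true                             # broadcasts to shape (4, 4), not (4,)
print(diff.shape)                                  # (4, 4)
mse_broadcast = np.mean(diff ** 2)                 # averages over 16 values instead of 4
mse_intended = np.mean((y_pred.reshape(-1, 1) - y_true) ** 2)  # true per-sample squared error
print(mse_broadcast, mse_intended)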
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I wondered about something similar before, but in my case the loss dropped rapidly at the beginning of an epoch. After digging into the Keras source code, I think this may be due to the way the loss is computed for the progress bar. Note that the "loss" shown on the progress bar is not actually the instantaneous loss for that batch but the average loss over the past batches of that epoch. So if your model's loss has an increasing or decreasing trend, you may see the displayed "loss" change rapidly at the beginning of an epoch.
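A small illustration of the running-average effect described above (the batch losses are made-up numbers):

batch_losses = [1.0, 0.9, 0.8, 0.7, 0.6]   # losses within one epoch, trending downward
running_mean = []
total = 0.0
for i, loss in enumerate(batch_losses, start=1):
    total += loss
    running_mean.append(total / i)   # what the progress bar displays after each batch
print(running_mean)                  # [1.0, 0.95, 0.9, 0.85, 0.8]
# At the start of the next epoch the average resets, so the first displayed value
# jumps back to the first batch's instantaneous loss rather than the old running mean.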
One general possible cause of such behaviour is a broken loss function. It could be a custom function, as in my case, or a mismatch in the parameters passed to a built-in function, as @jadamwilson2 described three posts above.
Increase the batch size, and be aware of memory usage.