Hi,
I'm using Keras to train a deep convnet for a regression problem.
The issue is that the loss decreases during an epoch, but at the beginning of each new epoch the loss jumps back up:

There is a similar issue mentioned here by hadi-ds, but it was never answered.
I'm using a data generator to feed the data; the data is produced in a loop and isn't exactly the same for each epoch. I'm also using a custom loss function.
I can't think of anything that would explain this behavior. Any ideas?
Thanks
Perhaps the nature of your data is changing over the epoch? I.e. the data at the beginning follows a different pattern from what your model sees towards the end of an epoch.
You say your data is not the same for each epoch, but perhaps you can shuffle it 'more'. You could also experiment with the minibatch size.
Thank you for your reply @ibenes
I'm positive the nature of my data isn't changing over the epoch because of the way I generate it: I'm drawing RGB frames from sequences in a loop, so for such a drastic change in loss, each epoch would have to end exactly at the end of a sequence, which isn't possible.
But the sequences you draw your samples from -- are they fixed, or do you shuffle them for each epoch? I can imagine that if those are frames from a video, then whatever your target is, it likely only changes continuously in time...
Sorry for the delay @ibenes
No, I don't shuffle my data, but I don't see how that's relevant; I don't seem to get your point. Would you explain a bit?
Hi! By default, you should shuffle your data between epochs, as it generally helps the learning.
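For reference, a minimal sketch of what per-epoch shuffling inside a generator could look like (the arrays x and y and the function name are illustrative, not from this thread):

import numpy as np

def shuffling_generator(x, y, batch_size):
    # Reshuffle the sample order at the start of every pass over the data.
    n = len(x)
    while True:
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            yield x[batch], y[batch]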
As for your case, it is really difficult to tell, since you gave so little detail about both your architecture and your data. But let's assume you are predicting a 'class' of animal, as one of four-legged / human or ape / bird / fish. And that your data are labelled videos from a made-up BBC series 'Life in the jungle, life in a village: how are they different?', where we first see some problem solving in a village (humans, pigs, chickens, goldfish, etc.) and then a similar situation in a jungle (gorillas, tigers, ostriches, piranhas, etc.).
So your model first sees the village life and improves its classification performance there, but then, for the second half of the epoch, adapts to the jungle environment. So when your data returns to the village, it is a little bit confused.
I will try out your suggestion and shuffle my data, thanks. But that still doesn't explain my case:
my problem isn't classification but regression, and using your example, the categories at the end of one epoch and the beginning of the next are the same, so there shouldn't be a sudden change in loss, right?
Sorry for the classification/regression mistake. Nevertheless, I do not think it is really an issue here. The intended point of the example is that although the prior distribution of targets is stationary over the epoch, the input data differs greatly and changes only slowly, allowing your model to adapt to the input 'domain' and later be confused by its sudden change.
It would be really helpful if you gave some more details on what your data and task are. Otherwise it is up to you to experiment, as any advice will only be very general.
I'm trying to learn frame-to-frame ego-motion, meaning regressing rotation and translation from two input RGB frames. My network is Siamese-style with GoogLeNet as each branch; the two branches are merged and followed by some FC layers.
I'm using consecutive frames of the KITTI sequences as input. As I said, the end of one epoch and the beginning of the next almost always fall within the same data sequence; there is no shuffling or randomness in picking the two frames, I'm simply taking them in their original order.
Hope this clears things up.
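For context, a rough sketch of that kind of two-branch setup in the Keras functional API; the small convolutional branch here is only a stand-in for GoogLeNet, and all layer sizes are illustrative assumptions, not the actual network from this thread:

from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense, concatenate
from keras.models import Model

def build_branch(input_shape=(224, 224, 3)):
    # Stand-in for the GoogLeNet branch, shared between the two frames.
    inp = Input(shape=input_shape)
    x = Conv2D(32, (3, 3), activation='relu')(inp)
    x = GlobalAveragePooling2D()(x)
    return Model(inp, x)

branch = build_branch()                      # one branch, applied to both frames (shared weights)
frame_t = Input(shape=(224, 224, 3))
frame_t1 = Input(shape=(224, 224, 3))
merged = concatenate([branch(frame_t), branch(frame_t1)])
fc = Dense(128, activation='relu')(merged)
pose = Dense(6)(fc)                          # e.g. 3 rotation + 3 translation components
model = Model(inputs=[frame_t, frame_t1], outputs=pose)
model.compile(optimizer='adam', loss='mse')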
I'm observing the same problem: The loss decreases over the course of an epoch, but then jumps back up a bit at the beginning of each new epoch. I'm also using the fit_generator and a custom loss function, and I've tested the behavior of the loss function thoroughly, so I believe it's unlikely that the loss function itself causes this behavior. In my case, I'm training an SSD with a 2D bounding box regression and classification multitask loss function.
I am shuffling my data before each epoch and performing random online data augmentation during training, so structural differences in the data between the beginning and end of a given epoch are not possible.
So far I, too, am puzzled by what causes this behavior. Could it have to do with anything in Keras itself? Right now I can't think of any plausible cause. Any suggestions would be much appreciated.
@pierluigiferrari I am having the same problem. I am doing a simple MLP regression with 2 hidden layers. I can train on the data fine using raw TensorFlow and scikit-learn, but training with Keras shows an increase in loss with each iteration.
My model setup looks something like:
import keras

model = keras.models.Sequential()
layer_sizes = [num_features] + [50, 10] + [1]  # num_features ~= 25
for layer in range(1, len(layer_sizes)):
    in_size = layer_sizes[layer - 1]
    out_size = layer_sizes[layer]
    if layer < len(layer_sizes) - 1:
        # hidden layers: ReLU activation with L2 weight regularization
        model.add(keras.layers.Dense(out_size, input_shape=(in_size,), activation='relu',
                                     bias_initializer='glorot_uniform',
                                     kernel_regularizer=keras.regularizers.l2(0.0001)))
    else:
        # linear output layer for the regression target
        model.add(keras.layers.Dense(1, kernel_initializer='glorot_uniform'))
opt = keras.optimizers.Adam(lr=0.001)
# coeff_determination is a custom R^2 metric defined elsewhere
model.compile(loss='mse', optimizer=opt, metrics=[keras.metrics.mae, coeff_determination])
Training with a batch size of 200 and 500,000 samples leads to an increase in the loss.
I hit something similar with TensorFlow, and it had to do with the dimensions of the tensors going into the loss calculation. y_pred had dimensions of (N,), while y_true had (N, 1). The mean squared error was being calculated on an NxN matrix, not an Nx1 matrix or an N-length vector, since y_pred - y_true produces an NxN matrix. When I matched the dimensions, TensorFlow started working. I can't seem to plumb through Keras, though, to see if it is the same issue.
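To illustrate the shape issue described above, here is a small standalone NumPy reproduction (the broadcasting behavior is the same in the TensorFlow/Keras backends):

import numpy as np

y_true = np.arange(4, dtype=float).reshape(4, 1)   # shape (4, 1)
y_pred = np.arange(4, dtype=float)                 # shape (4,)
diff = y_pred - y_true                             # broadcasts to shape (4, 4), not (4,)
print(diff.shape)                                  # (4, 4)
mse_broadcast = np.mean(diff ** 2)                 # averages over 16 values instead of 4
mse_intended = np.mean((y_pred.reshape(-1, 1) - y_true) ** 2)  # true per-sample squared error
print(mse_broadcast, mse_intended)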
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I wondered about something similar before, but in my case the loss dropped rapidly at the beginning of an epoch. After digging into the Keras source code, I think this may be due to the way the loss is computed for the progress bar. Note that the "loss" shown on the progress bar is not actually the instantaneous loss for that batch but the average loss over the past batches of that epoch. So if your model's loss has an increasing or decreasing trend, you may see the displayed "loss" change rapidly at the beginning of an epoch.
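A small illustration of the running-average effect described above (the batch losses are made-up numbers):

batch_losses = [1.0, 0.9, 0.8, 0.7, 0.6]   # losses within one epoch, trending downward
running_mean = []
total = 0.0
for i, loss in enumerate(batch_losses, start=1):
    total += loss
    running_mean.append(total / i)   # what the progress bar displays after each batch
print(running_mean)                  # [1.0, 0.95, 0.9, 0.85, 0.8]
# At the start of the next epoch the average resets, so the first displayed value
# jumps back to the first batch's instantaneous loss rather than the old running mean.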
One general possible cause of such behaviour is a broken loss function. It could be a custom function, as in my case, or a mismatch in the parameters passed to a built-in function, as @jadamwilson2 described three posts above.
Increase the batch size, and be aware of memory usage.