Keras: Predictions are all zero

Created on 4 Sep 2016 · 6 comments · Source: keras-team/keras

I am using Keras 1.0.7 with TensorFlow 0.10.0 as the backend.
I built an RNN to solve a 'many to many' regression prediction problem:

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed
from keras.optimizers import SGD

def RNN_keras(feat_num, timestep_num=100):
    model = Sequential()
    model.add(LSTM(input_shape=(timestep_num, feat_num), output_dim=768, activation='relu', return_sequences=True))
    model.add(LSTM(output_dim=256, activation='relu', return_sequences=True))
    model.add(TimeDistributed(Dense(output_dim=1, activation='relu')))  # sequence labeling

    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.99, nesterov=True)
    model.compile(loss='mean_squared_error',
                  optimizer=sgd,
                  metrics=['mean_squared_error'])

    return model
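
The rnn object used in the training loop below is the model returned by this function. The exact call is not shown in the issue; a minimal sketch of the instantiation, where feat_num=431 is taken from the feature count reported in the log further down:

rnn = RNN_keras(feat_num=431, timestep_num=100)  # hypothetical instantiation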

This is the code that I use to do training.

print("\n****** Iterating over each batch of the training data ******")
for epoch in range(1, NUM_EPOCH+1):
    batch_index = 0

    for X_batch, y_batch in mig.Xy_gen(mig.X_train, mig.y_train, batch_size=BATCH_SIZE):
        batch_index += 1        

        ''' RNN '''
        loss = rnn.train_on_batch(X_batch, y_batch)
        print("Epoch %d/%d : Batch %d/%d | %s = %f | root_%s = %f" %
              (epoch, NUM_EPOCH, batch_index, num_batch, 
               rnn.metrics_names[0], loss[0], rnn.metrics_names[1], np.sqrt(loss[1])))

    ''' Use the RNN trained after this epoch to predict all training examples
        and compute the training error of this epoch '''    
    rmsd_training = RMSD_batch()
    for X_batch, y_batch in mig.Xy_gen(mig.X_train, mig.y_train, batch_size=BATCH_SIZE):  
        _pred = rnn.predict_on_batch(X_batch) # _pred are all zero !!!
        rmsd_training.update(y_batch, _pred)

    print("*** Epoch %d: RMSD(training) = %f \n" % (epoch, rmsd_training.final_RMSD()))

Below is the output. (The input vectors contain only 0s and 1s; they are all categorical variables encoded with one-hot encoding. The training data are shuffled at the beginning of every epoch. So far, I put all training examples in one single batch.)

13945 examples in the training set
1439 examples in the test set

Building training input vectors ...
431 unique feature names
The length of each vector will be 431
Using TensorFlow backend.

Build model...

****** Iterating over each batch of the training data ******
Epoch 1/3 : Batch 1/1 | loss = 6356.635742 | root_mean_squared_error = 79.727188
****** Epoch 1: RMSD(training) = 24.217030 

Epoch 2/3 : Batch 1/1 | loss = 586.464478 | root_mean_squared_error = 24.212215
****** Epoch 2: RMSD(training) = 24.217030 

Epoch 3/3 : Batch 1/1 | loss = 586.464478 | root_mean_squared_error = 24.212488
****** Epoch 3: RMSD(training) = 24.217030 

I find that the loss decreases over epochs, but the RMSD on the training data stays the same. (I do not know why the value I calculate differs from the one calculated by Keras; I am not sure how Keras computes the MSE.) Then I check the predictions and find that they are all zero in every epoch. This is why the RMSD at the end of every epoch is identical: the predictions are all zero! I also checked the input y. Only about 10% of the targets are zero, so zeros do not dominate the training y, and I do not think it is a data imbalance problem. Now I guess it is because of the layers and activations that I am using?
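
For reference, this is roughly how the check above can be reproduced inside the prediction loop (a minimal sketch; _pred and y_batch are the numpy arrays from the loop above):

# fraction of exactly-zero values in the predictions vs. the targets
print("zero fraction in predictions:", np.mean(_pred == 0))    # ~1.0 here
print("zero fraction in targets:    ", np.mean(y_batch == 0))  # ~0.1 here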

Can anyone help me? Thanks a lot!

Most helpful comment

You might have too many zeros in the outputs; since you use relus it is extremely easy for the net to learn to output zeroes only. Try leaky relus or tanh.

All 6 comments

It works when I change 'relu' to 'tanh', although I am not entirely clear on the reason...

You might have too many zeros in the outputs; since you use relus it is extremely easy for the net to learn to output zeroes only. Try leaky relus or tanh.

Thanks @HristoBuyukliev for the suggestion of using leaky ReLUs. It solved the problem flawlessly. I didn't even know these things existed...
In code, the only thing you have to change is
from keras.layers.advanced_activations import LeakyReLU
and then change your model from
model.add(Activation("relu"))
to
model.add(LeakyReLU(alpha=0.3))
The alpha value of LeakyReLU controls the slope when _x<=0_; you have to tweak that value yourself. 0.3 worked for me.
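
Applied to the model from the question, the change might look roughly like the sketch below. This is an assumption on my part, combining the two fixes mentioned in this thread rather than any commenter's exact code: tanh inside the LSTMs, and a LeakyReLU layer after a linear output head instead of the relu head (the activations in the question are passed inside the layers, not as separate Activation layers):

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed
from keras.layers.advanced_activations import LeakyReLU
from keras.optimizers import SGD

def RNN_keras_leaky(feat_num, timestep_num=100):
    model = Sequential()
    # tanh in the recurrent layers (what also fixed the issue above)
    model.add(LSTM(input_shape=(timestep_num, feat_num), output_dim=768, activation='tanh', return_sequences=True))
    model.add(LSTM(output_dim=256, activation='tanh', return_sequences=True))
    # linear per-timestep output followed by a LeakyReLU layer instead of a plain relu
    model.add(TimeDistributed(Dense(output_dim=1, activation='linear')))
    model.add(LeakyReLU(alpha=0.3))

    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.99, nesterov=True)
    model.compile(loss='mean_squared_error', optimizer=sgd, metrics=['mean_squared_error'])
    return model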

This thread is helpful; however, I get an error:
AttributeError: module 'tensorflow.python.ops.nn' has no attribute 'leaky_relu'

I guess LeakyReLU is in another module now, or it has been renamed.

Michael, model.add(LeakyReLU(alpha=0.3)) works for me. I'm using Python 3.

I think it is a case of dying ReLU.
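
A minimal numpy sketch of why a "dead" ReLU stays dead while a LeakyReLU does not (illustrative only, not from the original thread):

import numpy as np

# Once the pre-activation z is negative, relu(z) = 0 and its gradient is 0,
# so the unit never recovers; LeakyReLU keeps a small gradient alpha for z < 0.
z = np.array([-2.0, -0.5, 0.0, 1.5])
relu       = np.maximum(z, 0.0)            # [ 0.  ,  0.  , 0., 1.5]
relu_grad  = (z > 0).astype(float)         # [ 0.  ,  0.  , 0., 1. ]
leaky      = np.where(z > 0, z, 0.3 * z)   # [-0.6 , -0.15, 0., 1.5]
leaky_grad = np.where(z > 0, 1.0, 0.3)     # [ 0.3 ,  0.3 , 0.3, 1. ]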

