Keras: Weights are almost all NaN, accuracy and loss are stuck at 0.5

Created on 29 Nov 2016 · 7 Comments · Source: keras-team/keras

I am using the following layers for training:

seq = Sequential()
seq.add(Embedding(100000, 32, input_length=input_dim))
seq.add(Dropout(0.1))
seq.add(LSTM(64))
seq.add(Dropout(0.1))
seq.add(Dense(128,activation='relu'))

rms = RMSprop(lr = 0.00001)
model.compile(loss=contrastive_loss, optimizer=rms)
model.fit([x1, x2], y, batch_size=128, nb_epoch=1)

However, after training, both the loss and the accuracy were 0.5. When I checked the weights, many of them were NaN:

array([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
    [ nan,  nan,  nan, ...,  nan,  nan,  nan],
    [ nan,  nan,  nan, ...,  nan,  nan,  nan],
    ...,
    [ nan,  nan,  nan, ...,  nan,  nan,  nan],
    [ nan,  nan,  nan, ...,  nan,  nan,  nan],
    [ nan,  nan,  nan, ...,  nan,  nan,  nan]], dtype=float32),
 array([             nan,  -1.09340586e-02,  -4.78311162e-03,
     -1.61668728e-03,   1.06338365e-03,              nan,
                 nan,   8.06208933e-04,  -5.45365829e-03,
     -3.45902704e-03,              nan,              nan,
                 nan,  -1.05342746e-03,              nan,
     -6.31471910e-03,              nan,              nan,
      9.69409477e-04,   4.55729757e-03,              nan,
      1.98780419e-03,              nan,              nan,
      1.05216762e-03,  -8.49252101e-04,              nan,
                 nan,   3.82378814e-04,   7.34463474e-03,
.....], dtype=float32)]

Is there any problem with my network?

All 7 comments

Is this a classification or regression problem? Your last activation is ReLU, which does not have an upper bound. It's a bit hard to see what's going on without the surrounding code / data.

@phdowling This is a classification problem. I followed the example https://github.com/fchollet/keras/blob/master/examples/mnist_siamese_graph.py
but changed the base network and used my own data.
I have more than 1 million samples. Early in training the loss decreased and the accuracy increased, but once training completed, both the training and validation loss were 0.5, and the accuracy was 0.5 as well.
Is there a problem with using 'relu'?
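
One possible reading of the 0.5 figures, offered as a sketch rather than a diagnosis: if the contrastive loss has the form used in the linked example (margin of 1, which is an assumption here since the poster's own loss is not shown) and every pair collapses to distance zero, then balanced labels give a loss of exactly 0.5 and a thresholded accuracy of 0.5. The helper names below are hypothetical and written in plain Python, not Keras.

```python
# Hypothetical plain-Python stand-in for a contrastive loss;
# margin=1.0 is an assumption, not something stated in this thread.
def contrastive_loss(y_true, distances, margin=1.0):
    terms = [
        y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2
        for y, d in zip(y_true, distances)
    ]
    return sum(terms) / len(terms)


# Hypothetical accuracy: a pair counts as "similar" (label 1)
# when its distance falls below the threshold.
def accuracy(y_true, distances, threshold=0.5):
    hits = [(d < threshold) == bool(y) for y, d in zip(y_true, distances)]
    return sum(hits) / len(hits)


labels = [1, 0, 1, 0]              # balanced similar / dissimilar pairs
collapsed = [0.0, 0.0, 0.0, 0.0]   # every pair ends up at distance zero

print(contrastive_loss(labels, collapsed))  # 0.5
print(accuracy(labels, collapsed))          # 0.5
```

If this is what happened, the 0.5 values would be a symptom of all embeddings becoming identical, not a sign that the model is still learning.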

Any update on this issue?

@longma307 @aryopg I had the same problem of the weights going to nan. I fixed it by modifying euclidean_distance so it never takes the square root of a negative value. In theory this should never happen because we're squaring the numbers, but it could be a numerical stability issue, I'm not sure.

More specifically, change

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))

to

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))
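
A rough illustration of why clamping before the square root can matter during backpropagation, assuming the default K.epsilon() of 1e-7: the derivative of sqrt(s) is 1 / (2 * sqrt(s)), which diverges at s = 0, and multiplying that infinity by a zero upstream term yields NaN. The functions below are hypothetical plain-Python stand-ins for what the backend computes, not Keras code.

```python
import math

EPSILON = 1e-7  # assumed to match the default K.epsilon()


def sqrt_grad(s):
    # d/ds sqrt(s) = 1 / (2 * sqrt(s)); diverges as s approaches 0.
    root = math.sqrt(s)
    return math.inf if root == 0.0 else 1.0 / (2.0 * root)


def clamped_sqrt_grad(s, eps=EPSILON):
    # Below eps the maximum selects the constant, so no gradient reaches s.
    return 0.0 if s < eps else 1.0 / (2.0 * math.sqrt(s))


s = 0.0           # sum of squared differences for two identical vectors
inner_grad = 0.0  # d/dx (x - y)**2 = 2 * (x - y), which is zero when x == y

print(sqrt_grad(s) * inner_grad)          # nan (inf * 0)
print(clamped_sqrt_grad(s) * inner_grad)  # 0.0
```

Taking an absolute value instead of clamping would leave a zero sum at zero, so the divergent derivative would remain.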

Thanks @nigeljyng. Why not use K.abs? Is there a paper or something else I should read to understand this?

@aryopg Upon further thought I don't think it's a negative value problem but a small number problem. Squaring a small number makes the number even smaller, and my hunch is that that's where the NaNs occur. I'm not 100% sure about this so please correct me if I'm wrong.

Here's an example.

>>> import numpy as np
>>> from keras import backend as K
>>> x = np.array([[1e-10000, 2e-10000]])
>>> K.eval(K.square(x))
array([[ 0.,  0.]])
>>> K.eval(K.maximum(K.square(x), K.epsilon()))
array([[  1.00000000e-07,   1.00000000e-07]])
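
A small plain-Python check of the underflow behaviour, as a sketch that uses only built-in floats (64-bit) and assumes the default K.epsilon() of 1e-7. One detail worth noting: a literal such as 1e-10000 is already below the smallest representable double, so Python parses it as 0.0 before any squaring happens. A value like 1e-200 is representable, and its square underflows to zero.

```python
EPSILON = 1e-7  # assumed default of K.epsilon()

tiny_literal = 1e-10000   # too small for a 64-bit float: parsed as 0.0
small = 1e-200            # representable as a 64-bit float
squared = small * small   # 1e-400 is not representable and underflows to 0.0

print(tiny_literal == 0.0)    # True
print(small > 0.0)            # True
print(squared == 0.0)         # True
print(max(squared, EPSILON))  # 1e-07
```

The clamp therefore replaces an exact zero with a small positive floor before the square root is taken.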

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
