I am using the following layers for training:
from keras.models import Sequential
from keras.layers import Embedding, Dropout, LSTM, Dense
from keras.optimizers import RMSprop

# shared base network; `model` is the Siamese wrapper built around `seq` (not shown here)
seq = Sequential()
seq.add(Embedding(100000, 32, input_length=input_dim))
seq.add(Dropout(0.1))
seq.add(LSTM(64))
seq.add(Dropout(0.1))
seq.add(Dense(128, activation='relu'))

rms = RMSprop(lr=0.00001)
model.compile(loss=contrastive_loss, optimizer=rms)
model.fit([x1, x2], y, batch_size=128, nb_epoch=1)
However, after training, both the loss and the accuracy were 0.5, and when I checked the weights, the results were terrible:
array([[ nan, nan, nan, ..., nan, nan, nan],
       [ nan, nan, nan, ..., nan, nan, nan],
       [ nan, nan, nan, ..., nan, nan, nan],
       ...,
       [ nan, nan, nan, ..., nan, nan, nan],
       [ nan, nan, nan, ..., nan, nan, nan],
       [ nan, nan, nan, ..., nan, nan, nan]], dtype=float32),
array([            nan, -1.09340586e-02, -4.78311162e-03,
       -1.61668728e-03,  1.06338365e-03,             nan,
                   nan,  8.06208933e-04, -5.45365829e-03,
       -3.45902704e-03,             nan,             nan,
                   nan, -1.05342746e-03,             nan,
       -6.31471910e-03,             nan,             nan,
        9.69409477e-04,  4.55729757e-03,             nan,
        1.98780419e-03,             nan,             nan,
        1.05216762e-03, -8.49252101e-04,             nan,
                   nan,  3.82378814e-04,  7.34463474e-03,
       ...], dtype=float32)]
Is there any problem with my network?
Is this a classification or regression problem? Your last activation is ReLU, which does not have an upper bound. It's a bit hard to see what's going on without the surrounding code / data.
@phdowling This is a classification problem. I followed the example https://github.com/fchollet/keras/blob/master/examples/mnist_siamese_graph.py
but changed the base network and used my own data.
I have more than 1 million samples. When I first trained the model, the loss decreased and the accuracy increased, but after training completed, both the training and validation loss were 0.5 and the accuracy was 0.5.
Is there any problem with the 'relu'?
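For reference, the relevant pieces of that example (the Siamese wrapper around the base network and the contrastive loss) look roughly like this; this is only a sketch, with input_a, input_b, processed_a, processed_b and eucl_dist_output_shape taken from the linked example, and the exact wiring in my code may differ:

from keras.layers import Input, Lambda
from keras.models import Model
from keras import backend as K

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

def contrastive_loss(y_true, y_pred):
    # Hadsell et al. '06; in that version of the example y_true == 1 marks similar pairs
    margin = 1
    return K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

# the two inputs share the same base network `seq`
input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))
processed_a = seq(input_a)
processed_b = seq(input_b)

distance = Lambda(euclidean_distance,
                  output_shape=eucl_dist_output_shape)([processed_a, processed_b])
model = Model([input_a, input_b], distance)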
Any update on this issue?
@longma307 @aryopg I had the same problem of the weights going to nan. I fixed it by modifying euclidean_distance so it never takes the square root of a negative value. In theory this should never happen because we're squaring the numbers, but it could be a numerical stability issue, I'm not sure.
More specifically, change
def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))
to
def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))
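As a quick sanity check of the patched function (a sketch, assuming the default K.epsilon() of 1e-07), the distance between identical vectors is now clamped to sqrt(epsilon) instead of exactly zero:

import numpy as np
from keras import backend as K

a = K.variable(np.array([[0.3, 0.3, 0.3]], dtype='float32'))
dist = euclidean_distance([a, a])   # identical vectors: the un-clamped distance would be exactly 0
print(K.eval(dist))                 # ~[[0.000316]] == sqrt(1e-07), never exactly 0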
Thanks @nigeljyng! Why not use K.abs? Is there any paper or something I should read to understand that?
@aryopg Upon further thought I don't think it's a negative value problem but a small number problem. Squaring a small number makes the number even smaller, and my hunch is that that's where the NaNs occur. I'm not 100% sure about this so please correct me if I'm wrong.
Here's an example.
>>> x = np.array([[1e-10000, 2e-10000]])
>>> K.eval(K.square(x))
array([[ 0., 0.]])
>>> K.eval(K.maximum(K.square(x), K.epsilon()))
array([[ 1.00000000e-07, 1.00000000e-07]])
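A possibly related mechanism (my assumption, not confirmed here): the derivative of sqrt(x) is 1 / (2 * sqrt(x)), which is infinite at x = 0, so whenever two embeddings coincide the backward pass through the un-clamped distance can produce inf/nan gradients. A minimal sketch, using TensorFlow 2 just for illustration:

import tensorflow as tf

x = tf.Variable([0.0, 1e-8, 1.0])
with tf.GradientTape() as tape:
    d = tf.sqrt(x)              # d(sqrt)/dx = 1 / (2 * sqrt(x))
grad = tape.gradient(d, x)
print(grad.numpy())             # approximately [inf, 5000., 0.5]; an inf gradient like this can drive weights to inf/nan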