I am using the following layers for training:
from keras.models import Sequential
from keras.layers import Embedding, Dropout, LSTM, Dense
from keras.optimizers import RMSprop

# shared base network; `model` is the Siamese wrapper built around `seq` (not shown here)
seq = Sequential()
seq.add(Embedding(100000, 32, input_length=input_dim))
seq.add(Dropout(0.1))
seq.add(LSTM(64))
seq.add(Dropout(0.1))
seq.add(Dense(128, activation='relu'))

rms = RMSprop(lr=0.00001)
model.compile(loss=contrastive_loss, optimizer=rms)
model.fit([x1, x2], y, batch_size=128, nb_epoch=1)
However, after training, both the loss and the accuracy were 0.5, and when I checked the weights, the results were terrible:
array([[ nan, nan, nan, ..., nan, nan, nan],
       [ nan, nan, nan, ..., nan, nan, nan],
       [ nan, nan, nan, ..., nan, nan, nan],
       ...,
       [ nan, nan, nan, ..., nan, nan, nan],
       [ nan, nan, nan, ..., nan, nan, nan],
       [ nan, nan, nan, ..., nan, nan, nan]], dtype=float32),
array([            nan, -1.09340586e-02, -4.78311162e-03,
       -1.61668728e-03,  1.06338365e-03,             nan,
                   nan,  8.06208933e-04, -5.45365829e-03,
       -3.45902704e-03,             nan,             nan,
                   nan, -1.05342746e-03,             nan,
       -6.31471910e-03,             nan,             nan,
        9.69409477e-04,  4.55729757e-03,             nan,
        1.98780419e-03,             nan,             nan,
        1.05216762e-03, -8.49252101e-04,             nan,
                   nan,  3.82378814e-04,  7.34463474e-03,
       ...], dtype=float32)]
Is there any problem with my network?
Is this a classification or regression problem? Your last activation is ReLU, which does not have an upper bound. It's a bit hard to see what's going on without the surrounding code / data.
@phdowling This is a classification problem. I followed the example https://github.com/fchollet/keras/blob/master/examples/mnist_siamese_graph.py
but changed the base network and used my own data.
I have more than 1 million samples. When I first trained the model, the loss decreased and the accuracy increased, but after training completed, both the training and validation loss were 0.5 and the accuracy was 0.5.
Is there any problem with the 'relu'?
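For reference, the relevant pieces of that example (the Siamese wrapper around the base network and the contrastive loss) look roughly like this; this is only a sketch, with input_a, input_b, processed_a, processed_b and eucl_dist_output_shape taken from the linked example, and the exact wiring in my code may differ:

from keras.layers import Input, Lambda
from keras.models import Model
from keras import backend as K

def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

def contrastive_loss(y_true, y_pred):
    # Hadsell et al. '06; in that version of the example y_true == 1 marks similar pairs
    margin = 1
    return K.mean(y_true * K.square(y_pred) +
                  (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

# the two inputs share the same base network `seq`
input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))
processed_a = seq(input_a)
processed_b = seq(input_b)

distance = Lambda(euclidean_distance,
                  output_shape=eucl_dist_output_shape)([processed_a, processed_b])
model = Model([input_a, input_b], distance)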
Any update on this issue?
@longma307 @aryopg I had the same problem of the weights going to nan. I fixed it by modifying euclidean_distance so it never takes the square root of a negative value. In theory this should never happen because we're squaring the numbers, but it could be a numerical stability issue, I'm not sure.
More specifically, change
def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.sum(K.square(x - y), axis=1, keepdims=True))
to
def euclidean_distance(vects):
    x, y = vects
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))
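As a quick sanity check of the patched function (a sketch, assuming the default K.epsilon() of 1e-07), the distance between identical vectors is now clamped to sqrt(epsilon) instead of exactly zero:

import numpy as np
from keras import backend as K

a = K.variable(np.array([[0.3, 0.3, 0.3]], dtype='float32'))
dist = euclidean_distance([a, a])   # identical vectors: the un-clamped distance would be exactly 0
print(K.eval(dist))                 # ~[[0.000316]] == sqrt(1e-07), never exactly 0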
Thanks @nigeljyng! Why not use K.abs? Is there any paper or something I should read to understand that?
@aryopg Upon further thought I don't think it's a negative value problem but a small number problem. Squaring a small number makes the number even smaller, and my hunch is that that's where the NaNs occur. I'm not 100% sure about this so please correct me if I'm wrong.
Here's an example.
>>> x = np.array([[1e-10000, 2e-10000]])
>>> K.eval(K.square(x))
array([[ 0., 0.]])
>>> K.eval(K.maximum(K.square(x), K.epsilon()))
array([[ 1.00000000e-07, 1.00000000e-07]])
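A possibly related mechanism (my assumption, not confirmed here): the derivative of sqrt(x) is 1 / (2 * sqrt(x)), which is infinite at x = 0, so whenever two embeddings coincide the backward pass through the un-clamped distance can produce inf/nan gradients. A minimal sketch, using TensorFlow 2 just for illustration:

import tensorflow as tf

x = tf.Variable([0.0, 1e-8, 1.0])
with tf.GradientTape() as tape:
    d = tf.sqrt(x)              # d(sqrt)/dx = 1 / (2 * sqrt(x))
grad = tape.gradient(d, x)
print(grad.numpy())             # approximately [inf, 5000., 0.5]; an inf gradient like this can drive weights to inf/nan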