Hello!
I'm running into an issue where training a simple model on examples that contain all zeroes returns NaN for the weights and the loss. Here is an example:
import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation
from keras.layers.convolutional import Convolution1D

np.random.seed(0)
X2 = np.zeros([1, 1, 1])
Y2 = np.ones([1, 1, 1])

model = Sequential()
model.add(
    Convolution1D(
        1,
        1,
        input_dim=1,
        border_mode='valid'))
model.add(Activation('relu'))
model.compile(optimizer='adagrad', loss='MSE')
hist = model.fit(
    X2,
    Y2,
    nb_epoch=2)
Running the above code gives:
Epoch 1/2
1/1 [==============================] - 0s - loss: 1.0000
Epoch 2/2
1/1 [==============================] - 0s - loss: nan
This happens even when the number of examples is larger (in the above code, it's 1); just one example that's all zero is sufficient for the NaN to occur. Changing the zero to any other number, even 0.00001, removes the problem. The problem also goes away when you remove the ReLU layer.
I don't get this problem when running on the PyPI Theano build. The problem only occurs when I pull the latest Theano build from their GitHub repo. However, using the older Theano build isn't an option, because of the concat bug there.
Does anyone know what's going on? Thanks in advance!
Pangwei, I think I've traced it to weighted_objective in models.py (the weighting is, I believe, used to apply class weights); if you strip away the weighting, the NaNs go away when you compute the gradients:
import numpy as np
import theano
import theano.tensor as T
from keras.models import Sequential
from keras.layers.core import Activation
from keras.layers.convolutional import Convolution1D
from keras import objectives
from keras.models import weighted_objective

np.random.seed(0)
X2 = np.zeros([1, 1, 1])
Y2 = np.ones([1, 1, 1])

model = Sequential()
model.add(
    Convolution1D(
        nb_filter=1,
        filter_length=1,
        input_dim=1,
        border_mode='valid'))
model.add(Activation('relu'))
model.compile(optimizer='sgd', loss='MSE')

# Same MSE loss, once through Keras' weighting wrapper and once as a plain mean.
train_loss_weighted = weighted_objective(objectives.get("MSE"))(model.y, model.y_train, model.weights, None)
train_loss_unweighted = objectives.get("MSE")(model.y, model.y_train).mean()  # weighted_loss(model.y, model.y_train, model.weights, None)

# Gradients of each loss with respect to the model parameters.
thegrad_weighted = T.grad(train_loss_weighted, model.params)
thegrad_unweighted = T.grad(train_loss_unweighted, model.params)

train_ins = [model.X_train, model.y, model.weights]
f_grad_weighted = theano.function([model.X_train, model.y, model.weights], thegrad_weighted)
print("weighted", f_grad_weighted(X2, Y2, np.ones(Y2.shape[:-1] + (1,))))
f_grad_unweighted = theano.function([model.X_train, model.y], thegrad_unweighted)
print("unweighted", f_grad_unweighted(X2, Y2))
Running the above gives:
('weighted', [array([[[[ nan]]]]), array([ nan])])
('unweighted', [array([[[[ 0.]]]]), array([-1.])])
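As a sanity check on the unweighted numbers, here is a hand computation in plain Python. It is only a sketch under assumptions: a single weight w and bias b, the bias starting at 0, and a ReLU gradient of 0.5 at exactly 0 (the value reported further down this thread for non-debug mode). The function names are made up for illustration and are not Keras or Theano APIs.

```python
def relu(z):
    return max(z, 0.0)


def relu_grad(z):
    # Assumption: 0.5 at exactly z == 0 (value reported later in the thread).
    if z > 0:
        return 1.0
    if z < 0:
        return 0.0
    return 0.5


def mse_grads(x, y, w, b):
    """Gradients of (relu(w*x + b) - y)**2 with respect to w and b."""
    z = w * x + b
    pred = relu(z)
    dloss_dpred = 2.0 * (pred - y)
    dloss_dz = dloss_dpred * relu_grad(z)
    return dloss_dz * x, dloss_dz


# x = 0 and b = 0 put the pre-activation exactly at the ReLU kink,
# whatever value w happens to have.
grad_w, grad_b = mse_grads(x=0.0, y=1.0, w=0.3, b=0.0)
print(grad_w, grad_b)
```

Under those assumptions the weight gradient is zero and the bias gradient is -1, which agrees with the unweighted output above. So the finite result is what you'd expect, and the NaN is specific to the weighted path.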
Here are the contents of weighted_objective for convenience. It looks like a promising lead, as there is a division by filtered_weights.sum():
def weighted_objective(fn):
    def weighted(y_true, y_pred, weights, mask=None):
        # it's important that 0 * Inf == 0, not NaN, so we need to filter
        # those out first
        filtered_y_true = y_true[weights.nonzero()[:-1]]
        filtered_y_pred = y_pred[weights.nonzero()[:-1]]
        filtered_weights = weights[weights.nonzero()]
        obj_output = fn(filtered_y_true, filtered_y_pred)
        weighted = filtered_weights * obj_output
        if mask is None:
            # Instead of calling mean() here, we divide by the sum of filtered_weights.
            return weighted.sum() / filtered_weights.sum()
        else:
            filtered_mask = mask[weights.nonzero()[:-1]]
            return weighted.sum() / (filtered_mask * filtered_weights).sum()
    return weighted
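To make the "0 * Inf" comment concrete, here is a small plain-Python sketch of a weighted mean with and without the nonzero-weight filter. It is not the Keras code, just an illustration; the function names and sample values are invented.

```python
import math


def weighted_mean_unfiltered(losses, weights):
    # Multiplies every loss by its weight, including zero weights.
    total = sum(w * l for w, l in zip(weights, losses))
    return total / sum(weights)


def weighted_mean_filtered(losses, weights):
    # Drops zero-weight entries first, mirroring the weights.nonzero() filter.
    kept = [(w, l) for w, l in zip(weights, losses) if w != 0]
    total = sum(w * l for w, l in kept)
    return total / sum(w for w, _ in kept)


losses = [0.25, float("inf"), 0.75]   # one sample has an infinite loss
weights = [1.0, 0.0, 1.0]             # ...but that sample is weighted out

unfiltered = weighted_mean_unfiltered(losses, weights)
filtered = weighted_mean_filtered(losses, weights)
print(math.isnan(unfiltered), filtered)
```

Note that the denominator here is only zero when every weight is zero. In the reproduction above the weights are all ones, so the division by filtered_weights.sum() does not by itself explain the NaN.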
I don't see an obvious Theano problem here, but I can't be sure; if there is one, tell me. As you know, a division by 0 would generate a NaN, and that could be what is happening. I don't understand enough to tell why the NaN appears only in the dev version and not in the last release.
Yeah, I don't understand enough either. Can you replicate the error above? Nothing in the code obviously suggests that a division by 0 should exist when you're training a simple ReLU model.
NaN can appear in multiple circumstances, not just when dividing 0 by 0. Usually the cause is a numerically unstable computation. I had NaN problems with PReLU, but did not pinpoint the exact problem.
From Wikipedia on NaN (https://en.wikipedia.org/wiki/NaN#Operations_generating_NaN):
From Theano mailing list on NaN (https://groups.google.com/forum/#!topic/theano-users/UTn3hepy1sw):
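For reference, a few of the standard IEEE 754 operations that produce NaN, shown with plain Python floats. This is only an illustration of the general point, not a diagnosis of this particular issue.

```python
import math

inf = float("inf")

# Each of these is an indeterminate form under IEEE 754 and evaluates to NaN.
# (Python raises ZeroDivisionError for 0.0 / 0.0 instead of returning NaN,
# so that case is left out here.)
examples = {
    "inf - inf": inf - inf,
    "0 * inf": 0.0 * inf,
    "inf / inf": inf / inf,
}

for name, value in examples.items():
    print(name, "->", value, "| is NaN:", math.isnan(value))

# Once a NaN appears it propagates through later arithmetic,
# which is why a single bad gradient can poison the loss and all weights.
nan = float("nan")
propagated = nan * 0.0 + 1.0
print(math.isnan(propagated))
```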
A remark to anyone looking at this. (I'm trying to understand a possibly similar problem, and I got sidetracked onto this one because it's simpler than mine.)
It appears that using DebugMode in Theano causes ReLU to have a NaN gradient at 0.
Specifically, using current (dev) Theano on Linux, the following prints nan in DebugMode and 0.5 otherwise:
import theano
import theano.tensor
import theano.tensor.nnet

a = theano.tensor.fscalar("a")
b = theano.tensor.nnet.relu(a)
c = theano.grad(b, a)  # symbolic gradient of relu(a) with respect to a
f = theano.function([a], [c])
print(f(0.0))
I don't know how reliably relu has a proper (sub)gradient outside debug mode, so I don't know if this issue is related to that question.
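One possible mechanism, offered as a guess rather than a confirmed reading of Theano's graph: if ReLU is expressed as 0.5 * (x + |x|) and the gradient of |x| is computed literally as x / |x|, the gradient at 0 becomes 0/0. Rewriting x / |x| as sign(x) gives 0.5 at 0, which matches the non-debug result above. The sketch below is plain Python; ieee_div is a hypothetical helper that imitates IEEE 0/0 behaviour, since Python itself raises an exception there.

```python
import math


def ieee_div(num, den):
    # Hypothetical helper: imitate IEEE 754, where 0/0 is NaN.
    if num == 0.0 and den == 0.0:
        return float("nan")
    return num / den


def sgn(x):
    return (x > 0) - (x < 0)


def relu_grad_naive(x):
    # d/dx [0.5 * (x + |x|)] with d|x|/dx taken literally as x / |x|.
    return 0.5 * (1.0 + ieee_div(x, abs(x)))


def relu_grad_stable(x):
    # Same expression after rewriting x / |x| as sgn(x).
    return 0.5 * (1.0 + sgn(x))


for x in (-2.0, 0.0, 2.0):
    print(x, relu_grad_naive(x), relu_grad_stable(x))
```

The two versions agree everywhere except at exactly 0, which would fit the observation that only all-zero inputs trigger the problem.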
I am able to reproduce @bottler's error. The problem occurs when using optimizer=fast_compile but not when using optimizer=fast_run.
I also encountered the same bug. Is it a kind of "feature" of Theano? Probably better to report in Theano forums/issues.
Yes, this appears to be a Theano issue.
looks like it was already reported
I think this case is different. The current case arises because fast_compile disables the stability optimizations, which are needed here for the computation to work. To get fast_compile plus just the stability optimizations, use optimizer=stabilize.
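One way to set that flag from a script is through the THEANO_FLAGS environment variable, which has to be set before Theano is imported. A minimal sketch follows; add_theano_flag is a made-up helper name, not part of Theano, and the optimizer value is the one suggested in this comment.

```python
import os


def add_theano_flag(key, value, env=os.environ):
    """Set key=value in THEANO_FLAGS, replacing any existing entry for key.

    Hypothetical helper for illustration; other flags already present
    (for example floatX) are preserved.
    """
    existing = [f for f in env.get("THEANO_FLAGS", "").split(",") if f]
    kept = [f for f in existing if f.split("=", 1)[0].strip() != key]
    kept.append("%s=%s" % (key, value))
    env["THEANO_FLAGS"] = ",".join(kept)
    return env["THEANO_FLAGS"]


# Must run before `import theano`: the flags are read when Theano is imported.
flags = add_theano_flag("optimizer", "stabilize")
print(flags)
```

The same setting can also be passed on the command line as an environment variable when launching the script.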
I opened an issue about this, since it gets reported frequently enough that the behavior is worth rethinking: #4442
If you update Theano to the dev version, it should work when optimizer=fast_compile or mode=FAST_COMPILE:
http://www.deeplearning.net/software/theano/install.html#bleeding-edge-install-instructions
@gw0 I'm hitting the PReLU NaN error. Could you share your solution for this case?
@Lzc6996 Unfortunately, I did not solve it. Restarting the training a couple of times, changing the weight initialization, and lowering the learning rate helped.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.