Keras: Problem with batch normalization layer

Created on 13 Jun 2017 · 13 comments · Source: keras-team/keras

I am trying to use batch normalization, but for some reason, even for the simplest network, when I run model.fit (even for one epoch) the loss is NaN and naturally no learning is performed.
For example, I use a simple model like this:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(16, 16, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

If I remove the batch normalization, everything works great.
I am using Keras 2.0.4, Theano 0.9.0, and CUDA 7. I tried removing cuDNN and got the same results.
I also tried a different axis (axis=1) when calling BN (although that should not be right) and got the same result.
What am I doing wrong?
Thank you!

stale

Most helpful comment

Good catch, don't use binary_crossentropy with a categorical class encoding unless you actually have multiple labels per sample.

All 13 comments

Try BN with a range of different parameters, in particular for epsilon. Also try to see what happens for your model on CPU.
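
A minimal sketch of that kind of test, assuming the Theano backend; the flag values, the larger epsilon, and the script name train_bn_test.py are illustrative placeholders, not code from the thread:

# Run the same script once on the CPU (shell):
#   THEANO_FLAGS=device=cpu,floatX=float32 python train_bn_test.py

# And sweep the BN hyperparameters, e.g. a larger epsilon so the
# normalizing denominator stays well away from zero:
from keras.layers import BatchNormalization

bn = BatchNormalization(axis=-1, momentum=0.99, epsilon=1e-2)  # default epsilon is 1e-3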

When running in CPU mode, everything is OK (but not practical...).
Changing the parameters didn't help (what parameters are there other than momentum and epsilon?).
Note that when using model.predict (before the first fit), I receive a valid output (not NaN).
I used this code:
import numpy

from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(8, 8, 3)))
model.add(BatchNormalization(axis=-1, epsilon=0.02, momentum=0.97))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

xTrain = numpy.random.randn(100, 8, 8, 3)
yTrain = numpy.random.randint(0, 2, size=(100, 2))

model.fit(xTrain, yTrain)

Sounds like a problem with your CUDA/cuDNN install.

Thank you!
Other than that, everything was swell...
I use CUDA 7:
Using gpu device 0: GeForce GTX TITAN (CNMeM is disabled, cuDNN 4007)
This happens if I remove cuDNN as well.
I am using an old Ubuntu, 12.04.

Also, do you only have two categories that are mutually exclusive? You should encode them as 0s and 1s, and I think your last layer should be:

model.add(Dense(1, activation='sigmoid'))

Good catch, don't use binary_crossentropy with a categorical class encoding unless you actually have multiple labels per sample.
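
To make the distinction concrete, here is a minimal sketch (an assumed example, not code from the thread) of the two consistent setups for a two-class problem: a single sigmoid unit with 0/1 labels and binary_crossentropy, or a two-unit softmax with one-hot labels and categorical_crossentropy.

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# Option 1: single sigmoid output, integer 0/1 labels, binary_crossentropy
model_a = Sequential()
model_a.add(Dense(1, activation='sigmoid', input_shape=(10,)))
model_a.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
y_binary = numpy.random.randint(0, 2, size=(100, 1))

# Option 2: two softmax outputs, one-hot labels, categorical_crossentropy
model_b = Sequential()
model_b.add(Dense(2, activation='softmax', input_shape=(10,)))
model_b.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
y_onehot = to_categorical(numpy.random.randint(0, 2, size=(100,)), num_classes=2)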

I wasn't aware that this was not allowed... I have been using it quite a lot and had no problems so far..
I will check and update

It's not a Keras issue; it's an issue of understanding what you're doing.

I changed the code to this, but got the same results.
result:
Epoch 1/1
100/100 [==============================] - 0s - loss: nan - acc: 0.0000e+00

code:
import numpy

from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(8, 8, 3)))
model.add(BatchNormalization(axis=-1, epsilon=0.02, momentum=0.97))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

xTrain = numpy.random.randn(100, 8, 8, 3)
yTrain = numpy.random.randint(0, 2, size=(100, 1))

model.fit(xTrain, yTrain)
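
One way to narrow down where the NaN first appears (a sketch, reusing model and xTrain from the code above; the helper find_first_nan_layer is hypothetical, not from the thread) is to probe each layer's output with a backend function and check whether the NaN already shows up after the BatchNormalization layer or only in the loss:

import numpy
from keras import backend as K

def find_first_nan_layer(model, x_batch):
    # Evaluate each layer's output on one batch and report the first layer that produces NaNs
    for layer in model.layers:
        probe = K.function([model.input, K.learning_phase()], [layer.output])
        out = probe([x_batch, 1])[0]  # 1 = training phase, so BN uses batch statistics
        if numpy.isnan(out).any():
            return layer.name
    return None

print(find_first_nan_layer(model, xTrain[:10]))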

I have the same NaN problem with batch normalization. Did you solve it?

No. I can only say that on a different computer the same code trained with no problem. It is probably something to do with an old CUDA / Ubuntu version.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

Did anyone find a solution for this problem?

