Keras: Problem with batch normalization layer

Created on 13 Jun 2017 · 13 comments · Source: keras-team/keras

I am trying to use batch normalization, but for some reason, even for the simplest network, when I run model.fit (even for one epoch) the loss is NaN and naturally no learning is performed.
For example, I use a simple model like this:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(16, 16, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

If I remove the batch normalization, everything works great.
I am using Keras 2.0.4, Theano 0.9.0, and CUDA 7. I tried removing cuDNN and got the same results.
I also tried a different axis (axis=1) when calling BN (although that should not be right) and got the same result.
What am I doing wrong?
Thank you!

stale

Most helpful comment

Good catch, don't use binary_crossentropy with a categorical class encoding unless you actually have multiple labels per sample.

All 13 comments

Try BN with a range of different parameters, in particular for epsilon. Also try to see what happens for your model on CPU.
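
A minimal sketch of that kind of test, assuming the Theano backend; the flag values, the larger epsilon, and the script name train_bn_test.py are illustrative placeholders, not code from the thread:

# Run the same script once on the CPU (shell):
#   THEANO_FLAGS=device=cpu,floatX=float32 python train_bn_test.py

# And sweep the BN hyperparameters, e.g. a larger epsilon so the
# normalizing denominator stays well away from zero:
from keras.layers import BatchNormalization

bn = BatchNormalization(axis=-1, momentum=0.99, epsilon=1e-2)  # default epsilon is 1e-3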

When running in CPU mode, everything is OK (but not practical...).
Changing the parameters didn't help (what parameters are there other than momentum and epsilon?).
Note that when using model.predict (before the first fit), I receive a valid output (not NaN).
I used this code:
import numpy

from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(8, 8, 3)))
model.add(BatchNormalization(axis=-1, epsilon=0.02, momentum=0.97))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

xTrain = numpy.random.randn(100, 8, 8, 3)
yTrain = numpy.random.randint(0, 2, size=(100, 2))

model.fit(xTrain, yTrain)

Sounds like a problem with your CUDA/cuDNN install.

Thank you!
Other than that, everything was swell...
I use CUDA 7:
Using gpu device 0: GeForce GTX TITAN (CNMeM is disabled, cuDNN 4007)
This happens if I remove cuDNN as well.
I am using an old Ubuntu, 12.04.

Also, do you only have two categories that are mutually exclusive? You should encode them as 0s and 1s, and I think your last layer should be:

model.add(Dense(1, activation='sigmoid'))

Good catch, don't use binary_crossentropy with a categorical class encoding unless you actually have multiple labels per sample.
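
To make the distinction concrete, here is a minimal sketch (an assumed example, not code from the thread) of the two consistent setups for a two-class problem: a single sigmoid unit with 0/1 labels and binary_crossentropy, or a two-unit softmax with one-hot labels and categorical_crossentropy.

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# Option 1: single sigmoid output, integer 0/1 labels, binary_crossentropy
model_a = Sequential()
model_a.add(Dense(1, activation='sigmoid', input_shape=(10,)))
model_a.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
y_binary = numpy.random.randint(0, 2, size=(100, 1))

# Option 2: two softmax outputs, one-hot labels, categorical_crossentropy
model_b = Sequential()
model_b.add(Dense(2, activation='softmax', input_shape=(10,)))
model_b.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
y_onehot = to_categorical(numpy.random.randint(0, 2, size=(100,)), num_classes=2)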

I wasn't aware that this was not allowed... I have been using it quite a lot and had no problems so far..
I will check and update

It's not a Keras issue; it's an issue of understanding what you're doing.

I changed the code to this, but got the same results.
result:
Epoch 1/1
100/100 [==============================] - 0s - loss: nan - acc: 0.0000e+00

code:
import numpy

from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(8, 8, 3)))
model.add(BatchNormalization(axis=-1, epsilon=0.02, momentum=0.97))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

xTrain = numpy.random.randn(100, 8, 8, 3)
yTrain = numpy.random.randint(0, 2, size=(100, 1))

model.fit(xTrain, yTrain)
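
One way to narrow down where the NaN first appears (a sketch, reusing model and xTrain from the code above; the helper find_first_nan_layer is hypothetical, not from the thread) is to probe each layer's output with a backend function and check whether the NaN already shows up after the BatchNormalization layer or only in the loss:

import numpy
from keras import backend as K

def find_first_nan_layer(model, x_batch):
    # Evaluate each layer's output on one batch and report the first layer that produces NaNs
    for layer in model.layers:
        probe = K.function([model.input, K.learning_phase()], [layer.output])
        out = probe([x_batch, 1])[0]  # 1 = training phase, so BN uses batch statistics
        if numpy.isnan(out).any():
            return layer.name
    return None

print(find_first_nan_layer(model, xTrain[:10]))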

I have the same NaN problem with batch normalization. Did you solve it?

No. I can only say that on a different computer the same code trained with no problem. It is probably something to do with an old CUDA / Ubuntu version.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

Did anyone find a solution for this problem?

