Hi,
I am attempting to train a MobileNetV2 classification model using fp16. I am using the latest GitHub version of Keras, which contains the fp16 batch normalization fix. The model compiles, builds, and trains, but the accuracy is always stuck at 25% even though the loss decreases steadily.
When I disable K.set_floatx('float16') and train in fp32 instead, the model works fine and converges. Any idea why fp16 isn't working?
Do the following:
K.set_epsilon(1e-4)
If that still doesn't work, try training the model with float32 for a few epochs so it stabilizes, then continue the training with float16.
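For concreteness, here is a minimal sketch of what those two suggestions look like together (this assumes the TensorFlow-backed Keras backend; the model-building and training code is whatever you already have):
# Sketch: raise the backend epsilon and switch the default float type
# *before* the model is built, so every layer is created in float16.
from keras import backend as K

K.set_floatx('float16')
K.set_epsilon(1e-4)   # the default 1e-7 is too small to be meaningful in float16

# build / compile / train the model as usual from here;
# for the warm-start variant, train a float32 copy for a few epochs first,
# save its weights, then rebuild the model under float16 and load them.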
I tried setting K.set_epsilon(1e-4) and that did not work.
I also tried your second option, but I am having a tough time verifying whether it actually works. I included an example below. I am running this on two 2080 Tis, which have fp16 support.
When I run this code everything seems to work and the accuracy goes up, but two indicators make me think fp16 isn't actually being used. First, if I don't change the batch size, the time per epoch is exactly the same, and in theory fp16 should be faster than fp32. Second, if I change the batch size to 220 inside the loop I get memory errors, but if I run training from the beginning with fp16 enabled and a batch size of 220, I do not get memory errors.
batch_size = 180
floatPrecision = np.float32
for i, d in enumerate(split_data):
    generator = DataLoader(d, batch_size=batch_size, floatPrecision=floatPrecision)
    model.fit_generator(generator, steps_per_epoch=len(generator), epochs=1, verbose=1, workers=6)
    # after the first chunk, switch the backend to float16 for the rest of training
    if i == 1:
        K.set_floatx('float16')
        K.set_epsilon(1e-4)
        floatPrecision = np.float16
        batch_size = 220
You have to train for a few epochs first, and then "restart" the code and load the weights. Your model was already defined with float32, and calling K.set_floatx afterwards won't change that: after setting it to float16 you have to re-build the model and then load the weights. Also, set use_multiprocessing=True in fit_generator to utilize your cores:
batch_size = 180
floatPrecision = np.float32
model = get_model()
for i, d in enumerate(split_data):
    generator = DataLoader(d, batch_size=batch_size, floatPrecision=floatPrecision)
    model.fit_generator(generator, steps_per_epoch=len(generator), epochs=1, verbose=1,
                        workers=6, use_multiprocessing=True)
    if i < 1:
        model.save_weights('epoch0.h5')
    if i == 1:
        K.set_floatx('float16')
        K.set_epsilon(1e-4)
        floatPrecision = np.float16
        batch_size = 220
        model = get_model()
        model.load_weights('epoch0.h5')
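Since epoch timings alone don't tell you much, a quick sanity check (just a sketch; `model` is the rebuilt model from the snippet above) is to look at the backend float type and the dtype of the model's weights after the rebuild:
from keras import backend as K

print(K.floatx())                    # should be 'float16' after the switch
print(model.get_weights()[0].dtype)  # the rebuilt model's weights should be float16, not float32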
I tried your suggestion and it still did not work. I have 1.5 million images and a handful of classes. After a couple of iterations the model gets to 75% accuracy. I then restart the Python script with fp16, load the model weights, and train. I've verified that the procedure itself works by repeating it with float32: training picks up around 75% accuracy and keeps improving.
But when I switch to float16, load the weights, and begin training, the accuracy immediately drops to 25% and stays there. Do you have any examples of fp16 training that work? I honestly don't think fp16 training works in Keras.
Here you go, buddy:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
K.set_floatx('float16')
K.set_epsilon(1e-4)
batch_size = 128
num_classes = 10
epochs = 12
# input image dimensions
img_rows, img_cols = 28, 28
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float16')
x_test = x_test.astype('float16')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
60000/60000 [==============================] - 7s 118us/step - loss: 0.4379 - acc: 0.8641 - val_loss: 0.0706 - val_acc: 0.9775
Epoch 2/12
60000/60000 [==============================] - 4s 70us/step - loss: 0.1308 - acc: 0.9611 - val_loss: 0.0593 - val_acc: 0.9822
Epoch 3/12
60000/60000 [==============================] - 4s 72us/step - loss: 0.1030 - acc: 0.9697 - val_loss: 0.0595 - val_acc: 0.9820
Epoch 4/12
60000/60000 [==============================] - 4s 70us/step - loss: 0.0888 - acc: 0.9733 - val_loss: 0.0464 - val_acc: 0.9851
Epoch 5/12
60000/60000 [==============================] - 4s 70us/step - loss: 0.0781 - acc: 0.9763 - val_loss: 0.0402 - val_acc: 0.9875
Epoch 6/12
60000/60000 [==============================] - 4s 68us/step - loss: 0.0677 - acc: 0.9798 - val_loss: 0.0348 - val_acc: 0.9886
Epoch 7/12
60000/60000 [==============================] - 4s 69us/step - loss: 0.0612 - acc: 0.9810 - val_loss: 0.0346 - val_acc: 0.9895
Epoch 8/12
60000/60000 [==============================] - 4s 71us/step - loss: 0.0591 - acc: 0.9826 - val_loss: 0.0372 - val_acc: 0.9885
Epoch 9/12
60000/60000 [==============================] - 4s 72us/step - loss: 0.0568 - acc: 0.9835 - val_loss: 0.0433 - val_acc: 0.9874
Epoch 10/12
60000/60000 [==============================] - 4s 71us/step - loss: 0.0509 - acc: 0.9843 - val_loss: 0.0347 - val_acc: 0.9897
Epoch 11/12
60000/60000 [==============================] - 4s 70us/step - loss: 0.0519 - acc: 0.9846 - val_loss: 0.0355 - val_acc: 0.9892
Epoch 12/12
60000/60000 [==============================] - 4s 70us/step - loss: 0.0476 - acc: 0.9857 - val_loss: 0.0389 - val_acc: 0.9884
Test loss: 0.038860102272033695
Test accuracy: 0.9884
Thanks for the example. The issue was the network itself: MobileNetV2 just didn't want to train in fp16 with my data. I went back and built a custom, shallower network based on MobileNet, and that model now trains correctly in fp16.
Thanks for the help!
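For anyone reading later: the thread doesn't include the final architecture, but a shallow MobileNet-style classifier of the kind described (a few depthwise-separable blocks trained under float16) might look roughly like the sketch below; the class count and input size are made up for illustration.
from keras import backend as K
from keras.models import Sequential
from keras.layers import (Conv2D, SeparableConv2D, BatchNormalization,
                          Activation, GlobalAveragePooling2D, Dense)

# set float16 defaults before any layer is created
K.set_floatx('float16')
K.set_epsilon(1e-4)

num_classes = 4              # hypothetical: "a handful of classes"
input_shape = (224, 224, 3)  # hypothetical input size

model = Sequential()
# regular convolution stem
model.add(Conv2D(32, (3, 3), strides=2, padding='same', input_shape=input_shape))
model.add(BatchNormalization())
model.add(Activation('relu'))
# a few depthwise-separable blocks, the basic MobileNet building block
for filters in (64, 128, 256):
    model.add(SeparableConv2D(filters, (3, 3), strides=2, padding='same'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
model.add(GlobalAveragePooling2D())
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])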
Closing as it seems to be resolved.
@Golbstein Is there are reason you're dividing by 255 in the example above?
@lminer It's to normalize the image data: the raw MNIST pixels are integers in 0-255, so dividing by 255 scales them to the range 0.0-1.0.