Keras-retinanet: Error while training with float16 precision

Created on 19 Jun 2018 · 23 Comments · Source: fizyr/keras-retinanet

When training the model with float16 weights (by setting keras.backend.set_floatx('float16') in train.py), I get the following error:

TypeError: Input 'boxes' of 'NonMaxSuppressionV2' Op has type float16 that does not match expected type of float32.

Any suggestions on how to train with lower precision?

raghavgurbaxani

All 23 comments

Currently this is not supported. However, pull requests to support float16 would be very welcome :)

Should you be interested in adding support, the slack channel would be a good place to chat with some of the developers working on keras-retinanet. Alternatively, discussing things here would also be possible of course.

de-vri-es on 19 Jun 2018

@de-vri-es thanks for your response. I was hoping to deploy an object detector using float16 precision on an IoT device.

Any suggestions would be appreciated.

raghavgurbaxani on 19 Jun 2018

It looks like what would be needed is to add support for float16 precision in the layers that currently don't have it.

NonMaxSuppressionV2 is from tensorflow though, so it would have to be patched there:
https://github.com/tensorflow/tensorflow/blob/9d2abd2ace95e6e352ba1292cc38c77b7bd1adc7/tensorflow/core/ops/image_ops.cc#L653

de-vri-es on 20 Jun 2018

@vcarpani I think this will be fixed with your PR in Keras and https://github.com/fizyr/keras-retinanet/pull/786 right?

hgaiser on 4 Dec 2018

It should fix the error, but that does not mean that the FP16 training will converge.
This paper explains why.

vcarpani on 4 Dec 2018
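The linked paper is not reproduced here, so the following is only a sketch of one commonly cited reason pure FP16 training can fail to converge: small gradients and small weight updates fall below what half precision can represent. It uses only the Python standard library (`struct` format code `e` is IEEE 754 half precision). The gradient value and the `loss_scale` factor are hypothetical numbers chosen for illustration, not settings from keras-retinanet.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# 1) A small gradient is below the smallest FP16 subnormal (~6e-8),
#    so it is rounded to zero when stored in half precision.
grad = 1e-8  # hypothetical gradient magnitude
grad_fp16 = to_fp16(grad)

# 2) Loss scaling: multiply before the FP16 cast, divide afterwards in
#    higher precision. The scaled value is representable, so it survives.
loss_scale = 1024.0  # hypothetical scale factor
recovered = to_fp16(grad * loss_scale) / loss_scale

# 3) A small update added to an FP16 weight can vanish, because FP16
#    values near 1.0 are spaced roughly 1e-3 apart.
weight_fp16 = to_fp16(to_fp16(1.0) + to_fp16(1e-4))

print("gradient stored in FP16:  ", grad_fp16)
print("gradient with loss scaling:", recovered)
print("1.0 + 1e-4 in FP16:       ", weight_fp16)
```

Mixed-precision schemes typically address points 1 and 3 by scaling the loss and by keeping a higher-precision copy of the weights for the update step.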

The pull request is now merged. However, for RetinaNet I suggest training with FP32 and then converting to FP16.

vcarpani on 7 Dec 2018

Closing this, because of what @vcarpani said.

hgaiser on 10 Jan 2019

@vcarpani By your suggestion, do you mean that we train in fp32 and then convert the model to fp16 for inference?

tonmoyborah on 15 May 2019

Yes, we did not try to train with mixed precision, so training with fp32 and then converting to fp16 is the only viable solution at the moment. If you want to try training with mixed precision, keep us updated on the results.

vcarpani on 15 May 2019
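As a rough illustration of the "train in fp32, then convert to fp16" idea above, here is a minimal sketch that casts a set of weights to half precision and measures the rounding error. It is plain Python with no Keras dependency; the layer names and values are made up, and `cast_weights_to_fp16` and `load_fp16` are hypothetical helpers, not part of keras-retinanet. With a real model one would presumably cast the arrays returned by `model.get_weights()` and load them into a model built with float16 as the float type, but that path is not verified here.

```python
import struct


def cast_weights_to_fp16(weights):
    """Pack each list of weights as little-endian FP16 bytes.

    struct raises OverflowError for values outside the FP16 range
    (magnitude above 65504), so such weights need handling beforehand.
    """
    return {
        name: struct.pack(f"<{len(values)}e", *values)
        for name, values in weights.items()
    }


def load_fp16(blob):
    """Unpack FP16 bytes back into a list of Python floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


# Hypothetical weights standing in for a trained fp32 model.
fp32_weights = {
    "conv1/kernel": [0.125, -0.3371, 0.00042, 1.5],
    "conv1/bias": [0.01, -0.02],
}

fp16_blobs = cast_weights_to_fp16(fp32_weights)

# Largest absolute rounding error introduced by the cast.
max_error = max(
    abs(original - restored)
    for name, values in fp32_weights.items()
    for original, restored in zip(values, load_fp16(fp16_blobs[name]))
)

total_bytes = sum(len(blob) for blob in fp16_blobs.values())
print("bytes in FP16:", total_bytes, "max abs error:", max_error)
```

Each value takes 2 bytes instead of the 4 used by fp32. Whether the resulting rounding error is acceptable has to be checked by evaluating the converted model on a validation set.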

How can I convert a model trained in fp32 to fp16?
What about the results? What metrics do you have?

gosha20777 on 13 Aug 2019

Is there any plan to update the code to TF 2.0 and use tf.keras instead of keras, since keras will stop being upgraded after TF 2.0? Mixed precision could then be enabled using tf.keras.mixed_precision.experimental.set_policy("default_mixed").

dominicshanshan on 3 Feb 2020

@dominicshanshan there is an experimental implementation of this algorithm using tensorflow here.

hgaiser on 3 Feb 2020

@hgaiser, I ran an experiment comparing RetinaNet training speed in Keras and PyTorch. Throughput was 9-10x higher in PyTorch, with the same precision (using ResNet50 as the backbone).

The PyTorch run had mixed precision enabled, of course, and also used resizing, which may account for part of this insane speedup.

Here is the PyTorch code: https://github.com/NVIDIA/retinanet-examples

Test machine: NVIDIA DGX-1 with 8 Tesla V100 16 GB GPUs.

I can send you the detailed comparison results if you are interested; just let me know. Thanks for the effort!

dominicshanshan on 3 Feb 2020

> I can send you the detailed comparison results if you are interested; just let me know. Thanks for the effort!

Yeah, I would be interested! My email is in my profile. I've also been looking into PyTorch; I see some interesting benefits.

hgaiser on 3 Feb 2020

I have another idea: I will try to drop the Keras multi-GPU distribution in this script and use Horovod instead. I am not sure whether your team has already tried this. If so, could you let me know whether Horovod is better at speeding up training?

dominicshanshan on 3 Feb 2020

I have heard of people working with it before, but I haven't tried it myself. It can't be much worse than the Keras implementation ;p

hgaiser on 3 Feb 2020

Here is a nice article about keras-retinanet on Horovod:
https://docs.microsoft.com/en-us/archive/blogs/machinelearning/how-to-do-distributed-deep-learning-for-object-detection-using-horovod-on-azure

vcarpani on 5 Feb 2020

@vcarpani, I will run this experiment. A 7.2x speedup is not bad; that is close to linear scaling.

dominicshanshan on 5 Feb 2020

According to my experiments, with the Keras distribution system 8 GPUs are only about 2x faster than 1 GPU, which is far from linear scaling.

dominicshanshan on 5 Feb 2020
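To put the two speedup figures from the comments above side by side, parallel efficiency can be computed as speedup divided by GPU count. This is just a small arithmetic sketch: `scaling_efficiency` is a hypothetical helper, and the 7.2x figure is assumed here to refer to 8 GPUs, which the thread does not state explicitly.

```python
def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Return speedup / num_gpus; 1.0 means perfectly linear scaling."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return speedup / num_gpus


# Keras multi-GPU figure reported in the thread: about 2x on 8 GPUs.
keras_efficiency = scaling_efficiency(2.0, 8)

# Horovod figure quoted from the article: 7.2x (8 GPUs is an assumption).
horovod_efficiency = scaling_efficiency(7.2, 8)

print(f"Keras multi-GPU: {keras_efficiency:.0%}")
print(f"Horovod article: {horovod_efficiency:.0%}")
```

Under that assumption the Keras setup uses about a quarter of the available hardware, while the Horovod result is close to linear.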

What about Horovod?

hgaiser on 10 Feb 2020

@hgaiser, I am still trying to figure out how Microsoft achieved a 7x speedup. That said, Horovod is better than Keras when I can fit a larger batch per GPU. I will send you the detailed model config and the Excel results via email.

dominicshanshan on 10 Feb 2020

@hgaiser, by the way, PyTorch is far faster than Keras or Keras with Horovod. I reckon this is mainly because it switched to FP16 precision and has less communication within each node. Or the PyTorch distribution algorithm is simply more efficient than the Keras one.

dominicshanshan on 10 Feb 2020

> @hgaiser, by the way, PyTorch is far faster than Keras or Keras with Horovod. I reckon this is mainly because it switched to FP16 precision and has less communication within each node. Or the PyTorch distribution algorithm is simply more efficient than the Keras one.

Wow, that's a huge difference!

hgaiser on 10 Feb 2020
