Hi
While training the model with train.py using float16 weights (by setting keras.backend.set_floatx('float16') in train.py),
I get the following error:
TypeError: Input 'boxes' of 'NonMaxSuppressionV2' Op has type float16 that does not match expected type of float32.
Any suggestions on how to train with lower precision?
Currently this is not supported. However, pull requests to support float16 would be very welcome :)
Should you be interested in adding support, the slack channel would be a good place to chat with some of the developers working on keras-retinanet. Alternatively, discussing things here would also be possible of course.
@de-vri-es thanks for your response. I was hoping to deploy an object detector using float16 precision on an IoT device.
Any suggestions would be appreciated.
It looks like what would be needed is to add support for float16 precision in the layers that currently don't have it.
NonMaxSuppressionV2 is from tensorflow though, so it would have to be patched there:
https://github.com/tensorflow/tensorflow/blob/9d2abd2ace95e6e352ba1292cc38c77b7bd1adc7/tensorflow/core/ops/image_ops.cc#L653
@vcarpani I think this will be fixed with your PR in Keras and https://github.com/fizyr/keras-retinanet/pull/786 right?
It should fix the error, but that does not mean that the FP16 training will converge.
This paper explains why.
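A small illustration of one well-known difficulty with pure FP16 training, offered as an assumption about the kind of problem meant here and not as a summary of the paper: very small gradient values round to zero in half precision, and scaling them up before the cast (loss scaling) keeps them representable. The gradient magnitude and the scale factor below are hypothetical values chosen for the demonstration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


tiny_gradient = 1e-8   # hypothetical gradient magnitude
loss_scale = 1024.0    # hypothetical loss-scaling factor

# Cast directly: the value is below the smallest FP16 subnormal and rounds to 0.0.
underflowed = to_fp16(tiny_gradient)

# Scale first, then cast: the value survives as a nonzero FP16 number.
scaled = to_fp16(tiny_gradient * loss_scale)

# Undo the scaling in higher precision to recover an approximation of the gradient.
recovered = scaled / loss_scale

print(underflowed, scaled, recovered)
```

This only shows the numeric effect; it says nothing about whether RetinaNet in particular converges with or without loss scaling.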
The pull request is now merged. However, for RetinaNet I suggest training with FP32 and then converting to FP16.
Closing this, because of what @vcarpani said.
@vcarpani By your suggestion, do you mean that we train in fp32 and convert the model to fp16 for inference?
Yes, we did not try to train with mixed precision, so training with fp32 and then converting to fp16 is the only viable solution at the moment. If you want to try training with mixed precision, keep us updated on the results.
How can I convert a model trained in fp32 to fp16?
What about results? What metrics do you have?
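On the conversion question, a minimal sketch of the weight-casting step, assuming the weights are available as a list of NumPy arrays such as the one Keras returns from model.get_weights(). The helper names and the random stand-in weights are hypothetical, and this is not the procedure used by the keras-retinanet developers.

```python
import numpy as np


def cast_weights_to_fp16(weights):
    """Cast a list of float32 arrays (e.g. from model.get_weights()) to float16."""
    return [np.asarray(w, dtype=np.float32).astype(np.float16) for w in weights]


def max_abs_cast_error(weights, weights_fp16):
    """Largest absolute difference introduced by the cast, over all arrays."""
    return max(
        float(np.max(np.abs(np.asarray(w, dtype=np.float32) - h.astype(np.float32))))
        for w, h in zip(weights, weights_fp16)
    )


# Hypothetical stand-in for model.get_weights(): one conv kernel and one bias.
rng = np.random.default_rng(0)
fp32_weights = [
    rng.normal(0.0, 0.05, size=(3, 3, 16, 32)).astype(np.float32),
    np.zeros(32, dtype=np.float32),
]

fp16_weights = cast_weights_to_fp16(fp32_weights)
cast_error = max_abs_cast_error(fp32_weights, fp16_weights)
print([w.dtype for w in fp16_weights], cast_error)
```

In a Keras workflow the cast arrays would then be loaded with model.set_weights(...) into a model rebuilt after keras.backend.set_floatx('float16'). Whether every layer accepts float16 depends on the Keras and TensorFlow versions in use, and the detection metrics should be re-evaluated after conversion.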
Is there any plan to update the code to TF 2.0 and use tf.keras instead of keras? Keras will stop being upgraded after TF 2.0. Mixed precision could then be enabled using tf.keras.mixed_precision.experimental.set_policy("default_mixed").
@dominicshanshan there is an experimental implementation of this algorithm using tensorflow here.
@hgaiser, I ran an experiment comparing the RetinaNet training speed of keras and pytorch: throughput was 9-10x higher with pytorch, with the same precision (using ResNet50 as backbone).
The pytorch run had mixed precision enabled, of course, and resizing, which may also contribute to this insane speedup.
Here is the pytorch code: https://github.com/NVIDIA/retinanet-examples
Test machine: NVIDIA DGX-1, 8 Tesla V100 16G GPUs.
I can send you the detailed comparison results if you are interested, just let me know. Thanks for the effort!!
Yeah I would be interested! My email is in my profile. I've also been looking into pytorch, I see some interesting benefits.
I have another little thought: I will try to ditch the keras multi-GPU distribution in this script and use Horovod instead. I am not sure whether your team has already tried this idea. If so, could you let me know whether Horovod is better for speeding up training?
I have heard people working with it before, but haven't tried it. It can't be much worse than the keras implementation ;p
Here is a nice article about keras-retinanet on Horovod:
https://docs.microsoft.com/en-us/archive/blogs/machinelearning/how-to-do-distributed-deep-learning-for-object-detection-using-horovod-on-azure
@vcarpani, I will do this experiment. A 7.2x speedup is not bad, close to linear scaling.

According to my experiment analysis, with the keras distribution system 8 GPUs vs 1 GPU gives only a 2x speedup, far from linear scaling.
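To put "far from linear scaling" into a number, here is a small sketch that computes scaling efficiency as the measured speedup divided by the number of GPUs. The function name is hypothetical; the inputs are the 2x on 8 GPUs figure reported above.

```python
def scaling_efficiency(speedup, num_gpus):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


# Figures from the keras multi-GPU run above: 2x speedup on 8 GPUs.
keras_efficiency = scaling_efficiency(2.0, 8)
print(f"{keras_efficiency:.0%} of linear scaling")
```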
What about Horovod ?

@hgaiser, I am still trying to figure out how Microsoft gets a 7x speedup. Well, Horovod is better than Keras when I can put a larger batch on each GPU. I will send you the detailed model config and Excel results via email.

@hgaiser, by the way, pytorch is far faster than Keras or Keras with Horovod. I reckon this is mainly because it switched to FP16 precision and has less communication within each node. Or the pytorch distribution algorithm is simply more efficient than the Keras one.
Wow that's a huge difference!