Models: fp16 support in the Object Detection API [Feature request]

Created on 22 Mar 2018 · 19 Comments · Source: tensorflow/models

Feature request: fp16/mixed precision support for training

Is fp16/mixed precision support on the roadmap for training networks using the Object Detection API?
If not, do you see any issues that need to be resolved? It seems like you would either need to have two sets of pretrained models, or some automatic conversion between them.

System information

What is the top-level directory of the model you are using: N/A
Have I written custom code (as opposed to using a stock example script provided in TensorFlow): N/A
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): N/A
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 1.5.0
Bazel version (if compiling from source): N/A
CUDA/cuDNN version: CUDA 9.1/cuDNN v7.1.3
GPU model and memory: Nvidia Titan V, 12GB
Exact command to reproduce: N/A

research feature

eilifsolberg

👍7

All 19 comments

Thank you for your post. We noticed you have not filled out the following field in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

tensorflowbutler on 25 Apr 2018

👎3

@tensorflowbutler this is a feature request
@eilifsolberg hey I agree with this, can you edit the title and add [Feature Request]

austinmw on 25 Jun 2018

How is this progressing? I know that the official 2.0 code has FP16 support, but what about the old code? Does anyone know how to implement it? I tried but failed... @austinmw @eilifsolberg @tensorflowbutler

gzchenjiajun on 22 Oct 2019

I think there are now two solutions for this (for tensorflow >= 1.14):

If you use NVIDIA NGC containers, you should be able to just set the environment variable TF_ENABLE_AUTO_MIXED_PRECISION to '1', e.g. by

os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'

Otherwise you need to wrap the optimizers in optimizer_builder.py like

optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)

See https://developer.nvidia.com/automatic-mixed-precision for more.

eilifsolberg on 22 Oct 2019

I modified this file yesterday and also set the environment variable os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1' as required.

/research/object_detection/builders/optimizer_builder.py
  if optimizer_type == 'momentum_optimizer':
    config = optimizer_config.momentum_optimizer
    learning_rate = _create_learning_rate(config.learning_rate)
    summary_vars.append(learning_rate)
    optimizer = tf.train.MomentumOptimizer(
        learning_rate,
        momentum=config.momentum_optimizer_value)
    optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)

But it didn't work, and I don't know why. The pretrained model config is ssd_resnet101_v1_fpn_shared_box_predictor_oid_512x512_sync.config

I think it might be caused by one of two problems:
The first is that the environment variable is set in the wrong place.
The second is that this code is wrong (I suspect the underlying code does not support this step yet).

@eilifsolberg

gzchenjiajun on 23 Oct 2019

I am training on a GPU. I have upgraded tensorflow-gpu to 1.14, but tensorflow is still at 1.13. Is this related?

@eilifsolberg

gzchenjiajun on 23 Oct 2019

Do you use NVIDIA NGC containers >= 19.03? The environment variable only works in that case, and it needs to be set before model_builder.py is called. You can do this by setting the environment variable in the shell before the script is run.
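The shell-level setting described above can also be sketched as a small Python launcher that starts the training script in a child process with the variable already in its environment. This is only an illustration: `build_training_env` and `launch` are hypothetical helper names, and the training command itself is not given in this thread, so it is left as a placeholder.

```python
import os
import subprocess

AMP_ENV_VAR = "TF_ENABLE_AUTO_MIXED_PRECISION"


def build_training_env(base_env=None):
    """Return a copy of the environment with the AMP variable set to '1'."""
    env = dict(os.environ if base_env is None else base_env)
    env[AMP_ENV_VAR] = "1"
    return env


def launch(command, base_env=None):
    """Run `command` in a child process that sees the variable from startup."""
    return subprocess.run(command, env=build_training_env(base_env), check=True)


# Usage (placeholder command, replace with your own training invocation):
# launch(["python", "path/to/your_training_script.py", "--your_flags"])
```

Because the variable is part of the child's environment from the start, it is in place before any TensorFlow code in that process runs.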

Otherwise you need tensorflow 1.14 or higher, and you need to edit model_builder.py
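A quick way to confirm that the interpreter you train with really meets the 1.14 threshold is to compare the version string it reports. The sketch below assumes nothing beyond the "1.14 or higher" requirement stated here; `parse_version` and `meets_minimum_version` are hypothetical helper names.

```python
import re


def parse_version(version):
    """Extract (major, minor) from a string such as '1.14.0' or '2.0.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum_version(version, minimum=(1, 14)):
    """Return True if `version` is at least `minimum` (major, minor)."""
    return parse_version(version) >= minimum


# Usage inside the training environment:
# import tensorflow as tf
# print(tf.__version__, meets_minimum_version(tf.__version__))
```

Checking `tf.__version__` from the same interpreter that runs training is more reliable than reading the package list, since several TensorFlow packages can be installed side by side.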

eilifsolberg on 23 Oct 2019

You should not both set the environment variable and wrap the optimizer; use only one of them.
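One way to enforce "only one of them" is a small guard that skips the wrapping when the environment variable is already set. This is a sketch, not code from the Object Detection API: `maybe_wrap_optimizer` is a hypothetical helper, and the rewrite function is passed in as an argument so the guard itself does not depend on TensorFlow.

```python
import os

AMP_ENV_VAR = "TF_ENABLE_AUTO_MIXED_PRECISION"


def maybe_wrap_optimizer(optimizer, rewrite_fn, environ=None):
    """Apply `rewrite_fn` unless mixed precision is already enabled via env."""
    environ = os.environ if environ is None else environ
    if environ.get(AMP_ENV_VAR) == "1":
        # The environment variable is in charge; leave the optimizer as-is.
        return optimizer
    return rewrite_fn(optimizer)


# Usage with TensorFlow 1.14+ (assumed to be installed):
# import tensorflow as tf
# optimizer = maybe_wrap_optimizer(
#     optimizer, tf.train.experimental.enable_mixed_precision_graph_rewrite)
```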

eilifsolberg on 23 Oct 2019

What do you mean when you say that only one of the environment variable TF_ENABLE_AUTO_MIXED_PRECISION = '1' and tf.train.experimental.enable_mixed_precision_graph_rewrite should be used at a time?
My GPU is a Tesla T4, so I think it should meet the requirements of NVIDIA NGC containers >= 19.03.

I have been stuck on this problem for a long time and really hope to solve it. Thank you very much.
@eilifsolberg

gzchenjiajun on 24 Oct 2019

I ran the FP16 demo code, and it works.
But in tensorflow/models object_detection it still does not work, even after a long time trying.
@eilifsolberg

gzchenjiajun on 24 Oct 2019

research/object_detection/builders/optimizer_builder.py:57
I added the enable_mixed_precision_graph_rewrite code on line 57 of optimizer_builder.py, but it still has no effect. Where should I write this code?

@eilifsolberg

gzchenjiajun on 24 Oct 2019

Another point is that my training program now reports the warning below. It started appearing after I upgraded from tensorflow-gpu 1.13 to 1.14.
I don't know whether it has any effect.
W1024 13:45:00.194824 139707432818496 ag_logging.py:145] Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting >: AssertionError: Bad argument number for Name: 3, expecting 4

@eilifsolberg

gzchenjiajun on 24 Oct 2019

I guess you should do it in line 76 of https://github.com/tensorflow/models/blob/master/research/object_detection/builders/optimizer_builder.py

as it will then be done independently of which optimizer you use. If you do this, don't change or set the environment variable.
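The idea of wrapping once, after the optimizer has been selected, can be sketched as follows. This is a simplified stand-in and not the actual contents of optimizer_builder.py: `build_optimizer` and the `factories` mapping are hypothetical, and the rewrite function is injected so the sketch runs without TensorFlow.

```python
def build_optimizer(optimizer_type, factories, rewrite_fn=None):
    """Create the requested optimizer, then wrap it in a single place.

    `factories` maps an optimizer type name to a zero-argument callable
    that creates the optimizer (a stand-in for the per-type branches).
    """
    if optimizer_type not in factories:
        raise ValueError(f"Optimizer {optimizer_type!r} not supported.")
    optimizer = factories[optimizer_type]()
    # One wrap point after selection, so every optimizer type is covered.
    if rewrite_fn is not None:
        optimizer = rewrite_fn(optimizer)
    return optimizer


# Usage with TensorFlow 1.14+ (assumed to be installed):
# import tensorflow as tf
# optimizer = build_optimizer(
#     "momentum_optimizer",
#     {"momentum_optimizer": lambda: tf.train.MomentumOptimizer(0.01, momentum=0.9)},
#     rewrite_fn=tf.train.experimental.enable_mixed_precision_graph_rewrite)
```

Keeping the wrap outside the per-optimizer branches means a change of optimizer type in the config cannot silently skip it.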

Not sure about the last question; it seems like something you might want to report as an issue on the TensorFlow GitHub page (not tensorflow/models, but tensorflow/tensorflow).

eilifsolberg on 24 Oct 2019

I have added the relevant code at line 76 and followed your other instructions as well.
But it has no effect. Is there any solution?
Thanks.
@eilifsolberg

gzchenjiajun on 25 Oct 2019

Sorry, not sure what the problem might be. I think you should ask a question on StackOverflow.

eilifsolberg on 26 Oct 2019

OK, I have stopped looking into it for the time being.
I upgraded to TensorFlow 2.0 and the related code, but mixed precision still has no effect. I also asked on Stack Overflow:
Half precision float - fp16 support in the Object Detection API(tensorflow) - Stack Overflow
https://stackoverflow.com/questions/58585259/fp16-support-in-the-object-detection-apitensorflow

For now I am using TensorRT for the speed-up instead.

@eilifsolberg

gzchenjiajun on 29 Oct 2019

I would like to ask how you use mixed precision to speed up your project. Thanks.
@eilifsolberg

gzchenjiajun on 29 Oct 2019

bump.. no updates?