Models: quantization-aware training of SSD MobileNet V2 is too slow

Created on 29 May 2019 · 9 comments · Source: tensorflow/models

I use ssdlite_mobilenet_v2_coco.config and modify it by adding a graph_rewriter:

graph_rewriter {
  quantization {
    delay: 0
    weight_bits: 8
    activation_bits: 8
  }
}

But with this, each step is about 10x slower than with the unmodified config.

The log without the graph_rewriter:

INFO:tensorflow:global step 199709: loss = 1.4051 (0.745 sec/step)
INFO:tensorflow:global step 199710: loss = 1.5033 (0.564 sec/step)
INFO:tensorflow:global step 199711: loss = 1.7374 (1.093 sec/step)
INFO:tensorflow:global step 199712: loss = 1.6265 (0.812 sec/step)

The log with the graph_rewriter:

INFO:tensorflow:global step 4554: loss = 9.3010 (4.084 sec/step)
INFO:tensorflow:global step 4555: loss = 8.2835 (4.055 sec/step)
INFO:tensorflow:global step 4556: loss = 8.0293 (4.060 sec/step)

The tensorflow-gpu version is 1.12.
Is this speed normal? Any ideas?

Labels: lite, research

All 9 comments

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

During multi-GPU quantization-aware training, I checked the saved ckpt model and found that when using create_training_graph, the node becomes

clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/max/biased

while with create_eval_graph the node is actually

FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/max/biased

so I have to rewrite the node names before saving the frozen quantized pb (a rename sketch follows below). Is something wrong here?
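In case it helps others hitting the same mismatch, here is a minimal, untested sketch of such a rename pass. It assumes the mangled names always take the clone_N/ form shown above (so the name create_eval_graph expects is the suffix after the last clone_N/); the checkpoint paths are placeholders, not paths from this thread:

```python
# Minimal sketch (assumptions noted above): re-save a multi-GPU QAT checkpoint
# under the variable names create_eval_graph expects, by keeping only the
# suffix after the last "clone_N/" in each variable name.
import re
import tensorflow as tf  # TF 1.x

TRAIN_CKPT = "model.ckpt-XXXX"   # hypothetical: checkpoint written by training
FIXED_CKPT = "model_fixed.ckpt"  # hypothetical: output path

reader = tf.train.NewCheckpointReader(TRAIN_CKPT)
tensors = {}
for name in reader.get_variable_to_shape_map():
    # e.g. "clone_1/A/clone_1/A/max/biased" -> "A/max/biased"
    fixed = re.split(r"clone_\d+/", name)[-1]
    tensors.setdefault(fixed, reader.get_tensor(name))  # keep one copy per name

with tf.Graph().as_default(), tf.Session() as sess:
    new_vars = [tf.Variable(value, name=name) for name, value in tensors.items()]
    sess.run(tf.global_variables_initializer())
    tf.train.Saver(new_vars).save(sess, FIXED_CKPT)
```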

#4783

I'm seeing the same behavior: approx. 0.125 sec/step with ssdlite_mobilenet_v2_coco in float, and 0.8 sec/step training the quantized version of the same model.

Info:

What is the top-level directory of the model you are using
ssd_mobilenet_v2_quantized_300x300_coco_2018_09_14

Have I written custom code
No

OS Platform and Distribution
Ubuntu 18.04 64-bit Intel

TensorFlow installed from
Sources

TensorFlow version
1.12

Bazel version
0.17.2

CUDA/cuDNN version
10.0

GPU model and memory
RTX 2080 Ti with 11 GB

Exact command to reproduce
python3 /content/models/research/object_detection/legacy/train.py --logtostderr --train_dir={model_dir} --pipeline_config_path={pipeline_fname}

@hvico have you fixed this problem?

Nope, I haven't. It trains much slower than the float model, but it works.

Quantization-aware training is much slower in general for vision models, since internally we have to do two Conv operations for every Conv in the float model (a rough illustration follows below). 10x slower seems a bit extreme, though; looping in Suharsh to comment.
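For anyone wondering where the extra time goes, here is a rough illustration, not the object_detection rewriter itself, of the fake-quantization ops that quantization-aware training wraps around each conv. The min/max ranges are fixed here for brevity; the real rewriter tracks them with extra variables and moving-average updates, which adds further per-step cost:

```python
import tensorflow as tf  # TF 1.x

x = tf.placeholder(tf.float32, [1, 300, 300, 3])
w = tf.get_variable("w", [3, 3, 3, 32])

# Float training: a single conv per layer.
y_float = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")

# Quantization-aware training: simulate 8-bit rounding of weights and
# activations in float, adding extra ops around every conv.
w_q = tf.fake_quant_with_min_max_args(w, min=-1.0, max=1.0, num_bits=8)
y = tf.nn.conv2d(x, w_q, strides=[1, 1, 1, 1], padding="SAME")
y_q = tf.fake_quant_with_min_max_args(y, min=-6.0, max=6.0, num_bits=8)
```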

I am also facing the same problem. Quantization-aware training for ssd_mobilenet_v1_coco is up to 10x slower than without quantization (i.e. without adding the line graph_rewriter { quantization { delay: 0 weight_bits: 8 activation_bits: 8 } }). Am I doing something wrong, or is this not yet fixed in TensorFlow?

I am facing the same problem, and I am more concerned about inference. I converted the model (SSD ResNet 50) to TFLite format, and the quantized TFLite model is 3 times slower than the non-quantized TFLite model. This is the exact opposite of what I was expecting from quantization. Thanks.
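For reference, this is roughly the conversion path being discussed, sketched with the TF 1.x converter API (tf.lite.TFLiteConverter is available from about 1.13; on 1.12 the same class lives under tf.contrib.lite). The graph file, tensor names, and shapes below are placeholders for whatever export_tflite_ssd_graph.py produced in your setup:

```python
import tensorflow as tf  # TF 1.x

# Hypothetical file/tensor names; substitute the ones from your export.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    "tflite_graph.pb",
    input_arrays=["normalized_input_image_tensor"],
    output_arrays=["TFLite_Detection_PostProcess"],
    input_shapes={"normalized_input_image_tensor": [1, 300, 300, 3]})

# Fully-quantized conversion of a quantization-aware checkpoint:
# (mean, std) of (128, 128) maps uint8 [0, 255] inputs to roughly [-1, 1).
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
converter.quantized_input_stats = {"normalized_input_image_tensor": (128, 128)}

with open("model_quant.tflite", "wb") as f:
    f.write(converter.convert())
```

One thing worth checking when benchmarking: TFLite's quantized kernels have historically been optimized mainly for ARM, so a quantized model measured on a desktop x86 CPU may not show the speedup you would see on mobile hardware.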
