@achowdhery Are the "quantized" model checkpoints in the zoo supposed to contain the FakeQuant (Min/Max) variables?
I tried ssd_mobilenet_v1_0.75_depth_quantized_coco and ssd_mobilenet_v1_quantized_coco.
I used the command from the tutorial to export a quantized TF-Lite model.
parvizp@cent-nano-0:~/Git/tensorflow.new/models/research$ CUDA_VISIBLE_DEVICES=-1 python object_detection/export_tflite_ssd_graph.py --pipeline_config_path=object_detection/graphs/ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_03/pipeline.config --trained_checkpoint_prefix=object_detection/graphs/ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_03/model.ckpt --output_directory=/tmp/ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_03/ --add_postprocessing_op=true
2018-07-16 10:47:11.378546: E tensorflow/stream_executor/cuda/cuda_driver.cc:397] failed call to cuInit: CUDA_ERROR_NO_DEVICE
2018-07-16 10:47:11.378601: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: cent-nano-0
2018-07-16 10:47:11.378610: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: cent-nano-0
2018-07-16 10:47:11.378641: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 396.26.0
2018-07-16 10:47:11.378670: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 396.26.0
2018-07-16 10:47:11.378678: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:305] kernel version seems to match DSO: 396.26.0
2018-07-16 10:47:13.255871: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
Traceback (most recent call last):
File "object_detection/export_tflite_ssd_graph.py", line 137, in <module>
tf.app.run(main)
File "/home/parvizp/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "object_detection/export_tflite_ssd_graph.py", line 133, in main
FLAGS.max_classes_per_detection)
File "/home/parvizp/Git/tensorflow.new/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 261, in export_tflite_graph
initializer_nodes='')
File "/home/parvizp/Git/tensorflow.new/models/research/object_detection/exporter.py", line 72, in freeze_graph_with_def_protos
saver.restore(sess, input_checkpoint)
File "/home/parvizp/.local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1743, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
Caused by op u'save/RestoreV2', defined at:
File "object_detection/export_tflite_ssd_graph.py", line 137, in <module>
tf.app.run(main)
File "/home/parvizp/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "object_detection/export_tflite_ssd_graph.py", line 133, in main
FLAGS.max_classes_per_detection)
File "/home/parvizp/Git/tensorflow.new/models/research/object_detection/export_tflite_ssd_graph_lib.py", line 261, in export_tflite_graph
initializer_nodes='')
File "/home/parvizp/Git/tensorflow.new/models/research/object_detection/exporter.py", line 67, in freeze_graph_with_def_protos
tf.import_graph_def(input_graph_def, name='')
File "/home/parvizp/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/parvizp/.local/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
_ProcessNewOps(graph)
File "/home/parvizp/.local/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/home/parvizp/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3360, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/home/parvizp/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3251, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/home/parvizp/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1716, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
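One way to check whether a checkpoint carries the FakeQuant ranges the exporter asks for is to list its variable names and look for quantization min/max keys. The following is a minimal sketch, not part of the export script: the helper name find_quant_keys is hypothetical, and the sample names are illustrative, with only the act_quant key taken from the error above.

```python
import re

# Matches variable names that end in a FakeQuant range, e.g. ".../act_quant/max".
_QUANT_RANGE = re.compile(r"(?:^|/)\w*_quant/(?:min|max)$")


def find_quant_keys(variable_names):
    """Return the names that look like FakeQuant min/max variables."""
    return [n for n in variable_names if _QUANT_RANGE.search(n)]


# Illustrative names; only the act_quant key comes from the error message.
names = [
    "BoxPredictor_0/BoxEncodingPredictor/weights",
    "BoxPredictor_0/BoxEncodingPredictor/biases",
    "BoxPredictor_0/BoxEncodingPredictor/act_quant/max",
]
quant_keys = find_quant_keys(names)
print(quant_keys or "no FakeQuant min/max variables found")
```

With TensorFlow installed, the names could be read from a real checkpoint with tf.train.list_variables(checkpoint_prefix), which returns (name, shape) pairs. An empty result would suggest the checkpoint was saved from a graph without the quantization rewrite.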
@parvizp Have you tried the checkpoint in the tutorial also copied here? https://storage.googleapis.com/download.tensorflow.org/models/tflite/ssd_mobilenet_v1_0.75_depth_300x300_quant_pets_2018_06_29.zip
We often get this error when the checkpoint has not been run through export_tflite_ssd_graph.py. Can you double-check that you have already passed the checkpoint through that script to get the frozen graph? I can double-check the ones in the model zoo.
@achowdhery Thanks, I just tried your URL and the export succeeds.
Thanks. I have verified the models are converting from model zoo as well.
@achowdhery I got similar errors when I converted models from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
NotFoundError (see above for traceback): Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
The model I am trying to convert is
Please give exact instructions to reproduce. We need to make sure we see the same issue.
@achowdhery I followed your blog https://medium.com/tensorflow/training-and-serving-a-realtime-mobile-object-detector-in-30-minutes-with-cloud-tpus-b78971cf1193. The only difference is the model file I tried to export.
Export the model with:
python object_detection/export_tflite_ssd_graph.py \
--pipeline_config_path object_detection/samples/configs/ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync.config \
--trained_checkpoint_prefix ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_03/model.ckpt \
--output_directory ssd_mobilenet_v1_0.75_depth_quantized_300x300_coco14_sync_2018_07_03/tflite \
--add_postprocessing_op true
If you start with this checkpoint, does it work:
https://storage.googleapis.com/download.tensorflow.org/models/tflite/ssd_mobilenet_v1_0.75_depth_300x300_quant_pets_2018_06_29.zip
Hi, I hit the same problem as melody-rain.
The checkpoint from
https://storage.googleapis.com/download.tensorflow.org/models/tflite/ssd_mobilenet_v1_0.75_depth_300x300_quant_pets_2018_06_29.zip
works, but the export fails when starting from a model zoo checkpoint.
Thanks. The models have been updated in the model zoo now:
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md
@achowdhery Hi, I want to export ssdlite_mobilenetv2, and I hit the same issue:
Tensor name "BoxPredictor_0/BoxEncodingPredictor/biases" not found in checkpoint files
@RichardLiee What checkpoint are you using? Please provide a link.
@RichardLiee have you checked this?
hi @achowdhery ,
I tried to train a quantized model for mobile devices.
But when I converted the model to tflite, I got this:
tensorflow/lite/toco/tooling_util.cc:1694] Array FeatureExtractor/MobilenetV1/MobilenetV1/Squeeze_excitation_Conv2d_3_depthwise/mul, which is an input to the Conv operator producing the output array FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Relu6, is lacking min/max data, which is necessary for quantization.
If accuracy matters, either target a non-quantized output format, or run quantized training with your model from a floating point checkpoint to change the input graph to contain min/max information.
If you don't care about accuracy, you can pass --default_ranges_min= and --default_ranges_max= for easy experimentation.
Aborted (core dumped)
Please help with this. Does the training process need additional configuration?
When I added --default_ranges_min=0 --default_ranges_max=6, the tflite accuracy dropped badly. It does work in some cases, with only a small decrease in accuracy.
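To illustrate why fallback ranges can cost accuracy, here is a rough sketch of what an 8-bit quantizer does to a value when the tensor is assumed to lie in [0, 6]. It is a simplified model (uniform quantization with clamping), not the converter's exact arithmetic, and fake_quantize is a hypothetical helper.

```python
def fake_quantize(x, range_min=0.0, range_max=6.0, bits=8):
    """Clamp x to [range_min, range_max], then round to one of 2**bits levels."""
    levels = 2 ** bits - 1
    scale = (range_max - range_min) / levels
    clamped = min(max(x, range_min), range_max)
    q = round((clamped - range_min) / scale)
    return range_min + q * scale


# Values outside [0, 6] are clamped; small values collapse onto the nearest level.
for value in (-1.5, 0.01, 3.0, 9.0):
    print(value, "->", round(fake_quantize(value), 4))
```

Any array whose real values fall outside the assumed range, or which needs finer resolution than (max - min) / 255, loses information under this scheme. That may explain why the result depends on the network and input.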
I am facing exactly the same problem as you, @oopsodd. Can someone give a hint or a solution?
I didn't solve the problem. The (default_ranges_min=0, default_ranges_max=6) option works for some specific input image sizes of the same network.
Thank you for your reply, @oopsodd. As you said, "dummy quantization" does not perform well.
I hit the same problem when training my own quantized model. How can I fix it?
Key BoxPredictor_0/BoxEncodingPredictor/act_quant/max not found in checkpoint
I faced this issue when I tried to train with http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2018_09_14.tar.gz
I fixed that problem by using the downloaded model as a pretrained model. Then, in the training configuration file, this must be added at the end before retraining:
graph_rewriter {
  quantization {
    delay: 48000
    weight_bits: 8
    activation_bits: 8
  }
}
With this in place, the model is prepared for quantization and the necessary information is written to the checkpoint.
@chrissaher I tried what you suggested but I still get the same error.
Can you please provide the configuration file you are using for training?
This is the config file I used, with the PATH_TO_BE_CONFIGURED entries replaced by the proper paths.
@chrissaher When I downloaded this http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_quantized_300x300_coco_2018_09_14.tar.gz model from the model zoo, it doesn't have a checkpoint file in it.
@raj-shah14 I successfully transformed that model to tflite using the following script:
tflite_convert \
--output_file="object_detection/zoo/ssd_mobilenet_v2_quantized_300x300_coco_2018_09_14/model.tflite" \
--graph_def_file="object_detection/zoo/ssd_mobilenet_v2_quantized_300x300_coco_2018_09_14/tflite_graph.pb" \
--inference_type=QUANTIZED_UINT8 \
--input_arrays="normalized_input_image_tensor" \
--output_arrays="TFLite_Detection_PostProcess","TFLite_Detection_PostProcess:1","TFLite_Detection_PostProcess:2","TFLite_Detection_PostProcess:3" \
--mean_values=128 \
--std_dev_values=128 \
--input_shapes=1,300,300,3 \
--change_concat_input_ranges=false \
--allow_nudging_weights_to_use_fast_gemm_kernel=true \
--allow_custom_ops
Please adjust the paths to match your files.
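For reference, --mean_values and --std_dev_values describe how the uint8 input maps to the float range the model expects: real_value = (quantized_value - mean_value) / std_dev_value. A small sketch with the values from the command above (the helper name dequantize_input is hypothetical):

```python
def dequantize_input(q, mean_value=128.0, std_dev_value=128.0):
    """Map a uint8 input value to the float value the model sees."""
    return (q - mean_value) / std_dev_value


# With mean 128 and std dev 128, pixel values 0..255 land in roughly [-1, 1).
for q in (0, 128, 255):
    print(q, "->", dequantize_input(q))
```

If the model was trained with a different input normalization, these two values would need to change accordingly.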
@chrissaher Thanks for your reply. I also was able to do this, but this is post training quantization and affects the accuracy too much. I was trying to do quantization aware training.
It would be great if you could guide me with that.
Hey @raj-shah14 I have the same issue.
@chrissaher adding
graph_rewriter { quantization { delay: 48000 weight_bits: 8 activation_bits: 8 } }
didn't work for me either.
I'm trying to train ssd_mobilenet_v2_quantized_300x300_coco using the legacy train.py and then freeze the checkpoint.
The freeze step fails. When I train with --num_clones=1 the freeze succeeds, but with --num_clones=4 it fails.
Did anyone solve this?
@oopsodd @NorwayLobster I hit the same issue when I train a quantized model and try to convert it to tf.lite. Do you have any ideas about this?
Did you try using the export script https://github.com/tensorflow/models/blob/master/research/object_detection/export_tflite_ssd_graph.py instead?
@achowdhery Yes, the SSD model works fine for me. But when I use a shared-weights architecture (PPN) trained with quantization and then run toco, I get the following error:
Array WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_0/Conv2D, which is an input to the Mul operator producing the output array WeightSharedConvolutionalBoxPredictor/Relu6, is lacking min/max data, which is necessary for quantization.
@doronAtuar I hit the same problem. I looked into the saved checkpoint and found that something is wrong in it: some node names look like
clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/max/biased
when they should actually be
FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/max/biased
I fixed this by rewriting the node names.
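The commenter did not post the rewrite itself. As a sketch of the renaming step only, assuming the bad names always carry the correct name after the last clone_<N>/ scope (as in the example above), something like the following could map a bad name to the expected one. The helper name strip_clone_scopes is hypothetical.

```python
import re

# A "clone_<N>/" scope at the start of the name or right after a "/".
_CLONE_SCOPE = re.compile(r"(?:^|/)clone_\d+/")


def strip_clone_scopes(name):
    """Keep only the part of the name after the last clone_<N>/ scope."""
    return _CLONE_SCOPE.split(name)[-1]


bad = ("clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/"
       "clone_1/FeatureExtractor/MobilenetV2/expanded_conv_6/expand/act_quant/max/biased")
fixed = strip_clone_scopes(bad)
print(fixed)
```

Names without a clone scope pass through unchanged. Applying this to a real checkpoint would still require reading each variable, renaming it, and saving a new checkpoint, which is not shown here.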
Did you find help on quantization aware training?
I am trying to use ssd_mobilenet_v1_quantized_coco as my pretrained model for training, but I get an error. When I use ssd_mobilenet_v1_coco instead it works, but the loss does not converge and training is too slow.
I hit this problem when I tried quantization-aware training of MobileNetV3 (TF version 1.15.2, Ubuntu 18.04).
I am facing the same problem when I use multi training. How did you rewrite the node names?
Thank you.