Env:
Reproduce:
Error Message
[TensorRT] ERROR: UFFParser: Parser error: BoxPredictor_0/Reshape: Reshape: -1 dimension specified more than 1 time
Model:
https://drive.google.com/open?id=14r1d8vq7NmnmdW3IUBdhOnPQvYTJAxtm
Is there any update on this?
@xiehaoina , @miteshgithub
When exporting the graph for inference, set the batch size to 1 to resolve this issue.
I faced a similar issue when I trained with TensorFlow > 1.13.1 using the latest Object Detection API. Setting the batch size to 1 resolved it, but I then got an error in the grid anchor generator because of namespace mismatches in the latest Object Detection API. So it is better to train with a version of the Object Detection API that uses TensorFlow <= 1.12.0.
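For anyone unsure how to pin the batch size at export time, here is a minimal sketch that builds the command line for the Object Detection API's export_inference_graph.py with a fixed input shape via its --input_shape flag. The 300x300x3 image size, the file names, and the helper name build_export_args are assumptions for illustration only; use the values from your own pipeline config.

```python
def build_export_args(config_path, checkpoint_prefix, output_dir,
                      batch_size=1, height=300, width=300, channels=3):
    """Build an argv list for export_inference_graph.py with a fixed input shape.

    The default 300x300x3 shape is an assumption (typical for SSD models);
    replace it with the input size from your own pipeline config.
    """
    input_shape = ",".join(str(d) for d in (batch_size, height, width, channels))
    return [
        "python", "object_detection/export_inference_graph.py",
        "--input_type", "image_tensor",
        "--pipeline_config_path", config_path,
        "--trained_checkpoint_prefix", checkpoint_prefix,
        "--output_directory", output_dir,
        # Fixing the first dimension to 1 pins the batch size in the exported graph.
        "--input_shape", input_shape,
    ]


# Hypothetical file names, for illustration only.
args = build_export_args("pipeline.config", "model.ckpt-1000", "exported")
print(" ".join(args))
```

The printed command can be run from the models/research directory of the checkout you trained with.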
Use TensorBoard to compare your model with the pretrained model available in the model zoo repo.
Example: in the pretrained model from the zoo there are 486 nodes in MultipleGridAnchorGenerator, whereas your model has 484 nodes. This difference might be causing the error.
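If opening TensorBoard is inconvenient, a rough way to make the same comparison is to count node names under a scope in each frozen graph. This is only a sketch: count_nodes_in_scope and load_node_names are hypothetical helper names, the loader assumes TensorFlow 1.x is installed, and the count may not match exactly how TensorBoard groups nodes.

```python
def count_nodes_in_scope(node_names, scope):
    """Count node names that are the scope itself or live under 'scope/'."""
    prefix = scope.rstrip("/") + "/"
    return sum(1 for name in node_names
               if name == scope or name.startswith(prefix))


def load_node_names(frozen_graph_path):
    """Read node names from a frozen .pb file (assumes TensorFlow 1.x)."""
    import tensorflow as tf  # imported lazily so the counter works without TF

    graph_def = tf.GraphDef()
    with tf.gfile.GFile(frozen_graph_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return [node.name for node in graph_def.node]


# Toy node list for illustration; real names come from load_node_names(...).
toy_names = [
    "MultipleGridAnchorGenerator/Meshgrid/ExpandedShape/ones",
    "MultipleGridAnchorGenerator/add",
    "BoxPredictor_0/Reshape",
]
toy_count = count_nodes_in_scope(toy_names, "MultipleGridAnchorGenerator")
print(toy_count)
```

Running the counter on both your frozen graph and the zoo model's frozen graph should show whether the two differ under that scope.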
Please check the following thread for more info:
https://devtalk.nvidia.com/default/topic/1043557/tensorrt/error-uffparser-parser-error-boxpredictor_0-reshape-reshape-1-dimension-specified-more-than-1-/
@gopinath-r did the error in the GridAnchor generator look like this, by any chance?
[TensorRT] INFO: UFFParser: parsing GridAnchor
[libprotobuf FATAL /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/externals/protobuf/aarch64/10.0/include/google/protobuf/repeated_field.h:1408] CHECK failed: (index) < (current_size_):
Traceback (most recent call last):
  File "main.py", line 41, in <module>
    parser.parse('tmp.uff', network)
RuntimeError: CHECK failed: (index) < (current_size_):
I've been trying to fix this. And indeed, as you say, my MultipleGridAnchorGenerator also has 484 nodes in the output frozen model. Since TF 1.12.0 doesn't support CUDA 10 (which is installed on my host and on my Jetson Nano), I went with 1.14.0. Can you confirm that training/exporting the graph on TF 1.12.0 fixes this issue?
Hi Alex,
I got the same error when I used the latest Object Detection API with TF 1.13.1.
Training using TF 1.3.1 and exporting the inference graph using 1.12.0 didn't solve the problem.
So I trained with TF 1.12.0 and the problem was resolved.
@gopinath-r Thank you for your reply.
I was able to re-train and convert the TF->UFF->TRT model with TF 1.13.1 installed (on both host and target), using (as suggested on devtalk) an older version of the tensorflow/models repo at commit ae0a9409212d0072938fa60c9f85740bb89ced7e, with the train.py script located at object_detection/train.py rather than object_detection/legacy/train.py as in the latest master.
This got rid of the UFFParser errors. There was one more oversight on my part: the number of classes in the Postprocessor plugin was not set correctly. I'm retraining with 1 class, so the correct value should be 2 (the one object class plus the background class):
Postprocessor = gs.create_plugin_node(
    name="Postprocessor",
    op="NMS_TRT",
    shareLocation=1,
    varianceEncodedInTarget=0,
    backgroundLabelId=0,
    confidenceThreshold=1e-8,
    nmsThreshold=0.6,
    topK=100,
    keepTopK=100,
    numClasses=2,
    inputOrder=[0, 2, 1],
    confSigmoid=1,
    isNormalized=1
)
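To make the class-count rule explicit, here is a small sketch of how the numClasses value could be derived instead of hard-coded. The helper name nms_num_classes is hypothetical; the only assumption is that the plugin's class count includes the background class, as with backgroundLabelId=0 above.

```python
def nms_num_classes(num_object_classes, has_background=True):
    """Return the class count for the NMS plugin: object classes plus background."""
    if num_object_classes < 1:
        raise ValueError("need at least one object class")
    return num_object_classes + (1 if has_background else 0)


# One trained object class plus background gives numClasses=2.
num_classes = nms_num_classes(1)
print(num_classes)
```

The result can then be passed as numClasses when creating the Postprocessor plugin node, so the value follows the label map rather than being edited by hand.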
Also, the following lines were not relevant, since the inputs simply didn't exist on those layers, so I've just commented them out:
graph.find_nodes_by_op("NMS_TRT")[0].input.remove("Input")
graph.find_nodes_by_name("Input")[0].input.remove("image_tensor:0")
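Rather than commenting the lines out, a guarded removal would work for both older and newer graphs. The sketch below uses plain Python lists as stand-ins for a node's input field. remove_input_if_present is a hypothetical helper, and whether it works unchanged on graphsurgeon node inputs is an assumption to verify against your version.

```python
def remove_input_if_present(inputs, name):
    """Remove `name` from `inputs` only if it is there; report whether it was."""
    if name in inputs:
        inputs.remove(name)
        return True
    return False


# Plain lists standing in for node.input, for illustration only.
nms_inputs = ["concat_box_loc", "concat_priorbox", "concat_box_conf"]
input_node_inputs = ["image_tensor:0"]

removed_from_nms = remove_input_if_present(nms_inputs, "Input")
removed_from_input = remove_input_if_present(input_node_inputs, "image_tensor:0")
print(removed_from_nms, removed_from_input)
```

With this guard, a missing input is skipped silently instead of raising an error during conversion.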
I am now re-training the network with the correct number of iterations to verify that it's working as expected.
I reverted to the commit you mentioned but got the following error.
ValueError: SSD Inception V2 feature extractor always uses scope returned by `conv_hyperparams_fn` for both the base feature extractor and the additional layers added since there is no arg_scope defined for the base feature extractor.
After uncommenting override_base_feature_extractor_hyperparams: true in the config, I got:
TypeError: Cannot convert a list containing a tensor of dtype <dtype: 'int32'> to <dtype: 'float32'> (Tensor is: <tf.Tensor 'Preprocessor/stack_1:0' shape=(1, 3) dtype=int32>)
@hosaka
In case this link helps: https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v2_coco_2018_03_29.py
From forum post here: https://devtalk.nvidia.com/default/topic/1069027/tensorrt/parsing-gridanchor-op-_gridanchor_trt-protobuf-repeated_field-h-1408-check-failed-index-lt-current_size_-/post/5415537/#5415537