TensorRT: SampleUffMaskRCNN inference takes too long

Created on 24 Mar 2020 · 6 Comments · Source: NVIDIA/TensorRT

Hi,
I am working through the sampleUffMaskRCNN example, but the test run takes far too long: about 10 minutes.

./sample_uff_maskRCNN -d ../data/faster-rcnn/ --fp16
&&&& RUNNING TensorRT.sample_maskrcnn # ./sample_uff_maskRCNN -d ../data/faster-rcnn/ --fp16
[03/24/2020-10:07:39] [I] Building and running a GPU inference engine for Mask RCNN
[03/24/2020-10:07:42] [W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[03/24/2020-10:07:49] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[03/24/2020-10:17:29] [I] [TRT] Detected 1 inputs and 2 output network tensors.
[03/24/2020-10:17:30] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[03/24/2020-10:17:35] [I] Run for 10 times with Batch Size 1
[03/24/2020-10:17:35] [I] Average inference time is 541.588 ms/frame
[03/24/2020-10:17:35] [I] Detected dog in../../../data/faster-rcnn/001763.ppm with confidence 99.9171 and coordinates (259.168, 13.8497, 488.325, 370.227)
[03/24/2020-10:17:35] [I] Detected dog in../../../data/faster-rcnn/001763.ppm with confidence 99.8545 and coordinates (27.6872, 45.7848, 317.037, 365.295)
[03/24/2020-10:17:35] [I] The results are stored in current directory: 0.ppm
&&&& PASSED TensorRT.sample_maskrcnn # ./sample_uff_maskRCNN -d ../data/faster-rcnn/ --fp16

What is the reason for this?
My graphics card is a Quadro M2000.

Samples Long Builder Time question

ucaglarcaliskan

Most helpful comment

For the inference time alone, look at this line:
[03/24/2020-10:17:35] [I] Average inference time is 541.588 ms/frame

I think it is common for TensorRT to take minutes to build an engine, especially with a deep backbone such as the ResNet101 used here. In my experiments, inference time can get down to ~40 ms/frame on a T4.

Tyler-D on 25 Mar 2020
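
To see how much of the run is engine building rather than inference, the timestamps in the log can be subtracted. The following is a minimal C++ sketch, not part of the sample: the helper names `secondsOfDay` and `secondsBetween` are hypothetical, and it assumes both stamps fall on the same day and use the `[MM/DD/YYYY-HH:MM:SS]` prefix shown in the logs.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert the HH:MM:SS part of a
// "[MM/DD/YYYY-HH:MM:SS]" log prefix into seconds since midnight.
int secondsOfDay(const std::string& stamp) {
    int month = 0, day = 0, year = 0, h = 0, m = 0, s = 0;
    const int parsed = std::sscanf(stamp.c_str(), "[%d/%d/%d-%d:%d:%d]",
                                   &month, &day, &year, &h, &m, &s);
    if (parsed != 6) {
        throw std::invalid_argument("unexpected timestamp format: " + stamp);
    }
    return h * 3600 + m * 60 + s;
}

// Elapsed seconds between two log prefixes.
// Assumption: both stamps are from the same calendar day.
int secondsBetween(const std::string& start, const std::string& end) {
    return secondsOfDay(end) - secondsOfDay(start);
}
```

Applied to the first log, `secondsBetween("[03/24/2020-10:07:39]", "[03/24/2020-10:17:29]")` gives 590 seconds up to the line that reports the network tensors, against roughly 5.4 seconds for the ten timed runs at 541.588 ms each.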

All 6 comments

Hi,
Could you share your environment details (e.g., TensorRT version, CUDA version) and the full steps you ran?
Also, did you try another precision, such as FP32?

Below is my output. My setup is not identical to yours, but it may be a useful reference.

&&&& RUNNING TensorRT.sample_maskrcnn # ./sample_uff_maskRCNN -d ../data
[01/24/2020-16:55:03] [I] Building and running a GPU inference engine for Mask RCNN
[01/24/2020-16:55:08] [I] [TRT] 
[01/24/2020-16:55:08] [I] [TRT] --------------- Layers running on DLA: 
[01/24/2020-16:55:08] [I] [TRT] 
[01/24/2020-16:55:08] [I] [TRT] --------------- Layers running on GPU: 
[01/24/2020-16:55:08] [I] [TRT] conv1/convolution + activation_1/Relu, max_pooling2d_1/MaxPool, res2a_branch2a/convolution + activation_2/Relu, res2a_branch2b/convolution + activation_3/Relu, res2a_branch1/convolution, res2a_branch2c/convolution + add_1/add + res2a_out/Relu, res2b_branch2a/convolution + activation_4/Relu, res2b_branch2b/convolution + activation_5/Relu, res2b_branch2c/convolution + add_2/add + res2b_out/Relu, res2c_branch2a/convolution + activation_6/Relu, res2c_branch2b/convolution + activation_7/Relu, res2c_branch2c/convolution + add_3/add + res2c_out/Relu, res3a_branch2a/convolution + activation_8/Relu, res3a_branch2b/convolution + activation_9/Relu, res3a_branch1/convolution, res3a_branch2c/convolution + add_4/add + res3a_out/Relu, res3b_branch2a/convolution + activation_10/Relu, res3b_branch2b/convolution + activation_11/Relu, res3b_branch2c/convolution + add_5/add + res3b_out/Relu, res3c_branch2a/convolution + activation_12/Relu, res3c_branch2b/convolution + activation_13/Relu, res3c_branch2c/convolution + add_6/add + res3c_out/Relu, res3d_branch2a/convolution + activation_14/Relu, res3d_branch2b/convolution + activation_15/Relu, res3d_branch2c/convolution + add_7/add + res3d_out/Relu, res4a_branch2a/convolution + activation_16/Relu, res4a_branch2b/convolution + activation_17/Relu, res4a_branch1/convolution, res4a_branch2c/convolution + add_8/add + res4a_out/Relu, res4b_branch2a/convolution + activation_18/Relu, res4b_branch2b/convolution + activation_19/Relu, res4b_branch2c/convolution + add_9/add + res4b_out/Relu, res4c_branch2a/convolution + activation_20/Relu, res4c_branch2b/convolution + activation_21/Relu, res4c_branch2c/convolution + add_10/add + res4c_out/Relu, res4d_branch2a/convolution + activation_22/Relu, res4d_branch2b/convolution + activation_23/Relu, res4d_branch2c/convolution + add_11/add + res4d_out/Relu, res4e_branch2a/convolution + activation_24/Relu, res4e_branch2b/convolution + activation_25/Relu, 
res4e_branch2c/convolution + add_12/add + res4e_out/Relu, res4f_branch2a/convolution + activation_26/Relu, res4f_branch2b/convolution + activation_27/Relu, res4f_branch2c/convolution + add_13/add + res4f_out/Relu, res4g_branch2a/convolution + activation_28/Relu, res4g_branch2b/convolution + activation_29/Relu, res4g_branch2c/convolution + add_14/add + res4g_out/Relu, res4h_branch2a/convolution + activation_30/Relu, res4h_branch2b/convolution + activation_31/Relu, res4h_branch2c/convolution + add_15/add + res4h_out/Relu, res4i_branch2a/convolution + activation_32/Relu, res4i_branch2b/convolution + activation_33/Relu, res4i_branch2c/convolution + add_16/add + res4i_out/Relu, res4j_branch2a/convolution + activation_34/Relu, res4j_branch2b/convolution + activation_35/Relu, res4j_branch2c/convolution + add_17/add + res4j_out/Relu, res4k_branch2a/convolution + activation_36/Relu, res4k_branch2b/convolution + activation_37/Relu, res4k_branch2c/convolution + add_18/add + res4k_out/Relu, res4l_branch2a/convolution + activation_38/Relu, res4l_branch2b/convolution + activation_39/Relu, res4l_branch2c/convolution + add_19/add + res4l_out/Relu, res4m_branch2a/convolution + activation_40/Relu, res4m_branch2b/convolution + activation_41/Relu, res4m_branch2c/convolution + add_20/add + res4m_out/Relu, res4n_branch2a/convolution + activation_42/Relu, res4n_branch2b/convolution + activation_43/Relu, res4n_branch2c/convolution + add_21/add + res4n_out/Relu, res4o_branch2a/convolution + activation_44/Relu, res4o_branch2b/convolution + activation_45/Relu, res4o_branch2c/convolution + add_22/add + res4o_out/Relu, res4p_branch2a/convolution + activation_46/Relu, res4p_branch2b/convolution + activation_47/Relu, res4p_branch2c/convolution + add_23/add + res4p_out/Relu, res4q_branch2a/convolution + activation_48/Relu, res4q_branch2b/convolution + activation_49/Relu, res4q_branch2c/convolution + add_24/add + res4q_out/Relu, res4r_branch2a/convolution + activation_50/Relu, 
res4r_branch2b/convolution + activation_51/Relu, res4r_branch2c/convolution + add_25/add + res4r_out/Relu, res4s_branch2a/convolution + activation_52/Relu, res4s_branch2b/convolution + activation_53/Relu, res4s_branch2c/convolution + add_26/add + res4s_out/Relu, res4t_branch2a/convolution + activation_54/Relu, res4t_branch2b/convolution + activation_55/Relu, res4t_branch2c/convolution + add_27/add + res4t_out/Relu, res4u_branch2a/convolution + activation_56/Relu, res4u_branch2b/convolution + activation_57/Relu, res4u_branch2c/convolution + add_28/add + res4u_out/Relu, res4v_branch2a/convolution + activation_58/Relu, res4v_branch2b/convolution + activation_59/Relu, res4v_branch2c/convolution + add_29/add + res4v_out/Relu, res4w_branch2a/convolution + activation_60/Relu, res4w_branch2b/convolution + activation_61/Relu, res4w_branch2c/convolution + add_30/add + res4w_out/Relu, res5a_branch2a/convolution + activation_62/Relu, res5a_branch2b/convolution + activation_63/Relu, res5a_branch1/convolution, res5a_branch2c/convolution + add_31/add + res5a_out/Relu, res5b_branch2a/convolution + activation_64/Relu, res5b_branch2b/convolution + activation_65/Relu, res5b_branch2c/convolution + add_32/add + res5b_out/Relu, res5c_branch2a/convolution + activation_66/Relu, res5c_branch2b/convolution + activation_67/Relu, res5c_branch2c/convolution + add_33/add + res5c_out/Relu, fpn_c5p5/convolution, fpn_p5upsampled, fpn_c4p4/convolution + fpn_p4add/add, fpn_p4upsampled, fpn_c3p3/convolution + fpn_p3add/add, fpn_p3upsampled, fpn_c2p2/convolution + fpn_p2add/add, fpn_p2/convolution, rpn_model/rpn_conv_shared/convolution + rpn_model/rpn_conv_shared/Relu, rpn_model/rpn_class_raw/convolution || rpn_model/rpn_bbox_pred/convolution, rpn_model/permute_1/transpose + (Unnamed Layer* 1735) [Shuffle] + rpn_model/reshape_1/Reshape, rpn_model/rpn_class_xxx/Max, fusedPointwiseNode(rpn_model/rpn_class_xxx/sub, rpn_model/rpn_class_xxx/Exp), rpn_model/rpn_class_xxx/Sum, 
rpn_model/rpn_class_xxx/truediv, fpn_p3/convolution, rpn_model_1/rpn_conv_shared/convolution + rpn_model_1/rpn_conv_shared/Relu, rpn_model_1/rpn_class_raw/convolution || rpn_model_1/rpn_bbox_pred/convolution, rpn_model_1/permute_1/transpose + (Unnamed Layer* 1757) [Shuffle] + rpn_model_1/reshape_1/Reshape, rpn_model_1/rpn_class_xxx/Max, fusedPointwiseNode(rpn_model_1/rpn_class_xxx/sub, rpn_model_1/rpn_class_xxx/Exp), rpn_model_1/rpn_class_xxx/Sum, rpn_model_1/rpn_class_xxx/truediv, fpn_p4/convolution, rpn_model_2/rpn_conv_shared/convolution + rpn_model_2/rpn_conv_shared/Relu, rpn_model_2/rpn_class_raw/convolution || rpn_model_2/rpn_bbox_pred/convolution, rpn_model_2/permute_1/transpose + (Unnamed Layer* 1779) [Shuffle] + rpn_model_2/reshape_1/Reshape, rpn_model_2/rpn_class_xxx/Max, fusedPointwiseNode(rpn_model_2/rpn_class_xxx/sub, rpn_model_2/rpn_class_xxx/Exp), rpn_model_2/rpn_class_xxx/Sum, rpn_model_2/rpn_class_xxx/truediv, fpn_p5/convolution, rpn_model_3/rpn_conv_shared/convolution + rpn_model_3/rpn_conv_shared/Relu, rpn_model_3/rpn_class_raw/convolution || rpn_model_3/rpn_bbox_pred/convolution, rpn_model_3/permute_1/transpose + (Unnamed Layer* 1801) [Shuffle] + rpn_model_3/reshape_1/Reshape, rpn_model_3/rpn_class_xxx/Max, fusedPointwiseNode(rpn_model_3/rpn_class_xxx/sub, rpn_model_3/rpn_class_xxx/Exp), rpn_model_3/rpn_class_xxx/Sum, rpn_model_3/rpn_class_xxx/truediv, fpn_p6/MaxPool, rpn_model_4/rpn_conv_shared/convolution + rpn_model_4/rpn_conv_shared/Relu, rpn_model_4/rpn_class_raw/convolution || rpn_model_4/rpn_bbox_pred/convolution, rpn_model_4/permute_1/transpose + (Unnamed Layer* 1820) [Shuffle] + rpn_model_4/reshape_1/Reshape, rpn_model_4/rpn_class_xxx/Max, fusedPointwiseNode(rpn_model_4/rpn_class_xxx/sub, rpn_model_4/rpn_class_xxx/Exp), rpn_model_4/rpn_class_xxx/Sum, rpn_model_4/rpn_class_xxx/truediv, rpn_model/reshape_2/Reshape, rpn_model_1/reshape_2/Reshape, rpn_model_2/reshape_2/Reshape, rpn_model_3/reshape_2/Reshape, rpn_model_4/reshape_2/Reshape, 
ROI, roi_align_classifier, mrcnn_class_conv1/convolution + activation_68/Relu, mrcnn_class_conv2/convolution + activation_69/Relu, mrcnn_bbox_fc/MatMul + mrcnn_bbox_fc/BiasAdd, mrcnn_class_logits/MatMul + mrcnn_class_logits/BiasAdd, mrcnn_class/Softmax, mrcnn_detection, mrcnn_detection_bboxes, roi_align_mask_trt, mrcnn_mask_conv1/convolution + activation_71/Relu, mrcnn_mask_conv2/convolution + activation_72/Relu, mrcnn_mask_conv3/convolution + activation_73/Relu, mrcnn_mask_conv4/convolution + activation_74/Relu, (Unnamed Layer* 1997) [Deconvolution] + mrcnn_mask_deconv/conv2d_transpose, mrcnn_mask_deconv/BiasAdd + mrcnn_mask_deconv/Relu, mrcnn_mask/convolution, mrcnn_mask/Sigmoid, 
[01/24/2020-16:59:49] [I] [TRT] Detected 1 inputs and 2 output network tensors.
[01/24/2020-16:59:54] [I] Run for 10 times with Batch Size 1
[01/24/2020-16:59:54] [I] Average inference time is 452.944 ms/frame
[01/24/2020-16:59:54] [I] Detected dog in../data/001763.ppm with confidence 99.9171 and coordinates (259.165, 13.8516, 488.325, 370.222)
[01/24/2020-16:59:54] [I] Detected dog in../data/001763.ppm with confidence 99.8545 and coordinates (27.6855, 45.785, 317.039, 365.296)
[01/24/2020-16:59:54] [I] The results are stored in current directory: 0.ppm
&&&& PASSED TensorRT.sample_maskrcnn # ./sample_uff_maskRCNN -d ../data

BTW, I ran this test with TRT6 on an AGX.

chiehpower on 25 Mar 2020

Half2 support requested on hardware without native FP16 support, performance will be negatively affected.

Also try without the --fp16 flag. Your hardware (M2000) doesn't support it.

rmccorm4 on 25 Mar 2020
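
The advice above amounts to enabling FP16 only when the hardware supports it natively. A minimal C++ sketch of that decision follows. The names `Precision` and `choosePrecision` are hypothetical and not from the sample; the `hasFastFp16` argument is assumed to come from TensorRT's `IBuilder::platformHasFastFp16()`, which is kept out of the sketch so it compiles without TensorRT.

```cpp
// Hypothetical precision selector, for illustration only.
enum class Precision { kFp32, kFp16 };

// fp16Requested: the user passed --fp16.
// hasFastFp16:   assumed to be the result of IBuilder::platformHasFastFp16().
// Falls back to FP32 when FP16 is requested on hardware without native
// FP16 support, which is the case the log warning describes.
Precision choosePrecision(bool fp16Requested, bool hasFastFp16) {
    return (fp16Requested && hasFastFp16) ? Precision::kFp16
                                          : Precision::kFp32;
}
```

With this guard, a run with `--fp16` on a card like the M2000 would quietly build in FP32 instead of triggering the Half2 warning.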

My CUDA version is 10.0, and I installed TensorRT 7.0.0.11 from source. Also, your output shows the run taking about 5 minutes. Is that normal? @chiehpower

Half2 support requested on hardware without native FP16 support, performance will be negatively affected.

Also try without the --fp16 flag. Your hardware (M2000) doesn't support it.

I tried without it, but it still takes 6 minutes. I saw this line in the sample:
builder->setFp16Mode(mParams.fp16);

Is FP16 the default? How can I try FP32?

ucaglarcaliskan on 25 Mar 2020

The default is FP32.
From what you describe, the FP32 run (without --fp16) took 6 minutes, which is normal. The reason the FP16 run took 10 minutes is the one Ryan mentioned:

Half2 support requested on hardware without native FP16 support, performance will be negatively affected.

If your device does not natively support FP16, performance will suffer.
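
To illustrate why omitting the flag means FP32, here is a minimal C++ sketch of a parameter struct whose `fp16` field defaults to false and is set only when `--fp16` appears on the command line. This is not the sample's actual argument parsing: `SampleParams` and `parseArgs` are hypothetical names, and only the `fp16` field mirrors the `mParams.fp16` seen in the thread.

```cpp
#include <string>
#include <vector>

// Hypothetical parameter struct; fp16 is off unless explicitly requested,
// so a run without --fp16 builds in FP32.
struct SampleParams {
    bool fp16 = false;
};

// Hypothetical parser: scans the arguments for the --fp16 flag
// and ignores everything else.
SampleParams parseArgs(const std::vector<std::string>& args) {
    SampleParams params;
    for (const std::string& arg : args) {
        if (arg == "--fp16") {
            params.fp16 = true;
        }
    }
    return params;
}
```

Under this assumption, a call such as `builder->setFp16Mode(params.fp16)` receives false for `./sample_uff_maskRCNN -d ../data/faster-rcnn/` and true only when `--fp16` is appended.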

The first run spends a long time building the engine.
You can serialize the engine after building it and deserialize it on later runs, which I believe will be faster than the first run.

BTW, JinTian told me about this; here is the link for reference:
https://github.com/onnx/onnx-tensorrt/issues/413#issuecomment-598564905

chiehpower on 25 Mar 2020
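
The serialize-then-deserialize suggestion above is a build-once cache. The following minimal C++ sketch shows only the caching logic, with hypothetical names (`loadOrBuild`, `buildAndSerialize`) and no TensorRT calls. In a real integration, the bytes would presumably come from `ICudaEngine::serialize()` and be handed to `IRuntime::deserializeCudaEngine()`. A cached engine is also generally tied to the GPU and TensorRT version it was built with, so treat the file as machine-specific.

```cpp
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical cache helper: return the serialized engine stored at
// cachePath if it exists and is non-empty; otherwise run the slow build
// once, write the result to cachePath, and return it.
std::vector<char> loadOrBuild(
    const std::string& cachePath,
    const std::function<std::vector<char>()>& buildAndSerialize) {
    std::ifstream in(cachePath, std::ios::binary);
    if (in) {
        std::vector<char> cached((std::istreambuf_iterator<char>(in)),
                                 std::istreambuf_iterator<char>());
        if (!cached.empty()) {
            return cached;  // Fast path: skip the build entirely.
        }
    }

    // Slow path: this is where the minutes-long engine build would happen.
    std::vector<char> blob = buildAndSerialize();
    std::ofstream out(cachePath, std::ios::binary);
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
    return blob;
}
```

On the first call the build callback runs and the file is written; later calls with the same path read the file and never invoke the callback.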

Closing as I don't think the build time of the sample is an issue. Please feel free to continue to discuss though.

rmccorm4 on 26 Mar 2020
