Hi,
I am working with the sampleUffMaskRCNN example, but the test takes too long: about 10 minutes.
./sample_uff_maskRCNN -d ../data/faster-rcnn/ --fp16
&&&& RUNNING TensorRT.sample_maskrcnn # ./sample_uff_maskRCNN -d ../data/faster-rcnn/ --fp16
[03/24/2020-10:07:39] [I] Building and running a GPU inference engine for Mask RCNN
[03/24/2020-10:07:42] [W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[03/24/2020-10:07:49] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[03/24/2020-10:17:29] [I] [TRT] Detected 1 inputs and 2 output network tensors.
[03/24/2020-10:17:30] [W] [TRT] Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles
[03/24/2020-10:17:35] [I] Run for 10 times with Batch Size 1
[03/24/2020-10:17:35] [I] Average inference time is 541.588 ms/frame
[03/24/2020-10:17:35] [I] Detected dog in../../../data/faster-rcnn/001763.ppm with confidence 99.9171 and coordinates (259.168, 13.8497, 488.325, 370.227)
[03/24/2020-10:17:35] [I] Detected dog in../../../data/faster-rcnn/001763.ppm with confidence 99.8545 and coordinates (27.6872, 45.7848, 317.037, 365.295)
[03/24/2020-10:17:35] [I] The results are stored in current directory: 0.ppm
&&&& PASSED TensorRT.sample_maskrcnn # ./sample_uff_maskRCNN -d ../data/faster-rcnn/ --fp16
What is the reason for that?
My graphics card is a Quadro M2000.
Hi,
Could you provide your environment information (e.g., TRT version, CUDA version) and the full steps you ran?
Also, did you try another precision, such as FP32?
My output is below. My setup is not exactly the same as yours, but it may be useful as a reference.
&&&& RUNNING TensorRT.sample_maskrcnn # ./sample_uff_maskRCNN -d ../data
[01/24/2020-16:55:03] [I] Building and running a GPU inference engine for Mask RCNN
[01/24/2020-16:55:08] [I] [TRT]
[01/24/2020-16:55:08] [I] [TRT] --------------- Layers running on DLA:
[01/24/2020-16:55:08] [I] [TRT]
[01/24/2020-16:55:08] [I] [TRT] --------------- Layers running on GPU:
[01/24/2020-16:55:08] [I] [TRT] conv1/convolution + activation_1/Relu, max_pooling2d_1/MaxPool, res2a_branch2a/convolution + activation_2/Relu, res2a_branch2b/convolution + activation_3/Relu, res2a_branch1/convolution, res2a_branch2c/convolution + add_1/add + res2a_out/Relu, res2b_branch2a/convolution + activation_4/Relu, res2b_branch2b/convolution + activation_5/Relu, res2b_branch2c/convolution + add_2/add + res2b_out/Relu, res2c_branch2a/convolution + activation_6/Relu, res2c_branch2b/convolution + activation_7/Relu, res2c_branch2c/convolution + add_3/add + res2c_out/Relu, res3a_branch2a/convolution + activation_8/Relu, res3a_branch2b/convolution + activation_9/Relu, res3a_branch1/convolution, res3a_branch2c/convolution + add_4/add + res3a_out/Relu, res3b_branch2a/convolution + activation_10/Relu, res3b_branch2b/convolution + activation_11/Relu, res3b_branch2c/convolution + add_5/add + res3b_out/Relu, res3c_branch2a/convolution + activation_12/Relu, res3c_branch2b/convolution + activation_13/Relu, res3c_branch2c/convolution + add_6/add + res3c_out/Relu, res3d_branch2a/convolution + activation_14/Relu, res3d_branch2b/convolution + activation_15/Relu, res3d_branch2c/convolution + add_7/add + res3d_out/Relu, res4a_branch2a/convolution + activation_16/Relu, res4a_branch2b/convolution + activation_17/Relu, res4a_branch1/convolution, res4a_branch2c/convolution + add_8/add + res4a_out/Relu, res4b_branch2a/convolution + activation_18/Relu, res4b_branch2b/convolution + activation_19/Relu, res4b_branch2c/convolution + add_9/add + res4b_out/Relu, res4c_branch2a/convolution + activation_20/Relu, res4c_branch2b/convolution + activation_21/Relu, res4c_branch2c/convolution + add_10/add + res4c_out/Relu, res4d_branch2a/convolution + activation_22/Relu, res4d_branch2b/convolution + activation_23/Relu, res4d_branch2c/convolution + add_11/add + res4d_out/Relu, res4e_branch2a/convolution + activation_24/Relu, res4e_branch2b/convolution + activation_25/Relu, 
res4e_branch2c/convolution + add_12/add + res4e_out/Relu, res4f_branch2a/convolution + activation_26/Relu, res4f_branch2b/convolution + activation_27/Relu, res4f_branch2c/convolution + add_13/add + res4f_out/Relu, res4g_branch2a/convolution + activation_28/Relu, res4g_branch2b/convolution + activation_29/Relu, res4g_branch2c/convolution + add_14/add + res4g_out/Relu, res4h_branch2a/convolution + activation_30/Relu, res4h_branch2b/convolution + activation_31/Relu, res4h_branch2c/convolution + add_15/add + res4h_out/Relu, res4i_branch2a/convolution + activation_32/Relu, res4i_branch2b/convolution + activation_33/Relu, res4i_branch2c/convolution + add_16/add + res4i_out/Relu, res4j_branch2a/convolution + activation_34/Relu, res4j_branch2b/convolution + activation_35/Relu, res4j_branch2c/convolution + add_17/add + res4j_out/Relu, res4k_branch2a/convolution + activation_36/Relu, res4k_branch2b/convolution + activation_37/Relu, res4k_branch2c/convolution + add_18/add + res4k_out/Relu, res4l_branch2a/convolution + activation_38/Relu, res4l_branch2b/convolution + activation_39/Relu, res4l_branch2c/convolution + add_19/add + res4l_out/Relu, res4m_branch2a/convolution + activation_40/Relu, res4m_branch2b/convolution + activation_41/Relu, res4m_branch2c/convolution + add_20/add + res4m_out/Relu, res4n_branch2a/convolution + activation_42/Relu, res4n_branch2b/convolution + activation_43/Relu, res4n_branch2c/convolution + add_21/add + res4n_out/Relu, res4o_branch2a/convolution + activation_44/Relu, res4o_branch2b/convolution + activation_45/Relu, res4o_branch2c/convolution + add_22/add + res4o_out/Relu, res4p_branch2a/convolution + activation_46/Relu, res4p_branch2b/convolution + activation_47/Relu, res4p_branch2c/convolution + add_23/add + res4p_out/Relu, res4q_branch2a/convolution + activation_48/Relu, res4q_branch2b/convolution + activation_49/Relu, res4q_branch2c/convolution + add_24/add + res4q_out/Relu, res4r_branch2a/convolution + activation_50/Relu, 
res4r_branch2b/convolution + activation_51/Relu, res4r_branch2c/convolution + add_25/add + res4r_out/Relu, res4s_branch2a/convolution + activation_52/Relu, res4s_branch2b/convolution + activation_53/Relu, res4s_branch2c/convolution + add_26/add + res4s_out/Relu, res4t_branch2a/convolution + activation_54/Relu, res4t_branch2b/convolution + activation_55/Relu, res4t_branch2c/convolution + add_27/add + res4t_out/Relu, res4u_branch2a/convolution + activation_56/Relu, res4u_branch2b/convolution + activation_57/Relu, res4u_branch2c/convolution + add_28/add + res4u_out/Relu, res4v_branch2a/convolution + activation_58/Relu, res4v_branch2b/convolution + activation_59/Relu, res4v_branch2c/convolution + add_29/add + res4v_out/Relu, res4w_branch2a/convolution + activation_60/Relu, res4w_branch2b/convolution + activation_61/Relu, res4w_branch2c/convolution + add_30/add + res4w_out/Relu, res5a_branch2a/convolution + activation_62/Relu, res5a_branch2b/convolution + activation_63/Relu, res5a_branch1/convolution, res5a_branch2c/convolution + add_31/add + res5a_out/Relu, res5b_branch2a/convolution + activation_64/Relu, res5b_branch2b/convolution + activation_65/Relu, res5b_branch2c/convolution + add_32/add + res5b_out/Relu, res5c_branch2a/convolution + activation_66/Relu, res5c_branch2b/convolution + activation_67/Relu, res5c_branch2c/convolution + add_33/add + res5c_out/Relu, fpn_c5p5/convolution, fpn_p5upsampled, fpn_c4p4/convolution + fpn_p4add/add, fpn_p4upsampled, fpn_c3p3/convolution + fpn_p3add/add, fpn_p3upsampled, fpn_c2p2/convolution + fpn_p2add/add, fpn_p2/convolution, rpn_model/rpn_conv_shared/convolution + rpn_model/rpn_conv_shared/Relu, rpn_model/rpn_class_raw/convolution || rpn_model/rpn_bbox_pred/convolution, rpn_model/permute_1/transpose + (Unnamed Layer* 1735) [Shuffle] + rpn_model/reshape_1/Reshape, rpn_model/rpn_class_xxx/Max, fusedPointwiseNode(rpn_model/rpn_class_xxx/sub, rpn_model/rpn_class_xxx/Exp), rpn_model/rpn_class_xxx/Sum, 
rpn_model/rpn_class_xxx/truediv, fpn_p3/convolution, rpn_model_1/rpn_conv_shared/convolution + rpn_model_1/rpn_conv_shared/Relu, rpn_model_1/rpn_class_raw/convolution || rpn_model_1/rpn_bbox_pred/convolution, rpn_model_1/permute_1/transpose + (Unnamed Layer* 1757) [Shuffle] + rpn_model_1/reshape_1/Reshape, rpn_model_1/rpn_class_xxx/Max, fusedPointwiseNode(rpn_model_1/rpn_class_xxx/sub, rpn_model_1/rpn_class_xxx/Exp), rpn_model_1/rpn_class_xxx/Sum, rpn_model_1/rpn_class_xxx/truediv, fpn_p4/convolution, rpn_model_2/rpn_conv_shared/convolution + rpn_model_2/rpn_conv_shared/Relu, rpn_model_2/rpn_class_raw/convolution || rpn_model_2/rpn_bbox_pred/convolution, rpn_model_2/permute_1/transpose + (Unnamed Layer* 1779) [Shuffle] + rpn_model_2/reshape_1/Reshape, rpn_model_2/rpn_class_xxx/Max, fusedPointwiseNode(rpn_model_2/rpn_class_xxx/sub, rpn_model_2/rpn_class_xxx/Exp), rpn_model_2/rpn_class_xxx/Sum, rpn_model_2/rpn_class_xxx/truediv, fpn_p5/convolution, rpn_model_3/rpn_conv_shared/convolution + rpn_model_3/rpn_conv_shared/Relu, rpn_model_3/rpn_class_raw/convolution || rpn_model_3/rpn_bbox_pred/convolution, rpn_model_3/permute_1/transpose + (Unnamed Layer* 1801) [Shuffle] + rpn_model_3/reshape_1/Reshape, rpn_model_3/rpn_class_xxx/Max, fusedPointwiseNode(rpn_model_3/rpn_class_xxx/sub, rpn_model_3/rpn_class_xxx/Exp), rpn_model_3/rpn_class_xxx/Sum, rpn_model_3/rpn_class_xxx/truediv, fpn_p6/MaxPool, rpn_model_4/rpn_conv_shared/convolution + rpn_model_4/rpn_conv_shared/Relu, rpn_model_4/rpn_class_raw/convolution || rpn_model_4/rpn_bbox_pred/convolution, rpn_model_4/permute_1/transpose + (Unnamed Layer* 1820) [Shuffle] + rpn_model_4/reshape_1/Reshape, rpn_model_4/rpn_class_xxx/Max, fusedPointwiseNode(rpn_model_4/rpn_class_xxx/sub, rpn_model_4/rpn_class_xxx/Exp), rpn_model_4/rpn_class_xxx/Sum, rpn_model_4/rpn_class_xxx/truediv, rpn_model/reshape_2/Reshape, rpn_model_1/reshape_2/Reshape, rpn_model_2/reshape_2/Reshape, rpn_model_3/reshape_2/Reshape, rpn_model_4/reshape_2/Reshape, 
ROI, roi_align_classifier, mrcnn_class_conv1/convolution + activation_68/Relu, mrcnn_class_conv2/convolution + activation_69/Relu, mrcnn_bbox_fc/MatMul + mrcnn_bbox_fc/BiasAdd, mrcnn_class_logits/MatMul + mrcnn_class_logits/BiasAdd, mrcnn_class/Softmax, mrcnn_detection, mrcnn_detection_bboxes, roi_align_mask_trt, mrcnn_mask_conv1/convolution + activation_71/Relu, mrcnn_mask_conv2/convolution + activation_72/Relu, mrcnn_mask_conv3/convolution + activation_73/Relu, mrcnn_mask_conv4/convolution + activation_74/Relu, (Unnamed Layer* 1997) [Deconvolution] + mrcnn_mask_deconv/conv2d_transpose, mrcnn_mask_deconv/BiasAdd + mrcnn_mask_deconv/Relu, mrcnn_mask/convolution, mrcnn_mask/Sigmoid,
[01/24/2020-16:59:49] [I] [TRT] Detected 1 inputs and 2 output network tensors.
[01/24/2020-16:59:54] [I] Run for 10 times with Batch Size 1
[01/24/2020-16:59:54] [I] Average inference time is 452.944 ms/frame
[01/24/2020-16:59:54] [I] Detected dog in../data/001763.ppm with confidence 99.9171 and coordinates (259.165, 13.8516, 488.325, 370.222)
[01/24/2020-16:59:54] [I] Detected dog in../data/001763.ppm with confidence 99.8545 and coordinates (27.6855, 45.785, 317.039, 365.296)
[01/24/2020-16:59:54] [I] The results are stored in current directory: 0.ppm
&&&& PASSED TensorRT.sample_maskrcnn # ./sample_uff_maskRCNN -d ../data
BTW, I ran this test with TRT6 on AGX.
Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
Also try without the --fp16 flag. Your hardware (M2000) doesn't support it.
My CUDA version is 10.0 and I installed TensorRT 7.0.0.11 from source. Also, in your output, it takes 5 minutes. Is that normal? @chiehpower
Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
Also try without the --fp16 flag. Your hardware (M2000) doesn't support it.
I tried without that flag, but it still takes 6 minutes. I saw that there is a line like
builder->setFp16Mode(mParams.fp16);
Is FP16 the default? How can I try with FP32?
To see the inference time alone, you should check this line:
[03/24/2020-10:17:35] [I] Average inference time is 541.588 ms/frame
It is common for TensorRT to take minutes to build an engine, especially for a deep backbone like the ResNet101 used here. In my experiments, the inference time can reach about 40 ms/frame on a T4.
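To make the split between build time and inference time concrete, here is a minimal Python sketch that pulls both numbers out of a log like the one posted above. The function name `summarize` is made up for illustration, and it assumes that the "Detected ... output network tensors" message marks the end of the engine build, which is an assumption about the log layout rather than something TensorRT documents.

```python
import re
from datetime import datetime

# Timestamp format used by the sample's logger, e.g. [03/24/2020-10:07:39]
STAMP = re.compile(r"\[(\d{2}/\d{2}/\d{4}-\d{2}:\d{2}:\d{2})\]")
AVG = re.compile(r"Average inference time is ([\d.]+) ms/frame")

def summarize(log_text):
    """Return approximate build time (s) and reported inference time (ms)."""
    first = None       # first timestamp seen in the log
    build_end = None   # assumed end of engine build (see lead-in)
    avg_ms = None
    for line in log_text.splitlines():
        m = STAMP.search(line)
        if not m:
            continue
        t = datetime.strptime(m.group(1), "%m/%d/%Y-%H:%M:%S")
        if first is None:
            first = t
        if build_end is None and "output network tensors" in line:
            build_end = t
        a = AVG.search(line)
        if a:
            avg_ms = float(a.group(1))
    build_s = None
    if first is not None and build_end is not None:
        build_s = (build_end - first).total_seconds()
    return {"build_seconds": build_s, "avg_inference_ms": avg_ms}

# A few lines taken from the log in this thread.
LOG = """\
[03/24/2020-10:07:39] [I] Building and running a GPU inference engine for Mask RCNN
[03/24/2020-10:17:29] [I] [TRT] Detected 1 inputs and 2 output network tensors.
[03/24/2020-10:17:35] [I] Average inference time is 541.588 ms/frame
"""

if __name__ == "__main__":
    print(summarize(LOG))
```

On the log above this attributes roughly 590 seconds to the build and about 0.54 seconds to each inference, so almost all of the 10 minutes is engine construction.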
The default is FP32.
So, as you said, the run with FP32 (without --fp16) took 6 minutes, which is a normal build time. As for why the FP16 run took 10 minutes, Ryan already pointed to the reason:
Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
If your device does not natively support FP16, performance will suffer.
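One way to avoid this situation is to request FP16 only when the hardware reports fast FP16 support. The sketch below is not what the sample does (the sample enables FP16 whenever --fp16 is passed); `effective_precision` and `configure_builder` are hypothetical helper names, and the TensorRT part assumes the Python bindings are installed and that you already have a builder and a builder config.

```python
def effective_precision(fp16_requested, has_fast_fp16):
    """Fall back to FP32 unless FP16 was requested AND the GPU has fast FP16."""
    return "fp16" if (fp16_requested and has_fast_fp16) else "fp32"

def configure_builder(builder, config, fp16_requested):
    """Apply the choice to a TensorRT builder config (hypothetical helper).

    Assumes `builder` is a tensorrt.Builder and `config` comes from
    builder.create_builder_config(). TensorRT is imported lazily so the
    pure function above can be used without it.
    """
    import tensorrt as trt
    precision = effective_precision(fp16_requested,
                                    builder.platform_has_fast_fp16)
    if precision == "fp16":
        config.set_flag(trt.BuilderFlag.FP16)
    # With no flag set, the engine is built in FP32.
    return precision
```

With this guard, passing --fp16 on a card without native FP16 would simply build an FP32 engine instead of triggering the warning above.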
The first run takes a long time because it has to build the engine.
You can serialize the engine after building it and deserialize it on later runs. That should be much faster than the first run.
BTW, JinTian told me about this. Here it is for your reference:
https://github.com/onnx/onnx-tensorrt/issues/413#issuecomment-598564905
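The serialize-then-deserialize idea mentioned above can be sketched in Python as a small cache around the build step. This is an illustration under assumptions, not code from the sample: `load_or_build`, `deserialize_with_tensorrt`, and the plan file path are hypothetical names, and the build callable is expected to return the bytes you would get from `engine.serialize()`.

```python
import os

def load_or_build(plan_path, build_serialized):
    """Return serialized engine bytes, building only if no cached plan exists.

    `build_serialized` is any zero-argument callable that builds the engine
    and returns its serialized form (for TensorRT: engine.serialize()).
    """
    if os.path.exists(plan_path):
        with open(plan_path, "rb") as f:
            return f.read()          # fast path: skip the build entirely
    data = bytes(build_serialized())  # slow path: minutes for Mask R-CNN
    with open(plan_path, "wb") as f:
        f.write(data)
    return data

def deserialize_with_tensorrt(data):
    """Turn cached bytes back into an engine (needs TensorRT installed)."""
    import tensorrt as trt
    logger = trt.Logger(trt.Logger.WARNING)
    # Mask R-CNN relies on plugin layers, so register the plugins first.
    trt.init_libnvinfer_plugins(logger, "")
    runtime = trt.Runtime(logger)
    return runtime.deserialize_cuda_engine(data)
```

A cached plan is generally only valid for the GPU model and TensorRT version it was built with, so the file should be rebuilt after changing either one.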
Closing as I don't think the build time of the sample is an issue. Please feel free to continue to discuss though.