I'm trying to convert the frozen weights from faster_rcnn_resnet50_coco to TensorRT and I'm getting the following error when I call session.run():
2018-12-06 12:21:53.405304: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=100
2018-12-06 12:21:53.405458: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_1
Is this a bug? If not, what is the cause of this error?
Here is a minimal example of the code:
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=4000000000,
    precision_mode='FP16',
    minimum_segment_size=50
)
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)
tf.import_graph_def(trt_graph, name='')
scores, boxes, classes, num_detections = tf_sess.run(
    [tf_scores, tf_boxes, tf_classes, tf_num_detections],
    feed_dict={tf_input: image_resized[None, ...]})
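(For completeness: the tf_input / tf_scores / etc. tensors above are fetched from the imported graph. A minimal sketch, assuming the standard TF Object Detection API tensor names, which may differ in other exports:)

# Hypothetical lookups; tensor names assume the standard TF Object
# Detection API frozen graph.
tf_input = tf_sess.graph.get_tensor_by_name('image_tensor:0')
tf_scores = tf_sess.graph.get_tensor_by_name('detection_scores:0')
tf_boxes = tf_sess.graph.get_tensor_by_name('detection_boxes:0')
tf_classes = tf_sess.graph.get_tensor_by_name('detection_classes:0')
tf_num_detections = tf_sess.graph.get_tensor_by_name('num_detections:0')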
I'm following this guide in an NVIDIA repo if you want to see the complete code. Supposedly someone else got the same Faster R-CNN working there, so it should work.
Has anyone successfully converted a Faster R-CNN model in TensorFlow 1.12?
Hi @samikama, can you please take a look at this query? Thanks.
Hi @atyshka, I'm facing the same problem you've met. Did you manage to get a Faster R-CNN detection model working with TF-TRT?
@cheneyoung I have not. Can you post some system information so we can get to the bottom of this? I'm running TF 1.12 with TensorRT 5 on Ubuntu 18.04 on a Jetson Xavier.
@atyshka @samikama
I'm facing exactly the same issue. Here are my experiment settings:
Faster R-CNN model for object detection
TensorFlow 1.12
TensorRT 5.0.2.6
CUDA/cuDNN: 9.0/7.1.4
Faster R-CNN is an important and computationally heavy model for inference, but it somehow does not seem to be supported by the TRT engine yet, perhaps because of the RPN and ROI stages.
Error log:
2019-02-13 10:11:33.585022: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=21660
2019-02-13 10:11:33.585071: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for GridAnchorGenerator/my_trt_op_3
2019-02-13 10:11:33.586279: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=21660
2019-02-13 10:11:33.586320: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for ClipToWindow/my_trt_op_1
2019-02-13 10:11:33.586870: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=21660
2019-02-13 10:11:33.586890: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for ClipToWindow/Area/my_trt_op_0
2019-02-13 10:11:33.603856: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=4800
2019-02-13 10:11:33.603902: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_4
2019-02-13 10:11:34.833328: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=4800
2019-02-13 10:11:34.833344: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=4800
2019-02-13 10:11:34.833363: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for SecondStageBoxPredictor/BoxEncodingPredictor/my_trt_op_5
2019-02-13 10:11:34.833414: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for SecondStageBoxPredictor/ClassPredictor/my_trt_op_6
@oscarriddle I don't think it's a lack of support; I've seen several people who successfully converted Faster R-CNN models in TF 1.8. Newer versions seem to be having the problem.
@atyshka Actually, I tried TF 1.7 + TRT 4 to convert the Faster R-CNN, but it failed at the conversion stage, while TF 1.12 + TRT 5 successfully converts the graph but then fails at the inference stage as reported. So from my point of view, the newer TRT is more likely to complete the job.
Any update on this?
Facing the same issue.
TensorFlow 1.13.1
TensorRT 5.0.2.6
CUDA/cuDNN: 10.0/7.4.1
I fixed this problem by setting the batch size to num_images x 300. For instance, if you're going to process 8 images at a time, set the batch size to 2400. (The factor of 300 presumably corresponds to the number of region proposals the second stage processes per image, so the internal batch becomes num_images x proposals.)
num_images = 8
trt_graph = trt.create_inference_graph(
    input_graph_def=tf.get_default_graph().as_graph_def(),
    outputs=output_node,
    max_batch_size=num_images * 300,  # largest batch any TRT segment will see
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)
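A quick sanity check after conversion, if helpful (a sketch; the TRTEngineOp type is what the my_trt_op_N nodes in the logs above are instances of):

# Count how many segments were actually converted to TensorRT engines.
num_engines = len([n for n in trt_graph.node if n.op == 'TRTEngineOp'])
print('TensorRT engine ops in graph: %d' % num_engines)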
On V100 the performance gain for FP16 was about 20%; I'm going to try INT8 next.
TensorFlow 1.13.1
TensorRT 5.0.2.6
CUDA/cuDNN: 10.0/7.4.1
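For anyone attempting INT8: in the TF 1.x contrib TF-TRT API, INT8 needs a calibration pass before the final graph can be built. A minimal sketch, reusing the names above (output_node, num_images) plus a hypothetical set of calibration batches calib_batches; the 'image_tensor:0' name assumes the standard TF Object Detection API graph:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Step 1: build a calibration graph instead of a final engine.
calib_graph = trt.create_inference_graph(
    input_graph_def=tf.get_default_graph().as_graph_def(),
    outputs=output_node,
    max_batch_size=num_images * 300,
    max_workspace_size_bytes=1 << 25,
    precision_mode='INT8',
    minimum_segment_size=50
)

# Step 2: run the calibration graph on representative data so TensorRT
# can collect activation ranges. calib_batches is a hypothetical dataset;
# output_node is assumed to be a list of node names.
with tf.Graph().as_default(), tf.Session() as sess:
    tf.import_graph_def(calib_graph, name='')
    for batch in calib_batches:
        sess.run([name + ':0' for name in output_node],
                 feed_dict={'image_tensor:0': batch})

# Step 3: replace the calibration nodes with INT8 engines.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)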
This is because the batch size changes inside your graph; you need to set the max_batch_size parameter to the largest batch size any op sees. In my case there is a concat op in the graph, so some ops receive 2x the batch size (32 instead of 16). I set max_batch_size = 32 and everything works. See the sketch below.
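A minimal sketch of the concat case, with hypothetical shapes and names (not the actual model from this thread):

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

batch_size = 16
a = tf.placeholder(tf.float32, [batch_size, 128], name='a')
b = tf.placeholder(tf.float32, [batch_size, 128], name='b')
# Concatenating along axis 0 doubles the batch seen downstream: 2 * 16 = 32.
merged = tf.concat([a, b], axis=0, name='merged')

trt_graph = trt.create_inference_graph(
    input_graph_def=tf.get_default_graph().as_graph_def(),
    outputs=['merged'],
    max_batch_size=2 * batch_size,  # 32: the largest batch any segment sees
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=3
)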
Hi there,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce it, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue anymore, please consider closing it.