I'm trying to convert the frozen weights from faster_rcnn_resnet50_coco to TensorRT and I'm getting the following error when I call session.run():
2018-12-06 12:21:53.405304: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=1, requested batch=100
2018-12-06 12:21:53.405458: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_1
Is this a bug? If not, what is the cause of this error?
Here is a minimal example of the code:
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=4000000000,
    precision_mode='FP16',
    minimum_segment_size=50
)
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)
tf.import_graph_def(trt_graph, name='')
scores, boxes, classes, num_detections = tf_sess.run(
    [tf_scores, tf_boxes, tf_classes, tf_num_detections],
    feed_dict={tf_input: image_resized[None, ...]})
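(For completeness: the tf_input / tf_scores / etc. tensors above are fetched from the imported graph. A minimal sketch, assuming the standard TF Object Detection API tensor names, which may differ in other exports:)

# Hypothetical lookups; tensor names assume the standard TF Object
# Detection API frozen graph.
tf_input = tf_sess.graph.get_tensor_by_name('image_tensor:0')
tf_scores = tf_sess.graph.get_tensor_by_name('detection_scores:0')
tf_boxes = tf_sess.graph.get_tensor_by_name('detection_boxes:0')
tf_classes = tf_sess.graph.get_tensor_by_name('detection_classes:0')
tf_num_detections = tf_sess.graph.get_tensor_by_name('num_detections:0')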
I'm following this guide in an NVIDIA repo if you want to see the complete code. Supposedly someone else got the same Faster R-CNN working there, so it should work.
Has anyone successfully converted a Faster R-CNN model in TensorFlow 1.12?
Hi @samikama, can you please take a look at this query? Thanks.
Hi @atyshka, I'm facing the same problem you've met. Did you manage to get a Faster R-CNN detection model working with TF-TRT?
@cheneyoung I have not. Can you post some system information so we can get to the bottom of this? I'm running TF 1.12 with TensorRT 5 on Ubuntu 18.04 on a Jetson Xavier.
@atyshka @samikama
I'm facing exactly the same issue. Here are my experiment settings:
Faster R-CNN model for object detection
TensorFlow 1.12
TensorRT 5.0.2.6
CUDA/cuDNN: 9.0/7.1.4
Faster R-CNN is an important and computationally heavy model for inference, but it somehow does not seem to be supported by the TRT engine yet, perhaps because of the RPN and ROI stages.
Error log:
2019-02-13 10:11:33.585022: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=21660
2019-02-13 10:11:33.585071: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for GridAnchorGenerator/my_trt_op_3
2019-02-13 10:11:33.586279: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=21660
2019-02-13 10:11:33.586320: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for ClipToWindow/my_trt_op_1
2019-02-13 10:11:33.586870: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=21660
2019-02-13 10:11:33.586890: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for ClipToWindow/Area/my_trt_op_0
2019-02-13 10:11:33.603856: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=4800
2019-02-13 10:11:33.603902: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for my_trt_op_4
2019-02-13 10:11:34.833328: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=4800
2019-02-13 10:11:34.833344: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:260] Engine buffer is full. buffer limit=1, current entries=16, requested batch=4800
2019-02-13 10:11:34.833363: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for SecondStageBoxPredictor/BoxEncodingPredictor/my_trt_op_5
2019-02-13 10:11:34.833414: W tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:277] Failed to get engine batch, running native segment for SecondStageBoxPredictor/ClassPredictor/my_trt_op_6
@oscarriddle I don't think it's a lack of support; I've seen several people who successfully converted Faster R-CNN models in TF 1.8. Newer versions seem to be having the problem.
@atyshka Actually, I tried TF 1.7 + TRT 4 to convert the Faster R-CNN, but it failed at the conversion stage, while TF 1.12 + TRT 5 successfully converts the graph but then fails at the inference stage as reported. So from my point of view, the newer TRT is more likely to complete the job.
Any update on this?
Facing the same issue.
TensorFlow 1.13.1
TensorRT 5.0.2.6
CUDA/cuDNN: 10.0/7.4.1
I fixed this problem by setting the batch size to num_images x 300. For instance, if you're going to process 8 images at a time, set the batch size to 2400. (The factor of 300 presumably corresponds to the number of region proposals the second stage processes per image, so the internal batch becomes num_images x proposals.)
num_images = 8
trt_graph = trt.create_inference_graph(
    input_graph_def=tf.get_default_graph().as_graph_def(),
    outputs=output_node,
    max_batch_size=num_images * 300,  # largest batch any TRT segment will see
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)
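A quick sanity check after conversion, if helpful (a sketch; the TRTEngineOp type is what the my_trt_op_N nodes in the logs above are instances of):

# Count how many segments were actually converted to TensorRT engines.
num_engines = len([n for n in trt_graph.node if n.op == 'TRTEngineOp'])
print('TensorRT engine ops in graph: %d' % num_engines)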
On V100 the performance gain for FP16 was about 20%; I'm going to try INT8 next.
TensorFlow 1.13.1
TensorRT 5.0.2.6
CUDA/cuDNN: 10.0/7.4.1
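For anyone attempting INT8: in the TF 1.x contrib TF-TRT API, INT8 needs a calibration pass before the final graph can be built. A minimal sketch, reusing the names above (output_node, num_images) plus a hypothetical set of calibration batches calib_batches; the 'image_tensor:0' name assumes the standard TF Object Detection API graph:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Step 1: build a calibration graph instead of a final engine.
calib_graph = trt.create_inference_graph(
    input_graph_def=tf.get_default_graph().as_graph_def(),
    outputs=output_node,
    max_batch_size=num_images * 300,
    max_workspace_size_bytes=1 << 25,
    precision_mode='INT8',
    minimum_segment_size=50
)

# Step 2: run the calibration graph on representative data so TensorRT
# can collect activation ranges. calib_batches is a hypothetical dataset;
# output_node is assumed to be a list of node names.
with tf.Graph().as_default(), tf.Session() as sess:
    tf.import_graph_def(calib_graph, name='')
    for batch in calib_batches:
        sess.run([name + ':0' for name in output_node],
                 feed_dict={'image_tensor:0': batch})

# Step 3: replace the calibration nodes with INT8 engines.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)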
This is because the batch size changes inside your graph; you need to set the max_batch_size parameter to the largest batch size any op sees. In my case there is a concat op in the graph, so some ops receive 2x the batch size (32 instead of 16). I set max_batch_size = 32 and everything works. See the sketch below.
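A minimal sketch of the concat case, with hypothetical shapes and names (not the actual model from this thread):

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

batch_size = 16
a = tf.placeholder(tf.float32, [batch_size, 128], name='a')
b = tf.placeholder(tf.float32, [batch_size, 128], name='b')
# Concatenating along axis 0 doubles the batch seen downstream: 2 * 16 = 32.
merged = tf.concat([a, b], axis=0, name='merged')

trt_graph = trt.create_inference_graph(
    input_graph_def=tf.get_default_graph().as_graph_def(),
    outputs=['merged'],
    max_batch_size=2 * batch_size,  # 32: the largest batch any segment sees
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=3
)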
Hi there,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce it, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue anymore, please consider closing it.