Pre:
The static-shape version of this network builds fine.
I'm trying to run a Caffe version of RetinaNet (upsampling is implemented with Deconvolution layers) using the new dynamic shapes feature. The build fails at the fused upsampling + crop node with these logs:
[TensorRT] ERROR: Internal error: could not find any implementation for node rf_c3_upsampling_ + crop0, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
[TensorRT] ERROR: ../builder/tacticOptimizer.cpp (1461) - OutOfMemory Error in computeCosts: 0
TensorRT Version: TensorRT 6.0.1.5
GPU Type: GTX 1060
Nvidia Driver Version: 418.56
CUDA Version: release 10.0, V10.0.130
CUDNN Version: 7.6.3
Operating System + Version: Linux 4.15.0-66-generic #75~16.04.1-Ubuntu x86_64
Python Version (if applicable): Python 3.5.2
TensorFlow Version (if applicable): None
PyTorch Version (if applicable): None
Baremetal or Container (if container which image + tag): None
Engine-building code:
# dynamic shape
flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(TRT_LOGGER) as builder, builder.create_network(flag) as network, trt.CaffeParser() as parser:
    _check_precision_support(builder, precision)
    # builder.max_workspace_size = max_workspace
    # builder.max_batch_size = max_batchsize
    builder.max_workspace_size = 6 << 30
    builder.max_batch_size = 1
    print(builder.max_workspace_size)
    print(builder.max_batch_size)
    # Set precision mode
    if 'FP16' == precision.upper():
        builder.fp16_mode = True
    if 'INT8' == precision.upper():
        try:
            calibrator = kwargs['calibrator']
        except KeyError as e:
            raise KeyError("INT8 mode needs a calibrator.")
        builder.int8_calibrator = calibrator
        builder.int8_mode = True
    # Mark output nodes
    model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=dtype)
    [network.mark_output(model_tensors.find(output)) for output in output_nodes]
    # dynamic shape
    # for i in range(0, network.num_inputs):
    #     input_tensor = network.get_input(0)
    #     input_tensor.shape = trt.Dims([input_tensor.shape[0], input_tensor.shape[1], -1, -1])
    profile = builder.create_optimization_profile()
    config = builder.create_builder_config()
    print(network.get_input(0).name, network.get_input(0).shape)
    profile.set_shape(
        network.get_input(0).name,
        (network.get_input(0).shape[0], 3, 20, 20),
        (network.get_input(0).shape[0], 3, 30, 30),
        (network.get_input(0).shape[0], 3, 40, 40))
    config.add_optimization_profile(profile)
    # TensorRTHelper.parse_network(network)
    engine = builder.build_engine(network, config)
    return engine
Question:
Looking forward to your response.
Use build_cuda_engine for static shapes and build_engine for dynamic shapes. With build_cuda_engine, all parameters are set on the builder; with build_engine, all parameters must be set on the config instead. In my case:
# builder.max_workspace_size = max_workspace
# builder.max_batch_size = max_batchsize
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
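To make the difference concrete, here is a minimal sketch of the dynamic-shape path with every setting placed on the config. The helper name build_dynamic_engine and its parameters are hypothetical, not part of TensorRT. It assumes a TensorRT 6/7-style Python API, a network created with the EXPLICIT_BATCH flag and already parsed, and an NCHW input whose batch and channel dimensions are fixed.

```python
def build_dynamic_engine(builder, network, min_hw, opt_hw, max_hw,
                         workspace_bytes=1 << 30):
    """Build an engine with one optimization profile for input 0.

    Hypothetical helper: `builder` is expected to be a trt.Builder and
    `network` an explicit-batch network that has already been parsed.
    """
    config = builder.create_builder_config()
    # build_engine() takes its settings from the config, so the workspace
    # limit goes here rather than on builder.max_workspace_size.
    config.max_workspace_size = workspace_bytes

    inp = network.get_input(0)
    # Assumption: batch and channel dimensions are fixed (not -1).
    batch, channels = inp.shape[0], inp.shape[1]

    # One profile describing the min / opt / max spatial sizes.
    profile = builder.create_optimization_profile()
    profile.set_shape(
        inp.name,
        (batch, channels) + tuple(min_hw),
        (batch, channels) + tuple(opt_hw),
        (batch, channels) + tuple(max_hw),
    )
    config.add_optimization_profile(profile)

    return builder.build_engine(network, config)
```

With the code from the question, this would be called inside the `with` block after parsing, for example as build_dynamic_engine(builder, network, (20, 20), (30, 30), (40, 40)).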
I encountered the same problem, and the build still fails even after I increased the workspace size to 20 GB.
My model is ssd_inception_v2 converted from TensorFlow, and the error is:
could not find any implementation for node FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_6_3x3_s2_128/Conv2D, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
The failing layer is a stride-2 convolution, and its input shape is NHWC = (1, 2, 4, 64).
But the same layer in another model builds fine, so I don't think the layer is actually unsupported.