TensorRT: Building engine error: could not find any implementation for node rf_c3_upsampling_ (a deconvolution layer) + crop0

Created on 11 Nov 2019 · 2 Comments · Source: NVIDIA/TensorRT

Description

Preliminary note: the static-shapes version of this network builds fine.

I'm trying to run a Caffe version of RetinaNet (upsampling is implemented with Deconvolution layers) using the new Dynamic Shapes feature. The build fails at the upsampling + crop node with the following log:

[TensorRT] ERROR: Internal error: could not find any implementation for node rf_c3_upsampling_ + crop0, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
[TensorRT] ERROR: ../builder/tacticOptimizer.cpp (1461) - OutOfMemory Error in computeCosts: 0

Environment

TensorRT Version: TensorRT 6.0.1.5
GPU Type: GTX 1060
Nvidia Driver Version: 418.56
CUDA Version: release 10.0, V10.0.130
CUDNN Version: 7.6.3
Operating System + Version: Linux 4.15.0-66-generic #75~16.04.1-Ubuntu x86_64
Python Version (if applicable): Python 3.5.2
TensorFlow Version (if applicable): None
PyTorch Version (if applicable): None
Baremetal or Container (if container which image + tag): None

Steps To Reproduce

Engine-building code:

# dynamic shape
flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(TRT_LOGGER) as builder, builder.create_network(flag) as network, trt.CaffeParser() as parser:

  _check_precision_support(builder, precision)

  # builder.max_workspace_size = max_workspace
  # builder.max_batch_size = max_batchsize
  builder.max_workspace_size = 6 << 30
  builder.max_batch_size = 1

  print(builder.max_workspace_size)
  print(builder.max_batch_size)

  # Set precision mode
  if 'FP16' == precision.upper():
    builder.fp16_mode = True
  if 'INT8' == precision.upper():
    try:
      calibrator = kwargs['calibrator']
    except KeyError as e:
      raise KeyError("INT8 mode needs a calibrator.")
    builder.int8_calibrator = calibrator
    builder.int8_mode = True

  # Mark output nodes
  model_tensors = parser.parse(deploy=deploy_file, model=model_file, network=network, dtype=dtype)
  [network.mark_output(model_tensors.find(output)) for output in output_nodes]

  # dynamic shape
  # for i in range(0, network.num_inputs):
  #   input_tensor = network.get_input(0)
  #   input_tensor.shape = trt.Dims([input_tensor.shape[0], input_tensor.shape[1], -1, -1])
  profile = builder.create_optimization_profile()
  config = builder.create_builder_config()
  print(network.get_input(0).name, network.get_input(0).shape)
  profile.set_shape(
    network.get_input(0).name,
    (network.get_input(0).shape[0], 3, 20, 20),
    (network.get_input(0).shape[0], 3, 30, 30),
    (network.get_input(0).shape[0], 3, 40, 40))
  config.add_optimization_profile(profile)

  # TensorRTHelper.parse_network(network)
  engine = builder.build_engine(network, config)

  return engine

Questions:

Is there really no implementation for deconv + crop with dynamic shapes? If so, when will it be supported?
How are "no implementation" and the max workspace size related? Why does the hint tell me to set a larger max workspace?

Looking forward to your response.

Python 6.x

Alnlll

Most helpful comment

Use build_cuda_engine for static shapes and build_engine for dynamic shapes. With build_cuda_engine, all parameters are set on the builder; with build_engine, they must be set on the config instead. In my case:

# builder.max_workspace_size = max_workspace
# builder.max_batch_size = max_batchsize

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30
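
To make the pattern above concrete, here is a minimal sketch of a dynamic-shapes build in which the workspace size is set on the builder config rather than on the builder. It assumes the TensorRT 6-era Python API used in this thread (create_builder_config, create_optimization_profile, build_engine) and a network with a single input. The function name build_dynamic_engine and its parameters are hypothetical, chosen only for illustration.

```python
def build_dynamic_engine(builder, network, min_shape, opt_shape, max_shape,
                         workspace_bytes=1 << 30):
    """Build an engine for a dynamic-shape network, configuring via the config.

    `builder` is expected to be a trt.Builder and `network` an explicit-batch
    network already populated by a parser (assumption: exactly one input).
    """
    # For build_engine, build-time settings go on the config, not the builder.
    config = builder.create_builder_config()
    config.max_workspace_size = workspace_bytes

    # One optimization profile describing the min/opt/max input shapes.
    profile = builder.create_optimization_profile()
    input_name = network.get_input(0).name
    profile.set_shape(input_name, min_shape, opt_shape, max_shape)
    config.add_optimization_profile(profile)

    engine = builder.build_engine(network, config)
    if engine is None:
        # The Python API returns None when the build fails.
        raise RuntimeError("TensorRT failed to build the engine; check the log")
    return engine
```

Inside the `with trt.Builder(...)` block from the question, this could be called as build_dynamic_engine(builder, network, (1, 3, 20, 20), (1, 3, 30, 30), (1, 3, 40, 40)) after the parser has populated the network and the outputs have been marked.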

Alnlll on 11 Nov 2019

All 2 comments

I encountered the same problem, and it doesn't work even after I increased the workspace size to 20 GB.
My model is ssd_inception_v2 converted from TensorFlow, and the error is:
could not find any implementation for node FeatureExtractor/InceptionV2/Mixed_5c_2_Conv2d_6_3x3_s2_128/Conv2D, try increasing the workspace size with IBuilder::setMaxWorkspaceSize()
The failing layer is a stride-2 convolution, and its input size is NHWC = (1, 2, 4, 64).
But the same layer in another model runs fine, so I don't think it is really unimplemented.