Invoked like:
with builder.build_engine(network, builder_config) as engine:
print(dir(builder_config))
['DLA_core', '__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'add_optimization_profile', 'avg_timing_iterations', 'can_run_on_DLA', 'clear_flag', 'default_device_type', 'flags', 'get_device_type', 'get_flag', 'int8_calibrator', 'is_device_type_set', 'max_workspace_size', 'min_timing_iterations', 'num_optimization_profiles', 'profile_stream', 'reset', 'reset_device_type', 'set_device_type', 'set_flag']
print(dir(network))
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'add_activation', 'add_concatenation', 'add_constant', 'add_convolution', 'add_convolution_nd', 'add_deconvolution', 'add_deconvolution_nd', 'add_elementwise', 'add_fully_connected', 'add_gather', 'add_identity', 'add_input', 'add_lrn', 'add_matrix_multiply', 'add_matrix_multiply_deprecated', 'add_padding', 'add_parametric_relu', 'add_plugin', 'add_plugin_ext', 'add_plugin_v2', 'add_pooling', 'add_pooling_nd', 'add_ragged_softmax', 'add_reduce', 'add_resize', 'add_rnn', 'add_rnn_v2', 'add_scale', 'add_scale_nd', 'add_shape', 'add_shuffle', 'add_slice', 'add_softmax', 'add_topk', 'add_unary', 'convolution_output_dimensions_formula', 'deconvolution_output_dimensions_formula', 'get_input', 'get_layer', 'get_output', 'has_explicit_precision', 'has_implicit_batch_dimension', 'mark_output', 'mark_output_for_shapes', 'name', 'num_inputs', 'num_layers', 'num_outputs', 'pooling_output_dimensions_formula', 'remove_tensor', 'unmark_output', 'unmark_output_for_shapes']
TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Hi @lapolonio,
Please provide the full Python script you're using to parse; I think this is related to a different issue in how the builder/builder_config is configured.
Also, please provide the environment info.
Hi @rmccorm4, I have run into the same problem as @lapolonio.
I want to build an engine with FP32 precision instead of FP16. The documentation says FP32 is the default precision, so I removed builder_config.set_flag(trt.BuilderFlag.FP16), but that produced an error.
I checked the documentation again and found no FP32 member in BuilderFlag. How should I deal with this? My TensorRT version is v6.0.1.
Thanks!
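Because trt.BuilderFlag has no FP32 member, requesting FP32 means setting no precision flag at all. A minimal sketch of that selection logic follows. The helper names precision_flag_names and apply_precision are hypothetical; the only TensorRT names assumed are builder_config.set_flag and trt.BuilderFlag.FP16, both of which appear in this thread. The flag enum is passed in as an argument so the sketch does not import tensorrt itself.

```python
def precision_flag_names(precision):
    """Map a precision string to the names of the builder flags to set.

    FP32 is the builder default, so it maps to no flags at all.
    """
    table = {
        "fp32": [],        # default precision: nothing to set
        "fp16": ["FP16"],  # corresponds to trt.BuilderFlag.FP16
    }
    key = precision.lower()
    if key not in table:
        raise ValueError("unsupported precision: {!r}".format(precision))
    return table[key]


def apply_precision(builder_config, flag_enum, precision):
    """Set the flags for `precision` on `builder_config`.

    `flag_enum` is expected to be trt.BuilderFlag (an assumption of this
    sketch); it is a parameter so the code stays importable without TensorRT.
    """
    names = precision_flag_names(precision)
    for name in names:
        builder_config.set_flag(getattr(flag_enum, name))
    return names
```

With a real config this would be called as apply_precision(builder_config, trt.BuilderFlag, "fp32"), which leaves the config untouched, exactly like commenting out the set_flag line.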
with builder.create_network(explicit_batch_flag) as network, builder.create_builder_config() as builder_config:
    builder_config.max_workspace_size = 5000 * (1024 * 1024) # 5000 MiB
    # builder_config.set_flag(trt.BuilderFlag.FP16)
    input_ids = network.add_input(name="input_ids", dtype=trt.int32, shape=(-1, S))
    segment_ids = network.add_input(name="segment_ids", dtype=trt.int32, shape=(-1, S))
    input_mask = network.add_input(name="input_mask", dtype=trt.int32, shape=(-1, S))
    def set_profile_shape(profile, batch_size):
        shape = (batch_size, S)
        profile.set_shape("input_ids", min=shape, opt=shape, max=shape)
        profile.set_shape("segment_ids", min=shape, opt=shape, max=shape)
        profile.set_shape("input_mask", min=shape, opt=shape, max=shape)
    # Specify profiles for the batch sizes we're interested in.
    # For maximum performance, we will tie each profile to exactly one shape rather than a range.
    bs1_profile = builder.create_optimization_profile()
    set_profile_shape(bs1_profile, 1)
    builder_config.add_optimization_profile(bs1_profile)
    bs_user_profile = builder.create_optimization_profile()
    set_profile_shape(bs_user_profile, B)
    builder_config.add_optimization_profile(bs_user_profile)
    bs8_profile = builder.create_optimization_profile()
    set_profile_shape(bs8_profile, 8)
    builder_config.add_optimization_profile(bs8_profile)
    # Create the network
    inputs = [input_ids, segment_ids, input_mask]
    emb_layer = network.add_plugin_v2(inputs, fn)
    embeddings = emb_layer.get_output(0)
    mask_idx = emb_layer.get_output(1)
    config.num_hidden_layers = config.num_hidden_layers + P + 1
    bert_out = bert_model(config, init_dict, network, embeddings, mask_idx)
    pool = output_pooling(trt.ReduceOperation.AVG, network, bert_out)
    pool_out = pool.get_output(0)
    network.mark_output(pool_out)
    with builder.build_engine(network, builder_config) as engine:
        TRT_LOGGER.log(TRT_LOGGER.VERBOSE, "Serializing Engine...")
        serialized_engine = engine.serialize()
        TRT_LOGGER.log(TRT_LOGGER.INFO, "Saving Engine to {:}".format(outputbase))
        with open(outputbase, 'wb') as fout:
            fout.write(serialized_engine)
        TRT_LOGGER.log(TRT_LOGGER.INFO, "Done.")
[TensorRT] ERROR: (Unnamed Layer* 0) [PluginV2DynamicExt]: could not find any supported formats consistent with input/output data types
[TensorRT] ERROR: ../builder/cudnnBuilderGraphNodes.cpp (539) - Misc Error in reportPluginError: 0 (could not find any supported formats consistent with input/output data types)
[TensorRT] ERROR: ../builder/cudnnBuilderGraphNodes.cpp (539) - Misc Error in reportPluginError: 0 (could not find any supported formats consistent with input/output data types)

Hi @jiangpinglei,
Yes, FP32 is on by default. Just to clarify: if you keep the FP16 builder flag, this code works, and it only fails when you comment the flag out?
I think this type of error would occur if the types aren't explicitly supported in the plugin implementations, but I looked at the 4 BERT plugins and they all seem to have checks for kFLOAT (FP32) or kHALF (FP16), so I'm not sure why this is happening.
Can you set the TRT_LOGGER's verbosity to VERBOSE: TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE) and share the full output as an attachment or something? (drag and drop the log file onto the issue)
Also, can you share the contents of your output_pooling() function?
Hi @rmccorm4,
Thanks for your quick response.
Yes, it works well if I choose FP16 mode.
As you recommended, I set TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE), and the log file is attached. I don't know whether it helps. My output_pooling() function is shown below.
Thank you!
def output_pooling(pooling_type, network, input_tensor):
    """
    output pooling
    """
    pool_layer = network.add_reduce(input_tensor, pooling_type, 2, False)
    set_layer_name(pool_layer, "avg", "pooling")
    return pool_layer
Hi @jiangpinglei,
Since I see the FP32/FP16 checks in the plugins, I'm not quite sure what the issue is. Some ideas I have:
https://github.com/rmccorm4/tensorrt-utils/blob/master/network/dump_network.py
...
pool_out = pool.get_output(0)
network.mark_output(pool_out)
# Add this in before building engine
from dump_network import dump_network
dump_network(network, "trt_network.json")
If you mark bert_out as the output instead, does that make a difference?
Hi @rmccorm4,
Thank you for the information.
I tested without the pooling layer and got the same result (FP16 worked, FP32 didn't).
I also tried your dump_network.py script. It produced the same output no matter how I changed ty = trt.PluginFieldType.FLOAT32 and builder_config.set_flag(trt.BuilderFlag.FP16).
The error message refers to Unnamed Layer* 0:
[TensorRT] ERROR: (Unnamed Layer* 0) [PluginV2DynamicExt]: could not find any supported formats consistent with input/output data types
The network dump for that layer is shown below.
"0": {
    "inputs": {
        "0": {
            "dtype": "DataType.INT32",
            "name": "'input_ids'",
            "shape": "(1, 20)"
        },
        "1": {
            "dtype": "DataType.INT32",
            "name": "'segment_ids'",
            "shape": "(1, 20)"
        },
        "2": {
            "dtype": "DataType.INT32",
            "name": "'input_mask'",
            "shape": "(1, 20)"
        }
    },
    "name": "'(Unnamed Layer* 0) [PluginV2DynamicExt]'",
    "num_inputs": "3",
    "num_outputs": "2",
    "outputs": {
        "0": {
            "dtype": "DataType.FLOAT",
            "name": "'(Unnamed Layer* 0) [PluginV2DynamicExt]_output_0'",
            "shape": "(1, 20, 480, 1, 1)"
        },
        "1": {
            "dtype": "DataType.INT32",
            "name": "'(Unnamed Layer* 0) [PluginV2DynamicExt]_output_1'",
            "shape": "(1,)"
        }
    },
    "precision": "DataType.FLOAT",
    "precision_is_set": "False",
    "type": "LayerType.PLUGIN_V2"
},
Based on the above information, I think the problem lies in the BERT plugins. There seem to be some differences between the TensorRT Python API and the BERT plugins' C++ source code.
Since the source code of the TensorRT Python API is not available, it is hard to identify exactly where the problem lies. Maybe I should try C++ instead of Python.
What do you think?
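To see at a glance which tensor types each plugin layer is declared with, a dump like the one above can be scanned with a few lines of Python. This is only a sketch: it assumes the dump file is a JSON object keyed by layer index, with each layer shaped like the excerpt above, and the function name plugin_layer_dtypes and the trimmed SAMPLE string are made up for illustration.

```python
import json


def _ordered(mapping):
    """Return (key, value) pairs sorted by numeric key ("0", "1", "2", ...)."""
    return sorted(mapping.items(), key=lambda kv: int(kv[0]))


def plugin_layer_dtypes(dump):
    """Summarise the input/output dtypes of every plugin layer in a dump.

    `dump` is assumed to be a dict keyed by layer index, where each layer has
    "type", "name", "inputs" and "outputs" entries as in the excerpt above.
    """
    summary = []
    for _, layer in _ordered(dump):
        if "PLUGIN" not in layer.get("type", ""):
            continue  # only plugin layers are of interest here
        summary.append({
            "name": layer.get("name", ""),
            "inputs": [t["dtype"] for _, t in _ordered(layer.get("inputs", {}))],
            "outputs": [t["dtype"] for _, t in _ordered(layer.get("outputs", {}))],
        })
    return summary


# Trimmed-down copy of the layer shown above, used as sample input.
SAMPLE = """
{
  "0": {
    "name": "'(Unnamed Layer* 0) [PluginV2DynamicExt]'",
    "type": "LayerType.PLUGIN_V2",
    "inputs": {
      "0": {"dtype": "DataType.INT32"},
      "1": {"dtype": "DataType.INT32"},
      "2": {"dtype": "DataType.INT32"}
    },
    "outputs": {
      "0": {"dtype": "DataType.FLOAT"},
      "1": {"dtype": "DataType.INT32"}
    }
  }
}
"""

for entry in plugin_layer_dtypes(json.loads(SAMPLE)):
    print(entry["name"], entry["inputs"], "->", entry["outputs"])
```

For a real dump, replace json.loads(SAMPLE) with json.load on the opened trt_network.json file.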
Hi @rmccorm4 ,
After running another plugin example, LReLU_TRT, I am quite sure the problem lies in the BERT plugin CustomEmbLayerNormPluginDynamic.
The LReLU_TRT test ran well, while the CustomEmbLayerNormPluginDynamic test failed.
import tensorrt as trt
import numpy as np
TRT_LOGGER = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, '')
PLUGIN_CREATORS = trt.get_plugin_registry().plugin_creator_list
def get_trt_plugin(plugin_name):
    plugin = None
    for plugin_creator in PLUGIN_CREATORS:
        print('plugin_creator.name: ', plugin_creator.name)
        if plugin_creator.name == plugin_name:
            lrelu_slope_field = trt.PluginField("neg_slope", np.array([0.1], dtype=np.float32), trt.PluginFieldType.FLOAT32)
            field_collection = trt.PluginFieldCollection([lrelu_slope_field])
            plugin = plugin_creator.create_plugin(name=plugin_name, field_collection=field_collection)
    return plugin
def main():
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network:
        TRT_LOGGER.log(TRT_LOGGER.INFO, "Start.")
        # builder.fp16_mode = True
        builder.max_workspace_size = 2**20
        input_layer = network.add_input(name="input_layer", dtype=trt.float32, shape=(1, 1))
        lrelu = network.add_plugin_v2(inputs=[input_layer], plugin=get_trt_plugin("LReLU_TRT"))
        lrelu.get_output(0).name = "outputs"
        network.mark_output(lrelu.get_output(0))
        with builder.build_cuda_engine(network) as engine:
            TRT_LOGGER.log(TRT_LOGGER.VERBOSE, "Serializing Engine...")
            serialized_engine = engine.serialize()
            TRT_LOGGER.log(TRT_LOGGER.INFO, "Done.")
main()
with builder.create_network(explicit_batch_flag) as network, builder.create_builder_config() as builder_config:
    builder_config.max_workspace_size = 5000 * (1024 * 1024) # 5000 MiB
    # builder_config.set_flag(trt.BuilderFlag.FP16)
    input_ids = network.add_input(name="input_ids", dtype=trt.int32, shape=(-1, S))
    segment_ids = network.add_input(name="segment_ids", dtype=trt.int32, shape=(-1, S))
    input_mask = network.add_input(name="input_mask", dtype=trt.int32, shape=(-1, S))
    def set_profile_shape(profile, batch_size):
        shape = (batch_size, S)
        profile.set_shape("input_ids", min=shape, opt=shape, max=shape)
        profile.set_shape("segment_ids", min=shape, opt=shape, max=shape)
        profile.set_shape("input_mask", min=shape, opt=shape, max=shape)
    # Specify profiles for the batch sizes we're interested in.
    # For maximum performance, we will tie each profile to exactly one shape rather than a range.
    bs1_profile = builder.create_optimization_profile()
    set_profile_shape(bs1_profile, 1)
    builder_config.add_optimization_profile(bs1_profile)
    bs_user_profile = builder.create_optimization_profile()
    set_profile_shape(bs_user_profile, B)
    builder_config.add_optimization_profile(bs_user_profile)
    bs8_profile = builder.create_optimization_profile()
    set_profile_shape(bs8_profile, 8)
    builder_config.add_optimization_profile(bs8_profile)
    # Create the network
    inputs = [input_ids, segment_ids, input_mask]
    emb_layer = network.add_plugin_v2(inputs, fn)
    embeddings = emb_layer.get_output(0)
    mask_idx = emb_layer.get_output(1)
    network.mark_output(embeddings)
    with builder.build_engine(network, builder_config) as engine:
        TRT_LOGGER.log(TRT_LOGGER.VERBOSE, "Serializing Engine...")
        serialized_engine = engine.serialize()
        TRT_LOGGER.log(TRT_LOGGER.INFO, "Saving Engine to {:}".format(outputbase))
        with open(outputbase, 'wb') as fout:
            fout.write(serialized_engine)
        TRT_LOGGER.log(TRT_LOGGER.INFO, "Done.")
Hi @jiangpinglei,
This PR may fix the issue: https://github.com/NVIDIA/TensorRT/pull/248/files
Feel free to wait until it's merged, or apply the simple one-line change yourself and see if it works.
Hi @rmccorm4,
I tried changing the code myself by commenting out these two lines and adding a new one:
// const DataType out_type = outputs[0].desc.type;
// assert(out_type == DataType::kFLOAT || out_type == DataType::kHALF);
assert(outputs[0].desc.type == DataType::kFLOAT || outputs[0].desc.type == DataType::kHALF);
But it still doesn't work. As the code above shows, the change makes no difference to the DataType check assertion.
Hi @rmccorm4 ,
I fixed this issue by setting bool output_fp16 = false; in https://github.com/NVIDIA/TensorRT/blob/v6.0.1/demo/BERT/plugins/embLayerNormPlugin.cu
I think that because the engine builder's precision mode is FP32 by default, the precision parameter is not passed to the plugin. If the plugin's own precision mode defaults to FP16, the issue occurs.
Thank you very much for all your answers!
Glad it worked!
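The failure mode described in this thread can be illustrated with a toy model. This is not TensorRT code: builder_candidate_types, plugin_accepts and negotiate are hypothetical stand-ins for the builder's precision selection and a plugin's format check. It is a simplification, written only to show why an FP32-only build and a plugin that defaults to FP16 output end up with no format in common.

```python
def builder_candidate_types(fp16_enabled):
    """Output types the toy builder is willing to offer a plugin.

    FP32 is always available; FP16 is offered only when the FP16 flag is set.
    """
    return ["float32", "float16"] if fp16_enabled else ["float32"]


def plugin_accepts(output_type, output_fp16):
    """Toy format check: the plugin supports exactly one output type,
    selected by its output_fp16 setting."""
    expected = "float16" if output_fp16 else "float32"
    return output_type == expected


def negotiate(fp16_enabled, output_fp16):
    """Return the first type both sides agree on, or None if there is none."""
    for candidate in builder_candidate_types(fp16_enabled):
        if plugin_accepts(candidate, output_fp16):
            return candidate
    return None


# Print the outcome for every combination of builder flag and plugin default.
for fp16_enabled in (False, True):
    for output_fp16 in (False, True):
        print("builder FP16 flag:", fp16_enabled,
              "| plugin output_fp16:", output_fp16,
              "| agreed type:", negotiate(fp16_enabled, output_fp16))
```

In this model the only combination with no agreed type is an FP32-only builder paired with output_fp16 set to true, which matches the behaviour reported above: the build worked with the FP16 flag, failed without it, and worked again once output_fp16 was false.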