TensorRT: builder.build_engine throws `AttributeError: __enter__`

Created on 20 Nov 2019  ·  11 Comments  ·  Source: NVIDIA/TensorRT

Description

Invoked like:
with builder.build_engine(network, builder_config) as engine:

print(dir(builder_config))

['DLA_core', '__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'add_optimization_profile', 'avg_timing_iterations', 'can_run_on_DLA', 'clear_flag', 'default_device_type', 'flags', 'get_device_type', 'get_flag', 'int8_calibrator', 'is_device_type_set', 'max_workspace_size', 'min_timing_iterations', 'num_optimization_profiles', 'profile_stream', 'reset', 'reset_device_type', 'set_device_type', 'set_flag']

print(dir(network))

['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'add_activation', 'add_concatenation', 'add_constant', 'add_convolution', 'add_convolution_nd', 'add_deconvolution', 'add_deconvolution_nd', 'add_elementwise', 'add_fully_connected', 'add_gather', 'add_identity', 'add_input', 'add_lrn', 'add_matrix_multiply', 'add_matrix_multiply_deprecated', 'add_padding', 'add_parametric_relu', 'add_plugin', 'add_plugin_ext', 'add_plugin_v2', 'add_pooling', 'add_pooling_nd', 'add_ragged_softmax', 'add_reduce', 'add_resize', 'add_rnn', 'add_rnn_v2', 'add_scale', 'add_scale_nd', 'add_shape', 'add_shuffle', 'add_slice', 'add_softmax', 'add_topk', 'add_unary', 'convolution_output_dimensions_formula', 'deconvolution_output_dimensions_formula', 'get_input', 'get_layer', 'get_output', 'has_explicit_precision', 'has_implicit_batch_dimension', 'mark_output', 'mark_output_for_shapes', 'name', 'num_inputs', 'num_layers', 'num_outputs', 'pooling_output_dimensions_formula', 'remove_tensor', 'unmark_output', 'unmark_output_for_shapes']

Environment

TensorRT Version:
GPU Type:
Nvidia Driver Version:
CUDA Version:
CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Steps To Reproduce

Python

All 11 comments

Hi @lapolonio,

Please provide the full Python script you're using to parse. I think this is related to a different issue in configuring the builder/builder_config.

Also, please provide the environment info.
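
For context on the error in the title: in the TensorRT Python API, `build_engine` returns `None` when the build fails, and handing `None` to a `with` statement is what raises `AttributeError: __enter__` (newer Python versions report a `TypeError` instead). A minimal sketch of guarding against that, assuming nothing about the poster's script; `build_or_raise`, `FailingBuilder`, and `with_none_error` are hypothetical names used only for illustration:

```python
def build_or_raise(builder, network, builder_config):
    """Build an engine and fail loudly instead of passing None along."""
    engine = builder.build_engine(network, builder_config)
    if engine is None:
        raise RuntimeError(
            "Engine build failed; check the TensorRT log for the underlying error."
        )
    return engine


class FailingBuilder:
    """Hypothetical stand-in for a builder whose build fails."""

    def build_engine(self, network, builder_config):
        return None


def with_none_error():
    """Reproduce the exception raised when `with` receives None."""
    try:
        with FailingBuilder().build_engine(None, None) as engine:
            pass
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__
    return None
```

With a guard like this, the real build error shows up in the log and the traceback instead of being masked by the context-manager failure.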

Hi @rmccorm4, I have run into the same problem as @lapolonio.

I want to build an engine with FP32 precision instead of FP16. The documentation says FP32 is the default precision, so I removed builder_config.set_flag(trt.BuilderFlag.FP16), but that produced an error.

I checked the documentation again and found no FP32 entry in BuilderFlag. How should I deal with this? My TensorRT version is v6.0.1.

Thanks!

CODE:

with builder.create_network(explicit_batch_flag) as network, builder.create_builder_config() as builder_config:
            builder_config.max_workspace_size = 5000 * (1024 * 1024) # 5000 MiB

            # builder_config.set_flag(trt.BuilderFlag.FP16)

            input_ids = network.add_input(name="input_ids", dtype=trt.int32, shape=(-1, S))
            segment_ids = network.add_input(name="segment_ids", dtype=trt.int32, shape=(-1, S))
            input_mask = network.add_input(name="input_mask", dtype=trt.int32, shape=(-1, S))

            def set_profile_shape(profile, batch_size):
                shape = (batch_size, S)
                profile.set_shape("input_ids", min=shape, opt=shape, max=shape)
                profile.set_shape("segment_ids", min=shape, opt=shape, max=shape)
                profile.set_shape("input_mask", min=shape, opt=shape, max=shape)

            # Specify profiles for the batch sizes we're interested in.
            # For maximum performance, we will tie each profile to exactly one shape rather than a range.
            bs1_profile = builder.create_optimization_profile()
            set_profile_shape(bs1_profile, 1)
            builder_config.add_optimization_profile(bs1_profile)

            bs_user_profile = builder.create_optimization_profile()
            set_profile_shape(bs_user_profile, B)
            builder_config.add_optimization_profile(bs_user_profile)

            bs8_profile = builder.create_optimization_profile()
            set_profile_shape(bs8_profile, 8)
            builder_config.add_optimization_profile(bs8_profile)

            # Create the network
            inputs = [input_ids, segment_ids, input_mask]
            emb_layer = network.add_plugin_v2(inputs, fn)

            embeddings = emb_layer.get_output(0)
            mask_idx = emb_layer.get_output(1)

            config.num_hidden_layers = config.num_hidden_layers + P + 1

            bert_out = bert_model(config, init_dict, network, embeddings, mask_idx)

            pool = output_pooling(trt.ReduceOperation.AVG, network, bert_out)

            pool_out = pool.get_output(0)

            network.mark_output(pool_out)


            with builder.build_engine(network, builder_config) as engine:
                TRT_LOGGER.log(TRT_LOGGER.VERBOSE, "Serializing Engine...")
                serialized_engine = engine.serialize()
                TRT_LOGGER.log(TRT_LOGGER.INFO, "Saving Engine to {:}".format(outputbase))
                with open(outputbase, 'wb') as fout:
                    fout.write(serialized_engine)
                TRT_LOGGER.log(TRT_LOGGER.INFO, "Done.")

ERROR:

[TensorRT] ERROR: (Unnamed Layer* 0) [PluginV2DynamicExt]: could not find any supported formats consistent with input/output data types
[TensorRT] ERROR: ../builder/cudnnBuilderGraphNodes.cpp (539) - Misc Error in reportPluginError: 0 (could not find any supported formats consistent with input/output data types)
[TensorRT] ERROR: ../builder/cudnnBuilderGraphNodes.cpp (539) - Misc Error in reportPluginError: 0 (could not find any supported formats consistent with input/output data types)

BuilderFlag:

(screenshot of the BuilderFlag documentation, image not preserved)

Hi @jiangpinglei,

Yes, FP32 is on by default. Just to clarify: if you keep the FP16 builder flag, this code works, and it fails only when you comment the flag out?

I think this type of error would occur if the types aren't explicitly supported in the plugin implementations, but I looked at the 4 BERT plugins and they all seem to have checks for kFLOAT (FP32) or kHALF (FP16), so I'm not sure why this is happening.

Can you set the TRT_LOGGER's verbosity to VERBOSE: TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE) and share the full output as an attachment or something? (drag and drop the log file onto the issue)

Also, can you share the contents of your output_pooling() function?
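
On the point that FP32 is on by default: the builder config exposes `set_flag`, `clear_flag`, and `get_flag` (all visible in the `dir(builder_config)` output above), and there is no FP32 flag to set. A minimal sketch of toggling FP16 through those three methods; `set_fp16` and `StubConfig` are hypothetical names, and the stub only mimics the real config so the example runs without TensorRT installed:

```python
def set_fp16(builder_config, fp16_flag, enabled):
    """Enable or disable FP16; leaving it disabled keeps the FP32 default."""
    if enabled:
        builder_config.set_flag(fp16_flag)
    else:
        builder_config.clear_flag(fp16_flag)
    return builder_config.get_flag(fp16_flag)


class StubConfig:
    """Hypothetical stand-in exposing the same three methods as the real config."""

    def __init__(self):
        self._flags = set()

    def set_flag(self, flag):
        self._flags.add(flag)

    def clear_flag(self, flag):
        self._flags.discard(flag)

    def get_flag(self, flag):
        return flag in self._flags
```

With the real API the call would presumably look like `set_fp16(builder_config, trt.BuilderFlag.FP16, False)`.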

Hi @rmccorm4,

Thanks for your quick response.

Yes, it works well if I choose FP16 mode.

As you recommended, I set TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE), and the log file is attached. I don't know whether it helps. My output_pooling() function is also shown below.

Thank you!

output_pooling func:

def output_pooling(pooling_type, network, input_tensor):
    """
    output pooling
    """
    pool_layer = network.add_reduce(input_tensor, pooling_type, 2, False)
    set_layer_name(pool_layer, "avg", "pooling")
    return pool_layer

verbose log:

trt_log.txt

Hi @jiangpinglei,

Since I see the FP32/FP16 checks in the plugins, I'm not quite sure what the issue is. Some ideas I have:

  1. Could you try running this script to dump out some info about your network, and see if maybe anything looks wrong with the input/output types/precisions etc.? That might help debug a bit better. I'm thinking one of the layers explicitly only supports FP16 or something.

https://github.com/rmccorm4/tensorrt-utils/blob/master/network/dump_network.py

...
pool_out = pool.get_output(0)
network.mark_output(pool_out)

# Add this in before building engine
from dump_network import dump_network
dump_network(network, "trt_network.json")
  2. The original demo didn't have a pooling layer. I doubt this is the issue, but just in case - if you remove the pooling layer at the end and mark bert_out as the output instead, does that make a difference?
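
For readers who cannot run the linked script, the kind of per-layer dump suggested in the first idea can be sketched as follows. This is not the actual `dump_network.py`; it is a minimal approximation that relies only on network and layer attributes (`num_layers`, `get_layer`, `num_inputs`, `get_input`, `precision`, and so on). `make_stub_network` is a hypothetical stand-in so the example runs without TensorRT:

```python
from types import SimpleNamespace


def tensor_info(tensor):
    """Describe one tensor by name, dtype, and shape."""
    return {"name": tensor.name, "dtype": str(tensor.dtype), "shape": str(tensor.shape)}


def summarize_layer(layer):
    """Collect name, type, precision, and I/O tensor info for one layer."""
    inputs = [layer.get_input(i) for i in range(layer.num_inputs)]
    outputs = [layer.get_output(i) for i in range(layer.num_outputs)]
    return {
        "name": layer.name,
        "type": str(layer.type),
        "precision": str(layer.precision),
        "precision_is_set": layer.precision_is_set,
        # Skip missing (None) tensors rather than failing on them.
        "inputs": [tensor_info(t) for t in inputs if t is not None],
        "outputs": [tensor_info(t) for t in outputs if t is not None],
    }


def summarize_network(network):
    """Summarize every layer of a network-like object."""
    return [summarize_layer(network.get_layer(i)) for i in range(network.num_layers)]


def make_stub_network():
    """Hypothetical one-layer network used only to exercise the functions above."""
    ids = SimpleNamespace(name="input_ids", dtype="DataType.INT32", shape=(1, 20))
    emb = SimpleNamespace(name="embeddings", dtype="DataType.FLOAT", shape=(1, 20, 480, 1, 1))
    layer = SimpleNamespace(
        name="(Unnamed Layer* 0) [PluginV2DynamicExt]",
        type="LayerType.PLUGIN_V2",
        precision="DataType.FLOAT",
        precision_is_set=False,
        num_inputs=1,
        num_outputs=1,
        get_input=lambda i: ids,
        get_output=lambda i: emb,
    )
    return SimpleNamespace(num_layers=1, get_layer=lambda i: layer)
```

Passing a real network object to `summarize_network` and serializing the result with `json.dumps` should give output similar in spirit to the dump posted later in this thread.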

Hi @rmccorm4,

Thank you for the information.

I tested without the pooling layer and got the same result (FP16 worked, FP32 didn't).

I also tried your dump_network.py script. It produced the same output no matter how I changed ty = trt.PluginFieldType.FLOAT32 and builder_config.set_flag(trt.BuilderFlag.FP16).

The error message refers to Unnamed Layer* 0:

[TensorRT] ERROR: (Unnamed Layer* 0) [PluginV2DynamicExt]: could not find any supported formats consistent with input/output data types

The dump_network output for that layer is shown below.

dump_network:

"0": {
        "inputs": {
            "0": {
                "dtype": "DataType.INT32",
                "name": "'input_ids'",
                "shape": "(1, 20)"
            },
            "1": {
                "dtype": "DataType.INT32",
                "name": "'segment_ids'",
                "shape": "(1, 20)"
            },
            "2": {
                "dtype": "DataType.INT32",
                "name": "'input_mask'",
                "shape": "(1, 20)"
            }
        },
        "name": "'(Unnamed Layer* 0) [PluginV2DynamicExt]'",
        "num_inputs": "3",
        "num_outputs": "2",
        "outputs": {
            "0": {
                "dtype": "DataType.FLOAT",
                "name": "'(Unnamed Layer* 0) [PluginV2DynamicExt]_output_0'",
                "shape": "(1, 20, 480, 1, 1)"
            },
            "1": {
                "dtype": "DataType.INT32",
                "name": "'(Unnamed Layer* 0) [PluginV2DynamicExt]_output_1'",
                "shape": "(1,)"
            }
        },
        "precision": "DataType.FLOAT",
        "precision_is_set": "False",
        "type": "LayerType.PLUGIN_V2"
    },

Based on the above information, I think the problem lies in the BERT plugins. There seem to be some differences between the TensorRT Python API and the BERT plugins' C++ source code.

Since the source code of the TensorRT Python API is not visible, it is hard to identify exactly where the problem lies. Maybe I should try C++ instead of Python.

What do you think?

Hi @rmccorm4 ,

After running another plugin example, LReLU_TRT, I am quite sure the problem lies in the BERT plugin CustomEmbLayerNormPluginDynamic.

The LReLU_TRT test ran fine, while the CustomEmbLayerNormPluginDynamic test failed.

LReLU_TRT test:

import tensorrt as trt
import numpy as np

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

trt.init_libnvinfer_plugins(TRT_LOGGER, '')
PLUGIN_CREATORS = trt.get_plugin_registry().plugin_creator_list

def get_trt_plugin(plugin_name):
    plugin = None
    for plugin_creator in PLUGIN_CREATORS:
        print('plugin_creator.name: ', plugin_creator.name)
        if plugin_creator.name == plugin_name:
            lrelu_slope_field = trt.PluginField("neg_slope", np.array([0.1], dtype=np.float32), trt.PluginFieldType.FLOAT32)
            field_collection = trt.PluginFieldCollection([lrelu_slope_field])
            plugin = plugin_creator.create_plugin(name=plugin_name, field_collection=field_collection)
    return plugin

def main():
    with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network:
        TRT_LOGGER.log(TRT_LOGGER.INFO, "Start.")
        # builder.fp16_mode = True
        builder.max_workspace_size = 2**20
        input_layer = network.add_input(name="input_layer", dtype=trt.float32, shape=(1, 1))
        lrelu = network.add_plugin_v2(inputs=[input_layer], plugin=get_trt_plugin("LReLU_TRT"))
        lrelu.get_output(0).name = "outputs"
        network.mark_output(lrelu.get_output(0))
        with builder.build_cuda_engine(network) as engine:
            TRT_LOGGER.log(TRT_LOGGER.VERBOSE, "Serializing Engine...")
            serialized_engine = engine.serialize()
            TRT_LOGGER.log(TRT_LOGGER.INFO, "Done.")
main()

CustomEmbLayerNormPluginDynamic test:

with builder.create_network(explicit_batch_flag) as network, builder.create_builder_config() as builder_config:
            builder_config.max_workspace_size = 5000 * (1024 * 1024) # 5000 MiB

            # builder_config.set_flag(trt.BuilderFlag.FP16)

            input_ids = network.add_input(name="input_ids", dtype=trt.int32, shape=(-1, S))
            segment_ids = network.add_input(name="segment_ids", dtype=trt.int32, shape=(-1, S))
            input_mask = network.add_input(name="input_mask", dtype=trt.int32, shape=(-1, S))

            def set_profile_shape(profile, batch_size):
                shape = (batch_size, S)
                profile.set_shape("input_ids", min=shape, opt=shape, max=shape)
                profile.set_shape("segment_ids", min=shape, opt=shape, max=shape)
                profile.set_shape("input_mask", min=shape, opt=shape, max=shape)

            # Specify profiles for the batch sizes we're interested in.
            # For maximum performance, we will tie each profile to exactly one shape rather than a range.
            bs1_profile = builder.create_optimization_profile()
            set_profile_shape(bs1_profile, 1)
            builder_config.add_optimization_profile(bs1_profile)

            bs_user_profile = builder.create_optimization_profile()
            set_profile_shape(bs_user_profile, B)
            builder_config.add_optimization_profile(bs_user_profile)

            bs8_profile = builder.create_optimization_profile()
            set_profile_shape(bs8_profile, 8)
            builder_config.add_optimization_profile(bs8_profile)

            # Create the network
            inputs = [input_ids, segment_ids, input_mask]
            emb_layer = network.add_plugin_v2(inputs, fn)

            embeddings = emb_layer.get_output(0)
            mask_idx = emb_layer.get_output(1)

            network.mark_output(embeddings)


            with builder.build_engine(network, builder_config) as engine:
                TRT_LOGGER.log(TRT_LOGGER.VERBOSE, "Serializing Engine...")
                serialized_engine = engine.serialize()
                TRT_LOGGER.log(TRT_LOGGER.INFO, "Saving Engine to {:}".format(outputbase))
                with open(outputbase, 'wb') as fout:
                    fout.write(serialized_engine)
                TRT_LOGGER.log(TRT_LOGGER.INFO, "Done.")

Hi @jiangpinglei,

This PR may fix the issue: https://github.com/NVIDIA/TensorRT/pull/248/files

Feel free to wait until it's merged, or just apply the simple one-line change yourself and see if it works.

Hi @rmccorm4,

I tried changing the code myself by commenting out these two lines and adding a new one:

// const DataType out_type = outputs[0].desc.type;
// assert(out_type == DataType::kFLOAT || out_type == DataType::kHALF);
assert(outputs[0].desc.type == DataType::kFLOAT || outputs[0].desc.type == DataType::kHALF);

But it still doesn't work. As the code above shows, the change makes no difference to the DataType check; the assertion is the same.

Hi @rmccorm4 ,

I have fixed this issue by setting bool output_fp16 = false; in https://github.com/NVIDIA/TensorRT/blob/v6.0.1/demo/BERT/plugins/embLayerNormPlugin.cu

I think that because the engine builder's precision mode is FP32 by default, the parameter isn't passed to the plugin. If the plugin's precision mode is FP16 by default, the issue occurs.

Thank you very much for all your answers!
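
The explanation above can be illustrated with a toy model. This is a deliberately simplified sketch of the commenter's reasoning, not TensorRT's actual format-selection logic; all function names and the string type labels are hypothetical. It assumes the plugin emits exactly one output type, decided by `output_fp16`, and that the builder accepts FP16 tensors only when the FP16 flag is set:

```python
def plugin_output_type(output_fp16):
    """Toy model: the plugin emits HALF when output_fp16 is true, else FLOAT."""
    return "HALF" if output_fp16 else "FLOAT"


def builder_allowed_types(fp16_flag_set):
    """Toy model: FP32 is always allowed; FP16 only when the builder flag is set."""
    allowed = {"FLOAT"}
    if fp16_flag_set:
        allowed.add("HALF")
    return allowed


def can_build(output_fp16, fp16_flag_set):
    """True when the plugin's output type is one the builder will accept."""
    return plugin_output_type(output_fp16) in builder_allowed_types(fp16_flag_set)
```

Under this model, a plugin defaulting to FP16 output cannot be matched by an FP32-only builder (`can_build(True, False)` is false), which is consistent with the "could not find any supported formats" error, while either setting the FP16 flag or defaulting `output_fp16` to false makes the combination valid.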

Glad it worked!
