TensorRT: There might be a memory leak when using dynamic inputs

Created on 21 Jan 2020 · 10 Comments · Source: NVIDIA/TensorRT

Description

I am trying to use the dynamic shapes feature with the TensorRT Python API. When I run the code in Relevant Files, GPU memory sometimes stays at a constant value and sometimes keeps increasing until the run finishes or the GPU runs out of memory (OOM).

GPU memory after running 10000 iterations:

  • No GPU memory leak: 791 MB
  • GPU memory leak: 2123 MB
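To tell the two behaviours above apart without eyeballing `nvidia-smi`, one could sample GPU memory every N iterations of the repro loop and check whether it keeps climbing after warm-up. The sketch below is a hypothetical helper, not part of TensorRT: `looks_like_leak`, `sample_gpu_memory_mb`, and the 16 MB tolerance are illustrative assumptions. The sampler assumes the `pynvml` package is installed and reads device-wide usage, so other processes on the same GPU are counted too.

```python
"""Classify a series of GPU memory readings as stable or growing.

Hypothetical helpers: names and thresholds are illustrative assumptions,
not part of TensorRT or the original report.
"""
from typing import Sequence


def looks_like_leak(samples_mb: Sequence[float], tolerance_mb: float = 16.0) -> bool:
    """Return True if memory after warm-up keeps climbing beyond tolerance.

    The first sample is treated as warm-up (allocator and engine setup), so
    growth is measured from the second sample to the last one.
    """
    if len(samples_mb) < 3:
        return False  # too few readings to say anything
    steady = samples_mb[1:]
    growth = steady[-1] - steady[0]
    # Count how often consecutive readings increase.
    rises = sum(1 for a, b in zip(steady, steady[1:]) if b > a)
    mostly_rising = rises >= (len(steady) - 1) / 2
    return growth > tolerance_mb and mostly_rising


def sample_gpu_memory_mb(device_index: int = 0) -> float:
    """Read used memory (MB) on one GPU via NVML; requires `pynvml`."""
    import pynvml  # imported lazily so looks_like_leak works without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        return pynvml.nvmlDeviceGetMemoryInfo(handle).used / (1024 ** 2)
    finally:
        pynvml.nvmlShutdown()
```

In the repro, appending `sample_gpu_memory_mb()` to a list every 1000 iterations and passing that list to `looks_like_leak` would flag the growing case while ignoring small allocator jitter.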

Environment

TensorRT Version: 6.0.1.5
GPU Type: GTX1080
Nvidia Driver Version: 430.26
CUDA Version: V10.1.243
CUDNN Version: 7.6.3
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.8
PyTorch Version (if applicable): 1.3.0
Baremetal or Container (if container which image + tag):

Relevant Files

import random  # used by the commented-out random-shape line in the loop

import tensorrt as trt
import numpy as np
import torch

MAX_INPUT_SIZE = 150
TIMES = 100000

def build_network(builder):
    # build a single conv2d layer
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    input_trt_tensor = network.add_input(
        name="input_0", shape=(1, 3, -1, -1), dtype=trt.float32,
    )
    input_trt_tensor.location = trt.TensorLocation.DEVICE

    # add conv layer
    conv_out_channels = 8
    conv_in_channels = 3
    kernel = np.ones((conv_out_channels, conv_in_channels, 3, 3)).astype(
        np.float32
    )
    bias = np.zeros(conv_out_channels).astype(np.float32)
    conv_layer = network.add_convolution(
        input=input_trt_tensor,
        num_output_maps=conv_out_channels,
        kernel_shape=(3, 3),
        kernel=kernel,
        bias=bias,
    )
    conv_layer.stride = (1, 1)
    conv_layer.padding = (1, 1)
    conv_layer.dilation = (1, 1)

    output_trt_tensor = conv_layer.get_output(0)
    output_trt_tensor.name = "output_0"
    output_trt_tensor.location = trt.TensorLocation.DEVICE
    output_trt_tensor.dtype = trt.float32
    network.mark_output(output_trt_tensor)
    return network


def create_builder():
    trt_logger = trt.Logger(trt.Logger.ERROR)
    builder = trt.Builder(trt_logger)
    builder.max_workspace_size = 0
    builder.fp16_mode = False
    builder.max_batch_size = 1
    builder.strict_type_constraints = False
    return builder


if __name__ == "__main__":
    builder = create_builder()
    network = build_network(builder)

    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()
    profile.set_shape(
        "input_0",
        min=(1, 3, 100, 100),
        opt=(1, 3, 101, 101),
        max=(1, 3, MAX_INPUT_SIZE, MAX_INPUT_SIZE),
    )
    config.add_optimization_profile(profile)

    with builder.build_engine(network, config) as engine:
        with engine.create_execution_context() as context:
            input_idx = engine.get_binding_index("input_0")
            output_idx = engine.get_binding_index("output_0")
            for it in range(TIMES):
                # input_shape = (1, 3, random.randint(100, MAX_INPUT_SIZE), random.randint(100, MAX_INPUT_SIZE))
                input_shape = (1, 3, MAX_INPUT_SIZE, MAX_INPUT_SIZE)
                torch_input = torch.ones(input_shape, dtype=torch.float32).cuda()

                bindings = [None] * engine.num_bindings
                bindings[input_idx] = torch_input.data_ptr()

                # Set the dynamic input shape first; only then can the
                # context resolve the output shape.
                context.set_binding_shape(input_idx, input_shape)
                # Query the output binding: the conv produces 8 channels, not 3.
                output_shape = tuple(context.get_binding_shape(output_idx))
                torch_output = torch.empty(
                    size=output_shape, dtype=torch.float32, device=torch.device("cuda")
                )
                bindings[output_idx] = torch_output.data_ptr()
                # A 3x3 conv with stride 1 and padding 1 preserves H and W.
                assert output_shape == (1, 8) + input_shape[2:]

                ret = context.execute_v2(bindings)
                assert ret is True

                del torch_input
                del torch_output
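The output buffer in the repro must match the conv's real output (8 channels, same H and W), and every input shape must fall inside the optimization profile's min/max range. A small pure-Python sketch for checking both before calling `execute_v2` is below. `conv2d_output_shape` and `within_profile` are hypothetical helpers; the shape formula assumes TensorRT's default round-down padding mode and the conv parameters used in `build_network`.

```python
"""Sanity-check shapes for the single-conv repro before calling execute_v2."""
from typing import Tuple

Shape = Tuple[int, ...]


def conv2d_output_shape(input_shape: Shape, out_channels: int, kernel: int = 3,
                        stride: int = 1, padding: int = 1, dilation: int = 1) -> Shape:
    """NCHW output shape of a 2D convolution (standard floor formula)."""
    n, _, h, w = input_shape

    def out_dim(size: int) -> int:
        return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

    return (n, out_channels, out_dim(h), out_dim(w))


def within_profile(shape: Shape, min_shape: Shape, max_shape: Shape) -> bool:
    """True if every dimension lies inside the profile's [min, max] range."""
    return len(shape) == len(min_shape) == len(max_shape) and all(
        lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape)
    )
```

With the repro's settings, `conv2d_output_shape((1, 3, 150, 150), 8)` gives `(1, 8, 150, 150)`, and `within_profile((1, 3, 150, 150), (1, 3, 100, 100), (1, 3, 150, 150))` is `True`, so the random shapes from the commented-out line (100 to 150) are all valid for this profile.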

Steps To Reproduce

Run the above code.

Memory Leak bug

All 10 comments

TensorRT 6 (nvcr.io/nvidia/tensorrt:19.12-py3)

  • ❌ Could not repro on V100
  • ✔️ Could repro on P4

TensorRT 7.0.0.11

  • ❌ Could not repro on V100
  • ✔️ Could repro on P4

This is probably related to: https://devtalk.nvidia.com/default/topic/1065018/tensorrt/context-gt-setbindingdimensions-casing-gpu-memory-leak

Looking into this.

This issue has been fixed upstream and should be included in the next release. Closing for now.

I still see the memory leak in TRT 7 on a P4.

Yes, @ZimingLu,

The next release isn't out yet. My comment was based on the current release (TensorRT 7.0).

Is my understanding correct that this affects Pascal-era cards? I'm seeing various reports of crippling memory leaks on GTX 10x0 series cards, whereas RTX-era cards work fine.

Is there any known workaround for this (other than going back to TensorRT 6 and fixed inputs, etc.)? Confusingly, I'm not seeing this on some Ubuntu 18.04 systems.
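No workaround is confirmed in this thread. One mitigation worth trying, assuming the leak is triggered by calling `set_binding_shape` (as the title of the linked DevTalk thread suggests), is to skip redundant calls when the shape has not changed and to round input sizes up to a few fixed buckets so the shape changes rarely. The sketch below is untested on Pascal cards; `ShapeCachingContext` and `bucket_size` are hypothetical helpers. For the single 3x3, padding-1 conv in the repro, zero-padding the input on the bottom/right to the bucket size and cropping the output back should give the same values; deeper networks may not.

```python
"""Skip redundant set_binding_shape calls and bucket input sizes.

A mitigation sketch, NOT a confirmed fix: it assumes the leak is triggered
by set_binding_shape. ShapeCachingContext and bucket_size are hypothetical.
"""
from typing import Any, Dict, Sequence, Tuple


class ShapeCachingContext:
    """Wrap an execution context and forward only shape *changes*."""

    def __init__(self, context: Any) -> None:
        self._context = context
        self._last: Dict[int, Tuple[int, ...]] = {}

    def set_binding_shape(self, index: int, shape: Sequence[int]) -> bool:
        shape = tuple(shape)
        if self._last.get(index) == shape:
            return True  # same shape as last time: leave the real context alone
        ok = self._context.set_binding_shape(index, shape)
        if ok:
            self._last[index] = shape
        return ok

    def __getattr__(self, name: str) -> Any:
        # Everything else (execute_v2, get_binding_shape, ...) passes through.
        return getattr(self._context, name)


def bucket_size(size: int, buckets: Sequence[int]) -> int:
    """Round a spatial size up to the smallest bucket that can hold it."""
    for b in sorted(buckets):
        if size <= b:
            return b
    raise ValueError(f"size {size} exceeds largest bucket {max(buckets)}")
```

In the repro, wrapping the context as `context = ShapeCachingContext(context)` right after `create_execution_context()` means the constant-shape loop calls the real `set_binding_shape` only once. Buckets should stay within the optimization profile's min/max range.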

Same problem here. Is there a plan for a fix?

@rmccorm4 When will the fix be released?

Any update on when the fix will be released? We have multiple devs with 10-series cards in their machines running into this issue, and it's a real pain.

Is it fixed in TensorRT 7.1? On TensorRT 7.0, this bug still exists. If possible, please add a hotfix to TensorRT 7.0.
