TensorRT: There might be a memory leak when using dynamic inputs

Created on 21 Jan 2020 · 10 Comments · Source: NVIDIA/TensorRT

Description

I am trying to use the dynamic shapes feature with the TensorRT Python API. When I run the code in Relevant Files, GPU memory usage sometimes stays at a constant value and sometimes keeps increasing until the run finishes or the GPU runs out of memory (OOM).

GPU memory usage after running 10000 iterations:

Run without the leak: 791 MB
Run with the leak: 2123 MB
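
Figures like these can be collected by sampling device memory while the script runs. The sketch below is one hedged way to do that, assuming `nvidia-smi` is on the PATH. The helper names (`parse_memory_used`, `sample_gpu_memory_mib`, `looks_like_leak`) and the 100 MiB growth threshold are illustrative assumptions, not part of the original report.

```python
import subprocess

# Query flags supported by nvidia-smi; output is one integer (MiB) per GPU.
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    "--query-gpu=memory.used",
    "--format=csv,noheader,nounits",
]


def parse_memory_used(text):
    """Parse nvidia-smi CSV output into a list of MiB values, one per GPU."""
    return [int(line.strip()) for line in text.splitlines() if line.strip()]


def sample_gpu_memory_mib(gpu_index=0):
    """Return the memory currently used on one GPU, in MiB (needs nvidia-smi)."""
    out = subprocess.check_output(NVIDIA_SMI_CMD).decode()
    return parse_memory_used(out)[gpu_index]


def looks_like_leak(samples_mib, min_growth_mib=100):
    """Heuristic: usage never drops and grows by at least min_growth_mib.

    The threshold is an arbitrary assumption; tune it for your workload.
    """
    if len(samples_mib) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples_mib, samples_mib[1:]))
    return never_drops and samples_mib[-1] - samples_mib[0] >= min_growth_mib
```

One way to use it is to call `sample_gpu_memory_mib()` every few hundred iterations of the inference loop and pass the collected list to `looks_like_leak`. Note that `nvidia-smi` reports whole-device usage, so memory held by PyTorch's caching allocator is included in each sample.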

Environment

TensorRT Version: 6.0.1.5
GPU Type: GTX1080
Nvidia Driver Version: 430.26
CUDA Version: V10.1.243
CUDNN Version: 7.6.3
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.8
PyTorch Version (if applicable): 1.3.0
Baremetal or Container (if container which image + tag):

Relevant Files

import tensorrt as trt
import numpy as np
import torch

MAX_INPUT_SIZE = 150
TIMES = 100000

def build_network(builder):
    # build a single conv2d layer
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    input_trt_tensor = network.add_input(
        name="input_0", shape=(1, 3, -1, -1), dtype=trt.float32,
    )
    input_trt_tensor.location = trt.TensorLocation.DEVICE

    # add conv layer
    conv_out_channels = 8
    conv_in_channels = 3
    kernel = np.ones((conv_out_channels, conv_in_channels, 3, 3)).astype(
        np.float32
    )
    bias = np.zeros(conv_out_channels).astype(np.float32)
    conv_layer = network.add_convolution(
        input=input_trt_tensor,
        num_output_maps=conv_out_channels,
        kernel_shape=(3, 3),
        kernel=kernel,
        bias=bias,
    )
    conv_layer.stride = (1, 1)
    conv_layer.padding = (1, 1)
    conv_layer.dilation = (1, 1)

    output_trt_tensor = conv_layer.get_output(0)
    output_trt_tensor.name = "output_0"
    output_trt_tensor.location = trt.TensorLocation.DEVICE
    output_trt_tensor.dtype = trt.float32
    network.mark_output(output_trt_tensor)
    return network


def create_builder():
    trt_logger = trt.Logger(trt.Logger.ERROR)
    builder = trt.Builder(trt_logger)
    builder.max_workspace_size = 0
    builder.fp16_mode = False
    builder.max_batch_size = 1
    builder.strict_type_constraints = False
    return builder


if __name__ == "__main__":
    builder = create_builder()
    network = build_network(builder)

    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()
    profile.set_shape(
        "input_0",
        min=(1, 3, 100, 100),
        opt=(1, 3, 101, 101),
        max=(1, 3, MAX_INPUT_SIZE, MAX_INPUT_SIZE),
    )
    config.add_optimization_profile(profile)

    with builder.build_engine(network, config) as engine:
        with engine.create_execution_context() as context:
            for it in range(TIMES):
                # To vary the shape per iteration instead, add `import random` and use:
                # input_shape = (1, 3, random.randint(100, MAX_INPUT_SIZE), random.randint(100, MAX_INPUT_SIZE))
                input_shape = (1, 3, MAX_INPUT_SIZE, MAX_INPUT_SIZE)
                torch_input = torch.ones(input_shape, dtype=torch.float32).cuda()

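                bindings = [None] * 2
                input_idx = engine.get_binding_index("input_0")
                output_idx = engine.get_binding_index("output_0")
                bindings[input_idx] = torch_input.data_ptr()

                context.set_binding_shape(input_idx, input_shape)
                # Query the output binding (not the input) so the buffer matches
                # the conv output: 8 channels, same height and width as the input.
                output_shape = tuple(context.get_binding_shape(output_idx))
                torch_output = torch.empty(
                    size=output_shape, dtype=torch.float32, device=torch.device("cuda")
                )
                bindings[output_idx] = torch_output.data_ptr()
                assert output_shape == (1, 8) + input_shape[2:]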

                ret = context.execute_v2(bindings)
                assert ret is True

                del torch_input
                del torch_output
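
As a sanity check on the buffers used in the loop, the shape of `output_0` can be worked out by hand. The layer has 8 output maps and a 3x3 kernel with stride 1, padding 1, and dilation 1, so height and width are preserved while the channel count goes from 3 to 8. A minimal sketch of that arithmetic follows; `conv2d_output_shape` is a hypothetical helper written for illustration, not a TensorRT API.

```python
def conv2d_output_shape(input_shape, out_channels, kernel=(3, 3),
                        stride=(1, 1), padding=(1, 1), dilation=(1, 1)):
    """Output shape (N, C, H, W) of a 2D convolution with symmetric padding."""
    n, _, h, w = input_shape

    def out_dim(size, k, s, p, d):
        # Standard convolution arithmetic with floor division.
        return (size + 2 * p - d * (k - 1) - 1) // s + 1

    return (
        n,
        out_channels,
        out_dim(h, kernel[0], stride[0], padding[0], dilation[0]),
        out_dim(w, kernel[1], stride[1], padding[1], dilation[1]),
    )
```

With the defaults matching the layer in the script, `conv2d_output_shape((1, 3, 150, 150), out_channels=8)` gives `(1, 8, 150, 150)`, which is the size the output buffer needs to hold.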

Steps To Reproduce

Run the above code.

Memory Leak bug

Source

Sanster

Most helpful comment

Is it fixed in TensorRT 7.1? This bug still exists in TensorRT 7.0. If you can, please add a hotfix to TensorRT 7.0.

insikk on 20 Jul 2020

👍3

All 10 comments

TensorRT 6 (nvcr.io/nvidia/tensorrt:19.12-py3)

❌ Could not repro on V100
✔️ Could repro on P4

TensorRT 7.0.0.11

❌ Could not repro on V100
✔️ Could repro on P4

This is probably related to: https://devtalk.nvidia.com/default/topic/1065018/tensorrt/context-gt-setbindingdimensions-casing-gpu-memory-leak

Looking into this.

rmccorm4 on 22 Jan 2020

This issue has been fixed upstream and should be included in the next release. Closing for now.

rmccorm4 on 14 Feb 2020

👍3

> This issue has been fixed upstream and should be included in the next release. Closing for now.

I still see the memory leak with TensorRT 7 on a P4.

ZimingLu on 15 Mar 2020

Yes @ZimingLu,

The next release isn't out yet. My comment was based on the current release (TensorRT 7.0).

rmccorm4 on 15 Mar 2020

Is my understanding correct that this affects Pascal-era cards? I'm seeing various reports of crippling memory leaks on GTX 10x0 series cards, whereas RTX-era cards work fine.

gcp on 9 May 2020

Is there any known workaround for this (other than going back to TensorRT 6, fixed inputs, etc.)? Confusingly enough, I'm not seeing this on some Ubuntu 18.04 systems.

gcp on 9 May 2020

Same problem... Is there a plan?

HaoLiuHust on 5 Jun 2020

@rmccorm4 When will the fix be released?

HaoLiuHust on 5 Jun 2020

Any update on when the fix will be released? We've got multiple devs with 10-series cards in their machines running into this issue and it's a real pain.