I am trying to use the dynamic shapes feature with the TensorRT Python API. When I run the code in Relevant Files, GPU memory sometimes stays at a constant value and sometimes keeps increasing until the run finishes or the GPU runs out of memory (OOM).
GPU memory for running 10000 times:
TensorRT Version: 6.0.1.5
GPU Type: GTX1080
Nvidia Driver Version: 430.26
CUDA Version: V10.1.243
CUDNN Version: 7.6.3
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6.8
PyTorch Version (if applicable): 1.3.0
Baremetal or Container (if container which image + tag):
import tensorrt as trt
import numpy as np
import torch
MAX_INPUT_SIZE = 150
TIMES = 100000
def build_network(builder):
    # build a single conv2d layer
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    input_trt_tensor = network.add_input(
        name="input_0", shape=(1, 3, -1, -1), dtype=trt.float32,
    )
    input_trt_tensor.location = trt.TensorLocation.DEVICE
    # add conv layer
    conv_out_channels = 8
    conv_in_channels = 3
    kernel = np.ones((conv_out_channels, conv_in_channels, 3, 3)).astype(
        np.float32
    )
    bias = np.zeros(conv_out_channels).astype(np.float32)
    conv_layer = network.add_convolution(
        input=input_trt_tensor,
        num_output_maps=conv_out_channels,
        kernel_shape=(3, 3),
        kernel=kernel,
        bias=bias,
    )
    conv_layer.stride = (1, 1)
    conv_layer.padding = (1, 1)
    conv_layer.dilation = (1, 1)
    output_trt_tensor = conv_layer.get_output(0)
    output_trt_tensor.name = "output_0"
    output_trt_tensor.location = trt.TensorLocation.DEVICE
    output_trt_tensor.dtype = trt.float32
    network.mark_output(output_trt_tensor)
    return network
def create_builder():
    trt_logger = trt.Logger(trt.Logger.ERROR)
    builder = trt.Builder(trt_logger)
    builder.max_workspace_size = 0
    builder.fp16_mode = False
    builder.max_batch_size = 1
    builder.strict_type_constraints = False
    return builder
if __name__ == "__main__":
    builder = create_builder()
    network = build_network(builder)
    config = builder.create_builder_config()
    profile = builder.create_optimization_profile()
    profile.set_shape(
        "input_0",
        min=(1, 3, 100, 100),
        opt=(1, 3, 101, 101),
        max=(1, 3, MAX_INPUT_SIZE, MAX_INPUT_SIZE),
    )
    config.add_optimization_profile(profile)
    with builder.build_engine(network, config) as engine:
        with engine.create_execution_context() as context:
            for it in range(TIMES):
                # input_shape = (1, 3, random.randint(100, MAX_INPUT_SIZE), random.randint(100, MAX_INPUT_SIZE))
                input_shape = (1, 3, MAX_INPUT_SIZE, MAX_INPUT_SIZE)
                torch_input = torch.ones(input_shape, dtype=torch.float32).cuda()
                bindings = [None] * 2
                bindings[engine.get_binding_index("input_0")] = torch_input.data_ptr()
                context.set_binding_shape(0, input_shape)
                # query the output binding (not binding 0, which is the input)
                output_shape = tuple(
                    context.get_binding_shape(engine.get_binding_index("output_0"))
                )
                torch_output = torch.empty(
                    size=output_shape, dtype=torch.float32, device=torch.device("cuda")
                )
                bindings[engine.get_binding_index("output_0")] = torch_output.data_ptr()
                # holds only for the spatial dims; the conv has 8 output
                # channels, so comparing the full shapes would always fail
                assert input_shape[2:] == output_shape[2:]
                ret = context.execute_v2(bindings)
                assert ret is True
                del torch_input
                del torch_output
Run the above code.
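To tell the two behaviours apart (memory staying constant versus steadily growing), it may help to sample the GPU's used memory every few hundred iterations while the script runs. The following is only a rough sketch, not part of the original report: the helper names `query_gpu_memory_mib` and `classify_memory_trend` and the 16 MiB tolerance are assumptions chosen for illustration, and it assumes `nvidia-smi` is available on the PATH.

```python
import subprocess


def query_gpu_memory_mib(gpu_index=0):
    """Return the used memory (MiB) of one GPU as reported by nvidia-smi.

    Hypothetical helper; assumes nvidia-smi is installed and on the PATH.
    """
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--id={}".format(gpu_index),
            "--query-gpu=memory.used",
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,  # works on Python 3.6 as used in the report
    )
    return int(out.strip().splitlines()[0])


def classify_memory_trend(samples_mib, tolerance_mib=16):
    """Label a series of memory samples as 'constant' or 'increasing'.

    Compares the last sample with the first one; growth within
    tolerance_mib (an arbitrary threshold) is treated as noise.
    """
    if len(samples_mib) < 2:
        raise ValueError("need at least two samples")
    growth = samples_mib[-1] - samples_mib[0]
    return "increasing" if growth > tolerance_mib else "constant"
```

Inside the loop of the script above, one could append `query_gpu_memory_mib()` to a list every, say, 1000 iterations and call `classify_memory_trend` on that list at the end. Note that `nvidia-smi` reports memory for the whole device, so other processes on the same GPU would distort the reading.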
TensorRT 6 (nvcr.io/nvidia/tensorrt:19.12-py3)
TensorRT 7.0.0.11
This is probably related to: https://devtalk.nvidia.com/default/topic/1065018/tensorrt/context-gt-setbindingdimensions-casing-gpu-memory-leak
Looking into this.
This issue has been fixed upstream and should be included in the next release. Closing for now.
I still see the memory leak in TensorRT 7 on a P4.
Yes @ZimingLu,
The next release isn't out yet. My comment was based on the current release (TensorRT 7.0)
Is my understanding correct that this affects Pascal-era cards? I'm seeing various reports of crippling memory leaks on GTX 10x0 series cards, whereas RTX-era cards work fine.
Is there any known workaround for this (except for going back to TensorRT 6 and fixed inputs etc)? Confusingly enough I'm not seeing this on some Ubuntu 18.04 systems.
Same problem here. Is there a plan for a fix?
@rmccorm4 When will the fix be released?
Any update on when the fix will be released? We've got multiple devs with 10-series cards in their machines running into this issue and it's a real pain.
Is it fixed in TensorRT 7.1? In TensorRT 7.0, this bug still exists. If you can, please add a hotfix to TensorRT 7.0.