Tensorrt: Tensorrt in Python: Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)

Created on 28 Dec 2019 · 26 Comments · Source: NVIDIA/TensorRT

Description

When I run my code (accelerating a Caffe model with TensorRT) together with a detection network (using Caffe2), this error occurs: "ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception".

Environment

TensorRT Version: 6.0.1.5
GPU Type: 2080 ti
Nvidia Driver Version: 418
CUDA Version: 10.0
CUDNN Version: 7.6.2
Operating System + Version: ubuntu 16.04
Python Version (if applicable): 2.7
Baremetal or Container (if container which image + tag): running in Docker

Can somebody help me? Thanks!

Caffe 6.x

Source

Eric-Zhang1990

All 26 comments

Hi @Eric-Zhang1990,

Can you share the model and the code you're running to reproduce the issue so I can debug further?

rmccorm4 on 29 Dec 2019

@rmccorm4 It runs correctly on my own computer, but when I load two models (one is a Caffe2 model for detection, the other is my TRT model), this error occurs. The code includes other information, sorry.

Eric-Zhang1990 on 30 Dec 2019

@rmccorm4 I mean, can a TensorRT model not be loaded together with another model?

Eric-Zhang1990 on 30 Dec 2019

@rmccorm4 This is my TRT code. Can you help me check it? Does it contain an error?
Thank you!
caffe_tensorrt.txt

Eric-Zhang1990 on 30 Dec 2019

Hi @Eric-Zhang1990,

Can you share the output of nvidia-smi on your machine / in the container? I'm curious if the GPUs are in EXCLUSIVE_PROCESS mode, similar to here: https://github.com/NVIDIA/TensorRT/issues/294

rmccorm4 on 30 Dec 2019

@rmccorm4 This is the output of nvidia-smi. I use card 5 for Caffe2 detection and card 7 for my TRT classification.
trt--m34
pid is 34544.

Eric-Zhang1990 on 2 Jan 2020

@Eric-Zhang1990 can you show the full output? I wanted to see the top half as well to see each GPU's settings

rmccorm4 on 2 Jan 2020

@rmccorm4 Ok.
trt--m34

Eric-Zhang1990 on 2 Jan 2020

@rmccorm4 Can you see the picture?

Eric-Zhang1990 on 2 Jan 2020

@rmccorm4 I have seen issue #294, but the error also occurs when I run it on one device.

Eric-Zhang1990 on 2 Jan 2020

Hi @Eric-Zhang1990,

I'm not quite sure what the issue is. The only thing that stands out to me is that nvidia-smi reports CUDA 10.2 + Driver 430, but CUDA 10.2 requires driver >= 440.33 per this table: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver.

Compatibility mode might help in this scenario with specific lower driver versions like 418, but that only applies to Tesla GPUs (V100, T4, etc.) and you have a GeForce GPU (RTX 2080).

Can you try upgrading your driver to >= 440.33? (or use CUDA 10.0 instead of 10.2 for both your CUDA version and for the TensorRT download as well which should be easy since you're using containers)
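
The driver check described above can be sketched in a few lines of Python. This is only an illustration: the table below lists the minimum Linux driver versions for the CUDA releases mentioned in this thread as they appear in NVIDIA's compatibility table (verify them against the link above), the helper names are hypothetical, and the driver string "430.26" is an example value, not taken from the reporter's machine.

```python
# Minimum Linux driver versions for a few CUDA toolkits, per NVIDIA's CUDA
# compatibility table. Only the releases discussed in this thread are listed.
MIN_LINUX_DRIVER = {
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}


def parse_version(text):
    """Turn '440.33.01' into (440, 33, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda(driver_version, cuda_version):
    """Return True if the driver meets the minimum for the given CUDA toolkit."""
    major_minor = ".".join(cuda_version.split(".")[:2])
    minimum = MIN_LINUX_DRIVER[major_minor]  # KeyError for unlisted toolkits
    return parse_version(driver_version) >= minimum


if __name__ == "__main__":
    # Example values only: a 430-series driver with CUDA 10.2 and CUDA 10.0.
    print(driver_supports_cuda("430.26", "10.2"))
    print(driver_supports_cuda("430.26", "10.0"))
```

The driver version string itself can be read from the header of the nvidia-smi output.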

rmccorm4 on 2 Jan 2020

@rmccorm4 Thank you for your kind reply, I will have a try.

Eric-Zhang1990 on 3 Jan 2020

@Eric-Zhang1990
What was the result of the attempt? Did it work? I have the same problem.

gzchenjiajun on 28 Feb 2020

Same problem.

xieydd on 17 Mar 2020

Same problem.

guanshuicheng on 10 Apr 2020

Same problem.
However, I can successfully use TensorRT for inference in my test code (where I use a torch DataLoader to feed the data),
but in my demo.py code, where I use multithreading for data processing and model inference, this error comes up.
I put this code in the main thread:

engine= get_engine(cfg.engine_file_path)
cuda.init()
device = cuda.Device(0) # enter your Gpu id here
ctx = device.make_context()
inputs, outputs, bindings, stream = allocate_buffers(engine)
ctx.pop()

and this in the model inference thread:

def model_inference():
    ...
    with engine.create_execution_context() as context:
        for raw_frames, data, flag in VideoStream.get_batch():
            inputs[0].host = data
            trt_outputs = do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

The error is thrown at context.execute_async_v2(bindings=bindings, stream_handle=stream.handle):

def do_inference_v2(context, bindings, inputs, outputs, stream):
    # Copy input data from host to device.
    [cuda.memcpy_htod_async(inp.device, inp.host, stream) for inp in inputs]
    # Run inference asynchronously on the given stream.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Copy predictions back from device to host.
    [cuda.memcpy_dtoh_async(out.host, out.device, stream) for out in outputs]
    # Wait for all queued work on the stream to finish.
    stream.synchronize()
    return [out.host for out in outputs]
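
One possible factor in the setup above, offered as an assumption rather than a confirmed diagnosis: the context is created and popped in the main thread, so no CUDA context is current in the inference thread. Since a CUDA context is made current per thread, a worker can push the context before running inference and pop it afterwards. The sketch below shows that pattern with hypothetical helper names (active_context, run_in_worker); ctx is assumed to be the object returned by pycuda's device.make_context(), which provides push() and pop().

```python
import contextlib
import threading


@contextlib.contextmanager
def active_context(ctx):
    """Make `ctx` current on the calling thread and always pop it afterwards."""
    ctx.push()
    try:
        yield ctx
    finally:
        ctx.pop()


def run_in_worker(ctx, work):
    """Run `work()` in a new thread with `ctx` pushed inside that thread."""
    result = {}

    def target():
        # The push happens in the worker, not in the thread that created ctx.
        with active_context(ctx):
            result["value"] = work()

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result.get("value")
```

With this pattern, the body of model_inference would be passed as work, so that execute_async_v2 runs while the context is current in that thread.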

@rmccorm4
@Eric-Zhang1990 have you solved the problem?

yuluhan on 20 May 2020

I solved the problem!
My environment is:
TensorRT 7.0
CUDA 10.2
cuDNN 7.6.5

I changed the code as below.
I moved the initialization code into the model inference thread:

def test(epoch, wb):
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    engine = get_engine(onnx_file_path, engine_file_path)
    context = engine.create_execution_context()
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    for batch_idx, (frame_idx, data, detected_frame, target) in enumerate(test_loader):
        inputs[0].host = data
        trt_outputs = do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
    ctx.pop()
    del ctx

and it threw another error:
[TensorRT] ERROR: ../rtSafe/cuda/cudaConvolutionRunner.cpp (303) - Cudnn Error in execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)

I found the solution in #311.
I had installed PyTorch with cudatoolkit=10.1 (via conda install), but my TensorRT build targets CUDA 10.2,
so I reinstalled cudatoolkit with conda install cudatoolkit=10.2.
Then it ran successfully!
BTW, this warning also disappeared:
[TensorRT] WARNING: TensorRT was linked against cuDNN 7.6.4 but loaded cuDNN 7.6.3
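
A quick way to spot this kind of mismatch is to compare the versions PyTorch reports with the ones the TensorRT build expects. The helpers below are a minimal sketch with hypothetical names. They assume the cuDNN 7.x/8.x integer encoding (major * 1000 + minor * 100 + patch), which is the form returned by torch.backends.cudnn.version(), and the expected CUDA version has to be supplied by hand.

```python
def cudnn_int_to_str(version):
    """Decode a cuDNN 7.x/8.x integer version, e.g. 7603 -> '7.6.3'."""
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return "{}.{}.{}".format(major, minor, patch)


def same_major_minor(a, b):
    """Compare two dotted version strings on their first two components."""
    return a.split(".")[:2] == b.split(".")[:2]


# Usage with PyTorch installed (not executed here):
#   import torch
#   same_major_minor(torch.version.cuda, "10.2")       # CUDA used by PyTorch
#   cudnn_int_to_str(torch.backends.cudnn.version())   # cuDNN loaded by PyTorch
```

If same_major_minor returns False for the CUDA version TensorRT was built against, the two libraries are likely loading different CUDA runtimes in the same process, which is the situation described above.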

317

yuluhan on 21 May 2020

@yuluhan @gzchenjiajun @Eric-Zhang1990 were any of you able to resolve this error? I'm running a very simple test case:

        cuda.init()
        device = cuda.Device(0)
        ctx = device.make_context()
        ctx.push()

        self.engine = self._load_engine()
        self.context = self.engine.create_execution_context()
        inputs = [torch.ones((1, 3, 256, 416), device="cuda:0")]
        outputs = [torch.zeros((1, 3, 8, 13), device="cuda:0"), torch.zeros((1, 3, 16, 26), device="cuda:0"), 
                    torch.zeros((1, 3, 32, 52), device="cuda:0"), torch.zeros((1, 6552, 6), device="cuda:0")]
        bindings = [_input.data_ptr() for _input in inputs] + [_output.data_ptr() for _output in outputs]

        self.context.execute_v2(bindings)

        ctx.pop()
        del ctx

but I'm still getting the same error: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR). This is pretty much the same as @yuluhan's solution, but it doesn't resolve the issue. One thing to note is that I'm using PyTorch tensors on the GPU as bindings, and I only get this error when I do that (I did not get it when I was using cuda.mem_alloc). Any ideas?

Thanks!

prathik-naidu on 1 Jul 2020

I think it is essential to allocate the GPU memory using cuda.mem_alloc().
I guess the TRT engine only supports this form of input.
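
For reference, allocating the bindings with pycuda instead of PyTorch tensors could look like the sketch below. It is an illustration under assumptions: the helper names are hypothetical, the shapes are float32 tensors as in the test case above, the order of the shapes is assumed to match the engine's binding order, and a CUDA context must already be active when allocate_bindings is called. Whether mem_alloc is actually required is questioned in the replies that follow.

```python
from functools import reduce
import operator


def nbytes(shape, itemsize=4):
    """Bytes needed for a dense tensor of `shape` (itemsize=4 for float32)."""
    return reduce(operator.mul, shape, 1) * itemsize


def allocate_bindings(shapes, itemsize=4):
    """Allocate one device buffer per shape and return (buffers, bindings).

    Requires pycuda and an active CUDA context. The import is done lazily so
    that the size helper above can be used on a machine without a GPU.
    """
    import pycuda.driver as cuda

    buffers = [cuda.mem_alloc(nbytes(shape, itemsize)) for shape in shapes]
    # TensorRT expects raw device addresses; int() on a pycuda
    # DeviceAllocation yields that address.
    bindings = [int(buf) for buf in buffers]
    return buffers, bindings


# Shapes from the test case: one input followed by four outputs.
SHAPES = [(1, 3, 256, 416), (1, 3, 8, 13), (1, 3, 16, 26),
          (1, 3, 32, 52), (1, 6552, 6)]
```

The returned buffers list must be kept alive for as long as the bindings are in use, because pycuda frees the device memory when a DeviceAllocation is garbage collected.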

yuluhan on 1 Jul 2020

@yuluhan Referencing this issue: https://github.com/NVIDIA/TensorRT/issues/62

It seems like this is possible (https://github.com/NVIDIA-AI-IOT/jetbot/blob/cf3e264ae6/jetbot/tensorrt_model.py), so I'm wondering why this issue is happening here.

prathik-naidu on 1 Jul 2020

As an update for anyone else coming across this issue, posted a workaround that ended up resolving this CUDNN_STATUS_MAPPING_ERROR: https://github.com/NVIDIA/TensorRT/issues/62#issuecomment-653085425

prathik-naidu on 2 Jul 2020

@prathik-naidu

I can't solve the issue with your workaround. Do you have any update on this?

twmht on 13 Aug 2020

@prathik-naidu

I just found out that the error (ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)) only occurs with an engine converted from ONNX when PyTorch tensors are used as the input and output CUDA memory. When using torch2trt (https://github.com/NVIDIA-AI-IOT/torch2trt) by converting from ONNX to TensorRT, I never have this problem.

And if I use pycuda to initialize the memory, the error is gone even when I use the engine converted from ONNX.

twmht on 13 Aug 2020

@twmht How do you initialize the memory with pycuda?

ahangchen on 8 Sep 2020

Before any access to CUDA through PyTorch, import the following:

import pycuda.autoinit
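
To make the ordering explicit, the import can be wrapped in a small guard. This is a sketch with a hypothetical helper name. It checks whether torch has been imported at all, which is stricter than the advice above (only CUDA access through PyTorch matters), and it assumes pycuda is installed.

```python
import importlib
import sys


def init_pycuda_before_torch(modules=None):
    """Import pycuda.autoinit, refusing to do so if torch is already imported."""
    modules = sys.modules if modules is None else modules
    if "torch" in modules:
        raise RuntimeError(
            "torch is already imported; import pycuda.autoinit first"
        )
    # pycuda.autoinit creates a CUDA context on import.
    return importlib.import_module("pycuda.autoinit")
```

Calling init_pycuda_before_torch() at the very top of the entry script, before any import torch, gives the ordering suggested above.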

@prathik-naidu
This is not resolved, or it is a related issue: as @twmht mentioned, it involves an engine converted from ONNX together with PyTorch CUDA tensors. There is likely some interaction between PyTorch and pycuda initialization.