Tensorrt: Tensorrt in Python: Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)

Created on 28 Dec 2019 · 26 Comments · Source: NVIDIA/TensorRT

Description

When I run my code (accelerating a Caffe model with TensorRT) together with a detection network (using Caffe2), this error occurs: "ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception".

Environment

TensorRT Version: 6.0.1.5
GPU Type: 2080 ti
Nvidia Driver Version: 418
CUDA Version: 10.0
CUDNN Version: 7.6.2
Operating System + Version: ubuntu 16.04
Python Version (if applicable): 2.7
Baremetal or Container (if container which image + tag): running in Docker

Can somebody help me? Thanks!

Caffe 6.x

All 26 comments

Hi @Eric-Zhang1990,

Can you share the model and the code you're running to reproduce the issue so I can debug further?

@rmccorm4 It runs correctly on my own computer, but when I load two models (one is a Caffe2 model for detection, the other is my TRT model), this error occurs. Sorry, I can't share the full code because it includes other information.

@rmccorm4 I mean, can a TensorRT model not be loaded together with another model?

@rmccorm4 This is my TRT code. Can you help me check it? Does it contain an error?
Thank you!
caffe_tensorrt.txt

Hi @Eric-Zhang1990,

Can you share the output of nvidia-smi on your machine / in the container? I'm curious if the GPUs are in EXCLUSIVE_PROCESS mode, similar to here: https://github.com/NVIDIA/TensorRT/issues/294

@rmccorm4 This is the output of nvidia-smi. I use card 5 for Caffe2 detection and card 7 for my TRT classification.
trt--m34
pid is 34544.

@Eric-Zhang1990 can you show the full output? I wanted to see the top half as well to see each GPU's settings

@rmccorm4 Ok.
trt--m34

@rmccorm4 Can you see the picture?

@rmccorm4 I have seen issue #294, but the error also occurs when I run it on one device.

Hi @Eric-Zhang1990,

I'm not quite sure what the issue is. The only thing that stands out to me is that nvidia-smi reports CUDA 10.2 + Driver 430, but CUDA 10.2 requires driver >= 440.33 per this table: https://docs.nvidia.com/deploy/cuda-compatibility/index.html#binary-compatibility__table-toolkit-driver.

Compatibility mode might help in this scenario with specific lower driver versions like 418, but that only applies to Tesla GPUs (V100, T4, etc.) and you have a GeForce GPU (RTX 2080).

Can you try upgrading your driver to >= 440.33? (or use CUDA 10.0 instead of 10.2 for both your CUDA version and for the TensorRT download as well which should be easy since you're using containers)
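
The driver requirement described above can be checked with a few lines of code. This is a minimal sketch, not an official tool: `MIN_DRIVER`, `parse_version`, and `driver_supports_cuda` are hypothetical names, and only the CUDA 10.2 row is quoted in this thread. The 10.0 and 10.1 rows are assumed from the Linux column of the same compatibility table and should be verified there.

```python
# Minimum Linux driver per CUDA toolkit. Only the 10.2 entry is quoted in this
# thread; the 10.0 and 10.1 entries are assumptions taken from the same table.
MIN_DRIVER = {
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}


def parse_version(text):
    """Turn a string such as '440.33.01' into (440, 33, 1) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda(driver_version, cuda_version):
    """Return True if the driver meets the minimum for the given CUDA toolkit."""
    required = MIN_DRIVER[cuda_version]
    return parse_version(driver_version) >= required


if __name__ == "__main__":
    # A 430-series driver is too old for CUDA 10.2 but fine for CUDA 10.0.
    print(driver_supports_cuda("430.26", "10.2"))  # False
    print(driver_supports_cuda("430.26", "10.0"))  # True
```

The driver string would normally come from the header of the nvidia-smi output; "430.26" here is only an illustrative value.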

@rmccorm4 Thank you for your kind reply, I will have a try.

@Eric-Zhang1990
What was the result of the attempt? Did it work? I have the same problem.

Same problem.

Same problem.

Same problem.
However, I can successfully run TensorRT inference in my test code (where I use a torch DataLoader to feed the data),
but in my demo.py code, where I use multithreading for data processing and model inference, this error comes up.
I put this code in the main thread:

engine= get_engine(cfg.engine_file_path)
cuda.init()
device = cuda.Device(0) # enter your Gpu id here
ctx = device.make_context()
inputs, outputs, bindings, stream = allocate_buffers(engine)
ctx.pop()

and this in the model inference thread:

def model_inference():
    ...
    with engine.create_execution_context() as context:
        for raw_frames, data, flag in VideoStream.get_batch():
            inputs[0].host = data
            trt_outputs = do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)

The error is thrown at context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)

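def do_inference_v2(context, bindings, inputs, outputs, stream):
    # Copy inputs from host to device.
    for inp in inputs:
        cuda.memcpy_htod_async(inp.device, inp.host, stream)
    # Run inference asynchronously on the stream.
    context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
    # Copy outputs from device back to host.
    for out in outputs:
        cuda.memcpy_dtoh_async(out.host, out.device, stream)
    # Wait for all queued work to finish before reading the results.
    stream.synchronize()
    return [out.host for out in outputs]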

@rmccorm4
@Eric-Zhang1990 have you solved the problem?

I solved the problem!
My environment is:
TensorRT 7.0
CUDA 10.2
cuDNN 7.6.5

I changed the code as below, moving the initialization code into the model inference thread:
def test(epoch, wb):
    cuda.init()
    device = cuda.Device(0)
    ctx = device.make_context()
    engine = get_engine(onnx_file_path, engine_file_path)
    context = engine.create_execution_context()
    inputs, outputs, bindings, stream = allocate_buffers(engine)
    for batch_idx, (frame_idx, data, detected_frame, target) in enumerate(test_loader):
        inputs[0].host = data
        trt_outputs = do_inference_v2(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
    ctx.pop()
    del ctx
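
The pattern in the code above (create the CUDA context on the thread that runs inference, and pop it on that same thread) can be sketched in a library-independent way. This is a hedged illustration, not the poster's code: `run_with_own_context` is a hypothetical helper, and `make_context` is assumed to return an object that is already current on the calling thread and exposes `pop()`, which is how pycuda's `Device.make_context()` behaves.

```python
import threading


def run_with_own_context(make_context, work):
    """Run work(ctx) in a new thread that creates and releases its own context.

    make_context is an assumption: any callable returning an object with pop().
    """
    result = {}

    def worker():
        ctx = make_context()          # created on the worker thread
        try:
            result["value"] = work(ctx)
        except Exception as exc:      # hand the failure back to the caller
            result["error"] = exc
        finally:
            ctx.pop()                 # released on the same thread, even on error

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in result:
        raise result["error"]
    return result["value"]
```

With pycuda, `make_context` could plausibly be `cuda.Device(0).make_context` after `cuda.init()`, and `work` would build the engine, execution context, and buffers before running inference, so that everything tied to the context lives on one thread. The `try`/`finally` makes sure the context is popped even if inference raises.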

It then threw another error:
[TensorRT] ERROR: ../rtSafe/cuda/cudaConvolutionRunner.cpp (303) - Cudnn Error in execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)

I found the solution in #311.
I had installed PyTorch with cudatoolkit=10.1 (via conda install), but my TensorRT CUDA version is 10.2,
so I reinstalled cudatoolkit with conda install cudatoolkit=10.2.
Then it ran successfully!
By the way, this warning also disappeared:
[TensorRT] WARNING: TensorRT was linked against cuDNN 7.6.4 but loaded cuDNN 7.6.3
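
A warning like the one above is an early hint that two installations disagree about library versions, so it can be worth scanning logs for it. The following is a minimal sketch: `find_version_mismatch` is a hypothetical helper, and the message format is assumed from the single warning quoted here, so other TensorRT versions may word it differently.

```python
import re

# Matches "linked against <lib> <version> but loaded <lib> <version>".
_MISMATCH = re.compile(
    r"linked against (?P<lib>\w+) (?P<linked>\d+(?:\.\d+)*) "
    r"but loaded (?P=lib) (?P<loaded>\d+(?:\.\d+)*)"
)


def find_version_mismatch(log_line):
    """Return (library, linked, loaded) if the line reports a mismatch, else None."""
    match = _MISMATCH.search(log_line)
    if match is None:
        return None
    linked, loaded = match.group("linked"), match.group("loaded")
    if linked == loaded:
        return None
    return match.group("lib"), linked, loaded


if __name__ == "__main__":
    line = "[TensorRT] WARNING: TensorRT was linked against cuDNN 7.6.4 but loaded cuDNN 7.6.3"
    print(find_version_mismatch(line))  # ('cuDNN', '7.6.4', '7.6.3')
```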

317

@yuluhan @gzchenjiajun @Eric-Zhang1990 were any of you able to resolve this error? I'm running a very simple test case:

        cuda.init()
        device = cuda.Device(0)
        ctx = device.make_context()
        ctx.push()

        self.engine = self._load_engine()
        self.context = self.engine.create_execution_context()
        inputs = [torch.ones((1, 3, 256, 416), device="cuda:0")]
        outputs = [torch.zeros((1, 3, 8, 13), device="cuda:0"), torch.zeros((1, 3, 16, 26), device="cuda:0"), 
                    torch.zeros((1, 3, 32, 52), device="cuda:0"), torch.zeros((1, 6552, 6), device="cuda:0")]
        bindings = [_input.data_ptr() for _input in inputs] + [_output.data_ptr() for _output in outputs]

        self.context.execute_v2(bindings)

        ctx.pop()
        del ctx

but still getting the same ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR) issue. This is pretty much similar to @yuluhan's solution, but it doesn't resolve this issue. One thing to note is that I'm using pytorch tensors on the gpu as bindings and I'm only getting this issue when I try this (did not get this error when I was using cuda.mem_alloc). Any ideas?

Thanks!

I think it is essential to allocate the GPU memory using cuda.mem_alloc().
I guess the TRT engine only supports this form of input.

@yuluhan referencing this issue https://github.com/NVIDIA/TensorRT/issues/62

It seems like this is possible (https://github.com/NVIDIA-AI-IOT/jetbot/blob/cf3e264ae6/jetbot/tensorrt_model.py), so I'm wondering why this issue is happening here.

As an update for anyone else coming across this issue, posted a workaround that ended up resolving this CUDNN_STATUS_MAPPING_ERROR: https://github.com/NVIDIA/TensorRT/issues/62#issuecomment-653085425

@prathik-naidu

I can't solve the issue with your workaround. Do you have any update on this?

@prathik-naidu

I just found out that the error (ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)) only occurs with an engine converted from ONNX when PyTorch tensors are used as the input and output CUDA memory. When using torch2trt (https://github.com/NVIDIA-AI-IOT/torch2trt) instead of converting from ONNX to TensorRT, I never have this problem.

And if I use pycuda to initialize the memory, the error is gone even when I use the engine converted from ONNX.

@twmht how to init the memory by pycuda?

Before any access to cuda through pytorch, import the following:

import pycuda.autoinit

@prathik-naidu
This is not resolved for me. It looks related to what @twmht mentioned: an engine converted from ONNX used with PyTorch CUDA tensors. There is likely some conflict between PyTorch and pycuda initialization.

Maybe you set the wrong GPU ID.
