No problem in PyTorch 1.2. Archive with code and data: https://github.com/pytorch/pytorch/files/3723821/PyTorch.zip
Windows 10 (1903), Python 3.7.4, RTX 2060 (driver version 436.48)
RuntimeError Traceback (most recent call last)
<ipython-input-6-68308ed1e055> in <module>
35 cum_loss.append(loss.item())
36
---> 37 loss.backward()
38 optimizer.step()
39
C:\Anaconda3\envs\torch13\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
148 products. Defaults to ``False``.
149 """
--> 150 torch.autograd.backward(self, gradient, retain_graph, create_graph)
151
152 def register_hook(self, hook):
C:\Anaconda3\envs\torch13\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
97 Variable._execution_engine.run_backward(
98 tensors, grad_tensors, retain_graph, create_graph,
---> 99 allow_unreachable=True) # allow_unreachable flag
100
101
RuntimeError: CUDA error: unspecified launch failure
cc @ezyang @gchanan @zou3519 @SsnL @albanD @gqchen @ngimel @peterjc123
Can you provide a minimal code example to reproduce? Please also copy and paste the output from our environment collection script. You can get the script and run it with:
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
Hello @vincentqb,
Code example: https://github.com/pytorch/pytorch/files/3723821/PyTorch.zip
Output from the environment collection script:
Collecting environment information...
PyTorch version: 1.3.0
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Microsoft Windows 10 Enterprise
GCC version: Could not collect
CMake version: Could not collect
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudnn64_7.dll
Versions of relevant libraries:
[pip] numpy==1.15.4
[pip] torch==1.3.0
[pip] torchvision==0.4.1
[conda] blas 1.0 mkl
[conda] libblas 3.8.0 13_mkl conda-forge
[conda] libcblas 3.8.0 13_mkl conda-forge
[conda] liblapack 3.8.0 13_mkl conda-forge
[conda] mkl 2019.4 245
[conda] mkl-service 2.3.0 py37hb782905_0
[conda] pytorch 1.3.0 py3.7_cuda101_cudnn7_0 pytorch
[conda] torchvision 0.4.1 py37_cu101 pytorch
nvidia-smi:
Mon Oct 14 21:05:01 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 436.48 Driver Version: 436.48 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2060 WDDM | 00000000:01:00.0 Off | N/A |
| N/A 63C P2 28W / N/A | 1103MiB / 6144MiB | 13% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 6156 C C:\Anaconda3\envs\torch12\python.exe N/A |
+-----------------------------------------------------------------------------+
@alexeygolyshev is that a minimal example? Looks like there is a lot of code in there.
If you could reduce the size of the code, it would really help with finding what is the root cause, thanks !
Hello @albanD,
Yes, this is a minimal example. I don't think I can greatly reduce the code. I have already deleted the data preprocessing.
@albanD My inputs: [sentences, words, characters]. I have 2 varying dimensions: different number of words in a sentence and different number of characters in a word.
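For readers unfamiliar with that kind of input, here is a minimal illustrative sketch (not code from the archive above) of how two varying dimensions are typically padded before being fed to an RNN:

```python
# Illustrative only: padding one of the two varying dimensions -- characters per
# word -- with torch.nn.utils.rnn.pad_sequence; words per sentence are padded the same way.
import torch
from torch.nn.utils.rnn import pad_sequence

# one sentence = a list of words, each word = a tensor of character indices
sentence = [torch.tensor([3, 7, 1]), torch.tensor([5, 2]), torch.tensor([9, 4, 6, 8])]

# pad characters so every word has the same length -> shape (num_words, max_chars)
padded_words = pad_sequence(sentence, batch_first=True, padding_value=0)
print(padded_words.shape)  # torch.Size([3, 4])
```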
Unfortunately I don't have a setup with notebook available. Could you run your code with anomaly_mode enabled and post here the extended stack trace?
Hello, my computer system is the same as yours (Win10 (1903), Python 3.7.4, RTX 2060 (driver version 441.20), torch.version == 1.2.0).
I encountered the same problem as you. Have you solved it now?
You say there is no problem in PyTorch 1.2. Can you tell me all the version information for that setup?
CUDA? cuDNN? Python?
Hello @JYH9351,
I am currently using PyTorch 1.3.0 in production. I don't know why, but this helps:
with t.autograd.set_detect_anomaly(False):
    for epoch in range(epochs):
        ...
It crashes less frequently, and not in the first 2 epochs.
Does switching off the TDR settings help? https://zhuanlan.zhihu.com/p/38141415
No. TDR = 60. Ran it 2 times. It crashed in epochs 2 and 11. This error appears randomly.
with t.autograd.set_detect_anomaly(True) increases the time per epoch by 5x. In October, I waited several hours, but there was no error, so there is no extended stack trace.
Sometimes t.autograd.set_detect_anomaly(False) can increase the time without errors. But I am not sure. In October, I trained several networks with a 2-day uptime. But in later experiments, it also crashed randomly.
# gpu pwr gtemp mtemp sm mem enc dec mclk pclk
# Idx W C C % % % % MHz MHz
0 28 62 - 12 0 0 0 6801 960
0 30 62 - 11 0 0 0 6801 1155
0 32 63 - 20 7 0 0 6801 1155
0 32 62 - 13 1 0 0 6801 960
0 28 62 - 15 1 0 0 6801 960
0 28 62 - 16 1 0 0 6801 960
0 28 62 - 15 1 0 0 6801 960
0 28 63 - 14 1 0 0 6801 960
0 27 63 - 13 0 0 0 6801 960
0 28 62 - 11 3 0 0 6801 960
0 28 62 - 0 0 0 0 6801 960
0 12 62 - 0 0 0 0 810 345
0 5 61 - 0 0 0 0 405 345
I have to say that it is difficult to say where the problem is without the stacktrace including the exact crash site. But we may get that with the help of a RelWithDebInfo build and the attachment of the VS debugger. I could build one for you if you have trouble in building the project.
It would be great if you could prepare the debug build. I don't have much experience with building it.
Interesting.
I had this issue training a model from https://github.com/wgrathwohl/JEM with PyTorch 1.3
I used this command
python train_wrn_ebm.py --lr .0001 --dataset cifar10 --optimizer adam --p_x_weight 1.0 --p_y_given_x_weight 1.0 --p_x_y_weight 0.0 --sigma .03 --width 2 --depth 40 --save_dir ./experiments --plot_uncond --warmup_iters 1000
The error happened seemingly randomly in the middle of training. I am using Linux Mint, not Windows.
I would suggest that you try uninstalling the GPU driver with DDU and installing the driver that comes with the CUDA toolkit.
There are too many bugs with the Nvidia GPU driver on Windows 10.
I have run into this same issue and tried the suggestion of @kice of installing the driver from the cuda toolkit with no luck.
I am running into similar issues on my Windows machine. I have a simple pipeline for binary classification with an LSTM, and it shuts down at some epoch (it seems to be random).
My issue is also with an LSTM. Interestingly, when I add torch.autograd.set_detect_anomaly(True) to get a stack trace, it takes about 20% longer to train but doesn't fail. I will run a few more times to see if that is consistently true.
Same problem with LSTM + binary classification + error in random epoch on windows 10 + Pytorch 1.4
File "C:/Users/User/GoogleDrive/mad2-recommend/gnn/train.py", line 108, in main
train_model(train_loader, predict_score_net, optimizer)
File "C:/Users/User/GoogleDrive/mad2-recommend/gnn/train.py", line 41, in train_model
loss.backward()
File "C:\Users\User\Anaconda3\lib\site-packages\torch\tensor.py", line 195, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "C:\Users\User\Anaconda3\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: unspecified launch failure
update:
GRU has the same problem.
@shingyipcheung Are you able to replicate the error with torch.autograd.set_detect_anomaly(True) set in order to get a full stacktrace?
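For anyone unsure how to do that, here is a minimal, self-contained sketch (the tiny LSTM and random data are placeholders, not anyone's actual model) of enabling anomaly detection so the backward error is reported together with the forward call that created the failing op:

```python
# Minimal sketch: torch.autograd.set_detect_anomaly(True) makes backward() print
# the forward stack trace of the op that produced the failing gradient.
import torch
import torch.nn as nn

torch.autograd.set_detect_anomaly(True)  # global switch; slows training noticeably

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.LSTM(input_size=8, hidden_size=16, batch_first=True).to(device)
optimizer = torch.optim.Adam(model.parameters())

for step in range(10):
    x = torch.randn(4, 5, 8, device=device)  # (batch, seq_len, features)
    out, _ = model(x)
    loss = out.pow(2).mean()
    optimizer.zero_grad()
    loss.backward()  # with anomaly mode on, a failure here also prints the forward trace
    optimizer.step()
```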
I'm having this issue as well! (EDIT: on the latest 1.4). The network will train for a while, then at some random point, the classifier will halt with this exception.
It is possible to reproduce by using FastAI AWD-LSTM transfer learning for text classification on a very large dataset: https://docs.fast.ai/text.html
After this happens, further CUDA operations result in the same error until the kernel is restarted.
I suspect a lot of this simply does not get tested on Windows. Professionally, I always use Linux for machine learning tasks. It just so happens that my only personal system with a GPU runs Windows and does not have space for a Linux install. Furthermore, "Ubuntu on Windows" does not support CUDA.
I have the same issue when I train with LSTM + classification; the error occurs at a random epoch on Windows 10 + Pytorch 1.4.
waiting for a solution
Raising priority based on user activity
@peterjc123 Would you like to look at this issue again? Thanks!
I'm facing the same issue when running on Win10 + Pytorch 1.4 with LSTM + classification. My GPU runs smoothly with other models like CNNs.
The same piece of code runs fine on a server with Ubuntu, so I guess this is some issue with Win10 compatibility.
Could you please test whether it is still a problem in the nightly package or the 1.5.0 package?
@peterjc123 the issue is resolved after I upgraded to the nightly version. Thanks for the help :)
Same problem, LSTM training random crash.
Pytorch 1.5.0, 1660Ti, 2600X, windows10, 32G
I encountered the same error with Pytorch 1.4.0, RTX 2060, Windows 10, also training an LSTM.
Trying to upgrade to Pytorch 1.5.0 now and will report again.
BTW, I did not observe this error on my development laptop with Pytorch 1.4 and a GTX 1050 Ti (I don't know if I was lucky enough or if it's specific to some GPUs).
Update:
After upgrading to Pytorch 1.5.0, the error still occurs randomly (with RTX 2060 on Windows 10).
Same issue here. On a Win10 machine with two 2080 Ti cards. The kernel crashes after the first few batches of a GRU model.
Pytorch: 1.5.0
Well, instead of describing the fact that it occurs, it would be better to provide some code for us to reproduce it on our side.
Hi @peterjc123, I'd like to, but it occurs at various positions in my code, so I cannot point to a snippet that reproduces it. Later I will work on minimizing my project for your team to test.
If it helps, I see the same thing with any Windows Nvidia driver above 431.68. Using 431.68 or below seems to be fine.
@roceh does running with torch.autograd.set_detect_anomaly(True) and a newer driver crash and produce a useful stack trace?
I have the same problem running CenterNet object detection code, with Ubuntu 18 + 2080 Super + CUDA 10. It runs fine for a few epochs and randomly crashes at some iteration. Any solution...?
@TWDH can you get a stacktrace from a crash?
Hi @peterjc123 ,
I am a newbie with pytorch & detectron2. I also encountered this problem last night. I use my own dataset (a small one: 180+ train images, 40+ val images), prepared in MS COCO format.
I am using Docker, and the container setup matches the Dockerfile in the detectron2 repo.
https://github.com/facebookresearch/detectron2/blob/master/docker/Dockerfile
And here is my PC config:
Attached is the train.py I use to train detectron2.
train_py.zip
Please help. Thanks.
Hi @cclo-astri. Thanks for reporting. Most of the other reports here have been for windows. Could you try to report the stacktrace from the failure by using torch.autograd.set_detect_anomaly(True) and rerunning the code? It would also help to know which version of PyTorch, Python, and Cuda you are using
Hi @mattip
After I changed the batch_size and num_workers of the dataloader to 1, the error seems to be gone (at least the training could run for nearly 5 hours, but then the PC suddenly hung and the training was incomplete). The ETA of the training time also decreased; why?
Before:
-- cfg.DATALOADER.NUM_WORKERS = 2
-- cfg.SOLVER.IMS_PER_BATCH = 2
-- ETA: 9.5 hours, max_mem: 5.8GBytes (nvidia-smi shows 6.8GBytes)
-- Failed after 1.5 hours
After:
-- cfg.DATALOADER.NUM_WORKERS = 1
-- cfg.SOLVER.IMS_PER_BATCH = 1
-- ETA: 5 hours, max_mem: 3.5GBytes (nvidia-smi shows 4.5GBytes)
And here is the detailed information about my docker environment:
For further version information of installed packages, please refer to: https://hub.docker.com/layers/nvidia/cuda/10.1-cudnn7-devel/images/sha256-557de4ba2cb674029ffb602bed8f748d44d59bb7db9daa746ea72a102406d3ec?context=explore
Thanks.
I just got this error today after updating my NVIDIA drivers to 445.87 (I haven't updated them for a year at least). I'm using a GTX 1060 (6Gb), Cuda compilation tools, release 9.0, V9.0.176, pytorch 1.5.0+cu92, cudnn 7.3.0
My LSTM was training fine before. With a set seed, it always crashes at the same time (at the middle of epoch 52). I first get the error below
r_out, (h_out, c_out) = self.rnn(x)
File "C:\Users\willi\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\willi\Anaconda3\lib\site-packages\torch\nn\modules\rnn.py", line 570, in forward
self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
But then when I try to train again without killing the kernel, I get :
File "C:\Users\willi\Anaconda3\lib\site-packages\torchtext\data\iterator.py", line 156, in __iter__
yield Batch(minibatch, self.dataset, self.device)
File "C:\Users\willi\Anaconda3\lib\site-packages\torchtext\databatch.py", line 34, in __init__
setattr(self, name, field.process(batch, device=device))
File "C:\Users\willi\Anaconda3\lib\site-packages\torchtext\data\field.py", line 237, in process
tensor = self.numericalize(padded, device=device)
File "C:\Users\willi\Anaconda3\lib\site-packages\torchtext\data\field.py", line 359, in numericalize
var = torch.tensor(arr, dtype=self.dtype, device=device)
RuntimeError: CUDA error: unspecified launch failure
Update : tried rolling back my NVIDIA drivers to 442.59 and the error still appears at epoch 60.
This related issue mentions a fix to our problem, which consistently works on my machine: https://github.com/pytorch/pytorch/issues/21819. Specifically this comment.
System:
I'm experiencing similar issues; however, anomaly detection is enabled in this case. The model is a simple two-layer (convolutional) revnet using less than a gigabyte of VRAM. The system itself uses between 300 and 800 MB.
The errors pop up with both CUDA 10.2 and 11.0, giving the same tracebacks in both runs.
Warning: Error detected in CudnnConvolutionBackward. Traceback of forward call that caused the error:
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\function.py", line 77, in apply
return self._forward_cls.backward(self, *args)
File "C:\ProgramData\Anaconda3\lib\site-packages\memcnn-1.3.2-py3.7.egg\memcnn\models\revop.py", line 83, in backward
temp_output = ctx.fn(*detached_inputs)
File "C:\ProgramData\Anaconda3\lib\site-packages\memcnn-1.3.2-py3.7.egg\memcnn\models\additive.py", line 65, in forward
gmd = self.Gm.forward(y1)
File "C:\Users\UserName\Documents\Project\pytorch\model.py", line 121, in forward
sort = self.sort_conv(inp)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 349, in forward
return self._conv_forward(input, self.weight)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 346, in _conv_forward
self.padding, self.dilation, self.groups)
(print_stack at ..\torch\csrc\autograd\python_anomaly_mode.cpp:60)
Warning: Error detected in InvertibleCheckpointFunctionBackward. Traceback of forward call that caused the error:
File ".\main.py", line 14, in <module>
model.fit()
File "C:\Users\UserName\Documents\Project\pytorch\main.py", line 129, in fit
out = self.model(src)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
input = module(input)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\memcnn-1.3.2-py3.7.egg\memcnn\models\revop.py", line 183, in forward
*(xin + tuple([p for p in self._fn.parameters() if p.requires_grad])))
(print_stack at ..\torch\csrc\autograd\python_anomaly_mode.cpp:60)
Traceback (most recent call last):
File ".\main.py", line 14, in <module>
model.fit()
File "C:\Users\UserName\Documents\Project\pytorch\main.py", line 131, in fit
err.backward()
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\__init__.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: an illegal memory access was encountered (operator () at C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/CUDAScalar.cu:19)
(no backtrace available)
When using Chromium (but not Brave, Chrome, Firefox, or even games on ultra settings), however, it won't ever run the first few batches and instead goes straight to this:
Warning: Error detected in CudnnConvolutionBackward. Traceback of forward call that caused the error:
File ".\main.py", line 14, in <module>
model.fit()
File "C:\Users\UserName\Documents\Project\pytorch\main.py", line 129, in fit
out = self.model(src)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\container.py", line 100, in forward
input = module(input)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "C:\Users\UserName\Documents\Project\pytorch\model.py", line 121, in forward
sort = self.sort_conv(inp)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 349, in forward
return self._conv_forward(input, self.weight)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\nn\modules\conv.py", line 346, in _conv_forward
self.padding, self.dilation, self.groups)
(print_stack at ..\torch\csrc\autograd\python_anomaly_mode.cpp:60)
Traceback (most recent call last):
File ".\main.py", line 14, in <module>
model.fit()
File "C:\Users\UserName\Documents\Project\pytorch\main.py", line 131, in fit
err.backward()
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\tensor.py", line 198, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "C:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\__init__.py", line 100, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: unspecified launch failure (operator () at C:/cb/pytorch_1000000000000/work/aten/src/ATen/native/cuda/CUDAScalar.cu:19)
(no backtrace available)
Curiously, it then fixed itself after waiting for six hours, processing roughly 20,000 batches in one hour, only to crash immediately after finishing them.
I've compiled the binaries with debug info.
https://5833189-65600975-gh.circle-artifacts.com/0/w/final_pkgs/torch-1.6.0.dev20200613-cp36-cp36m-win_amd64.whl
https://5833191-65600975-gh.circle-artifacts.com/0/w/final_pkgs/torch-1.6.0.dev20200613-cp37-cp37m-win_amd64.whl
https://5833196-65600975-gh.circle-artifacts.com/0/w/final_pkgs/torch-1.6.0.dev20200613-cp38-cp38-win_amd64.whl
You can install them and then get some more info using cuda-memcheck.
:: PythonRoot in the line below refers to the directory of your Python installation
:: e.g. C:\Python37
set _NT_ALT_SYMBOL_PATH=[PythonRoot]\Lib\site-packages\torch\lib
cuda-memcheck python your-script.py
With cuda-memcheck python bug.py, I get an OOM. Epoch 0 never ends, but memory keeps growing.
So I ran plain python bug.py. I see a 2x speedup: 10 seconds per epoch vs 20 seconds in PyTorch 1.5. But:
epoch: 254
Traceback (most recent call last):
File "bug.py", line 124, in <module>
main()
File "bug.py", line 104, in main
loss.backward()
File "C:\Anaconda3\envs\torch16\lib\site-packages\torch\tensor.py", line 184, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "C:\Anaconda3\envs\torch16\lib\site-packages\torch\autograd\__init__.py", line 125, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
Exception raised from _cudnn_rnn_backward_input at ..\aten\src\ATen\native\cudnn\RNN.cpp:923 (most recent call first):
00007FFD84CE087200007FFD84C85FAB c10.dll!caffe2::TypeMeta::_typeMetaDataInstance<unsigned char> [<unknown file> @ <unknown line number>]
00007FFC91A959B600007FFC8C1EC160 torch_cuda.dll!THCudaShortTensor_set4d [<unknown file> @ <unknown line number>]
00007FFC91AC19F000007FFC8C1EC160 torch_cuda.dll!THCudaShortTensor_set4d [<unknown file> @ <unknown line number>]
00007FFC91ABFAAF00007FFC8C1EC160 torch_cuda.dll!THCudaShortTensor_set4d [<unknown file> @ <unknown line number>]
00007FFC91B7C43800007FFC8C1EC160 torch_cuda.dll!THCudaShortTensor_set4d [<unknown file> @ <unknown line number>]
00007FFC91B8BD3D00007FFC8C1EC160 torch_cuda.dll!THCudaShortTensor_set4d [<unknown file> @ <unknown line number>]
00007FFD3CAABF3A00007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3CAE625E00007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3CAABD2200007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3CB347E500007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3FAC814D00007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3FAD573D00007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3CAABF3A00007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3CAE625E00007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3CAABD2200007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3CB347E500007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3F8D35D100007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD3F89F55900007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD40055CA900007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD4005764A00007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD4005EE1900007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD4005E94200007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD60C21C3B00007FFD608D6622 torch_python.dll!THPVariable_Wrap [<unknown file> @ <unknown line number>]
00007FFD40042F3400007FFD345D6D2D torch_cpu.dll!caffe2::ExternalDataProto::MaybeArenaPtr [<unknown file> @ <unknown line number>]
00007FFD8E62D9F200007FFD8E62D980 ucrtbase.dll!o_strncat_s [<unknown file> @ <unknown line number>]
00007FFD8F817BD400007FFD8F817BC0 KERNEL32.DLL!BaseThreadInitThunk [<unknown file> @ <unknown line number>]
00007FFD916ECEE100007FFD916ECEC0 ntdll.dll!RtlUserThreadStart [<unknown file> @ <unknown line number>]
I get a lot of random errors, like this.
In my case, I fixed this bug by doing 2 things:
1. I changed the riser for my card (1060 6GB connected to a PCI-E 8x slot via a riser).
2. I changed the PCI-E slot to 16x.
All the errors are gone. I think my problem was specific and not fully connected to this topic, but these actions may help someone.
When using an LSTM and CUDA, I saw the same error when I used my entire dataset.
But your assistance helped me: I used torch.autograd.set_detect_anomaly(True), and with that statement set I could use the full dataset. Thank you.
We are also training LSTMs and experiencing similar issues. We have tried various environment configurations:
(Note: Windows had the latest updates applied.)
1) Windows 10, Nvidia driver CUDA 10.1, Pytorch 1.5 CUDA 10.1
2) Windows 10, Nvidia driver CUDA 10.2, Pytorch 1.5 CUDA 10.2
3) Windows 10, Nvidia driver CUDA 11, Pytorch 1.5 CUDA 10.2
and all of the above with Pytorch 1.5.1.
4) Ubuntu Linux 20.04, Nvidia driver CUDA 10.1, Pytorch 1.5 CUDA 10.1
5) Ubuntu Linux 20.04, Nvidia driver CUDA 10.2, Pytorch 1.5 CUDA 10.2
Training always fails at a random epoch with an unspecified launch failure, or unknown error.
Encountered the same problem (crashes at random epochs during training) yesterday with LSTM networks on an NVIDIA GTX 1070 and Windows 10.
Solved the problem by updating the drivers to 451.48. Unfortunately I don't know which drivers I had when I was getting the crashes.
I have a similar problem, but I think I have had it since I updated to 451.48.
EDIT: I tried both the April (445.87) and May (446.14) drivers, but ended up with the same results. I don't think the drivers are the problem for me.
Not sure who to contact, but I'm fairly confident I can recreate this issue.
Essentially, if I choose an output method where token=1 or 2, it raises this error; however, this does not occur with the concat method (token=3).
Optimizer: AdamW
Criterion: CrossEntropyLoss()
Finally, the error seems to be raised within my evaluate function at this line:
total_acc += pred.eq(label.view_as(pred)).sum().item()
Please let me know if you need anything else.
RNN architecture Code:
```python
class L_Rec_RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, hu, n_layers=1, token=3, use_cuda=False):
        super(L_Rec_RNN, self).__init__()
        self.name = "Language RNN"
        self.hu_1 = hu
        self.classes = num_classes
        self.hidden_size = hidden_size
        self.layers = n_layers
        self.token = token
        self.use_cuda = use_cuda
        self.dr = 0.3
        # condition to multiply the input to the first FC depending on the token method that is used to feed it
        if token < 3:
            mult = 1
        else:
            mult = 2
        # [batch_size, 201, 552]
        self.rnn = nn.GRU(input_size, hidden_size, self.layers, batch_first=True)
        self.lin = nn.Sequential(
            nn.Linear(mult * self.hidden_size, self.hu_1),
            nn.ReLU(),
            nn.Dropout(self.dr),
            nn.Linear(self.hu_1, self.classes),
        )

    def forward(self, x):
        # pre prep
        x = x.squeeze(1)  # for 1d convolutions
        hs = torch.zeros(self.layers, x.size(0), self.hidden_size)
        if self.use_cuda and torch.cuda.is_available():
            hs = hs.cuda()
        # calling the RNN
        x, _ = self.rnn(x, hs)
        # various methods for choosing the output hidden layer
        # Method 1
        if self.token == 1:
            x = x[:, -1, :]
        # Method 2
        elif self.token == 2:
            x = torch.max(x, dim=1)[0]
        # Method 3
        else:
            x = torch.cat([torch.max(x, dim=1)[0], torch.mean(x, dim=1)], dim=1)
        # calling the FC layers
        x = self.lin(x)
        return x
```
Evaluate Function Code:
```python
def evaluate(model, data, criterion=nn.CrossEntropyLoss(), get_loss=False, use_cuda=False):
    total_loss = 0.0
    total_acc = 0.0
    total_epoch = 0
    counter = 0
    for spec, label in data:
        if use_cuda and torch.cuda.is_available():
            spec = spec.cuda()
            label = label.cuda()
        out = model(spec)
        loss = criterion(out, label)
        pred = out.max(1, keepdim=True)[1]
        total_acc += pred.eq(label.view_as(pred)).sum().item()
        total_loss += loss.item()
        total_epoch += len(label)
        counter += 1
    acc = float(total_acc) / total_epoch
    loss = float(total_loss) / counter
    if get_loss:
        return acc, loss
    else:
        return acc
```
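One note on the line total_acc += pred.eq(label.view_as(pred)).sum().item(): CUDA kernels are launched asynchronously, so .item() (which copies to the CPU and synchronizes) is often just where an earlier kernel failure gets reported, not necessarily where it originates. A small illustrative sketch of that behaviour:

```python
# Illustrative only: asynchronous CUDA errors typically surface at the next
# synchronization point (.item(), .cpu(), torch.cuda.synchronize()), which can
# be far away from the op that actually failed.
import torch

if torch.cuda.is_available():
    pred = torch.randn(32, 10, device="cuda").argmax(dim=1, keepdim=True)
    label = torch.randint(0, 10, (32,), device="cuda")
    correct = pred.eq(label.view_as(pred)).sum()  # still a GPU tensor; launches are async
    total = correct.item()  # synchronizes here; pending kernel errors surface at calls like this
    print(total)
```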
I don't know if this helps, but I'm getting that error when switching from samples that are 200^3 to samples that are 256^3, even though I have plenty of memory to store them. The program errors out when calculating one of my personal metrics.
I'm using pytorch 1.5 with CUDA 9.2.
I have been struggling with this problem for quite some time now, but I am unable to resolve it. However, there are some things I noticed about the problem.
1> It happens only when running on the GPU. There is no problem with the same code when all the operations are done on the CPU.
2> The problem is machine specific. It runs perfectly on some other machines (even on GPU) but crashes on mine. This shows that the code is fine and that the problem lies with the machine, the software, or some mismatch between them.
3> Most people seem to mention that the problem happens when doing binary classification with an LSTM. This is my case as well.
By the way, I reinstalled my Windows, Anaconda, Pytorch, GPU drivers, everything but the problem remains. However, the same code works perfectly on my collaborator's machine. I am using
Python 3.7
Conda 4.8.3
Pytorch 1.5.1
Torchvision 0.6.1
CUDA 10.2.89
NVIDIA 451.67
This is a serious problem, and I ask the Pytorch developers to look into it seriously. I am attaching my code for others to take a look at.
Same issue. Using Pytorch 1.5.1 and GTX 1050 ti
The same error occurs in neural machine translation (NMT) with OpenNMT-py, using pytorch 1.5.1 and cuda 10.2.
I think this is an error with LSTM cells.
Now I am working with transformers, and I will report the results.
Same issue. Using Pytorch 1.2.0 and GTX 2060s cuda 10.0
Same issue with pytorch 1.3.1 on Quadro RTX 8000, and similarly to others training a model with an LSTM layer. Also, trying with different seeds would sometimes crash with "cuDNN error: CUDNN_STATUS_INTERNAL_ERROR".
I tried to fix with set_detect_anomaly(True) and the process just got stuck on the same epoch where it previously crashed and seemed to be stuck in some backend loop - cuda showed 100% utilisation with occasional dips to ~55%.
In the end it seems like I finally managed to get around the issue completely by disabling cudnn with "torch.backends.cudnn.enabled = False", but I'm guessing this might lead to sub-optimal performance and potentially other issues?
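For reference, a minimal sketch of that workaround (the tiny LSTM below is just a stand-in, not anyone's actual model); with the flag off, RNN ops fall back to the non-cuDNN kernels, which is usually slower:

```python
# Sketch of the torch.backends.cudnn.enabled = False workaround mentioned above.
# The flag must be set before the forward/backward passes so the non-cuDNN RNN
# kernels are used instead of the cuDNN ones.
import torch
import torch.nn as nn

torch.backends.cudnn.enabled = False  # disables cuDNN globally (slower, avoids the cuDNN RNN path)

device = "cuda" if torch.cuda.is_available() else "cpu"
lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True).to(device)
x = torch.randn(8, 20, 32, device=device)
out, _ = lstm(x)
out.mean().backward()
```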
@ngimel,
judging from the comments by JoshuaSv2 and hendrycks, it looks like this is not a Windows-specific issue.
https://github.com/pytorch/pytorch/issues/39872 is a similar issue.
Following up on my earlier comment: this error does not occur with the transformer model. It's just an LSTM-related error.
I have the same problem with pytorch 1.6.0, GTX 1060, on Windows 10.
Traceback (most recent call last):
File "train.py", line 126, in <module>
print(' Loss = %s' % epoch_train(
File "C:\Users\mehrd\Jupyter\SQLNet-master\sqlnet\utils.py", line 148, in epoch_train
score = model.forward(q_seq, col_seq, col_num, pred_entry,
File "C:\Users\mehrd\Jupyter\SQLNet-master\sqlnet\model\seq2sql.py", line 123, in forward
x_emb_var, x_len = self.embed_layer.gen_x_batch(q, col)
File "C:\Users\mehrd\Jupyter\SQLNet-master\sqlnet\model\modules\word_embedding.py", line 76, in gen_x_batch
val_inp = val_inp.cuda()
RuntimeError: CUDA error: unspecified launch failure
I faced the same problem with an LSTM model, but after setting the environment variable CUDA_LAUNCH_BLOCKING=1 before running the script I no longer get the error. This was suggested in this post for debugging purposes only, but like the answer there my code runs fine as well.
Any idea why this is the case?
Working with pytorch 1.5.0 and CUDA 10.1 on Windows 10.
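A minimal sketch of that workaround for anyone who wants to try it; the variable has to be in the environment before the first CUDA call (setting it at the very top of the script or in the shell both work), and it serializes kernel launches, so expect a slowdown:

```python
# Sketch of the CUDA_LAUNCH_BLOCKING workaround/debug aid: kernel launches become
# synchronous, so errors are raised at the real call site instead of at a later
# synchronization point. Must be set before CUDA is initialized.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device)
y = x @ x  # with blocking launches, a failing kernel would raise here, not later
print(y.sum().item())
```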
Hello, I recently faced and solved this issue on my Windows machine.
In my case, this issue was invoked by Windows Timeout Detection and Recovery (TDR), which shuts down CUDA kernels that fail to respond in time.
The fix is as follows:
This should do the trick on Windows 10. Hope this helps.
PS. setting CUDA_LAUNCH_BLOCKING=1 also solves the issue but comes at a heavy performance penalty.
@YinPing-Cho For me, this made it better, but it still did not completely solve the issue. I eventually set it to 10000 seconds.
Just FYI, I was also failing repeatedly with RuntimeError: CUDA error: unspecified launch failure when training a sound event detection model on a P100 on Google Colab. In my case the solution was simply changing num_workers in the dataloader from 2 to 0. I don't know how this error is related to num_workers, and it was very hard to debug.
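A minimal sketch of that change (the TensorDataset below is a random stand-in, not the sound event data):

```python
# Sketch of the num_workers workaround described above: with num_workers=0 the
# data loading happens in the main process instead of worker subprocesses.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000, 64), torch.randint(0, 10, (1000,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)  # was num_workers=2

for features, labels in loader:
    pass  # training step goes here
```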
This also helped me, but did not completely resolve the issue. Combined with @YinPing-Cho's fix, I was able to completely train an AWD-LSTM model on the second try a while ago. In my experience though, these just made the issue rarer.
This actually reduced the error in my environment: Windows 10, CUDA 10.0, pytorch 1.4. I don't know why, but one of the reasons might be that setting num_workers to 0 initializes some internal setting every iteration. Some warnings (in my case, a deprecation warning from nn.Softmax) show up every iteration when I make num_workers 0.
By the way, before this change, RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED, in addition to RuntimeError: CUDA error: unspecified launch failure, stopped my program randomly after some iterations.
After 30000 iterations, the error finally occurred...
This reduces the frequency of the error but is not the solution for me.
Exception has occurred: RuntimeError
cuDNN error: CUDNN_STATUS_MAPPING_ERROR (getCudnnHandle at ..aten\src\ATen\cudnn\Handle.cpp:45)
(no backtrace available)
cudnn = 7.6.5
cudatoolkit = 10.2.89
pytorch = 1.5.1
Exception has occurred: RuntimeError
CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc) (gemm at ..aten\src\ATen\cuda\CUDABlas.cpp:165)
(no backtrace available)
Same error on windows, training an LSTM on a GTX 2080 TI. Happens with both Pytorch 1.5 and 1.6.
Very annoying as it seems random, and the training is completely broken when it happens.
Switching from 1.6 to 1.5 and downgrading my Nvidia driver to 431.86 fixed the error for me.
Same error while training an LSTM with a big batch size on Windows; I was getting random crashes after 1 to 20 epochs. Setting torch.backends.cudnn.enabled = False fixed the issue.
Pytorch 1.5.1
Cuda 10.2.89
CuDNN 7.6.5
GTX 1070 - MSI Gaming X - Driver 445.75
Windows 10 Pro 1909 build 18363.1016
@lucas-emery, did you try extending the TDR delay or disabling TDR as described in https://developer.download.nvidia.com/NsightVisualStudio/2.2/Documentation/UserGuide/HTML/Content/Timeout_Detection_Recovery.htm
@mszhanyi I did try extending the TDR to 60 seconds. I was able to run a 13-hour training session after setting the TDR and restarting my PC, but the backprop time was also faster (down from 1 min to 10-15 secs); I guess it was just a coincidence and cudnn chose a different algorithm.
After that I stopped the training to update a function and when I tried to resume I couldn't get past 20 epochs without a crash, sometimes "illegal memory" and sometimes "launch failure" the backprop time went back up to 1 minute. I reverted my changes and tried to train a new model from scratch but it crashed between 1 and 20 epochs with the same errors. After setting torch.backends.cudnn.enabled = False with no code changes and no reboot it stopped crashing and backprop time went down to 20 secs. That training session lasted 12 hours with no errors.
I did two more 4 hour sessions without problems today.
@lucas-emery, could you provide a simplified script so that I could reproduce it?
@mszhanyi I'm afraid it won't be possible; it's a very complex model on a reinforcement learning task. I'll let you know if I find anything else. I'll try to get something reproducible after I finish.
The error started appearing after I increased my batch size to 1k with an unroll length of 32.
I'm getting this issue on my RTX 3080, and I can't even downgrade PyTorch because older versions don't support RTX 3000.
These two fixes worked for me, but both have a performance penalty:
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
torch.backends.cudnn.enabled = False
Same issue on 3090 + Windows 10 + CUDA 11 + PyTorch (Stable & Nightly).
These fixes worked for me, too.
$env:CUDA_LAUNCH_BLOCKING=1 increases the training time by 500%.
torch.backends.cudnn.enabled = False increases the training time by 20%.
We are facing the same issue. Tried on Ubuntu 18.04, Nvidia K80, M60, V100, all with the same pytorch version 1.6.0, cuda 11.
Applying the fix below doesn't help either... :(
```
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
torch.backends.cudnn.enabled = False
```