I got the following error when running python cifar10_train.py:
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:390] Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Operating System: Windows 10
CUDA: Cuda compilation tools, release 8.0, V8.0.44
cuDNN: 5.1
tensorflow: 1.0.0
The output of python -c "import tensorflow; print(tensorflow.__version__)":
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:135] successfully opened CUDA library curand64_80.dll locally
1.0.0
I have upgraded cuDNN from 5.0 to 5.1, but it didn't work.
@secsilm The error indicates that the cuDNN you've loaded is 5.0, not 5.1. Perhaps the following documentation will help:
https://www.tensorflow.org/install/install_windows#requirements_to_run_tensorflow_with_gpu_support
cuDNN v5.1. For details, see NVIDIA's documentation. Note that cuDNN is typically installed in a
different location from the other CUDA DLLs. Ensure that you add the directory where you installed the
cuDNN DLL to your %PATH% environment variable.
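To double-check whether the cuDNN DLL is actually visible on %PATH%, a small script like the following can help. This is just a sketch: the helper name is mine, and the DLL name cudnn64_5.dll matches the log output above (adjust it for other cuDNN versions).

```python
import os

def find_dll_on_path(dll_name):
    """Return every directory on PATH that contains dll_name."""
    hits = []
    for d in os.environ.get("PATH", "").split(os.pathsep):
        if d and os.path.isfile(os.path.join(d, dll_name)):
            hits.append(d)
    return hits

# An empty list means TensorFlow will not find the DLL either.
print(find_dll_on_path("cudnn64_5.dll"))
```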
@tatatodd Yes, I have reset my cuDNN environment variable and the problem is solved.
@tatatodd How did you reset your cuDNN environment variable? I have the same problem.
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
loading datasets
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 970M
major: 5 minor: 2 memoryClockRate (GHz) 1.038
pciBusID 0000:01:00.0
Total memory: 3.00GiB
Free memory: 2.64GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 970M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:346] Loaded cudnn library: 5110 but source was compiled against 4007. If using a binary install, upgrade your cudnn library to match. If building from sources, make sure the library loaded matches the version you specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:459] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)
@ehfo0 You can just replace the old files (5.0) with the new files (5.1), provided you have already added the cuDNN directory to PATH.
I had this problem too, but realized that the error is thrown only if I try to use the GPU in a second instance of Python. It seems that only one Python instance can claim the GPU at a time. Weird. If you close the Python session that used the GPU, the memory is freed up and another session can use it. This is the behavior I experienced in PyCharm and with Python 3.6 via the command prompt on Windows 10 x64. Does anyone have further insight or a workaround?
I had the same problem as @omelnikov. The second Python instance to use the same GPU gives the following error:
F tensorflow/core/kernels/conv_ops.cc:667] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
Aborted (core dumped)
System specs:
Ubuntu 16.04.3 LTS (GNU/Linux 4.4.0-87-generic x86_64)
python 3.6.3
tensorflow-gpu 1.4.1
cuda 8.0
cudnn 6
Same problem as reported by @HaoshengZou.
I had the same problem, but after I closed Chrome and VMware (freeing more memory), it worked!
@UesugiErii Special thanks for the insight.
@HaoshengZou I had the same problem, but I don't know how to fix this issue. Can anyone please give a suggestion?
When I used keras==1.2.0, I got the same problem. Fortunately, the issue was solved after I upgraded tensorflow from 1.2.0 to 1.3.0.
The above workarounds work in almost all cases. However, for me the problem persisted in spite of updating the drivers and restarting the machine. I solved it by explicitly sourcing the .bashrc file:
source ~/.bashrc
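For reference, the CUDA-related lines in ~/.bashrc usually look like this; the install path /usr/local/cuda-8.0 is an assumption, so adjust it to your setup:

```shell
# CUDA toolkit location (assumed path; adjust to your install)
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
# Directory containing libcudnn.so and the other CUDA libraries
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

Sourcing .bashrc matters because a shell (or IDE) started before these lines were added will not have LD_LIBRARY_PATH set, so TensorFlow can end up loading a different cuDNN than the one you installed.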
Try this:

import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"  # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="1"  # must be set before importing tensorflow
I change the version of tensorflow-gpu from 1.4 to 1.2, and it works well.
conda install tensorflow-gpu=1.2
@feedliu Thanks, it worked for me.
Try this:

import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="1"
This code saved me, many thanks:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="1"
Since I have only one GPU on my machine, this setting turned out to assign the model to the CPU. That is why my code "worked", but the actual issue was not solved.
@cramraj8 Try the code below:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="0"
Thanks @tony2037, after adding those 3 lines the code works.
The above solutions didn't work for me. Any other approaches, fellas?
Try this:

import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="1"
Hello, I used this method, but I find it trains on the CPU instead of the GPU. Opening Task Manager, it looks like the CPU usage is maxed out.
I change the version of tensorflow-gpu from 1.4 to 1.2, and it works well.
conda install tensorflow-gpu=1.2
Thanks, very useful! The tensorflow-gpu version has problems; you should check your own versions and try again and again. Find the matching tensorflow-gpu version number, then uninstall and reinstall.