Hi, I have updated the nvidia/cuda 9.0 container, which now uses CUDNN_VERSION 7.1.1.5.
Now when I try to run Keras from the Keras Docker image, I get an error:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7004 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
Tensorflow version: 1.6.0
Keras version: 2.1.4
Which source was compiled with version 7004? How can I recompile it?
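A note for readers puzzling over the numbers in that log line: cuDNN 7 encodes its version as a single integer, major*1000 + minor*100 + patch, so 7101 is 7.1.1 and 7004 is 7.0.4. The sketch below decodes both values and applies the strict major.minor comparison that the message implies. The function names are made up for illustration, and the comparison rule is an assumption read off the log text, not TensorFlow's actual code.

```python
def decode_cudnn_version(encoded):
    """Split a cuDNN 7-style integer (major*1000 + minor*100 + patch)."""
    major, rest = divmod(encoded, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def same_compat_version(loaded, compiled):
    """Assumed rule: major and minor must match; the patch level may differ."""
    return decode_cudnn_version(loaded)[:2] == decode_cudnn_version(compiled)[:2]


print(decode_cudnn_version(7101))       # (7, 1, 1)
print(decode_cudnn_version(7004))       # (7, 0, 4)
print(same_compat_version(7101, 7004))  # False
```

Under that reading, the "source" in the message is the prebuilt TensorFlow binary, which was compiled against cuDNN 7.0.4 while the container now ships 7.1.1.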
It's a problem about versions. You can solve this by installing cuDNN v7.0.x.
@zhaoyang10 Yes, I understand that this problem is caused by the cuDNN version.
What I don't know is how to refer to an nvidia/cuda Docker image with cuDNN v7.0.x.
The Keras Dockerfile is built on top of the Nvidia container:
ARG cuda_version=9.0
ARG cudnn_version=7
FROM nvidia/cuda:${cuda_version}-cudnn${cudnn_version}-devel
But the nvidia/cuda:9.0-cudnn7-devel container now includes cuDNN v7.1.1.5, and I don't know how to request the earlier one.
I actually have the same problem as you @taneta. Have you found a solution, yet?
@taneta I ran into the same problem running nvidia-docker after upgrading to CUDA 9.1 and CUDNN 7.1. Here is how I was able to fix the issue:
As @zhaoyang10 mentioned, you need to downgrade CUDNN.
First, you can search for available versions using apt-cache madison libcudnn7. Pick an appropriate version (7.0.x) and then run the following command to downgrade, replacing the CUDNN version with your chosen one (I used an official NVIDIA docker file as a reference):
apt-get update && apt-get install -y --allow-downgrades --no-install-recommends \
libcudnn7=7.0.5.15-1+cuda9.1 \
libcudnn7-dev=7.0.5.15-1+cuda9.1 && \
rm -rf /var/lib/apt/lists/*
(Then run a quick apt-get update to refresh the lists)
Hope that helps!
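If the list printed by apt-cache madison is long, the 7.0.x candidates can be filtered out programmatically. This is a minimal sketch that assumes the usual "package | version | source" layout of madison output; the function name and the sample text (including the example.invalid URL) are made up for illustration.

```python
def matching_versions(madison_output, prefix="7.0."):
    """Return version strings from `apt-cache madison` output that start with prefix."""
    versions = []
    for line in madison_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # fields[1] is the version column; skip malformed lines and repeats.
        if len(fields) >= 2 and fields[1].startswith(prefix) and fields[1] not in versions:
            versions.append(fields[1])
    return versions


# Made-up sample in the assumed "package | version | source" layout.
sample = """\
 libcudnn7 | 7.1.1.5-1+cuda9.0 | http://example.invalid/repo  Packages
 libcudnn7 | 7.0.5.15-1+cuda9.1 | http://example.invalid/repo  Packages
 libcudnn7 | 7.0.5.15-1+cuda9.0 | http://example.invalid/repo  Packages
"""
print(matching_versions(sample))
```

On a real machine the input could come from subprocess.run(["apt-cache", "madison", "libcudnn7"], capture_output=True, text=True).stdout. Note that each version string also carries a +cudaX.Y suffix.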
@LucidBlue Thanks, that solved the problem!
I added a couple of lines to the Dockerfile and had to switch to the root user to make it work:
# fix cudnn version
USER root
RUN apt-get update && apt-get install -y --allow-downgrades --no-install-recommend$
libcudnn7=7.0.5.15-1+cuda9.1 \
libcudnn7-dev=7.0.5.15-1+cuda9.1 && \
rm -rf /var/lib/apt/lists/*
RUN apt-get update
@LucidBlue Actually, it was too early to celebrate.
I can load a model now, but I get this error when I run model.predict():
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E tensorflow/stream_executor/cuda/cuda_dnn.cc:393] possibly insufficient driver version: 384.111.0
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
Haven't you observed the same problem?
UPD: Sorry, that was easily solved by replacing cuda9.1 with cuda9.0 in the code above.
Hi, does anybody know whether rebuilding Keras from the latest source code fixes this issue? Or is there a timeline for supporting this incompatibility? Thanks!
@taneta
Thanks! The downgrade you gave (cuDNN 7.1 to 7.0) fixed my problem. Here is a paste of it with the typos fixed ($ -> s and 9.1 -> 9.0).
Now my code in Docker works with TF 1.6, CUDA 9.0, and cuDNN 7.0.
============================================
USER root
RUN apt-get update && apt-get install -y --allow-downgrades --no-install-recommends \
libcudnn7=7.0.5.15-1+cuda9.0 libcudnn7-dev=7.0.5.15-1+cuda9.0 && rm -rf /var/lib/apt/lists/*
RUN apt-get update
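The cuda9.1 versus cuda9.0 mix-up earlier in the thread suggests deriving the package pins from the base image tag rather than typing them by hand. A minimal sketch, assuming the tag format nvidia/cuda:<cuda>-cudnn<N>-devel used by the Keras Dockerfile and the pin format <cudnn>-1+cuda<cuda> seen above; both helper names are hypothetical.

```python
def cudnn_pins(cudnn_version, cuda_version, revision="1"):
    """Build apt package pins such as libcudnn7=7.0.5.15-1+cuda9.0."""
    major = cudnn_version.split(".")[0]
    suffix = f"{cudnn_version}-{revision}+cuda{cuda_version}"
    return [f"libcudnn{major}={suffix}", f"libcudnn{major}-dev={suffix}"]


def base_image_cuda_version(image_tag):
    """Extract '9.0' from a tag like 'nvidia/cuda:9.0-cudnn7-devel'."""
    return image_tag.split(":", 1)[1].split("-", 1)[0]


tag = "nvidia/cuda:9.0-cudnn7-devel"
print(" ".join(cudnn_pins("7.0.5.15", base_image_cuda_version(tag))))
```

The printed string can be pasted after apt-get install -y --allow-downgrades --no-install-recommends, which keeps the +cuda suffix in step with the FROM line.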
I tried adding the changes to the Dockerfile, as mentioned above, but for some reason when I inspect the image after building it, it still has cuDNN version 7.1.1.5. If I try to run Keras with the image, the issue persists. Is there anything wrong with my Dockerfile? I'm attaching it (as TXT) to the post: Dockerfile.txt
@alberto-oliveira
I am a newbie at Docker, so I took the Dockerfile from what is probably a similar source and added the paste at the VERY end, so it first installs the wrong version (7.1) and then downgrades it to 7.0.
My Dockerfile: Dockerfile.txt
+1
My env:
Windows 10 64-bit
+ Python 3.5
+ Visual Studio 2017 Community
+ CUDA Toolkit 9.0
+ cuDNN v7.1.1 (Feb 28, 2018), for CUDA 9.0
+ tensorflow 1.6-gpu
My failure is:
2018-03-13 20:44:03.100572: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\stream_executor\cuda\cuda_dnn.cc:378] Loaded runtime CuDNN library: 7101 (compatibility version 7100) but source was compiled with 7003 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
Replacing cuDNN v7.1.1 (Feb 28, 2018), for CUDA 9.0 with cuDNN v7.0.5 (Dec 5, 2017), for CUDA 9.0 fixes it.
So the new env is:
Windows 10 64-bit
+ Python 3.5
+ Visual Studio 2017 Community
+ CUDA Toolkit 9.0
+ cuDNN v7.0.5 (Dec 5, 2017), for CUDA 9.0
+ tensorflow 1.6-gpu
I had the same error and tried to use this:
apt-get install -y --allow-downgrades --no-install-recommends libcudnn7=7.0.5.15-1+cuda9.0 libcudnn7-dev=7.0.5.15-1+cuda9.0 && rm -rf /var/lib/apt/lists/*
But it didn't work. The error said it couldn't find lib 7.0.5.15-1.
I realized the error was simply caused by my libcudnn version, so I removed the old version, installed libcudnn 7.0.5.15-1, and it works.
I removed the old version with:
rm /usr/local/cuda/include/cudnn.h
rm /usr/local/cuda/lib64/libcudnn*
(The cuda directory may be named something like cuda9.0 on your system, but it is somewhere under /usr/local/...)
Then I installed cuDNN again with a different version, following http://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html
So that is my solution. Hope it helps :D
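Before deleting or reinstalling anything by hand, it can help to confirm which cuDNN version the header under /usr/local/cuda/include actually declares. In cuDNN 7, cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL; the sketch below reads them from the header text. The function name and the trimmed sample header are made up for illustration.

```python
import re


def cudnn_version_from_header(header_text):
    """Read CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL from cudnn.h text."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#\s*define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{name} not found in header")
        parts.append(int(match.group(1)))
    return tuple(parts)


# Trimmed-down stand-in for the real header.
sample_header = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""
print(cudnn_version_from_header(sample_header))  # (7, 0, 5)
```

On a real system the text would come from open("/usr/local/cuda/include/cudnn.h").read(), with the path adjusted if the cuda directory has a different name.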
# Downgrade cuDNN for compatibility with TensorFlow 1.5
RUN apt-get update && apt-get install -y --allow-downgrades --no-install-recommends \
libcudnn7=7.0.4.31-1+cuda9.0 \
libcudnn7-dev=7.0.4.31-1+cuda9.0 && \
rm -rf /var/lib/apt/lists/*
I doubt that this will solve most cases, but I installed cuDNN 7.2(.1) via .deb files, reinstalled tensorflow-gpu, and it worked. After all, it wasn't a version issue with the driver (I had 384.xx, which was correct), but one with cuDNN.
Try this:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="1"
I am facing the same issue - https://stackoverflow.com/questions/52590880/mask-rcnn-resource-exhausted-oom-on-my-own-dataset
@gsiisg's solution worked like a charm for me with the latest official TF GPU Docker image. FROM nvidia/cuda:9.0-base-ubuntu16.04
Nvidia driver 390, Ubuntu 18.
@LucidBlue Actually, it was too early to celebrate.
I can load a model now, but I get this error when I run model.predict():
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E tensorflow/stream_executor/cuda/cuda_dnn.cc:393] possibly insufficient driver version: 384.111.0
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
Haven't you observed the same problem?
UPD: Sorry, that was easily solved by replacing cuda9.1 with cuda9.0 in the code above.
Hi, do you have any solution for this?
Upgrading from tensorflow-gpu==1.5.0 to tensorflow-gpu==1.9.0 solved it in my case.
Thanks very much.
When I changed cuDNN, the same issue still happened.
When I changed TensorFlow from 1.5 to 1.9, it was solved.
Thanks.
I'm still confused, though:
I need a lower cuDNN version, so why does updating TF from 1.5 to 1.9 solve the problem? I thought I should go from 1.5 to 1.3.