Keras: GPUs returned by tensorflow-nightly-gpu not matched by multi-gpu-model

Created on 21 Oct 2017  ·  14 Comments  ·  Source: keras-team/keras

Hi, I'm using tensorflow-nightly-gpu and Keras built from GitHub master as of today. On NVIDIA cards I get the following exception when trying to use `multi_gpu_model`:

  File "C:\ProgramData\Anaconda3\lib\site-packages\keras-2.0.8-py3.6.egg\keras\utils\training_utils.py", line 100, in multi_gpu_model
ValueError: To call `multi_gpu_model` with `gpus=4`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1', '/gpu:2', '/gpu:3']. However this machine only has: ['/device:CPU:0', '/device:GPU:0', '/device:GPU:1', '/device:GPU:2', '/device:GPU:3']. Try reducing `gpus`.

Here is tensorflow output:

2017-10-21 17:31:06.619806: I C:\tf_jenkins\home\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:05:00.0, compute capability: 6.1)
2017-10-21 17:31:06.619899: I C:\tf_jenkins\home\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: GeForce GTX 1060 6GB, pci bus id: 0000:06:00.0, compute capability: 6.1)
2017-10-21 17:31:06.621181: I C:\tf_jenkins\home\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2017-10-21 17:31:06.621580: I C:\tf_jenkins\home\workspace\tf-nightly-windows\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1121] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: GeForce GTX 1060 6GB, pci bus id: 0000:04:00.0, compute capability: 6.1)

It seems the issue is that the `/gpu:N` device name is hardcoded for the matching:
https://github.com/fchollet/keras/blob/62d097c4ff6fa694a4dbc670e9c7eb9e2bc27c74/keras/utils/training_utils.py#L90

However, TensorFlow still recognizes `/gpu:N`; e.g. the following works fine:

    with tf.device('/gpu:' + str(g)):
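The mismatch is purely one of naming: tf-nightly reports devices as `/device:GPU:N`, while Keras compares against the legacy lowercase `/gpu:N` spelling. A minimal sketch of the normalization that would reconcile the two (a hypothetical helper for illustration, not necessarily the fix that later landed in Keras):

```python
def normalize_device_name(name):
    """Map a TensorFlow device name such as '/device:GPU:0' to the
    legacy lowercase form '/gpu:0' that older Keras versions expect."""
    return name.lower().replace('/device:', '/')

# The devices reported by tf-nightly...
reported = ['/device:CPU:0', '/device:GPU:0', '/device:GPU:1']
# ...normalize to the names multi_gpu_model compares against.
print([normalize_device_name(d) for d in reported])
# -> ['/cpu:0', '/gpu:0', '/gpu:1']
```

With names normalized on both sides, the availability check would accept either spelling.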

Most helpful comment

I can confirm that I have this issue when installing the master version.

All 14 comments

With Keras version 2.1.4, I am still having this issue.

ValueError: To call `multi_gpu_model` with `gpus=3`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1', '/gpu:2']. However this machine only has: ['/cpu:0']. Try reducing `gpus`.

I can confirm that I have this issue when installing the master version.

Same problem for me with the latest keras 2.1.6. Any news on this?

Has anyone resolved this issue? I also have Keras 2.1.6 and am getting the same error.

I think I see the root cause. For example, I am running "Deep Learning AMI (Ubuntu) Version 9.0 - ami-33f9e753". When I start TensorFlow, I see that the GPUs are being ignored because of the following:

"Ignoring visible gpu device (device: 3, name: GRID K520, pci bus id: 0000:00:06.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5."

I believe this might be due to an improper build of CUDA and cuDNN in the AMI. Several issues have been opened on this error; I think #4002 is the most likely cause (an env flag changed).
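The "Ignoring visible gpu device" case is easy to spot in the logs. A small sketch (assuming the capability appears in TensorFlow's usual "compute capability X.Y" wording) that parses such a line and checks it against the 3.5 minimum that stock TensorFlow GPU builds required at the time:

```python
import re

MIN_CAPABILITY = (3, 5)  # minimum for prebuilt TensorFlow GPU binaries then

def capability_ok(log_line, minimum=MIN_CAPABILITY):
    """Parse 'compute capability X.Y' out of a TF device log line and
    compare it against the required minimum. Returns None if the line
    carries no capability info."""
    m = re.search(r'compute capability:? (\d+)\.(\d+)', log_line)
    if not m:
        return None
    return (int(m.group(1)), int(m.group(2))) >= minimum

line = ("Ignoring visible gpu device (device: 3, name: GRID K520, "
        "pci bus id: 0000:00:06.0, compute capability: 3.0) with Cuda "
        "compute capability 3.0. The minimum required Cuda capability is 3.5.")
print(capability_ok(line))  # -> False: a GRID K520 (3.0) is skipped
```

A card rejected this way never becomes a `/device:GPU:N` at all, so no amount of renaming in Keras can bring it back.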

Can anyone confirm?

@caugusta @ahendry2688 Same issue here; I am using cuDNN 7 with CUDA 9.1.

Hi,
any update on this?
I have the same problem with cuDNN 7, CUDA 9.1, Keras 2.2.2.

Same problem here. I'm using GCP to train Keras multi-GPU models. `tf.device('/gpu:0')` works fine, but Keras does not.

I am having the same problem. Two weeks ago, with Keras 2.0.9, cuDNN 7, and CUDA 9.1, Keras multi-GPU mode worked well. Today it constantly shows me

ValueError: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/xla_cpu:0']. Try reducing `gpus`.

no matter which Keras version I use. BTW, I tried everything from 2.0.9 to 2.2.4.
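When the error lists only `/cpu:0` and `/xla_cpu:0`, TensorFlow registered no GPU at all (a CUDA/driver problem rather than a Keras-version one), so switching Keras versions cannot help. A quick pure-string check over whatever device list the error reports, accepting both spellings and skipping XLA pseudo-devices (a diagnostic sketch, not part of the Keras API):

```python
def visible_gpus(device_names):
    """Return the real GPU entries from a TensorFlow device list,
    accepting both '/gpu:N' and '/device:GPU:N' spellings and
    ignoring XLA pseudo-devices such as '/xla_cpu:0'."""
    return [d for d in device_names
            if 'gpu' in d.lower() and 'xla' not in d.lower()]

print(visible_gpus(['/cpu:0', '/xla_cpu:0']))            # -> []
print(visible_gpus(['/device:CPU:0', '/device:GPU:0']))  # -> ['/device:GPU:0']
```

An empty result means the CUDA installation (or the GPU build of TensorFlow itself) needs fixing first.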

Hi, have you solved this problem? I am having the same problem as you.

job exception: To call `multi_gpu_model` with `gpus=2`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1']. However this machine only has: ['/cpu:0', '/xla_cpu:0']. Try reducing `gpus`.

No, I just use the normal training mode and it works well. I suspect GCP may allocate the GPUs by itself. With the right configuration, multi-GPU training takes half the time of single-GPU training, so maybe it isn't really a problem, though I am not sure. It's still better to use `multi_gpu_model` in Keras anyway.

I have solved this problem now. I think the main reason was the Python version: I was on Python 2.7 before, and after I changed to 3.6.3 it works well.

Thanks a lot. I use Python 3.5; maybe I can try Python 3.6 later.

Any update on this issue?
I got the same problem with cuDNN 7, CUDA 10.1, Keras 2.1.6.
