With TensorFlow 1.12 and multi_gpu_model, the number of GPUs needs to be specified explicitly; otherwise one gets an error.
Consider the following minimal example:
from keras import Model, Input
from keras.layers import Dense
from keras.utils import multi_gpu_model
import os
import tensorflow as tf

os.environ['CUDA_VISIBLE_DEVICES'] = "0,1"  # 2 gpus enabled

# dummy model
x = Input(shape=(4,))
layer = Dense(2, activation='relu')(x)
y = Dense(1)(layer)

with tf.device('/cpu:0'):
    model = Model(inputs=x, outputs=y)

parallel_model = multi_gpu_model(model)
one gets the following error:
Traceback (most recent call last):
  File "/home/darte/dereverb/todelete.py", line 16, in <module>
    parallel_model = multi_gpu_model(model)
  File "/home/darte/.local/lib/python3.5/site-packages/keras/utils/multi_gpu_utils.py", line 181, in multi_gpu_model
    available_devices))
ValueError: To call `multi_gpu_model` with `gpus=3`, we expect the following devices to be available: ['/cpu:0', '/gpu:0', '/gpu:1', '/gpu:2']. However this machine only has: ['/cpu:0', '/xla_gpu:0', '/xla_cpu:0', '/gpu:0', '/gpu:1']. Try reducing `gpus`.
Replacing parallel_model = multi_gpu_model(model) with parallel_model = multi_gpu_model(model, gpus=2) makes the model work fine.
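For completeness, a minimal sketch of the working call, continuing the example above (the dummy training data and the compile settings here are illustrative, not from the original report):
import numpy as np

parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='adam', loss='mse')
# dummy data matching the Input(shape=(4,)) above
parallel_model.fit(np.random.rand(64, 4), np.random.rand(64, 1), epochs=1, batch_size=16)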
[X] Check that you are up-to-date with the master branch of Keras. You can update with:
pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps
[X] Check that your version of TensorFlow is up-to-date. The installation instructions can be found here.
[X] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).
Thanks, @darteaga -- can you try the same using tf.keras? Does the number of GPUs get set correctly?
@karmel I tried, and with tf.keras the number of GPUs is a required argument, not optional. Namely, if I run the example above with keras replaced by tf.keras, I get:
TypeError: multi_gpu_model() missing 1 required positional argument: 'gpus'
Correct-- the expectation is that you are explicitly requesting GPUs, and the number will get checked against the available set. If you don't know ahead of time how many GPUs you have/want, you can use tf.keras.backend.get_session().list_devices() to check available devices.
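For example, a small sketch (assuming TF 1.x) that counts the plain GPU devices from that list and passes the count explicitly:
import tensorflow as tf

devices = tf.keras.backend.get_session().list_devices()
n_gpus = len([d for d in devices if d.device_type == 'GPU'])  # XLA_GPU entries are not counted
print('visible GPUs:', n_gpus)
# parallel_model = tf.keras.utils.multi_gpu_model(model, gpus=n_gpus)  # requires n_gpus >= 2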
It looks like Keras only sees one of the GPUs.
Make sure that all GPUs are accessible. You can use device_lib with TensorFlow.
You can check the full device list using the following code:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
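To see only the plain GPU devices, the ones multi_gpu_model actually checks against (a small sketch extending the snippet above; XLA entries are filtered out):
from tensorflow.python.client import device_lib

gpu_names = [d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU']
print(gpu_names)  # e.g. ['/device:GPU:0', '/device:GPU:1']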
It seems that xla_gpu is ignored https://github.com/keras-team/keras/pull/9226#issuecomment-495415569
@wendingp To correct my previous statement: exactly 1 xla_cpu and 1 xla_gpu are visible to TensorFlow users when the engine is compiled with XLA enabled, and this number is not related to whether multiple physical GPUs are installed or not.
Using multiple physical GPUs as different XLA devices simultaneously is not supported by TensorFlow at the moment, so training with multiple traditional GPU devices is still the only choice.
Solved with the following (my mistake: the old code had 'GPU': 2, from before I installed the third board :-( ):
import tensorflow as tf
import keras

config = tf.ConfigProto(device_count={'GPU': 3, 'CPU': 12})
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
session = tf.InteractiveSession(config=config)
keras.backend.set_session(session)
The error was that (since xla_gpu devices are ignored) Keras saw only 2 GPUs instead of 3. At the line
model_gpu = multi_gpu_model(model, gpus=3)
the error is:
ValueError: To call `multi_gpu_model` with `gpus=3`,
we expect the following devices to be available:
['/cpu:0', '/gpu:0', '/gpu:1', '/gpu:2'].
However this machine only has:
['/cpu:0', '/cpu:1', '/cpu:2', '/cpu:3', '/cpu:4', '/cpu:5', '/cpu:6', '/cpu:7', '/cpu:8', '/cpu:9', '/cpu:10', '/cpu:11',
'/xla_gpu:0', '/xla_gpu:1', '/xla_gpu:2',
'/xla_cpu:0',
'/gpu:0', '/gpu:1']
Try reducing `gpus`.
print( "devices are = ", tf.keras.backend.get_session().list_devices())
devices are = [
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 3450774613774870734),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:1, CPU, 268435456, 10924024491853209621),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:2, CPU, 268435456, 11077961182932239483),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:3, CPU, 268435456, 3361844039556185648),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:4, CPU, 268435456, 10891319530738938282),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:5, CPU, 268435456, 9919963760930538434),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:6, CPU, 268435456, 12291411013128890395),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:7, CPU, 268435456, 17874787863665771808),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:8, CPU, 268435456, 8574429556929948786),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:9, CPU, 268435456, 13187484019828111478),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:10, CPU, 268435456, 18268447936190623343),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:11, CPU, 268435456, 14930379752775399022),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 3048284492670690888),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:1, XLA_GPU, 17179869184, 6488159888065492359),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_GPU:2, XLA_GPU, 17179869184, 15466543791014750049),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 12709901572195720999),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:0, GPU, 7826230477, 18318922565637584313),
_DeviceAttributes(/job:localhost/replica:0/task:0/device:GPU:1, GPU, 7842168832, 10861403049335112838)]
nvidia-smi reports 3 GPUs:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34 Driver Version: 430.34 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 Off | 00000000:01:00.0 Off | N/A |
| 41% 36C P2 25W / 225W | 129MiB / 7982MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 2060 Off | 00000000:04:00.0 Off | N/A |
| 4% 46C P2 30W / 170W | 101MiB / 5934MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 2070 Off | 00000000:08:00.0 Off | N/A |
| 0% 39C P2 44W / 175W | 113MiB / 7982MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 11283 C python3 113MiB |
| 1 11283 C python3 85MiB |
| 2 11283 C python3 97MiB |
+-----------------------------------------------------------------------------+
Read the solution in my previous message. Now I am going to give you a second step.
Keras multi-GPU may fail to allocate the memory of the GPU boards:
multi_gpu_model(model, gpus=3)
The allocation requests may come in an order that spreads allocations across all GPUs, and in the end a big request may come and hit a CUDA out-of-memory error.
In this case I deactivated any call to multi_gpu_model, measured the memory allocation for each call, and manually re-placed all the Keras calls with
with tf.device("/gpu:1"):  # or 0 or 2 ...
until I found the right allocation scheme by hand (which Keras call goes on which GPU); a sketch of this manual placement is shown after this step.
And finally I am happy -- see below the final manual memory allocation of a super complex Keras GAN.
It is improbable that a system would automatically avoid CUDA out-of-memory errors without another AI that tries options the way I did manually (send a lot of small calls to the 6 GB GPU and keep my 8 GB board free for the bigger sharks).
Until Keras implements NVIDIA's unified memory scheme, keep reading this post (there is no chunking of a memory allocation across 2 GPUs -- you need to manually fit the contiguous space, inside one GPU's memory, needed by your tensors, so start measuring the memory allocation of each of your Keras calls).
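A minimal sketch of this manual placement pattern (the layer sizes and the GPU-to-block mapping are hypothetical, just to show how multi_gpu_model gets replaced by explicit tf.device blocks):
import tensorflow as tf
from keras import Model, Input
from keras.layers import Dense

inp = Input(shape=(128,))

with tf.device('/gpu:0'):  # biggest board gets the memory-heavy block
    h = Dense(4096, activation='relu')(inp)

with tf.device('/gpu:1'):  # smaller board gets a lighter block
    h = Dense(256, activation='relu')(h)

with tf.device('/gpu:2'):  # output head
    out = Dense(1)(h)

model = Model(inputs=inp, outputs=out)
model.compile(optimizer='adam', loss='mse')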
Step 3 - I tried to improve my manual solution, but one complex Keras command does not respond to
with tf.device("/gpu:2"):
and it remains on gpu:0. For this case I will physically swap the GPU boards at my next upgrade (GPU:0 will not be the best board :-( ) in order to work around Keras.
(I did not test software swap options - if they exist ...)
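One untested software-level possibility is to reorder the visible devices so that a different physical board appears as /gpu:0; this has to be set before TensorFlow initializes CUDA, and the ordering shown is just an example:
import os
os.environ['CUDA_DEVICE_ORDER'] = 'PCI_BUS_ID'  # make device indices follow the PCI bus order
os.environ['CUDA_VISIBLE_DEVICES'] = '2,0,1'    # physical GPU 2 then appears as /gpu:0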
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.34 Driver Version: 430.34 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 Off | 00000000:01:00.0 Off | N/A |
| 41% 40C P2 25W / 225W | 4711MiB / 7982MiB | 19% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 2060 Off | 00000000:04:00.0 Off | N/A |
| 19% 54C P2 50W / 170W | 4303MiB / 5934MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 2070 Off | 00000000:08:00.0 Off | N/A |
| 0% 45C P2 59W / 175W | 7697MiB / 7982MiB | 79% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 18367 C python3 4695MiB |
| 1 18367 C python3 4287MiB |
| 2 18367 C python3 7681MiB |
+-----------------------------------------------------------------------------+
@ghostplant so you mean I need to install tensorflow-gpu without XLA to use multiple physical GPUs? How to do it then?
@wendingp I think this is not related to inferring the number of GPUs, because it is still wrong even if you explicitly set the GPU count in the traditional way. The problem is mainly caused by inconsistent GPU device naming between xla_gpu and gpu, and this should not happen if TensorFlow is built without XLA enabled.
If you use the latest Keras, the auto-inferred GPU number should be 2 and it should work correctly, but it will not use the 3rd GPU because no /gpu:2 is found.
The device naming of TensorFlow changes between versions, so it is also possible to have the following combination on a 3-GPU host:
['/xla_gpu:0', '/xla_cpu:0', '/gpu:0', '/gpu:1', '/gpu:2']
where /xla_gpu:1 and /xla_gpu:2 don't exist.
So,
the simplest option is to try a TensorFlow build without XLA support,
or we need to wait for TensorFlow's GPU device naming to become standard and fixed in all cases,
or a much more complex detection logic for TensorFlow device naming in all cases needs to be added to the implementation of keras multi_gpu_model (see the sketch below).
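A rough sketch of what such detection could look like (illustrative only, not the actual Keras code): normalize the device names and keep just the plain GPU entries, whether or not XLA devices are also listed.
import tensorflow as tf

def available_gpu_names():
    names = [d.name for d in tf.keras.backend.get_session().list_devices()]
    # '/job:localhost/replica:0/task:0/device:GPU:0' -> '/gpu:0'
    return ['/' + n.split('device:')[-1].lower() for n in names if 'device:GPU:' in n]

print(available_gpu_names())  # e.g. ['/gpu:0', '/gpu:1']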
@wendingp So how did you install the tensorflow package -- via pip install tensorflow-gpu==1.12.0 ?
@ghostplant previously I just used pip install tensorflow-gpu
@wendingp Yeah, the tensorflow prebuilt package seems to enable the XLA option by default since 1.13.x. If you fall back to 1.12.0, which is based on cuda-9.0, XLA is not enabled, but the CUDA driver might not match, so it will be a little annoying to change the driver environment.
It is weird that you actually have 3 physical GPUs, but can only see /device:XLA_GPU:[0, 1, 2] and /device:GPU:[0, 1] devices (no /device:GPU:2 found)?
This is my output of tensorflow devices on multiple GPU hosts:
['/device:CPU:0', '/device:XLA_GPU:0', '/device:XLA_GPU:1', '/device:XLA_GPU:2', '/device:XLA_GPU:3', '/device:XLA_GPU:4', '/device:XLA_GPU:5', '/device:XLA_GPU:6', '/device:XLA_GPU:7', '/device:XLA_CPU:0', '/device:GPU:0', '/device:GPU:1', '/device:GPU:2', '/device:GPU:3', '/device:GPU:4', '/device:GPU:5', '/device:GPU:6', '/device:GPU:7']
So are you really NOT able to find /device:GPU:2 in your environment?