Tfjs: Multiple GPUs with tfjs-node-gpu

Created on 23 Feb 2019 · 10 comments · Source: tensorflow/tfjs

TensorFlow.js version

tfjs-node-gpu

Describe the problem or feature request

Are multiple GPUs supported for training a single model?
Something like multi_gpu_model in Keras.
Is it possible to take advantage of Nvidia tensor cores with FP16 precision?
Are there plans to expose gpu_options?

support

All 10 comments

+1

I've made a test with two GPUs and they are being used.

With multiple GPUs, memory is allocated on both cards, but there is no performance benefit. In fact, it's the other way around: performance degrades.
A simple benchmark with an oversized model:

Total params: 11233402 / Batch size: 512
1 GPU - 72295.879ms/epoch
1 GPU - 74323.471ms/epoch
2 GPUs - 97455.554ms/epoch
2 GPUs - 98588.145ms/epoch

Total params: 44843194 / Batch size: 256
1 GPU - 121129.245ms/epoch
1 GPU - 121760.142ms/epoch
2 GPUs - 158337.767ms/epoch
2 GPUs - 157352.398ms/epoch
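
For context, a minimal sketch of how per-epoch timings like these can be collected via fit() callbacks. The tiny model and random data here are placeholders, not the actual benchmark setup (which used a much larger model, summarized below, trained on MNIST):

const tf = require('@tensorflow/tfjs-node-gpu');

async function main() {
  // Stand-in model and random data just to exercise the timing harness.
  const model = tf.sequential();
  model.add(tf.layers.dense({inputShape: [784], units: 64, activation: 'relu'}));
  model.add(tf.layers.dense({units: 10, activation: 'softmax'}));
  model.compile({optimizer: 'adam', loss: 'categoricalCrossentropy'});

  const xs = tf.randomNormal([5120, 784]);
  const ys = tf.oneHot(tf.randomUniform([5120], 0, 10, 'int32'), 10).toFloat();

  let epochStart = 0;
  await model.fit(xs, ys, {
    epochs: 2,
    batchSize: 512,
    callbacks: {
      // Record wall-clock time per epoch, as in the numbers above.
      onEpochBegin: async () => { epochStart = Date.now(); },
      onEpochEnd: async (epoch) => {
        console.log(`epoch ${epoch}: ${Date.now() - epochStart}ms`);
      },
    },
  });
}

main();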

_________________________________________________________________
Layer (type)                 Output shape              Param #
=================================================================
conv2d_Conv2D1 (Conv2D)      [null,26,26,4]            40
_________________________________________________________________
.....
_________________________________________________________________
conv2d_Conv2D2 (Conv2D)      [null,24,24,8]            296
_________________________________________________________________
.....
_________________________________________________________________
conv2d_Conv2D5 (Conv2D)      [null,18,18,64]           18496
_________________________________________________________________
.....
_________________________________________________________________
conv2d_Conv2D8 (Conv2D)      [null,6,6,512]            524800
_________________________________________________________________
.....
_________________________________________________________________
dense_Dense1 (Dense)         [null,4096]               33558528
_________________________________________________________________
....
_________________________________________________________________
dense_Dense2 (Dense)         [null,10]                 40970
=================================================================
Total params: 44843194
Trainable params: 44822594
Non-trainable params: 20600
_________________________________________________________________
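
For reference, a rough sketch of how the layers visible in this summary might be declared with the tfjs layers API. The 3x3 kernels and the 28x28x1 MNIST-style input are inferred from the output shapes and param counts; the rows elided above are omitted here as well, so this is not the full 44M-param model:

const tf = require('@tensorflow/tfjs-node-gpu');

const model = tf.sequential();
// [28,28,1] -> [26,26,4]: 3*3*1*4 + 4 = 40 params, matching the summary.
model.add(tf.layers.conv2d({inputShape: [28, 28, 1], kernelSize: 3, filters: 4, activation: 'relu'}));
// [26,26,4] -> [24,24,8]: 3*3*4*8 + 8 = 296 params.
model.add(tf.layers.conv2d({kernelSize: 3, filters: 8, activation: 'relu'}));
// ...intermediate conv layers elided, as in the summary above...
model.add(tf.layers.flatten());
model.add(tf.layers.dense({units: 4096, activation: 'relu'}));
model.add(tf.layers.dense({units: 10, activation: 'softmax'}));
model.summary();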

Looks like there is no benefit to having multiple GPUs...

Sun Mar  3 22:36:45 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.93       Driver Version: 410.93       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0  On |                  N/A |
| 32%   58C    P2   130W / 250W |  10801MiB / 11176MiB |     30%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
| 26%   50C    P2    60W / 250W |  10631MiB / 11178MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     36327      C   node                                       10227MiB |
|    1     36327      C   node                                       10619MiB |
+-----------------------------------------------------------------------------+

When I use tfjs-node-gpu, by default it seems to occupy both GPUs (as shown in the nvidia-smi output above). Does it actually use one or both?

Thanks for sharing, @chenqing!
I see the same with two cards.
Memory is allocated and both cards are used, but without any actual improvement in training speed.
This probably depends on the training data. I used MNIST for testing; maybe distributing the load between the cards is not effective with small images.

@annxingyuan, can you please share some insight into how the two cards are used in parallel?

I will share my findings and close this.
I've tested with RTX 2080 Ti cards. They are not supported by CUDA 9 and require version >= 10.
TensorFlow (Python), in turn, has no release that supports CUDA 10. You can compile it yourself; I did so with TF 1.13.1 and CUDA 10, and the Python build works fine.
However, tfjs does not work with CUDA 10 and TF 1.13: the node binary is not compiled properly (without any error message, unfortunately).
This means:

- Nvidia tensor cores with FP16 precision are not supported
- RTX cards are not supported at all, very sad :(

Sorry for reopening, but the question is still the same and, I guess, not answered. I have my NN training on a single 1080 Ti, and I'm just about to buy a second 1080 Ti to speed things up twice. Will this work? The general idea is to move to a 6x GTX 1080 Ti setup later.

If not, is there any option to configure tfjs-node-gpu to use a single specified GPU, so that a two-GPU system could run two separate processes, each training its own NN?
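
For what it's worth, a common CUDA-level workaround (not something tfjs itself exposes) is to pin each process to one card via the CUDA_VISIBLE_DEVICES environment variable. A minimal sketch, assuming the variable takes effect before the native binding loads:

// CUDA reads CUDA_VISIBLE_DEVICES when the runtime initializes, so set it
// before requiring the binding, or set it in the shell instead, e.g.:
//   CUDA_VISIBLE_DEVICES=1 node train.js
process.env.CUDA_VISIBLE_DEVICES = '1'; // expose only the second card
const tf = require('@tensorflow/tfjs-node-gpu');
// ...build and train this process's own model as usual...

Running one such process per card, each with a different CUDA_VISIBLE_DEVICES value, should let every training job use its own GPU.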

According to the nvidia-smi logs above, it looks like tfjs-node-gpu prepares all GPUs for training, but only one really does any work. Isn't that a bug?

+1

@nsthorat

+1

There is no distributed training support like tf.distribute.MirroredStrategy in tfjs at present.
