Tfjs: Low GPU utilization with tfjs-node-gpu

Created on 25 Jun 2018 · 8 comments · Source: tensorflow/tfjs

TensorFlow.js version

  "dependencies": {
    "@tensorflow/tfjs": "^0.11.4",
    "@tensorflow/tfjs-node": "^0.1.5",
    "@tensorflow/tfjs-node-gpu": "^0.1.7",
}

Browser version

N/A. Node v8.9.4. Ubuntu 16.04

Describe the problem or feature request

Using tfjs-node-gpu, I can't seem to get GPU utilization above ~0-3%. I have CUDA 9 and cuDNN 7.1 installed, am importing @tensorflow/tfjs-node-gpu, and am setting the "tensorflow" backend with tf.setBackend('tensorflow'). CPU usage sits at 100% on one core, but GPU utilization is practically zero. I've tried tfjs-examples/baseball-node (replacing import '@tensorflow/tfjs-node' with import '@tensorflow/tfjs-node-gpu', of course) as well as my own custom LSTM code. Does tfjs-node-gpu actually run operations on the GPU?
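For reference, the setup boils down to something like this (a minimal sketch - the two-layer model here is a stand-in, not my actual LSTM code):

// Minimal sketch of the setup described above (the model is a stand-in, not my actual code)
const tf = require('@tensorflow/tfjs');
require('@tensorflow/tfjs-node-gpu');   // registers the native 'tensorflow' backend

tf.setBackend('tensorflow');            // select the native TensorFlow backend

const model = tf.sequential();
model.add(tf.layers.dense({units: 64, activation: 'relu', inputShape: [10]}));
model.add(tf.layers.dense({units: 1, activation: 'sigmoid'}));
model.compile({optimizer: 'adam', loss: 'binaryCrossentropy'});

// Train on random data just to have something to watch in nvidia-smi
const xs = tf.randomNormal([1024, 10]);
const ys = tf.randomUniform([1024, 1]).round();
model.fit(xs, ys, {epochs: 5}).then(() => console.log('done'));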

Code to reproduce the bug / link to feature request

# assumes CUDA 9, CuDNN 7.1, and latest nvidia drivers are already installed
git clone https://github.com/tensorflow/tfjs-examples
cd tfjs-examples/baseball-node

# replace tfjs-node import with tfjs-node-gpu
sed -i s/tfjs-node/tfjs-node-gpu/ src/server/server.ts

# install dependencies and download data
yarn add @tensorflow/tfjs-node-gpu
yarn && yarn download-data

# start the server
yarn start-server

Now open another terminal and watch GPU usage. Note that if you are running the process on the same GPU as an X window server, GPU utilization will likely read higher than 3% because of that process. I've tested this on a dedicated GPU running no other processes by using the CUDA_VISIBLE_DEVICES env var.
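For example (the device index here is just an assumption - pick whichever GPU isn't driving your display):

# pin the server to a GPU with no other processes on it (index 1 is an example)
CUDA_VISIBLE_DEVICES=1 yarn start-server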

# monitor GPU utilization
watch -n 0.1 nvidia-smi


All 8 comments

Hi Brannon,

Apologies for the delay - I was out on holiday. For some neural networks, the GPU can actually be slower than the CPU. This happens because there is a cost to copying tensor data from host memory over to GPU memory. The baseball network is a simple 4-layer network of nothing more than ReLUs and a sigmoid at the end. These types of networks are slower on the GPU because the memory-copy overhead dominates the small amount of compute.
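A rough way to see that copy cost for yourself (a sketch, not a rigorous benchmark) is to time a small op on both backends:

// Rough sketch: time a tiny op on the JS CPU backend vs. the native backend (not a rigorous benchmark)
const tf = require('@tensorflow/tfjs');
require('@tensorflow/tfjs-node-gpu');

function timeIt(backend) {
  tf.setBackend(backend);
  const a = tf.randomNormal([64, 64]);
  const b = tf.randomNormal([64, 64]);
  const start = Date.now();
  a.matMul(b).dataSync();   // dataSync() blocks until the result is copied back to the CPU
  console.log(backend + ': ' + (Date.now() - start) + ' ms');
}

timeIt('cpu');          // pure-JS backend, no device copies
timeIt('tensorflow');   // native backend - for ops this small, the copies dominate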

If you want to take advantage of the GPU, use a network that has some level of pooling and/or convolutions. For example, in the tfjs-examples repo, we have an MNIST example that runs entirely in Node:

https://github.com/tensorflow/tfjs-examples/tree/master/mnist-node

This runs very fast on the GPU since convolutions are well optimized as CUDA operations. Try running that example while watching it with your nvidia-smi tool.
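If it helps, the shape of a model that does benefit looks roughly like this (the layer sizes are illustrative, not the exact ones from mnist-node):

// Minimal convolutional model sketch (layer sizes are illustrative, not the mnist-node ones)
const tf = require('@tensorflow/tfjs');
require('@tensorflow/tfjs-node-gpu');

const model = tf.sequential();
model.add(tf.layers.conv2d({inputShape: [28, 28, 1], filters: 32, kernelSize: 3, activation: 'relu'}));
model.add(tf.layers.maxPooling2d({poolSize: 2}));
model.add(tf.layers.conv2d({filters: 64, kernelSize: 3, activation: 'relu'}));
model.add(tf.layers.maxPooling2d({poolSize: 2}));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({units: 10, activation: 'softmax'}));
model.compile({optimizer: 'adam', loss: 'categoricalCrossentropy', metrics: ['accuracy']});

// Convolution kernels like these map well onto CUDA, so nvidia-smi should show real utilization here.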

Ah, I see. Running the mnist-node example on a dedicated GTX 1060 with no other GPU processes does generate ~20% GPU utilization. What mechanism is used to automagically determine whether a model graph will be run on the GPU or stay on the CPU? As I mentioned in the OP, I also tried this on my own custom RNN (browser code here). There I would have expected the GPU to be used, as it is with a nearly identical model in Keras.

If my memory serves me correctly, Python TensorFlow gives the option to pin ops to a specific CPU or GPU device. Does no such functionality exist in tfjs-node?

Python + Keras uses graph-based execution, which can run faster on the GPU. We use the Eager-style API from TensorFlow, which does not actually have a graph of placeholders - it allocates new Tensors for each op's output. This is probably why you see Keras utilizing the GPU more.

We do not have a device API yet - it's something we're considering down the road once we introduce TPU support. For now, we default to Tensor placement using the default settings in TensorFlow eager (i.e. copy all non-int32 tensors).

Gotcha. Thanks for that clarification. I've revisited the char-rnn tfjs-node-gpu example I was telling you about, and it looks like it is indeed running on the GPU, since memory is allocated, but GPU utilization is ~1%. If I'm understanding you correctly, this is because tfjs-node-gpu is using TF Eager mode. So I should expect the same type of model to run at ~1% GPU utilization if it were written in Python using TF Eager mode as well, correct?

Does tfjs-node-gpu intend to add support for graph-based execution at some point in the near future? Unless I'm missing something, this "Eager mode only" behavior creates some significant performance hurdles, no? In general, how does tfjs-node-gpu compare in performance to similar implementations in Keras?

I ask because I'm writing some documentation for my team and am beginning to consider a JavaScript-first approach to common high-level ML tasks. A year ago that would have seemed like a crazy idea, but with tfjs, maybe not so much. Basically, I'm curious whether tfjs-node-gpu will ever be comparable in performance to Keras and Python TensorFlow.

@nkreeger, any thoughts on a few of these last questions?

@brannondorsey My opinion on the last question: tfjs won't be as fast as tfpy.

@nkreeger something is wrong with training using tfjs-node-gpu - it's still training on the CPU. I'm using a 2080 Ti. This is the MNIST example from the tfjs repository:

(tf) abhimanyuaryan@hackintosh:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105

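For reference, here is roughly how I'm checking which backend is active before training (a minimal sketch):

// Sketch: print the active backend before training starts
const tf = require('@tensorflow/tfjs-node-gpu');

console.log('backend:', tf.getBackend());   // 'tensorflow' means the native binding loaded;
                                            // even then TensorFlow falls back to CPU if it can't
                                            // find a usable CUDA device, so also watch nvidia-smi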

I'm seeing epoch cycles take over double the time on my GPU when compared to my CPU. Is there any way to improve this?

