"dependencies": {
"@tensorflow/tfjs": "^0.11.4",
"@tensorflow/tfjs-node": "^0.1.5",
"@tensorflow/tfjs-node-gpu": "^0.1.7",
}
N/A. Node v8.9.4. Ubuntu 16.04
Using tfjs-node-gpu, I can't seem to get GPU utilization above ~0-3%. I have CUDA 9 and CuDNN 7.1 installed, am importing @tensorflow/tfjs-node-gpu, and am setting the "tensorflow" backend with tf.setBackend('tensorflow'). CPU usage is at 100% on one core, but GPU utilization is practically nonexistent. I've tried tfjs-examples/baseball-node (replacing import '@tensorflow/tfjs-node' with import '@tensorflow/tfjs-node-gpu', of course) as well as my own custom LSTM code. Does tfjs-node-gpu actually run operations on the GPU?
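For reference, my setup looks roughly like this (a minimal sketch of the 0.x-era API, where tfjs-node-gpu registers the native "tensorflow" backend):
// minimal sketch of the setup described above (tfjs 0.x API)
const tf = require('@tensorflow/tfjs');
require('@tensorflow/tfjs-node-gpu');   // registers the native TensorFlow backend
tf.setBackend('tensorflow');            // select the native (GPU) backend
console.log(tf.getBackend());           // sanity check: should print 'tensorflow'
Reproduction steps follow below.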
# assumes CUDA 9, CuDNN 7.1, and latest nvidia drivers are already installed
git clone https://github.com/tensorflow/tfjs-examples
cd tfjs-examples/baseball-node
# replace tfjs-node import with tfjs-node-gpu
sed -i 's/tfjs-node/tfjs-node-gpu/' src/server/server.ts
# install dependencies and download data
yarn add @tensorflow/tfjs-node-gpu
yarn && yarn download-data
# start the server
yarn start-server
Now open another terminal and watch GPU usage. Note that if you are running the process on the same GPU as an X window server, GPU usage will likely be greater than 3% because of that process. I've tested this on a dedicated GPU running no other processes, using the CUDA_VISIBLE_DEVICES env var.
# monitor GPU utilization
watch -n 0.1 nvidia-smi
Hi Brannon,
Apologies for the delay - I was out on holiday. So for some neural networks, the GPU can actually be slower than regular CPU execution. This happens because there is a cost to copying tensor data from host memory over to GPU memory. The baseball network is a simple 4-layer network of nothing more than ReLUs and a sigmoid at the end. These types of networks are slower on the GPU because of all the copying to GPU memory.
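To make the point concrete, a network of roughly this shape (layer sizes below are made up, just a sketch) spends more time shuttling small tensors to and from GPU memory than it does computing:
// a small all-dense network like the baseball model (hypothetical layer sizes)
const model = tf.sequential();
model.add(tf.layers.dense({units: 25, activation: 'relu', inputShape: [10]}));
model.add(tf.layers.dense({units: 25, activation: 'relu'}));
model.add(tf.layers.dense({units: 25, activation: 'relu'}));
model.add(tf.layers.dense({units: 1, activation: 'sigmoid'}));
// each op is tiny, so the host-to-GPU copy overhead dominates the actual math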
If you want to take advantage of the GPU, use a network that has some amount of pooling and/or convolutions. For example, in the tfjs-examples repo, we have an MNIST example that runs entirely in Node:
https://github.com/tensorflow/tfjs-examples/tree/master/mnist-node
This runs super fast on the GPU, since convolutions are well optimized as CUDA operations. Try running that example while watching nvidia-smi as before.
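Something convolution-heavy, roughly like the sketch below, is what keeps the GPU busy enough for the transfer cost to pay off:
// a convolution + pooling model in the spirit of the mnist-node example (sketch only)
const model = tf.sequential();
model.add(tf.layers.conv2d({inputShape: [28, 28, 1], filters: 32, kernelSize: 3, activation: 'relu'}));
model.add(tf.layers.maxPooling2d({poolSize: 2}));
model.add(tf.layers.conv2d({filters: 64, kernelSize: 3, activation: 'relu'}));
model.add(tf.layers.maxPooling2d({poolSize: 2}));
model.add(tf.layers.flatten());
model.add(tf.layers.dense({units: 10, activation: 'softmax'}));
// conv2d/pooling kernels map well onto CUDA, so GPU utilization climbs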
Ah, I see. Running the mnist-node example on a designated GTX 1060 with no other GPU processes does generate ~20% GPU utilization. What mechanism is used to automagically determine whether a model graph will be run on the GPU or stay on the CPU? As I mentioned in the OP I also tried this on my own custom RNN (browser code here). There I would have expected the GPU to be used, as it is with a nearly identical model in Keras.
If my memory serves me correctly, Python TensorFlow gives the option to specify the CPU/GPU device explicitly. Does no such functionality exist in tfjs-node?
Python + Keras uses graph-based execution, which can run faster on the GPU. We use the eager-style API from TensorFlow, which does not actually have a graph of placeholders - it allocates new Tensors for each op's output. This is probably why you see Keras utilizing the GPU more.
We do not have a device API yet - it's something we're considering down the road once we introduce TPU support. For now, we default to tensor placement using the default settings in TensorFlow eager (i.e. all non-int32 tensors are copied to the GPU).
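In other words, with the eager-style API every op runs immediately and materializes its output, rather than being recorded into a graph that can be optimized as a whole. A rough illustration:
// eager execution: each op allocates and returns a new Tensor right away
const x = tf.randomNormal([1024, 1024]);
const y = x.relu();   // output materialized immediately; nothing is deferred
const z = y.sum();    // another new Tensor; no graph is built or fused
z.print();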
Gotcha. Thanks for that clarification. I've revisited the char-rnn tfjs-node-gpu example I was telling you about, and it looks like it is indeed running on the GPU, as memory is allocated, but GPU utilization is ~1%. If I'm understanding you correctly, this is because tfjs-node-gpu uses TF eager mode. So I should expect the same type of model to run at ~1% GPU utilization if it were written in Python using TF eager mode as well, correct?
Does tfjs-node-gpu intend to add support for graph-based execution at some point in the near future? Unless I'm missing something, this "eager mode only" behavior creates some significant performance hurdles, no? In general, how does tfjs-node-gpu compare in performance to similar implementations in Keras?
I ask because I'm writing some documentation for my team and am beginning to consider a JavaScript-first approach to common high-level ML tasks. A year ago that would have seemed like a crazy idea, but with tfjs, maybe not so much. Basically, I'm curious whether tfjs-node-gpu will ever be comparable in performance to Keras and Python TensorFlow.
@nkreeger, any thoughts on a few of these last questions?
@brannondorsey My opinion on the last question: tfjs won't be as fast as tfpy.
@nkreeger Something seems to be wrong with training using TFJS Node GPU - it's still training on the CPU. I'm using a 2080 Ti. This is the MNIST example from the tfjs repository.
(tf) abhimanyuaryan@hackintosh:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105
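In case it helps with debugging, a minimal check of which backend is actually active looks roughly like this (just a sketch; the important part is that the GPU package is required before any model code runs):
// require the GPU binding first, then confirm which backend is registered
const tf = require('@tensorflow/tfjs-node-gpu');   // not '@tensorflow/tfjs-node'
console.log(tf.getBackend());                      // should report 'tensorflow'
console.log(tf.version);                           // handy when reporting version info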

I'm seeing epochs take more than twice as long on my GPU as on my CPU. Is there any way to improve this?