{ 'tfjs-core': '1.0.3',
  'tfjs-data': '1.0.3',
  'tfjs-layers': '1.0.3',
  'tfjs-converter': '1.0.3',
  'tfjs': '1.0.3',
  'tfjs-node': '1.0.2' }
Running on node
Ubuntu 18.04
$ nvidia-smi
Fri Mar 29 19:25:37 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104 Driver Version: 410.104 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:01:00.0 On | N/A |
| N/A 46C P8 9W / N/A | 879MiB / 7952MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
I'm unable to use cuDNN convolutional layers in my model with tfjs-node-gpu.
This is possibly related to known issues with the RTX series; in this TensorFlow workaround there is a suggestion to use
config.gpu_options.allow_growth = True
Is there such an option in TensorFlow.js?
const tf = require('@tensorflow/tfjs-node-gpu');

const model = tf.sequential({
  layers: [
    tf.layers.conv2d({
      inputShape: [32, 32, 3],
      filters: 32,
      kernelSize: [3, 3],
      activation: 'relu',
    }),
    // maxPooling2d takes a config object, not a bare array
    tf.layers.maxPooling2d({poolSize: [2, 2]}),
  ],
});

// predict() returns a Tensor synchronously, not a Promise
const res = model.predict(tf.randomNormal([4, 32, 32, 3]));
res.print();
$ node index.js
2019-03-29 19:22:37.112495: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-03-29 19:22:37.249964: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-03-29 19:22:37.250443: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3aa4000 executing computations on platform CUDA. Devices:
2019-03-29 19:22:37.250458: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2019-03-29 19:22:37.271245: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-03-29 19:22:37.271958: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3aa2750 executing computations on platform Host. Devices:
2019-03-29 19:22:37.271972: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0):,
2019-03-29 19:22:37.272241: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.44
pciBusID: 0000:01:00.0
totalMemory: 7.77GiB freeMemory: 6.80GiB
2019-03-29 19:22:37.272275: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-03-29 19:22:37.273295: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-03-29 19:22:37.273308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-03-29 19:22:37.273314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-03-29 19:22:37.273435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6612 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-03-29 19:22:38.761993: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-03-29 19:22:38.763178: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:132
throw ex;
    ^
Error: Invalid TF_Status: 2
Message: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
at NodeJSKernelBackend.executeSingleOutput (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-node-gpu/dist/nodejs_kernel_backend.js:192:43)
at NodeJSKernelBackend.conv2d (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-node-gpu/dist/nodejs_kernel_backend.js:700:21)
at environment_1.ENV.engine.runKernel.x (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/ops/conv.js:152:27)
at /home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:171:26
at Engine.scopedRun (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:126:23)
at Engine.runKernel (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:169:14)
at conv2d_ (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/ops/conv.js:151:40)
at Object.conv2d (/home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/ops/operation.js:46:29)
at /home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-layers/dist/layers/convolutional.js:198:17
at /home/bobi/Desktop/cudnn/node_modules/@tensorflow/tfjs-core/dist/engine.js:116:22
The same error happens even when there are no convolutional layers in the model.
Models
const actor = () => tf.sequential({
  layers: [
    // inputShape must be an array, not a bare number
    tf.layers.inputLayer({inputShape: [STATE_SIZE]}),
    tf.layers.batchNormalization(),
    tf.layers.dense({units: ACTION_SIZE * 2, activation: 'relu'}),
    tf.layers.dense({units: ACTION_SIZE, activation: 'softmax'}),
  ],
});

const critic = () => {
  const stateInput = tf.input({shape: [STATE_SIZE]});
  const actionInput = tf.input({shape: [ACTION_SIZE]});
  const bn = tf.layers.batchNormalization().apply(stateInput);
  const d1 = tf.layers.dense({units: ACTION_SIZE * 2, activation: 'relu'}).apply(bn);
  const d2 = tf.layers.dense({units: ACTION_SIZE, activation: 'softmax'}).apply(d1);
  const concat = tf.layers.concatenate().apply([d2, actionInput]);
  const d3 = tf.layers.dense({units: ACTION_SIZE, activation: 'relu'}).apply(concat);
  const output = tf.layers.dense({units: 1}).apply(d3);
  return tf.model({inputs: [stateInput, actionInput], outputs: output});
};
$ node server/start.js
2019-04-03 20:26:24.022854: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-03 20:26:24.151743: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-03 20:26:24.152219: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3ac43c0 executing computations on platform CUDA. Devices:
2019-04-03 20:26:24.152233: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2019-04-03 20:26:24.171244: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2208000000 Hz
2019-04-03 20:26:24.171685: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3ac2b10 executing computations on platform Host. Devices:
2019-04-03 20:26:24.171699: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0):,
2019-04-03 20:26:24.171843: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.44
pciBusID: 0000:01:00.0
totalMemory: 7.77GiB freeMemory: 6.57GiB
2019-04-03 20:26:24.171855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-03 20:26:24.172565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-03 20:26:24.172575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-04-03 20:26:24.172579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-04-03 20:26:24.172688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6389 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
Starting with random weights.
(node:20980) ExperimentalWarning: The fs.promises API is experimental
Listening on 3000
connection
2019-04-03 20:26:27.335442: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-04-03 20:26:27.335505: W ./tensorflow/stream_executor/stream.h:2099] attempting to perform DNN operation using StreamExecutor without DNN support
2019-04-03 20:26:27.346775: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4a7f370,impl=0x4a7f410] did not wait for [stream=0x4a7ed90,impl=0x4a76260]
2019-04-03 20:26:27.346799: I tensorflow/stream_executor/stream.cc:5027] [stream=0x4a7f370,impl=0x4a7f410] did not memcpy host-to-device; source: 0x4a02a980
2019-04-03 20:26:27.346837: F tensorflow/core/common_runtime/gpu/gpu_util.cc:339] CPU->GPU Memcpy failed
@bobiblazeski, have you found any resolution to this issue?
I'm currently blocked by this same error.
Ubuntu 18.04
GTX 1660; Driver 418.56; CUDA 10.1 (even though I followed the instructions for 10.0...)
@adwellj Nope I'm training on CPU until this is resolved.
@bobiblazeski, I punted over to trying on Windows and finally just got this working. I had to drop down to tfjs-node-gpu version 0.3.2 due to node-gyp issues.
However, once I finally got it to install, I later ran into this same cuDNN issue! Fortunately, using CUDA 9.0 (needed for 0.3.2 compatibility) I got a better error message before the "This is probably because cuDNN failed to initialize..." message, stating that tfjs-node-gpu was built against cuDNN version 7.2. Once I downloaded that version, everything worked.
I haven't gone back to see if I could get it to work on the Linux install, but I'm hoping this could just be a cuDNN version incompatibility that you could experiment with. Luckily cuDNN doesn't have an install/uninstall process; it's simply a matter of copying the extracted files into a dedicated directory that you include in your system path.
I hope that helps give you some possible direction!
As explained in https://github.com/tensorflow/tfjs/issues/671#issuecomment-494832790
there is a workaround: set the environment variable
export TF_FORCE_GPU_ALLOW_GROWTH=true
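The same flag can also be set from Node itself rather than the shell; a minimal sketch (the key point is that the assignment must happen before tfjs-node-gpu is required, since the native TensorFlow binding reads the variable when it initializes):

```javascript
// TF_FORCE_GPU_ALLOW_GROWTH must be in process.env before the native
// TensorFlow binding initializes, i.e. before the require() below runs.
process.env.TF_FORCE_GPU_ALLOW_GROWTH = 'true';

// Load the GPU binding only after the flag is in place:
// const tf = require('@tensorflow/tfjs-node-gpu');

console.log(process.env.TF_FORCE_GPU_ALLOW_GROWTH); // prints "true"
```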
> @adwellj Nope I'm training on CPU until this is resolved.

Have you managed to get the GPU working?
I am having this issue too, but it seems to resolve itself only when I restart my computer, which seems rather odd to me. I notice the issue tends to happen after terminating my application(s) that use tfjs.
EDIT: I tried adding TF_FORCE_GPU_ALLOW_GROWTH=true as an environment variable, and it seemed to work briefly, but when I ran my program again the error started appearing once more.
This seems to be a duplicate of #671; we will close this and track the issue in one place. Thank you.