Keras: F ./tensorflow/core/util/cuda_launch_config.h:127] Check failed: work_element_count > 0 (0 vs. 0)

Created on 7 Apr 2018 · 16 comments · Source: keras-team/keras

My computer has two GPUs, and everything works fine if I don't add this line:
model = keras.utils.training_utils.multi_gpu_model(base_model, gpus=2)
However, then only one GPU is used for computation. I don't understand what 'work_element_count > 0' means. Is it that I haven't cleared the CUDA worker beforehand?
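For context, this is roughly the full setup being described: a minimal sketch assuming Keras 2.x on the TensorFlow 1.x backend with two visible GPUs. The toy model and random data below are placeholders, not the reporter's actual code.

import numpy as np
import keras
from keras.utils import multi_gpu_model  # public path for the same function in Keras 2.x

# Placeholder single-GPU model; the reporter's base_model would go here.
base_model = keras.models.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(100,)),
    keras.layers.Dense(10, activation='softmax'),
])

# Replicate the model on 2 GPUs; each training batch is split between them.
model = multi_gpu_model(base_model, gpus=2)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

# Dummy data just to make the sketch runnable.
X = np.random.rand(256, 100).astype('float32')
Y = keras.utils.to_categorical(np.random.randint(10, size=256), num_classes=10)
model.fit(X, Y, batch_size=32, epochs=1)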

Most helpful comment

@mahaishou This issue was resolved when I upgraded tensorflow-gpu version to 1.9.0.

All 16 comments

Same issue here

I have the same question

Has nobody solved this issue?

Same issue here

I don't have the issue with tensorflow-gpu 1.7.0

@vQuagliaro same issue with tensorflow-gpu-1.7.0

Same issue. Did anyone solve it?

Same issue.

Try rebuilding your environment with CUDA 9.0, tensorflow-gpu 1.7.0, and cuDNN 7.0.
That resolved this error for me and for many other people.
Also be careful with your training images: check whether any image contains too many objects, since that can cause an OOM problem even if you reduce the ROI number in the config. Replacing those images may help solve this problem too.

This happens because one of the GPUs ends up processing no data in a batch. Pay attention to the number of inputs: I used four GPUs and fed in only 2 samples, and the error occurred. It has nothing to do with the environment. If you have 2 GPUs, make sure every batch has at least 2 samples.
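In other words, multi_gpu_model slices each incoming batch into one sub-batch per GPU, and a sub-batch with zero samples is what trips the work_element_count > 0 check. Below is a rough sanity check one could run before training; the helper name check_batch_split is made up for illustration and is not part of Keras.

def check_batch_split(num_samples, batch_size, gpus):
    # multi_gpu_model splits a batch of size B into `gpus` slices of about
    # B // gpus samples each; a slice of size 0 crashes the CUDA kernel
    # launch with "work_element_count > 0".
    last_batch = num_samples % batch_size or batch_size
    for b in {batch_size, last_batch}:
        if b < gpus:
            raise ValueError(
                "A batch of %d samples cannot be split across %d GPUs "
                "without leaving one GPU empty." % (b, gpus))

# Example from this comment: 4 GPUs but only 2 samples -> raises.
# check_batch_split(num_samples=2, batch_size=2, gpus=4)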

@mahaishou how do you make sure that one batch has at least 2 samples? Please help!

@ashwinijoshigithub Does your log output look like this?
6976/12000 [================>.............] - ETA: 2:02 - loss: 1.7103 - acc: 0.3749
7008/12000 [================>.............] - ETA: 2:01 - loss: 1.7102 - acc: 0.3747
7040/12000 [================>.............] - ETA: 2:00 - loss: 1.7103 - acc: 0.3741
7072/12000 [================>.............] - ETA: 2:00 - loss: 1.7103 - acc: 0.3740
7104/12000 [================>.............] - ETA: 1:59 - loss: 1.7102 - acc: 0.3744
12000 is my total number of samples and my batch size is 32.

Same question.

@mahaishou how do you make sure that one batch has at least 2 samples? Please help!

@qiuyinglin Control the number of samples fed in per epoch.
I use Keras to train the model, like this:
history = multi_model.fit(X_train, Y_train, batch_size=batch_size, epochs=1,
                          validation_data=(X_test, Y_test))
X_train is my input data, so just control the length of X_train.
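For concreteness, here is a sketch of what "control the length of X_train" could look like: drop trailing samples so the last (possibly partial) batch still has at least one sample per GPU. The helper trim_for_multi_gpu is hypothetical, an assumption based on this thread rather than code from it.

def trim_for_multi_gpu(X, Y, batch_size, gpus):
    # Drop trailing samples whenever the final partial batch would be
    # smaller than the number of GPUs, so every batch can be split
    # without handing any GPU an empty slice.
    remainder = len(X) % batch_size
    if 0 < remainder < gpus:
        X, Y = X[:-remainder], Y[:-remainder]
    return X, Y

# Usage with the fit() call above:
# X_train, Y_train = trim_for_multi_gpu(X_train, Y_train, batch_size, gpus=2)
# history = multi_model.fit(X_train, Y_train, batch_size=batch_size, epochs=1,
#                           validation_data=(X_test, Y_test))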

@mahaishou This issue was resolved when I upgraded tensorflow-gpu version to 1.9.0.

Closing as this is resolved
