I am facing a strange problem when running my model on an Nvidia Tesla K80 (on an Ubuntu machine).
I get the following error when I set 'batch_size' in model fitting to anything other than 1:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [7600] vs. [400,19]
Here I use batch_size=400, and my compile command is:
model.compile(optimizer=optimizers.Nadam(lr=0.001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
If I remove the metrics option from compile, I face no problem.
The number of samples in your input data should be a multiple of the number of GPUs. For example, if your input has 3200 samples, it should work fine with 4 GPUs. Please reduce the number of samples accordingly and try again.
You can also try removing any callback functions, as these could also be a reason for this error.
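A minimal sketch of the trimming suggested above, assuming the samples are held in a sliceable sequence. The names `trim_to_multiple` and `n_gpus` are hypothetical and not part of Keras:

```python
def trim_to_multiple(n_samples, n_gpus):
    """Return the largest sample count <= n_samples that is divisible by n_gpus."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return (n_samples // n_gpus) * n_gpus


# Stand-in data: 3203 samples, to be split evenly across 4 GPUs.
samples = list(range(3203))
n_keep = trim_to_multiple(len(samples), n_gpus=4)
samples = samples[:n_keep]  # drop the trailing samples that do not divide evenly
print(len(samples))
```

With a single GPU this leaves the data unchanged, so it would not by itself explain an error on a one-GPU machine.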
Thanks, harishini-gadige, for the reply.
One would guess that the batch_size should divide the number of data points and that this should be compatible with the number of GPUs. However, that does not seem to be the case here, since I am using only one GPU, and the error message is strange.
I have 10000 data points and the batch size is 10 (in fit), but I get the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [200] vs. [10,20]
I do not know where the number '200' comes from. When I remove metrics=['accuracy'] from compile, there is no problem. There is also no issue when running on CPU. I think there is somehow an issue with the GPU version of Keras and it does not handle 'metrics' properly. If I change the architecture, the problem also goes away. The full code is below:
from keras.layers import Embedding, Dense
import numpy as np
from keras.models import Model
from keras.layers import Input
from keras import optimizers
n=10000
num_decoder_tokens=100
len_label_vector=20
latent_dim=300
X = np.random.randint(num_decoder_tokens-1, size=(n, len_label_vector))
Y = np.random.randint(num_decoder_tokens-1, size=(n,len_label_vector)).reshape(n,len_label_vector,1)
decoder_inputs = Input(shape=(None,), name='Decoder-Input')
x = Embedding(num_decoder_tokens, latent_dim, name='Decoder-Word-Embedding', mask_zero=False)(decoder_inputs)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax', name='Final-Output-Dense') (x)
seq2seq_Model = Model(decoder_inputs, decoder_outputs)
print(seq2seq_Model.summary())
seq2seq_Model.compile(optimizer=optimizers.Nadam(lr=0.001),
                      loss='sparse_categorical_crossentropy', metrics=['accuracy'])
history = seq2seq_Model.fit(X,Y,epochs=50,batch_size=10)
print(history.history)
=====================================
Full error message
Epoch 1/50
2018-12-19 03:50:17.880901: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-19 03:50:17.967829: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-19 03:50:17.968229: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2018-12-19 03:50:17.968259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
2018-12-19 03:50:18.274436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-19 03:50:18.274490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-12-19 03:50:18.274509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
2018-12-19 03:50:18.274796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10758 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
Traceback (most recent call last):
File "gpu_fail1.py", line 28, in
history = seq2seq_Model.fit(X,Y,epochs=50,batch_size=10)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
return self._call(inputs)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1382, in __call__
run_metadata_ptr)
File "/home/ubuntu/software/tf/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [200] vs. [10,20]
[[Node: metrics/acc/Equal = Equal[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](metrics/acc/Reshape, metrics/acc/Cast)]]
[[Node: metrics/acc/Mean/_61 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_509_metrics/acc/Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
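The two shapes in the error can be reproduced with NumPy alone. Note that 200 = 10 × 20 (batch size × sequence length), and likewise 7600 = 400 × 19 in the first report. The sketch below assumes the accuracy metric flattens the labels while the argmax of the predictions keeps the batch and time axes. That is a guess based on the node names metrics/acc/Reshape and metrics/acc/Cast in the traceback, not a confirmed reading of the Keras source:

```python
import numpy as np

batch_size, seq_len, num_classes = 10, 20, 100

# Stand-ins for one batch of labels and softmax outputs.
y_true = np.zeros((batch_size, seq_len, 1))
y_pred = np.random.rand(batch_size, seq_len, num_classes)

flat_true = y_true.reshape(-1)        # assumed: the metric flattens the labels
pred_labels = y_pred.argmax(axis=-1)  # argmax keeps the batch and time axes

print(flat_true.shape)    # (200,)
print(pred_labels.shape)  # (10, 20)

# An elementwise comparison cannot broadcast (200,) against (10, 20).
try:
    np.broadcast(flat_true, pred_labels)
    compatible = True
except ValueError:
    compatible = False
print(compatible)
```

If that assumption holds, the mismatch comes from the three-dimensional output of the model rather than from the batch size or the number of GPUs.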
I've faced a similar issue. It looks like the metrics function does not reshape y_pred. You could use a custom metric function like:
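from keras import backend as K

def custom_sparse_categorical_accuracy(y_true, y_pred):
    # Flatten labels to (N,) so they line up with the argmax output below.
    flatten_y_true = K.cast(K.reshape(y_true, (-1,)), K.floatx())
    # Collapse batch and time axes: (batch, steps, classes) -> (N, classes).
    flatten_y_pred = K.cast(K.reshape(y_pred, (-1, K.int_shape(y_pred)[-1])), K.floatx())
    y_pred_labels = K.cast(K.argmax(flatten_y_pred, axis=-1), K.floatx())
    return K.cast(K.equal(flatten_y_true, y_pred_labels), K.floatx())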
and then compile with this metric:
model.compile(..., metrics=[custom_sparse_categorical_accuracy])
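The flatten-then-compare idea behind the metric above can be checked in NumPy without a GPU or Keras. This is only an illustrative mirror of the logic. `sparse_accuracy_np` is a hypothetical helper, and the labels are flattened to one dimension so that they line up with the argmax output:

```python
import numpy as np

def sparse_accuracy_np(y_true, y_pred):
    """Per-step matches for labels (batch, steps, 1) and scores (batch, steps, classes)."""
    flat_true = y_true.reshape(-1)                    # (N,)
    flat_pred = y_pred.reshape(-1, y_pred.shape[-1])  # (N, classes)
    pred_labels = flat_pred.argmax(axis=-1)           # (N,)
    return (flat_true == pred_labels).astype("float32")

# Two sequences of three steps over four classes.
y_true = np.array([[[0], [1], [2]], [[3], [0], [1]]])
# One-hot scores built from predicted labels; the last step is deliberately wrong.
y_pred = np.eye(4)[np.array([[0, 1, 2], [3, 0, 2]])]

matches = sparse_accuracy_np(y_true, y_pred)
print(matches.shape, matches.mean())
```

Five of the six steps match here, so the mean of `matches` is 5/6.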