Hi!
I've run into an error when using multi_gpu_model with the seq2seq model referenced here:
https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
Specifically, this is the error received when running on 2 GPUs:
InvalidArgumentError: Incompatible shapes: [16,128] vs. [32,128]
[[Node: replica_0/model_1/lstm_2/while/add = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/lstm_2/while/BiasAdd, replica_0/model_1/lstm_2/while/MatMul_4)]]
[[Node: loss/mul/_193 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4498_loss/mul", tensor_type=DT_FLOAT,
_device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
32 is my batch size and 128 is the latent dimension of the encoding space. Running on 4 GPUs gives the same error, but with [8,128] vs. [32,128] instead.
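For reference, multi_gpu_model splits every model input along the batch axis into one sub-batch per GPU, which is where the smaller number in each pair comes from (32 / 2 = 16, 32 / 4 = 8). A rough NumPy illustration of that arithmetic (not of Keras's internals):

import numpy as np

full_batch = np.zeros((32, 128))                         # (batch_size, latent_dim)
per_gpu = np.array_split(full_batch, 2)                  # 2 GPUs -> two slices of (16, 128)
print([s.shape for s in per_gpu])                        # [(16, 128), (16, 128)]
print([s.shape for s in np.array_split(full_batch, 4)])  # 4 GPUs -> four slices of (8, 128)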
The full error log is here (I have to run the code on Google ML Engine, as I don't have multiple GPUs myself).
The Python code is here, and the error can be reproduced by running:
python seq2seq.py --nbr-gpus 2
TensorFlow version: 1.4.1
Keras version: 2.1.4
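Since the linked script isn't included above, here is a minimal sketch of the kind of setup that triggers the error, assuming it follows the training model from the blog post; latent_dim = 128 and batch_size = 32 come from this report, while num_encoder_tokens and num_decoder_tokens are placeholder values:

from keras.models import Model
from keras.layers import Input, LSTM, Dense
from keras.utils import multi_gpu_model

latent_dim = 128          # matches the [*, 128] shapes in the error
num_encoder_tokens = 71   # placeholder vocabulary sizes
num_decoder_tokens = 93

# Encoder, as in the blog post: keep only the final LSTM states.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder initialised with the encoder states -- the part that
# interacts badly with multi_gpu_model.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_outputs, _, _ = LSTM(latent_dim, return_sequences=True,
                             return_state=True)(decoder_inputs,
                                                initial_state=[state_h, state_c])
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# parallel_model.fit([encoder_data, decoder_data], target_data, batch_size=32)
# -> raises the InvalidArgumentError above when run on 2 GPUs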
I hope someone knows what's wrong, as I've spent the last few days trying to fix this without success.
Any ideas?
I'm seeing a similar issue; I believe the common theme is setting an initial state on recurrent layers.
Reproducing code (edited to clarify that the cause is the combination of initial_state and multi_gpu_model):
import numpy as np
from keras import layers as L
from keras.models import Model
from keras.utils import multi_gpu_model

# Case 1: a plain SimpleRNN with no explicit initial state.
x = L.Input((4, 3))
y = L.SimpleRNN(3, return_sequences=True)(x)
_x = np.random.randn(2, 4, 3)
_y = np.random.randn(2, 4, 3)
m = Model(x, y)
m.compile(loss='mean_squared_error', optimizer='adam')
m.train_on_batch(_x, _y)
print("Success!")

# The same model wrapped with multi_gpu_model also trains fine.
m2 = multi_gpu_model(m, 2)
m2.compile(loss='mean_squared_error', optimizer='adam')
m2.train_on_batch(_x, _y)
print("Success 2!")

# Case 2: the same RNN, but fed an initial_state through a second Input.
x = L.Input((4, 3))
init_state = L.Input((3,))
y = L.SimpleRNN(3, return_sequences=True)(x, initial_state=init_state)
_x = [np.random.randn(2, 4, 3), np.random.randn(2, 3)]
_y = np.random.randn(2, 4, 3)
m = Model([x, init_state], y)
m.compile(loss='mean_squared_error', optimizer='adam')
m.train_on_batch(_x, _y)
print("Success 3!")

# Wrapping this model with multi_gpu_model is what fails.
m2 = multi_gpu_model(m, 2)
m2.compile(loss='mean_squared_error', optimizer='adam')
m2.train_on_batch(_x, _y)
print("Success 4!")  # never reached; InvalidArgumentError is raised instead
Success!
Success 2!
Success 3!
...
InvalidArgumentError (see above for traceback): Incompatible shapes: [4,4,3] vs. [2,4,3]
[[Node: training_1/Adam/gradients/loss_1/simple_rnn_1_loss/sub_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@loss_1/simple_rnn_1_loss/sub"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](training_1/Adam/gradients/loss_1/simple_rnn_1_loss/sub_grad/Shape/_143, training_1/Adam/gradients/loss_1/simple_rnn_1_loss/sub_grad/Shape_1)]]
[[Node: training_1/Adam/gradients/simple_rnn_1_1/concat_grad/Slice_1/_189 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:1", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_545_training_1/Adam/gradients/simple_rnn_1_1/concat_grad/Slice_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:1"]()]]
I am having exactly the same problem after adapting the code from https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py#L144 to use 2 GPUs.
Same problem. Any ideas on how to solve it?
Exactly the same problem here as well, also with code from https://github.com/keras-team/keras/blob/master/examples/lstm_seq2seq.py#L144 adapted to use 2 GPUs.
Same problem here; any help would be appreciated.
Exactly the same problem! How can this be solved when combining initial_state and multi_gpu_model?
@bezigon @kwonyoungjoo @jaimevargast @LeZhengThu @zyxue @mharradon @oskarjonefors This bug has been fixed by #10845; please check out the latest code. If it still doesn't work, please let me know. Thanks.
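If you want to verify the fix before it reaches a PyPI release, installing Keras from the master branch and rerunning the reproduction should be enough (this is just the standard pip install-from-git syntax; the script name is the one from the original report):

pip install --upgrade git+https://github.com/keras-team/keras.git
python seq2seq.py --nbr-gpus 2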