Keras: Shape incompatibility when applying multi_gpu_model() to the reference VAE model

Created on 24 Jan 2018 · 9 Comments · Source: keras-team/keras

If you take the standard VAE model given in the examples (https://github.com/keras-team/keras/blob/master/examples/variational_autoencoder.py) and train it inside the multi_gpu_model example (https://github.com/keras-team/keras/blob/c8bef99ec7a2032b9bea6e9a1260d05a2b6a80f1/keras/utils/training_utils.py#L56-L93), the tensor sizes do not match.

The input data x is sliced by the number of GPUs, but the tensor z representing the latent variable is not.

The loss function in the custom loss layer computes the loss in two parts: one from the input data x, whose shape[0] is batch_size/nGPUs, and one from the latent representation z, whose shape[0] remains batch_size. When the loss function tries to add the two parts together, the shapes do not match.

Clearly this is not an ideal outcome when using the multi_gpu_model function with a standard reference model. See my minimal example here: https://gist.github.com/MatthewWilletts/7eef6a201413f936dff55378b4a14ecf

For a batch_size of 12 and with 3 GPUs the errors are of the form:

2018-01-24 20:54:38.466550: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [4] vs. [12]
         [[Node: replica_0/model_1/custom_variational_layer_1/add_1 = Add[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0/model_1/custom_variational_layer_1/mul, replica_0/model_1/custom_variational_layer_1/mul_1)]]

You can see a complete error message at:
https://gist.github.com/MatthewWilletts/0c4332d6f7092a7acfa5fff5a29e868c
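
The mismatch described above can be reproduced without Keras or a GPU. The following is only a toy sketch in plain Python: `replica_slice` and `add_per_sample` are hypothetical helpers written for this illustration, and the per-sample loss values are made-up stand-ins. It assumes the batch is cut into equal contiguous slices, one per GPU.

```python
def replica_slice(batch, i, parts):
    """Return the i-th of `parts` contiguous slices of a batch (a list of per-sample values)."""
    step = len(batch) // parts
    start = step * i
    stop = len(batch) if i == parts - 1 else start + step
    return batch[start:stop]


def add_per_sample(a, b):
    """Element-wise add that rejects mismatched lengths (no broadcasting)."""
    if len(a) != len(b):
        raise ValueError("Incompatible shapes: [%d] vs. [%d]" % (len(a), len(b)))
    return [p + q for p, q in zip(a, b)]


batch_size, n_gpus = 12, 3
xent_terms = [1.0] * batch_size  # stand-in reconstruction terms, one per sample of x
kl_terms = [0.5] * batch_size    # stand-in KL terms, one per sample of z

# Replica 0 only sees its slice of x, so its reconstruction term has 4 entries...
xent_on_replica = replica_slice(xent_terms, 0, n_gpus)

# ...but the KL term still comes from the unsliced z, so the sum cannot be formed.
try:
    add_per_sample(xent_on_replica, kl_terms)
    message = None
except ValueError as err:
    message = str(err)

print(message)  # Incompatible shapes: [4] vs. [12]
```

The printed message has the same [4] vs. [12] form as the TensorFlow warning above, which is what one would expect if only one of the two loss terms is computed from the sliced input.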

What changes need to be made, either to the function multi_gpu_model or to the reference implementation of the VAE, to remove this shape incompatibility?

Thank you!

MatthewWilletts

👍3

Most helpful comment

I am having a similar problem, but my error appears before I even start fitting the model, so I am unsure whether it is the same problem. Here is my model; if I run the code without multi_gpu_model, it works fine:

    # merge the outputs of the embeddings, and everything that belongs to the most recent activity executions
    main_output = concatenate(models, axis=2)
    main_output = LSTM(25*32, batch_input_shape=(1,), stateful=True)(main_output)
    # main_output = LSTM(25*32, batch_input_shape=(1,25*32), stateful=True)(main_output) 

    # after LSTM has learned on the sequence, bring in the SP2/PFS features, like in Shibatas paper
    main_output = concatenate([main_output, sequence_embedding])
    main_output = Dense(20*32, activation='relu', name='dense_join')(main_output)
    main_output = Dense(len(feature_dict["concept:name"]["to_int"]), activation='sigmoid', name='dense_final')(main_output)

    full_model = Model(inputs=model_inputs, outputs=[main_output])
    full_model = multi_gpu_model(full_model, gpus=ngpus)
    full_model.compile(loss='categorical_crossentropy', optimizer='adam')

And here is the super-long error trace that originates when I try multi_gpu_model (ngpus = 8). What do you think, is the model malformed or does this belong to the referenced issue?

InvalidArgumentError                      Traceback (most recent call last)
~/anaconda3/envs/thesis/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in set_shape(self, shape)
    524           dim_list,
--> 525           unknown_shape)
    526     except errors.InvalidArgumentError as e:

InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 0 and 1. Shapes are [0,800] and [1,800].

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-45-d22ed3259708> in <module>
     55 
     56 full_model = Model(inputs=model_inputs, outputs=[main_output])
---> 57 full_model = multi_gpu_model(full_model)
     58 full_model.compile(loss='categorical_crossentropy', optimizer='adam')

~/anaconda3/envs/thesis/lib/python3.6/site-packages/keras/utils/multi_gpu_utils.py in multi_gpu_model(model, gpus, cpu_merge, cpu_relocation)
    225                 # Apply model on slice
    226                 # (creating a model replica on the target device).
--> 227                 outputs = model(inputs)
    228                 outputs = to_list(outputs)
    229 

~/anaconda3/envs/thesis/lib/python3.6/site-packages/keras/engine/base_layer.py in __call__(self, inputs, **kwargs)
    455             # Actually call the layer,
    456             # collecting output(s), mask(s), and shape(s).
--> 457             output = self.call(inputs, **kwargs)
    458             output_mask = self.compute_mask(inputs, previous_mask)
    459 

~/anaconda3/envs/thesis/lib/python3.6/site-packages/keras/engine/network.py in call(self, inputs, mask)
    562             return self._output_tensor_cache[cache_key]
    563         else:
--> 564             output_tensors, _, _ = self.run_internal_graph(inputs, masks)
    565             return output_tensors
    566 

~/anaconda3/envs/thesis/lib/python3.6/site-packages/keras/engine/network.py in run_internal_graph(self, inputs, masks)
    719                                     kwargs['mask'] = computed_mask
    720                             output_tensors = to_list(
--> 721                                 layer.call(computed_tensor, **kwargs))
    722                             output_masks = layer.compute_mask(computed_tensor,
    723                                                               computed_mask)

~/anaconda3/envs/thesis/lib/python3.6/site-packages/keras/layers/recurrent.py in call(self, inputs, mask, training, initial_state)
   2192                                       mask=mask,
   2193                                       training=training,
-> 2194                                       initial_state=initial_state)
   2195 
   2196     @property

~/anaconda3/envs/thesis/lib/python3.6/site-packages/keras/layers/recurrent.py in call(self, inputs, mask, training, initial_state, constants)
    647                                              mask=mask,
    648                                              unroll=self.unroll,
--> 649                                              input_length=timesteps)
    650         if self.stateful:
    651             updates = []

~/anaconda3/envs/thesis/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in rnn(step_function, inputs, initial_states, go_backwards, mask, constants, unroll, input_length)
   3009             parallel_iterations=32,
   3010             swap_memory=True,
-> 3011             maximum_iterations=input_length)
   3012         last_time = final_outputs[0]
   3013         output_ta = final_outputs[1]

~/anaconda3/envs/thesis/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py in while_loop(cond, body, loop_vars, shape_invariants, parallel_iterations, back_prop, swap_memory, name, maximum_iterations, return_same_structure)
   3230       ops.add_to_collection(ops.GraphKeys.WHILE_CONTEXT, loop_context)
   3231     result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants,
-> 3232                                     return_same_structure)
   3233     if maximum_iterations is not None:
   3234       return result[1]

~/anaconda3/envs/thesis/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py in BuildLoop(self, pred, body, loop_vars, shape_invariants, return_same_structure)
   2950       with ops.get_default_graph()._mutation_lock():  # pylint: disable=protected-access
   2951         original_body_result, exit_vars = self._BuildLoop(
-> 2952             pred, body, original_loop_vars, loop_vars, shape_invariants)
   2953     finally:
   2954       self.Exit()

~/anaconda3/envs/thesis/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py in _BuildLoop(self, pred, body, original_loop_vars, loop_vars, shape_invariants)
   2885         flat_sequence=vars_for_body_with_tensor_arrays)
   2886     pre_summaries = ops.get_collection(ops.GraphKeys._SUMMARY_COLLECTION)  # pylint: disable=protected-access
-> 2887     body_result = body(*packed_vars_for_body)
   2888     post_summaries = ops.get_collection(ops.GraphKeys._SUMMARY_COLLECTION)  # pylint: disable=protected-access
   2889     if not nest.is_sequence(body_result):

~/anaconda3/envs/thesis/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py in <lambda>(i, lv)
   3199         cond = lambda i, lv: (  # pylint: disable=g-long-lambda
   3200             math_ops.logical_and(i < maximum_iterations, orig_cond(*lv)))
-> 3201         body = lambda i, lv: (i + 1, orig_body(*lv))
   3202 
   3203     if context.executing_eagerly():

~/anaconda3/envs/thesis/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in _step(time, output_ta_t, *states)
   2999                     uses_learning_phase = True
   3000                 for state, new_state in zip(states, new_states):
-> 3001                     new_state.set_shape(state.get_shape())
   3002                 output_ta_t = output_ta_t.write(time, output)
   3003                 return (time + 1, output_ta_t) + tuple(new_states)

~/anaconda3/envs/thesis/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in set_shape(self, shape)
    526     except errors.InvalidArgumentError as e:
    527       # Convert to ValueError for backwards compatibility.
--> 528       raise ValueError(str(e))
    529 
    530   @property

ValueError: Dimension 0 in both shapes must be equal, but are 0 and 1. Shapes are [0,800] and [1,800].

flxw on 26 Oct 2018

👍2
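
One possible reading of the trace above, not confirmed anywhere in this thread: the stateful LSTM fixes the batch size at 1, and a batch of 1 cannot be divided over 8 replicas. The sketch below assumes that multi_gpu_model cuts the batch with `step = batch_size // parts` and gives the remainder to the last replica; that rule is recalled from the Keras 2.x source and should be checked against the installed version. `slice_sizes` is a hypothetical helper, not a Keras function.

```python
def slice_sizes(batch_size, parts):
    """Per-replica batch sizes when a batch is cut into `parts` contiguous slices.

    Assumed rule: step = batch_size // parts, and the last slice takes the remainder.
    """
    step = batch_size // parts
    return [batch_size - step * i if i == parts - 1 else step
            for i in range(parts)]


print(slice_sizes(12, 3))  # [4, 4, 4]
print(slice_sizes(1, 8))   # [0, 0, 0, 0, 0, 0, 0, 1]
```

Under that assumption the first replica would receive a slice with batch dimension 0 while the LSTM state was built for batch dimension 1, which would be consistent with the reported shapes [0,800] and [1,800].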

All 9 comments

I'm hitting the same problem. Did you find a solution?

rawmean on 27 Feb 2018

Hi, yes I did. The problem is that the layers representing the latent variables are not sliced into per-GPU minibatches the way the inputs and outputs are. I think this is a design flaw in the current multi_gpu_model approach in Keras: it breaks custom loss layers that depend on intermediate layers for their calculation. The easy fix is to add these layers as outputs of the model, so that they are split by the same process.

That is, in the old version of the VAE with the custom loss layer (https://github.com/keras-team/keras/blob/ce4947cbaf380589a63def4cc6eb3e460c41254f/examples/variational_autoencoder.py), we replace:
    vae = Model(x, y)
with
    vae = Model(x, [y, z_mean, z_log_var])

Hope that helps!

MatthewWilletts on 27 Feb 2018

👍2
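
As a rough illustration of why exposing z_mean and z_log_var as outputs helps, here is a toy sketch in plain Python (no Keras). It assumes that each replica applies the whole model to its own slice and that every output is then concatenated along the batch axis. `toy_vae`, `replica_slice` and `run_on_replicas` are hypothetical stand-ins, not the real model or the Keras implementation.

```python
def replica_slice(batch, i, parts):
    """Return the i-th of `parts` contiguous slices of a batch (a list of samples)."""
    step = len(batch) // parts
    start = step * i
    stop = len(batch) if i == parts - 1 else start + step
    return batch[start:stop]


def toy_vae(x_slice):
    """Hypothetical stand-in for the model: returns [y, z_mean, z_log_var] for one slice."""
    z_mean = [0.1 * x for x in x_slice]
    z_log_var = [0.0 for _ in x_slice]
    y = [x for x in x_slice]  # pretend the reconstruction is perfect
    return [y, z_mean, z_log_var]


def run_on_replicas(x, parts):
    """Apply the toy model to each slice, then concatenate every output along the batch axis."""
    merged = [[], [], []]
    for i in range(parts):
        for out, piece in zip(merged, toy_vae(replica_slice(x, i, parts))):
            out.extend(piece)
    return merged


x = [float(n) for n in range(12)]
y, z_mean, z_log_var = run_on_replicas(x, parts=3)
print(len(y), len(z_mean), len(z_log_var))  # 12 12 12
```

Because all three outputs are produced from the same slice and merged the same way, any loss built from them sees one consistent batch size.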

@MatthewWilletts Thanks. I had a very similar problem in my model and solved it last night thanks to your hint about the loss dimension not being correct.

rawmean on 27 Feb 2018

I used to have the same problem. Trying to make it work was a huge pain for me. In the end I can recommend using https://github.com/uber/horovod - horovod was the thing that finally worked for me.

Ajk4 on 4 Aug 2018

Might this be related to #8397?

flxw on 29 Oct 2018

@flxw Hello, I have the same problem. Have you found a solution?

stillwaterman on 19 Nov 2018

@flxw Hello, I have the same problem. Have you found a solution?

Same problem. It works fine with 1 GPU and breaks with multiple GPUs.

douglas125 on 12 Jun 2019

@MatthewWilletts Hi, I'm hitting the same problem, but I am confused about the "latent variables": what does that mean? And how do I know which layers to add to the outputs? Thanks.