Keras: Problem with TimeDistributed() and Learning Phase

Created on 25 Oct 2016 · 16 Comments · Source: keras-team/keras

(EDIT: The following issue is only a minimal example of how to produce the error. My actual goal is to use a more complicated model instead of Dropout() here.)

When executing the following script a MissingInputError occurs:

from keras.models import Model
from keras.layers import Input, TimeDistributed, Dropout

in1 = Input(batch_shape=(10, 8, 6), name="in1")
out1 = TimeDistributed(Dropout(0.5))(in1)

model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()

This is the simplest model that produces the error (in my original architecture, I tried to distribute a more complex model). The same issue occurs when replacing the Dropout() layer with, e.g., GaussianNoise() or GRU(dropout_W=0.5), but not with, e.g., Dense(). I think the error boils down to the combination of TimeDistributed() and any layer (or model) that uses the learning phase.

Maybe there is a conceptual problem with TimeDistributed() and the learning phase input?
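
For reference, here is the same script with the Dropout() swapped for a Dense() layer (which does not touch the learning phase); this variant compiles without the error here:

from keras.models import Model
from keras.layers import Input, TimeDistributed, Dense

in1 = Input(batch_shape=(10, 8, 6), name="in1")
out1 = TimeDistributed(Dense(10))(in1)  # no learning-phase dependency -> no error

model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()  # no MissingInputError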

These issues seem to be somewhat related: #3834, #2609, #3686, #2391

The full stack trace is this:

... 
  File "/homes/sjebbara/git/keras-original/keras/engine/training.py", line 752, in _make_predict_function
    **kwargs)
  File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 787, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 773, in __init__
    **kwargs)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function.py", line 326, in function
    output_keys=output_keys)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 1776, in orig_function
    output_keys=output_keys).create(
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 1430, in __init__
    accept_inplace)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 176, in std_fgraph
    update_mapping=update_mapping)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 180, in __init__
    self.__import_r__(output, reason="init")
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 351, in __import_r__
    self.__import__(variable.owner, reason=reason)
  File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 396, in __import__
    variable=r)
theano.gof.fg.MissingInputError: An input of the graph, used to compute Shape(<TensorType(float32, matrix)>), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.

Backtrace when the variable is created:
  File "/homes/sjebbara/PyCharmProjects/NeuralSentiment/src/Test2.py", line 5, in <module>
    out1 = TimeDistributed(Dropout(0.5))(in1)
  File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 514, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 572, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 149, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/homes/sjebbara/git/keras-original/keras/layers/wrappers.py", line 131, in call
    initial_states=[], input_length=input_length, unroll=unroll)
  File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 947, in rnn
    go_backwards=go_backwards)

Please make sure that the boxes below are checked before you submit your issue. Thank you!

  • [x] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
  • [x] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
  • [x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

All 16 comments

I think you have incorrectly applied TimeDistributed to Dropout.

TimeDistributed(Dropout(0.5))(in1) should be TimeDistributed(Dropout(0.5)(in1))(in1)

Thanks for the reply.
I am quite sure that the TimeDistributed() layer expects a Layer object and not a tensor (which Dropout(0.5)(in1) would return).
Also, when changing
out1 = TimeDistributed(Dropout(0.5))(in1)
to
out1 = TimeDistributed(Dense(10))(in1)
everything works fine.

This is a bug in Theano when RandomStreams are present inside a scan op. See: https://groups.google.com/forum/#!topic/theano-users/8diyZjq6ngc

Solutions:

  • Don't provide batch_size, or
  • Don't use TimeDistributed over Dropout. TimeDistributed(Dropout(0.5))(x) and Dropout(0.5)(x) are equivalent (a short sketch follows below).

If you are trying to drop the same set of nodes for all timesteps in a sequence, simply wrapping in TimeDistributed will not do the job. See my solution at #3995
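
To illustrate the second point with the shapes from the original example, a minimal sketch (my reading of the equivalence claim; applying Dropout directly to the 3D input avoids the scan op entirely, so there is no RandomStreams-in-scan problem):

from keras.models import Model
from keras.layers import Input, Dropout

in1 = Input(batch_shape=(10, 8, 6), name="in1")
out1 = Dropout(0.5)(in1)  # applied directly to the 3D tensor, no TimeDistributed wrapper

model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()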

Thanks for the pointer, @farizrahman4u .

The solutions that you suggested sadly do not apply to my real use case (which I simplified for this Issue). My actual goal is to have an inner model:

inner_in1 = Input(batch_shape=(batch_size, n_elements, element_size), name="inner_in1")
inner_output = GRU(2, dropout_U=0.5, dropout_W=0.5, return_sequences=False, name="gru")(inner_in1)
inner_model = Model(input=inner_in1, output=inner_output, name="inner_model")

that I use in a TimeDistributed() layer inside an outer model:

outer_in1 = Input(batch_shape=(batch_size, n_sequences, n_elements, element_size), name="outer_in1")
outer_output = TimeDistributed(inner_model, name="distr")(outer_in1)
outer_output = SomeOtherComputations()(outer_output)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")
outer_model._make_predict_function()

You could see this as a sentence model (inner_model) that I apply to each sentence in a document (outer_model). In this setup, the error appears when using dropout_W or dropout_U in the GRU.

Not specifying batch_size is not possible here, since these lines in the TimeDistributed() layer wouldn't make much sense with an RNN.

@farizrahman4u as i reported in https://github.com/fchollet/keras/issues/4182

  1. batch_size is required when using a stateful RNN
  2. What I want from TimeDistributed(Dropout) is not the same dropped nodes for every timestep, but having every timestep drop exactly x% of the nodes. Without TimeDistributed, you would get different fractions for different timesteps.

@sjebbara There is no reason for you to provide the batch_size unless you have a stateful RNN. The rnn-based and reshape-based TimeDistributed implementations are strictly mathematically equivalent (the reshape-based implementation being faster). If you still want to specify batch_size, here you go:

outer_in1 = Input(batch_shape=(batch_size, n_sequences, n_elements, element_size), name="outer_in1")
TimeDistributedModel = TimeDistributed(inner_model, name="distr")
TimeDistributedModel.build((None,) + outer_in1._keras_shape[1:])
TimeDistributedModel.build = lambda *_: None
outer_output = TimeDistributedModel(outer_in1)
outer_output = SomeOtherComputations()(outer_output)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")
outer_model._make_predict_function()

Similarly @eyaler,
to drop an exact number of nodes at every timestep (when batch_size has to be provided because of a stateful RNN):

model = Sequential()
model.add(LSTM(10, batch_input_shape=(100, 20, 10), stateful=True, return_sequences=True))

dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + model.output_shape[1:])
dropout.build = lambda *_: None

model.add(dropout)

model.add(...)
model.add(...)

thanks @farizrahman4u !

  1. If reshape is faster, why isn't it used also when batch_size is given?
  2. How would your solution look using the functional API? My attempt failed on assert_input_compatibility(x).

  1. If the batch size is given, then it is possible that the layer being wrapped is a stateful RNN (or any other layer which requires a static batch size). Since the reshape method messes with the batch dimension, we go for the rnn method instead (a rough sketch of the reshape path follows below).
  2. Maybe you forgot return_sequences=True.
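
Roughly, the reshape-based path can be pictured like this (a conceptual NumPy sketch, not the actual Keras implementation), which is also why it cannot coexist with a static batch size:

import numpy as np

def time_distributed_reshape(layer_fn, x):
    # x has shape (batch, timesteps, features)
    batch, timesteps, features = x.shape
    flat = x.reshape(batch * timesteps, features)   # merge batch and time axes
    out = layer_fn(flat)                            # wrapped layer never sees the time axis
    return out.reshape(batch, timesteps, -1)        # restore (batch, timesteps, ...)

# e.g. time_distributed_reshape(lambda t: t * 2.0, np.ones((4, 3, 5))).shape == (4, 3, 5)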

got it!

from keras.models import Sequential, Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np

x=np.zeros((100,20,10))
y=np.zeros((100,20,10))

model = Sequential()
model.add(LSTM(10, batch_input_shape=(100, 20, 10), stateful=True, return_sequences=True))
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + model.output_shape[1:])
dropout.build = lambda *_: None
model.add(dropout)
model.compile(optimizer='sgd', loss='mse')
model.fit(x,y,nb_epoch=1,batch_size=100)

input = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(input)
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + a._keras_shape[1:])
dropout.build = lambda *_: None
output = dropout(a)
fmodel = Model(input, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x,y,nb_epoch=1,batch_size=100)

I think I misunderstood the reshape-based implementation. I was just about to point out why reshaping makes no sense with a distributed RNN layer, but then the pieces fell together 😆.

So the solution is simply to leave batch_size undefined?!
I will try that tomorrow.
Thanks all!

Hello!

Is there any update on this?
By the way, for me it works with the TensorFlow backend, but not with the Theano one...

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

I was having a similar issue with TensorFlow. Whenever I used the TimeDistributed wrapper on a model containing layers that used the learning phase, the resulting tensor would have the property _uses_learning_phase = False. This meant that when I created a final model containing that tensor, the model's _uses_learning_phase would incorrectly be set to False.

In the case below, my intermediate_model had a Dropout layer; before passing it through the wrapper, intermediate_model.uses_learning_phase = True.

input_scan = Input(shape=(ANGLES,FINAL_WIDTH,FINAL_HEIGHT//2,CHANNELS))
#Time distributed model
sequenced_model = TimeDistributed(intermediate_model)(input_scan)

sequenced_model._uses_learning_phase = True #Manually setting the tensor's property fixed the issue.

out = GlobalAveragePooling1D()(sequenced_model)
#Complete model
model = Model(input_scan,out)
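
For anyone wanting to verify this on their own setup, a quick inspection using the names from the snippet above; the commented values are my reading of the behaviour described in this comment, checked at the corresponding stage of the snippet:

print(intermediate_model.uses_learning_phase)  # True: the inner model contains Dropout
print(sequenced_model._uses_learning_phase)    # False straight out of the wrapper (the bug);
                                               # True once the manual line above has run
print(model.uses_learning_phase)               # follows the output tensor's flag, so it is
                                               # only True because of the manual override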

@eyaler
I can't get your functional example to work.

I tried with a Dense layer instead of an LSTM. I get an error saying that tensors don't have a _keras_shape attribute, coming from:

dropout.build((None,) + a._keras_shape[1:])

The other thing I tried was to have a Dense layer as input to a Dropout layer wrapped in a TimeDistributed layer:

input_1 = Input(batch_shape=(batch_size, seq_len, num_inputs))

x1 = Dense(32, activation='tanh')(input_1)
x1 = TimeDistributed(Dropout(0.5))(x1)

which ends with:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'time_distributed_1/keras_learning_phase' with dtype bool
     [[Node: time_distributed_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Either way causes an exception.

What I want to do is sequence-to-sequence learning, and I'd like to do it with the functional API. If I understood correctly, that would be a TimeDistributed Dense layer on top of an LSTM, and that works. Having dropout would be the icing on the cake, though.

Like @farizrahman4u said, I'd like to drop the exact same number of nodes at every timestep with a stateful RNN.

Can anybody provide a pointer on how to do this with the functional API? I can't figure out this build magic.


EDIT!:

I tried using

tuple(a.get_shape().as_list())[1:]

to make the snippet work.

from keras.models import Sequential, Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np

x=np.zeros((100,20,10))
y=np.zeros((100,20,10))

input = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(input)
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + tuple(a.get_shape().as_list())[1:])
dropout.build = lambda *_: None
output = dropout(a)
fmodel = Model(input, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x,y,nb_epoch=1,batch_size=100)

Again it terminates with an exception, this time in the training phase:

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'time_distributed_1/keras_learning_phase' with dtype bool
     [[Node: time_distributed_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

EDIT!:

Thanks @brayan07, your workaround fixed the issue and it compiles. I don't know if the dropout is applied correctly, though.

sequenced_model._uses_learning_phase = True #Manually setting the tensor's property fixed the issue.


This was the key to solving this for me, too.
The model contained in the TimeDistributed was indeed not training without this.
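
For future readers, a consolidated sketch of the workaround as I understand it from this thread: eyaler's build trick plus brayan07's manual flag (Keras 1.x-era API, TensorFlow backend assumed; whether the dropout mask behaves exactly as intended is still unverified, as noted above):

from keras.models import Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np

x = np.zeros((100, 20, 10))
y = np.zeros((100, 20, 10))

inp = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(inp)

# Pre-build the wrapper with a dynamic batch dimension, then disable build()
# so the wrapper keeps the reshape-based code path despite the static batch_size.
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + tuple(a.get_shape().as_list())[1:])  # TF backend; on Theano use a._keras_shape
dropout.build = lambda *_: None

out = dropout(a)
out._uses_learning_phase = True  # manual fix from the comment above: keep the learning-phase flag

fmodel = Model(inp, out)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x, y, nb_epoch=1, batch_size=100)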
