(EDIT: The following issue is only a minimal example of how to produce the error. My actual goal is to use a more complicated model instead of Dropout() here.)
When executing the following script a MissingInputError occurs:
from keras.models import Model
from keras.layers import Input, TimeDistributed, Dropout
in1 = Input(batch_shape=(10, 8, 6), name="in1")
out1 = TimeDistributed(Dropout(0.5))(in1)
model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()
This is the simplest model that produces the error (in my original architecture, I tried to distribute a more complex model). The same issue occurs when replacing the Dropout() layer with, e.g., GaussianNoise() or GRU(dropout_W=0.5), but not with, e.g., Dense(). I think the error boils down to the combination of TimeDistributed() and any layer (or model) that uses the learning phase.
Maybe there is a conceptual problem with TimeDistributed() and the learning phase input?
These issues seem to be somewhat related: #3834, #2609, #3686, #2391
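For comparison, a sketch of the variant that does work (Dense instead of Dropout, everything else unchanged):
from keras.models import Model
from keras.layers import Input, TimeDistributed, Dense
in1 = Input(batch_shape=(10, 8, 6), name="in1")
# Dense does not depend on the learning phase, so this compiles fine;
# swapping Dense(6) back to Dropout(0.5) reproduces the MissingInputError.
out1 = TimeDistributed(Dense(6))(in1)
model = Model(input=in1, output=out1)
model.compile("adam", "mse")
model._make_predict_function()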
The full stack trace is this:
...
File "/homes/sjebbara/git/keras-original/keras/engine/training.py", line 752, in _make_predict_function
**kwargs)
File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 787, in function
return Function(inputs, outputs, updates=updates, **kwargs)
File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 773, in __init__
**kwargs)
File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function.py", line 326, in function
output_keys=output_keys)
File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/pfunc.py", line 486, in pfunc
output_keys=output_keys)
File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 1776, in orig_function
output_keys=output_keys).create(
File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 1430, in __init__
accept_inplace)
File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/compile/function_module.py", line 176, in std_fgraph
update_mapping=update_mapping)
File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 180, in __init__
self.__import_r__(output, reason="init")
File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 351, in __import_r__
self.__import__(variable.owner, reason=reason)
File "/homes/sjebbara/.local/lib/python2.7/site-packages/Theano-0.9.0.dev3-py2.7.egg/theano/gof/fg.py", line 396, in __import__
variable=r)
theano.gof.fg.MissingInputError: An input of the graph, used to compute Shape(<TensorType(float32, matrix)>), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.
Backtrace when the variable is created:
File "/homes/sjebbara/PyCharmProjects/NeuralSentiment/src/Test2.py", line 5, in <module>
out1 = TimeDistributed(Dropout(0.5))(in1)
File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 514, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 572, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/homes/sjebbara/git/keras-original/keras/engine/topology.py", line 149, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/homes/sjebbara/git/keras-original/keras/layers/wrappers.py", line 131, in call
initial_states=[], input_length=input_length, unroll=unroll)
File "/homes/sjebbara/git/keras-original/keras/backend/theano_backend.py", line 947, in rnn
go_backwards=go_backwards)
I think you have incorrectly applied TimeDistributed to Dropout.
TimeDistributed(Dropout(0.5))(in1) should be TimeDistributed(Dropout(0.5)(in1))(in1)
Thanks for the reply.
I am quite sure that the TimeDistributed() layer expects a Layer object and not a tensor (which Dropout(0.5)(in1) would return).
Also, when changing
out1 = TimeDistributed(Dropout(0.5))(in1) to
out1 = TimeDistributed(Dense(10))(in1)
everything works fine.
This is a bug in theano when RandomStreams are present inside a scan op. See: https://groups.google.com/forum/#!topic/theano-users/8diyZjq6ngc
Solutions: either do not specify batch_size, or do not wrap Dropout in TimeDistributed; TimeDistributed(Dropout(0.5))(x) and Dropout(0.5)(x) are equivalent. If you are trying to drop the same set of nodes for all timesteps in a sequence, simply wrapping in TimeDistributed will not do the job. See my solution at #3995
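Concretely, a sketch of both options applied to the minimal example at the top of this issue (an illustration of the suggestions, not verbatim code from the thread; imports as in that first snippet):
# Option 1: leave the batch size unspecified so TimeDistributed can use its
# reshape-based implementation instead of going through K.rnn/scan.
in1 = Input(shape=(8, 6), name="in1")
out1 = TimeDistributed(Dropout(0.5))(in1)

# Option 2: skip the wrapper entirely; for Dropout the result is the same.
in1 = Input(batch_shape=(10, 8, 6), name="in1")
out1 = Dropout(0.5)(in1)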
Thanks for the pointer, @farizrahman4u .
The solutions that you suggested sadly do not apply to my real use case (which I simplified for this Issue). My actual goal is to have an inner model:
inner_in1 = Input(batch_shape=(batch_size, n_elements, element_size), name="inner_in1")
inner_output = GRU(2, dropout_U=0.5, dropout_W=0.5, return_sequences=False, name="gru")(inner_in1)
inner_model = Model(input=inner_in1, output=inner_output, name="inner_model")
that I use in a TimeDistributed() layer inside an outer model:
outer_in1 = Input(batch_shape=(batch_size, n_sequences, n_elements, element_size), name="outer_in1")
outer_output = TimeDistributedModel(inner_model, name="distr")(outer_in1)
outer_output = SomeOtherComputations()(outer_output)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")
outer_model._make_predict_function()
You could see this as a sentence model (inner_model) that I apply to each sentence in a document (outer_model). In this setup, the error appears when using dropout_W or dropout_U in the GRU.
Not specifying batch_size is not possible here, since these lines (the reshape-based implementation) in the TimeDistributed() layer wouldn't make much sense with an RNN.
@farizrahman4u as I reported in https://github.com/fchollet/keras/issues/4182
@sjebbara There is no reason for you to provide the batch_size unless you are using a stateful RNN. The rnn-based and reshape-based TimeDistributed implementations are strictly mathematically equivalent (the reshape-based implementation is just faster). If you still want to specify batch_size, here you go:
outer_in1 = Input(batch_shape=(batch_size, n_sequences, n_elements, element_size), name="outer_in1")
TimeDistributedModel = TimeDistributed(inner_model, name="distr")
TimeDistributedModel.build((None,) + outer_in1._keras_shape[1:])
TimeDistributedModel.build = lambda *_: None
outer_output = TimeDistributedModel(outer_in1)
outer_output = SomeOtherComputations()(outer_output)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")
outer_model._make_predict_function()
Similarly, @eyaler, to drop the exact number of nodes at every time step (when batch_size has to be provided because of a stateful RNN):
model = Sequential()
model.add(LSTM(10, batch_input_shape=(100, 20, 10), stateful=True, return_sequences=True))
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + model.output_shape[1:])
dropout.build = lambda *_: None
model.add(dropout)
model.add(...)
model.add(...)
thanks @farizrahman4u !
got it!
from keras.models import Sequential, Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np
x=np.zeros((100,20,10))
y=np.zeros((100,20,10))
model = Sequential()
model.add(LSTM(10, batch_input_shape=(100, 20, 10), stateful=True, return_sequences=True))
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + model.output_shape[1:])
dropout.build = lambda *_: None
model.add(dropout)
model.compile(optimizer='sgd', loss='mse')
model.fit(x,y,nb_epoch=1,batch_size=100)
input = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(input)
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + a._keras_shape[1:])
dropout.build = lambda *_: None
output = dropout(a)
fmodel = Model(input, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x,y,nb_epoch=1,batch_size=100)
I think I misunderstood the reshape-based implementation. I was just about to point out why reshaping makes no sense with a distributed RNN layer, but then the pieces fell together.
So the solution is simply to leave batch_size undefined?!
I will try that tomorrow.
Thanks all!
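For reference, a sketch of what leaving batch_size undefined would look like for the inner/outer models above (an illustration only; shape= instead of batch_shape= on both inputs, which should be fine since the GRU is not stateful):
# Inner model: no fixed batch size on its input.
inner_in1 = Input(shape=(n_elements, element_size), name="inner_in1")
inner_output = GRU(2, dropout_U=0.5, dropout_W=0.5, return_sequences=False, name="gru")(inner_in1)
inner_model = Model(input=inner_in1, output=inner_output, name="inner_model")

# Outer model: no fixed batch size either, so TimeDistributed picks its
# reshape-based implementation and avoids the Theano scan/RandomStreams bug.
outer_in1 = Input(shape=(n_sequences, n_elements, element_size), name="outer_in1")
outer_output = TimeDistributed(inner_model, name="distr")(outer_in1)
outer_model = Model(input=outer_in1, output=outer_output, name="outer_model")
outer_model.compile("adam", "mse")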
Hello!
Is there any update on this?
By the way, for me it works with the TensorFlow backend, but not with the Theano one...
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I was having a similar issue with Tensorflow. Whenever I used the TimeDistributed wrapper on a model containing layers that used the learning phase, the resulting tensor would have the property _uses_learning_phase = False. This meant that when I created a final model containing that tensor, the model's _uses_learning_phase would incorrectly be set to False.
In the case below, my intermediate_model had a Dropout layer; before passing it through the wrapper, intermediate_model.uses_learning_phase=True.
input_scan = Input(shape=(ANGLES,FINAL_WIDTH,FINAL_HEIGHT//2,CHANNELS))
#Time distributed model
sequenced_model = TimeDistributed(intermediate_model)(input_scan)
sequenced_model._uses_learning_phase = True #Manually setting the tensor's property fixed the issue.
out = GlobalAveragePooling1D()(sequenced_model)
#Complete model
model = Model(input_scan,out)
@eyaler
I can't get your functional example to work.
I tried with a Dense layer instead of an LSTM. I get an error saying that tensors don't have _keras_shape:
dropout.build((None,) + a._keras_shape[1:])
The other thing I tried was to have a Dense layer as input to a Dropout layer wrapped by a TimeDistributed layer:
input_1 = Input(batch_shape=(batch_size, seq_len, num_inputs))
x1 = Dense(32, activation='tanh')(input_1)
x1 = TimeDistributed(Dropout(0.5))(x1)
which ends with:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'time_distributed_1/keras_learning_phase' with dtype bool
[[Node: time_distributed_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Either way causes an exception.
What I want to do is sequence-to-sequence learning, and I'd like to do it with the functional API. That would be a TimeDistributed Dense layer on top of an LSTM, if I understood correctly, and that works. Having dropout would be the icing on the cake, though.
Like @farizrahman4u said, I'd like to drop the exact same number of nodes at every time step with a stateful RNN.
Can anybody provide a pointer on how to do this with the functional API? I can't figure out this build magic.
EDIT!:
I tried using
tuple(a.get_shape().as_list())[1:]
to make the snippet work.
from keras.models import Sequential, Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np
x=np.zeros((100,20,10))
y=np.zeros((100,20,10))
input = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(input)
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + tuple(a.get_shape().as_list())[1:])
dropout.build = lambda *_: None
output = dropout(a)
fmodel = Model(input, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x,y,nb_epoch=1,batch_size=100)
Again it terminates with an exception, this time in the training phase:
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'time_distributed_1/keras_learning_phase' with dtype bool
[[Node: time_distributed_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
EDIT!:
Thanks @brayan07, your workaround fixed the issue and it compiles. I don't know if the dropout is applied correctly, though.
sequenced_model._uses_learning_phase = True #Manually setting the tensor's property fixed the issue.
This was the key to solving this for me, too. The model wrapped in TimeDistributed was indeed not training without it.
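For completeness, a sketch combining the build trick from earlier in the thread with this workaround in the functional-API example (a combination of the snippets above, not independently verified):
from keras.models import Model
from keras.layers import LSTM, Dropout, TimeDistributed, Input
import numpy as np
x = np.zeros((100, 20, 10))
y = np.zeros((100, 20, 10))
inp = Input(batch_shape=(100, 20, 10))
a = LSTM(10, stateful=True, return_sequences=True)(inp)
# Build trick: build the wrapper with an unspecified batch dimension so the
# reshape-based implementation is used.
dropout = TimeDistributed(Dropout(0.5))
dropout.build((None,) + tuple(a.get_shape().as_list())[1:])
dropout.build = lambda *_: None
output = dropout(a)
# Workaround: the wrapper loses the learning-phase flag, so set it manually;
# otherwise dropout is never applied during training.
output._uses_learning_phase = True
fmodel = Model(inp, output)
fmodel.compile(optimizer='sgd', loss='mse')
fmodel.fit(x, y, nb_epoch=1, batch_size=100)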