Please make sure that the boxes below are checked before you submit your issue. If your issue is an implementation question, please ask your question on StackOverflow or join the Keras Slack channel and ask there instead of filing a GitHub issue.
Thank you!
[x] Check that you are up-to-date with the master branch of Keras. You can update with:
pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps
[x] If running on TensorFlow, check that you are up-to-date with the latest version. The installation instructions can be found here.
[ ] If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
[x] Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).
Below is a failing test case - the error seems to be triggered by connecting the output of model1 to the initial_state of the LSTM in model 2 (see "this line works" and "this line doesn't").
from keras.models import Model
from keras.layers import Input, LSTM, Dense
from keras.layers import Dropout, BatchNormalization, Concatenate
import numpy as np
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1' # force CPU (don't conflict with GPU training)
num_vocab_tokens = 6
latent_dim = 4
input1 = Input(shape=( None, num_vocab_tokens,))
h = Dense(latent_dim)(input1)
c = Dense(latent_dim)(input1)
model1 = Model(input1, [h, c])
input2a = Input(shape=( latent_dim,))
input2b = Input(shape=( latent_dim,))
input2c = Input(shape=( None, num_vocab_tokens,))
output2 = LSTM(latent_dim, return_sequences=True)(input2c) # this line works
output2 = LSTM(latent_dim, return_sequences=True)(input2c, initial_state= [input2a, input2b]) # this line doesn't
output2 = Dense(num_vocab_tokens)(output2)
model2 = Model( [input2a, input2b, input2c], output2)
input3a = Input(shape=( None, num_vocab_tokens)) # input to ENCODER
input3b = Input(shape=( None, num_vocab_tokens)) # teacher input to DECODER
m1_output = model1(input3a)
output3 = model2( [m1_output[0], m1_output[1], input3b] )
model3 = Model( [input3a, input3b], output3)
data = np.random.rand( 2, 2, num_vocab_tokens )
model3.predict( [data, data] )
I should have added: while I think issuing this error for the above test case is a bug, I am really looking for a workaround to the problem. If anyone has any ideas, I'd love to hear them. Thanks.
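For readers less familiar with the initial_state argument: it supplies the (h, c) pair that seeds the LSTM recurrence, which is exactly what model1's two Dense outputs are meant to provide. A single LSTM step in the standard formulation, sketched in plain numpy with hypothetical random weights (gate order i, f, g, o, as in Keras):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b, units):
    """One LSTM step; W: (features, 4*units), U: (units, 4*units)."""
    z = x @ W + h_prev @ U + b          # all four gates computed at once
    i = sigmoid(z[:, :units])           # input gate
    f = sigmoid(z[:, units:2*units])    # forget gate
    g = np.tanh(z[:, 2*units:3*units])  # candidate cell state
    o = sigmoid(z[:, 3*units:])         # output gate
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
units, features, batch = 4, 6, 2
W = rng.standard_normal((features, 4 * units))
U = rng.standard_normal((units, 4 * units))
b = np.zeros(4 * units)

x = rng.standard_normal((batch, features))
h0 = np.zeros((batch, units))  # initial_state replaces these zero states
c0 = np.zeros((batch, units))
h1, c1 = lstm_step(x, h0, c0, W, U, b, units)
print(h1.shape, c1.shape)  # (2, 4) (2, 4)
```

Passing initial_state=[input2a, input2b] simply substitutes the encoder-derived (h0, c0) for the default zero states above.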
Hi @rfernand2, I reported a very similar issue in #9084. The bug relates to statefulness; the same issue occurs when you set stateful=True when instantiating an RNN layer. You can work around it, and I gave my solution there. It is not ideal, but let me know if it works out for you.
@fchollet It is really a shame that the (awesome) functional API breaks for stateful models. Unfortunately I haven't been able to find the issue, so giving this issue some love will be highly appreciated!
The Input in the functional API always creates a placeholder at https://github.com/keras-team/keras/blob/416783156c1b07f28131c493a55a93936b5fe163/keras/engine/topology.py#L1363. As far as Keras is concerned everything clicks together; but when you chain models together, you are expecting TensorFlow to fill one placeholder with another automatically. In reality, you are feeding values to placeholders only on the outermost model, and the inner model's placeholders are starved, giving the error.
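The starvation described above can be illustrated with a toy evaluator. This is a pure-Python analogy of TF1-style feed dicts, not real Keras or TensorFlow code; all names here are hypothetical:

```python
# Toy analogy of TF1-style placeholder feeding; not real TensorFlow code.
class Placeholder:
    def __init__(self, name):
        self.name = name

def evaluate(node, feed):
    """Resolve a graph node against a feed dict, mimicking session.run."""
    if isinstance(node, Placeholder):
        if node not in feed:
            raise ValueError(
                "You must feed a value for placeholder tensor %s" % node.name)
        return feed[node]
    op, args = node  # a non-placeholder node is (function, child_nodes)
    return op(*[evaluate(a, feed) for a in args])

# The inner "model" has its own placeholder input ...
inner_x = Placeholder("inner_x")
inner_graph = (lambda v: v * 2, [inner_x])

# ... while the outer caller defines a separate placeholder and feeds
# only that one; nothing forwards the value to inner_x.
outer_x = Placeholder("outer_x")

evaluate(inner_graph, {inner_x: 3})      # works: inner_x is fed directly
try:
    evaluate(inner_graph, {outer_x: 3})  # starved: inner_x has no value
except ValueError as e:
    print(e)  # You must feed a value for placeholder tensor inner_x
```

The real graph connection is more subtle (as the discussion below shows, it only breaks for stateful submodels or initial_state wiring), but the error message has the same shape.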
@nuric, but how do you explain that the error only occurs for stateful submodels or submodels where you set initial state on its layers?
Hmm, you are right, it isn't just the placeholder connections. I'm getting the same error if I simply ask for K.shape(input) in the inner model. It could be something about the None in the input shapes that is causing it, or because the input layers are passed into the initial states. We should try the above example with fixed lengths in the model and see if the error still occurs.
@nuric, I have a working gist that shows this issue where the number of samples, the sequence length, and the feature lengths are all fixed:
https://gist.github.com/visionscaper/06a75e9066a368fc2ed01cf0c3f606da
Also see #9084 for more info.
@visionscaper From my debugging, there is a transpose operation being done on the initial_state tensor passed in. In this case it is a Placeholder that doesn't have a value set, hence it blows up. I'm getting the feeling that passing a Placeholder into initial_state seems to use the value before it is actually set; this is the "model_1/gru/transpose3:0" operation in my case.
@nuric, hmm, maybe, but in that case it is still a bug. Just like in the gist I shared, when you do not wrap the decoder layer(s) in a submodel but embed them directly in the overall model, there is no issue. It should not matter whether the input tensor is a placeholder tensor or a tensor representing some operations, as long as the placeholder gets filled, which seems to be the case from the last line.
@rfernand2 If you are still interested, @fchollet wrote a nice blog post about sequence-to-sequence learning with teacher forcing, just as you are trying to do. In his approach the teacher-forcing input is not connected to the initial state but to the input of the decoder.
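To make the teacher-forcing setup concrete: the decoder is fed the ground-truth target sequence shifted right by one step, so at step t it sees token t-1. A plain numpy sketch with hypothetical token ids (0 standing in for a start token), not tied to any particular Keras version:

```python
import numpy as np

# target: token ids for one batch (hypothetical data); 0 is the start token
target = np.array([[5, 2, 7, 1],
                   [3, 3, 9, 1]])

# decoder input = target shifted right, with the start token prepended
decoder_input = np.concatenate(
    [np.zeros((target.shape[0], 1), dtype=target.dtype), target[:, :-1]],
    axis=1,
)
print(decoder_input)
# [[0 5 2 7]
#  [0 3 3 9]]
```

During training the decoder consumes decoder_input and is asked to predict target, which is the "teacher input to DECODER" role of input3b in the repro script above.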
I hit this same issue. The workaround presented by @visionscaper is quite complex, and unfortunately I could not think of a simpler one. Any ideas on what a fix for the actual bug would look like? The repro gist is very clear, but the actual problem seems to be buried quite deep somewhere.
Happy to provide more repro instructions, but it seems there are plenty already.
Has anyone found a solution to this, or any workaround?
I hit this problem when I combined two RNN models (one's output serving as the other's initial state) and also gave the RNN an initial state.
I also have a similar issue (https://github.com/keras-team/keras/issues/10074) my current workaround (which isn't very convenient) is to build a 'flat' model.
When using the functional API, most models shouldn't be hard to define as a single model.
I also ran into the same issue: if the sub-model is stateful, it crashes when evaluating.
I have this issue as well in my stacked discriminator/generator network. As soon as I try to use a stateful model, it crashes during prediction.
Has anyone found a solution for this issue?
My workaround is to use TensorFlow directly for the stateful LSTM. If someone needs help with a pure TensorFlow implementation of stateful LSTMs, just send me a message :-)
This problem occurs when the component model inside the combined model does not receive the value of the input tensor. I solved it by passing the proper index to the get_output_at(index) function; the concept of node_index is explained at https://www.rdocumentation.org/packages/keras/versions/2.2.4/topics/get_input_at . I had given the wrong index number in my case; using the proper index solved my problem.
I was getting the same error when instantiating my multi-model API. I realized that the issue actually showed up when trying to feed a placeholder defined as an Input layer with another Input layer. The issue was resolved by literally using the same Input placeholders across models, instead of defining multiple of them and then connecting them together.
@hosse049 Your suggestion worked. Thank you!
input_tensor = pretrained_model.input
output_from_pretrained_model = pretrained_model(input_tensor)
# my model code follows, using input_tensor and output_from_pretrained_model
If you want to flatten the pretrained_model, use
output_from_pretrained_model = pretrained_model.output
I am facing the same problem, but in a slightly different setting. During train_on_batch of a dual-loss discriminator of a GAN, I am trying to apply different weights to the losses depending on the input (coming either from the generator or from real training data). I am following @fchollet's approach in #10358 in order to achieve dynamic loss weights during training.
The way I construct the models is as follows:
generator = Sequential(...)
discriminator = Model([inputs, loss_weights, target1, target2], [output1, output2])
loss = ...
discriminator.add_loss(loss)
discriminator.compile(...)
# Combined
discriminator.trainable = False
generator_input = Input(...)
gan_loss_weights = Input(shape=(1,))
gan_target1 = Input(shape=(1,))
gan_target2 = Input(shape=(1,))
discriminator_input = generator(generator_input)
out1, out2 = discriminator([discriminator_input, gan_loss_weights, gan_target1, gan_target2])
gan = Model([generator_input, gan_loss_weights, gan_target1, gan_target2], [out1, out2])
loss = ...
gan.add_loss(loss)
gan.compile(...)
During training, calling train_on_batch on discriminator with dynamic loss weights works correctly. But when it is called on gan, I get the InvalidArgumentError "You must feed a value for placeholder tensor" for discriminator_input.
I tried removing the explicit Input layers and using model.input and model.output when calling generator() and discriminator(), as suggested by @hosse049, but it did not work.
The interesting thing is that if I do not apply dynamic loss weighting, this approach for GAN works properly.
I have the same problem as mentioned above. During train_on_batch on a GAN I get the error: InvalidArgumentError: You must feed a value for placeholder tensor 'ocr_input_img_1' with dtype float and shape [?,128,512,1] [[{{node ocr_input_img_1}}]]
...
img = self.generator([noise, word_embedded])
ctc = self.text_recognition([img, word_sequence, word_length])
valid = self.discriminator(img)
combined = Model([noise, word_sequence, word_length], [ctc, valid])
...
When I use only valid as the output, the code works.
@thomasemmerich in my case I ended up solving the dynamic loss weighting by creating a custom loss function that takes the weight (a K.placeholder()) as a parameter, and I update the placeholder dynamically.
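The per-batch weighted loss described above boils down to combining the two losses with a weight supplied at call time. A plain numpy sketch of the arithmetic (function and variable names are hypothetical; in the actual workaround the weight is a K.placeholder fed each batch):

```python
import numpy as np

def weighted_total_loss(loss1, loss2, w):
    """Combine two per-sample losses with a per-batch weight w in [0, 1]."""
    return w * np.mean(loss1) + (1.0 - w) * np.mean(loss2)

loss_real = np.array([0.2, 0.4])   # e.g. loss on real samples
loss_fake = np.array([0.8, 0.6])   # e.g. loss on generated samples

# w plays the role of the placeholder whose value changes per batch.
print(weighted_total_loss(loss_real, loss_fake, 0.5))  # 0.5
print(weighted_total_loss(loss_real, loss_fake, 1.0))  # 0.3 (only loss_real)
```

Because the weight enters the loss as an ordinary tensor input, it can be updated between batches without recompiling the model.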
@edervishaj thank you for your answer. But I need the ctc in the custom loss, and the problem is that the error appears when I try to get the ctc.
I solved this problem by using tensorflow 1.13.1 and keras 2.3.1; I had hit the problem when using tf 1.14.
I tried tensorflow 1.13.1 and keras 2.3.1, but it doesn't solve the problem in my case.
The network structure I implemented is a U-Net with 3 decoders & encoders. It does not work when I use conv functions from keras.layers; the layer function I used is slim.conv2d. More details about versions: Windows 10 + 2080 Ti + cuDNN 7.4.2.24 + CUDA 10.0.
Another interesting thing is that the code works when it is based only on keras.