I am writing a pipeline that fine-tunes the pre-trained models of Keras 1.2.0. To speed it up, instead of freezing the layers I try to: 1) run the data once through the bottom part of the network and store the intermediate outputs, and 2) build a second model that feeds those intermediate outputs directly into the remaining layers.
If you don't use data augmentation, this should yield a significant speed improvement. Unfortunately, step 2 fails with a "Graph disconnected" exception. I tried alternative ways to do this (such as the K.function() approach) but it still fails.
Below you will find a simple example that reproduces the problem and the error message:
import keras.applications
from keras.models import Model
from keras.layers import Input
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input
import numpy as np
# Read some random image
img = image.load_img('/path/to/image.jpg', target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
# Load a pre-trained model
model = keras.applications.resnet50.ResNet50(weights='imagenet', include_top=False, input_tensor=Input(shape=(224, 224, 3)))
# Feed the image and get the bn_conv1 output: WORKS!
bn_conv1_model = Model(input=model.input, output=model.get_layer('bn_conv1').output)
bn_conv1_output = bn_conv1_model.predict(x)
# Feed directly the bn_conv1 output to the remaining layers: FAILS!
avg_pool_model = Model(input=Input(model.get_layer('bn_conv1').output_shape[1:]), output=model.get_layer('avg_pool').output) # This line throws exception
avg_pool_output = avg_pool_model.predict(bn_conv1_output)
The error message is:
Traceback (most recent call last):
File "
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 1987, in __init__
str(layers_with_complete_input))
RuntimeError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, 224, 224, 3), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []
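For reference, the K.function() approach I mentioned was along these lines (a sketch; it fails with the same kind of error for me):
from keras import backend as K
# Attempt to map the intermediate tensor directly to the final output
partial = K.function([model.get_layer('bn_conv1').output],
                     [model.get_layer('avg_pool').output])
avg_pool_output = partial([bn_conv1_output])[0]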
I am also facing a similar problem and I believe it's something to do with the Model() function.
RuntimeError: Graph disconnected: cannot obtain value for tensor input_11 at layer "input_11".
You're making a new input layer, Input(model.get_layer('bn_conv1').output_shape[1:]), and using an old output, output=model.get_layer('avg_pool').output. These are not connected in any way, so of course this fails.
The model outputs are tensors based on the model inputs. You can't just use them for a different set of inputs. The simplest way to do what you want would be to build a new model using only the layers that you want.
m = Sequential()
for layer in model.layers[42:69]:
    m.add(layer)
If you want to be able to reuse the outputs from the model, you might be able to use clone and replace but that would be a long discussion you should have on the tensorflow forums.
Cheers
Graph disconnected normally means your input and output are not part of the same graph. If your input was not the variable you used to create your output, this is the error you will get.
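A minimal illustration of the pattern (toy layers, Keras 1.x functional API, just to show what triggers it):
from keras.layers import Input, Dense
from keras.models import Model

a = Input(shape=(10,))
b = Input(shape=(10,))
y = Dense(1)(a)                    # y was built from a
good = Model(input=a, output=y)    # works: the output depends on the input
bad = Model(input=b, output=y)     # Graph disconnected: y does not depend on b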
That was the first thing I tried, but the Sequential API will not work with complicated models such as ResNet50 (see the error message below).
I understand now that splitting a non-Sequential model is not an easy task, since a layer near the bottom of the graph can in theory take multiple inputs from layers higher up. As I am not yet familiar with the low-level API of Keras, I wrote a terrible solution that extracts the JSON architecture, manipulates it and reconstructs the partial model. This works and actually speeds up fine-tuning by a factor of 3, but it is far from an elegant and generic solution. I can share the snippet if you want.
bottom = keras.applications.resnet50.ResNet50(weights='imagenet', include_top=False, input_tensor=Input(shape=(224, 224, 3), name='input'))
m = Sequential()
for l in bottom.layers[0:141]:
    m.add(l)
Traceback (most recent call last):
File "
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 324, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 517, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 571, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 155, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 128, in call
self.add_updates([K.moving_average_update(self.running_mean, mean, self.momentum),
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 364, in moving_average_update
variable, value, momentum)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
update_delta = _zero_debias(variable, value, decay)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 177, in _zero_debias
trainable=False)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 632, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable bn_conv1_running_mean/biased already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 364, in moving_average_update
variable, value, momentum)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 128, in call
self.add_updates([K.moving_average_update(self.running_mean, mean, self.momentum),
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 155, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
Just to clarify my use-case (similar to issue 5083): I am trying to fine-tune a ResNet50 following the InceptionV3 snippet in the Keras documentation. The only problem with the original snippet is that it's very slow. Instead of freezing the various layers, one could split the model in two parts, run the data through the first part, persist the intermediate outputs on disk, and then quickly train/tune the second part.
Here are the steps that I am following:
Part 1: Training the network with custom classification layers:
This approach speeds up the iterations by a factor of 20 compared to freezing the layers.
Part 2: Fine-tuning the network:
This speeds up the iterations by a factor of 3 compared to freezing the layers.
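In rough pseudo-code, Part 1 boils down to something like this (bottom_model, top_model, X_train and y_train are placeholder names):
import numpy as np
# Run the data once through the bottom part and cache the features on disk
features = bottom_model.predict(X_train, batch_size=32)
np.save('bottleneck_features.npy', features)
# Then iterate quickly: train only the custom classification layers on the cache
top_model.fit(np.load('bottleneck_features.npy'), y_train, nb_epoch=10, batch_size=32)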
As I said earlier, to split the model I export the configuration, manipulate it and reconstruct part of the network. This is a very wasteful and poor solution. Do you think there is any low-level API that could help me achieve the same result?
Here is my "terrible" & non-generic solution for splitting a model:
def split_model(model, start, end):
    confs = model.get_config()
    weights = {l.name: l.get_weights() for l in model.layers}
    # split model
    kept_layers = set()
    for i, l in enumerate(confs['layers']):
        if i == 0:
            confs['layers'][0]['config']['batch_input_shape'] = model.layers[start].input_shape
        elif i < start or i > end:
            continue
        kept_layers.add(l['name'])
    # filter layers
    layers = [l for l in confs['layers'] if l['name'] in kept_layers]
    layers[1]['inbound_nodes'][0][0][0] = layers[0]['name']
    # set conf
    confs['layers'] = layers
    confs['input_layers'][0][0] = layers[0]['name']
    confs['output_layers'][0][0] = layers[-1]['name']
    # create new model
    newModel = Model.from_config(confs)
    for l in newModel.layers:
        l.set_weights(weights[l.name])
    return newModel
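For completeness, this is how I use it (the layer indices are just examples; pick the split point for your own network):
# Split at a hypothetical boundary, then run the two halves separately
bottom = split_model(model, 0, 140)
top = split_model(model, 141, len(model.layers) - 1)
features = bottom.predict(x)         # cache these on disk once
predictions = top.predict(features)  # iterate quickly on the top part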
What backend are you on? There is a possible clean solution but it is backend specific.
You could hypothetically manipulate the graph to get just the portion you need. Something like this will work, but you might need to add some attributes to make Model accept it.
middle_layer_output = model.layers[...].output
output = model.layers[...].output
swapped_input = Input(...)
output_swapped = theano.clone(output, {middle_layer_output:swapped_input})
Something like this is a really handy trick, but no one has put cloning into K.backend yet.
Unfortunately I use Tensorflow. I will have a look at the Theano implementation to see how it works.
Should I leave the ticket open? I guess the question turned into a feature request.
You should be able to do something similar in Tensorflow using graph_replace.
https://www.tensorflow.org/api_docs/python/contrib.graph_editor/module_transform#graph_replace
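I'd expect the TF version to look roughly like this (untested sketch, reusing the tensors from the example above):
from tensorflow.contrib import graph_editor as ge

middle = model.get_layer('bn_conv1').output
final = model.get_layer('avg_pool').output
swapped_input = Input(model.get_layer('bn_conv1').output_shape[1:])
# Copy the subgraph between middle and final, rewired onto the new input
final_swapped = ge.graph_replace(final, {middle: swapped_input})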
Please let me know if you have any success. If you can get something short like that to work, I think adding theano.clone and tensorflow.contrib.graph_editor.transform.graph_replace to keras backend would be an awesome feature.
Cheers
Sounds good. Leave it open; I'll familiarize myself with the low-level APIs of Keras & Tensorflow, evaluate the solution offered by Theano, and have a go at writing a clean solution. If I succeed, I'll send a pull request. :)
@datumbox I like your "terrible" model splitter. I've tried it on the InceptionV3 model (for the same reason: slowwwwww...). It seems to extract the bottom section 0-172 OK, but subsequently extracting the top section 173-216 fails. Not sure if it was intended to work for InceptionV3 or for the top section? If this works, it would be rather straightforward to stack the models using the functional API, which would allow training the sections separately. Note, I am not sure about the exact split; the referenced Keras applications example suggests a split at 0-172 (where 172 is not included), but I would assume that the merge block (mixed8) at 172 should be included in the bottom part to create a logical model termination. Anyway, I think your limits are inclusive.
If you can make your splitter work for both top and bottom, I can try stacking them. Jan
Yes I agree, you should definitely include it. If we visualize the model we will see that it makes absolutely no sense not to include the merge. :)
My plan is to complete a less terrible solution which is more generic and allows you to split at specific nodes. I'll post it here once I have it, or send a pull request.
Hi @datumbox, I'm fighting the same problem and I'm trying to use your "terrible" solution, which actually seems to be the most viable one out there. I'm using Keras 2.0 and the TensorFlow backend. Did you end up developing an updated version of the split-model function?
Hi @engharat
I have not yet written a proper solution for that. The network graphs can be very complex, especially for networks that branch out and merge a lot, and handling this requires writing graph-traversal algorithms. In the future I'll probably do this and contribute it back to Keras, but I have not done it yet.
Below I send you the latest version of the "terrible" solution that I'm using. Rest assured it is just as terrible as the previous one:
import random

def split(model, start, end):
    confs = model.get_config()
    kept_layers = set()
    for i, l in enumerate(confs['layers']):
        if i == 0:
            confs['layers'][0]['config']['batch_input_shape'] = model.layers[start].input_shape
            if i != start:
                confs['layers'][0]['name'] += str(random.randint(0, 100000000))  # rename the input layer to avoid conflicts on merge
                confs['layers'][0]['config']['name'] = confs['layers'][0]['name']
        elif i < start or i > end:
            continue
        kept_layers.add(l['name'])
    # filter layers
    layers = [l for l in confs['layers'] if l['name'] in kept_layers]
    layers[1]['inbound_nodes'][0][0][0] = layers[0]['name']
    # set conf
    confs['layers'] = layers
    confs['input_layers'][0][0] = layers[0]['name']
    confs['output_layers'][0][0] = layers[-1]['name']
    # create new model
    submodel = Model.from_config(confs)
    for l in submodel.layers:
        orig_l = model.get_layer(l.name)
        if orig_l is not None:
            l.set_weights(orig_l.get_weights())
    return submodel
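If you want to sanity-check a split, you can stack the two halves back together with the functional API; something like this (the indices are only an example):
bottom = split(model, 0, 140)
top = split(model, 141, len(model.layers) - 1)
# The stacked model should behave like the original end-to-end model
stacked = Model(bottom.input, top(bottom.output))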
Hello everyone,
I am trying to do a similar thing. I saved the intermediate output to disk and now want to use it to train and score the remaining model. Was there any addition to the Keras API for this issue? @datumbox Were you able to find another solution?
I have a similar problem
#----------------------------------------------
# Encoder
#----------------------------------------------
img_input = Input((input_height, input_width, 1))
x = Lambda(lambda x: K.repeat_elements(x, 3, axis=3))(img_input)
adapter = Model(img_input, x)
vgg = VGG16(include_top=False, weights='imagenet')
vgg_plus_input = Model(outputs=vgg(adapter.output), inputs=adapter.input)
block_of_interest = 'block4_pool'
#----------------------------------------------
# Decoder
#----------------------------------------------
o = vgg.get_layer(block_of_interest).output
o = ZeroPadding2D((1, 1), data_format='channels_last')(o)
o = Conv2D(512, (3, 3), padding='valid', data_format='channels_last')(o)
o = BatchNormalization()(o)
o = UpSampling2D((2, 2), data_format='channels_last')(o)
o = ZeroPadding2D((1, 1), data_format='channels_last')(o)
o = Conv2D(256, (3, 3), padding='valid', data_format='channels_last')(o)
o = BatchNormalization()(o)
o = UpSampling2D((2, 2), data_format='channels_last')(o)
o = ZeroPadding2D((1, 1), data_format='channels_last')(o)
o = Conv2D(128, (3, 3), padding='valid', data_format='channels_last')(o)
o = BatchNormalization()(o)
o = UpSampling2D((2, 2), data_format='channels_last')(o)
o = ZeroPadding2D((1, 1), data_format='channels_last')(o)
o = Conv2D(64, (3, 3), padding='valid', data_format='channels_last')(o)
o = BatchNormalization()(o)
o = Conv2D(n_classes, (3, 3), padding='same', data_format='channels_last')(o)
o_shape = Model(img_input, o).output_shape
outputHeight = o_shape[2]
outputWidth = o_shape[3]
o = Reshape((-1, outputHeight * outputWidth))(o)
o = Permute((2, 1))(o)
o = Activation('softmax')(o)
model = Model(vgg_plus_input.input, o)
model.outputWidth = outputWidth
model.outputHeight = outputHeight
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer_name,
              metrics=['accuracy', mean_iou])
and the error I get
File "/home/hey/work/kaggleSegmentation/models.py", line 87, in VGGSegnet
o_shape = Model(img_input , o ).output_shape
File "/home/hey/Programs/anaconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/hey/Programs/anaconda3/lib/python3.6/site-packages/keras/engine/network.py", line 93, in __init__
self._init_graph_network(*args, **kwargs)
File "/home/hey/Programs/anaconda3/lib/python3.6/site-packages/keras/engine/network.py", line 237, in _init_graph_network
self.inputs, self.outputs)
File "/home/hey/Programs/anaconda3/lib/python3.6/site-packages/keras/engine/network.py", line 1430, in _map_graph_network
str(layers_with_complete_input))
ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_3:0", shape=(?, ?, ?, 3), dtype=float32) at layer "input_3". The following previous layers were accessed without issue: []
I fail to see how to connect the networks. I would like to keep the full VGG so I can add some U-Net-style skip connections; that's why I don't want to pop layers.
Any idea @datumbox , @bstriner, @joelthchao ??
@datumbox's solution really helps my work, thank you very much. It actually seems that the "terrible" solution is the only working solution available online for this situation.
I think an important feature like model splitting should be provided by the official Keras API; maybe someday it will be.
Regarding the split function above: I'm getting an assertion error in Model.from_config(confs). Any chance of an explanation so I can try and work it out?
I'm having the same issue, any updates on this? :)
Trying the Sequential approach suggested earlier, I get: ValueError: A merge layer should be called on a list of inputs
@datumbox any sense of how to deal with shared layers? orig_l = model.get_layer(l.name) fails for me because a shared layer's name in the original model isn't the same as its name in the submodel: in the original model the name will be shared_layer, and in the submodel it will be shared_layer_1.
Graph disconnected normally means your input and output are not part of the same graph. If your input was not the variable you used to create your output, this is the error you will get.
This is exactly my case. How can I solve this issue when the input is not my variable (my original data)? Thank you.