Keras: Feeding input to intermediate layer fails with Graph disconnected Exception

Created on 18 Jan 2017  ·  22 Comments  ·  Source: keras-team/keras

I am writing a pipeline that fine-tunes the pre-trained models of Keras 1.2.0. To speed it up, instead of freezing the layers I try to:

  1. Feed the training images once to the "frozen" part of the network and store the intermediate output to a file.
  2. Train the remaining network iteratively by feeding it the intermediate output directly from the file.

If you don't use data augmentation, this should yield a significant speed improvement. Unfortunately, step 2 fails with a "Graph disconnected" exception. I tried alternative ways to do this (such as using the K.function() approach) but it still fails.

Below you will find a simple example that reproduces the problem and the error message:

import keras.applications
from keras.models import Model
from keras.layers import Input
from keras.preprocessing import image
from keras.applications.imagenet_utils import preprocess_input
import numpy as np

# Read some random image
img = image.load_img('/path/to/image.jpg', target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Load a pre-trained model
model = keras.applications.resnet50.ResNet50(weights='imagenet', include_top=False, input_tensor=Input(shape=(224, 224, 3)))

# Feed the image and get the bn_conv1 output: WORKS!
bn_conv1_model = Model(input=model.input, output=model.get_layer('bn_conv1').output)
bn_conv1_output = bn_conv1_model.predict(x)

# Feed directly the bn_conv1 output to the remaining layers: FAILS!
avg_pool_model = Model(input=Input(model.get_layer('bn_conv1').output_shape[1:]), output=model.get_layer('avg_pool').output) # This line throws exception
avg_pool_output = avg_pool_model.predict(bn_conv1_output)

The error message is:
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 1987, in __init__
str(layers_with_complete_input))
RuntimeError: Graph disconnected: cannot obtain value for tensor Tensor("input_1:0", shape=(?, 224, 224, 3), dtype=float32) at layer "input_1". The following previous layers were accessed without issue: []

Most helpful comment

Graph disconnected normally means your input and output are not part of the same graph. If your input was not the variable you used to create your output, this is the error you will get.

All 22 comments

I am also facing a similar problem, and I believe it has something to do with the Model() function.

RuntimeError: Graph disconnected: cannot obtain value for tensor input_11 at layer "input_11". 

You're making a new input layer, Input(model.get_layer('bn_conv1').output_shape[1:]), and using an old output, output=model.get_layer('avg_pool').output. These are not connected in any way, so of course this fails.

The model outputs are tensors based on the model inputs. You can't just use them for a different set of inputs. The simplest way to do what you want would be to build a new model using only the layers that you want.

m = Sequential()
for layer in model.layers[42:69]:
  m.add(layer)

If you want to be able to reuse the outputs from the model, you might be able to use clone and replace but that would be a long discussion you should have on the tensorflow forums.

Cheers

Graph disconnected normally means your input and output are not part of the same graph. If your input was not the variable you used to create your output, this is the error you will get.
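
To make the point above concrete, here is a toy connectivity check in plain Python. It is only a sketch of the idea, not Keras's actual implementation: the `producers` dictionary and the `check_connected` helper are made up for illustration, and the layer names are borrowed from the question with the real ResNet50 wiring heavily simplified.

```python
def check_connected(producers, inputs, outputs):
    """Walk back from each output; every tensor must trace to a declared input.

    `producers` maps a tensor name to the tensor names it was computed from.
    Names that are missing from `producers` are placeholders (input tensors).
    """
    declared = set(inputs)
    seen = set()
    stack = list(outputs)
    while stack:
        tensor = stack.pop()
        if tensor in declared or tensor in seen:
            continue
        seen.add(tensor)
        parents = producers.get(tensor, [])
        if not parents:
            # Reached a placeholder that was not declared as a model input.
            raise RuntimeError(
                "Graph disconnected: cannot obtain value for tensor %r" % tensor)
        stack.extend(parents)
    return True


# Hypothetical, heavily simplified wiring: avg_pool was built from input_1.
producers = {
    "bn_conv1": ["input_1"],
    "avg_pool": ["bn_conv1"],
}

# Same input that was used to build the output: fine.
ok = check_connected(producers, inputs=["input_1"], outputs=["avg_pool"])

# A brand-new input that the output was never built from: disconnected.
try:
    check_connected(producers, inputs=["input_2"], outputs=["avg_pool"])
    error = None
except RuntimeError as exc:
    error = str(exc)
```

In the second call the walk back from avg_pool ends at input_1, which is not among the declared inputs, mirroring the tensor named in the original error message.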

That was the first thing I tried but the Sequential API will not work with complicated models such as ResNet50 (see below the error message).

I understand now that splitting a non-Sequential model is not an easy task, since a layer at the bottom can in theory take multiple inputs from higher layers. As I am not yet familiar with the low-level API of Keras, I wrote a terrible solution that extracts the JSON architecture, manipulates it and reconstructs the partial model. This works and actually speeds up model tuning by a factor of 3, but it is far from an elegant and generic solution. I can share the snippet if you want.

bottom = keras.applications.resnet50.ResNet50(weights='imagenet', include_top=False, input_tensor=Input(shape=(224, 224, 3), name='input'))
m = Sequential()
for l in bottom.layers[0:141]:
    m.add(l)

Traceback (most recent call last):
File "", line 2, in
File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 324, in add
output_tensor = layer(self.outputs[0])
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 517, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 571, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 155, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 128, in call
self.add_updates([K.moving_average_update(self.running_mean, mean, self.momentum),
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 364, in moving_average_update
variable, value, momentum)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 70, in assign_moving_average
update_delta = _zero_debias(variable, value, decay)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/moving_averages.py", line 177, in _zero_debias
trainable=False)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 632, in _get_single_variable
name, "".join(traceback.format_list(tb))))
ValueError: Variable bn_conv1_running_mean/biased already exists, disallowed. Did you mean to set reuse=True in VarScope? Originally defined at:

File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 364, in moving_average_update
variable, value, momentum)
File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 128, in call
self.add_updates([K.moving_average_update(self.running_mean, mean, self.momentum),
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 155, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))

Just to clarify my use case (similar to issue 5083): I am trying to fine-tune a ResNet50 as in the InceptionV3 snippet of the Keras documentation. The only problem with the original snippet is that it's very slow. Instead of freezing the various layers, one could split the model in two parts, run the data through the first part, persist the intermediate outputs on disk, and then quickly train/tune the second part.

Here are the steps that I am following:

Part 1: Training the network with custom classification layers:

  • Fetch the bottom network of ResNet50 and build a Sequential top model for my own classes.
  • Feed the training data to the bottom network and persist their intermediate output in a file.
  • Feed directly the cached intermediate results to the top network and run lots of very quick iterations.
  • Join the bottom and top networks into a single model.

This approach speeds up the iterations by a factor of 20, compared to freezing the layers.

Part 2: Fine-tuning the network:

  • Split the above model in two parts: layers from 0 to 140 and from 141 to the end.
  • Feed the original training data through the first part and store the outputs in a file.
  • Feed the cached outputs to the second part and run lots of quick iterations to fine tune it.
  • Join the 2 parts into a single model.

This speeds up the iterations by a factor of 3, compared to freezing the layers.
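
Both parts above follow the same pattern: run the fixed layers once, persist their outputs, then iterate only over the cached outputs. A minimal sketch of that pattern in plain Python follows. The functions `frozen_part` and `trainable_part`, the toy update rule and the file name are all made up for illustration; they stand in for the two halves of the network and are not Keras calls.

```python
import os
import pickle
import tempfile

calls = {"frozen": 0}


def frozen_part(x):
    """Stand-in for the frozen bottom network (hypothetical, not Keras)."""
    calls["frozen"] += 1
    return [2.0 * v for v in x]


def trainable_part(features, weight):
    """Stand-in for the top network that is being trained."""
    return sum(features) * weight


training_data = [[1.0, 2.0], [3.0, 4.0]]

# Pass 1: run every sample through the frozen part once and persist the result.
cache_path = os.path.join(tempfile.mkdtemp(), "features.pkl")
with open(cache_path, "wb") as fh:
    pickle.dump([frozen_part(x) for x in training_data], fh)

# Pass 2: load the cached features and iterate over them many times.
with open(cache_path, "rb") as fh:
    cached_features = pickle.load(fh)

weight = 0.0
for epoch in range(10):
    for features in cached_features:
        prediction = trainable_part(features, weight)
        weight += 0.001 * (1.0 - prediction)  # toy update rule, not real training
```

The frozen part runs once per sample instead of once per sample per epoch, which is where the saving comes from. As noted at the top of the thread, this only holds when the inputs are identical in every epoch, i.e. without data augmentation.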

As I said earlier, to split the model I export the configuration, manipulate it and reconstruct part of the network. This is a very wasteful and poor solution. Do you think there is any low-level API that could help me achieve the same result?

Here is my "terrible" & non-generic solution for splitting a model:

def split_model(model, start, end):
    confs = model.get_config()
    weights = {l.name:l.get_weights() for l in model.layers}
    # split model
    kept_layers = set()
    for i, l in enumerate(confs['layers']):
        if i == 0:
            confs['layers'][0]['config']['batch_input_shape'] = model.layers[start].input_shape
        elif i < start or i > end:
            continue
        kept_layers.add(l['name'])
    # filter layers
    layers = [l for l in confs['layers'] if l['name'] in kept_layers]
    layers[1]['inbound_nodes'][0][0][0] = layers[0]['name']
    # set conf
    confs['layers'] = layers
    confs['input_layers'][0][0] = layers[0]['name']
    confs['output_layers'][0][0] = layers[-1]['name']
    # create new model
    newModel = Model.from_config(confs)
    for l in newModel.layers:
        l.set_weights(weights[l.name])
    return newModel
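
To make the rewiring in split_model easier to follow, here is the same filtering applied to a small hand-written dictionary. The layout (layers, inbound_nodes, input_layers, output_layers) is an assumption inferred from what the function indexes, the layer names are invented, and `split_config` is a hypothetical helper. No Keras calls are made, and the input shape and weight copying steps are left out.

```python
import copy

# Hand-written stand-in for model.get_config(); names are hypothetical.
config = {
    "layers": [
        {"name": "input_1", "inbound_nodes": []},
        {"name": "conv", "inbound_nodes": [[["input_1", 0, 0]]]},
        {"name": "bn", "inbound_nodes": [[["conv", 0, 0]]]},
        {"name": "pool", "inbound_nodes": [[["bn", 0, 0]]]},
    ],
    "input_layers": [["input_1", 0, 0]],
    "output_layers": [["pool", 0, 0]],
}


def split_config(config, start, end):
    """Keep the input layer plus layers start..end (inclusive) and rewire them."""
    confs = copy.deepcopy(config)  # leave the caller's config untouched
    kept = [layer for i, layer in enumerate(confs["layers"])
            if i == 0 or start <= i <= end]
    # The first kept non-input layer now reads from the input layer directly.
    kept[1]["inbound_nodes"][0][0][0] = kept[0]["name"]
    confs["layers"] = kept
    confs["input_layers"][0][0] = kept[0]["name"]
    confs["output_layers"][0][0] = kept[-1]["name"]
    return confs


top = split_config(config, start=2, end=3)
top_names = [layer["name"] for layer in top["layers"]]
```

Only the first inbound connection of the first kept layer is rewired. A layer further down the slice that reads from a dropped layer would be left dangling, which is one reason the approach is not generic for branching networks.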

What backend are you on? There is a possible clean solution but it is backend specific.

You could hypothetically manipulate the graph to get just the portion you need. Something like this will work, but you might need to add some attributes to make Model accept it.

middle_layer_output = model.layers[...].output
output = model.layers[...].output
swapped_input = Input(...)
output_swapped = theano.clone(output, {middle_layer_output:swapped_input})

Something like this is a really handy trick, but no one has put cloning into K.backend yet.

Unfortunately I use TensorFlow. I will have a look at the Theano implementation to see how it works.

Should I leave the ticket open? I guess the question turned into a feature request.

You should be able to do something similar in Tensorflow using graph_replace.
https://www.tensorflow.org/api_docs/python/contrib.graph_editor/module_transform#graph_replace

Please let me know if you have any success. If you can get something short like that to work, I think adding theano.clone and tensorflow.contrib.graph_editor.transform.graph_replace to keras backend would be an awesome feature.

Cheers

Sounds good. Leave it open. I'll familiarize myself with the low-level APIs of Keras & TensorFlow, evaluate the solution offered by Theano and have a go at writing a clean solution. If I succeed, I'll send a pull request. :)

@datumbox I like your "terrible" model splitter. I've tried it on the InceptionV3 model (for the same reason: slowwwwww...). It seems to extract the bottom section 0-172 OK, but subsequently extracting the top section 173:216 fails. I'm not sure whether it was intended to work for InceptionV3 or for the top section. If this works, it would be rather straightforward to stack the models using the functional API, which would allow training the sections separately. Note that I am not sure about the exact split: the referenced Keras applications example suggests a split at 0-172 (where 172 is not included), but I would assume that the merge block (mixed8) at 172 should be included in the bottom part to create a logical model termination. Anyway, I think your limits are inclusive.
If you can make your splitter work for both top and bottom, I can try stacking them. Jan

Yes I agree, you should definitely include it. If we visualize the model we will see that it makes absolutely no sense not to include the merge. :)

My plan is to complete a less terrible solution which is more generic and allows you to split at specific nodes. I'll post it here once I have it or send a pull request.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

Hi @datumbox, I'm fighting the same problem and I'm trying to use your "terrible" solution, which actually seems to be the most viable. I'm using Keras 2.0 with the TensorFlow backend. Did you end up developing an updated version of the split model function?

Hi @engharat

I have not yet written a proper solution for that. The network graphs can be very complex, especially for networks that branch out and merge a lot, and this requires writing graph traversal algorithms. In the future I'll probably do this and contribute it back to Keras, but I have not done it yet.

Below is the latest version of the "terrible" solution that I'm using. Rest assured it is just as terrible as the previous one:

import random
from keras.models import Model

def split(model, start, end):
    confs = model.get_config()
    kept_layers = set()
    for i, l in enumerate(confs['layers']):
        if i == 0:
            confs['layers'][0]['config']['batch_input_shape'] = model.layers[start].input_shape
            if i != start:
                confs['layers'][0]['name'] += str(random.randint(0, 100000000)) # rename the input layer to avoid conflicts on merge
                confs['layers'][0]['config']['name'] = confs['layers'][0]['name']
        elif i < start or i > end:
            continue
        kept_layers.add(l['name'])
    # filter layers
    layers = [l for l in confs['layers'] if l['name'] in kept_layers]
    layers[1]['inbound_nodes'][0][0][0] = layers[0]['name']
    # set conf
    confs['layers'] = layers
    confs['input_layers'][0][0] = layers[0]['name']
    confs['output_layers'][0][0] = layers[-1]['name']
    # create new model
    submodel = Model.from_config(confs)
    for l in submodel.layers:
        orig_l = model.get_layer(l.name)
        if orig_l is not None:
            l.set_weights(orig_l.get_weights())
    return submodel
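
The comment above notes that a generic splitter needs graph traversal, because branching networks have connections that skip over layers. Here is a minimal sketch of one such check in plain Python: it finds the indices where the layer list can be cut so that the second part needs only one new input. The adjacency data and the `clean_cut_points` helper are made up for illustration and are not read from Keras; the layer list is assumed to be in topological order.

```python
def clean_cut_points(layer_names, inbound):
    """Return indices i where layers after i read only from layer i or later.

    Splitting right after layer i then needs a single new input:
    the output of layer i.
    """
    position = {name: idx for idx, name in enumerate(layer_names)}
    cuts = []
    for i in range(len(layer_names) - 1):
        crossing = False
        for later in layer_names[i + 1:]:
            for src in inbound.get(later, []):
                if position[src] < i:
                    # A later layer reads from before the cut layer.
                    crossing = True
        if not crossing:
            cuts.append(i)
    return cuts


# Hypothetical residual-style block: conv_a feeds both branch and shortcut.
layer_names = ["input", "conv_a", "branch", "shortcut", "add", "pool"]
inbound = {
    "conv_a": ["input"],
    "branch": ["conv_a"],
    "shortcut": ["conv_a"],
    "add": ["branch", "shortcut"],
    "pool": ["add"],
}
cuts = clean_cut_points(layer_names, inbound)
```

Here the indices inside the branching block (2 and 3) are rejected, because layers after them still read from conv_a or branch. This matches the observation in the thread that the merge layer should be included in the bottom part when choosing a split.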

Hello Everyone,

I am trying to do a similar thing. I saved the intermediate output to disk and now want to use it to train and score the remaining model. Was there any addition to the Keras API for this issue? @datumbox, were you able to find another solution?

I have a similar problem

    #----------------------------------------------
    #           Encoder
    #----------------------------------------------

    img_input = Input((input_height, input_width, 1))        
    x = Lambda(lambda x: K.repeat_elements(x, 3, axis=3))(img_input)    
    adapter = Model(img_input, x)

    vgg = VGG16(include_top=False, weights='imagenet')        
    vgg_plus_input = Model(outputs=vgg(adapter.output), inputs=adapter.input)

    block_of_interest = 'block4_pool'


    #----------------------------------------------
    #           Decoder
    #----------------------------------------------


    o = vgg.get_layer(block_of_interest).output

    o = ( ZeroPadding2D( (1,1) , data_format='channels_last' ))(o)
    o = ( Conv2D(512, (3, 3), padding='valid', data_format='channels_last'))(o)
    o = ( BatchNormalization())(o)

    o = ( UpSampling2D( (2,2), data_format='channels_last'))(o)
    o = ( ZeroPadding2D( (1,1), data_format='channels_last'))(o)
    o = ( Conv2D( 256, (3, 3), padding='valid', data_format='channels_last'))(o)
    o = ( BatchNormalization())(o)

    o = ( UpSampling2D((2,2)  , data_format='channels_last' ) )(o)
    o = ( ZeroPadding2D((1,1) , data_format='channels_last' ))(o)
    o = ( Conv2D( 128 , (3, 3), padding='valid' , data_format='channels_last' ))(o)
    o = ( BatchNormalization())(o)

    o = ( UpSampling2D((2,2)  , data_format='channels_last' ))(o)
    o = ( ZeroPadding2D((1,1)  , data_format='channels_last' ))(o)
    o = ( Conv2D( 64 , (3, 3), padding='valid'  , data_format='channels_last' ))(o)
    o = ( BatchNormalization())(o)


    o =  Conv2D( n_classes , (3, 3) , padding='same', data_format='channels_last' )( o )
    o_shape = Model(img_input , o ).output_shape
    outputHeight = o_shape[2]
    outputWidth = o_shape[3]

    o = (Reshape((  -1  , outputHeight*outputWidth   )))(o)
    o = (Permute((2, 1)))(o)
    o = (Activation('softmax'))(o)




    model = Model( vgg_plus_input.input , o )
    model.outputWidth = outputWidth
    model.outputHeight = outputHeight

    model.compile(loss='categorical_crossentropy',
                  optimizer= optimizer_name ,
                  metrics=['accuracy', mean_iou])

and the error I get

      File "/home/hey/work/kaggleSegmentation/models.py", line 87, in VGGSegnet
        o_shape = Model(img_input , o ).output_shape

      File "/home/hey/Programs/anaconda3/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
        return func(*args, **kwargs)

      File "/home/hey/Programs/anaconda3/lib/python3.6/site-packages/keras/engine/network.py", line 93, in __init__
        self._init_graph_network(*args, **kwargs)

      File "/home/hey/Programs/anaconda3/lib/python3.6/site-packages/keras/engine/network.py", line 237, in _init_graph_network
        self.inputs, self.outputs)

      File "/home/hey/Programs/anaconda3/lib/python3.6/site-packages/keras/engine/network.py", line 1430, in _map_graph_network
        str(layers_with_complete_input))

    ValueError: Graph disconnected: cannot obtain value for tensor Tensor("input_3:0", shape=(?, ?, ?, 3), dtype=float32) at layer "input_3". The following previous layers were accessed without issue: []

I fail to see how to connect the networks. I would like to have the full vgg to do some U-Net type of connections, that's why I don't want to pop layers.

Any idea @datumbox , @bstriner, @joelthchao ??

@datumbox's solution really helps my work. Thank you very much. It actually seems that the "terrible" solution is the only working one to be found online for this situation.

I think an important feature like model splitting should be provided by the official Keras API; maybe someday it will be.

I'm getting an assertion error in Model.from_config(confs). Any chance of an explanation so I can try and work it out?

I'm having the same issue, any updates on this? :)

You're making a new input layer, Input(model.get_layer('bn_conv1').output_shape[1:]), and using an old output, output=model.get_layer('avg_pool').output. These are not connected in any way, so of course this fails.

The model outputs are tensors based on the model inputs. You can't just use them for a different set of inputs. The simplest way to do what you want would be to build a new model using only the layers that you want.

m = Sequential()
for layer in model.layers[42:69]:
  m.add(layer)

If you want to be able to reuse the outputs from the model, you might be able to use clone and replace but that would be a long discussion you should have on the tensorflow forums.

Cheers

* ValueError: A merge layer should be called on a list of inputs

@datumbox any sense of how to deal with shared layers? orig_l = model.get_layer(l.name) fails for me because the orig_l name of the shared layer isn't the same as the submodel name. So in the original model the name will be shared_layer and in the submodel it will be shared_layer_1.

Graph disconnected normally means your input and output are not part of the same graph. If your input was not the variable you used to create your output, this is the error you will get.

This is exactly my case. How can I solve this issue when the input is not my variable (my original data)? Thank you.
