Keras: How to make a trained network handle images of different sizes

Created on 16 Feb 2017 · 8 comments · Source: keras-team/keras

Hi, I have been experimenting with the u-net code, and it works fine. In order to let the trained network predict images of different sizes, I changed the input layer from inputs = Input((1, img_rows, img_cols)) to inputs = Input((1, None, None)). However, it gives the following error message:

Traceback (most recent call last):
  File "train-unet-v3b.py", line 113, in <module>
    train_and_predict()
  File "train-unet-v3b.py", line 96, in train_and_predict
    model = get_unet()
  File "train-unet-v3b.py", line 54, in get_unet
    up6 = merge([UpSampling2D(size=(2, 2))(conv5), conv4], mode='concat', concat_axis=1)
  File "/development/gtfw/lib/python3.4/site-packages/Keras-1.0.3-py3.4.egg/keras/engine/topology.py", line 485, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/development/gtfw/lib/python3.4/site-packages/Keras-1.0.3-py3.4.egg/keras/engine/topology.py", line 543, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/development/gtfw/lib/python3.4/site-packages/Keras-1.0.3-py3.4.egg/keras/engine/topology.py", line 148, in create_node
    output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
  File "/development/gtfw/lib/python3.4/site-packages/Keras-1.0.3-py3.4.egg/keras/layers/convolutional.py", line 1043, in call
    self.dim_ordering)
  File "/development/gtfw/lib/python3.4/site-packages/Keras-1.0.3-py3.4.egg/keras/backend/tensorflow_backend.py", line 508, in resize_images
    X.set_shape((None, None, original_shape[2] * height_factor, original_shape[3] * width_factor))
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

What's the right way to let u-net or FCN-style training handle images of different sizes?



All 8 comments

You probably don't want to do what you're trying to do. Instead, you should pre-process your images by either scaling or cropping them; not doing so will probably cost you a lot of scale & translation invariance and deteriorate performance.

If you still want to allow the network to process arbitrary input shapes, you will have to find which layer (upsampling, merge, or convolution) makes the shape computation return an error, and whether the error happens at the Keras layer level (i.e. the bug is in get_output_shape, in which case you can probably relax it by writing your own custom layer that returns a shape with undefined dimensions) or at the backend level, which may only work with fixed dimensions (in which case you could try Theano and see if you have more luck).

If the issue is with upsampling, maybe you could try using dilated convolutions instead (à trous convolutions).

I have a similar problem, but I took a different approach.

What I did is:
1. Pre-process all the images by zero-padding them to the same size (500, 500) for PASCAL VOC (a padding sketch follows below).
2. Write a custom layer that crops the CNN feature map from (500, 500) back to each image's original scale before the pixel-wise prediction. This layer takes two inputs: the CNN feature map and a list of offsets, one per image.
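
For reference, here is a minimal numpy sketch of the padding step above; the canvas size of (500, 500) comes from the thread, but the helper name, centering choice, and channels-first layout are assumptions, not taken from the original code:

    import numpy as np

    def pad_to_canvas(img, canvas_hw=(500, 500)):
        """Zero-pad a (channels, height, width) image to a fixed canvas size.

        Returns the padded image and the (row, col) offset of the original
        image inside the canvas, which a later cropping layer can use to
        recover the original region.
        """
        c, h, w = img.shape
        canvas_h, canvas_w = canvas_hw
        off_h = (canvas_h - h) // 2  # center the image (an arbitrary choice)
        off_w = (canvas_w - w) // 2
        padded = np.zeros((c, canvas_h, canvas_w), dtype=img.dtype)
        padded[:, off_h:off_h + h, off_w:off_w + w] = img
        return padded, (off_h, off_w)

    # Example: a 3 x 375 x 500 PASCAL VOC image padded to 3 x 500 x 500
    image = np.random.rand(3, 375, 500).astype('float32')
    padded, offset = pad_to_canvas(image)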

But my work needs the original image shape to do other calculations in my custom layer.

The problem is that when I try to crop the feature map based on these variables and get the shape of the extracted feature with K.int_shape(cropped_feature_map), it raises the error 'Not a Keras tensor'.

It turns out that the extracted feature map is not a Keras tensor; I'm working with the Theano backend, so I think it's a plain Theano tensor at that point.

Due to a bug in TensorFlow, I have to work with the Theano backend, but it seems that K.int_shape() is not suitable for Theano tensor variables.
Any idea?

@xiao7199 maybe use K.shape so that the size could be a dynamic variable.

@unrealwill Yes!! That's what I want, I can't believe I missed that.
Thanks !!!
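
To illustrate the suggestion above: K.int_shape returns the static Python shape (with None for unknown dimensions, and with the Theano backend it only works on tensors that carry Keras shape metadata), whereas K.shape returns a symbolic tensor that holds the runtime shape, so it also works on plain backend tensors. A minimal sketch, with made-up tensor names:

    from keras import backend as K

    x = K.placeholder(shape=(None, 1, None, None))  # batch and spatial dims unknown

    static_shape = K.int_shape(x)   # (None, 1, None, None) -- plain Python tuple
    dynamic_shape = K.shape(x)      # symbolic tensor, resolved to real ints at run time

    # Individual dimensions can be sliced off the symbolic shape and used in
    # further backend ops, e.g. to crop a feature map of unknown size:
    height = K.shape(x)[2]
    width = K.shape(x)[3]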

@unrealwill,

Thank you for the reply!

According to what you suggested, the training process stays as before, based on a fixed 128x128 training set size. In the prediction stage, I just re-scale the testing image to 128x128, and then re-scale the predicted mask back. Is my understanding correct?

I am not very clear about your statement that "not doing so will probably cost you a lot of scale & translation invariance and deteriorate performance."
It seems to me that "scale the testing image and re-scale it back" involves some extra scaling operations, so I get confused here.

Moreover, you mentioned: "If the issue is with upsampling, maybe you could try using dilated convolutions instead (à trous convolutions)."
Do you mean I should replace the traditional convolutions with dilated convolutions in the upsampling branch?
For instance, in the upsampling branch, the author uses:

conv7 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(up7)
conv7 = Convolution2D(128, 3, 3, activation='relu', border_mode='same')(conv7)

I can replace Convolution2D with dilated convolution. Is that right?

Thanks a lot for your help.

Yes you got the pre-scaling/post-scaling right.
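
A minimal sketch of that pre-scaling/post-scaling pipeline, assuming a trained fixed-size model, 'th' dim ordering, single-channel images, and scipy's zoom for resizing (any resizing routine would do; for awkward size ratios the resized shapes may be off by a pixel):

    import numpy as np
    from scipy.ndimage import zoom

    def predict_any_size(model, image, train_size=128):
        """Scale a single-channel 2-D image to the fixed training size,
        predict, then scale the predicted mask back to the original
        resolution. `model` is assumed to expect (batch, 1, 128, 128) input.
        """
        h, w = image.shape
        # down/up-scale the input to the size the network was trained on
        scaled = zoom(image, (train_size / float(h), train_size / float(w)), order=1)
        pred = model.predict(scaled[np.newaxis, np.newaxis, ...])[0, 0]
        # scale the predicted mask back to the original image size
        mask = zoom(pred, (h / float(train_size), w / float(train_size)), order=1)
        return mask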

What I meant about scale & translation invariance is that a layer usually adapts its features in a spatially dependent way. Imagine, for example, that the distance between eyes is always about 5 pixels; one layer will probably discover a feature for this, but if you now add a close-up (i.e. a hi-res version) of the same image, the distance between the eyes will be 15 pixels, and the feature which worked before will not fire.

The max-pooling/up-sampling is usually done so that the next convolution can "span" a larger/smaller area of the image.
It reduces the segmentation quality, but it also reduces the computation cost.

To remove the max-pooling/up-sampling and get a similar result, you would need to replace every convolution layer that follows an up-sampling with AtrousConvolution2D, with its dilation rate increased/reduced by the max-pooling/up-sampling factor. (The rule of thumb is that each convolution layer should see approximately the same portion of the image as it would have before.) (Max-pooling is done without sub-sampling, i.e. with stride = 1, and the up-sampling is simply removed.)
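
For illustration, a minimal sketch of that replacement rule, assuming a Keras 1.x version recent enough to provide AtrousConvolution2D and 'th' dim ordering; the 128 filters, single pooling level, and dilation rate of 2 are illustrative, not taken from the original u-net code:

    from keras.layers import Input, MaxPooling2D, AtrousConvolution2D

    # channels-first input with arbitrary spatial size, as in the thread
    x = Input((1, None, None))

    # Original pattern (for comparison):
    #   pool = MaxPooling2D(pool_size=(2, 2))(x)   # halves the resolution
    #   conv = Convolution2D(128, 3, 3, activation='relu',
    #                        border_mode='same')(pool)
    #   ... later: UpSampling2D(size=(2, 2))(...)

    # Equivalent-receptive-field pattern: pool without sub-sampling (stride 1)
    # and dilate the following convolution by the pooling factor, so the layer
    # still "sees" roughly the same portion of the image; the matching
    # UpSampling2D layer is then dropped.
    pool = MaxPooling2D(pool_size=(2, 2), strides=(1, 1), border_mode='same')(x)
    conv = AtrousConvolution2D(128, 3, 3, atrous_rate=(2, 2),
                               activation='relu', border_mode='same')(pool)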

Hi unrealwill,

Thanks a lot.

The problem I am studying has about 300~400 images of large size, such as 1024x1024. I generate many training sets by randomly sampling patches from the original images. The sampled patches are of size 128x128; in other words, I am using sub-images to train the model. Currently, in the prediction stage, I still perform the prediction over the sub-images (128x128) and stitch them back into the large image.
In this kind of training scenario, can I apply the scale-rescale pipeline instead of the current sub-image-stitching pipeline?

In your particular context (or, for example, finding cars/planes in satellite imagery), the scale-rescale pipeline is not appropriate, because at prediction time the input should be of the same nature and scale as what you trained on.
You are right to use the sub-image-stitching pipeline, especially if you are memory limited.
Alternatively, you could probably take the u-net and run it on a larger image in a fully convolutional fashion using Theano.
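
One way to try that fully convolutional route, sketched under explicit assumptions: get_unet() is assumed here to have been modified to take the input shape as an argument (the original script hard-codes it), X_train/y_train are the thread's fixed-size patches, and the Theano backend is assumed so the UpSampling2D shape error from the traceback does not occur:

    import numpy as np

    # Train as usual on fixed-size 128x128 patches.
    train_model = get_unet(input_shape=(1, 128, 128))
    train_model.fit(X_train, y_train, batch_size=32, nb_epoch=20)

    # Build the same architecture with undefined spatial dimensions and copy
    # the trained weights over; convolution kernels do not depend on the
    # spatial size of their input, so the weight lists match layer for layer.
    predict_model = get_unet(input_shape=(1, None, None))
    predict_model.set_weights(train_model.get_weights())

    # Predict directly on a full-size image (1024x1024 from the thread).
    large_image = np.random.rand(1, 1, 1024, 1024).astype('float32')
    mask = predict_model.predict(large_image)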

