Is there interest in a Keras Transformer Layer, as suggested in https://github.com/skaae/transformer_network/issues/1?
If so, I can port the Lasagne code to Keras. If someone can come up with unit tests, that would be greatly appreciated :)
Best regards, Søren
Hi, I copy-pasted @skaae's Lasagne layer into a Keras layer (with proper citation, but please check whether that was enough). An example usage is here: https://github.com/EderSantana/seya/blob/master/examples/Spatial%20Transformer%20Networks.ipynb
I tried several architectures for the localization network (locnet in the code), but I'm not sure the results look correct. If anybody is interested, please check it out and let me know what you find.
Ok. I'll close this issue. The citation is fine :)
I wish you could leave it open for a while so people can try it out. I'm not sure it is working properly... Could you, for example, let me know what kind of architecture you used and for how many epochs you trained your working example?
Sure. I'm not sure your b initialization is correct.
import numpy as np

b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1
b[1, 1] = 1
b[0, 1] = 1
W = np.zeros((784, 6), dtype='float32')
weights = [W, b.flatten()]
Why do you initialize b[0, 1] to one? In the paper, they also initialize W to zero.
Also, the activation function for the last layer should be linear. That might not be clear from the docstring.
I also tried initializing to 0.5, as you did in your test file; it seems to represent the entire image with 1. I thought that would be a good start as well.
The locnet has linear output in my example.
Maybe we misunderstand each other :).
I see that you took the initialization of b from the example. The example shows that the transformation layer can skew and zoom the image. If you compare the plotted image with cat.jpg, you'll see that the plotted image is zoomed in and skewed.
If you want to initialize the layer to the identity transform, you should initialize b to:
| 1 0 0 |
| 0 1 0 |
Again, you can confirm this by looking at the plotted image.
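That is, something along these lines (same shapes as in your snippet, 784 inputs to a 6-unit linear output; only the b initialization changes, and the variable names are just illustrative):

import numpy as np

# Identity affine transform: | 1 0 0 |
#                            | 0 1 0 |
b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1
b[1, 1] = 1

# Zero weights so the locnet initially outputs exactly the identity transform
W = np.zeros((784, 6), dtype='float32')
weights = [W, b.flatten()]  # set these on the locnet's final (linear) layer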
Best regards, Søren
Is your ST layer merged into Keras' code?
Not yet, you have to try it with edersantana/seya right now. Some feedback would be appreciated.
@skaae I believe I found out what was "wrong"; check the new results:
https://github.com/EderSantana/seya/blob/master/examples/Spatial%20Transformer%20Networks.ipynb
Basically, we can't let theta be totally free. I constrained the rotation/scaling part to be between 0 and 1 and the translation part to be between 0 and the size of the image:
https://github.com/EderSantana/seya/blob/master/seya/layers/attention.py#L47-L53
Before, the spatial transformer was focusing on empty space and the gradient was never good enough to let it come back.
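Roughly, the constraint amounts to something like this sketch (not the actual seya code; the (batch, 6) theta layout with the translations in columns 2 and 5 is an assumption):

import numpy as np
import theano.tensor as T

def constrain_theta(raw_theta, img_size):
    # Squash every entry of the raw locnet output into [0, 1] ...
    theta = T.nnet.sigmoid(raw_theta)
    # ... then rescale the translation entries (columns 2 and 5) to [0, img_size]
    scale = np.array([1, 1, img_size, 1, 1, img_size], dtype='float32')
    return theta * scale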
Hmm. I didn't have that problem. Have you seen the images in my transformer_network repo? There the network seems to zoom correctly.
The convolution layer in your network seems small?
locnet.add(Convolution2D(1, 1, 1, 1))
Is that a single 1x1 filter or?
In any case I think you need a bigger localization layer.
For the localization network I used something similar to what they used in the article:
l_pool0_loc = pool(l_dim, pool_size=(2, 2))
l_conv0_loc = conv(l_pool0_loc, num_filters=20, filter_size=(3, 3),
                   name='input', W=W_ini)
l_pool1_loc = pool(l_conv0_loc, pool_size=(2, 2))
l_conv1_loc = conv(l_pool1_loc, num_filters=20, filter_size=(3, 3),
                   name='l_conv1_loc', W=W_ini)
l_conv1_loc = lasagne.layers.DropoutLayer(l_conv1_loc, p=sh_drp)
l_pool2_loc = pool(l_conv1_loc, pool_size=(2, 2))
l_conv2_loc = conv(l_pool2_loc, num_filters=20, filter_size=(3, 3),
                   name='l_conv2_loc', W=W_ini)
I tried several sizes actually.
From everything I tried, the only thing that seemed to solve the problem was to control the translation part of the system.
How many epochs did it take you to train your model?
OK. I don't have access to a GPU before the end of the week, so I can't check before then.
I tried with the Lasagne layer and it seems to be working fine. The experiment is here:
https://github.com/skaae/Recipes/blob/spatial_transform/examples/spatial_transformer_network.ipynb
That's awesome!!! Let me have a close look at it.
It doesn't really reproduce the results from the paper because I do not have the dataset. I didn't try other network architectures, so most likely something else will work better :)
It's fine, I just want to train a network that does not focus on blank spaces :D
I'll see what I was doing wrong and report it here.
OK! It's working now. :+1:
Thanks for publishing that notebook. I realized a few things:
(a subset of) these details should be what made the difference here.
What do you think?
@EderSantana Dear Eder, I recently tried to implement Spatial Transformer Networks in MATLAB and encountered problems similar to yours in this issue. My aim is to build an STN that can be combined with a typical CNN model (e.g. AlexNet). I find that after processing the very first image during training, the gradient at theta coming from the deeper layers, d(z)/d(theta), is already very large. It then modifies the weights of the locnet so much that the theta for the second image becomes extremely large (several tens to several hundreds), which pushes all the xs and ys for the second image (and all images afterwards) out of the bound [-1, 1] and gives them zeros as their feature map values. This is similar to the "focusing on empty spaces" problem in this issue.
Here are some details. I initialize the weights of the last (fully connected) layer of the locnet to produce theta as [1 0 0; 0 1 0], without an activation layer (like sigmoid) afterwards, and I use the entire AlexNet as the locnet architecture. I wonder how you fixed your problem with the empty spaces. I see that a sigmoid function was used in some earlier versions of your attention.py but was removed later. In my case, should an activation layer be used after the last fc layer of the locnet to restrict the range of theta? Or is it because my locnet is too large?
Many thanks!
@skaae Dear Søren, I've browsed the Lasagne repo and found that the STN has been implemented in lasagne/layers/special.py as TransformerLayer. However, I can't find the implementation of the gradient computation with respect to theta (equation 7 in the paper), which is needed to pass the gradient from the deeper layers to the locnet and thus update theta. Since the transformer layer uses bilinear sampling, I expected some gradient calculation specific to it in updates.py, but found nothing special there. I was expecting something like the "BilinearSamplerBHWD_updateGradInput" in the following file:
https://github.com/qassemoquab/stnbhwd/blob/master/generic/BilinearSamplerBHWD.c
Since I'm new here, sorry if this question sounds too naive. Many thanks!
Yes. Theano will calculate the gradient automatically. You could create a gradient checker to verify that your gradients are correct.
Otherwise, you can calculate some gradients with Theano and test whether you get similar values.
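For example, something along these lines (just a sketch; the toy cost stands in for the transformer output, since the point is only that T.grad and verify_grad work on any symbolic expression):

import numpy as np
import theano
import theano.tensor as T

theta = T.matrix('theta')  # (batch, 6) affine parameters
identity = np.array([1, 0, 0, 0, 1, 0], dtype='float32')

# Toy differentiable cost standing in for the transformer output
cost = ((theta - identity) ** 2).sum()

g_theta = T.grad(cost, theta)                # symbolic gradient w.r.t. theta
f = theano.function([theta], g_theta)
print(f(np.zeros((2, 6), dtype='float32')))  # numeric values to compare against

# Finite-difference check of the same expression
theano.gradient.verify_grad(
    lambda th: ((th - identity) ** 2).sum(),
    [np.random.rand(2, 6).astype('float32')],
    rng=np.random.RandomState(42))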
Thanks for your reply! BTW, as with the question I asked Eder earlier, could you give me some idea of how Lasagne manages to constrain the range of theta so that most xs and ys don't end up in out-of-bound blank spaces? In practice I find it very easy for theta to become large in absolute value without a final activation layer (e.g. sigmoid) on the locnet, which easily pushes xs and ys out of the [-1, 1] range.
We don't. I initialize the weights of the final layer in the transformer layer to zero and the biases to the identity transform. I didn't really have problems with that. I guess you could use hardTanh or something like that?
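Concretely, in Lasagne that initialization looks roughly like this (the input shape and the hidden layer here are just illustrative, not the architecture from the paper):

import numpy as np
import lasagne

# Input and a small localization network (illustrative architecture)
l_in = lasagne.layers.InputLayer((None, 1, 60, 60))
l_loc_hidden = lasagne.layers.DenseLayer(l_in, num_units=50)

# Final locnet layer: zero weights, identity-transform biases, linear output,
# so the transformer starts out as a no-op
b_identity = np.array([1, 0, 0, 0, 1, 0], dtype='float32')
l_loc_out = lasagne.layers.DenseLayer(
    l_loc_hidden, num_units=6,
    W=lasagne.init.Constant(0.0),
    b=b_identity,
    nonlinearity=lasagne.nonlinearities.identity)

# The transformer layer itself; Theano derives the gradient through the bilinear sampling
l_trans = lasagne.layers.TransformerLayer(l_in, l_loc_out, downsample_factor=1.0)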
@yuanyc06 you don't really need to constrain it. My problem was actually just a bad localization net, I think. If you use a reasonable model with the initialization @skaae mentioned, you won't really need to constrain anything; backpropagation seems to take care of everything. Also note that if your problem is simple, you don't want a huge net for the localization part; it is supposed to make your computation more efficient.
Are you going to make it available for the TensorFlow backend?
@nes123 you can check https://github.com/oarriaga/spatial_transformer_networks for a working implementation of Seya's spatial transformer network in Keras with the TensorFlow backend.