Is there interest in a Keras Transformer Layer, as suggested in https://github.com/skaae/transformer_network/issues/1?
If so, I can port the Lasagne code to Keras. If someone can come up with unit tests, that would be greatly appreciated :)
Best regards, Søren
Hi, I copy-pasted @skaae's Lasagne layer into a Keras layer (with proper citation, but please check whether that was enough). An example usage is here: https://github.com/EderSantana/seya/blob/master/examples/Spatial%20Transformer%20Networks.ipynb
I tried several architectures for the localization network (locnet in the code), but I'm not sure the results look correct. If anybody is interested, please check it out and let me know what you find.
Ok. I'll close this issue. The citation is fine :)
I wish you could leave it open for a while so people can try it out. I'm not sure it is working properly... Could you, for example, let me know what kind of architecture you used and for how many epochs you trained your working example?
Sure. I'm not sure your b initialization is correct.
import numpy as np

b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1
b[1, 1] = 1
b[0, 1] = 1
W = np.zeros((784, 6), dtype='float32')
weights = [W, b.flatten()]
Why do you initialize b[0, 1] to one? In the paper, they also initialize W to zero.
Also, the activation function for the last layer should be linear. That might not be clear from the docstring.
I also tried initializing to 0.5, as you did in your test file; it seems to represent the entire image with 1. I thought that would be a good start as well.
The locnet has linear output in my example.
Maybe we misunderstand each other :).
I see that you took the initialization of b from the example. The example shows that the transformation layer can skew and zoom the image. If you compare the plotted image with cat.jpg, you'll see that the plotted image is zoomed in and skewed.
If you want to initialize the layer to the identity transform, you should initialize b to:
| 1 0 0 |
| 0 1 0 |
Again, you can confirm this by looking at the plotted image.
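That is, something along these lines (same shapes as in your snippet, 784 inputs to a 6-unit linear output; only the b initialization changes, and the variable names are just illustrative):

import numpy as np

# Identity affine transform: | 1 0 0 |
#                            | 0 1 0 |
b = np.zeros((2, 3), dtype='float32')
b[0, 0] = 1
b[1, 1] = 1

# Zero weights so the locnet initially outputs exactly the identity transform
W = np.zeros((784, 6), dtype='float32')
weights = [W, b.flatten()]  # set these on the locnet's final (linear) layer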
Best regards, Søren
Is your ST layer merged into Keras' code?
Not yet, you have to try it with edersantana/seya right now. Some feedback would be appreciated.
@skaae I believe I found out what was "wrong"; check the new results:
https://github.com/EderSantana/seya/blob/master/examples/Spatial%20Transformer%20Networks.ipynb
Basically, we can't let theta be totally free. I constrained the rotation/scaling part to be between 0 and 1 and the translation part to be between 0 and the size of the image:
https://github.com/EderSantana/seya/blob/master/seya/layers/attention.py#L47-L53
Before, the spatial transformer was focusing on empty space and the gradient was never good enough to let it come back.
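Roughly, the constraint amounts to something like this sketch (not the actual seya code; the (batch, 6) theta layout with the translations in columns 2 and 5 is an assumption):

import numpy as np
import theano.tensor as T

def constrain_theta(raw_theta, img_size):
    # Squash every entry of the raw locnet output into [0, 1] ...
    theta = T.nnet.sigmoid(raw_theta)
    # ... then rescale the translation entries (columns 2 and 5) to [0, img_size]
    scale = np.array([1, 1, img_size, 1, 1, img_size], dtype='float32')
    return theta * scale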
Hmm. I didn't have that problem. Have you seen the images in my transformer_network repo? There the network seems to zoom correctly.
The convolution layer in your network seems small?
locnet.add(Convolution2D(1, 1, 1, 1))
Is that a single 1x1 filter or?
In any case I think you need a bigger localization layer.
For the localization network I used something similar to what they used in the article:
l_pool0_loc = pool(l_dim, pool_size=(2, 2))
l_conv0_loc = conv(l_pool0_loc, num_filters=20, filter_size=(3, 3),
                   name='input', W=W_ini)
l_pool1_loc = pool(l_conv0_loc, pool_size=(2, 2))
l_conv1_loc = conv(l_pool1_loc, num_filters=20, filter_size=(3, 3),
                   name='l_conv1_loc', W=W_ini)
l_conv1_loc = lasagne.layers.DropoutLayer(l_conv1_loc, p=sh_drp)
l_pool2_loc = pool(l_conv1_loc, pool_size=(2, 2))
l_conv2_loc = conv(l_pool2_loc, num_filters=20, filter_size=(3, 3),
                   name='l_conv2_loc', W=W_ini)
I tried several sizes actually.
From everything I tried, the only thing that seemed to solve the problem was to control the translation part of the system.
How many epochs did it take you to train your model?
OK. I don't have access to a GPU before the end of the week, so I can't check before then.
I tried with the Lasagne layer and it seems to be working fine. The experiment is here:
https://github.com/skaae/Recipes/blob/spatial_transform/examples/spatial_transformer_network.ipynb
That's awesome!!! Let me have a close look at it.
It doesn't really reproduce the results from the paper because I do not have the dataset. I didn't try other network architectures, so most likely something else will work better :)
It's fine, I just want to train a network that does not focus on blank spaces :D
I'll see what I was doing wrong and report it here.
OK! It's working now. :+1:
Thanks for publishing that notebook. I realized a few things:
(a subset of) these details should be what made the difference here.
What do you think?
@EderSantana Dear Eder, I recently tried to implement Spatial Transformer Networks in MATLAB and encountered problems similar to yours in this issue. My aim is to build an STN that can be combined with a typical CNN model (e.g. AlexNet). I find that after processing the very first image during training, the gradient at theta coming from the deeper layers, d(z)/d(theta), is already very large. It then modifies the weights of the locnet so much that the theta for the second image becomes extremely large (several tens to several hundreds), which pushes all the xs and ys for the second image (and all images afterwards) out of the bound [-1, 1] and gives them zeros as their feature map values. This is similar to the "focusing on empty spaces" problem in this issue.
Here are some details. I initialize the weights of the last (fully connected) layer of the locnet to produce theta as [1 0 0; 0 1 0], without an activation layer (like sigmoid) afterwards, and I use the entire AlexNet as the locnet architecture. I wonder how you fixed your problem with the empty spaces. I see that a sigmoid function was used in some earlier versions of your attention.py but was removed later. In my case, should an activation layer be used after the last fc layer of the locnet to restrict the range of theta? Or is it because my locnet is too large?
Many thanks!
@skaae Dear Søren, I've browsed the Lasagne repo and found that the STN has been implemented in lasagne/layers/special.py as TransformerLayer. However, I can't find the implementation of the gradient computation with respect to theta (equation 7 in the paper), which is needed to pass the gradient from the deeper layers to the locnet and thus update theta. Since the transformer layer uses bilinear sampling, I expected some gradient calculation specific to it in updates.py, but found nothing special there. I was expecting something like the "BilinearSamplerBHWD_updateGradInput" in the following file:
https://github.com/qassemoquab/stnbhwd/blob/master/generic/BilinearSamplerBHWD.c
Since I'm new here, sorry if this question sounds too naive. Many thanks!
Yes. Theano will calculate the gradient automatically. You could create a gradient checker to verify that your gradients are correct.
Otherwise, you can calculate some gradients with Theano and test whether you get similar values.
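For example, something along these lines (just a sketch; the toy cost stands in for the transformer output, since the point is only that T.grad and verify_grad work on any symbolic expression):

import numpy as np
import theano
import theano.tensor as T

theta = T.matrix('theta')  # (batch, 6) affine parameters
identity = np.array([1, 0, 0, 0, 1, 0], dtype='float32')

# Toy differentiable cost standing in for the transformer output
cost = ((theta - identity) ** 2).sum()

g_theta = T.grad(cost, theta)                # symbolic gradient w.r.t. theta
f = theano.function([theta], g_theta)
print(f(np.zeros((2, 6), dtype='float32')))  # numeric values to compare against

# Finite-difference check of the same expression
theano.gradient.verify_grad(
    lambda th: ((th - identity) ** 2).sum(),
    [np.random.rand(2, 6).astype('float32')],
    rng=np.random.RandomState(42))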
Thanks for your reply! BTW, as with the question I asked Eder earlier, could you give me some idea of how Lasagne manages to constrain the range of theta so that most xs and ys don't end up in out-of-bound blank spaces? In practice I find it very easy for theta to become large in absolute value without a final activation layer (e.g. sigmoid) on the locnet, which easily pushes xs and ys out of the [-1, 1] range.
We don't. I initialize the weights of the final layer in the transformer layer to zero and the biases to the identity transform. I didn't really have problems with that. I guess you could use hardTanh or something like that?
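Concretely, in Lasagne that initialization looks roughly like this (the input shape and the hidden layer here are just illustrative, not the architecture from the paper):

import numpy as np
import lasagne

# Input and a small localization network (illustrative architecture)
l_in = lasagne.layers.InputLayer((None, 1, 60, 60))
l_loc_hidden = lasagne.layers.DenseLayer(l_in, num_units=50)

# Final locnet layer: zero weights, identity-transform biases, linear output,
# so the transformer starts out as a no-op
b_identity = np.array([1, 0, 0, 0, 1, 0], dtype='float32')
l_loc_out = lasagne.layers.DenseLayer(
    l_loc_hidden, num_units=6,
    W=lasagne.init.Constant(0.0),
    b=b_identity,
    nonlinearity=lasagne.nonlinearities.identity)

# The transformer layer itself; Theano derives the gradient through the bilinear sampling
l_trans = lasagne.layers.TransformerLayer(l_in, l_loc_out, downsample_factor=1.0)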
@yuanyc06 you don't really need to constrain it. My problem was actually just a bad localization net, I think. If you use a reasonable model with the initialization @skaae mentioned, you won't really need to constrain anything; backpropagation seems to take care of everything. Also note that if your problem is simple, you don't want a huge net for the localization part; it is supposed to make your computation more efficient.
Are you going to make it available for the TensorFlow backend?
@nes123 you can check https://github.com/oarriaga/spatial_transformer_networks for a working implementation of Seya's spatial transformer network in Keras with the TensorFlow backend.