Hi!
I want to make a custom layer which is supposed to fuse the output of a Dense layer with a Convolution2D layer.
The idea came from this paper, and here's the network:
The fusion layer tries to fuse the Convolution2D tensor (256x28x28) with the Dense tensor (256). Here's the equation for it, applied at every spatial location (u, v):

y_fusion(u, v) = σ(b + W · concat(y_global, y_mid(u, v)))

where:
y_global => Dense layer output with shape 256
y_mid => Convolution2D layer output with shape 256x28x28
Here's the description of the paper about the Fusion process:
I ended up making a new custom layer like below:
import numpy as np
from keras import backend as K
from keras.engine.topology import Layer

class FusionLayer(Layer):
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(FusionLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # input_shape is a list of two shapes: [y_global's, y_mid's]
        input_dim = input_shape[1][1]
        initial_weight_value = np.random.random((input_dim, self.output_dim))
        self.W = K.variable(initial_weight_value)
        self.b = K.zeros((self.output_dim,))  # bias must match the output dim, not the input dim
        self.trainable_weights = [self.W, self.b]

    def call(self, inputs, mask=None):
        y_global = inputs[0]
        y_mid = inputs[1]
        # the code below should be modified (note: self.activation is never defined either)
        output = K.dot(K.concatenate([y_global, y_mid]), self.W)
        output += self.b
        return self.activation(output)

    def get_output_shape_for(self, input_shape):
        assert input_shape and len(input_shape) == 2
        return (input_shape[0], self.output_dim)
I think I got the __init__ and build methods right, but I don't know how to concatenate y_global (256 dimensions) with y_mid (256x28x28 dimensions) in the call method so that the output matches the equation mentioned above.
How can I implement this equation in the call method?
Thanks so much...
UPDATE: any other way to successfully integrate the data of these 2 layers is also acceptable to me... it doesn't have to be exactly the way mentioned in the paper, but it needs to at least return an acceptable output...
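For reference, here is a minimal sketch of how the per-location fusion equation could be written with Keras backend ops (assuming the Keras 1 backend API, Theano/channels-first dim ordering, and hard-coded 256/28x28 shapes; `fuse` is a hypothetical helper whose body could serve as FusionLayer.call, and W/b are assumed to have shapes (512, 256) and (256,)):

from keras import backend as K

def fuse(y_global, y_mid, W, b):
    # hypothetical helper; y_global: (None, 256), y_mid: (None, 256, 28, 28)
    g = K.repeat(y_global, 28 * 28)            # (None, 784, 256)
    g = K.permute_dimensions(g, (0, 2, 1))     # (None, 256, 784)
    g = K.reshape(g, (-1, 256, 28, 28))        # broadcast the vector to every location
    y = K.concatenate([g, y_mid], axis=1)      # (None, 512, 28, 28)
    # per-location affine map: move channels last, flatten the 784 locations
    y = K.permute_dimensions(y, (0, 2, 3, 1))  # (None, 28, 28, 512)
    y = K.reshape(y, (-1, 512))                # (None*784, 512)
    y = K.dot(y, W) + b                        # W: (512, 256), b: (256,)
    y = K.reshape(y, (-1, 28, 28, 256))
    y = K.permute_dimensions(y, (0, 3, 1, 2))  # back to (None, 256, 28, 28)
    return K.relu(y)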
Have you checked out the Merge layer? It seems like it can be done by combining Merge and Dense with the functional Model API.
Merging a 256-dimensional Dense vector with 256 convolution maps is the real problem...
One option is to flatten the convolutions and then merge them with the 256-dimensional vector... but the size of the network grows so much (256 x 28 x 28 ~ 200,000 nodes), and even if we did that, we'd be doing something very wrong... it's as if the 256 vector 'dissolves' among all of the 200,000 values of the flattened vector and in the end has nearly no effect on the outcome...
The second option (which I actually tested, and it failed!) is to shrink the 256-dimensional dense vector to a 196-dimensional one and then reshape it into a 14x14 image... THEN you can merge it with the subsampled 256 convolutions, BUT this approach is also useless, as the lone 14x14 map has nearly no effect on the outcome... this again reminds me of the same 'dissolving' scenario that happens in option one... (a rough sketch of both options follows below)
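To make those two options concrete, here's a rough sketch (assuming the Keras 1 functional API; y_global, y_mid and y_mid_14 are placeholder tensors, and all names and sizes are illustrative, not from the original post):

from keras.layers import Flatten, Dense, Reshape, merge

# Option 1: flatten the 256x28x28 maps and concatenate with the 256 vector;
# the vector is outnumbered roughly 784 to 1
flat_mid = Flatten()(y_mid)                              # (None, 200704)
merged_1 = merge([flat_mid, y_global], mode='concat')    # (None, 200960)

# Option 2: squeeze the 256 vector to 196, reshape into a single 14x14 map,
# then concatenate with 14x14-subsampled feature maps along the channel axis
g_small = Dense(196)(y_global)                           # (None, 196)
g_map = Reshape((1, 14, 14))(g_small)                    # (None, 1, 14, 14)
merged_2 = merge([y_mid_14, g_map], mode='concat', concat_axis=1)  # (None, 257, 14, 14)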
The paper's fusion approach actually fuses each and every one of the 256 convolutions with the vector and somehow affects all of them... when it's finished, we have 256 new convolutions which have been changed and fused with the vector, which results in the colorization actually working...
Which brings us back to the question of how to fuse vectors with convolutions in Keras... any other comments, please?!
I'm not sure what the exact problem is here. Is it about understanding the work? Or how to implement it?
Anyway, as far as I understand it, this seems similar to context vectors in seq2seq models, where a global context vector is concatenated to all inputs of the sequence in the decoder block. Here, y_global is literally global information, so it should be fed into all the local vectors. Therefore you should probably duplicate the global vector, make the dimensions concatenatable, and then concatenate them.
# y_mid: (None, 256, 28, 28)
y_global_2d = RepeatVector(28 * 28)(y_global)      # shape: (None, 28*28, 256)
y_global_2d = Permute((2, 1))(y_global_2d)         # shape: (None, 256, 28*28)
y_global_3d = Reshape((256, 28, 28))(y_global_2d)  # shape: (None, 256, 28, 28)
y_concat = merge([y_global_3d, y_mid], mode='concat', concat_axis=1)  # concatenating
y_concat = Lambda(lambda x: K.relu(x))(y_concat)   # activation
Not sure if there's a mistake in the permuting/reshaping, though.
Thanks for the info and the code... I'm gonna run it until morning and I'll get back to you with the results...
Thanks again...
@keunwoochoi
I'm not sure what the exact problem is here. Is it about understanding the work? Or how to implement it?
It's about implementing it... I've actually got the gist of it and implemented the rest of the network... only this Fusion part seems hard to implement...
By the way, a question popped into my mind after seeing your code...
Wouldn't your code result in redundancies when merging the 256 mid-level convolutions with the 256 repeated global convolutions? This is in contrast with the idea of the paper, which somehow mixes the convolutions with one vector and results in 256 final outputs (and not 512, as in our case)?
Hm, now I'm quite sure this is what the paper says.
# y_mid: (None, 256, 28, 28)
y_global_2d = RepeatVector(28 * 28)(y_global)      # shape: (None, 28*28, 256)
y_global_2d = Permute((2, 1))(y_global_2d)         # shape: (None, 256, 28*28)
y_global_3d = Reshape((256, 28, 28))(y_global_2d)  # shape: (None, 256, 28, 28)
y_concat = merge([y_global_3d, y_mid], mode='concat', concat_axis=1)  # (None, 512, 28, 28)
y_fusion = Conv2D(256, 1, 1, activation='relu')(y_concat)  # (None, 256, 28, 28) and Eq. (5)
So the local feature has indices u and v, which indicate the row and column, while the global feature does not. y_fusion preserves the row/column indices. Also, the 1x1 convolution by Conv2D does the matrix multiplication per location, as well as adding the bias, exactly the same as the equation.
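To show where this block sits, here's a minimal end-to-end sketch of the fusion as a standalone functional-API model (assuming Keras 1 with Theano dim ordering; the input shapes, optimizer and loss are placeholders, not the paper's full architecture):

from keras.models import Model
from keras.layers import Input, RepeatVector, Permute, Reshape, merge
from keras.layers import Convolution2D

y_mid = Input(shape=(256, 28, 28))     # mid-level features (channels first)
y_global = Input(shape=(256,))         # global features

# broadcast the global vector to every spatial location
g = RepeatVector(28 * 28)(y_global)    # (None, 784, 256)
g = Permute((2, 1))(g)                 # (None, 256, 784)
g = Reshape((256, 28, 28))(g)          # (None, 256, 28, 28)

y_concat = merge([g, y_mid], mode='concat', concat_axis=1)        # (None, 512, 28, 28)
y_fusion = Convolution2D(256, 1, 1, activation='relu')(y_concat)  # (None, 256, 28, 28)

model = Model(input=[y_mid, y_global], output=y_fusion)
model.compile(optimizer='adam', loss='mse')  # placeholder loss
model.summary()

Note that this uses the functional API rather than Sequential/model.add(), since the fusion block takes two inputs.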
Thanks for the guidance and the code... they really helped me out...
Hello @Neltherion. Please tell me, after implementing the last code, did you get the right result? Or did you use some other technique for this?
@keunwoochoi I want to know: is the code you wrote the structure for a model? If yes, then I am having difficulty making that structure using model.add() (I am new to Keras, please help me out, thanks)