Keras: How to share weights between two Dense layers with transposed in/out dims?

Created on 17 Jan 2017 · 9 comments · Source: keras-team/keras

Hi,
I'm new to Keras, and I'm working on defining a BiDNN-like model (https://github.com/v-v/BiDNN) in Keras, using TensorFlow as the backend.

However, it's a bit confusing for me to do what Lasagne (https://github.com/Lasagne/Lasagne) allows, namely sharing weights between two layers with transposed input and output dimensions, as in the following Lasagne code:
l1 = DenseLayer(4, num_units=8, W=GlorotUniform()) # with input dim (4) and output dim (8)
l2 = DenseLayer(8, num_units=4, W=l1.W.T) # with input dim (8) and output dim (4)
Thus l2 shares l1's weights by transposing this tensor.

Is this possible in Keras? Thank you for any advice.

stale


All 9 comments

I don't know of anyone who has built this in Keras already, but I could be wrong.

Basically, use a Dense layer for one direction. Write a custom layer for the other direction.

The layer would roughly look like this:

from keras import backend as K
from keras.engine.topology import Layer

class DenseTranspose(Layer):
    def __init__(self, other_layer, **kwargs):
        super(DenseTranspose, self).__init__(**kwargs)
        self.other_layer = other_layer  # the Dense layer whose weights are reused
    def call(self, x, mask=None):
        # undo the tied layer's bias, then apply its kernel transposed
        return K.dot(x - self.other_layer.b, K.transpose(self.other_layer.W))

Cheers
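
For concreteness, here is a minimal wiring sketch for that tied pair, assuming the Keras 1 functional API; the layer sizes and variable names are made up, and a complete layer would also override get_output_shape_for, since the transposed weights change the last dimension:

from keras.layers import Input, Dense
from keras.models import Model

inp = Input(shape=(4,))
encoder = Dense(8)                  # owns W (4x8) and b
hidden = encoder(inp)               # building encoder here ensures encoder.W exists below
decoder = DenseTranspose(encoder)   # the custom layer sketched above
output = decoder(hidden)            # maps dim 8 back to dim 4 with W transposed
model = Model(inp, output)          # a tiny tied-weight autoencoder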

@bstriner Where do you get these K.dot and K.transpose operations from?

import keras.backend as K

@mattdornfeld Hi, did you find a way to do this? Could you share your method?

Read through the code in files like core.py to see how the core layers work. K is the Keras backend and is used for pretty much all computations (transpose, dot, sin, etc.). K is a proxy for either Theano or TensorFlow, depending on which backend you are using. If you want to know more about what those functions do and how they work, refer to the Theano or TensorFlow documentation as appropriate.
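
For example, a couple of backend calls (a throwaway illustration; the shapes are arbitrary):

import numpy as np
from keras import backend as K

W = K.variable(np.random.rand(4, 8))      # a 4x8 weight matrix as a backend tensor
x = K.variable(np.random.rand(2, 4))      # a batch of 2 inputs
y = K.dot(x, W)                           # shape (2, 8), computed by Theano or TensorFlow
Wt = K.transpose(W)                       # shape (8, 4)
print(K.eval(y).shape, K.eval(Wt).shape)  # -> (2, 8) (8, 4)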

@buaaliyi note that with keras-2 that code would be self.other_layer.bias and self.other_layer.kernel. Any progress? If you have a simple layer and it seems like a common need you should push it to keras-contrib.
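
Concretely, the rough sketch above rewritten for the Keras 2 attribute names would look something like this (same caveats as before, i.e. untested and still missing an output-shape override):

from keras import backend as K
from keras.engine.topology import Layer

class DenseTranspose(Layer):
    def __init__(self, other_layer, **kwargs):
        super(DenseTranspose, self).__init__(**kwargs)
        self.other_layer = other_layer
    def call(self, x):
        # Keras 2 renamed Dense.W -> Dense.kernel and Dense.b -> Dense.bias
        return K.dot(x - self.other_layer.bias,
                     K.transpose(self.other_layer.kernel))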

@Lzc6996 what is your question exactly?

@Lzc6996

Here's the code. I did run into problems using Keras's load_model function with this type of layer, since load_model doesn't expect the __init__ function to have other_layer as an arg, but it works otherwise.

import tensorflow as tf
from keras import backend as K
from keras.engine import InputSpec
from keras.layers import Dense

class DenseTranspose(Dense):
    """
    A Keras Dense layer whose weights are set to the transpose of
    another layer's weights. Used for implementing BiDNNs.
    """
    def __init__(self, other_layer, **kwargs):
        super().__init__(other_layer.input_dim, **kwargs)
        self.other_layer = other_layer

    def build(self, input_shape):
        assert len(input_shape) >= 2
        input_dim = input_shape[-1]
        self.input_dim = input_dim
        self.input_spec = [InputSpec(dtype=K.floatx(),
                                     ndim='2+')]

        # reuse the other layer's kernel, transposed (not a new trainable weight)
        self.W = tf.transpose(self.other_layer.W)

        if self.bias:
            self.b = self.add_weight((self.output_dim,),
                                     initializer='zero',
                                     name='{}_b'.format(self.name),
                                     regularizer=self.b_regularizer,
                                     constraint=self.b_constraint)
        else:
            self.b = None

        if self.initial_weights is not None:
            self.set_weights(self.initial_weights)
            del self.initial_weights
        self.built = True
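
A minimal usage sketch for this subclass (layer sizes and names are illustrative); note that the encoder has to be applied, and therefore built, before DenseTranspose is constructed, so that other_layer.input_dim and other_layer.W exist:

from keras.layers import Input
from keras.models import Model

x_in = Input(shape=(4,))
enc = Dense(8)
h = enc(x_in)                # builds enc: enc.input_dim = 4 and enc.W now exist
dec = DenseTranspose(enc)    # output_dim becomes enc.input_dim = 4
x_out = dec(h)
model = Model(x_in, x_out)
model.compile(optimizer='adam', loss='mse')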

@mattdornfeld BTW, load_weights is way more reliable than load_model as soon as you start doing anything interesting. There are plenty of open issues regarding load_model, but I've never had a problem with load_weights.
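
Following that advice with the sketch above, the weights-only round trip looks roughly like this (the file name and the build_bidnn_model helper are hypothetical):

model.save_weights('bidnn_weights.h5')       # 'model' from the usage sketch above

# later: recreate the same architecture in code, then load only the weights,
# which avoids load_model having to deserialize the custom other_layer argument
fresh_model = build_bidnn_model()            # hypothetical helper that rebuilds the model
fresh_model.load_weights('bidnn_weights.h5')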

This is a very interesting thread. I am thinking of implementing a ladder network in Keras.

The problem is that the ladder network uses batch normalization, but the batch statistics are only updated on the clean path, so I need to create a frozen batch-normalization layer that shares its weights (moving_mean and moving_variance) with the standard BatchNormalization layer of the clean path.

Your trick could be a reasonably clean solution.
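
A minimal sketch of such a frozen layer, assuming Keras 2 attribute names on an already-built BatchNormalization layer (moving_mean, moving_variance, beta, gamma, epsilon); illustrative only:

from keras import backend as K
from keras.engine.topology import Layer

class FrozenBatchNorm(Layer):
    """Normalizes with another BatchNormalization layer's statistics
    without ever updating them."""
    def __init__(self, other_bn, **kwargs):
        super(FrozenBatchNorm, self).__init__(**kwargs)
        self.other_bn = other_bn  # assumed: a built keras.layers.BatchNormalization
    def call(self, x):
        bn = self.other_bn
        # inference-style normalization with the shared, frozen statistics
        return K.batch_normalization(x, bn.moving_mean, bn.moving_variance,
                                     bn.beta, bn.gamma, epsilon=bn.epsilon)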

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
