Keras: Bidirectional RNNs?

Created on 20 Jul 2015 · 13 comments · Source: keras-team/keras

I have an idea for adding bidirectional RNNs to Keras and I'm curious what the Keras devs think of it.

  • Add a Reverse layer which simply slices its input tensor along the timestep dimension (e.g. X_input[:, ::-1]). This would preserve masking and slice masks in reverse as well.
  • Add a Bidirectional class which takes an RNN class as a parameter and internally constructs the forward and backward instances, along with their merge. The backward instance can be Reverse(RNN(Reverse(x))).

How does that sound?
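
For concreteness, here is a minimal NumPy sketch of the slicing such a Reverse layer would perform on both the input tensor and its mask (illustration only, not an existing Keras layer):

import numpy as np

# toy batch: 2 sequences, 4 timesteps, 3 features
X = np.arange(2 * 4 * 3).reshape(2, 4, 3)
mask = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 0]], dtype=bool)

# what Reverse would do: flip the timestep axis of the tensor and its mask
X_rev = X[:, ::-1]
mask_rev = mask[:, ::-1]

print(X_rev[0, 0])   # the last timestep of the first sequence now comes first
print(mask_rev)      # the padding now leads instead of trails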

stale

Most helpful comment

Figured I'd add my 2 cents on this, using the functional API:

f_lstm = LSTM(n_lstm_dims)
b_lstm = LSTM(n_lstm_dims, go_backwards=True)
f_input = f_lstm(input)
b_input = b_lstm(input)
together = merge([f_input, b_input], mode='concat', concat_axis=1)

If return_sequences=True on the RNNs, change to concat_axis=2 so the forward and backward outputs are concatenated at each timestep.

However, due to the merge layer, this implementation doesn't let you use Masking. Is there a way to do concatenation that supports masking? This seems like a common enough use case.
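
For context, here is a fuller sketch of how that pattern could sit inside a model, assuming Keras 1.x where the functional merge is importable from keras.layers (the input shape and the final Dense layer are placeholders):

from keras.layers import Input, LSTM, Dense, merge
from keras.models import Model

n_lstm_dims = 64
inp = Input(shape=(20, 10))                         # 20 timesteps, 10 features
f_out = LSTM(n_lstm_dims)(inp)                      # forward pass over the sequence
b_out = LSTM(n_lstm_dims, go_backwards=True)(inp)   # same sequence, consumed in reverse
together = merge([f_out, b_out], mode='concat', concat_axis=1)
pred = Dense(1, activation='sigmoid')(together)
model = Model(input=inp, output=pred)
model.compile(optimizer='adam', loss='binary_crossentropy')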

All 13 comments

Add a Bidirectional class which takes an RNN class as a parameter and internally constructs the forward and backward instances, along with their merge.

This sounds reasonable. But couldn't this be achieved without the Reverse layer? I'm having trouble imagining situations outside of bidirectional RNNs where reversing time would be required.

Reversing time also came up as an (inexplicably) important transformation in Sequence to Sequence Learning with Neural Networks, so I thought it might be convenient to make it available more generally.

Use the "go_backwards" parameter of theano's scan() function can do the job.
Add "go_backwards" parameter to the init() of LSTM like:

def __init__(self, input_dim, output_dim=128, 
    init='glorot_uniform', inner_init='orthogonal', forget_bias_init='one',
    activation='tanh', inner_activation='hard_sigmoid',
    weights=None, truncate_gradient=-1, return_sequences=False, go_backwards=False):

And modify the get_output() function of LSTM as:

    [outputs, memories], updates = theano.scan(
        self._step, 
        sequences=[xi, xf, xo, xc, padded_mask],
        outputs_info=[
            T.unbroadcast(alloc_zeros_matrix(X.shape[1], self.output_dim), 1),
            T.unbroadcast(alloc_zeros_matrix(X.shape[1], self.output_dim), 1)
        ], 
        non_sequences=[self.U_i, self.U_f, self.U_o, self.U_c], 
        truncate_gradient=self.truncate_gradient,
        go_backwards=self.go_backwards
    )

Remember to add:
self.go_backwards = go_backwards
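
For intuition, here is a small NumPy sketch of what that flag changes, using a toy recurrence rather than the real LSTM step (all names below are illustrative only):

import numpy as np

def toy_rnn(X, W, U, go_backwards=False):
    # same step function either way; go_backwards only feeds the
    # timesteps to the recurrence in reverse order
    steps = X[::-1] if go_backwards else X
    h = np.zeros(U.shape[0])
    outputs = []
    for x_t in steps:
        h = np.tanh(np.dot(x_t, W) + np.dot(h, U))
        outputs.append(h)
    return np.array(outputs)

X = np.random.randn(5, 3)   # 5 timesteps, 3 features
W = np.random.randn(3, 4)
U = np.random.randn(4, 4)
forward = toy_rnn(X, W, U)
backward = toy_rnn(X, W, U, go_backwards=True)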

Good suggestion. Makes it really easy to implement bidirectional recurrent layers using a base layer.

Thanks @jedi00, I like the suggestion.

Do either you or @fchollet have any tips for writing a layer which itself contains a pair of layers? I can't find any examples of how to do this (e.g. what should be in self.params?). I suspect seeing the implementation of the upcoming TimeDistributed class will be illuminating.
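
Not an authoritative answer, but the usual pattern for a container layer in the old-style layer API seems to be: keep the inner layers as attributes and expose the concatenation of their params lists, so the optimizer updates all of them. A rough sketch, with hypothetical class and attribute names:

from keras.layers.core import Layer

class PairOfLayers(Layer):
    # hypothetical container that wraps two existing layers
    def __init__(self, layer_a, layer_b):
        super(PairOfLayers, self).__init__()
        self.layer_a = layer_a
        self.layer_b = layer_b
        # the container has to advertise the union of its children's
        # trainable parameters, otherwise they never receive gradient updates
        self.params = layer_a.params + layer_b.params

This only covers the parameter bookkeeping; the forward pass still has to be wired up in get_output().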

Have you looked at the bidirectional RNN implementation here:

https://github.com/hycis/bidirectional_RNN

It is built on keras.

@iskandr: I too have trouble with class inheritance of Keras layers: the Theano gradient chain gets broken, and I still can't figure out why. I've done the following simple trial: write another class which is just a wrapper around LSTM, like this:

class BLSTM(Recurrent):
    '''
        Bi-directional LSTM
        For more details, refer to LSTM()
    '''
    def __init__(self, input_dim, output_dim=128,
        init='glorot_uniform', inner_init='orthogonal', forget_bias_init='one',
        activation='tanh', inner_activation='hard_sigmoid',
        weights=None, truncate_gradient=-1, return_sequences=False):
        super(BLSTM,self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.truncate_gradient = truncate_gradient
        self.return_sequences = return_sequences
        self.init = initializations.get(init)
        self.inner_init = initializations.get(inner_init)
        self.forget_bias_init = initializations.get(forget_bias_init)
        self.activation = activations.get(activation)
        self.inner_activation = activations.get(inner_activation)
        self.weights = weights
        self.LSTM_forward  = LSTM(self.input_dim, self.output_dim, self.init, self.inner_init, self.forget_bias_init, self.activation,
                                  self.inner_activation, self.weights, self.truncate_gradient, self.return_sequences, False)
        self.params = self.LSTM_forward.params

    def get_output(self, train):
        forward_out  = self.LSTM_forward.get_output(train)
        return forward_out

    def get_config(self):
        return {"name":self.__class__.__name__,
            "input_dim":self.input_dim,
            "output_dim":self.output_dim,
            "init":self.init.__name__,
            "inner_init":self.inner_init.__name__,
            "forget_bias_init":self.forget_bias_init.__name__,
            "activation":self.activation.__name__,
            "inner_activation":self.inner_activation.__name__,
            "truncate_gradient":self.truncate_gradient,
            "return_sequences":self.return_sequences}

As you can see, this BLSTM class is nothing but a wrapper around LSTM. But this layer won't compile; it always raises "theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator".

I asked @fchollet about this problem via email weeks ago, but unfortunately got no reply.

Figured I'd add my 2 cents on this, using the functional API:

f_lstm = LSTM(n_lstm_dims)
b_lstm = LSTM(n_lstm_dims, go_backwards=True)
f_input = f_lstm(input)
b_input = b_lstm(input)
together = merge([f_input, b_input], mode='concat', concat_axis=1)

If return_sequences=True on the RNNs, change to concat_axis=2 so the forward and backward outputs are concatenated at each timestep.

However, due to the merge layer, this implementation doesn't let you use Masking. Is there a way to do concatenation that supports masking? This seems like a common enough use case.

The best would be to add mask merging support into the Merge layer. This is planned anyway. Does anybody want to give it a shot?
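
Purely as a sketch of what a mask-merging rule might look like for a concat merge, not existing Merge-layer behaviour (NumPy stands in for the backend): time-wise concatenation would concatenate the masks end to end, while feature-wise concatenation leaves the timesteps shared, so the incoming masks can be combined element-wise.

import numpy as np

def merge_concat_masks(masks, concat_axis=1):
    # hypothetical rule: ignore absent masks, then either lay the masks
    # end to end (time-wise concat) or OR them together (feature-wise concat)
    masks = [m for m in masks if m is not None]
    if not masks:
        return None
    if concat_axis == 1:
        return np.concatenate(masks, axis=1)
    combined = masks[0]
    for m in masks[1:]:
        combined = np.logical_or(combined, m)
    return combined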

Is there an issue thread relating to that? I've thought about trying it but I'm not sure what it would entail.

Are any assumptions made on the contents of a mask? Is it OK to concatenate [True, True, False, False] with [False, True, True]?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

