I want to stack two bidirectional RNNs and I was wondering what the correct way to do it is. My main question is what data each of the deeper RNNs reads. To be more specific:
Let's say that we have two RNNs, each one with 64 cells; let's call them F1 and B1 (F = forward, B = backward). Let's say that for our problem we just want to create a dense representation of our input. In the case of bidirectional RNNs, we take the output from the last timestep of each one and concatenate the two to get a vector of size 128.
output = Bidirectional(GRU(64))(input)
Now, in order to get a higher-level representation, I want to stack another bidirectional layer on top, again with each RNN having 64 cells: F2 and B2. Is this correct?
output1 = Bidirectional(GRU(64, return_sequences=True))(input)
output2 = Bidirectional(GRU(64))(output1)
This example runs just fine. But is it the _correct way_ to do it?
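For clarity, here is a minimal sketch of the shapes this stack produces (the input sizes are just toy numbers I made up for illustration):

```python
from keras.layers import Input, GRU, Bidirectional
from keras.models import Model

inp = Input(shape=(10, 8))  # toy example: 10 timesteps, 8 features per step

# first bidirectional layer: 64 forward + 64 backward units, concatenated per timestep
seq = Bidirectional(GRU(64, return_sequences=True))(inp)  # (None, 10, 128)

# second bidirectional layer: only the last forward and last backward states, concatenated
vec = Bidirectional(GRU(64))(seq)  # (None, 128)

Model(inp, vec).summary()
```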
Question 1 - How exactly does the previous example work? I want to understand what exactly the input of the second forward RNN, F2, is.
A) Is it the output(s) of just F1?
B) Is it the concatenation of the outputs of both F1 and B1?
Which one is it?
Question 2 - Is there even a correct way (A or B)? Are there scenarios where both make sense?
Q1: It's B), since the Bidirectional wrapper (by default) returns the concatenation of F1 and B1.
Q2: I don't think it makes sense to speak of a "correct way" when talking about neural network architectures, but FWIW, I would have interpreted "a stack of bi-RNNs" to refer to option B).
_Careful, personal intuition follows:_ I believe that constructing it like option A) would lead to a less powerful model, since you would only combine the information from the forward and backward passes at the very end, for the output. With option B), in contrast, the second RNN of the stack can already make use of both forward and backward information when calculating its own representations.
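To make the default explicit: the `Bidirectional` wrapper takes a `merge_mode` argument, and `'concat'` is simply its default value. A small sketch with toy shapes:

```python
from keras.layers import Input, GRU, Bidirectional

inp = Input(shape=(10, 8))  # toy shape: 10 timesteps, 8 features

# default: per-timestep concatenation of F1 and B1 -> (None, 10, 128)
concat = Bidirectional(GRU(64, return_sequences=True), merge_mode='concat')(inp)

# element-wise sum instead of concatenation -> (None, 10, 64)
summed = Bidirectional(GRU(64, return_sequences=True), merge_mode='sum')(inp)

# no merging: a list [forward_outputs, backward_outputs], each (None, 10, 64)
fwd, bwd = Bidirectional(GRU(64, return_sequences=True), merge_mode=None)(inp)
```

So if you really wanted something like option A), `merge_mode=None` gives you the two directions separately, and you could stack a forward RNN on `fwd` and a backward RNN on `bwd` yourself.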
@mbollmann Thanks! I needed to know how Keras works in order to think more clearly about the design of my architecture. I also tried this:
forw = GRU(units, return_sequences=True)(_input)
forw = GRU(units)(forw)
back = GRU(units, return_sequences=True, go_backwards=True)(_input)
back = GRU(units, go_backwards=True)(back)
output = merge([forw, back], mode='concat')
and it seems that your intuition is correct, as I got inferior performance compared to simply stacking two Bidirectional layers.
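(For newer Keras versions, where the old `merge(..., mode='concat')` helper no longer exists, the same option-A stack would look roughly like this; sizes are toy numbers:)

```python
from keras.layers import Input, GRU, concatenate
from keras.models import Model

_input = Input(shape=(10, 8))  # toy shape
units = 64

forw = GRU(units, return_sequences=True)(_input)
forw = GRU(units)(forw)

back = GRU(units, return_sequences=True, go_backwards=True)(_input)
back = GRU(units, go_backwards=True)(back)

# keras.layers.concatenate replaces merge([...], mode='concat')
output = concatenate([forw, back])  # (None, 128)
model = Model(_input, output)
```

Note that this keeps the same `go_backwards` stacking, so the ordering issue raised further down in this thread applies here as well.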
And why not this?
rnn = GRU(units, return_sequences=True)(_input)
rnn = GRU(units, return_sequences=True, go_backwards=True)(rnn)
rnn = GRU(units, return_sequences=True)(rnn)  # return_sequences=True is needed here so the next GRU receives a 3D input
rnn = GRU(units, go_backwards=True)(rnn)
It is something I see quite often.
Hi, in my case your code doesn't seem to work: the second backward layer returns all zeros. In my results, the first backward layer returns its output sequence in reversed time order. So I think we have to reverse that output before passing it to the second backward layer (and reverse its output again only if it returns sequences), like the code below:
back = GRU(units, return_sequences=True, go_backwards=True)(_input)
back = layers.Lambda(lambda x: K.reverse(x, axes=-2))(back)  # reverse the backward GRU's outputs back to the original time order
back = GRU(units, go_backwards=True)(back)
# back = layers.Lambda(lambda x: K.reverse(x, axes=-2))(back)  # only needed if the layer above uses return_sequences=True
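Putting it all together, here is how I understand the corrected version as a self-contained sketch (toy sizes; since the last backward layer only returns its final state, no reverse is needed after it):

```python
from keras import backend as K
from keras.layers import Input, GRU, Lambda, concatenate
from keras.models import Model

_input = Input(shape=(10, 8))  # toy shape: 10 timesteps, 8 features
units = 64

# forward stack
forw = GRU(units, return_sequences=True)(_input)
forw = GRU(units)(forw)

# backward stack
back = GRU(units, return_sequences=True, go_backwards=True)(_input)  # outputs come back in reversed time order
back = Lambda(lambda x: K.reverse(x, axes=-2))(back)                 # restore the original time order
back = GRU(units, go_backwards=True)(back)                           # returns only its final state

output = concatenate([forw, back])  # (None, 128)
model = Model(_input, output)
```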