Keras: What is the current state of Deep Bidirectional RNNs in Keras?

Created on 27 May 2016 · 4 comments · Source: keras-team/keras

Just a quick question: I can quickly build a single bi-directional RNN (BiRNN) layer using the functional API (and its example). My question is, how does one go about building a deep BiRNN given the current API, and how does this change if the RNN is an LSTM?

P.S. I have seen #1629. Is this the only way? I'm also referencing the architectures built here: https://arxiv.org/pdf/1312.6026.pdf.

Any advice would be great!

All 4 comments

Wouldn't this work? Note: this is untested.

from keras.layers import Input, Embedding, LSTM, merge, Dense, TimeDistributed
from keras.engine import Model

xin = Input(batch_shape=(batch_size, seq_size), dtype='int32')
# Embedding takes both input_dim and output_dim; vocab_size here stands for your vocabulary size
xemb = Embedding(vocab_size, embedding_size, mask_zero=True)(xin)

# note the plural: the argument is return_sequences, not return_sequence
rnn_fwd1 = LSTM(rnn_size, return_sequences=True)(xemb)
# go_backwards=True processes the sequence in reverse, so its output is in reversed time order;
# strictly it should be re-reversed before merging with the forward output
rnn_bwd1 = LSTM(rnn_size, return_sequences=True, go_backwards=True)(xemb)
rnn_bidir1 = merge([rnn_fwd1, rnn_bwd1], mode='concat')

# per-timestep softmax over the concatenated forward/backward features
predictions = TimeDistributed(Dense(output_class_size, activation='softmax'))(rnn_bidir1)

model = Model(input=xin, output=predictions)
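
To actually train it, something along these lines should work (again untested; X_train, Y_train, the optimizer, and the epoch count are just placeholder choices, and it assumes one-hot targets of shape (batch_size, seq_size, output_class_size)):

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# X_train: padded integer sequences, Y_train: one-hot labels per timestep (hypothetical names)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=10)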

I personally have found thinking about bi-directional RNNs in terms of predictions a bit tough, though, mostly because Viterbi and beam search are rendered tricky: any decision you make at time t influences the distributions at time t-1. So, if you are doing any sort of online sampling, this is strained. If you are doing argmax predictions, or something similar where you take the output as a whole, then it's not so bad.
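
For the argmax case, the whole output can be decoded in one shot (assuming a trained model like the one above; X_batch is just a placeholder name for a batch of padded inputs):

probs = model.predict(X_batch)   # shape (batch_size, seq_size, output_class_size)
labels = probs.argmax(axis=-1)   # shape (batch_size, seq_size), one class id per timestep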

Note, also, that you could make a 'deep' bi-directional RNN with a simple function. In reference to Figure 2 of the paper you linked, my example below is in line with type (d). If you wanted to change the internal structure of the RNN, as in (b) and (c), you would have to implement those yourself.

def BiDirectional(rnn_size):
    def func(some_input):
        rnn_fwd1 = LSTM(rnn_size, return_sequences=True)(some_input)
        rnn_bwd1 = LSTM(rnn_size, return_sequences=True, go_backwards=True)(some_input)
        rnn_bidir1 = merge([rnn_fwd1, rnn_bwd1], mode='concat')
        return rnn_bidir1
    return func

and then

rnn_bidir1 = BiDirectional(rnn_size)(xemb)
rnn_bidir2 = BiDirectional(rnn_size)(rnn_bidir1)

or even with a compose function

def compose(*layers):
    # apply the given layer functions right-to-left, like mathematical composition
    def func(x):
        out = x
        for layer in layers[::-1]:
            out = layer(out)
        return out
    return func

bidir_layers = tuple([BiDirectional(rnn_size) for _ in range(nb_rnn_layers)])
xout = compose(*bidir_layers)(xemb)
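
and then cap it off the same way as before (untested, reusing the names from the first snippet):

predictions = TimeDistributed(Dense(output_class_size, activation='softmax'))(xout)
model = Model(input=xin, output=predictions)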

Thanks @braingineer! I have been using a model similar to your example but was curious about improvements, so going deeper :). And yeah, your functions do align with (d). I'm working on an NER problem, so I take the argmax of the predictions and am not too worried about sampling.

Thanks again. I'll try it out and let you know.

But when I set 'return_sequences=True' in fwd and 'return_sequences=True' in bwd, errors happened:
it threw an Exception: Input 0 is incompatible with layer dense_4: expected ndim=2, found ndim=3.
When I left that out and set the LSTM layers like this:

rnn_fwd1 = LSTM(rnn_size,)(xemb)
rnn_bwd1 = LSTM(rnn_size,  go_backwards=True)(xemb)

it works well.
@braingineer, can you help me? Thanks a lot.
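
That exception usually means the 3-D sequence output (from return_sequences=True) is being fed into a plain Dense layer, which expects 2-D input. Keeping return_sequences=True and wrapping the Dense in TimeDistributed, as in the snippet above, is the usual fix (untested sketch with the same names):

rnn_fwd1 = LSTM(rnn_size, return_sequences=True)(xemb)
rnn_bwd1 = LSTM(rnn_size, return_sequences=True, go_backwards=True)(xemb)
rnn_bidir1 = merge([rnn_fwd1, rnn_bwd1], mode='concat')

# TimeDistributed applies the Dense independently to every timestep of the 3-D tensor
predictions = TimeDistributed(Dense(output_class_size, activation='softmax'))(rnn_bidir1)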

imdb-bidirectional example is not working #3698
