Keras: What is the current state of Deep Bidirectional RNNs in Keras?

Created on 27 May 2016 · 4 comments · Source: keras-team/keras

Just a quick question: I can quickly build a single bi-directional RNN (BiRNN) layer using the functional API (and its example). My question is, how does one go about building a deep BiRNN given the current API, and how does this change if the RNN is an LSTM?

P.S. I have seen #1629. Is this the only way? I'm also referencing the architectures built here: https://arxiv.org/pdf/1312.6026.pdf.

Any advice would be great!

All 4 comments

Wouldn't this work? Note: this is untested.

from keras.layers import Input, Embedding, LSTM, merge, Dense, TimeDistributed
from keras.engine import Model

xin = Input(batch_shape=(batch_size, seq_size), dtype='int32')
# Embedding takes both input_dim and output_dim; vocab_size here stands for your vocabulary size
xemb = Embedding(vocab_size, embedding_size, mask_zero=True)(xin)

# note the plural: the argument is return_sequences, not return_sequence
rnn_fwd1 = LSTM(rnn_size, return_sequences=True)(xemb)
# go_backwards=True processes the sequence in reverse, so its output is in reversed time order;
# strictly it should be re-reversed before merging with the forward output
rnn_bwd1 = LSTM(rnn_size, return_sequences=True, go_backwards=True)(xemb)
rnn_bidir1 = merge([rnn_fwd1, rnn_bwd1], mode='concat')

# per-timestep softmax over the concatenated forward/backward features
predictions = TimeDistributed(Dense(output_class_size, activation='softmax'))(rnn_bidir1)

model = Model(input=xin, output=predictions)
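
To actually train it, something along these lines should work (again untested; X_train, Y_train, the optimizer, and the epoch count are just placeholder choices, and it assumes one-hot targets of shape (batch_size, seq_size, output_class_size)):

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# X_train: padded integer sequences, Y_train: one-hot labels per timestep (hypothetical names)
model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=10)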

I personally have found thinking about bi-directional RNNs in terms of predictions a bit tough, though, mostly because Viterbi and beam search are rendered tricky: any decision you make at time t influences the distributions at time t-1. So, if you are doing any sort of online sampling, this is strained. If you are doing argmax predictions, or something similar where you take the output as a whole, then it's not so bad.
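
For the argmax case, the whole output can be decoded in one shot (assuming a trained model like the one above; X_batch is just a placeholder name for a batch of padded inputs):

probs = model.predict(X_batch)   # shape (batch_size, seq_size, output_class_size)
labels = probs.argmax(axis=-1)   # shape (batch_size, seq_size), one class id per timestep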

Note, also, that you could make a 'deep' bi-directional RNN with a simple function. In reference to Figure 2 of the paper you linked, my example below is in line with type (d). If you wanted to change the internal structure of the RNN, as in (b) and (c), you would have to implement those yourself.

def BiDirectional(rnn_size):
    def func(some_input):
        rnn_fwd1 = LSTM(rnn_size, return_sequences=True)(some_input)
        rnn_bwd1 = LSTM(rnn_size, return_sequences=True, go_backwards=True)(some_input)
        rnn_bidir1 = merge([rnn_fwd1, rnn_bwd1], mode='concat')
        return rnn_bidir1
    return func

and then

rnn_bidir1 = BiDirectional(rnn_size)(xemb)
rnn_bidir2 = BiDirectional(rnn_size)(rnn_bidir1)

or even with a compose function

def compose(*layers):
    # apply the given layer functions right-to-left, like mathematical composition
    def func(x):
        out = x
        for layer in layers[::-1]:
            out = layer(out)
        return out
    return func

bidir_layers = tuple([BiDirectional(rnn_size) for _ in range(nb_rnn_layers)])
xout = compose(*bidir_layers)(xemb)
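
and then cap it off the same way as before (untested, reusing the names from the first snippet):

predictions = TimeDistributed(Dense(output_class_size, activation='softmax'))(xout)
model = Model(input=xin, output=predictions)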

Thanks @braingineer! I have been using a model similar to your example but was curious about improvements, so going deeper :). And yeah, your functions do align with (d). I'm working on an NER problem, so I take the argmax of the predictions and am not too worried about sampling.

Thanks again. I'll try it out and let you know.

But when I set 'return_sequences=True' in fwd and 'return_sequences=True' in bwd, errors happened:
it threw an Exception: Input 0 is incompatible with layer dense_4: expected ndim=2, found ndim=3.
When I left that out and set the LSTM layers like this:

rnn_fwd1 = LSTM(rnn_size,)(xemb)
rnn_bwd1 = LSTM(rnn_size,  go_backwards=True)(xemb)

it works well.
@braingineer, can you help me? Thanks a lot.
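
That exception usually means the 3-D sequence output (from return_sequences=True) is being fed into a plain Dense layer, which expects 2-D input. Keeping return_sequences=True and wrapping the Dense in TimeDistributed, as in the snippet above, is the usual fix (untested sketch with the same names):

rnn_fwd1 = LSTM(rnn_size, return_sequences=True)(xemb)
rnn_bwd1 = LSTM(rnn_size, return_sequences=True, go_backwards=True)(xemb)
rnn_bidir1 = merge([rnn_fwd1, rnn_bwd1], mode='concat')

# TimeDistributed applies the Dense independently to every timestep of the 3-D tensor
predictions = TimeDistributed(Dense(output_class_size, activation='softmax'))(rnn_bidir1)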

imdb-bidirectional example is not working #3698
