Hi, I'm trying to implement the attention mechanism in my project. My sequences have varying lengths and I'm using bucketing to handle that, so I define the LSTM input shape as (None, None, features). Currently, every attention implementation for Keras that I can find requires a fixed number of timesteps declared in the input shape. Theoretically, attention should work fine with varying lengths, since it's just a softmax regardless of the input length. Is there any way to make a "dynamic attention" layer, just like the RNN layers, that can accept (None, None, features) as the input shape? Thanks.
It is definitely possible and should be very simple. Would you mind creating a Stack Overflow question and posting it here? Just to keep the issues page free of implementation questions.
@tRosenflanz Thanks for the info. I'll close the issue if anyone comes up with a solution and posts some code. Really need help.
Okay, I would prefer to answer it on Stack Overflow, but:
```python
from keras import backend as K
from keras import layers as L
from keras.models import Model

def ModelCreate(features):
    inp = L.Input((None, features))                      # variable number of timesteps
    lstm = L.CuDNNLSTM(128, return_sequences=True)(inp)
    attention = L.TimeDistributed(L.Dense(1))(lstm)      # one score per timestep
    attention = L.Softmax(axis=1)(attention)             # normalize scores over the time axis
    context = L.Multiply()([attention, lstm])            # weight the LSTM states
    out = L.Lambda(lambda x: K.sum(x, axis=1))(context)  # sum over the time axis
    model = Model(inputs=inp, outputs=[out])
    return model
```
Should do it. No need to specify the full batch shape as (None, None, features): the first None (the batch dimension) is implicit.
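A quick sanity check of the sketch above, assuming a GPU is available for CuDNNLSTM and the same imports as in that block; the same model accepts batches with different numbers of timesteps:

```python
import numpy as np

model = ModelCreate(features=8)
short = np.random.rand(4, 10, 8).astype("float32")  # 10 timesteps
long_ = np.random.rand(4, 50, 8).astype("float32")  # 50 timesteps
print(model.predict(short).shape)  # (4, 128)
print(model.predict(long_).shape)  # (4, 128)
```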
@tRosenflanz Thanks. Will close this issue and open a Stack Overflow question.
Hi, @tRosenflanz I was using your attention code and it worked perfectly. Just one small question: when predicting labels for new data, I padded all the sequences to the same length, and I found that the results differed with and without padding, which was not a problem when there was no attention layer. Can the code be modified to make the results identical with and without padding? Thanks.
I think it is only fair to expect differences if your training data was bucketed and your test data was not. Why not bucket the test data the same way you bucketed the training data?
Emmm, padding was easier for preparing the data, though. You're right, I should be consistent between training and testing. Thanks.
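For what it's worth, the discrepancy comes from the softmax itself: pad timesteps still receive nonzero attention weight, so they leak into the summed context vector. A minimal sketch of one possible fix, masking the scores before the softmax. Everything below the original code is an assumption, in particular that pad steps are all-zero feature vectors and that padding is appended at the end (so the LSTM states at the real timesteps are unchanged):

```python
from keras import backend as K
from keras import layers as L
from keras.models import Model

def ModelCreateMasked(features):
    inp = L.Input((None, features))
    # 1 where the timestep has any nonzero feature, 0 on all-zero pad steps
    # (assumes zero-padding; a real pipeline might pass the mask in explicitly)
    mask = L.Lambda(lambda x: K.cast(
        K.any(K.not_equal(x, 0.0), axis=-1, keepdims=True), "float32"))(inp)
    lstm = L.CuDNNLSTM(128, return_sequences=True)(inp)
    score = L.TimeDistributed(L.Dense(1))(lstm)
    # push pad scores toward -inf so softmax assigns them ~0 weight
    score = L.Lambda(lambda t: t[0] + (1.0 - t[1]) * -1e9)([score, mask])
    attention = L.Softmax(axis=1)(score)
    context = L.Multiply()([attention, lstm])
    out = L.Lambda(lambda x: K.sum(x, axis=1))(context)
    return Model(inputs=inp, outputs=[out])
```

With post-padding, the pad steps get essentially zero attention weight, so the padded and unpadded versions of the same sequence should then produce matching context vectors.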
Hi, @tRosenflanz another question: it seems that [out] in your code doesn't contain timesteps. Can I change the code to produce a sequence output, too? Thanks. Have a good weekend.
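This question goes unanswered in the thread, but one possible reading, as a hedged sketch: if the goal is the attention-weighted sequence rather than its sum over time, you can skip the summing Lambda and return the per-timestep context directly. The function name is hypothetical:

```python
from keras import layers as L
from keras.models import Model

def ModelCreateSeq(features):
    inp = L.Input((None, features))
    lstm = L.CuDNNLSTM(128, return_sequences=True)(inp)
    attention = L.TimeDistributed(L.Dense(1))(lstm)
    attention = L.Softmax(axis=1)(attention)
    context = L.Multiply()([attention, lstm])    # (batch, timesteps, 128)
    return Model(inputs=inp, outputs=[context])  # keep the time axis
```

Note this returns each LSTM state scaled by its attention weight; if instead you want a fully attended context vector at every output step, you would need a different scoring scheme (e.g. self-attention), which is beyond what this thread covers.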
Hi, @tRosenflanz how do you get `L.Softmax`?
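Presumably the `L` alias comes from importing the Keras layers module; `Softmax` has been available as a layer there since around Keras 2.1.3 (under `keras.layers.advanced_activations`, exported as `keras.layers.Softmax`):

```python
from keras import layers as L  # then L.Softmax(axis=1) is the softmax layer used above
```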