Keras: RNN attention with varying input length

Created on 14 Mar 2018 · 9 comments · Source: keras-team/keras

Hi, I'm trying to implement the attention mechanism in my project. However, my sequences have varying lengths and I'm using bucketing to handle this, so I define the LSTM input shape as (None, None, features). Currently, every attention implementation for Keras that I have found requires a fixed number of timesteps declared in the input shape. In theory, attention should work fine with varying lengths, since it is just a softmax over the timesteps regardless of the input length. Is there any way to make a "dynamic attention" layer, just like the RNN layer, that can accept (None, None, features) as the input shape? Thanks.
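For context, here is a minimal bucketing sketch. It assumes `sequences` is a Python list of 2-D NumPy arrays of shape (timesteps, features); the helper name `make_buckets` is made up for illustration.

    import numpy as np

    def make_buckets(sequences, bucket_size=32):
        """Sort variable-length sequences by length and pad only within each bucket."""
        order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))
        buckets = []
        for start in range(0, len(order), bucket_size):
            idx = order[start:start + bucket_size]
            max_len = max(len(sequences[i]) for i in idx)          # longest sequence in this bucket
            batch = np.zeros((len(idx), max_len, sequences[0].shape[-1]))
            for row, i in enumerate(idx):
                batch[row, :len(sequences[i])] = sequences[i]      # zero-pad at the end
            buckets.append(batch)
        return buckets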

All 9 comments

It is definitely possible and should be very simple. Would you mind creating a Stack Overflow question and posting it here, just to keep the issues page clear of implementation questions?

@tRosenflanz Thanks for the info. I'll close the issue once someone comes up with a solution and posts some code. Really need help here.

Okay, I would prefer to answer it on Stack Overflow, but:

    from keras import backend as K
    from keras import layers as L
    from keras.models import Model

    def ModelCreate(features):
        inp = L.Input((None, features))                       # variable number of timesteps
        lstm = L.CuDNNLSTM(128, return_sequences=True)(inp)   # hidden state for every timestep
        attention = L.TimeDistributed(L.Dense(1))(lstm)       # one score per timestep
        attention = L.Softmax(axis=1)(attention)              # normalise scores over the time axis
        context = L.Multiply()([attention, lstm])             # weight each timestep's hidden state
        out = L.Lambda(lambda x: K.sum(x, axis=1))(context)   # weighted sum -> fixed-size context vector
        model = Model(inputs=inp, outputs=[out])
        return model

Should do it. No need to specify the full batch shape as (None, None, features) - the first None (the batch dimension) is implicit.
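A quick usage sketch under the same assumptions (the arrays `x_short` and `x_long` are made up, just to show that batches with different timestep counts go through the same model; note that CuDNNLSTM requires a GPU build of TensorFlow):

    import numpy as np

    features = 40                               # assumed feature dimension
    model = ModelCreate(features)

    x_short = np.random.rand(8, 50, features)   # batch of 8 sequences with 50 timesteps
    x_long = np.random.rand(8, 120, features)   # batch of 8 sequences with 120 timesteps
    print(model.predict(x_short).shape)         # (8, 128)
    print(model.predict(x_long).shape)          # (8, 128)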

@tRosenflanz Thanks. Will close this issue and post a Stack Overflow question.

Hi @tRosenflanz, I was using your attention code and it worked perfectly. Just one small question: when I was predicting labels for new data, I padded all the sequences to the same length, and I found that the results were different with and without padding, which was not a problem when there was no attention layer. I wonder if the code can be modified to make the results identical with and without padding. Thanks.

I think it is only fair to expect differences if your training data was bucketed and your test data was not - why not do the same thing you did for training and bucket the test data as well?

Hmm, padding made it easier to prepare the data, though. But you are right, I should be consistent between training and testing. Thanks.
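If padding at prediction time is still preferred, one common workaround (not part of the answer above, just a sketch) is to mask the padded timesteps before the softmax so they get effectively zero attention weight. Here the mask is assumed to come from a second input of shape (None, 1) per timestep, with 1.0 for real timesteps and 0.0 for padding:

    def ModelCreateMasked(features):
        inp = L.Input((None, features))
        mask = L.Input((None, 1))                              # 1.0 for real timesteps, 0.0 for padding
        lstm = L.CuDNNLSTM(128, return_sequences=True)(inp)
        scores = L.TimeDistributed(L.Dense(1))(lstm)
        # push padded positions to a large negative value so softmax assigns them ~0 weight
        scores = L.Lambda(lambda t: t[0] + (1.0 - t[1]) * -1e9)([scores, mask])
        attention = L.Softmax(axis=1)(scores)
        context = L.Multiply()([attention, lstm])
        out = L.Lambda(lambda x: K.sum(x, axis=1))(context)
        return Model(inputs=[inp, mask], outputs=[out])

This only matches the unpadded result when the padding is appended at the end, since a forward LSTM's states at the real timesteps are not affected by later padded steps.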

Hi @tRosenflanz, another question: it seems that [out] in your code doesn't contain timesteps. Can I change the code to have a sequence output, too? Thanks. Have a good weekend.
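One way to get a per-timestep output as well (a sketch, not from the thread): expose the weighted sequence `context`, which still has the time axis, alongside the pooled vector:

    def ModelCreateSeq(features):
        inp = L.Input((None, features))
        lstm = L.CuDNNLSTM(128, return_sequences=True)(inp)
        attention = L.TimeDistributed(L.Dense(1))(lstm)
        attention = L.Softmax(axis=1)(attention)
        context = L.Multiply()([attention, lstm])                # (batch, timesteps, 128): sequence output
        pooled = L.Lambda(lambda x: K.sum(x, axis=1))(context)   # (batch, 128): pooled output
        return Model(inputs=inp, outputs=[context, pooled])      # expose both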

Hi @tRosenflanz, how do you get L.Softmax?
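For what it's worth, Softmax is available as a layer in keras.layers from Keras 2.1.3 onward (the `L` alias above presumably comes from `from keras import layers as L`). On older versions, a Lambda built from basic backend ops can stand in, e.g.:

    # manual softmax over the time axis (axis=1) using basic backend ops;
    # subtracting the per-sequence max first keeps it numerically stable
    attention = L.Lambda(
        lambda x: K.exp(x - K.max(x, axis=1, keepdims=True))
                  / K.sum(K.exp(x - K.max(x, axis=1, keepdims=True)), axis=1, keepdims=True)
    )(attention)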
