Hi, I'm trying to implement the attention mechanism in my project. My sequences have varying lengths and I'm using bucketing to handle that, so I define the LSTM input shape as (None, None, features). Currently, every attention implementation for Keras that I can find requires a fixed number of timesteps declared in the input shape. Theoretically, attention should work fine with varying lengths, since it's just a softmax regardless of the input length. Is there any way to make a "dynamic attention" layer, just like the RNN layers, that can accept (None, None, features) as the input shape? Thanks.
It is definitely possible and should be very simple. Would you mind creating a Stack Overflow question and posting it here? Just to keep the issues page free of implementation questions.
@tRosenflanz Thanks for the info. I'll close the issue if anyone comes up with a solution and posts some code. Really need help.
Okay, I would prefer to answer it on Stack Overflow, but:
```python
from keras import backend as K
from keras import layers as L
from keras.models import Model

def ModelCreate(features):
    inp = L.Input((None, features))                      # variable number of timesteps
    lstm = L.CuDNNLSTM(128, return_sequences=True)(inp)
    attention = L.TimeDistributed(L.Dense(1))(lstm)      # one score per timestep
    attention = L.Softmax(axis=1)(attention)             # normalize scores over the time axis
    context = L.Multiply()([attention, lstm])            # weight the LSTM states
    out = L.Lambda(lambda x: K.sum(x, axis=1))(context)  # sum over the time axis
    model = Model(inputs=inp, outputs=[out])
    return model
```
Should do it. No need to specify the full batch shape as (None, None, features): the first None (the batch dimension) is implicit.
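A quick sanity check of the sketch above, assuming a GPU is available for CuDNNLSTM and the same imports as in that block; the same model accepts batches with different numbers of timesteps:

```python
import numpy as np

model = ModelCreate(features=8)
short = np.random.rand(4, 10, 8).astype("float32")  # 10 timesteps
long_ = np.random.rand(4, 50, 8).astype("float32")  # 50 timesteps
print(model.predict(short).shape)  # (4, 128)
print(model.predict(long_).shape)  # (4, 128)
```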
@tRosenflanz Thanks. Will close this issue and open a Stack Overflow question.
Hi, @tRosenflanz I was using your attention code and it worked perfectly. Just one small question: when predicting labels for new data, I padded all the sequences to the same length, and I found that the results differed with and without padding, which was not a problem when there was no attention layer. Can the code be modified to make the results identical with and without padding? Thanks.
I think it is only fair to expect differences if your training data was bucketed and your test data was not. Why not bucket the test data the same way you bucketed the training data?
Emmm, padding was easier for preparing the data, though. You're right, I should be consistent between training and testing. Thanks.
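For what it's worth, the discrepancy comes from the softmax itself: pad timesteps still receive nonzero attention weight, so they leak into the summed context vector. A minimal sketch of one possible fix, masking the scores before the softmax. Everything below the original code is an assumption, in particular that pad steps are all-zero feature vectors and that padding is appended at the end (so the LSTM states at the real timesteps are unchanged):

```python
from keras import backend as K
from keras import layers as L
from keras.models import Model

def ModelCreateMasked(features):
    inp = L.Input((None, features))
    # 1 where the timestep has any nonzero feature, 0 on all-zero pad steps
    # (assumes zero-padding; a real pipeline might pass the mask in explicitly)
    mask = L.Lambda(lambda x: K.cast(
        K.any(K.not_equal(x, 0.0), axis=-1, keepdims=True), "float32"))(inp)
    lstm = L.CuDNNLSTM(128, return_sequences=True)(inp)
    score = L.TimeDistributed(L.Dense(1))(lstm)
    # push pad scores toward -inf so softmax assigns them ~0 weight
    score = L.Lambda(lambda t: t[0] + (1.0 - t[1]) * -1e9)([score, mask])
    attention = L.Softmax(axis=1)(score)
    context = L.Multiply()([attention, lstm])
    out = L.Lambda(lambda x: K.sum(x, axis=1))(context)
    return Model(inputs=inp, outputs=[out])
```

With post-padding, the pad steps get essentially zero attention weight, so the padded and unpadded versions of the same sequence should then produce matching context vectors.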
Hi, @tRosenflanz another question: it seems that [out] in your code doesn't contain timesteps. Can I change the code to produce a sequence output, too? Thanks. Have a good weekend.
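This question goes unanswered in the thread, but one possible reading, as a hedged sketch: if the goal is the attention-weighted sequence rather than its sum over time, you can skip the summing Lambda and return the per-timestep context directly. The function name is hypothetical:

```python
from keras import layers as L
from keras.models import Model

def ModelCreateSeq(features):
    inp = L.Input((None, features))
    lstm = L.CuDNNLSTM(128, return_sequences=True)(inp)
    attention = L.TimeDistributed(L.Dense(1))(lstm)
    attention = L.Softmax(axis=1)(attention)
    context = L.Multiply()([attention, lstm])    # (batch, timesteps, 128)
    return Model(inputs=inp, outputs=[context])  # keep the time axis
```

Note this returns each LSTM state scaled by its attention weight; if instead you want a fully attended context vector at every output step, you would need a different scoring scheme (e.g. self-attention), which is beyond what this thread covers.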
Hi, @tRosenflanz how do you get `L.Softmax`?
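Presumably the `L` alias comes from importing the Keras layers module; `Softmax` has been available as a layer there since around Keras 2.1.3 (under `keras.layers.advanced_activations`, exported as `keras.layers.Softmax`):

```python
from keras import layers as L  # then L.Softmax(axis=1) is the softmax layer used above
```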