Keras: Attention Model Available!

Created on 24 Mar 2016 · 14 Comments · Source: keras-team/keras

Hi,

I implemented an attention model for textual entailment problems. Here is the code. It performs a bit worse than the paper, but works decently well. I hope this comes in handy for beginners in Keras like me.

Comments are welcome!

Shout-outs to @farizrahman4u, @fchollet, and @pasky for their help and patience in answering queries on GitHub.

All 14 comments

Awesome job.
A minor comment on L139:
https://github.com/shyamupa/snli-entailment/blob/master/amodel.py#L139
The TimeDistributedDense layer produces a 3D tensor of shape (batch_size, L, 1), so applying the softmax activation there gives an incorrect result: the last dimension has only a single unit, so the softmax outputs a constant 1 and the attention weights lose their meaning.
I think you can use TimeDistributedDense with a linear activation, then Flatten it (to get a 2D tensor), and apply the softmax afterwards.
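A minimal sketch of that suggested fix (hypothetical dimensions; Keras 1.x-era Sequential API, not the actual amodel.py code):

```python
from keras.models import Sequential
from keras.layers.core import TimeDistributedDense, Flatten, Activation

L, hidden_dim = 20, 150  # hypothetical sequence length and encoder output size

model = Sequential()
# one linear score per timestep -> (batch_size, L, 1)
model.add(TimeDistributedDense(1, activation='linear', input_shape=(L, hidden_dim)))
# flatten to (batch_size, L) so the softmax normalizes across timesteps,
# not across the single unit in the last dimension
model.add(Flatten())
model.add(Activation('softmax'))
```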

@ymcui Good catch! I tried your modification and am noticing some improvements. Thanks!

That's nice work!

What I got stuck on, however, when I was thinking about this is that in the paper they use two different RNNs in series, whereas you use only a single common RNN for both premise and hypothesis. I think that would probably require some small Keras modifications to allow an "initialize from node" mechanism.

True. I implemented what they call shared encoding; the difference between the two models is about 2 points in their experiments. Lasagne has a feature to initialize the hidden state, but writing this model there would lead to code bloat. Maybe something can be done for Keras RNNs too? :) @fchollet
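For anyone unfamiliar with the term, here is a minimal sketch of shared encoding (hypothetical dimensions; Keras functional API with layer sharing, which may differ from the actual amodel.py code): the same embedding and GRU are applied to both premise and hypothesis.

```python
from keras.layers import Input, Embedding, GRU

vocab_size, embed_dim, hidden_dim, max_len = 10000, 100, 150, 20  # hypothetical

premise = Input(shape=(max_len,), dtype='int32')
hypothesis = Input(shape=(max_len,), dtype='int32')

embed = Embedding(vocab_size, embed_dim)          # shared embedding, trained with the model
encoder = GRU(hidden_dim, return_sequences=True)  # single shared encoder for both inputs

premise_enc = encoder(embed(premise))        # (batch_size, max_len, hidden_dim)
hypothesis_enc = encoder(embed(hypothesis))  # (batch_size, max_len, hidden_dim)
# attention over premise_enc conditioned on the hypothesis, followed by a classifier,
# would complete the model
```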

Oh, I somehow missed that experiment. So this isn't that important. Nice!

Depends on what you mean by important (2% on that dataset is about 200 questions). Also note that I train the embeddings along with the model, while they fix them to word2vec/GloVe vectors.

@shyamupa The implementation of the Bi-GRU seems problematic ( https://github.com/shyamupa/snli-entailment/blob/master/amodel.py#L128). This is an old issue: #2074 #1725 #1703 #1674 #1432 #1282. Any plan to fix it officially? @fchollet

I see, I was not aware of this. I was using LSTMs earlier, but switched to GRUs because they are supposed to train faster. I hope LSTMs don't have the same issue...

The LSTMs have the same issue.

@DingKe I don't think there are plans to fix the go_backwards behavior, because it is consistent with the go_backwards behavior of Theano's scan; at least it won't be solved at the backend level. I think something needs to be done in the Recurrent class, however. I originally added go_backwards to the Recurrent class by simply wrapping the Theano scan keyword, but we need to fix this issue at least in the example.

My understanding is that this is mainly a matter of someone finding the time to submit a patch that flips go_backwards sequences in the Recurrent class frontends (after applying K.rnn), checks that the same is done for masks, and writes some test cases + docs. Just a little grunt work. I hoped to do it, but I can't seem to get around to attending properly to even a simpler PR that's already open... :(
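To make the ordering issue concrete, here is a purely conceptual NumPy illustration (not Keras internals; shapes are hypothetical) of the flip such a patch would apply after the backward scan:

```python
import numpy as np

# With go_backwards=True the scan visits timesteps in reverse, so the per-timestep
# outputs (and any mask) come back in reverse time order; the patch described above
# would flip them back so they line up with the input sequence.
outputs = np.random.rand(2, 5, 3)      # (batch, time, features), hypothetical
mask = np.ones((2, 5), dtype=bool)     # hypothetical sequence mask

outputs_aligned = outputs[:, ::-1, :]  # flip the time axis back to input order
mask_aligned = mask[:, ::-1]           # the mask must be flipped the same way
```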

Hi,

I'm trying to implement a similar attention model in Keras. Does the go_backwards bug still exist? If so, can someone give a small example of how to fix it?

Thanks.

Hello,
Is there any idea how to implement an attention mechanism with masking?

I've just started a project to collect all the possible information about attention with Keras:

https://github.com/philipperemy/keras-attention-mechanism

Check this out! It's still at an early stage. I'm currently working on it!
