Can we see your implementation?
@fchollet declared a hold on all PRs that do not address the backend abstraction. We will be working solely on getting Keras with TensorFlow + Theano working and up to date. So if we accept PRs to master right now, we will have to change them later anyway. I recommend you wait a bit, because there may be changes (although very small, maybe a single character) in the way we write layers in Keras. For now, I recommend developing your model in parallel in your own GitHub repo. Also, if you can get help from @farizrahman4u, I'm sure the result will be awesome.
@farizrahman4u Here is the attention layer. It only works for Graph though.
It requires a small hack in the core layer, and the naming convention is a bit off, for which I need help from the Keras community. It will be much better in a couple of commits.
An encoder-decoder with attention would be like this:
model = Graph()
model.add_input(name='input', input_shape=(None, len(chars)))
model.add_node(RNN(128), name='encoder_rnn', input='input')
model.add_node(RepeatVector(MAXLEN), name='recurrent_context', input='encoder_rnn')
model.add_node(RNN(256, return_sequences=True), name='encoder_context', input='input')
model.add_node(TimeDistributedAttention(prev_dim=128, att_dim=64, return_sequences=True), name='attention', inputs=['encoder_context', 'recurrent_context'], merge_mode='join_att')
model.add_node(TimeDistributedDense(len(chars)), name='tdd', input='attention')
model.add_node(Activation('softmax'), name='softmax', input='tdd')
model.add_output(name='output', input='softmax')
Or a visual-attention model:
image_model = Graph()
image_model.add_input(name='input', input_shape=Ximages[0].shape)
image_model.add_node(Convolution2D(12, 3, 3, border_mode='full'), name='c1', input='input')
image_model.add_node(Activation('relu'), name='a1', input='c1')
image_model.add_node(Convolution2D(12, 3, 3), name='c2', input='a1')
image_model.add_node(Activation('relu'), name='a2', input='c2')
image_model.add_node(MaxPooling2D(pool_size=(2, 2)), name='p1', input='a2')
image_model.add_node(Convolution2D(10, 3, 3, border_mode='full'), name='c3', input='p1')
image_model.add_node(Activation('relu'), name='a3', input='c3')
image_model.add_node(Convolution2D(10, 3, 3), name='c4', input='a3')
image_model.add_node(Activation('relu'), name='a4', input='c4')
image_model.add_node(PreAttention(), name='pre_attention', input='c4')
image_model.add_node(DenseAttention(att_dim=128), name='dense_attention', input='pre_attention')
image_model.add_node(Dense(answer_size), name='d', input='dense_attention')
image_model.add_node(Activation('softmax'), name='softmax', input='d')
@wolet You should be using the LambdaMerge layer instead of the 'hack'. That way, this could be merged easily into Keras without changing the core layers (after we are done with TensorFlow, of course), and it would work seamlessly in both Sequential and Graph models.
@farizrahman4u I did not know about LambdaMerge, thanks for pointing it out!
You could also see #1051
@farizrahman4u I could not find a way to use LambdaMerge in Graph models. Would you mind giving me a simple example?
(Not tested)
def func(X):
    # your merge function here. X is a list of input tensors
    # this function should output the merged tensor
    pass

def output_shape(shapes):
    # shapes = list of output shapes of the input tensors
    # this function should output the shape of the merged tensor
    pass
input1 = Dense(....)
input2 = Dense(....)
lambda_merge = LambdaMerge([input1, input2], func, output_shape)
graph = Graph()
graph.add_input(input1, name='input1')
graph.add_input(input2, name='input2')
graph.add_node(lambda_merge, name='lambda_merge')
graph.add_node(Dense(....), name='dense1', input='lambda_merge')
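For concreteness, func and output_shape might be filled in along these lines, e.g. for a simple elementwise-sum merge (my own untested sketch, assuming both inputs have the same shape):
def func(X):
    # X is a list of input tensors; merge them by elementwise addition
    return X[0] + X[1]

def output_shape(shapes):
    # shapes is a list of the input shapes; an elementwise sum keeps the shape of either input
    return shapes[0]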
@wolet do you plan to convert this code to the generic Keras backend (which supports both TensorFlow and Theano)? I am going to need an attention layer in the near future and you have already put so much work into this, so I don't see a reason to implement it from scratch :)
@jfsantos I haven't used the new API since the TensorFlow changes. I will check the new API and see how I can contribute.
In my environment,
graph.add_input(input1, name='input1')
causes
TypeError: add_input() got multiple values for keyword argument 'name'
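That error is presumably because Graph.add_input expects a name and an input_shape rather than a layer object. An untested sketch of how the example above might be adapted (the layer sizes and shapes here are placeholders of my own):
graph = Graph()
graph.add_input(name='input1', input_shape=(16,))
graph.add_input(name='input2', input_shape=(16,))
graph.add_node(Dense(32), name='dense_a', input='input1')
graph.add_node(Dense(32), name='dense_b', input='input2')
# LambdaMerge takes the already-added nodes as its inputs, so the merge node
# itself is added without an explicit input argument
lambda_merge = LambdaMerge([graph.nodes['dense_a'], graph.nodes['dense_b']], func, output_shape)
graph.add_node(lambda_merge, name='lambda_merge')
graph.add_node(Dense(1), name='dense1', input='lambda_merge')
graph.add_output(name='output', input='dense1')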
I figured that someone might find an example of an attention layer like in 1506.03340 or 1511.04108 useful, so here is mine.
The setup: transforming a sequence of embeddings e1s into e1sm by multiplying it with per-token attention weights. The attention is determined by similarity with another embedding e0a, and it is focused on a single point or a few points in the sequence by a softmax, as in the papers above. The per-token attention scalar can be generated in a couple of ways; the original papers use w*tanh(e0a + W*e1s).
model.add_node(name='e1sa', input='e1s',  # consider another nonlinearity here
               layer=TimeDistributedDense(input_dim=int(N*sdim), output_dim=int(N*adim), W_regularizer=l2(l2reg)))
model.add_node(name='e0sa', input='e0a',
               layer=RepeatVector(s1pad))
model.add_node(name='esa[0]', inputs=['e0sa', 'e1sa'], merge_mode='sum',
               layer=Activation(T.tanh))
model.add_node(name='esa[1]', input='esa[0]',
               layer=TimeDistributedDense(input_dim=int(N*adim), output_dim=1, W_regularizer=l2(l2reg)))
model.add_node(name='esa[2]', input='esa[1]',
               layer=Flatten(input_shape=(s1pad, 1)))
model.add_node(name='esa[3]', input='esa[2]',
               layer=Activation('softmax'))
# and now just multiply timewise
model.add_node(name='esa[4]', input='esa[3]',
               layer=RepeatVector(int(N*sdim)))
model.add_node(name='esa', input='esa[4]',
               layer=Permute((2, 1)))
model.add_node(name='e1sm', inputs=['e1s', 'esa'], merge_mode='mul',
               layer=Activation('linear'))
Posting it here as it was a bit difficult for me to figure out as a Keras/Theano newbie. I don't know if it's worth making a dedicated layer in the Keras API for this, though. For one thing, in my experiments I've found dot-product similarity to work a lot better than the weighted sum (but I'm still researching this):
def batched_batched_dot(s):
    """ from (x,y,z)-shaped pair, produce (x,y)-shaped pair that replaces the z-vector pairs by their dot-products """
    import theano
    import theano.tensor as T
    return theano.scan(fn=lambda xm, ym: T.batched_dot(xm, ym),
                       outputs_info=None, sequences=s, non_sequences=None)[0]

model.add_node(name='esa[0]',  # nested batched_dot
               layer=LambdaMerge([model.nodes['e0sa'], model.nodes['e1sa']],
                                 batched_batched_dot,
                                 lambda s: (s[1][0], s[1][1])))
model.add_node(name='esa[3]', input='esa[0]',
               layer=Activation('softmax'))
I hope to soon submit a PR adding an example that does some serious stuff with NLP embedding sequences and includes the attention mechanism, though!
I just wanted to add a link to a standalone example of that, which may be also easier to read:
https://github.com/brmson/dataset-sts/blob/master/examples/anssel_attn.py
(after a little more work, I intend to contribute a simplified version as an example in Keras itself)
@pasky Thanks a lot!
@pasky / @wolet did you ever port this to the generic Keras backend?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.