Keras: Making two LSTMs each using different input subsequence

Created on 5 Jul 2016 · 15 comments · Source: keras-team/keras

hi,

I use an LSTM for text classification (return_sequences=False).
Each input instance is a sequence of n words. Instead of feeding the whole sequence to a single LSTM, I'd like to feed the first m words (m < n) to one LSTM and the remaining n - m words to a second LSTM.

Is that possible with Keras? If not, should I wrap the LSTM class with my own? How?

thanks a lot

Labels: lstmx2, stale

Most helpful comment

You can define your first and second LSTM models and then combine them into a single model with a Merge layer.

Something like this:

```python
from keras.models import Sequential
from keras.layers import Merge, Activation, Dense
from keras.layers.recurrent import LSTM

n, m = 10, 3      # total timesteps and the split point
input_dim = 10
output_dim = 20

# First branch reads the first m timesteps.
first_model = Sequential()
first_model.add(LSTM(output_dim, input_shape=(m, input_dim)))

# Second branch reads the remaining n - m timesteps.
second_model = Sequential()
second_model.add(LSTM(output_dim, input_shape=(n - m, input_dim)))

# Concatenate the two branch outputs and classify.
model = Sequential()
model.add(Merge([first_model, second_model], mode='concat'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.fit([X[:, :m, :], X[:, m:, :]], y)
```

Additional dense layers can be appended after each of the LSTM layers or after the merge layer.
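The shape bookkeeping of the `model.fit([X[:,:m,:], X[:,m:,:]], y)` split can be checked without Keras; a small NumPy sketch using the same `n`, `m`, and `input_dim` as above (the batch size of 4 is illustrative):

```python
import numpy as np

n, m, input_dim = 10, 3, 10
batch = 4

X = np.zeros((batch, n, input_dim))

first_input = X[:, :m, :]   # first m timesteps -> first LSTM branch
second_input = X[:, m:, :]  # remaining n - m timesteps -> second LSTM branch

print(first_input.shape)   # (4, 3, 10)
print(second_input.shape)  # (4, 7, 10)
```

Each branch's `input_shape` in the model must match the per-sample shape of the slice it receives: `(m, input_dim)` and `(n - m, input_dim)` respectively.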

All 15 comments


looks elegant. thanks

So I can work with multiple models, and thus determine the input of each.
How about the case where the input sequence is actually two sequences (i.e., m varies among sentences)? In fact, input x has 2 lists of words.

thanks a lot @a-rodin

Minor thing: maybe a Dense layer could be useful after each LSTM.

@lechuzo The input sequences can just be padded to length n with zeros, and then each submodel can have a Masking layer before its LSTM:

```python
from keras.layers import Masking

first_model = Sequential()
first_model.add(Masking(mask_value=0.0, input_shape=(n, input_dim)))
first_model.add(LSTM(output_dim))

second_model = Sequential()
second_model.add(Masking(mask_value=0.0, input_shape=(n, input_dim)))
second_model.add(LSTM(output_dim))
```

It's precomputed.
Say x (the data input) has 2 lists:

```python
x['words1'] = [w1, w2, w3]
x['words2'] = [w4, w5, w6]
```

think of words1 and words2 as 2 different sentences

So you surely can create two sequences padded with zeros and feed them both to model.fit instead of [X[:,:m,:], X[:,m:,:]].
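A minimal sketch of that zero-padding step, using a hypothetical helper `pad_to` (not a Keras function) so the Masking(mask_value=0.0) layers can skip the padded timesteps:

```python
# Zero-pad a variable-length feature sequence to exactly n timesteps,
# truncating if it is longer. Purely illustrative; Keras ships its own
# pad_sequences utility for the integer-index case.
def pad_to(seq, n, input_dim):
    padded = [[0.0] * input_dim for _ in range(n)]
    for i, step in enumerate(seq[:n]):
        padded[i] = list(step)
    return padded

words1 = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # 3 timesteps, input_dim=2
print(pad_to(words1, 5, 2))
# [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [0.0, 0.0]]
```

With both `words1` and `words2` padded to the same length n, the two lists can be fed directly as the two inputs of the merged model.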

Btw, would you mind using an Embedding layer before the LSTM?

thanks.
Sounds cool. Did I get it right? (Notice the validation_data as well.)

```python
from keras.models import Sequential
from keras.layers import Merge, Activation, Dense, Embedding
from keras.layers.recurrent import LSTM

input_dim = 10
output_dim = 20

# Split each instance into its two word lists.
X1, X2 = [], []
for item in X:
    X1.append(item['words1'])
    X2.append(item['words2'])

X_test1, X_test2 = [], []
for item in X_test:
    X_test1.append(item['words1'])
    X_test2.append(item['words2'])

first_model = Sequential()
first_model.add(Embedding(output_dim=w2v_dim, input_dim=max_features, mask_zero=True, input_length=maxlen, dropout=0.2))
first_model.add(LSTM(output_dim))
first_model.add(Dense(1))

second_model = Sequential()
second_model.add(Embedding(output_dim=w2v_dim, input_dim=max_features, mask_zero=True, input_length=maxlen, dropout=0.2))
second_model.add(LSTM(output_dim))
second_model.add(Dense(1))

model = Sequential()
model.add(Merge([first_model, second_model], mode='concat'))
# optional, instead of / in addition to the Dense in the 2 submodels: model.add(Dense(1))
model.add(Activation('softmax'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy')
model.fit([X1, X2], y, validation_data=([X_test1, X_test2], Y_test))
```

Hi, if I suspect that part of speech may be helpful for the problem, and I want to add p1 p2 p3 as POS labels of w1 w2 w3 to the first LSTM and p4 p5 p6 as POS labels of w4 w5 w6 to the second, how can I do it?

@lechuzo You haven't padded the sequences. It can be done with the Keras function pad_sequences; check babi_rnn.py in the Keras examples to see example usage.

@a-rodin thanks much!
I did not include all of the code. Sure, here it is:

```python
from keras.preprocessing import sequence

X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
```
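For reference, `pad_sequences` pads and truncates from the front by default (`padding='pre'`, `truncating='pre'`, pad value 0); a pure-Python sketch of that default behavior (not the Keras implementation):

```python
# Mimics pad_sequences' defaults for a single list of token indices:
# truncate from the front if too long, pad on the left if too short.
def pad_pre(seq, maxlen, value=0):
    seq = seq[-maxlen:]                         # keep the last maxlen items
    return [value] * (maxlen - len(seq)) + seq  # left-pad with the value

print(pad_pre([1, 2], 4))           # [0, 0, 1, 2]
print(pad_pre([1, 2, 3, 4, 5], 4))  # [2, 3, 4, 5]
```

Left-padding matters here because `mask_zero=True` on the Embedding layer treats index 0 as the padding token to be masked.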

@wailoktam
So you want to use POS tags AND the actual words. One way is to:

  1. use the same architecture as @a-rodin suggests, where you feed the first LSTM with words and the second with POS labels
  2. however, I wonder whether you should also add an Embedding layer before the POS LSTM

@lechuzo
Thanks. I am thinking of using an Embedding layer as well, so the part-of-speech types would be the lexical entries in the vocabulary.

I think the POS tags and the actual words should each work like a color channel in an image. Some friends suggested I can merge them with the concat mode. I wonder whether that is the right thing to do, particularly after reading your original post.

You merge the output for w1 w2 w3 with the output for w4 w5 w6 by concatenation; w1 and w4 are information at separate locations of the whole input.

So creating two input sequences (or making embeddings of the two sequences) w1 w2 w3 and p1 p2 p3 would mean information at separate locations of the whole output, I think? That does not feel like what I want: w1 relates to p1 just as the green channel value relates to the red channel value at a point in an image.
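If the goal is that channel-like alignment, the word and POS embeddings can be concatenated per timestep along the feature axis, before a single LSTM, rather than merging two LSTM outputs at the end. A NumPy sketch of the shapes (all names and sizes here are illustrative):

```python
import numpy as np

# One sentence: T timesteps, word embeddings of size d_w, POS embeddings
# of size d_p.
T, d_w, d_p = 3, 5, 2
word_emb = np.ones((T, d_w))  # embeddings of w1 w2 w3
pos_emb = np.ones((T, d_p))   # embeddings of p1 p2 p3

# Channel-style merge: concatenate along the feature axis, so w1 stays
# aligned with p1 (like color channels at one pixel). A single LSTM then
# reads inputs of shape (T, d_w + d_p).
per_step = np.concatenate([word_emb, pos_emb], axis=-1)
print(per_step.shape)  # (3, 7)
```

In Keras 1.x this per-timestep merge could, I believe, be expressed with `Merge([word_branch, pos_branch], mode='concat', concat_axis=-1)` placed before the LSTM (check the Merge docs for your version); the two-LSTM architecture above instead concatenates only after each sequence has been fully consumed.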

Thanks in advance for any help.

How can I implement the merging in TensorFlow?
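In raw TensorFlow, the concat-mode Merge corresponds to `tf.concat([out1, out2], axis=-1)` applied to the two LSTM branch outputs. The semantics are the same as NumPy's `concatenate`, sketched here so it runs without TensorFlow installed (the shapes are illustrative):

```python
import numpy as np

# Stand-ins for the two LSTM branch outputs: batch of 2, output_dim=4 each.
out1 = np.ones((2, 4))
out2 = np.zeros((2, 4))

# In TensorFlow this step would be: merged = tf.concat([out1, out2], axis=-1)
merged = np.concatenate([out1, out2], axis=-1)
print(merged.shape)  # (2, 8)
```

A classification head (matrix multiply plus sigmoid/softmax) then operates on the merged `(batch, 2 * output_dim)` tensor.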

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

Hi,
I am currently working on a project which is neural melody composition from lyrics. This deals with aligning syllables of lyrics with the pitch and duration of musical notes. For that I created X and y for both the lyrics and music files and tokenized them. In order to align them, is concatenating models of both needed or is it okay to give both inputs in a single model?
