Keras: Recurrent Models with sequences of mixed length

Created on 8 Apr 2015 · 15 comments · Source: keras-team/keras

The training process for an LSTM only supports a 3D tensor (tensor3). If the sequences have different lengths, then X must be a list, but models.py:90 does not support lists as input. I think a quick fix would be to cast X_batch to a tensor3 when batch_size=1, and to fix y_batch accordingly.

All 15 comments

Sequences should be grouped together by length, and segmented manually into batches by that length before being sent to Keras.
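
As a rough sketch of that grouping idea (the sequences, labels, and model names here are placeholders, not from this thread):

import numpy as np
from collections import defaultdict

# Group (sequence, label) pairs by sequence length so that each batch
# stacks into a proper 3D array; a compiled `model` is assumed to exist.
by_length = defaultdict(list)
for seq, label in zip(sequences, labels):
    by_length[len(seq)].append((seq, label))

for length, pairs in by_length.items():
    X_batch = np.array([s for s, _ in pairs])   # shape (batch, length, n)
    y_batch = np.array([l for _, l in pairs])
    model.train_on_batch(X_batch, y_batch)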

Alternatively (or in addition to the above, to get more sequences of the same length), if it does not break the logic in the cost function, the sequences can be padded with 0s (or the equivalent non-entity).

The reason that lists are not supported is that Theano builds everything as tensors, or matrices of matrices, so everything must have the same dimensionality (Theano does not assume it should pad with 0s where lengths differ).

In addition, here are a few quick examples of solutions to your problem:

Zero-padding

X = keras.preprocessing.sequence.pad_sequences(sequences, maxlen=100)
model.fit(X, y, batch_size=32, nb_epoch=10)

Batches of size 1

for seq, label in zip(sequences, y):
    model.train(np.array([seq]), [label])

@fchollet sorry to bother you, but I can't find model.train(np.array([seq]), [label]) in the Keras documentation for the batch-size-1 case.

@fchollet In the case of the batch-size-1 method, what should be assigned to the input_length parameter in the model? Or should it be set to None in this case?

@fchollet, just for my understanding: when you pad_sequences, the padded zeros are fed through the sequence network (e.g. a recurrent NN), correct?

What I was looking for is a method where this doesn't happen; I only want to input the real sequence, with each sequence having a different length, and subsequently use the output for further processing.

It seems to me that padding the sequences will make it harder to learn the task at hand, since the zeros don't provide information but they get encoded by the network anyway.

@visionscaper yes, the padding still goes through the network. If you don't want this, you might want to look into sequence-to-sequence learning, e.g. with farizrahman4u/seq2seq. This paper (https://arxiv.org/abs/1409.3215) explains the idea.

Or use the Masking layer: https://github.com/fchollet/keras/issues/3086
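
For context, a minimal sketch of the Masking approach (the layer sizes and dimensions are placeholders, not from this thread):

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense

maxlen, n = 100, 16  # placeholder dimensions

model = Sequential()
# Timesteps whose features all equal mask_value are skipped by
# mask-aware downstream layers such as LSTM.
model.add(Masking(mask_value=0., input_shape=(maxlen, n)))
model.add(LSTM(32))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')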

@patyork : You mentioned

"Sequences should be grouped together by length, and segmented manually into batches by that length before being sent to Keras."

I initialize my model as:
model_LSTM = Sequential()
model_LSTM.add(LSTM(lstm_hidden_units, input_shape=(X.shape[1], X.shape[2])))

and I plan on calling model.train_on_batch(X, y) on every batch. The problem is: how can I initialize input_shape in the LSTM when it varies across batches?

@LopezGG The shape of a single temporal (or any other kind of) "frame" of the input sequence must be the same across samples; the varying dimension is the length, i.e. the number of "frames" each sample has.

For example, excluding anything to do with batches and batch size: a set of video clips all have a resolution of 1920x1080 pixels but can vary in duration. In this case, the input shape is 1920x1080, which is the "frame" size, and the varying dimension is the duration/length of the video, such as 120 frames (4 seconds) of video. The sequence for this example video would be 120 frames of 1920x1080 pixels. Any length of video can be fed through this network, so long as it is a 1920x1080 feed.

Going one step further: if you want to use batches of videos to train concurrently, the sequences in each batch must be the same length. One way to accomplish this is to predefine a few "buckets" of temporal length, for example "up to 2 seconds, up to 4 seconds, etc". You can then bucket your video clips, padding when necessary (with all black frames) to get all of the clips to the cutoff/bucket duration.
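
A rough sketch of this bucketing idea (the cutoffs and variable names are illustrative; scalar-per-timestep sequences are assumed, since multidimensional frames may need manual padding):

import numpy as np
from keras.preprocessing.sequence import pad_sequences

# Illustrative bucket cutoffs: "up to 50 frames, up to 100, up to 200".
cutoffs = [50, 100, 200]
buckets = {c: [] for c in cutoffs}
for seq, label in zip(sequences, labels):
    for c in cutoffs:
        if len(seq) <= c:
            buckets[c].append((seq, label))
            break

for c, pairs in buckets.items():
    if not pairs:
        continue
    # Pad every sequence in the bucket up to the bucket's cutoff length.
    X_batch = pad_sequences([s for s, _ in pairs], maxlen=c)
    y_batch = np.array([l for _, l in pairs])
    model.train_on_batch(X_batch, y_batch)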

@visionscaper to follow up: "padding" your input is necessary, but it should be done in a way that makes sense. For example:

  • with video: pad the input with the equivalent of black frames of video
  • with audio: pad the input with the equivalent of silence

Padding with just straight zeros will, as you guessed, more than likely encode some unnecessary - if not incorrect - information into the network. Padding with a "neutral" frame of data is the correct approach.
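
pad_sequences also takes a value argument, so padding with a "neutral" value rather than a bare zero can look like this (SILENCE is a placeholder for whatever encodes silence or black frames in your feature space):

from keras.preprocessing.sequence import pad_sequences

SILENCE = 0.0  # placeholder: use whatever represents "silence" in your features
X = pad_sequences(sequences, maxlen=100, dtype='float32', value=SILENCE)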

@patyork, sorry, but wouldn't Masking() take care of this? Across all minibatches in an epoch, some sequences might end earlier than the longest one and should thus not be given any weight in the forward pass; so we make sure those timesteps are set to zero, and we also reduce the loss accordingly before updating the weights in the backward pass. I guess it would be extremely important to normalize data if masking is being used.

@carlthome Yes, the Masking layer appears to be exactly what is needed. This thread predates that layer addition, and I was unaware of its intended use.

The masking layer looks great for most applications, but I would think the "pad with neutral data" approach should be kept in mind for some applications. Specifically, for speech recognition it would be good to embed the idea of "silence" as a valid, and indeed expected, input sequence.

What's the difference between using a Masking layer for sequences of different lengths and setting the mask_zero field to True?
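
For reference, the two constructions being compared look roughly like this (a minimal sketch; all sizes are placeholders):

from keras.models import Sequential
from keras.layers import Masking, Embedding, LSTM

# Option A: an explicit Masking layer on already-vectorized input.
model_a = Sequential()
model_a.add(Masking(mask_value=0., input_shape=(100, 16)))
model_a.add(LSTM(32))

# Option B: let the Embedding layer emit a mask for index 0.
model_b = Sequential()
model_b.add(Embedding(input_dim=5000, output_dim=16, mask_zero=True))
model_b.add(LSTM(32))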

I had the same problem. You basically have two solutions (let's say the dimension of your input is n):

Either you use the parameters :

batch_input_shape=(1, 1, n), stateful=True

Then you train with :

for _ in range(nb_epoch):
    model.fit(X, Y, batch_size=1, nb_epoch=1, shuffle=False)
    model.reset_states()

or with :

for _ in range(nb_epoch):
    for x, y in zip(X, Y):
        # x has shape (1, n); add a batch axis so the model sees (1, 1, n)
        model.fit(x[np.newaxis], y[np.newaxis], batch_size=1, nb_epoch=1, shuffle=False)
    model.reset_states()

and with X of shape (length, 1, n).

I don't know if the two methods are equivalent though… Maybe there are more gradient updates with the second…
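
Putting the first option together, a minimal sketch of the stateful model definition (layer sizes are placeholders):

from keras.models import Sequential
from keras.layers import LSTM, Dense

n = 16  # placeholder input dimension

model = Sequential()
# batch_input_shape fixes both the batch size and the timestep count to 1;
# stateful=True carries the hidden state across successive calls.
model.add(LSTM(32, batch_input_shape=(1, 1, n), stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')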

Or you define the model with

input_shape=(None, n)

(and stateful=False by default)
and you train with :

model.fit(X, Y, nb_epoch=nb_epoch)

and X has shape (1, length, n).
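
And the second option as a compact sketch (sizes again placeholders):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

n = 16  # placeholder input dimension

model = Sequential()
# None in the timestep position accepts sequences of any length,
# as long as each individual batch is internally consistent.
model.add(LSTM(32, input_shape=(None, n)))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

x = np.random.rand(1, 37, n)  # one sequence of length 37
y = np.array([[0.5]])
model.fit(x, y, nb_epoch=10)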

@hqsiswiliam

but I can't find model.train(np.array([seq]), [label]) in keras document for batch size 1.

it should be model.train_on_batch
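
So the batch-size-1 loop from earlier becomes, roughly (assuming numpy is imported as np):

for seq, label in zip(sequences, y):
    model.train_on_batch(np.array([seq]), np.array([label]))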
