Keras: [help] Constructing a synced sequence input and output RNN

Created on 28 May 2015 · 9 comments · Source: keras-team/keras

Hi there,

I'm building an RNN to assign an output label to each input element in the sequence, for activity recognition based on location. In this toy model, each input location has shape 4x1 and each output activity has shape 3x1. There are two hidden layers; each hidden component has shape 3x1.


My question is: how do I construct the model? Do I need to use an Embedding layer? Should I use two layers of TimeDistributedDense or two layers of GRU/LSTM for my two hidden layers?

Please help, and I hope I can contribute an example to the repo :)

My code snippet is shown below.

from keras.models import Sequential

input_dim = 4
output_dim = 3
hidden_dim = 3

print('Build model...')

model = Sequential()

# TODO: add layers to model

print('Compile model...')
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(X_train, Y_train, batch_size=1, nb_epoch=10)
print('Done')

All 9 comments

Embedding layers are used for text vectorization. This is not your use case.

You could use one of these networks:

model = Sequential() # input has shape (samples, timesteps, locations)
model.add(LSTM(input_dim, output_dim, return_sequences=True))
model.add(Activation('time_distributed_softmax')) # output has shape (samples, timesteps, activities)

or:

model = Sequential()
model.add(LSTM(input_dim, hidden_dim, return_sequences=True))
model.add(TimeDistributedDense(hidden_dim, output_dim))
model.add(Activation('time_distributed_softmax')) # output has shape (samples, timesteps, activities)

You can try replacing LSTM with GRU; if your data is simple (it seems to be) chances are it will work better.
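
For reference, here is a minimal end-to-end sketch of the second option, assuming the Keras API used in this thread (layers constructed with explicit input and output dims, the 'time_distributed_softmax' activation, and fit's nb_epoch argument); the module paths and the random dummy data with placeholder sample/timestep counts are assumptions, not part of the original question:

import numpy as np
from keras.models import Sequential
from keras.layers.core import Activation, TimeDistributedDense
from keras.layers.recurrent import LSTM

input_dim, hidden_dim, output_dim = 4, 3, 3   # locations, hidden units, activities
samples, timesteps = 100, 20                  # placeholder sizes

# dummy data: X is (samples, timesteps, locations), Y is one-hot (samples, timesteps, activities)
X_train = np.random.random((samples, timesteps, input_dim))
labels = np.random.randint(output_dim, size=(samples, timesteps))
Y_train = np.eye(output_dim)[labels]

model = Sequential()
model.add(LSTM(input_dim, hidden_dim, return_sequences=True))
model.add(TimeDistributedDense(hidden_dim, output_dim))
model.add(Activation('time_distributed_softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(X_train, Y_train, batch_size=16, nb_epoch=10)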

Thanks for your answer. That was very helpful.

I just want to make sure I'm getting it right. I have a few more questions:

  1. If I'd like to stack multiple layers (e.g., 4) of LSTM/GRU, is this the correct way?
model = Sequential() # input has shape (samples, timesteps, locations)
model.add(LSTM(input_dim, hidden_dim, return_sequences=True))
model.add(LSTM(hidden_dim, hidden_dim, return_sequences=True))
model.add(LSTM(hidden_dim, hidden_dim, return_sequences=True))
model.add(LSTM(hidden_dim, output_dim, return_sequences=True))
model.add(Activation('time_distributed_softmax')) # output has shape (samples, timesteps, activities)

  2. If I use TimeDistributedDense, will every hidden unit at layer_i be connected to every hidden unit at layer_{i+1}?
  3. Do I need to use Dropout?

Thanks!

  1. Yes, that is the correct way (except that your activities don't sum up to 1, nor are they capped at 1; just use a linear layer instead).
  2. TimeDistributedDense is like Dense (fully connected) but spread over time (it is not connected across the time dimension). :)
  3. The problem you'll have isn't so much the dropout as the initialization. Usually these LSTMs won't be stackable beyond 2 layers when simply initialized at random, and it's quite likely your network won't learn anything, so you need to do some kind of pretraining first. I often find that training with one recurrent layer for a few hundred epochs, then removing the top layer, adding another randomly initialized recurrent layer on top, and resuming works decently (see the sketch below).
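
One way to realize that "pretrain, then grow" recipe is to train a one-recurrent-layer model first, then build a deeper model and copy the pretrained weights into its bottom layer before resuming training. This is only a hedged sketch: the weight transfer via get_weights()/set_weights() and the epoch counts are assumptions about how to implement the description above, not the commenter's exact procedure.

# stage 1: pretrain a single recurrent layer for a while
pretrain = Sequential()
pretrain.add(LSTM(input_dim, hidden_dim, return_sequences=True))
pretrain.add(TimeDistributedDense(hidden_dim, output_dim))
pretrain.add(Activation('time_distributed_softmax'))
pretrain.compile(loss='categorical_crossentropy', optimizer='rmsprop')
pretrain.fit(X_train, Y_train, batch_size=16, nb_epoch=200)

# stage 2: rebuild with an extra (randomly initialized) recurrent layer on top,
# reuse the pretrained bottom LSTM's weights, and resume training
stacked = Sequential()
stacked.add(LSTM(input_dim, hidden_dim, return_sequences=True))
stacked.add(LSTM(hidden_dim, hidden_dim, return_sequences=True))
stacked.add(TimeDistributedDense(hidden_dim, output_dim))
stacked.add(Activation('time_distributed_softmax'))
stacked.layers[0].set_weights(pretrain.layers[0].get_weights())
stacked.compile(loss='categorical_crossentropy', optimizer='rmsprop')
stacked.fit(X_train, Y_train, batch_size=16, nb_epoch=200)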

Thanks for your help! I just found this paper: Gated Feedback Recurrent Neural Networks (http://arxiv.org/pdf/1502.02367.pdf)

Is the current implementation of TimeDistributedDense the same as the concept of Gated Feedback RNN in the paper?

No, TimeDistributedDense is exactly as it sounds: simply a Dense layer that feeds all of its inputs forward in time. The distinction between Dense and TimeDistributedDense is that a Dense layer expects 2D input (batch_size, sample_size), whereas TimeDistributedDense expects 3D input (batch_size, time_steps, sample_size). It should be used in conjunction with TimeDistributedSoftmax for the same reason (2D vs. 3D expected input).

There is a GRU layer, however: https://github.com/fchollet/keras/blob/master/keras/layers/recurrent.py#L156-253
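
To make the 2D vs. 3D contrast concrete, here is a small sketch using the old constructor style from this thread (layers built with explicit input and output dims); the layer sizes are arbitrary and the time-distributed softmax is written as the Activation used earlier:

from keras.models import Sequential
from keras.layers.core import Dense, Activation, TimeDistributedDense

# Dense maps (batch_size, 4) -> (batch_size, 3): one vector per sample
flat = Sequential()
flat.add(Dense(4, 3))
flat.add(Activation('softmax'))

# TimeDistributedDense maps (batch_size, time_steps, 4) -> (batch_size, time_steps, 3):
# the same Dense transform applied independently at every timestep
seq = Sequential()
seq.add(TimeDistributedDense(4, 3))
seq.add(Activation('time_distributed_softmax'))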

@zxcvbn97, @fchollet
I'm working on almost the same problem with sentence labeling,
A simple LSTM + TimeDistributedDense shows 95% accuracy on the test dataset during training, but when I try to predict new sentences with the model.predict(X_i) method, almost all elements of the sequence are classified wrong; it seems like the network just learned some trivial mapping. Do you have any ideas why this happens? Thank you.

@zxcvbn97 @fchollet

May I ask how to set input_dim, hidden_dim, and output_dim? Suppose my training data is (10000, 50, 40) (samples, timesteps, features), and I need an output for each timestep with categorical labels (11 categories), thus (10000, 50, 11) (samples, timesteps, categories).

I tried a setting like this:

model = Sequential()
model.add(LSTM(input_dim=(50,40),output_dim=(128,1),return_sequences=True))
model.add(LSTM(input_dim=(128,1), output_dim=(50,11), return_sequences=True))
model.add(Activation('time_distributed_softmax'))

Unfortunately it does not work, but I don't know how to fix it.

Besides, I'm wondering how to do pre-training. Thanks a lot!
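
For what it's worth, based on the earlier answers in this thread the layer constructors in this API take scalar per-timestep dims (the 50 timesteps come from the data, not from the constructor arguments), so the wiring might look roughly like the sketch below; the 128 hidden units are an arbitrary choice, and this is a hedged sketch rather than a confirmed fix:

from keras.models import Sequential
from keras.layers.core import Activation, TimeDistributedDense
from keras.layers.recurrent import LSTM

nb_features = 40   # input features per timestep
nb_hidden = 128    # hidden units (arbitrary)
nb_classes = 11    # output categories per timestep

model = Sequential()
model.add(LSTM(nb_features, nb_hidden, return_sequences=True))   # input: (10000, 50, 40)
model.add(TimeDistributedDense(nb_hidden, nb_classes))
model.add(Activation('time_distributed_softmax'))                # output: (10000, 50, 11)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10)          # X: (10000, 50, 40), Y: (10000, 50, 11)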

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@zxcvbn97 I think you are using the default accuracy for evaluation. Instead, use metrics.categorical_accuracy to get the real accuracy for your case, since it is a multiclass problem.
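
A hedged sketch of that suggestion, assuming a later Keras version whose compile() accepts a metrics argument; here model, X_test, and Y_test stand for the trained sequence-labeling model and a held-out split from the earlier examples:

from keras.metrics import categorical_accuracy

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=[categorical_accuracy])   # or metrics=['categorical_accuracy']
loss, acc = model.evaluate(X_test, Y_test, batch_size=32)
print(acc)   # categorical accuracy, as suggested above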
