Keras: Help with LSTM

Created on 6 Nov 2016 · 6 comments · Source: keras-team/keras

I am new to Keras and I am trying to create a few toy examples to get to know Keras better. I was trying to implement an LSTM that takes as input two binary strings (b1 and b2) and returns the result of applying b1 OR b2. I know this is not what you would generally use an LSTM for, but I'd still like to try it to get familiar with the different LSTM architectures.

I've created a sequence-to-sequence model. Calling the binary strings a and b, and working with strings of 5 bits, we have the following architecture:

    [y_0]      [y_1]      [y_2]      [y_i]      [y_n]
      |          |          |          |          |
      |          |          |          |          |
    [h_0]----->[h_1]----->[h_2]----->[h_i]----->[h_n]
      |          |          |          |          |
      |          |          |          |          |
  [a_0,b_0]  [a_1,b_1]  [a_2,b_2]  [a_i,b_i]  [a_n,b_n]

Where [a_i, b_i] corresponds with the ith bit of both strings a and b. This way:

X_train = (100000, 5, 2) # [samples, time steps, features]
y_train = (100000, 5, 1)
X_test = (30000, 5, 2)
y_test = (30000, 5, 1)
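For context, data with these shapes could be generated along these lines (a numpy sketch; `make_or_dataset` and its names are my own, not from the original post):

```python
import numpy as np

def make_or_dataset(n_samples, n_bits, seed=0):
    """Random bit strings a, b and their bitwise OR, shaped for a seq2seq LSTM."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=(n_samples, n_bits))
    b = rng.integers(0, 2, size=(n_samples, n_bits))
    # X: [samples, time steps, features], where feature vector i is (a_i, b_i)
    X = np.stack([a, b], axis=-1).astype("float32")
    # y: [samples, time steps, 1], where entry i is the i-th bit of a OR b
    y = (a | b)[..., np.newaxis].astype("float32")
    return X, y

X_train, y_train = make_or_dataset(100000, 5)
```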

I am creating the LSTM and fitting it with:

model = Sequential()
model.add(LSTM(5, input_dim=2, input_length=5, return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mse', optimizer='rmsprop', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=100, nb_epoch=10,
          validation_data=(X_test, y_test))

However, this doesn't converge; this is the output:
Using TensorFlow backend.
Loading data...
100000 train sequences
30000 test sequences
Data Shapes:
X_train: (100000, 5, 2)
y_train: (100000, 5, 1)
X_test: (30000, 5, 2)
y_test: (30000, 5, 1)
Build model...
Train...
Train on 100000 samples, validate on 30000 samples
Epoch 1/10
100000/100000 [==============================] - 8s - loss: 0.3251 - acc: 0.5748 - val_loss: 0.2094 - val_acc: 0.6819
Epoch 2/10
100000/100000 [==============================] - 9s - loss: 0.1963 - acc: 0.6852 - val_loss: 0.1860 - val_acc: 0.7068
Epoch 3/10
100000/100000 [==============================] - 9s - loss: 0.1772 - acc: 0.7252 - val_loss: 0.1706 - val_acc: 0.7396
Epoch 4/10
100000/100000 [==============================] - 8s - loss: 0.1686 - acc: 0.7394 - val_loss: 0.1665 - val_acc: 0.7453
Epoch 5/10
100000/100000 [==============================] - 8s - loss: 0.1654 - acc: 0.7425 - val_loss: 0.1639 - val_acc: 0.7455
Epoch 6/10
100000/100000 [==============================] - 9s - loss: 0.1634 - acc: 0.7428 - val_loss: 0.1620 - val_acc: 0.7457
Epoch 7/10
100000/100000 [==============================] - 8s - loss: 0.1615 - acc: 0.7471 - val_loss: 0.1601 - val_acc: 0.7511
Epoch 8/10
100000/100000 [==============================] - 9s - loss: 0.1594 - acc: 0.7545 - val_loss: 0.1581 - val_acc: 0.7557
Epoch 9/10
100000/100000 [==============================] - 8s - loss: 0.1576 - acc: 0.7553 - val_loss: 0.1566 - val_acc: 0.7573
Epoch 10/10
100000/100000 [==============================] - 8s - loss: 0.1564 - acc: 0.7554 - val_loss: 0.1559 - val_acc: 0.7585
29900/30000 [============================>.] - ETA: 0s
Test score: 0.155859013249
Test accuracy: 0.758480000496
Any clues on what I am doing wrong? I've tried tweaking the hyperparameters without much success. Should I be working with another LSTM architecture?



All 6 comments

Have you tried running for more than 10 epochs? It looks fine to me, but training stops too early.

Hi CCXD, thanks for your reply. I tried with 50 and 100 epochs, and it stagnates at val_acc: 0.8049.

What are the training loss/accuracy and validation loss at that point? Try adding regularization techniques such as L2 regularization.

I tried adding regularization techniques, but I still don't get any better results. This is what it usually looks like:

100000/100000 [==============================] - 5s - loss: 0.1505 - acc: 0.7601 - val_loss: 0.1509 - val_acc: 0.7596
Epoch 492/500
100000/100000 [==============================] - 6s - loss: 0.1505 - acc: 0.7600 - val_loss: 0.1510 - val_acc: 0.7612
Epoch 493/500
100000/100000 [==============================] - 5s - loss: 0.1505 - acc: 0.7601 - val_loss: 0.1509 - val_acc: 0.7616
Epoch 494/500
100000/100000 [==============================] - 6s - loss: 0.1505 - acc: 0.7609 - val_loss: 0.1510 - val_acc: 0.7594
Epoch 495/500
100000/100000 [==============================] - 6s - loss: 0.1505 - acc: 0.7602 - val_loss: 0.1510 - val_acc: 0.7599
Epoch 496/500
100000/100000 [==============================] - 5s - loss: 0.1505 - acc: 0.7599 - val_loss: 0.1508 - val_acc: 0.7612
Epoch 497/500
100000/100000 [==============================] - 6s - loss: 0.1505 - acc: 0.7603 - val_loss: 0.1509 - val_acc: 0.7619
Epoch 498/500
100000/100000 [==============================] - 6s - loss: 0.1505 - acc: 0.7605 - val_loss: 0.1510 - val_acc: 0.7597
Epoch 499/500
100000/100000 [==============================] - 6s - loss: 0.1505 - acc: 0.7604 - val_loss: 0.1509 - val_acc: 0.7608
Epoch 500/500
100000/100000 [==============================] - 6s - loss: 0.1505 - acc: 0.7600 - val_loss: 0.1508 - val_acc: 0.7608
29550/30000 [============================>.] - ETA: 0s
Test score: 0.150830901116
Test accuracy: 0.760773334801

Any clues on what to try next? I also tried Dropout, but had no success with it either.

I see, the number of features is very low. Do you know if any other model has achieved higher performance with these descriptors? It looks like you'll have to change something in that respect.

Binary operations such as OR, AND, and XOR are not good examples for RNNs, since there is no sequence/time dependency. Take a look at this page:

http://www.xcprod.com/titan/XCSB-DOC/binary_or.html

You can see that each result bit depends only on the two bits being OR'd, not on any surrounding bits.
An RNN should be able to learn it, but it makes a better example for static NNs; that's why XOR is a 'hello world' example.
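To make the point concrete, here is a tiny sketch (plain Python, not from the thread) computing the OR one bit position at a time, using only bit i of each operand; no position ever needs its neighbours:

```python
a, b = 0b10110, 0b01100
n = 5

# Bit i of the result uses only bit i of a and bit i of b.
bitwise = [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]
result = sum(bit << i for i, bit in enumerate(bitwise))

assert result == (a | b)  # same answer as the whole-number OR
```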

Binary addition is a better exercise, since it involves a carry bit that depends on the previous operation in the sequence. This is in numpy, not Keras, but see this page and scroll down to 'Our Toy Code':

https://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/

On a separate note, are you generating the dataset at random? When n=5, 2^5 = 32, so there are only 32 five-bit sequences, and the number of possible OR operations on two 5-bit numbers is 32^2 = 1024; that's the maximum size of your dataset. So for low n, you could generate the entire set and then split it into train/test/validation.
By the time you get to n = 9 or so it will probably be worth going back to random sampling.
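As a sketch of that suggestion (the helper names are my own, not from the comment), the complete set of 5-bit OR pairs can be enumerated before splitting:

```python
import itertools
import numpy as np

n_bits = 5
# Every ordered pair (a, b) of 5-bit numbers: 32 * 32 = 1024 pairs.
pairs = list(itertools.product(range(2 ** n_bits), repeat=2))

def bits(x, n):
    """Little-endian bit vector of x, n bits long."""
    return [(x >> i) & 1 for i in range(n)]

# X: (1024, 5, 2) with feature vector (a_i, b_i) per time step;
# y: (1024, 5, 1) with the corresponding OR bit.
X = np.array([list(zip(bits(a, n_bits), bits(b, n_bits))) for a, b in pairs],
             dtype="float32")
y = np.array([[[o] for o in bits(a | b, n_bits)] for a, b in pairs],
             dtype="float32")
# Shuffle the 1024 examples, then split into train/test/validation.
```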
