I've been following some related threads, such as #395, #2654, and #2403, but still can't work out how to get this running. The Keras API docs are quite dated, so they aren't much help for this issue.
So I want to use pretrained word2vec word representations + a Keras LSTM to do POS tagging.
My first question is: is there a better way to feed in the pretrained vector representations than the embedding_weights method mentioned in #853?
Say we embed using the method mentioned in #853 and get a (vocab_size+2) by embed_size embedding matrix. We also pad the variable-length sentences. Then we have
X_pad.shape = (M, N)
y_pad.shape = (M, N)
where M is the number of sentences in the corpus (in my case 18421) and N is the padded sentence length (the originals vary from 15 to 140, so here N=140).
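For reference, here is a minimal sketch of how I build these arrays; it assumes w2v is a dict-like lookup from word to its embed_size-dimensional word2vec vector and word_to_index maps words to indices 1..vocab_size-1, with 0 reserved for padding (both names are just placeholders for my own preprocessing):
import numpy as np
from keras.preprocessing.sequence import pad_sequences
# row i holds the word2vec vector for the word with index i;
# row 0 stays all zeros because index 0 is reserved for padding/masking
embedding_matrix = np.zeros((vocab_size, embed_size))
for word, idx in word_to_index.items():
    if word in w2v:
        embedding_matrix[idx] = w2v[word]
# X: list of sentences as lists of word indices
# y: list of sentences as lists of POS-tag indices
X_pad = pad_sequences(X, maxlen=N)  # shape (M, N)
y_pad = pad_sequences(y, maxlen=N)  # shape (M, N)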
Here is how I initialized the model
model = Sequential()
# first embedding layer
model.add(Embedding(input_dim=vocab_size, output_dim=embed_size, input_length=N, mask_zero=True, weights=[embedding_matrix]))
# hidden layer
model.add(LSTM(output_dim=hidden_dim, return_sequences=True))
# output layer
model.add(TimeDistributed(Dense(num_class, activation='softmax')))
# compile
model.compile(loss='categorical_crossentropy', optimizer='adam')
When I run model.fit(X_pad, y_pad), I get this error:
Exception: Error when checking model target: expected timedistributed_1 to have 3 dimensions, but got array with shape (18421, 140)
I've been stuck here for a while. Any suggestions are appreciated!
I ran across this problem as well. I'm still not sure why this happens or whether it is the desired behaviour, but I did manage to get around it by putting each output value in its own array, i.e.:
X = [[1, 2]]
X_padded = keras.preprocessing.sequence.pad_sequences(X, dtype='float32', maxlen=3)  # shape (1, 3)
Y = [[[1], [2]]]  # each target value wrapped in its own list
Y_padded = keras.preprocessing.sequence.pad_sequences(Y, maxlen=3, dtype='float32')  # shape (1, 3, 1), i.e. 3-D
See also #3855, which is about a different variable-length sequence-to-sequence learning problem but also mentions this issue.
@dieuwkehupkes thanks for the hint! Turns out one-hot encoding is needed.
And for people with similar issues: you can solve the problem by creating a 3-D y_pad_one_hot and feeding it to the model above:
import numpy as np
from keras.utils.np_utils import to_categorical
# y_pad_one_hot.shape: (M, N, nb_classes)
y_pad_one_hot = np.array([to_categorical(sent_label, nb_classes=nb_classes) for sent_label in y_pad])
model.fit(X_pad, y_pad_one_hot)
Still need to find the best way to mask the padding, though.
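One option I'm considering for the padding (just a sketch, not something I've fully verified): compile with sample_weight_mode='temporal' and pass a per-timestep sample_weight that is 0 at padded positions, so those timesteps don't contribute to the loss:
# zero out the loss at padded timesteps via temporal sample weights
model.compile(loss='categorical_crossentropy', optimizer='adam',
              sample_weight_mode='temporal')
# pad_sequences pads with 0 and real word indices start at 1,
# so nonzero entries of X_pad mark the real (non-padded) timesteps
sample_weight = (X_pad != 0).astype('float32')  # shape (M, N)
model.fit(X_pad, y_pad_one_hot, sample_weight=sample_weight)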
@ShuaiW can you explain what values "nb_classes" and "num_class" should take? I encountered the same problem, please help!
@yangxiufengsia num_class/nb_classes is the number of output classes (for POS tagging, the number of tags in your tagset).
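For example, in my setup (a sketch, assuming tag indices start at 1 and index 0 is reserved for padding):
# number of distinct POS tags plus one extra slot for the padding index 0
nb_classes = max(max(tags) for tags in y) + 1
num_class = nb_classes  # same value used for the final Dense layer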
@ShuaiW If the output is a set of words, num_class becomes the vocab_size. Assuming I am expecting an output of 20 words, a one-hot encoded Y becomes [vocab_size, max_output_words]. Is this correct?