Keras: Error when checking target: expected dense_2 to have shape (20,) but got array with shape (1000,) while running pretrained_word_embeddings.py

Created on 27 Jul 2018 · 8 comments · Source: keras-team/keras

Error when checking target: expected dense_2 to have shape (20,) but got array with shape (1000,) while running pretrained_word_embeddings.py

https://github.com/keras-team/keras/blob/master/examples/pretrained_word_embeddings.py

Most helpful comment

I have a similar error message: "ValueError: Error when checking target: expected dense_2 to have shape (10,) but got array with shape (1,)". It seems the cause is the loss parameter in the model architecture: with loss = 'sparse_categorical_crossentropy' I get no error, but with loss = 'categorical_crossentropy' I do.

All 8 comments

The error is related to your network, as far as I can tell.
Can you post your code, or at least the initial part?
My guess is that the data is not split the way the network expects: the output layer expects targets of shape (20,), but you are passing arrays of shape (1000,).

Please find the code below

```python
import os
import sys
import numpy as np

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.layers import Dense, Input, GlobalMaxPooling1D, Conv1D, MaxPooling1D, Embedding
from keras.models import Model
from keras.initializers import Constant

BASE_DIR = 'C:\\Users\\'
GLOVE_DIR = os.path.join(BASE_DIR, 'glove.6B')
TEXT_DATA_DIR = os.path.join(BASE_DIR, 'news20', '20_newsgroup')
MAX_SEQUENCE_LENGTH = 1000
MAX_NUM_WORDS = 20000
EMBEDDING_DIM = 100
VALIDATION_SPLIT = 0.2

# Build an index mapping words to their GloVe embedding vectors
embeddings_index = {}
with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding="utf8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

texts = []         # list of text samples
labels_index = {}  # dictionary mapping label name to label id
labels = []        # list of label ids
for name in sorted(os.listdir(TEXT_DATA_DIR)):
    path = os.path.join(TEXT_DATA_DIR, name)
    if os.path.isdir(path):
        label_id = len(labels_index)
        labels_index[name] = label_id

        for fname in sorted(os.listdir(path)):
            if fname.isdigit():
                fpath = os.path.join(path, fname)
                args = {} if sys.version_info < (3,) else {'encoding': 'latin-1'}

                with open(fpath, **args) as f:
                    t = f.read()
                    i = t.find("\n\n")  # skip header
                    if 0 < i:
                        t = t[i:]
                    texts.append(t)
                labels.append(label_id)

tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
word_index = tokenizer.word_index

# Pad sequences
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)

# Convert labels to one-hot vectors
labels = to_categorical(np.asarray(labels))

indices = np.arange(data.shape[0])
np.random.shuffle(indices)
data = data[indices]
labels = labels[indices]
num_validation_samples = int(VALIDATION_SPLIT * data.shape[0])

x_train = data[:-num_validation_samples]
y_train = data[:-num_validation_samples]
x_test = data[-num_validation_samples:]
y_test = data[-num_validation_samples:]

num_words = min(MAX_NUM_WORDS, len(word_index) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))

for word, i in word_index.items():
    if i >= MAX_NUM_WORDS:
        continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)

# Train a 1D convnet with global max pooling
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = GlobalMaxPooling1D()(x)
print(x.shape)
x = Dense(128, activation='relu')(x)
print(x.shape)
preds = Dense(len(labels_index), activation='softmax')(x)

model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.fit(x_train, y_train, batch_size=128, epochs=10, validation_data=(x_test, y_test))
```

I am not sure what is causing that problem.
A wild guess would be that len(labels_index) is 20 and that is creating the problem. Can you confirm the output of len(labels_index)?

The output is 20, to predict 20 categories.

GitHub is for issues in Keras, while this is just an implementation error. In the future, please open a Stack Overflow question instead of posting on GitHub:

```python
x_train = data[:-num_validation_samples]
y_train = data[:-num_validation_samples]   # supposed to be labels
x_test = data[-num_validation_samples:]
y_test = data[-num_validation_samples:]    # supposed to be labels
```
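In other words, a minimal sketch of the corrected split, reusing the variables from the code above (labels was already converted to one-hot vectors of length 20 by to_categorical):

```python
# Use the one-hot labels, not the padded sequences, as targets
x_train = data[:-num_validation_samples]
y_train = labels[:-num_validation_samples]
x_test = data[-num_validation_samples:]
y_test = labels[-num_validation_samples:]
```

With this change the targets have shape (num_samples, 20), which matches the 20-unit softmax output of dense_2, instead of (num_samples, 1000) from the padded sequences.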

Thank you so much for your help

I have a similar error message: "ValueError: Error when checking target: expected dense_2 to have shape (10,) but got array with shape (1,)". It seems the cause is the loss parameter in the model architecture: with loss = 'sparse_categorical_crossentropy' I get no error, but with loss = 'categorical_crossentropy' I do.

> I have a similar error message: "ValueError: Error when checking target: expected dense_2 to have shape (10,) but got array with shape (1,)". It seems the cause is the loss parameter in the model architecture: with loss = 'sparse_categorical_crossentropy' I get no error, but with loss = 'categorical_crossentropy' I do.

But even after I changed that, I still got the error, just with a smaller target shape reported.
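That smaller shape is expected: the two loss functions want differently shaped targets. Here is a minimal, self-contained sketch of the two consistent pairings (the 10-class size, dummy data, and layer sizes are placeholders for illustration, not taken from the original code):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

num_classes = 10                                     # hypothetical class count, as in the (10,) error
x = np.random.random((32, 8))                        # dummy features
y_int = np.random.randint(num_classes, size=(32,))   # integer class ids, shape (32,)

model = Sequential([Dense(16, activation='relu', input_shape=(8,)),
                    Dense(num_classes, activation='softmax')])

# Option 1: integer targets with sparse_categorical_crossentropy
model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(x, y_int, epochs=1, verbose=0)

# Option 2: one-hot targets (shape (32, 10)) with categorical_crossentropy
y_onehot = to_categorical(y_int, num_classes=num_classes)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(x, y_onehot, epochs=1, verbose=0)
```

With categorical_crossentropy the targets must be one-hot encoded (e.g. via to_categorical), so each target row has length num_classes; with sparse_categorical_crossentropy the targets stay as integer class ids, which is why the reported target shape shrinks.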
