Keras: Error when checking target: expected dense_2 to have shape (20,) but got array with shape (1000,) while running pretrained_word_embeddings.py

Created on 27 Jul 2018 · 8 comments · Source: keras-team/keras

Error when checking target: expected dense_2 to have shape (20,) but got array with shape (1000,) while running pretrained_word_embeddings.py

https://github.com/keras-team/keras/blob/master/examples/pretrained_word_embeddings.py

Most helpful comment

I have a similar error message: "ValueError: Error when checking target: expected dense_2 to have shape (10,) but got array with shape (1,)". It seems the cause is the loss parameter in the model architecture: with loss = 'sparse_categorical_crossentropy' I get no error, but with loss = 'categorical_crossentropy' I do.

All 8 comments

The error is related to your network, as far as I can tell.
Can you post your code, or at least the initial part?
My guess is that the data is not split the way the network expects: the output layer expects targets of shape (20,), but you are passing arrays of shape (1000,).

Please find the code below

```python
import os
import sys
import numpy as np

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.layers import Dense, Input, GlobalMaxPooling1D, Conv1D, MaxPooling1D, Embedding
from keras.models import Model
from keras.initializers import Constant

BASE_DIR = 'C:\\Users\\'
GLOVE_DIR = os.path.join(BASE_DIR, 'glove.6B')
TEXT_DATA_DIR = os.path.join(BASE_DIR, 'news20', '20_newsgroup')
MAX_SEQUENCE_LENGTH = 1000
MAX_NUM_WORDS = 20000
EMBEDDING_DIM = 100
VALIDATION_SPLIT = 0.2

# Build an index mapping words to their GloVe embedding vectors
embeddings_index = {}
with open(os.path.join(GLOVE_DIR, 'glove.6B.100d.txt'), encoding="utf8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

texts = []         # list of text samples
labels_index = {}  # dictionary mapping label name to label id
labels = []        # list of label ids
for name in sorted(os.listdir(TEXT_DATA_DIR)):
    path = os.path.join(TEXT_DATA_DIR, name)
    if os.path.isdir(path):
        label_id = len(labels_index)
        labels_index[name] = label_id

        for fname in sorted(os.listdir(path)):
            if fname.isdigit():
                fpath = os.path.join(path, fname)
                args = {} if sys.version_info < (3,) else {'encoding': 'latin-1'}

                with open(fpath, **args) as f:
                    t = f.read()
                    i = t.find("\n\n")  # skip header
                    if 0 < i:
                        t = t[i:]
                    texts.append(t)
                labels.append(label_id)

tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
word_index = tokenizer.word_index

# Pad sequences
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)

# Convert labels to one-hot vectors
labels = to_categorical(np.asarray(labels))

indices = np.arange(data.shape[0])
np.random.shuffle(indices)
data = data[indices]
labels = labels[indices]
num_validation_samples = int(VALIDATION_SPLIT * data.shape[0])

x_train = data[:-num_validation_samples]
y_train = data[:-num_validation_samples]
x_test = data[-num_validation_samples:]
y_test = data[-num_validation_samples:]

num_words = min(MAX_NUM_WORDS, len(word_index) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))

for word, i in word_index.items():
    if i >= MAX_NUM_WORDS:
        continue
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)

# Train a 1D convnet with global max pooling
sequence_input = Input(shape=(MAX_SEQUENCE_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sequence_input)
x = Conv1D(128, 5, activation='relu')(embedded_sequences)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = GlobalMaxPooling1D()(x)
print(x.shape)
x = Dense(128, activation='relu')(x)
print(x.shape)
preds = Dense(len(labels_index), activation='softmax')(x)

model = Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.fit(x_train, y_train, batch_size=128, epochs=10, validation_data=(x_test, y_test))
```

I am not sure what is causing that problem.
A wild guess would be that len(labels_index) is 20 and that is creating the problem. Can you confirm the output of len(labels_index)?

The output is 20, to predict 20 categories.

GitHub is for issues in Keras, while this is just an implementation error. In the future, please open a Stack Overflow question instead of posting on GitHub:

```python
x_train = data[:-num_validation_samples]
y_train = data[:-num_validation_samples]   # supposed to be labels
x_test = data[-num_validation_samples:]
y_test = data[-num_validation_samples:]    # supposed to be labels
```
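In other words, a minimal sketch of the corrected split, reusing the variables from the code above (labels was already converted to one-hot vectors of length 20 by to_categorical):

```python
# Use the one-hot labels, not the padded sequences, as targets
x_train = data[:-num_validation_samples]
y_train = labels[:-num_validation_samples]
x_test = data[-num_validation_samples:]
y_test = labels[-num_validation_samples:]
```

With this change the targets have shape (num_samples, 20), which matches the 20-unit softmax output of dense_2, instead of (num_samples, 1000) from the padded sequences.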

Thank you so much for your help

I have a similar error message: "ValueError: Error when checking target: expected dense_2 to have shape (10,) but got array with shape (1,)". It seems the cause is the loss parameter in the model architecture: with loss = 'sparse_categorical_crossentropy' I get no error, but with loss = 'categorical_crossentropy' I do.

> I have a similar error message: "ValueError: Error when checking target: expected dense_2 to have shape (10,) but got array with shape (1,)". It seems the cause is the loss parameter in the model architecture: with loss = 'sparse_categorical_crossentropy' I get no error, but with loss = 'categorical_crossentropy' I do.

But even after I changed that, I still got the error, just with a smaller target shape reported.
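That smaller shape is expected: the two loss functions want differently shaped targets. Here is a minimal, self-contained sketch of the two consistent pairings (the 10-class size, dummy data, and layer sizes are placeholders for illustration, not taken from the original code):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

num_classes = 10                                     # hypothetical class count, as in the (10,) error
x = np.random.random((32, 8))                        # dummy features
y_int = np.random.randint(num_classes, size=(32,))   # integer class ids, shape (32,)

model = Sequential([Dense(16, activation='relu', input_shape=(8,)),
                    Dense(num_classes, activation='softmax')])

# Option 1: integer targets with sparse_categorical_crossentropy
model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(x, y_int, epochs=1, verbose=0)

# Option 2: one-hot targets (shape (32, 10)) with categorical_crossentropy
y_onehot = to_categorical(y_int, num_classes=num_classes)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(x, y_onehot, epochs=1, verbose=0)
```

With categorical_crossentropy the targets must be one-hot encoded (e.g. via to_categorical), so each target row has length num_classes; with sparse_categorical_crossentropy the targets stay as integer class ids, which is why the reported target shape shrinks.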
