Keras: Exception: Input arrays should have the same number of samples as target arrays. Found 28019580 input samples and 2142942 target samples

Created on 27 Dec 2016 · 15 Comments · Source: keras-team/keras

Program snippet given below. I am trying character-based text classification.

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense, Activation
from keras.preprocessing import sequence
from keras.utils.np_utils import to_categorical

# Load the training labels (first column only)
trainlabels = pd.read_csv('labels.csv', header=None)
trainlabel = trainlabels.iloc[:, 0:1]

# Read the raw train and test text
path = "/home/censpark/rancyb/Allinone/other/mydga/lstm/multiclass/dga/New/train.txt"
X = open(path).read()
path = "/home/censpark/rancyb/Allinone/other/mydga/lstm/multiclass/dga/New/test.txt"
T = open(path).read()

# Generate a dictionary of valid characters

valid_chars = {x:idx+1 for idx, x in enumerate(set(''.join(X)))}

max_features = len(valid_chars) + 1
maxlen = np.max([len(x) for x in X])

# Convert characters to int and pad

X = [[valid_chars[y] for y in x] for x in X]

X_train = sequence.pad_sequences(X, maxlen=maxlen)
y_train1 = np.array(trainlabel)
y_train= to_categorical(y_train1)

embedding_vector_length = 128

model = Sequential()
model.add(Embedding(max_features, embedding_vector_length, input_length=maxlen))
model.add(LSTM(128))
model.add(Dropout(0.2))
model.add(Dense(18))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(X_train, y_train, batch_size=128, nb_epoch=15, validation_data=(X_train, y_train), shuffle=True)
score, acc = model.evaluate(X_train, y_train, batch_size=128)
print('Test score:', score)
print('Test accuracy:', acc)
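One detail worth checking in the snippet above: open(path).read() returns the whole file as a single string, so the later comprehensions iterate over individual characters rather than over samples, which would explain an input count (28019580) far larger than the label count (2142942). A minimal sketch of reading one sample per line, assuming train.txt is laid out that way:

# Assumption: train.txt contains one text sample per line.
X = open(path).read().splitlines()

# Each sample should now pair with exactly one label.
print(len(X), len(trainlabel))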



All 15 comments

I think the error says len(X_train) != len(y_train), so you can start by making sure these arrays have the same length.
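A quick way to verify, as a minimal sketch (array names match the snippet above):

# Inputs and targets must have the same number of samples along axis 0.
print(X_train.shape, y_train.shape)
assert len(X_train) == len(y_train), "input and target sample counts differ"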

Hello,
I am facing the same issue. The lengths of the arrays are the same, as suggested by @kudkudak. Did anyone solve this?

@abhijitnathwani I ran into the same issue - what worked for me is ensuring that train_data.size % batch_size == 0; in other words, the size of your data arrays needs to be a multiple of the batch size.
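A minimal sketch of applying this fix by trimming the trailing partial batch (128 is just an example batch size; array names are placeholders):

batch_size = 128

# Keep only as many samples as fill complete batches.
n = (len(X_train) // batch_size) * batch_size
X_train, y_train = X_train[:n], y_train[:n]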

Thanks @silvanheller. I did a workaround and concluded the same!

I have the same problem. I use train data = 1628 and validation = 489. What should I do to solve this?

Thank you @silvanheller, that solved my problem.

When my batch size is 100, I get ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(30000, 32, 32, 1), (10000, 10)], where train_data % batch_size == 0. What do I do?

Solved by kudkudak

I had this problem as well - one thing to check is how you are loading your training data. For me, I versioned my files by appending a number to the file name, so when I referenced the file to load it, I was actually loading an older version of the data with a different number of samples; thus len(x_train) != len(y_train). Just one more thing to check.

@sophchoe what were the changes you had to make to get rid of the error?

ValueError: Input arrays should have the same number of samples as target arrays. Found 4847 input samples and 4846 target samples.

I have this error; can someone solve this?

Confirmed! It was caused by len(X_train) != len(y_train).

How can this issue be solved?

Somebody reply ASAP, please.

I have the same problem and len(X_train) == len(y_train).
My model is fitted from a generator and I realized that this error happens when the batch size changes.

These are the input and target shapes. When the batch drops from 128 to 16 samples: bam, error.
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(16, 900) (16, 11)

ValueError: Input arrays should have the same number of samples as target arrays. Found 16 input samples and 128 target samples.
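One workaround for this situation is a generator that yields only full batches, dropping the trailing partial batch (a minimal sketch with placeholder names; it discards a few samples per epoch):

def full_batch_generator(X, y, batch_size=128):
    # Number of complete batches available.
    n_full = len(X) // batch_size
    while True:  # Keras generators are expected to loop forever
        for i in range(n_full):
            s = slice(i * batch_size, (i + 1) * batch_size)
            yield X[s], y[s]

steps_per_epoch should then be set to len(X) // batch_size so the fit call never requests the partial batch.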

This works for me :+1:

history = model.fit_generator(train_data, 
                              steps_per_epoch=train_steps,
                              epochs=10, 
                              validation_data=test_data,
                              validation_steps=test_steps,
                              use_multiprocessing=True,
                              callbacks=callbacks_list)

I set use_multiprocessing to True and the error disappeared.
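use_multiprocessing only changes how the generator is executed, so it is also worth double-checking that train_steps and test_steps match what the generators actually yield. A minimal sketch of computing them (x_train, x_test, and the batch size are assumed names/values):

import math

batch_size = 128

# One step per batch; ceil keeps the final partial batch.
train_steps = math.ceil(len(x_train) / batch_size)
test_steps = math.ceil(len(x_test) / batch_size)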
