The program snippet is given below. I am trying character-based text classification.
import numpy as np
import pandas as pd
from keras.preprocessing import sequence
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense, Activation

trainlabels = pd.read_csv('labels.csv', header=None)
trainlabel = trainlabels.iloc[:, 0:1]

path = "/home/censpark/rancyb/Allinone/other/mydga/lstm/multiclass/dga/New/train.txt"
X = open(path).read().splitlines()  # one sample per line; a plain .read() yields a single string
path = "/home/censpark/rancyb/Allinone/other/mydga/lstm/multiclass/dga/New/test.txt"
T = open(path).read().splitlines()

# map every character seen in the training data to an integer index (0 is reserved for padding)
valid_chars = {x: idx + 1 for idx, x in enumerate(set(''.join(X)))}
max_features = len(valid_chars) + 1
maxlen = np.max([len(x) for x in X])

# encode each sample as a sequence of character indices, padded to maxlen
X = [[valid_chars[y] for y in x] for x in X]
X_train = sequence.pad_sequences(X, maxlen=maxlen)

y_train1 = np.array(trainlabel)
y_train = to_categorical(y_train1)

embedding_vector_length = 128
model = Sequential()
model.add(Embedding(max_features, embedding_vector_length, input_length=maxlen))
model.add(LSTM(128))
model.add(Dropout(0.2))
model.add(Dense(18))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# note: this validates on the training data itself; `epochs` replaces the deprecated `nb_epoch`
model.fit(X_train, y_train, batch_size=128, epochs=15,
          validation_data=(X_train, y_train), shuffle=True)

score, acc = model.evaluate(X_train, y_train, batch_size=128)
print('Test score:', score)
print('Test accuracy:', acc)
I think the error says len(X_train) != len(y_train), so you can start by making sure these arrays have the same length.
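A minimal sanity-check sketch before calling fit, using the X_train and y_train from the snippet above:

print(X_train.shape, y_train.shape)
assert len(X_train) == len(y_train), 'input and target sample counts differ'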
Hello,
I am facing the same issue. The lengths of the arrays are the same, as @kudkudak suggested checking. Did anyone solve this?
@abhijitnathwani I ran into the same issue - what worked for me is ensuring that train_data.size % batch_size == 0, or in other words, the size of your data arrays needs to be a multiple of the batch size.
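A minimal sketch of that trimming, assuming the X_train/y_train from the snippet above and a batch size of 128:

batch_size = 128
n = (len(X_train) // batch_size) * batch_size  # largest multiple of the batch size
X_train, y_train = X_train[:n], y_train[:n]    # drop the trailing partial batch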
Thanks @silvanheller. I did a workaround and concluded the same!
I have the same problem. I use train data = 1628 and validation = 489. What should I do to solve this?
Thank you @silvanheller, that solved my problem.
When my batch size is 100, I get ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(30000, 32, 32, 1), (10000, 10)], where train_data % batch_size == 0. What do I do?
I had this problem as well - one thing to check is how you are loading your training data. For me, I versioned my files by appending a number to the file name. So when I referenced the file name to load it, I was referencing an older version of the data with a different number of samples, hence len(x_train) != len(y_train). Just one more thing to check.
@sophchoe what were the changes you had to do, to get rid of the error?
ValueError: Input arrays should have the same number of samples as target arrays. Found 4847 input samples and 4846 target samples.
I have this error; can someone solve it?
Confirmed! It was caused by len(X_train) != len(y_train).
How can this issue be solved? Somebody reply ASAP, please.
I have the same problem and len(X_train) == len(y_train).
My model is fitted from a generator and I realized that this error happens when the batch size changes.
These are the input and target shapes. When the batch shrinks from 128 to 16 samples: bam, error.
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(16, 900) (16, 11)
ValueError: Input arrays should have the same number of samples as target arrays. Found 16 input samples and 128 target samples.
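One workaround is to make the generator yield only full batches and drop the trailing partial one. A hypothetical sketch (batch_generator is just an illustration, not a Keras API):

def batch_generator(X, y, batch_size=128):
    # largest sample count that divides evenly into full batches
    n = (len(X) // batch_size) * batch_size
    while True:
        for i in range(0, n, batch_size):
            yield X[i:i + batch_size], y[i:i + batch_size]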
This works for me :+1:
history = model.fit_generator(train_data,
steps_per_epoch=train_steps,
epochs=10,
validation_data=test_data,
validation_steps=test_steps,
use_multiprocessing=True,
callbacks=callbacks_list)
I set use_multiprocessing to True and the error disappeared.
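For reference, the step counts above should match the number of full batches each generator yields; a sketch, where num_train_samples and num_test_samples are assumed sample counts behind train_data and test_data:

train_steps = num_train_samples // batch_size  # full batches per training epoch
test_steps = num_test_samples // batch_size    # full validation batches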