The program snippet is given below. I am trying character-based text classification.
import numpy as np
import pandas as pd
from keras.preprocessing import sequence
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense, Activation

trainlabels = pd.read_csv('labels.csv', header=None)
trainlabel = trainlabels.iloc[:, 0:1]

path = "/home/censpark/rancyb/Allinone/other/mydga/lstm/multiclass/dga/New/train.txt"
X = open(path).read().splitlines()  # one sample per line; a plain .read() yields a single string
path = "/home/censpark/rancyb/Allinone/other/mydga/lstm/multiclass/dga/New/test.txt"
T = open(path).read().splitlines()

# map every character seen in the training data to an integer index (0 is reserved for padding)
valid_chars = {x: idx + 1 for idx, x in enumerate(set(''.join(X)))}
max_features = len(valid_chars) + 1
maxlen = np.max([len(x) for x in X])

# encode each sample as a sequence of character indices, padded to maxlen
X = [[valid_chars[y] for y in x] for x in X]
X_train = sequence.pad_sequences(X, maxlen=maxlen)

y_train1 = np.array(trainlabel)
y_train = to_categorical(y_train1)

embedding_vector_length = 128
model = Sequential()
model.add(Embedding(max_features, embedding_vector_length, input_length=maxlen))
model.add(LSTM(128))
model.add(Dropout(0.2))
model.add(Dense(18))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# note: this validates on the training data itself; `epochs` replaces the deprecated `nb_epoch`
model.fit(X_train, y_train, batch_size=128, epochs=15,
          validation_data=(X_train, y_train), shuffle=True)

score, acc = model.evaluate(X_train, y_train, batch_size=128)
print('Test score:', score)
print('Test accuracy:', acc)
I think the error says len(X_train) != len(y_train), so you can start by making sure these arrays have the same length.
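A minimal sanity-check sketch before calling fit, using the X_train and y_train from the snippet above:

print(X_train.shape, y_train.shape)
assert len(X_train) == len(y_train), 'input and target sample counts differ'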
Hello,
I am facing the same issue. The lengths of the arrays are the same, as @kudkudak suggested checking. Did anyone solve this?
@abhijitnathwani I ran into the same issue - what worked for me is ensuring that train_data.size % batch_size == 0, or in other words, the size of your data arrays needs to be a multiple of the batch size.
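A minimal sketch of that trimming, assuming the X_train/y_train from the snippet above and a batch size of 128:

batch_size = 128
n = (len(X_train) // batch_size) * batch_size  # largest multiple of the batch size
X_train, y_train = X_train[:n], y_train[:n]    # drop the trailing partial batch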
Thanks @silvanheller. I did a workaround and concluded the same!
I have the same problem. I use train data = 1628 and validation = 489. What should I do to solve this?
Thank you @silvanheller, that solved my problem.
When my batch size is 100, I get ValueError: All input arrays (x) should have the same number of samples. Got array shapes: [(30000, 32, 32, 1), (10000, 10)], where train_data % batch_size == 0. What do I do?
I had this problem as well - one thing to check is how you are loading your training data. For me, I versioned my files by appending a number to the file name. So when I referenced the file name to load it, I was referencing an older version of the data with a different number of samples, hence len(x_train) != len(y_train). Just one more thing to check.
@sophchoe what were the changes you had to do, to get rid of the error?
ValueError: Input arrays should have the same number of samples as target arrays. Found 4847 input samples and 4846 target samples.
I have this error; can someone solve it?
Confirmed! It was caused by len(X_train) != len(y_train).
How can this issue be solved? Somebody reply ASAP, please.
I have the same problem and len(X_train) == len(y_train).
My model is fitted from a generator and I realized that this error happens when the batch size changes.
These are the input and target shapes. When the batch shrinks from 128 to 16 samples: bam, error.
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(128, 900) (128, 11)
(16, 900) (16, 11)
ValueError: Input arrays should have the same number of samples as target arrays. Found 16 input samples and 128 target samples.
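One workaround is to make the generator yield only full batches and drop the trailing partial one. A hypothetical sketch (batch_generator is just an illustration, not a Keras API):

def batch_generator(X, y, batch_size=128):
    # largest sample count that divides evenly into full batches
    n = (len(X) // batch_size) * batch_size
    while True:
        for i in range(0, n, batch_size):
            yield X[i:i + batch_size], y[i:i + batch_size]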
This works for me :+1:
history = model.fit_generator(train_data,
steps_per_epoch=train_steps,
epochs=10,
validation_data=test_data,
validation_steps=test_steps,
use_multiprocessing=True,
callbacks=callbacks_list)
I set use_multiprocessing to True and the error disappeared.
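For reference, the step counts above should match the number of full batches each generator yields; a sketch, where num_train_samples and num_test_samples are assumed sample counts behind train_data and test_data:

train_steps = num_train_samples // batch_size  # full batches per training epoch
test_steps = num_test_samples // batch_size    # full validation batches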