Keras: model.predict_classes giving wrong labels

Created on 9 Oct 2017 · 11 Comments · Source: keras-team/keras

My model is like:

print('Build main model...')
model = Sequential()
model.add(Merge([left, right], mode='sum'))
model.add(LSTM(128))
model.add(Dropout(0.2))
model.add(Dense(14, activation='softmax'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

When I use model.evaluate([x_test1, x_test2], y_test), I get an accuracy of 90%, but when I use model.predict_classes([x_test1, x_test2]), I get totally wrong class labels, and the accuracy I compute from them drops significantly. What is the difference between model.evaluate and model.predict_classes? Where am I making a mistake?

All 11 comments

If your last layer is model.add(Dense(14, activation='softmax')), I'm guessing you have 14 classes. You should be using categorical_crossentropy as your loss.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

(Note: it's helpful to describe your data, too, not just your model.)
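
For reference, a minimal sketch of the label/loss pairing being suggested here (a sketch only; names like y_train_int and x_train1 are placeholders, not from the original post):

from keras.utils import to_categorical

# categorical_crossentropy expects one-hot targets, so convert the
# integer labels 0..13 (hypothetical) for the 14 classes first.
y_train_onehot = to_categorical(y_train_int, num_classes=14)
y_test_onehot = to_categorical(y_test_int, num_classes=14)

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit([x_train1, x_train2], y_train_onehot, epochs=10, batch_size=32)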

The binary_crossentropy loss is for binary classification problems (yes/no questions like "Is this spam?"), so defer to @nicolewhite's advice and use categorical_crossentropy since you have many categories. When you specify metrics=['accuracy'], evaluate() picks the accuracy metric that matches the loss you compiled with; with the (incorrect) binary_crossentropy it computes a per-output binary accuracy, which gives you a better-looking but less meaningful 90%. The accuracy you compute yourself from predict_classes() is the true per-sample accuracy (more correct, but unfortunately not such a great-looking result). So the 90% is just an artifact of using the wrong loss function.
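
One way to see the discrepancy for yourself is to compute the accuracy by hand from the predicted class indices and compare it with evaluate() (a sketch, assuming y_test is one-hot encoded as recommended above):

import numpy as np

# Accuracy as reported by evaluate() -- tied to the metric Keras picks for the compiled loss.
loss, reported_acc = model.evaluate([x_test1, x_test2], y_test, verbose=0)

# Accuracy computed by hand from the predicted class indices.
pred_classes = model.predict_classes([x_test1, x_test2], verbose=0)
true_classes = np.argmax(y_test, axis=1)   # undo the one-hot encoding
manual_acc = np.mean(pred_classes == true_classes)

print('evaluate() accuracy: %.3f' % reported_acc)
print('manual accuracy:     %.3f' % manual_acc)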

@nicolewhite @charlesreid1 you are both right; I should have tried categorical_crossentropy. But I have another doubt: do you think it is right to use an embedding layer for video classification, where my inputs are CNN features? If I don't use an embedding layer, the input to my LSTM should be of shape (#videos, #timesteps, #features=4096), but in that case I have to pad with zeros and my accuracy drops significantly. On the other hand, if I use an embedding layer, my input is (frames, features) and I have a label for each frame.
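
Not a definitive answer to the embedding question, but if the concern is that zero padding hurts the LSTM, one common option is a Masking layer so the padded timesteps are ignored (a sketch under assumed shapes; max_timesteps is a placeholder, 4096 is the feature size mentioned above):

from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense, Dropout

max_timesteps = 50    # hypothetical cap on frames per video
feature_dim = 4096    # CNN feature size

model = Sequential()
# mask_value=0.0 makes the LSTM skip timesteps that are all zeros (the padding).
model.add(Masking(mask_value=0.0, input_shape=(max_timesteps, feature_dim)))
model.add(LSTM(128))
model.add(Dropout(0.2))
model.add(Dense(14, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])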

@nicolewhite @srijandas07 Sorry to bother you. I've just run into a similar problem to srijandas07's. What's different is that I trained my data with sparse_categorical_crossentropy for binary classification. Then when I predict on the test data, whatever I input, the predicted results are 1 1 1 1 1 1 1 (all positive). What should I do? Use binary_crossentropy?

I think you should use loss='binary_crossentropy'. If that still produces the same result, then maybe your data is converging to the same class. It's better to post your code in that case so that we can have a look at it.
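
For a two-class problem the last layer and the loss have to agree. Two consistent setups, sketched on a placeholder model (input_dim=100 is made up, not the poster's actual network):

from keras.models import Sequential
from keras.layers import Dense

# Option A: single sigmoid unit + binary_crossentropy; targets are 0/1 integers.
model_a = Sequential()
model_a.add(Dense(64, activation='relu', input_dim=100))
model_a.add(Dense(1, activation='sigmoid'))
model_a.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Option B: two softmax units + sparse_categorical_crossentropy; targets are still 0/1 integers.
model_b = Sequential()
model_b.add(Dense(64, activation='relu', input_dim=100))
model_b.add(Dense(2, activation='softmax'))
model_b.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])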

The ratio (negative samples vs. positive samples) is 2:1. It is strange that the predicted result is all positive.
Can you leave me your e-mail address so that I can send my code to you?

Hey @srijandas07! I am facing a similar problem (on a VGG16 fine-tuned on 4 classes). I am using categorical_crossentropy, so that is not the issue. My purpose is to fine-tune the CNN model on both RGB and flow before using it to extract features and feed them to an LSTM (for video classification). I had good accuracies on both the CNNs and the LSTM, but after trying predict_classes() on the sequences of frames, I realized it was predicting wrong. Then I tried it on the CNN and it is the same problem. Even model.evaluate() gave good results, but I don't understand why model.predict_classes() returns such a high level of misclassification. Do you have any idea about this by any chance? Thank you!!

@Osumann Since you are using categorical cross-entropy, it's not a problem with your loss. But by any chance are you using predict_generator? In that case, you might get the output labels in a shuffled order, so while testing you need to turn off shuffling. A small snippet of the code might help to detect the problem here.
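
A sketch of that shuffle-off pattern with an image generator (the directory path, image size, and batch size are placeholders; the key points are shuffle=False and comparing against generator.classes):

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

test_datagen = ImageDataGenerator(rescale=1. / 255)
test_gen = test_datagen.flow_from_directory('data/test',
                                            target_size=(224, 224),
                                            batch_size=32,
                                            class_mode='categorical',
                                            shuffle=False)   # keep order aligned with test_gen.classes

steps = int(np.ceil(test_gen.samples / float(test_gen.batch_size)))
probs = model.predict_generator(test_gen, steps=steps)
pred_classes = np.argmax(probs, axis=1)
print('accuracy: %.3f' % np.mean(pred_classes == test_gen.classes))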

@srijandas07 (It's me, Osumann; I just changed my username.) I do use predict_generator(), a data generator, etc. At first I made the mistake of shuffling during testing, but I fixed that later. The problem is not during the training and testing phases; it's when I try to use the model to predict. When I use predict_classes() on data from only one of the classes, I get bad results. This contradicts the results from predict_generator() and model.evaluate(), and it seems weird to me. I will run a few more tests and then share some snippets. Do you prefer that I send them by email?
(Sorry for replying this late; I don't know how I missed your reply.)

Yes, you can reach me via my e-mail address.
