Keras: unbalanced data (number of 1s and number of 0s are very different) gives bad results

Created on 18 Mar 2016 · 14 comments · Source: keras-team/keras

If the train and test data are biased toward one class, for example, then the training process will be biased:
http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
http://scikit-learn.org/stable/auto_examples/svm/plot_separating_hyperplane_unbalanced.html

https://www.researchgate.net/post/What_are_the_possible_approaches_for_solving_imbalanced_class_problems


stale

Most helpful comment

This isn't a Keras issue.

All 14 comments

This isn't a Keras issue.

https://www.quora.com/Which-balance-strategy-to-learn-from-my-very-imbalanced-dataset

So when the data is unbalanced, one possible strategy is:
When you update weights for a minibatch during training, consider the proportions of the two classes in the minibatch and update the weights accordingly.

So if I have many more negatively labelled samples than positive ones, it may be good to create batches that contain the same number of negative and positive samples. Or, if I want to emphasize negatives, I need the option to put more negatives into the batches (a sketch of such a generator follows below).
Of course, I could create balanced data by adding the same negatives many times; would that trick Keras?
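For illustration, here is a minimal sketch of such a balanced batch generator. The function name and the 50/50 split are my own illustrative choices, not a Keras API:

```python
import numpy as np

def balanced_batch_generator(X, y, batch_size=32):
    """Yield batches with equal numbers of positive and negative samples,
    oversampling the minority class with replacement."""
    pos_idx = np.where(y == 1)[0]
    neg_idx = np.where(y == 0)[0]
    half = batch_size // 2
    while True:
        batch_idx = np.concatenate([
            np.random.choice(pos_idx, half, replace=True),
            np.random.choice(neg_idx, batch_size - half, replace=True),
        ])
        np.random.shuffle(batch_idx)
        yield X[batch_idx], y[batch_idx]

# Usage with the Keras 1.x generator API discussed in this thread:
# model.fit_generator(balanced_batch_generator(X_train, y_train, 32),
#                     samples_per_epoch=len(X_train), nb_epoch=5)
```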

Some info for an answer may be found in
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/keras-users/LYo7sqE75N4/9K2TJHngCAAJ

I have tried to "balance" out the classes by setting class_weight={0: 1, 1: 100000}.

https://github.com/fchollet/keras/issues/177
Loss scaling would happen inside objectives.py functions, using a class_weight parameter set in model.fit or model.train. The amount of changes needed to get it rolling would be minimal.
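For reference, a minimal sketch of what that looks like from the user side, assuming binary labels and a toy model; the 1:10 weight ratio and input_dim are illustrative only:

```python
from keras.models import Sequential
from keras.layers import Dense

# Toy binary classifier; X_train, y_train are assumed to exist.
model = Sequential([Dense(1, input_dim=20, activation='sigmoid')])
model.compile(loss='binary_crossentropy', optimizer='adam')

# class_weight maps class index -> loss weight; here the positive
# (minority) class counts 10x in the loss. A common heuristic is
# n_majority / n_minority.
model.fit(X_train, y_train, nb_epoch=5, batch_size=32,
          class_weight={0: 1.0, 1: 10.0})
```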

The problem is not as simple as it seems; here are more links:
http://ro.uow.edu.au/cgi/viewcontent.cgi?article=10491&context=infopapers

A Supervised Learning Approach for Imbalanced Data Sets
Giang H. Nguyen, Abdesselam Bouzerdoum, and Son Lam Phung (University of Wollongong)

and more links:
http://arxiv.org/pdf/1508.03422.pdf
Cost-Sensitive Learning of Deep Feature Representations from Imbalanced Data

http://www.cs.utah.edu/~piyush/teaching/ImbalancedLearning.pdf
Learning from Imbalanced Data
and even a PhD thesis:

http://lib.dr.iastate.edu/cgi/viewcontent.cgi?article=4544&context=etd
A Balanced Approach to the Multi-class Imbalance Problem
Lawrence Mosley, Iowa State University

This is an example of how to deal with unbalanced data: http://pastebin.com/0QHtPGzJ , but it is still not working for my task.

How are you doing with your task? I ran into the same imbalance problem classifying time-series data sets, where the proportion of the minority class is about 0.2%. I tried oversampling such as SMOTE, but it didn't work.
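For readers unfamiliar with SMOTE, a minimal sketch using the third-party imbalanced-learn package (the package choice is my assumption; it is not part of Keras):

```python
from imblearn.over_sampling import SMOTE

# SMOTE expects 2-D feature arrays; time-series data may need to be
# flattened or summarized into features first.
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# Train on the synthetically balanced data; evaluate on the original,
# unbalanced test set.
model.fit(X_resampled, y_resampled, nb_epoch=5, batch_size=32)
```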

@Sandy4321 @danielgy
Training

  1. WRONG: Worth a shot in Keras (near-zero effort):
    model.fit(X_train, Y_train, nb_epoch=5, batch_size=32, class_weight = 'auto')
    Undocumented, mentioned in a Google group: supposedly class-balances each batch. (Does not work, but does not throw an error.)
    In my mind this reduces learning issues due to imbalanced batch updates.
    Also, if you google oversampling and NNs, you can find papers claiming that simple training-set oversampling is valid (though simple).
  2. Consider using the Keras ModelCheckpoint callback watching the validation loss. Validation accuracy may be misleading under artificial balance (val acc may overstate performance if you care more about class = 1). Neither is really good, though.
  3. Consider validation_data instead of validation_split in fit(). That way you can provide an unbalanced validation set, and val_loss becomes a better measure of real performance. (Not sure if this isn't implicitly taken care of with validation_split.) See the sketch after this list.
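A sketch of points 2 and 3 together, assuming a held-out (X_val, y_val) that keeps the real, unbalanced class distribution:

```python
from keras.callbacks import ModelCheckpoint

# Keep the best model by validation loss rather than validation accuracy,
# since accuracy can be misleading under class imbalance.
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_loss',
                             save_best_only=True)

model.fit(X_train, y_train, nb_epoch=5, batch_size=32,
          validation_data=(X_val, y_val),  # explicit unbalanced val set
          callbacks=[checkpoint])
```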

Evaluation:

Results are bad

Test on the unbalanced test set.
Use Average Precision, a.k.a. the AUC of the Precision-Recall curve (AUC of PR). In contrast to ROC AUC, this measure incorporates class imbalance:
AUC_PR = average_precision_score(y_true=y_test, y_score=model.predict(X_test), average='weighted')
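A self-contained version of that line, for reference (model, X_test, and y_test are assumed from the surrounding discussion):

```python
from sklearn.metrics import average_precision_score

# Predicted probabilities for the positive class on the unbalanced test set.
y_score = model.predict(X_test).ravel()

# Average Precision = area under the Precision-Recall curve; unlike ROC AUC
# it reflects class imbalance.
auc_pr = average_precision_score(y_true=y_test, y_score=y_score,
                                 average='weighted')
print('AUC of PR curve: %.4f' % auc_pr)
```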

Let us know how this works for extreme class imbalance.

There is nothing related to class_weight = 'auto' in Keras code. Don't use it! Check https://github.com/fchollet/keras/issues/5116.
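Instead of the non-existent 'auto' option, the weights can be computed explicitly, e.g. with scikit-learn, and passed to fit (a sketch; variable names are illustrative):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 'balanced' weights each class inversely proportional to its frequency.
classes = np.unique(y_train)
weights = compute_class_weight('balanced', classes=classes, y=y_train)
class_weight = {int(c): w for c, w in zip(classes, weights)}

model.fit(X_train, y_train, nb_epoch=5, batch_size=32,
          class_weight=class_weight)
```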

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

@fabioperez
Thanks for pointing out the 'auto' mistake. Edited post. Nice catch.
@Sandy4321

  • What performance measure are you optimizing for? I struggled with this issue and found that it is not necessarily a model issue but a matter of how we measure performance.
    E.g. we turn sigmoid probabilities into binary predictions via a threshold of 0.5. Your model may learn the class imbalance and put the actual best prediction threshold above or below 0.5. The ROC measure, for example, tries all thresholds and is quite robust against class imbalance. It can also give you things like the break-even point. You can easily calculate the optimal threshold automatically (see the sketch after this list); there is no need to learn it, if your goal is decision automation anyway.
  • Balancing is really somewhat artificial, IMHO, after dealing with this. Some papers suggest balancing the training set and not the test set, or balancing both; either is flawed, IMHO. If the model learns the class imbalance, then why not let it. You can always compute this threshold on your tuning/validation set (not your test set, to be rigorous).
  • Make sure your training, validation/tuning, and test data have the same class balance (as in sklearn's stratified sampling / stratified k-fold). This may seem artificial, but any other sampling is simply random, giving you random results.
  • (Optional, deeper analysis) You could then test how your model performs if you change the validation or test set balance (is it robust against changes in class balance?).
  • A cheap way: use multiple measures to derive a final picture of performance. No measure alone is perfect, and any model is usually optimized for a particular measure such as F1, accuracy, MAPE, ROC, etc.
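As mentioned in the first bullet, the decision threshold can be computed automatically on the validation set. One common recipe (my choice here, not prescribed above) is to maximize Youden's J statistic over the ROC curve:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Tune the threshold on the validation set, never on the test set.
y_val_score = model.predict(X_val).ravel()
fpr, tpr, thresholds = roc_curve(y_val, y_val_score)
best_threshold = thresholds[np.argmax(tpr - fpr)]  # Youden's J = TPR - FPR

# Apply the tuned threshold instead of the default 0.5 at test time.
y_test_pred = (model.predict(X_test).ravel() >= best_threshold).astype(int)
```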

Happy to hear what you found.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
