Hey there,
How does one actually use class_weight with model.fit?
I had originally written the following method to do this, but I'm not entirely sure whether it works or not.
def calculate_class_weights(train_label):
    labels = train_label.tolist()  # avoid shadowing the built-in `list`
    num_neg = labels.count(0)
    num_pos = labels.count(1)
    duplicate = num_pos / num_neg
    class_weights = {0: num_neg * duplicate, 1: num_pos}
    return class_weights
This returns a dictionary of...
{0: 34, 1: 34}
(both entries come out equal, since num_neg * (num_pos / num_neg) simplifies to num_pos).
Does anyone have a working example of how to balance 2 classes using the class_weight argument?
Thanks,
Keiron.
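For reference, a minimal sketch of what a working binary setup could look like (the data and model below are illustrative, not from this thread): weight the minority class by the inverse of its frequency and pass the dict straight to model.fit.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# toy imbalanced dataset: 80 negatives, 20 positives
X = np.random.rand(100, 5)
y = np.array([0] * 80 + [1] * 20)

num_neg = np.sum(y == 0)
num_pos = np.sum(y == 1)
# give the minority class a proportionally larger weight
class_weight = {0: 1.0, 1: float(num_neg) / num_pos}  # {0: 1.0, 1: 4.0}

model = Sequential()
model.add(Dense(1, input_shape=(5,), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=10, class_weight=class_weight)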
You should probably post this in the Keras Google group:
https://groups.google.com/forum/#!forum/keras-users
Stack Overflow would work too.
See also: https://groups.google.com/forum/#!topic/keras-users/MUO6v3kRHUw
train_generator = train_datagen.flow_from_directory(
    train_img_path,  # this is the target directory
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='binary',
    color_mode='grayscale',
    classes=['good', 'bad'],
    save_to_dir=generate_train_img_path)

validation_generator = test_datagen.flow_from_directory(
    validation_img_path,
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='binary',
    color_mode='grayscale',
    classes=['good', 'bad'],
    save_to_dir=generate_validation_img_path)
# 83% of the images are class 1 and 17% are class 0, so I balance the two
# classes by weighting each one inversely to its frequency:
class_weight = {0: 83, 1: 17}
for i in range(0, nb_epoch):
    print('epoch: {}'.format(i))
    if i > 0:
        # resume from the weights saved at the end of the previous epoch
        model.load_weights('{}.h5'.format(i - 1))
    model.fit_generator(
        train_generator,
        samples_per_epoch=1800,
        nb_epoch=1,
        validation_data=validation_generator,
        nb_val_samples=250,
        class_weight=class_weight)
    model.save('{}.h5'.format(i))
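As an aside, the manual save/load loop above can be replaced by a single fit_generator call plus a ModelCheckpoint callback (standard Keras API); a sketch:

from keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint('weights.{epoch:02d}.h5')  # save weights each epoch
model.fit_generator(
    train_generator,
    samples_per_epoch=1800,
    nb_epoch=nb_epoch,
    validation_data=validation_generator,
    nb_val_samples=250,
    class_weight=class_weight,
    callbacks=[checkpoint])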
I have this simple function for computing the weights for each class:
from collections import Counter

def get_class_weights(y):
    counter = Counter(y)
    majority = max(counter.values())
    # weight each class by how much rarer it is than the majority class
    return {cls: majority / float(count) for cls, count in counter.items()}
What I do is pick the majority class as a reference and assign weights to the other classes relative to it. So if you have 3 classes with classA: 10%, classB: 50% and classC: 40%, you get the weights:
{0: 5, 1: 1, 2: 1.25}
This means that misclassifying classA costs 5 times as much in the loss as misclassifying classB, and so on...
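A quick usage sketch reproducing the classA/classB/classC split above (10% / 50% / 40% of 1000 labels; the numbers are illustrative):

import numpy as np

y = np.array([0] * 100 + [1] * 500 + [2] * 400)
print(get_class_weights(y))  # {0: 5.0, 1: 1.0, 2: 1.25}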
Seems like a useful utility function. Do you have any papers on the best choices of class weights and why this is the correct scaling? Maybe rename it to something like balanced_class_weights and try to add it to np_utils.
Here is an example applying SegNet to the RoadScene dataset, where class weights are given for several classes in the images:
class_weighting = [  # one weight per class
    0.2595,
    0.1826,
    4.5640,
    0.1417,
    0.5051,
    0.3826,
    9.6446,
    1.8418,
    6.6823,
    6.2478,
    3.0,
    7.3614,
]
# Fit the model
history = segnet_basic.fit(
    train_data, train_label,
    callbacks=callbacks_list,
    batch_size=batch_size,
    nb_epoch=nb_epoch,
    verbose=1,
    class_weight=class_weighting,
    validation_data=(test_data, test_label),
    shuffle=True)  # alternatively: validation_split=0.33
Hi,
I'm confused about how to use class_weight, so I pasted a simple example below. In it I fit the same input against two different one-hot targets; without weights, the prediction for that input should be 50% for the 2nd class and 50% for the 4th class. I then set class_weight to mask out the 2nd class, but the prediction still comes out 50/50. Am I doing something wrong?
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
import numpy as np

model = Sequential()
model.add(Dense(4, input_shape=(2,)))
model.add(Dense(4, activation='softmax'))
model.compile(optimizer=optimizers.Adagrad(), loss='categorical_crossentropy')

# the same input mapped to two different one-hot targets (classes 1 and 3)
x = np.array([[1, 1], [1, 1]])
y = np.array([[0, 1, 0, 0], [0, 0, 0, 1]])

weights_mask = np.array([1, 1])
class_weights = {
    0: 0,
    1: 0,   # intended to mask out the 2nd class
    2: 0,
    3: 10,  # only the 4th class should contribute to the loss
}
# weights_mask = np.array([1])

# note: some Keras versions ignore class_weight when sample_weight is
# also passed, which may explain the unchanged 50/50 prediction
model.fit(x, y, epochs=1000, sample_weight=weights_mask,
          class_weight=class_weights, validation_data=(x, y))
ret = model.predict(x)
print(ret)
@0bserver07 how do you set the values in class_weight? Are there any papers to refer to?
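For what it's worth, one recipe from the segmentation literature is median frequency balancing (Eigen & Fergus, 2015), which the SegNet authors also use: weight each class by median_freq / freq_c, where freq_c is the fraction of pixels belonging to class c. A simplified sketch (pixel_counts is a hypothetical per-class tally you would compute over your own training set):

import numpy as np

def median_frequency_weights(pixel_counts):
    # pixel_counts: per-class pixel counts over the training set
    freqs = pixel_counts / pixel_counts.sum()
    # rare classes get weights > 1, frequent classes get weights < 1
    return np.median(freqs) / freqs

# e.g. three classes covering 70%, 25% and 5% of all pixels:
print(median_frequency_weights(np.array([7000., 2500., 500.])))
# -> [0.357 1.    5.   ]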
How do we use class_weight in the case of fit_generator? I mean, is there a way to set it on the fly for each training batch? I tried using a generator that returns class_weight for each batch, but that gives me
TypeError: object of type 'generator' has no len()
since class_weight has to be a dict, not a generator. I actually want to calculate class weights for each batch rather than for the entire dataset, and I can't find a way to do this with fit_generator without duplicating effort.
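One documented workaround, though it goes through sample_weight rather than class_weight: a Keras generator may yield a third element of per-sample weights, so you can compute the weights for each batch inside the generator itself. A minimal sketch, assuming a binary problem and a base generator that yields (x, y) batches:

import numpy as np

def weighted_batches(base_generator):
    # wrap a generator so each batch also carries per-sample weights
    # computed from that batch's own label counts
    for x, y in base_generator:
        labels = y.astype(int)
        counts = np.bincount(labels, minlength=2).astype(float)
        counts[counts == 0] = 1.0        # avoid division by zero
        weights = counts.max() / counts  # majority class gets weight 1.0
        yield x, y, weights[labels]

# model.fit_generator(weighted_batches(train_generator), ...)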
@0bserver07 I am training a semantic segmentation network. When I try to pass a dict as the class_weight parameter to fit_generator, it complains with
ValueError: `class_weight` not supported for 3+ dimensional targets.
but when I pass it a list like you did, it magically works! The docs don't mention anything about passing lists to the class_weight parameter of fit or fit_generator, though. Could you please shed some light on how this is working? Thanks!
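I can't say why the list form is accepted, but a documented workaround for 3+ dimensional targets is to move the weighting into sample_weight with sample_weight_mode='temporal'. A sketch, assuming one-hot targets of shape (batch, num_pixels, num_classes) and an illustrative weight vector:

import numpy as np

class_weights = np.array([0.26, 0.18, 4.56])  # hypothetical, one weight per class

def to_pixel_weights(y):
    # y: one-hot targets of shape (batch, num_pixels, num_classes)
    # returns per-pixel weights of shape (batch, num_pixels)
    return (y * class_weights).sum(axis=-1)

# model.compile(..., sample_weight_mode='temporal')
# model.fit(x, y, sample_weight=to_pixel_weights(y), ...)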
What would a class weight of 0 imply?
For example, suppose class_weights are {0 : 0.5, 1 : 0.5, 2 : 0.0}. Does this mean we're asking the model to consider classes 0 and 1 equally and ignore class 2 i.e. not have it contribute to the loss?
Yes.
@0bserver07 I am training a semantic segmentation network. When I pass a dict as the class_weight parameter to fit_generator it fails with ValueError: `class_weight` not supported for 3+ dimensional targets. but a list magically works. Could you shed some light on how this is working?
I have the same problem. Could you please let me know how you fixed it?
Check this out:
https://stackoverflow.com/questions/60408901/sklearn-utils-compute-class-weight-function-for-large-dataset
train_generator = train_datagen.flow_from_directory(
    'train_directory',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical')
and the class weights for the training set can be computed like this:
from sklearn.utils import class_weight
import numpy as np

class_weights = class_weight.compute_class_weight(
    'balanced',
    np.unique(train_generator.classes),
    train_generator.classes)
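Note that compute_class_weight returns a NumPy array ordered by class index, while model.fit expects class_weight as a dict mapping class indices to weights, so convert it first:

# convert the weight array into the dict form model.fit expects
class_weights = dict(enumerate(class_weights))
# then e.g.: model.fit_generator(train_generator, ..., class_weight=class_weights)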