Hey there,
How does one actually use class_weight with model.fit?
I had originally written the following method to do this, but I'm not entirely sure whether it works or not.
def calculate_class_weights(train_label):
    labels = train_label.tolist()  # avoid shadowing the built-in `list`
    num_neg = labels.count(0)
    num_pos = labels.count(1)
    duplicate = num_pos / num_neg
    class_weights = {0: num_neg * duplicate, 1: num_pos}
    return class_weights
This returns a dictionary of...
{0: 34, 1: 34}
(both entries come out equal, since num_neg * (num_pos / num_neg) simplifies to num_pos).
Does anyone have a working example of how to balance 2 classes using the class_weight argument?
Thanks,
Keiron.
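For reference, a minimal sketch of what a working binary setup could look like (the data and model below are illustrative, not from this thread): weight the minority class by the inverse of its frequency and pass the dict straight to model.fit.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# toy imbalanced dataset: 80 negatives, 20 positives
X = np.random.rand(100, 5)
y = np.array([0] * 80 + [1] * 20)

num_neg = np.sum(y == 0)
num_pos = np.sum(y == 1)
# give the minority class a proportionally larger weight
class_weight = {0: 1.0, 1: float(num_neg) / num_pos}  # {0: 1.0, 1: 4.0}

model = Sequential()
model.add(Dense(1, input_shape=(5,), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=10, class_weight=class_weight)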
You should probably post this in the Keras Google group:
https://groups.google.com/forum/#!forum/keras-users
Stack Overflow would work too.
See also: https://groups.google.com/forum/#!topic/keras-users/MUO6v3kRHUw
train_generator = train_datagen.flow_from_directory(
    train_img_path,  # this is the target directory
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='binary',
    color_mode='grayscale',
    classes=['good', 'bad'],
    save_to_dir=generate_train_img_path)

validation_generator = test_datagen.flow_from_directory(
    validation_img_path,
    target_size=(img_rows, img_cols),
    batch_size=batch_size,
    class_mode='binary',
    color_mode='grayscale',
    classes=['good', 'bad'],
    save_to_dir=generate_validation_img_path)
# 83% of the images are class 1 and 17% are class 0, so I balance the two
# classes by weighting each one inversely to its frequency:
class_weight = {0: 83, 1: 17}
for i in range(0, nb_epoch):
    print('epoch: {}'.format(i))
    if i > 0:
        # resume from the weights saved at the end of the previous epoch
        model.load_weights('{}.h5'.format(i - 1))
    model.fit_generator(
        train_generator,
        samples_per_epoch=1800,
        nb_epoch=1,
        validation_data=validation_generator,
        nb_val_samples=250,
        class_weight=class_weight)
    model.save('{}.h5'.format(i))
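As an aside, the manual save/load loop above can be replaced by a single fit_generator call plus a ModelCheckpoint callback (standard Keras API); a sketch:

from keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint('weights.{epoch:02d}.h5')  # save weights each epoch
model.fit_generator(
    train_generator,
    samples_per_epoch=1800,
    nb_epoch=nb_epoch,
    validation_data=validation_generator,
    nb_val_samples=250,
    class_weight=class_weight,
    callbacks=[checkpoint])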
I have this simple function for computing the weights for each class:
from collections import Counter

def get_class_weights(y):
    counter = Counter(y)
    majority = max(counter.values())
    # weight each class by how much rarer it is than the majority class
    return {cls: majority / float(count) for cls, count in counter.items()}
What I do is pick the majority class as a reference and assign weights to the other classes relative to it. So if you have 3 classes with classA: 10%, classB: 50% and classC: 40%, you get the weights:
{0: 5, 1: 1, 2: 1.25}
This means that misclassifying classA costs 5 times as much in the loss as misclassifying classB, and so on...
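A quick usage sketch reproducing the classA/classB/classC split above (10% / 50% / 40% of 1000 labels; the numbers are illustrative):

import numpy as np

y = np.array([0] * 100 + [1] * 500 + [2] * 400)
print(get_class_weights(y))  # {0: 5.0, 1: 1.0, 2: 1.25}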
Seems like a useful utility function. Do you have any papers on the best choices of class weights and why this is the correct scaling? Maybe rename it to something like balanced_class_weights and try to add it to np_utils.
Here is an example applying SegNet to the RoadScene dataset, where class weights are given for several classes in the images:
class_weighting = [  # one weight per class
    0.2595,
    0.1826,
    4.5640,
    0.1417,
    0.5051,
    0.3826,
    9.6446,
    1.8418,
    6.6823,
    6.2478,
    3.0,
    7.3614,
]
# Fit the model
history = segnet_basic.fit(
    train_data, train_label,
    callbacks=callbacks_list,
    batch_size=batch_size,
    nb_epoch=nb_epoch,
    verbose=1,
    class_weight=class_weighting,
    validation_data=(test_data, test_label),
    shuffle=True)  # alternatively: validation_split=0.33
Hi,
I'm confused about how to use class_weight, so I pasted a simple example below. In it I fit the same input against two different one-hot targets; without weights, the prediction for that input should be 50% for the 2nd class and 50% for the 4th class. I then set class_weight to mask out the 2nd class, but the prediction still comes out 50/50. Am I doing something wrong?
from keras.models import Sequential
from keras.layers import Dense
from keras import optimizers
import numpy as np

model = Sequential()
model.add(Dense(4, input_shape=(2,)))
model.add(Dense(4, activation='softmax'))
model.compile(optimizer=optimizers.Adagrad(), loss='categorical_crossentropy')

# the same input mapped to two different one-hot targets (classes 1 and 3)
x = np.array([[1, 1], [1, 1]])
y = np.array([[0, 1, 0, 0], [0, 0, 0, 1]])

weights_mask = np.array([1, 1])
class_weights = {
    0: 0,
    1: 0,   # intended to mask out the 2nd class
    2: 0,
    3: 10,  # only the 4th class should contribute to the loss
}
# weights_mask = np.array([1])

# note: some Keras versions ignore class_weight when sample_weight is
# also passed, which may explain the unchanged 50/50 prediction
model.fit(x, y, epochs=1000, sample_weight=weights_mask,
          class_weight=class_weights, validation_data=(x, y))
ret = model.predict(x)
print(ret)
@0bserver07 how do you set the values in class_weight? Are there any papers to refer to?
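For what it's worth, one recipe from the segmentation literature is median frequency balancing (Eigen & Fergus, 2015), which the SegNet authors also use: weight each class by median_freq / freq_c, where freq_c is the fraction of pixels belonging to class c. A simplified sketch (pixel_counts is a hypothetical per-class tally you would compute over your own training set):

import numpy as np

def median_frequency_weights(pixel_counts):
    # pixel_counts: per-class pixel counts over the training set
    freqs = pixel_counts / pixel_counts.sum()
    # rare classes get weights > 1, frequent classes get weights < 1
    return np.median(freqs) / freqs

# e.g. three classes covering 70%, 25% and 5% of all pixels:
print(median_frequency_weights(np.array([7000., 2500., 500.])))
# -> [0.357 1.    5.   ]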
How do we use class_weight in the case of fit_generator? I mean, is there a way to set it on the fly for each training batch? I tried using a generator that returns class_weight for each batch, but that gives me
TypeError: object of type 'generator' has no len()
since class_weight has to be a dict, not a generator. I actually want to calculate class weights for each batch rather than for the entire dataset, and I can't find a way to do this with fit_generator without duplicating effort.
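One documented workaround, though it goes through sample_weight rather than class_weight: a Keras generator may yield a third element of per-sample weights, so you can compute the weights for each batch inside the generator itself. A minimal sketch, assuming a binary problem and a base generator that yields (x, y) batches:

import numpy as np

def weighted_batches(base_generator):
    # wrap a generator so each batch also carries per-sample weights
    # computed from that batch's own label counts
    for x, y in base_generator:
        labels = y.astype(int)
        counts = np.bincount(labels, minlength=2).astype(float)
        counts[counts == 0] = 1.0        # avoid division by zero
        weights = counts.max() / counts  # majority class gets weight 1.0
        yield x, y, weights[labels]

# model.fit_generator(weighted_batches(train_generator), ...)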
@0bserver07 I am training a semantic segmentation network. When I try to pass a dict as the class_weight parameter to fit_generator, it complains with
ValueError: `class_weight` not supported for 3+ dimensional targets.
but when I pass it a list like you did, it magically works! The docs don't mention anything about passing lists to the class_weight parameter of fit or fit_generator, though. Could you please shed some light on how this is working? Thanks!
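I can't say why the list form is accepted, but a documented workaround for 3+ dimensional targets is to move the weighting into sample_weight with sample_weight_mode='temporal'. A sketch, assuming one-hot targets of shape (batch, num_pixels, num_classes) and an illustrative weight vector:

import numpy as np

class_weights = np.array([0.26, 0.18, 4.56])  # hypothetical, one weight per class

def to_pixel_weights(y):
    # y: one-hot targets of shape (batch, num_pixels, num_classes)
    # returns per-pixel weights of shape (batch, num_pixels)
    return (y * class_weights).sum(axis=-1)

# model.compile(..., sample_weight_mode='temporal')
# model.fit(x, y, sample_weight=to_pixel_weights(y), ...)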
What would a class weight of 0 imply?
For example, suppose class_weights are {0 : 0.5, 1 : 0.5, 2 : 0.0}. Does this mean we're asking the model to consider classes 0 and 1 equally and ignore class 2 i.e. not have it contribute to the loss?
Yes.
@0bserver07 I am training a semantic segmentation network. When I pass a dict as the class_weight parameter to fit_generator it fails with ValueError: `class_weight` not supported for 3+ dimensional targets. but a list magically works. Could you shed some light on how this is working?
I have the same problem. Could you please let me know how you fixed it?
Check this out:
https://stackoverflow.com/questions/60408901/sklearn-utils-compute-class-weight-function-for-large-dataset
train_generator = train_datagen.flow_from_directory(
    'train_directory',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical')
and the class weights for the training set can be computed like this:
from sklearn.utils import class_weight
import numpy as np

class_weights = class_weight.compute_class_weight(
    'balanced',
    np.unique(train_generator.classes),
    train_generator.classes)
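Note that compute_class_weight returns a NumPy array ordered by class index, while model.fit expects class_weight as a dict mapping class indices to weights, so convert it first:

# convert the weight array into the dict form model.fit expects
class_weights = dict(enumerate(class_weights))
# then e.g.: model.fit_generator(train_generator, ..., class_weight=class_weights)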