It seems that XGBoost and LightGBM both have a scale_pos_weight argument, but the calculation is done completely differently. I couldn't find any authoritative answer on how to calculate it in LightGBM, so I'm asking here.
I'm using the following method to calculate scale_pos_weight. I'm not sure if it's correct.
Number of positive: 143540 , number of negative: 59856460
Number of data: 60000000, number of used features: 11
scale_pos_weight = 100 - ( [number of positive samples / total samples] * 100 )
scale_pos_weight = 100 - ( [ 143540 / 60000000 ] * 100 )
scale_pos_weight = 99.76
scale_pos_weight is always just negatives / positives I thought?
This answer is for XGBoost, but it should be the same for both implementations. If you're still unsure, I think 'is_unbalance' essentially does the same thing but calculates 'scale_pos_weight' by itself.
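To make the two options concrete, here is a minimal sketch of the two equivalent ways to configure this in LightGBM, using the real parameter names (is_unbalance, scale_pos_weight) and the counts quoted in this thread; you would normally use one or the other, not both:

```python
# Option 1: let LightGBM compute the weighting itself.
params_auto = {
    "objective": "binary",
    "is_unbalance": True,
}

# Option 2: set scale_pos_weight manually as negatives / positives
# (counts taken from the log output earlier in the thread).
params_manual = {
    "objective": "binary",
    "scale_pos_weight": 59_856_460 / 143_540,  # ≈ 417, not 99.76
}
print(params_manual["scale_pos_weight"])
```

Either dict would then be passed as the params argument to lightgbm.train.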
Thanks @bbennett36
Actually, that formula is also mentioned in the XGBoost documentation, but the LightGBM documentation lacks details on this parameter. That's why I wanted to confirm whether the formula stays the same for XGBoost and LightGBM or differs.
I took reference for the formula from this post
@pranavpandya84
negatives / positives looks more accurate to me.
From the documentation we can see:
scale_pos_weight, default=1.0, type=double
– weight of positive class in binary classification task
With the default value of '1', it implies that the positive class has a weight equal to the negative class. So, in your case, since the positive class is smaller than the negative class, the number should have been less than '1' and not more than '1'.
This is just my understanding and I may not be correct...
For both xgboost and LightGBM, scale_pos_weight, if assuming perfectly balanced positive/negative samples, means that:
number of positive samples = number of negative samples
which also means the following when using weights through scale_pos_weight:
number of positive samples * scale_pos_weight = number of negative samples
Therefore, its value, if asking for balance, is the following:
scale_pos_weight = number of negative samples / number of positive samples
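Plugging in the counts reported at the top of this thread, a quick sketch of that formula (variable names are illustrative):

```python
# Counts reported earlier in the thread
n_positive = 143_540
n_negative = 59_856_460

# Balance condition: n_positive * scale_pos_weight == n_negative
scale_pos_weight = n_negative / n_positive
print(round(scale_pos_weight, 2))  # ≈ 417, far from the 99.76 computed above
```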
More simple explanation: https://sites.google.com/view/lauraepp/parameters and type "scale" in the search box, then click on "Positive Binary Scaling".

Related C++ code:
xgboost proof: w += y * ((param_.scale_pos_weight * w) - w); where y is the label (0 = negative, 1 = positive), in src/objective/regression_obj.cc
LightGBM proof: label_weights_[1] *= scale_pos_weight_; where the second index (1) is for positive labels, in src/objective/binary_objective.hpp
Thanks a lot. Perfect!