Regarding the below issue.
https://github.com/microsoft/LightGBM/issues/2708
If the code referred to in the above link is meant to be consistent with the XGBoost tutorial, I think there is a difference.
I can transform Line 505 in the link as follows.
https://github.com/microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L505
I assume that l1 = 0, l2 = 0 and loss function is log-loss to simplify the formula.
-(2.0 * sg_l1 * output + (sum_hessians + l2) * output * output) (Line 505)
= -(2.0 * sum_gradients * output + sum_hessians * output * output)
(since l1 = 0 and l2 = 0, sg_l1 = sum_gradients)
= -(2.0 * sum_gradients * (-sum_gradients / sum_hessians) + sum_hessians * (-sum_gradients / sum_hessians) * (-sum_gradients / sum_hessians))
= -(-2.0 * sum_gradients * sum_gradients / sum_hessians + sum_gradients * sum_gradients / sum_hessians)
(the hessian of log-loss is always positive, so output = -sum_gradients / sum_hessians is well defined)
= sum_gradients * sum_gradients / sum_hessians
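The algebra above can be checked numerically. A minimal sketch (my own check with arbitrary example values, not LightGBM code), assuming l1 = l2 = 0 so the leaf output is -sum_gradients / sum_hessians:

```python
G = -3.0   # sum_gradients (arbitrary example value)
H = 2.0    # sum_hessians (arbitrary example value; positive, as for log-loss)

output = -G / H                                        # optimal leaf value
lgbm_gain = -(2.0 * G * output + H * output * output)  # line 505 with l1 = l2 = 0

assert abs(lgbm_gain - G * G / H) < 1e-12              # reduces to G^2 / H
assert abs(lgbm_gain - 2 * (0.5 * G * G / H)) < 1e-12  # twice the XGBoost 1/2 * G^2 / H form
```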
This is twice the value given by the formula in the XGBoost tutorial at your link.
https://xgboost.readthedocs.io/en/latest/tutorials/model.html
obj^* = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j+\lambda} + \gamma T

Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma
Should the feature importance of LightGBM be multiplied by 1/2.0 to calculate the feature importance based on the gain which is defined in XGBoost tutorial?
The constant term 1/2 is eliminated because it doesn't affect which split point has the max gain: all gains are doubled equally.
And feature importance is meant to provide the ordering of features by importance (by gain, or by count).
I appreciate your answer, and I understand it.
However, I still have one question.
In the case of regression where MSE is used as the loss function, why is the feature importance the same as the gain defined in the XGBoost tutorial?
I think the constant factor in the second-order gradient of MSE should be eliminated too.
I think the code below is the cause of this.
https://github.com/microsoft/LightGBM/blob/master/src/objective/regression_objective.hpp#L110-L125
The gradient and hessian are calculated as if the loss function were 1/2*(y_act - y_est)^2, not (y_act - y_est)^2, for the split.
Is this correct?