Regarding the below issue.
https://github.com/microsoft/LightGBM/issues/2708
If the code referred to in the above link is meant to be consistent with the XGBoost tutorial, I think there is a difference.
I can transform Line 505 in the link as follows.
https://github.com/microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L505
I assume that l1 = 0, l2 = 0 and loss function is log-loss to simplify the formula.
-(2.0 * sg_l1 * output + (sum_hessians + l2) * output * output) (Line 505)
= -(2.0 * sum_gradients * output + sum_hessians * output * output)
(since l1 = 0 and l2 = 0, sg_l1 = sum_gradients)
= -(2.0 * sum_gradients * (-sum_gradients / sum_hessians) + sum_hessians * (-sum_gradients / sum_hessians) * (-sum_gradients / sum_hessians))
= -(-2.0 * sum_gradients * sum_gradients / sum_hessians + sum_gradients * sum_gradients / sum_hessians)
(the hessian of log-loss is always positive, so output = -sum_gradients / sum_hessians is well defined)
= sum_gradients * sum_gradients / sum_hessians
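The algebra above can be checked numerically. A minimal sketch (my own check with arbitrary example values, not LightGBM code), assuming l1 = l2 = 0 so the leaf output is -sum_gradients / sum_hessians:

```python
G = -3.0   # sum_gradients (arbitrary example value)
H = 2.0    # sum_hessians (arbitrary example value; positive, as for log-loss)

output = -G / H                                        # optimal leaf value
lgbm_gain = -(2.0 * G * output + H * output * output)  # line 505 with l1 = l2 = 0

assert abs(lgbm_gain - G * G / H) < 1e-12              # reduces to G^2 / H
assert abs(lgbm_gain - 2 * (0.5 * G * G / H)) < 1e-12  # twice the XGBoost 1/2 * G^2 / H form
```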
This is twice the value given by the formula in the XGBoost tutorial at your link.
https://xgboost.readthedocs.io/en/latest/tutorials/model.html
obj^* = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j+\lambda} + \gamma T

Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma
Should the feature importance of LightGBM be multiplied by 1/2.0 to calculate the feature importance based on the gain which is defined in XGBoost tutorial?
The constant term 1/2 is eliminated because it doesn't affect which split point has the max gain: all gains are doubled equally.
And feature importance is meant to provide the ordering of features by importance (by gain, or by count).
I appreciate your answer, and I understand it.
However, I still have one question.
In the case of regression where MSE is used as the loss function, why is the feature importance the same as the gain defined in the XGBoost tutorial?
I think the constant factor in the second-order gradient of MSE should be eliminated too.
I think the code below is the cause of this.
https://github.com/microsoft/LightGBM/blob/master/src/objective/regression_objective.hpp#L110-L125
The gradient and hessian are calculated as if the loss function were 1/2*(y_act - y_est)^2, not (y_act - y_est)^2, for the split.
Is this correct?