Lightgbm: Python Feature Importances have different scales

Created on 25 Jul 2017 · 5 comments · Source: microsoft/LightGBM

Very simple but it caught me out! The feature importance from

gbm = lightgbm.LGBMRegressor()  
gbm.booster_.feature_importance()

is different from

gbm.feature_importance()

which is scaled to [0, 1]. It would be worth noting this in the docs, especially as the plotting functionality uses the unscaled feature_importance, while the default (and easiest to find) feature_importance is scaled.


All 5 comments

@wxchan

Sure, we can add it to the docs. Actually I don't know why they have different scales. I think it's some sklearn convention.

@JoshuaC3 do you think we should use same scale?

Yes. I think that would make sense. However, I am not sure whether it is more desirable to return the raw feature importance or the [0, 1] scaled data. IMO, I lean more towards raw, as people might wish to know the actual number of splits/gain. They could then normalise this themselves if so desired. Thoughts on this?
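Normalising the raw counts yourself is a one-liner; a sketch with made-up split counts standing in for the output of `booster_.feature_importance()`:

```python
import numpy as np

# Hypothetical raw split counts, one per feature
raw = np.array([120, 45, 0, 35], dtype=float)

# Rescale so the importances sum to 1 (the sklearn-style convention)
scaled = raw / raw.sum()
print(scaled)  # -> [0.6, 0.225, 0., 0.175]
```

Keeping the raw counts as the default leaves this choice to the user, since the reverse (recovering counts from scaled values) is not possible.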

Looking at this section of the code, there is also gbm.feature_importance(), which does not take the importance_type argument, whereas gbm.booster_.feature_importance() does. This importance_type argument is also available for plotting, so it should also be available for the raw number output.

@JoshuaC3 I think using raw result is better, I guess we refer to some designs of xgboost at the very beginning. I can change both of them to raw result, and add a note.
