It seems that feature importance is calculated on the training set; can it be applied to a validation set?
You can use the column sums of the absolute values of the predictor contributions to achieve this. When predicting, pass in pred_contrib=True (Python) or predcontrib=TRUE (R), which will return a matrix of predictor contributions. The last column is the model intercept and can be ignored. By summing the absolute values of these contributions per column, you can calculate importance on any data set.
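A minimal numpy sketch of the aggregation step described above. Here `contribs` is mock data standing in for the matrix a real call to `model.predict(X, pred_contrib=True)` would return (one row per observation, one column per feature, plus a final intercept column):

```python
import numpy as np

# Mock contributions matrix: 1000 observations, 4 features + 1 intercept column,
# standing in for the output of model.predict(X, pred_contrib=True).
rng = np.random.default_rng(0)
contribs = rng.normal(size=(1000, 5))

feature_contribs = contribs[:, :-1]            # drop the intercept column
importance = np.abs(feature_contribs).sum(axis=0)  # column sums of absolute values

# Rank features from most to least important on this data set.
ranking = np.argsort(importance)[::-1]
print(importance)
print(ranking)
```

Because the contributions are computed on whatever data you pass to `predict`, this gives importance scores for a validation set just as easily as for the training set.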
@Zelazny7 Correct me if I am wrong, but the suggested method would calculate SHAP-type feature importance, which I understand is different from LightGBM's typical feature importance referred to above, right? Just clarifying; otherwise I think SHAP is better anyway. BTW, thanks for enabling it within R!
Yes, your understanding is the same as mine! You will be calculating the SHAP scores for each feature. They have the nice property of being consistent and adding up to the predicted score, so I agree with you about them being better as well. However, they may take a while to compute on large datasets. I haven't done any timing tests and I'm not sure what the complexity of the operation is.
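The additivity property mentioned above (contributions sum to the raw prediction) makes for a handy sanity check. Below is a hypothetical helper, `check_additivity`, shown against mock data in place of real output from `predict(X, pred_contrib=True)` and the corresponding raw scores:

```python
import numpy as np

def check_additivity(contribs, raw_scores, atol=1e-6):
    """Verify that each row's contributions (including the final
    intercept column) add up to the model's raw prediction."""
    return np.allclose(contribs.sum(axis=1), raw_scores, atol=atol)

# Mock data standing in for real model output: 100 rows, 5 features + intercept.
rng = np.random.default_rng(1)
contribs = rng.normal(size=(100, 6))
raw_scores = contribs.sum(axis=1)   # what the model's raw score would be

assert check_additivity(contribs, raw_scores)
```

On a real model you would compare against the raw (pre-link-function) score, e.g. `predict(X, raw_score=True)` in the Python API.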
Thanks! Is this SHAP feature importance mentioned in this paper? https://arxiv.org/abs/1706.06060
Yes, that is what's being calculated using pred_contrib=True or predcontrib=TRUE. I had nothing to do with that paper; I only exposed in the R package what the author of that paper added to LightGBM.
@Zelazny7 It takes about 20 minutes to compute SHAP scores on a 33M observation × 39 feature dataset with 28 physical cores (9.6 GB matrix output). 100 iterations of boosting at depth 6 were performed.
@Laurae2 given how the algorithm was implemented, that doesn't seem unreasonable to me. Thank you for testing the performance.