It seems that feature importance is calculated on the training set; can it be applied to a validation set?
You can use the column sums of the absolute values of the predictor contributions to achieve this. When predicting, pass in pred_contrib=True (Python) or predcontrib=TRUE (R), which will return a matrix of predictor contributions. The last column is the model intercept and can be ignored. By summing the absolute values of these contributions per column, you can calculate importance on any data set.
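A minimal numpy sketch of the aggregation step described above. Here `contribs` is mock data standing in for the matrix a real call to `model.predict(X, pred_contrib=True)` would return (one row per observation, one column per feature, plus a final intercept column):

```python
import numpy as np

# Mock contributions matrix: 1000 observations, 4 features + 1 intercept column,
# standing in for the output of model.predict(X, pred_contrib=True).
rng = np.random.default_rng(0)
contribs = rng.normal(size=(1000, 5))

feature_contribs = contribs[:, :-1]            # drop the intercept column
importance = np.abs(feature_contribs).sum(axis=0)  # column sums of absolute values

# Rank features from most to least important on this data set.
ranking = np.argsort(importance)[::-1]
print(importance)
print(ranking)
```

Because the contributions are computed on whatever data you pass to `predict`, this gives importance scores for a validation set just as easily as for the training set.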
@Zelazny7 Correct me if I am wrong, but the suggested method would calculate SHAP-type feature importance, which I understand is different from LightGBM's typical feature importance referred to above, right? Just clarifying; otherwise I think SHAP is better anyway. BTW, thanks for enabling it within R!
Yes, your understanding is the same as mine! You will be calculating the SHAP scores for each feature. They have the nice property of being consistent and adding up to the predicted score, so I agree with you about them being better as well. However, they may take a while to compute on large datasets. I haven't done any timing tests and I'm not sure what the complexity of the operation is.
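The additivity property mentioned above (contributions sum to the raw prediction) makes for a handy sanity check. Below is a hypothetical helper, `check_additivity`, shown against mock data in place of real output from `predict(X, pred_contrib=True)` and the corresponding raw scores:

```python
import numpy as np

def check_additivity(contribs, raw_scores, atol=1e-6):
    """Verify that each row's contributions (including the final
    intercept column) add up to the model's raw prediction."""
    return np.allclose(contribs.sum(axis=1), raw_scores, atol=atol)

# Mock data standing in for real model output: 100 rows, 5 features + intercept.
rng = np.random.default_rng(1)
contribs = rng.normal(size=(100, 6))
raw_scores = contribs.sum(axis=1)   # what the model's raw score would be

assert check_additivity(contribs, raw_scores)
```

On a real model you would compare against the raw (pre-link-function) score, e.g. `predict(X, raw_score=True)` in the Python API.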
Thanks! Is this SHAP feature importance mentioned in this paper? https://arxiv.org/abs/1706.06060
Yes, that is what's being calculated using pred_contrib=True or predcontrib=TRUE. I had nothing to do with that paper; I only exposed in the R package what the author of that paper added to LightGBM.
@Zelazny7 It takes about 20 minutes to compute SHAP scores on a 33M observation × 39 feature dataset with 28 physical cores (9.6 GB matrix output). 100 iterations of boosting at depth 6 were performed.
@Laurae2 given how the algorithm was implemented, that doesn't seem unreasonable to me. Thank you for testing the performance.