Hello,
I have recently updated the XGBoost version in my ML project and noticed that the feature importance (gain) is very different between v0.9 and the newest version (v1+).
As you can see in the samples shown below (top 20 features only), for the same dataset the relative importance of the features and the gain scores changed quite a lot. The only difference between the two examples is the XGBoost version.
Can anyone explain to me what changed so radically in the feature score computation between version 0.90 and v1+?
Thanks a lot for the explanation!
Could you please provide the set of parameters you used?
Hello,
Here are the parameters used for the feature importance. I used the default parameters with the following custom params:

```python
xgb_params = {'random_state': 42,
              'n_jobs': -1}
```
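For reference, here is a minimal sketch of how I obtain the gain importances through the sklearn wrapper (the dataset below is a synthetic placeholder standing in for my actual data):

```python
import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic placeholder data standing in for my real dataset
X, y = make_classification(n_samples=1000, n_features=30, random_state=42)

xgb_params = {'random_state': 42,
              'n_jobs': -1}

model = xgb.XGBClassifier(**xgb_params)
model.fit(X, y)

# Gain-based feature importance via the underlying booster
importance = model.get_booster().get_score(importance_type='gain')
top20 = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)[:20]
print(top20)
```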
Thanks!
Are you using the sklearn interface?
Hi @trivialfis, yes I am using the Sklearn interface to get the feature importance.
Thanks!
In XGBoost 1.0.0, we unified the default parameters between the sklearn interface and the internal defaults (which are used by the other interfaces). Specifically, `learning_rate` and `max_depth` changed. https://github.com/dmlc/xgboost/pull/5130#issuecomment-574489850
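If you want the old behaviour back, you can pin those parameters explicitly. A minimal sketch, assuming the 0.90 sklearn defaults were learning_rate=0.1 and max_depth=3 while the unified 1.0+ defaults follow the native library (learning_rate=0.3, max_depth=6); please verify against your installed versions:

```python
import xgboost as xgb

# Pin the values the sklearn wrapper defaulted to in 0.90 (assumed here to be
# learning_rate=0.1 and max_depth=3); on 1.0+ the unified defaults follow the
# native library instead (learning_rate=0.3, max_depth=6).
model = xgb.XGBClassifier(random_state=42,
                          n_jobs=-1,
                          learning_rate=0.1,
                          max_depth=3)
```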