XGBoost: What's the difference in feature importance (gain) computation between v0.90 and v1+?

Created on 6 Jul 2020 · 5 comments · Source: dmlc/xgboost

Hello,

I recently updated the XGBoost version in my ML project and noticed that the feature importance (gain) is very different between v0.90 and the newest version (v1+).

Both the relative ranking of the features and their gain scores changed substantially. As you can see in the samples shown below (top 20 features only), the feature importance scores for the same dataset differ considerably; the only difference between the two runs is the XGBoost version.

Can anyone explain what changed so radically in the feature score computation between v0.90 and v1+?

Thanks a lot for the explanation.

[Two plots: top-20 feature importance (gain) scores for the same dataset, one under v0.90 and one under v1+]

All 5 comments

Could you please provide the set of parameters you used?

Hello,

Here are the parameters used for the feature importance. I used the defaults with the following custom params:
xgb_params = {'random_state': 42, 'n_jobs': -1}

Thanks!

Are you using the sklearn interface?

Hi @trivialfis, yes, I am using the sklearn interface to get the feature importance.
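
For reference, here is roughly how I read the gain importances (a minimal sketch; the classifier and dataset are illustrative, not my actual setup):

```python
import xgboost as xgb
from sklearn.datasets import make_classification

# Illustrative data standing in for the real dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

xgb_params = {'random_state': 42, 'n_jobs': -1}
model = xgb.XGBClassifier(**xgb_params)
model.fit(X, y)

# feature_importances_ is the normalized importance vector; get_score()
# exposes the raw per-feature values ('gain' = average gain per split,
# 'total_gain' = summed gain across all splits on that feature).
print(model.feature_importances_)
print(model.get_booster().get_score(importance_type='gain'))
```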

Thanks!

In XGBoost 1.0.0, we unified the default parameters between the sklearn interface and the internal defaults (which are used by the other interfaces). Specifically, learning_rate and max_depth changed. https://github.com/dmlc/xgboost/pull/5130#issuecomment-574489850
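
If the goal is to reproduce the v0.90 numbers under v1+, one option is to pin the old wrapper defaults explicitly. A minimal sketch, assuming the pre-1.0 sklearn defaults of learning_rate=0.1 and max_depth=3:

```python
import xgboost as xgb

# Pin the pre-1.0 sklearn-wrapper defaults so the trees (and therefore the
# gain importances) are built as they were under v0.90. In v1+ the wrapper
# falls back to the native defaults (learning_rate=0.3, max_depth=6).
model = xgb.XGBClassifier(
    random_state=42,
    n_jobs=-1,
    learning_rate=0.1,
    max_depth=3,
)
```

With those parameters pinned, the boosted trees should match the pre-1.0 runs, and the gain scores should line up again.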
