Xgboost: KeyError: 'weight' with sklearn.feature_selection.SelectFromModel

Created on 11 May 2020 · 6 Comments · Source: dmlc/xgboost

Hi,
I'm using scikit-learn automatic feature selection together with a trained XGBoost model.
I set up a threshold to interrupt the feature reduction process when accuracy falls below it.
I think everything is fine in the loop, but when I use SelectFromModel.transform() I receive the following error:

Traceback (most recent call last):
  File "boost.py", line 581, in <module>
    s_train_x = selection.transform(train_x)
  File "/home/guido/.virtualenvs/ml/lib/python3.6/site-packages/sklearn/feature_selection/_base.py", line 77, in transform
    mask = self.get_support()
  File "/home/guido/.virtualenvs/ml/lib/python3.6/site-packages/sklearn/feature_selection/_base.py", line 46, in get_support
    mask = self._get_support_mask()
  File "/home/guido/.virtualenvs/ml/lib/python3.6/site-packages/sklearn/feature_selection/_from_model.py", line 178, in _get_support_mask
    scores = _get_feature_importances(estimator, self.norm_order)
  File "/home/guido/.virtualenvs/ml/lib/python3.6/site-packages/sklearn/feature_selection/_from_model.py", line 18, in _get_feature_importances
    coef_ = getattr(estimator, "coef_", None)
  File "/home/guido/.virtualenvs/ml/lib/python3.6/site-packages/xgboost/sklearn.py", line 716, in coef_
    coef = np.array(json.loads(b.get_dump(dump_format='json')[0])['weight'])
KeyError: 'weight'
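
A note on why the traceback ends in a KeyError: the frame at _from_model.py line 18 probes the attribute with getattr(estimator, "coef_", None), and the three-argument form of getattr only swallows AttributeError. Any other exception raised inside a property propagates to the caller. The sketch below illustrates this with hypothetical FakeModel and SafeModel classes (these are not xgboost's actual code); the dictionary lookup stands in for the ['weight'] access on the last traceback line, and the dump contents are made up.

```python
class FakeModel:
    """Hypothetical stand-in for an estimator; not xgboost's real class."""

    def __init__(self, dump):
        self._dump = dump

    @property
    def coef_(self):
        # Mimics the last traceback frame: a plain dict lookup that fails
        # with KeyError when the dump has no 'weight' entry.
        return self._dump['weight']


class SafeModel(FakeModel):
    """Same lookup, but the failure is reported as AttributeError."""

    @property
    def coef_(self):
        try:
            return self._dump['weight']
        except KeyError:
            # AttributeError is the only exception getattr's default absorbs.
            raise AttributeError('coef_ is not defined for this model') from None


def probe(model):
    """Return coef_, None, or the name of the exception that escaped getattr."""
    try:
        return getattr(model, 'coef_', None)
    except Exception as exc:
        return type(exc).__name__


tree_dump = {'nodeid': 0, 'leaf': 0.1}   # made-up dump without 'weight'
linear_dump = {'weight': [0.5, -0.2]}    # made-up dump with 'weight'

print(probe(FakeModel(linear_dump)))  # [0.5, -0.2]
print(probe(FakeModel(tree_dump)))    # KeyError
print(probe(SafeModel(tree_dump)))    # None
```

So a coef_ property that raises KeyError cannot be skipped by the caller's default value, whereas one that raises AttributeError can.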

I'm using the latest xgboost 1.0.2 with scikit-learn 0.22; the code I wrote is below. It's part of a bigger script, so some variables are defined earlier, but the KeyError should not depend on them.

report = []
prev_t = -1
scores = np.sort(model.feature_importances_)
indices = np.argsort(model.feature_importances_)
misc.msg('Feature selection (threshold = {})...'.format(autosel))
iterator = tqdm(scores)
for i, t in enumerate(iterator):
    # Skip repeated thresholds: they would select the same feature subset
    if -1 < prev_t == t:
        continue
    prev_t = t
    selection = SelectFromModel(model, threshold=t, prefit=True)
    try:
        s_train_x = selection.transform(train_x)
    except ValueError:
        misc.msg('Incompatible number of features!', 'err')
        sys.exit(1)
    kwargs = {'tree_method': 'hist' if not gpu else 'gpu_hist',
              'grow_policy': 'lossguide' if useloss else 'depthwise'} \
        if not exact else {}
    s_model = xgb.XGBClassifier(objective=model.objective, n_jobs=-1, n_estimators=model.n_estimators,
                                max_depth=model.max_depth, learning_rate=model.learning_rate,
                                subsample=model.subsample, colsample_bytree=model.colsample_bytree,
                                min_child_weight=model.min_child_weight, gamma=model.gamma,
                                reg_alpha=model.reg_alpha, reg_lambda=model.reg_lambda,
                                max_delta_step=model.max_delta_step, random_state=model.random_state,
                                scale_pos_weight=model.scale_pos_weight, **kwargs)
    try:
        s_model.fit(s_train_x, train_y)
    except KeyboardInterrupt:
        misc.msg('Feature selection interrupted', 'warn')
        sys.exit(0)
    s_test_x = selection.transform(test_x)
    s_pred_y = s_model.predict(s_test_x)
    s_accuracy = accuracy_score(test_y, s_pred_y)
    subset = str(list(reversed(indices[i:]))).replace(',', ';')
    report.append([t, s_train_x.shape[1], s_accuracy, subset])
    if s_accuracy < args.autosel:
        iterator.close()
        misc.msg('Accuracy below threshold ({:.6f})'.format(s_accuracy), 'warn')
        misc.msg('Feature subset: {}'.format(conv.values2ranges(indices[i:])))
        break
    gc.collect()

Can anyone reproduce this behaviour?
Many thanks in advance!

GuidoBartoli

All 6 comments

Hi, could you please post a more complete script that I can run?

trivialfis on 12 May 2020

Sure, I will post it here this afternoon, so you can take a look at it.

Thanks!

GuidoBartoli on 12 May 2020

This is a minimal test.py:

from h5py import File
from joblib import load
from sklearn.feature_selection import SelectFromModel

if __name__ == '__main__':
    h5 = File('dataset.h5', 'r')
    data = h5['data'][:]
    model = load('model.mdl')
    selection = SelectFromModel(model, threshold=0.95, prefit=True).transform(data)

This is the corresponding requirements.txt:

h5py==2.10.0
joblib==0.14.1
numpy==1.18.4
scikit-learn==0.23.0
scipy==1.4.1
six==1.14.0
threadpoolctl==2.0.0
xgboost==1.0.2

Here are the dataset and model, to be unzipped into the same folder as the script. The model is an xgb.XGBClassifier previously trained on the same data with the standard fit() function.

You can reproduce the reported problem with python test.py.

GuidoBartoli on 13 May 2020


@GuidoBartoli Hi dude. I had the same problem with xgboost==1.0.0. Upgrading to the recent 1.1.0 fixed it for me.
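
If upgrading is not an option, one possible workaround is to skip SelectFromModel and build the column mask directly from the importance scores, so coef_ is never read. This is a minimal sketch, assuming (as the scikit-learn documentation describes for a numeric threshold) that features with importance greater than or equal to the threshold are kept. select_by_importance is a hypothetical helper, and the arrays are made-up data standing in for model.feature_importances_ and train_x.

```python
import numpy as np


def select_by_importance(X, importances, threshold):
    """Keep the columns of X whose importance is >= threshold.

    Hypothetical helper: it reads the scores directly, so the estimator's
    coef_ property is never touched.
    """
    mask = np.asarray(importances) >= threshold
    return np.asarray(X)[:, mask], mask


# Made-up stand-ins for model.feature_importances_ and train_x.
importances = np.array([0.05, 0.40, 0.15, 0.40])
train_x = np.arange(12).reshape(3, 4)

s_train_x, mask = select_by_importance(train_x, importances, threshold=0.15)
print(mask.tolist())    # [False, True, True, True]
print(s_train_x.shape)  # (3, 3)
```

The returned mask can be reused on the test split (test_x[:, mask]) so that both splits keep identical columns.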