Tpot: Cannot clone XGBoostClassifier exception

Created on 25 May 2017 · 12 comments · Source: EpistasisLab/tpot

Hello,

When I try the following:

from tpot import TPOTClassifier
clf = TPOTClassifier(verbosity=2)
clf.fit(X.loc[train_ind], Y.loc[train_ind])

I get an error of the sort:

/Users/xxx/anaconda/lib/python3.6/site-packages/sklearn/base.py in clone(estimator, safe)
    124             raise RuntimeError('Cannot clone object %s, as the constructor '
    125                                'does not seem to set parameter %s' %
--> 126                                (estimator, name))
    127 
    128     return new_object

RuntimeError: Cannot clone object XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.001, max_delta_step=0,
       max_depth=3, min_child_weight=9, missing=None, n_estimators=100,
       n_jobs=1, nthread=1, objective='binary:logistic', random_state=42,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=True,
       subsample=0.45), as the constructor does not seem to set parameter seed

I am using XGBoost 0.6 (compiled with gcc on Mac), TPOT 0.7.5, and sklearn 0.18.1.

It does not seem to be related to the dataset I use.

Any help appreciated.


All 12 comments

I suspect the latest version of XGBoost does something unusual, since it puts both seed and random_state into the parameter list. Could you please try the code below and let me know if the RuntimeError still happens? I will also test it once I have installed this version of XGBoost in my environment.

from xgboost import XGBClassifier
from sklearn.base import clone
clf = XGBClassifier(base_score=0.5, colsample_bylevel=1,
                    colsample_bytree=1, gamma=0, learning_rate=0.001, max_delta_step=0,
                    max_depth=3, min_child_weight=9, missing=None, n_estimators=100,
                    nthread=1, objective='binary:logistic',
                    reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None, random_state=42, silent=True,
                    subsample=0.45)

clf_clone  = clone(clf)

The code you wrote in your comment doesn't seem to raise any errors. Only a warning:

/Users/xxx/anaconda/lib/python3.6/site-packages/xgboost-0.6-py3.6.egg/xgboost/sklearn.py:171: DeprecationWarning: The nthread parameter is deprecated as of version .6.Please use n_jobs instead.nthread is deprecated.
  'nthread is deprecated.', DeprecationWarning)

Thank you for the quick test. The warning is expected, because nthread and seed are deprecated and replaced by n_jobs and random_state in the latest version.

I will submit a PR to fix this compatibility issue with the latest version of xgboost. Meanwhile, you can try the code below to use xgboost in TPOT 0.7.5.

from tpot.config_classifier import classifier_config_dict
from tpot import TPOTClassifier
import numpy as np
xgbclf_correct = {
    'n_estimators': [100],
    'max_depth': range(1, 11),
    'learning_rate': [1e-3, 1e-2, 1e-1, 0.5, 1.],
    'subsample': np.arange(0.05, 1.01, 0.05),
    'min_child_weight': range(1, 21),
    'n_jobs': [1],
    'seed': [None]
}
classifier_config_dict['xgboost.XGBClassifier'] = xgbclf_correct

clf = TPOTClassifier(verbosity=2, config_dict=classifier_config_dict)
clf.fit(X.loc[train_ind], Y.loc[train_ind])

Please let me know how the code works.

Hi,

The code you suggest doesn't work. I get the same error.

RuntimeError: Cannot clone object XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.01, max_delta_step=0,
       max_depth=3, min_child_weight=5, missing=None, n_estimators=100,
       n_jobs=1, nthread=1, objective='binary:logistic', random_state=42,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=0, silent=True,
       subsample=0.8500000000000001), as the constructor does not seem to set parameter seed

It doesn't seem to change seed=0 to seed=None (which I think is what you want to do?)

The snippet I pasted above is the new error... :/

I found the reason. It comes from this source code in xgboost: if seed is not set, it is assigned 0, the default value of random_state.
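To make the failure mode concrete, here is a minimal pure-Python sketch of how a constructor that rewrites a parameter can trip the sanity check quoted in the traceback. Everything in it is an assumption for illustration: `ToyEstimator` is a hypothetical class and not XGBoost's source, `simple_clone` is a simplified stand-in for the check in `sklearn.base.clone`, and the step that sets `random_state` to 42 after construction is only a guess chosen to match the `random_state=42, seed=0` pair shown in the error message.

```python
class ToyEstimator:
    """Hypothetical estimator; NOT XGBoost's actual source code."""

    def __init__(self, random_state=0, seed=None):
        self.random_state = random_state
        # Assumed behavior, following the comment above: an unset (falsy)
        # seed is replaced by random_state instead of being stored as given.
        self.seed = seed if seed else random_state

    def get_params(self):
        return {"random_state": self.random_state, "seed": self.seed}


def simple_clone(estimator):
    """Simplified stand-in for the sanity check in sklearn.base.clone."""
    params = estimator.get_params()
    # Rebuild the estimator from its own parameters ...
    new_object = type(estimator)(**params)
    # ... and verify that the constructor stored each one unchanged.
    for name, value in params.items():
        if new_object.get_params()[name] != value:
            raise RuntimeError(
                "Cannot clone object %r, as the constructor does not "
                "seem to set parameter %s" % (estimator, name)
            )
    return new_object


est = ToyEstimator()      # seed=None is silently stored as seed=0
est.random_state = 42     # assumption: random_state is changed afterwards

try:
    simple_clone(est)
    clone_error = None
except RuntimeError as exc:
    clone_error = str(exc)

print(clone_error)
```

In this sketch the rebuilt object receives `seed=0`, treats it as unset, and stores 42 instead, so the value passed in and the value stored disagree for `seed`. An estimator that stores every constructor argument exactly as given would pass the same check.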

I made PR #467, based on the dev branch, to fix this issue.
The changes are simple, as shown in the link.

@rhiever, should we make a patch for the master branch?

@fferroni Could you please install version 0.6a2 of xgboost? TPOT 0.7.5 works with this version of xgboost, which is available on PyPI.

If you need to build xgboost 0.6a2 from source, you can find the source code on its PyPI page.
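Before and after switching versions, it can help to confirm which releases are actually installed in the active environment. The helper below is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+); the function name `installed_versions` and the distribution names passed to it are assumptions for illustration, so adjust them to match your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("xgboost", "tpot", "scikit-learn")):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


print(installed_versions())
```

If the entry for xgboost is not the version suggested above, the environment is still using a different build.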

I just made a commit in xgboost, https://github.com/dmlc/xgboost/pull/2378, which I think may solve your issue here.

Thanks @wxchan! Glad to see this fixed in future versions of XGBoost.

Nice! This PR has already been merged into the master branch of xgboost. I tested it and the API issue is fixed!

Closing this issue for now. Please feel free to re-open if you have any more questions or comments.
