I keep getting the following error when executing the code pasted below:
XGBoostError: b'[10:17:34] src/objective/regression_obj.cc:90: Check failed: (preds.size()) == (info.labels.size()) labels are not correctly providedpreds.size=4544, label.size=568'
Operating System: OSX 10.12.5
Package used (python/R/jvm/C++): python
xgboost version used: 0.6a2 (via pip)
If you are using the python package, please provide the command used to install xgboost if you are not installing from source: pip install xgboost
CSV data may be found here.
import xgboost as xgb
import pandas as pd
from sklearn.metrics import f1_score
train = pd.read_csv('raw_data/train.csv')
X = train.copy()
y = X.pop('target')
size = int(X.shape[0] * .8)
train = X[:size]
y_train = y[:size]
test = X[size:]
y_test = y[size:]
params = {
    'n_estimators': 100,
    'colsample_bytree': 0.8,
    'objective': 'binary:logistic',
    'max_depth': 7,
    'min_child_weight': 1,
    'learning_rate': 0.1,
    'subsample': 0.8,
    'num_class': 8,
    'eta': 0.2
}
dtrain = xgb.DMatrix(train, y_train)
dtest = xgb.DMatrix(test)
model = xgb.cv(
    params=params,
    dtrain=dtrain,
    num_boost_round=500,
    nfold=5,
    early_stopping_rounds=100
)
# Fit
final_gb = xgb.train(params, dtrain, num_boost_round=len(model))
preds = final_gb.predict(dtest)
f1 = f1_score(y_test, preds)
print(f1)
It appears the error is thrown here:
model = xgb.cv(
    params=params,
    dtrain=dtrain,
    num_boost_round=500,
    nfold=5,
    early_stopping_rounds=100
)
Full Traceback:
XGBoostError Traceback (most recent call last)
<ipython-input-89-860291596b0d> in <module>()
----> 1 output = recursor.recursive_build(500)
<ipython-input-79-92716c5bfffa6> in recursive_build(self, boosts)
25 num_boost_round=boosts,
26 nfold=self.folds,
---> 27 early_stopping_rounds=100
28 )
29
/Users/user/code/shared/.virtualenvs/python36/lib/python3.6/site-packages/xgboost/training.py in cv(params, dtrain, num_boost_round, nfold, stratified, folds, metrics, obj, feval, maximize, early_stopping_rounds, fpreproc, as_pandas, verbose_eval, show_stdv, seed, callbacks)
398 evaluation_result_list=None))
399 for fold in cvfolds:
--> 400 fold.update(i, obj)
401 res = aggcv([f.eval(i, feval) for f in cvfolds])
402
/Users/user/code/shared/.virtualenvs/python36/lib/python3.6/site-packages/xgboost/training.py in update(self, iteration, fobj)
217 def update(self, iteration, fobj):
218 """"Update the boosters for one iteration"""
--> 219 self.bst.update(self.dtrain, iteration, fobj)
220
221 def eval(self, iteration, feval):
/Users/user/code/shared/.virtualenvs/python36/lib/python3.6/site-packages/xgboost/core.py in update(self, dtrain, iteration, fobj)
804
805 if fobj is None:
--> 806 _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, iteration, dtrain.handle))
807 else:
808 pred = self.predict(dtrain)
/Users/user/code/shared/.virtualenvs/python36/lib/python3.6/site-packages/xgboost/core.py in _check_call(ret)
125 """
126 if ret != 0:
--> 127 raise XGBoostError(_LIB.XGBGetLastError())
128
129
XGBoostError: b'[10:17:34] src/objective/regression_obj.cc:90: Check failed: (preds.size()) == (info.labels.size()) labels are not correctly providedpreds.size=4544, label.size=568'
The error occurs because the 'binary:logistic' objective is not compatible with 'num_class': 8. Note that preds.size=4544 is exactly 8 × 568: with num_class=8 the booster produces eight scores per row, while the binary objective expects a single prediction per label. Use one of the multiclass objectives ('multi:softmax' or 'multi:softprob') if you actually have multiclass labels.
I also have this error. I changed my num_class to 2, but the program still raises this error.
For what it's worth, you also see this error if you make a mistake in the labels passed to training: I once found classifier.train(x_train, y) in my code when it should have been classifier.train(x_train, y_train).
@chandlervan you are not supposed to use num_class with 'binary:logistic'
@khotilov this solved the problem for me! thanks!
@khotilov Thank you! This was driving me insane, and your comment solved the problem for me.