Xgboost: XGBoostError: labels are not correctly provided (python 3.6.1)

Created on 30 Jul 2017 · 6 comments · Source: dmlc/xgboost

I keep getting the following error when executing the code pasted below:

XGBoostError: b'[10:17:34] src/objective/regression_obj.cc:90: Check failed: (preds.size()) == (info.labels.size()) labels are not correctly providedpreds.size=4544, label.size=568'

Environment info

Operating System: OSX 10.12.5

Package used (python/R/jvm/C++): python

xgboost version used: 0.6a2 (via pip)

If you are using python package, please provide

  1. The python version and distribution: 3.6.1 via brew
  2. The command to install xgboost if you are not installing from source: pip install xgboost

Steps to reproduce

CSV data may be found here.

import xgboost as xgb
import pandas as pd

from sklearn.metrics import f1_score

train = pd.read_csv('raw_data/train.csv')
X = train.copy()
y = X.pop('target')

size = int(X.shape[0] * .8)
train = X[:size]
y_train = y[:size]
test = X[size:]
y_test = y[size:]

params = {
    'n_estimators': 100,
    'colsample_bytree': 0.8,
    'objective': 'binary:logistic',
    'max_depth': 7,
    'min_child_weight': 1,
    'learning_rate': 0.1,
    'subsample': 0.8,
    'num_class': 8,
    'eta': 0.2
}

dtrain = xgb.DMatrix(train, y_train)
dtest = xgb.DMatrix(test)

model = xgb.cv(
    params=params,
    dtrain=dtrain,
    num_boost_round=500,
    nfold=5,
    early_stopping_rounds=100
)

# Fit
final_gb = xgb.train(params, dtrain, num_boost_round=len(model))

preds = final_gb.predict(dtest)
f1 = f1_score(y_test, preds)
print(f1)

It appears the error is thrown here:

model = xgb.cv(
    params=params,
    dtrain=dtrain,
    num_boost_round=500,
    nfold=5,
    early_stopping_rounds=100
)

Full Traceback:

XGBoostError                             Traceback (most recent call last)
<ipython-input-89-860291596b0d> in <module>()
----> 1 output = recursor.recursive_build(500)

<ipython-input-79-92716c5bfffa6> in recursive_build(self, boosts)
     25                 num_boost_round=boosts,
     26                 nfold=self.folds,
---> 27                 early_stopping_rounds=100
     28             )
     29 

/Users/user/code/shared/.virtualenvs/python36/lib/python3.6/site-packages/xgboost/training.py in cv(params, dtrain, num_boost_round, nfold, stratified, folds, metrics, obj, feval, maximize, early_stopping_rounds, fpreproc, as_pandas, verbose_eval, show_stdv, seed, callbacks)
    398                            evaluation_result_list=None))
    399         for fold in cvfolds:
--> 400             fold.update(i, obj)
    401         res = aggcv([f.eval(i, feval) for f in cvfolds])
    402 

/Users/user/code/shared/.virtualenvs/python36/lib/python3.6/site-packages/xgboost/training.py in update(self, iteration, fobj)
    217     def update(self, iteration, fobj):
    218         """"Update the boosters for one iteration"""
--> 219         self.bst.update(self.dtrain, iteration, fobj)
    220 
    221     def eval(self, iteration, feval):

/Users/user/code/shared/.virtualenvs/python36/lib/python3.6/site-packages/xgboost/core.py in update(self, dtrain, iteration, fobj)
    804 
    805         if fobj is None:
--> 806             _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, iteration, dtrain.handle))
    807         else:
    808             pred = self.predict(dtrain)

/Users/user/code/shared/.virtualenvs/python36/lib/python3.6/site-packages/xgboost/core.py in _check_call(ret)
    125     """
    126     if ret != 0:
--> 127         raise XGBoostError(_LIB.XGBGetLastError())
    128 
    129 

XGBoostError: b'[10:17:34] src/objective/regression_obj.cc:90: Check failed: (preds.size()) == (info.labels.size()) labels are not correctly providedpreds.size=4544, label.size=568'

All 6 comments

The error is because the 'binary:logistic' objective is not compatible with 'num_class': 8. Note that preds.size=4544 is exactly 8 × label.size=568: with num_class set, XGBoost allocates one prediction slot per class per row, so the prediction buffer is 8 times larger than the label vector. If you actually have multiclass labels, use one of the multiclass objectives instead.
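A minimal sketch of the fix, assuming the parameter names from the original post: drop 'num_class' when keeping 'binary:logistic', or switch to a multiclass objective if the target really has 8 classes. The 'n_estimators' and 'learning_rate' keys are scikit-learn wrapper names that the native xgb.train/xgb.cv API ignores (it uses num_boost_round and 'eta'), so they are omitted here.

```python
# The arithmetic behind the error message: one prediction per class per row.
label_size = 568   # label.size reported in the error
num_class = 8      # from the original params dict
preds_size = label_size * num_class
print(preds_size)  # 4544, the preds.size reported in the error

# Corrected parameters for a binary target: no num_class at all.
binary_params = {
    'colsample_bytree': 0.8,
    'objective': 'binary:logistic',
    'max_depth': 7,
    'min_child_weight': 1,
    'subsample': 0.8,
    'eta': 0.2,
}

# Alternative if the target genuinely has 8 classes: a multiclass
# objective, which is the only case where num_class belongs.
multiclass_params = dict(binary_params, objective='multi:softmax', num_class=8)
```

Either dict can then be passed to xgb.cv/xgb.train in place of the original params.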

I also have this error. I changed my num_class to 2, but the program still raises the same error.

For what it's worth, you also see this error if you make a mistake in the labels passed for training - I once found classifier.train(x_train, y) in my code when it should have been classifier.train(x_train, y_train).
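That mislabeling case can be caught before the C++ check fires. A minimal sketch (the helper name is hypothetical, not part of xgboost): verify that the feature matrix and label vector have the same number of rows before building the DMatrix.

```python
def check_labels(X, y):
    """Fail early with a readable message instead of the opaque
    'labels are not correctly provided' check failure."""
    if len(X) != len(y):
        raise ValueError(
            f"feature rows ({len(X)}) != label count ({len(y)}); "
            "did you pass the wrong label variable, e.g. y instead of y_train?"
        )

# Matching lengths pass silently; a mismatch raises immediately.
check_labels([[1, 2], [3, 4], [5, 6]], [0, 1, 1])
```

Calling this with, say, the full y instead of the sliced y_train raises a ValueError right away instead of failing deep inside xgb.cv.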

@chandlervan you are not supposed to use num_class with 'binary:logistic'

@khotilov this solved the problem for me! thanks!

@khotilov Thank you! This was driving me insane, and your comment solved the problem for me.
