Code
import xgboost as xgb

classifier = xgb.XGBClassifier(n_estimators=10, max_depth=5, base_score=0.5,
                               objective='multi:softmax', random_state=42,
                               num_class=len(y_train.unique()))
classifier.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
               verbose=False, early_stopping_rounds=10)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
Error
ValueError Traceback (most recent call last)
<ipython-input-62-a4644655ccf6> in <module>
5 # eval_set=[(X_train, y_train), (X_valid, y_valid)])
6
----> 7 classifier.fit(X_train, y_train,eval_set = [(X_valid,y_valid)],verbose=False,early_stopping_rounds = 10) #eval_set=[(X_train, y_train), (X_valid, y_valid)])
8
9 classifier.fit(X_train, y_train)
~/anaconda3/lib/python3.7/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, callbacks)
709 missing=self.missing, weight=sample_weight_eval_set[i],
710 nthread=self.n_jobs)
--> 711 for i in range(len(eval_set))
712 )
713 nevals = len(evals)
~/anaconda3/lib/python3.7/site-packages/xgboost/sklearn.py in <genexpr>(.0)
709 missing=self.missing, weight=sample_weight_eval_set[i],
710 nthread=self.n_jobs)
--> 711 for i in range(len(eval_set))
712 )
713 nevals = len(evals)
~/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py in transform(self, y)
255 return np.array([])
256
--> 257 _, y = _encode(y, uniques=self.classes_, encode=True)
258 return y
259
~/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py in _encode(values, uniques, encode)
108 return res
109 else:
--> 110 return _encode_numpy(values, uniques, encode)
111
112
~/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py in _encode_numpy(values, uniques, encode)
47 if diff:
48 raise ValueError("y contains previously unseen labels: %s"
---> 49 % str(diff))
50 encoded = np.searchsorted(uniques, values)
51 return uniques, encoded
ValueError: y contains previously unseen labels: [12.0]
What's the value of num_class = len(y_train.unique())?
It's an integer between 17 and 34, depending on the user. Without num_class = len(y_train.unique()), this error still occurs.
Is it possible to transform your output labels so that the classes are represented as 0, 1, 2, ..., num_class-1?
They are already numbers.
Can you check if there is a gap? The error message says 12 is missing.
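A quick way to check for a gap is to compare the unique training labels against the full integer range they span, and to compare the validation labels against the training labels (a label that appears only in the validation set triggers exactly this ValueError). A minimal sketch with hypothetical labels, not the author's data:

```python
import numpy as np

# Hypothetical labels: class 12 is absent from y_train but present in y_valid
y_train = np.array([10, 11, 13, 14, 10, 13])
y_valid = np.array([10, 11, 12, 13])

classes = np.unique(y_train)

# Integers inside the training label range that never occur in y_train
gaps = set(range(classes.min(), classes.max() + 1)) - set(classes)
print(gaps)    # {12}

# Validation labels never seen during fit -- these cause the ValueError
unseen = set(np.unique(y_valid)) - set(classes)
print(unseen)  # {12}
```

If either set is non-empty, the labels need re-encoding before passing an eval_set.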
Please provide a reproducible script. I haven't seen this before.
I did not provide the dataset, sorry.
We should define a num_class parameter and allow skipping XGBoost's label encoder.
OH MY GODDD, finally what hcho3 said helped me out: they re-encode. We gotta have this in XGBoost, although I had this problem with LightGBM too. Apparently all encoded label columns are expected to be zero-indexed; train_y - 1 sorted me out.
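The subtract-one trick above only works when labels are already contiguous and start at 1. For arbitrary labels (gaps included), fitting a scikit-learn LabelEncoder maps any label set onto a contiguous 0..num_class-1 range; this is a sketch with made-up labels, not the author's data:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels: non-contiguous, not zero-based
raw = np.array([1, 2, 11, 13, 13, 2])

le = LabelEncoder()
encoded = le.fit_transform(raw)  # contiguous 0..num_class-1
print(encoded)      # [0 1 2 3 3 1]
print(le.classes_)  # [ 1  2 11 13]

# The inverse mapping recovers the original labels after prediction
restored = le.inverse_transform(encoded)
print(restored)     # [ 1  2 11 13 13  2]
```

Fit the encoder on the union of train and validation labels so the eval_set cannot contain a class the encoder has never seen.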
If I use eval_set=(val_x, val_y), it reports a ValueError just like the author's. Using eval_set=[(val_x, val_y)] or eval_set=[(tra_x, tra_y), (val_x, val_y)] works.
Why is it okay when I use eval_set=(val_x, val_y) in LightGBM? The train_y is already numeric; it's an array. No gaps, no NaN. The objective is "binary".
In XGBoost, eval_set must be a list of pairs, where each pair is of the form (X, y).