Code
import xgboost as xgb

classifier = xgb.XGBClassifier(n_estimators=10, max_depth=5, base_score=0.5,
                               objective='multi:softmax', random_state=42,
                               num_class=len(y_train.unique()))
classifier.fit(X_train, y_train, eval_set=[(X_valid, y_valid)],
               verbose=False, early_stopping_rounds=10)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
Error
ValueError Traceback (most recent call last)
<ipython-input-62-a4644655ccf6> in <module>
5 # eval_set=[(X_train, y_train), (X_valid, y_valid)])
6
----> 7 classifier.fit(X_train, y_train,eval_set = [(X_valid,y_valid)],verbose=False,early_stopping_rounds = 10) #eval_set=[(X_train, y_train), (X_valid, y_valid)])
8
9 classifier.fit(X_train, y_train)
~/anaconda3/lib/python3.7/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, callbacks)
709 missing=self.missing, weight=sample_weight_eval_set[i],
710 nthread=self.n_jobs)
--> 711 for i in range(len(eval_set))
712 )
713 nevals = len(evals)
~/anaconda3/lib/python3.7/site-packages/xgboost/sklearn.py in <genexpr>(.0)
709 missing=self.missing, weight=sample_weight_eval_set[i],
710 nthread=self.n_jobs)
--> 711 for i in range(len(eval_set))
712 )
713 nevals = len(evals)
~/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py in transform(self, y)
255 return np.array([])
256
--> 257 _, y = _encode(y, uniques=self.classes_, encode=True)
258 return y
259
~/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py in _encode(values, uniques, encode)
108 return res
109 else:
--> 110 return _encode_numpy(values, uniques, encode)
111
112
~/anaconda3/lib/python3.7/site-packages/sklearn/preprocessing/label.py in _encode_numpy(values, uniques, encode)
47 if diff:
48 raise ValueError("y contains previously unseen labels: %s"
---> 49 % str(diff))
50 encoded = np.searchsorted(uniques, values)
51 return uniques, encoded
ValueError: y contains previously unseen labels: [12.0]
What's the value of num_class = len(y_train.unique())?
It's an integer between 17 and 34, depending on the user. Without num_class = len(y_train.unique()), this error still occurs.
Is it possible to transform your output labels so that the classes are represented as 0, 1, 2, ..., num_class-1?
They are already numbers.
Can you check if there is a gap? The error message says 12 is missing.
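A quick way to check for a gap is to compare the unique training labels against the full integer range they span, and to compare the validation labels against the training labels (a label that appears only in the validation set triggers exactly this ValueError). A minimal sketch with hypothetical labels, not the author's data:

```python
import numpy as np

# Hypothetical labels: class 12 is absent from y_train but present in y_valid
y_train = np.array([10, 11, 13, 14, 10, 13])
y_valid = np.array([10, 11, 12, 13])

classes = np.unique(y_train)

# Integers inside the training label range that never occur in y_train
gaps = set(range(classes.min(), classes.max() + 1)) - set(classes)
print(gaps)    # {12}

# Validation labels never seen during fit -- these cause the ValueError
unseen = set(np.unique(y_valid)) - set(classes)
print(unseen)  # {12}
```

If either set is non-empty, the labels need re-encoding before passing an eval_set.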
Please provide a reproducible script. I haven't seen this before.
I did not provide the dataset, sorry.
We should define a num_class parameter and allow skipping XGBoost's label encoder.
OH MY GODDD, finally what hcho3 said helped me out: they re-encode. We gotta have this in XGBoost, although I had this problem with LightGBM too. Apparently all encoded label columns are expected to be zero-indexed; train_y - 1 sorted me out.
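The subtract-one trick above only works when labels are already contiguous and start at 1. For arbitrary labels (gaps included), fitting a scikit-learn LabelEncoder maps any label set onto a contiguous 0..num_class-1 range; this is a sketch with made-up labels, not the author's data:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels: non-contiguous, not zero-based
raw = np.array([1, 2, 11, 13, 13, 2])

le = LabelEncoder()
encoded = le.fit_transform(raw)  # contiguous 0..num_class-1
print(encoded)      # [0 1 2 3 3 1]
print(le.classes_)  # [ 1  2 11 13]

# The inverse mapping recovers the original labels after prediction
restored = le.inverse_transform(encoded)
print(restored)     # [ 1  2 11 13 13  2]
```

Fit the encoder on the union of train and validation labels so the eval_set cannot contain a class the encoder has never seen.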
If I use eval_set=(val_x, val_y), it reports a ValueError just like the author's. Using eval_set=[(val_x, val_y)] or eval_set=[(tra_x, tra_y), (val_x, val_y)] works.
Why is it okay when I use eval_set=(val_x, val_y) in LightGBM? The train_y is already numeric; it's an array. No gaps, no NaN. The objective is "binary".
In XGBoost, eval_set must be a list of pairs, where each pair is of the form (X, y).