Lightgbm: TypeError: Wrong type(ndarray) for label, should be list or numpy array

Created on 17 Jul 2017 · 11 comments · Source: microsoft/LightGBM


Environment info

Operating System: Windows 7
CPU: Intel Core i7
C++/Python/R version: Python 3.5

Error Message:

TypeError: Wrong type(ndarray) for label, should be list or numpy array

Reproducible examples

y_train=train[['target']].values
y_train.shape
Out[36]: (1000, 1)
variables_details

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

feature_name = ['feature_' + str(col) for col in range(num_feature)]

gbm = lgb.train(params,
                train_lgb,
                num_boost_round=500,
                valid_sets=train_lgb,  # eval training data
                feature_name=feature_name)

Steps to reproduce

Run these steps in sequence.

y_train is a one-dimensional array. I believe that is a valid input for LightGBM.

munitech4u on 17 Jul 2017



@munitech4u
I think (1000, 1) is a 2D array.

guolinke on 17 Jul 2017


OK. I can see that basic.py checks the condition len(data.shape) == 1.

Do I need to convert the label to a one-dimensional array before feeding it to lgb.train?

I was following the instructions in: https://github.com/Microsoft/LightGBM/blob/master/examples/python-guide/advanced_example.py

But they don't mention anything like that.

munitech4u on 17 Jul 2017
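To see why a (1000, 1) label trips that check, here is a hypothetical passes_1d_check that mirrors only the len(data.shape) == 1 condition quoted above. It is a simplified stand-in for illustration, not LightGBM's actual function: a column vector fails it, while the same values flattened with ravel() pass.

```python
import numpy as np

def passes_1d_check(data):
    # Hypothetical, simplified stand-in for the len(data.shape) == 1
    # condition in basic.py; not LightGBM's actual function.
    return len(np.shape(data)) == 1

column = np.zeros((1000, 1))  # same shape as train[['target']].values in the issue
flat = column.ravel()         # same values, flattened to shape (1000,)

print(column.shape, passes_1d_check(column))  # (1000, 1) False
print(flat.shape, passes_1d_check(flat))      # (1000,) True
print(passes_1d_check([0, 1, 1]))             # True: a plain list is also 1-D
```

For an (n, 1) array, y[:, 0], y.reshape(-1) and y.squeeze(axis=1) produce the same flat (n,) result as y.ravel().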

Ok, though the example doesn't mention it.

This worked for me:

y=y_train.ravel()
train_lgb = lgb.Dataset(X, y)

munitech4u on 17 Jul 2017


@munitech4u is your data a pandas DataFrame?

@wxchan Does pandas always use (n, 1) as the shape of the label? If it does, maybe we should add a conversion for this.

guolinke on 17 Jul 2017
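One possible shape for the conversion suggested above is a small helper that accepts a list, a 1-D array, or a single-column 2-D array and returns a flat array. to_1d_label is a hypothetical name, not part of LightGBM's API; this sketch only illustrates the idea.

```python
import numpy as np

def to_1d_label(label):
    """Hypothetical helper: coerce a label to a 1-D float64 array.

    Accepts lists, 1-D arrays, and single-column 2-D inputs such as the
    (n, 1) array returned by df[['target']].values; rejects anything else.
    """
    arr = np.asarray(label, dtype=np.float64)
    if arr.ndim == 1:
        return arr
    if arr.ndim == 2 and arr.shape[1] == 1:
        # Column vector: drop the length-1 axis instead of raising.
        return arr.ravel()
    raise TypeError(
        "label must be 1-D or a single column, got shape {}".format(arr.shape))

print(to_1d_label([0, 1, 1]).shape)            # (3,)
print(to_1d_label(np.zeros((1000, 1))).shape)  # (1000,)
```

A 2-D input with more than one column still raises TypeError, since there is no unambiguous way to pick the label column out of it.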

Yes, it is a pandas DataFrame, and data.values always takes the form (n, 1).

munitech4u on 17 Jul 2017

@munitech4u you are using double brackets. Using single brackets will solve it, e.g. y_train = train['target'].values

@guolinke we already have a conversion for pandas.Series; it becomes a 1-D array.

wxchan on 17 Jul 2017
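The difference between the two selections is easy to check. A minimal sketch, using a hypothetical five-row frame in place of the issue's train DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the issue's `train` DataFrame.
train = pd.DataFrame({'target': [0, 1, 0, 1, 1], 'x': np.arange(5)})

# Double brackets select a one-column DataFrame, so .values is 2-D.
y_double = train[['target']].values
# Single brackets select a Series, so .values is 1-D.
y_single = train['target'].values

print(y_double.shape)  # (5, 1)
print(y_single.shape)  # (5,)
```

Double brackets are still the right choice for the feature matrix; only the label needs to be a single column selected with single brackets.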

Thanks, that is indeed the case

munitech4u on 17 Jul 2017

I am facing the same issue/error. I am a beginner, please help.

train = pd.read_csv("Train.csv")
test = pd.read_csv("Test.csv")

train.head()

X = lgb.Dataset[['Unnamed: 0',
                 'Months since Last Donation',
                 'Number of Donations',
                 'Total Volume Donated (c.c.)',
                 'Months since First Donation']]

Y = train[['Made Donation in March 2007']]
seed = 1234
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, Y, test_size=0.3)

gbm = lgb.LGBMRegressor(objective='binary',
                        num_leaves=31,
                        learning_rate=0.02,
                        n_estimators=100)

gbm.fit(X_train, y_train,
        eval_set=[(X_test, test)],
        eval_metric='l1')

TypeError: Wrong type(ndarray) for label, should be list or numpy array