Operating System: Windows 7
CPU: Intel Core i7
C++/Python/R version: Python 3.5
TypeError: Wrong type(ndarray) for label, should be list or numpy array
y_train=train[['target']].values
y_train.shape
Out[36]: (1000, 1)

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}
feature_name = ['feature_' + str(col) for col in range(num_feature)]
gbm = lgb.train(params,
                train_lgb,
                num_boost_round=500,
                valid_sets=train_lgb,  # eval training data
                feature_name=feature_name)
I am running these steps in sequence.
y_train is a one-dimensional array, which I believe is valid input for LightGBM.
@munitech4u
I think (1000, 1) is a 2D array.
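For illustration, a minimal NumPy sketch of the difference. The zero-filled arrays are stand-ins for the real labels, not data from this issue:

```python
import numpy as np

# Stand-in for train[['target']].values: 1000 rows, one column.
y_2d = np.zeros((1000, 1))
# Stand-in for a flat label vector of the same length.
y_1d = np.zeros(1000)

print(y_2d.shape, y_2d.ndim)  # (1000, 1) 2
print(y_1d.shape, y_1d.ndim)  # (1000,) 1
```

A shape with two entries means two dimensions, even when one of them has length 1.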
OK. I can see that basic.py checks the condition len(data.shape) == 1.
Do I need to convert the label to one dimension before feeding it to lgb.train?
I was following the example at: https://github.com/Microsoft/LightGBM/blob/master/examples/python-guide/advanced_example.py
But it doesn't mention anything like that.
Ok, though the example doesn't mention it.
This worked for me:
y=y_train.ravel()
train_lgb = lgb.Dataset(X, y)
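A small sketch of what ravel() does here, assuming a NumPy label array shaped like the one in this issue. The sample values are made up, and reshape(-1) is shown only as an equivalent alternative:

```python
import numpy as np

# Shape (4, 1), like the result of train[['target']].values.
y_train = np.array([[0], [1], [1], [0]])

y = y_train.ravel()          # flattened to shape (4,)
y_alt = y_train.reshape(-1)  # same values, same shape

# The condition from basic.py quoted earlier in this thread now holds.
is_one_dimensional = len(y.shape) == 1
print(y.shape, is_one_dimensional)
```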
@munitech4u is your data a pandas DataFrame?
@wxchan Does pandas always use (n, 1) as the shape of the label? If it does, maybe we should add a conversion for this.
Yes, it is a pandas DataFrame, and data.values always has the shape (n, 1).
@munitech4u you are using two brackets, which selects a DataFrame. Using one will solve it, like y_train=train['target'].values
@guolinke we already have the conversion for pandas.Series, which is a 1-D array.
Thanks, that is indeed the case.
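To make the bracket difference concrete, here is a short pandas sketch. The frame is made up; only the column name 'target' comes from this thread:

```python
import pandas as pd

# Hypothetical data for illustration.
train = pd.DataFrame({"feature_0": [0.1, 0.2, 0.3], "target": [0, 1, 0]})

as_frame = train[["target"]]  # list of column names -> DataFrame
as_series = train["target"]   # single column name -> Series

frame_shape = as_frame.values.shape    # two-dimensional
series_shape = as_series.values.shape  # one-dimensional
print(type(as_frame).__name__, frame_shape)
print(type(as_series).__name__, series_shape)
```

The double-bracket form keeps the column axis, which is why .values comes back as (n, 1) rather than (n,).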
I am facing the same error. I am a beginner, please help.
train = pd.read_csv("Train.csv")
test = pd.read_csv("Test.csv")
train.head()
X = train[['Unnamed: 0',
           'Months since Last Donation',
           'Number of Donations',
           'Total Volume Donated (c.c.)',
           'Months since First Donation']]
Y = train[['Made Donation in March 2007']]
seed = 1234
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, Y, test_size=0.3)
gbm = lgb.LGBMRegressor(objective='binary',
                        num_leaves=31,
                        learning_rate=0.02,
                        n_estimators=100)
gbm.fit(X_train, y_train,
        eval_set=[(X_test, y_test)],
        eval_metric='l1')
TypeError: Wrong type(ndarray) for label, should be list or numpy array
Change this:
Y = train[['Made Donation in March 2007']]
to:
Y = train['Made Donation in March 2007']
Thank you.
This definitely needs a better error message.
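As a rough idea of what a clearer message could look like, here is a sketch of a validation helper. check_label is a hypothetical name and this is not LightGBM's actual code, just one way the shape problem could be reported:

```python
import numpy as np

def check_label(label):
    """Hypothetical validator: return a 1-D array or raise a descriptive error."""
    arr = np.asarray(label)
    if arr.ndim == 1:
        return arr
    if arr.ndim == 2 and arr.shape[1] == 1:
        # The case from this thread: a single column kept as 2-D.
        raise TypeError(
            "label has shape {}; expected a 1-D array. Try label.ravel() "
            "or select the column with single brackets.".format(arr.shape)
        )
    raise TypeError("label must be 1-D, got {} dimensions".format(arr.ndim))
```

Calling check_label(np.zeros((1000, 1))) would then name the shape and suggest the fix, instead of saying that an ndarray should be a numpy array.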