LightGBM component: Python package
Operating System: Ubuntu 16.04.6 LTS
CPU/GPU model: Intel(R) Core(TM) i7-3840QM CPU @ 2.80GHz
C++ compiler version: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
CMake version: 3.5.1
Python version: 3.6.3
Comments: reproduced on other machines running Debian with newer Python, gcc, and CMake
LightGBM version or commit hash: I tried 1bc27939a43c414c3424f339994d0aa11f3aa3b1 and 3.0.0 from pypi
With master tip (1bc27939a43c414c3424f339994d0aa11f3aa3b1):

```
/home/ilya/anaconda3/lib/python3.6/site-packages/lightgbm/basic.py:1971: UserWarning: Cannot add features from NoneType type of raw data to NoneType type of raw data.
Set free_raw_data=False when construct Dataset to avoid this
  warnings.warn(err_msg)
/home/ilya/anaconda3/lib/python3.6/site-packages/lightgbm/basic.py:1973: UserWarning: Reseting categorical features.
You can set new categorical features via ``set_categorical_feature`` method
  warnings.warn("Reseting categorical features.\n"
[LightGBM] [Debug] Dataset::GetMultiBinFromSparseFeatures: sparse rate 0.800036
[LightGBM] [Info] Total Bins 10200
[LightGBM] [Info] Number of data points in the train set: 1000000, number of used features: 40
Segmentation fault (core dumped)
```
With 3.0.0 from pypi:

```
[LightGBM] [Fatal] Bug. There should be only one multi-val group.
Traceback (most recent call last):
  File "lgb_test.py", line 28, in <module>
    m = lgb.train(pars, datasets[0], num_boost_round=3)
  File "/home/ilya/anaconda3/lib/python3.6/site-packages/lightgbm/engine.py", line 231, in train
    booster = Booster(params=params, train_set=train_set)
  File "/home/ilya/anaconda3/lib/python3.6/site-packages/lightgbm/basic.py", line 1991, in __init__
    ctypes.byref(self.handle)))
  File "/home/ilya/anaconda3/lib/python3.6/site-packages/lightgbm/basic.py", line 55, in _safe_call
    raise LightGBMError(decode_string(_LIB.LGBM_GetLastError()))
lightgbm.basic.LightGBMError: Bug. There should be only one multi-val group.
```
The reproduction script (`lgb_test.py`):

```python
import lightgbm as lgb
import numpy as np

r = np.random.RandomState(42)
ROWS = 1000000
COLUMNS = 20
N = 2
tags = 'ab'

data = [r.rand(ROWS, COLUMNS) for _ in range(N)]
for i in range(N):
    data[i][data[i] < 0.8] = 0
label = r.rand(ROWS)

def construct(i):
    return lgb.Dataset(data[i], feature_name=[tags[i] + str(n) for n in range(COLUMNS)], label=label).construct()

datasets = [construct(i) for i in range(N)]
for i in range(1, N):
    datasets[0].add_features_from(datasets[i])
if lgb.__version__ == '3.0.0':
    datasets[0].feature_name = sum((d.feature_name for d in datasets), [])

pars = {'verbosity': 2, 'seed': 42, 'force_col_wise': True}
m = lgb.train(pars, datasets[0], num_boost_round=3)
```
Run with `python <above code>`.
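For anyone hitting this, one possible way to sidestep `add_features_from` entirely is to concatenate the raw matrices first and build a single Dataset. A sketch under the assumption that the features are only being merged (not reused separately), with sizes shrunk for illustration:

```python
import numpy as np

# Build the same two sparse-ish blocks as the repro, but smaller.
r = np.random.RandomState(42)
ROWS, COLUMNS, N = 1000, 20, 2
tags = 'ab'

data = [r.rand(ROWS, COLUMNS) for _ in range(N)]
for i in range(N):
    data[i][data[i] < 0.8] = 0  # ~80% zeros, matching the repro's sparsity

# Merge the raw arrays up front instead of merging constructed Datasets.
combined = np.hstack(data)
feature_name = [tags[i] + str(n) for i in range(N) for n in range(COLUMNS)]

# A single Dataset can then be built directly (lightgbm call shown
# commented so the sketch runs stand-alone):
# import lightgbm as lgb
# ds = lgb.Dataset(combined, feature_name=feature_name, label=r.rand(ROWS))

print(combined.shape)  # (1000, 40)
```

This avoids the merge code path that crashes, at the cost of keeping all raw arrays in memory at once.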
On the pypi version, setting `pars['force_row_wise'] = True` instead makes it work, but the master tip still segfaults. My guess is that in 3.0.0 some bin group ends up being created more than once?
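For reference, the pypi workaround amounts to the following parameter change relative to the repro above (a sketch; `force_row_wise` is the documented LightGBM switch for row-wise histogram construction):

```python
# Same training params as the repro, but forcing row-wise histogram building
# instead of force_col_wise; reported to avoid the fatal error on 3.0.0
# (the master-tip segfault is unaffected).
pars = {'verbosity': 2, 'seed': 42, 'force_row_wise': True}

# m = lgb.train(pars, datasets[0], num_boost_round=3)
```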
Separately, the warnings on master tip don't make sense to me: I think it is perfectly reasonable to free the raw data, so why does this check exist?
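For completeness, the warning's own suggestion is to keep the raw data alive; a minimal sketch of what that would look like in the repro's `construct` helper (a hypothetical refactor, not a confirmed fix for the crash):

```python
# Sketch: free_raw_data=False keeps the raw matrix attached to the Dataset
# after construct(), which is what add_features_from needs in order to
# merge raw features without the NoneType warning.
dataset_kwargs = {'free_raw_data': False}

# def construct(i):
#     return lgb.Dataset(data[i],
#                        feature_name=[tags[i] + str(n) for n in range(COLUMNS)],
#                        label=label, **dataset_kwargs).construct()
```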
Many thanks for any help!
@shiyu1994 as you are changing the related code, can you also fix this in your PR (or after it is merged)?
I'll open another PR for this.
@shiyu1994 any progress on the fix?