When I use predict() on a lightgbm.sklearn.LGBMRegressor, LightGBM appears to use only a single core. My understanding was that this was addressed by the fix made in response to the original issue. Am I doing something wrong? Or is the use of the scikit-learn wrapper perhaps preventing things from working correctly? Any help or advice would be appreciated. Thank you!
Operating System: CentOS7
C++/Python/R version: Python 3.6.3
LightGBM Version: 2.1.1
@UserQuestions
It depends on how much data is used in prediction:
if there is only one sample, prediction is single-core; otherwise it will use multi-threading.
@guolinke
Thank you for your prompt response. I am not sure I understand. I thought .predict() only accepted a single dataset at once. I am trying to predict a large dataset (potentially hundreds of millions of observations), and I give it all to .predict() at once. I have a large number of cores but only see one core working when I use model.predict(dataset).
@UserQuestions With such a large dataset, the prediction should be multi-threaded.
Could you try using lightgbm.Booster.predict instead?
You can do it via https://lightgbm.readthedocs.io/en/latest/Python-API.html#lightgbm.Booster.predict
The booster can be obtained via https://lightgbm.readthedocs.io/en/latest/Python-API.html#lightgbm.LGBMModel.booster_
Thanks again for the prompt response, @guolinke. I tried it using predfun.booster_.predict(data); however, it still uses just one core. It's worth noting that type(predfun.booster_) is lightgbm.basic.Booster, not lightgbm.Booster.
How about setting num_threads=8 in the pred_parameter dict when calling https://lightgbm.readthedocs.io/en/latest/Python-API.html#lightgbm.Booster.predict ?
@guolinke
That works! Thanks for the help! It's interesting that a deprecated parameter is the solution (though I'm not running the newest version).
You can use kwargs: just set num_threads=8 in the arguments of the predict function.
a deprecated parameter is the solution (though I'm not running the newest version).
In the latest stable version (currently v2.2.3) you can use either the deprecated num_threads or its replacement n_jobs as a parameter of the predict() function. If you try to use both (with the same value), it will print a warning and ignore the newer parameter. Another way to verify that the parameter reaches the C++ backend, rather than being silently ignored in Python, is to set it to None. This check is necessary because in most test scenarios the parameter will look as if it is silently ignored: it is not an unconditional parameter as in the train / fit functions, but more of a we-know-better parameter that keeps scoring fast even with a badly chosen high thread count or on defaults. There is no way to set the number of cores used for prediction manually, i.e. to switch off the "auto" mode, which is probably a bug (?)
Rants aside, I have to admit it works fast out of the box. Not all prediction methods handle multi-core machines as smoothly as LightGBM's does. For instance, with xgboost v0.81 performance starts to degrade quite severely above 32 cores (by three orders of magnitude!), because by default it uses all available cores for predictions...