Hi all, we have a tentative plan to extend LightGBM to Python users. Please share your opinions with us.
I think this would be a great step forward for adoption. Of course, it should also be easy to install and use.
These are the features I think you should consider for Python support:
Agree with ParadoxShmaradox. For me, early stopping would be a great feature to have.
@ParadoxShmaradox @yychenca, we can break the work into two phases: 1. Python bindings. 2. A scikit-learn (or other popular framework) interface.
Model (de)serialization, cross-validation, and early stopping seem to be other popular features we should work on. I'll open issues for these features.
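For anyone unsure what early stopping means here, a minimal sketch of the rule in plain Python. It assumes a lower-is-better validation metric and a non-empty score list. The names `best_iteration` and `patience` are made up for illustration and are not LightGBM parameters.

```python
def best_iteration(valid_scores, patience):
    """Return (best_index, stop_index) for a lower-is-better metric.

    Training is treated as stopped at the first iteration where the
    validation score has not improved for `patience` rounds in a row.
    """
    best_idx = 0
    for i, score in enumerate(valid_scores):
        if score < valid_scores[best_idx]:
            best_idx = i  # new best validation score
        elif i - best_idx >= patience:
            return best_idx, i  # no improvement for `patience` rounds
    # Never triggered: training ran through every iteration.
    return best_idx, len(valid_scores) - 1


# Hypothetical per-iteration validation losses.
scores = [0.50, 0.42, 0.40, 0.41, 0.43, 0.39]
print(best_iteration(scores, patience=2))  # (2, 4)
```

Note that with `patience=2` the later improvement at index 5 is never seen, which is the usual trade-off of a small patience value.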
I want to add that something like xgboost's scale_pos_weight for handling imbalanced classes, or even better an implementation of _Example-Dependent Cost-Sensitive_ classification, would be very interesting: http://nbviewer.jupyter.org/github/albahnsen/CostSensitiveClassification/blob/master/doc/tutorials/tutorial_edcs_credit_scoring.ipynb
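To make the request concrete, here is a rough sketch of the usual heuristic for choosing such a weight: the ratio of negative to positive examples. The helper names are hypothetical, and it only assumes binary 0/1 labels. It is not how LightGBM or xgboost compute anything internally.

```python
def scale_pos_weight(labels):
    """Heuristic weight for the positive class: count(neg) / count(pos)."""
    pos = sum(1 for y in labels if y == 1)
    neg = sum(1 for y in labels if y == 0)
    if pos == 0:
        raise ValueError("labels contain no positive examples")
    return neg / pos


def example_weights(labels):
    """Per-example weights: positives get the ratio, negatives get 1.0."""
    w = scale_pos_weight(labels)
    return [w if y == 1 else 1.0 for y in labels]


labels = [0, 0, 0, 0, 0, 0, 1, 1]
print(scale_pos_weight(labels))  # 3.0
print(example_weights(labels))
```

An example-dependent cost-sensitive setup would go further and give every row its own cost rather than one weight per class.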
I think we need a way to create the LightGBM binary format directly from a NumPy array instead of having to do an expensive write and read to CSV/SVMLight. For really big datasets (e.g. the Kaggle Bosch competition) it can take an hour to read and write CSVs, so something like this (functionality similar to xgboost.DMatrix()) would be excellent.
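As a rough illustration of why this should be feasible: ctypes can hand a C library a pointer to memory Python already holds, with no file round-trip. The sketch below uses only the standard library (`array` plus `ctypes`). The helper name is made up, and nothing here is LightGBM API. With NumPy, the analogous hook would presumably be the array's `ctypes` attribute.

```python
import ctypes
from array import array


def as_c_doubles(values):
    """Pack values into a contiguous double buffer and expose it to C.

    Returns (buf, c_view). c_view shares memory with buf (no copy), so
    buf must stay alive for as long as a C library uses the pointer.
    """
    buf = array("d", values)
    c_view = (ctypes.c_double * len(buf)).from_buffer(buf)
    return buf, c_view


buf, c_view = as_c_doubles([1.0, 2.0, 3.0])
c_view[0] = 9.0  # write through the C view
print(buf[0])    # 9.0, the Python-side buffer sees the change
```

A real binding would pass `c_view` (plus row and column counts) to a dataset-construction function in the shared library instead of writing a CSV first.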
I think user-defined loss and eval functions are also important.
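For reference, a sketch of what such user-defined functions typically look like in gradient boosting libraries. The objective returns the first and second derivatives of the loss with respect to the raw score, and the eval function returns a named metric. The signatures below are an assumption for illustration, not a proposal for the final LightGBM API.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logistic_objective(raw_scores, labels):
    """Gradient and hessian of binary log loss w.r.t. the raw scores."""
    grads, hessians = [], []
    for s, y in zip(raw_scores, labels):
        p = sigmoid(s)
        grads.append(p - y)             # d(loss)/d(score)
        hessians.append(p * (1.0 - p))  # d2(loss)/d(score)2
    return grads, hessians


def error_rate(raw_scores, labels):
    """Custom eval: fraction of examples misclassified at score > 0."""
    wrong = sum(1 for s, y in zip(raw_scores, labels) if (s > 0.0) != (y == 1))
    return "error", wrong / len(labels)


grads, hessians = logistic_objective([0.0, 2.0], [1, 0])
print(grads, hessians)
print(error_rate([1.0, -1.0, 2.0], [1, 1, 0]))
```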
@chivee Thank you. I had a chance to try LightGBM on a regression problem with a dataset of 180k rows and 30 features. Training completed in 8 seconds, compared to 52 seconds for XGBoost with comparable parameters (63 leaves vs. a depth of 8). The accuracy of LightGBM (L1/MAE) was even a bit better. Truly impressed!!
I noticed that early stopping has already been added, so I'll probably give it another try soon.
Anyways, great job!
A quick wrapper for LightGBM: https://github.com/ArdalanM/pyLightGBM
It still dumps the input to SVM format before training.
@ArdalanM, thank you, that will be very helpful!
Hi guys, a simple Python binding using ctypes can be found at
https://github.com/Microsoft/LightGBM/blob/master/tests/c_api_test/test.py.
Any feedback on the bindings would be great.
The existing Python binding is great, but I would love to be able to swap LightGBM in for the existing algorithms in a pipeline, and those most likely have a scikit-learn interface.
At least on the level of:
Refer to #94.
The basic classes are finished. I think we can build on these classes to implement sklearn-like interfaces.
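To sketch what "sklearn-like" means in practice: hyperparameters are stored in `__init__`, `fit` returns `self`, and `get_params`/`set_params` let pipelines and grid search clone and reconfigure the estimator. The toy class below follows those conventions without importing scikit-learn. Its mean-predicting backend and the `shrinkage` parameter are stand-ins for illustration only, not the basic classes mentioned above.

```python
class MeanRegressor:
    """Toy estimator that follows scikit-learn's calling conventions."""

    def __init__(self, shrinkage=1.0):
        # Only store hyperparameters here; no work is done until fit().
        self.shrinkage = shrinkage

    def get_params(self, deep=True):
        return {"shrinkage": self.shrinkage}

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError("unknown parameter: %s" % name)
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Stand-in "training": a real wrapper would call the booster here.
        self.mean_ = self.shrinkage * sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


model = MeanRegressor().fit([[0], [1]], [2.0, 4.0])
print(model.predict([[5], [6]]))  # [3.0, 3.0]
```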
Hi all,
the python-package branch is merged; see https://github.com/Microsoft/LightGBM/tree/master/python-package
You are welcome to try it and send feedback or open issues.