Lightgbm: Feature Requests & Voting Hub

Created on 1 Aug 2019 · 18 comments · Source: microsoft/LightGBM

This issue maintains all feature requests on one page.

Note to contributors: If you want to work on a requested feature, re-open the linked issue. Everyone is welcome to work on any of the issues below.

Note to maintainers: All feature requests should be consolidated on this page. When a new feature request issue is opened, close it and add a new entry here with a link to the issue. The one exception is issues marked "good first issue"; these should be left open so they are discoverable by new contributors.

Call for Voting

We would like to call a vote here to prioritize these requests.
If a feature request is important to you, you can vote for it as follows:

  1. Get the issue (feature request) number.
  2. Search for that number in this issue to check whether a vote for it already exists.
  3. If the vote exists, add a 👍 reaction to it.
  4. If it doesn't, create a new vote by replying to this thread with the issue number.
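The tallying implied by these steps can be automated against the GitHub REST API, whose issue-comment payloads carry a `reactions` summary with a `+1` count. A minimal sketch (the `tally_votes` helper is hypothetical; only the shape of the comment payload is taken from the GitHub API):

```python
import re


def tally_votes(comments):
    """Tally feature-request votes from GitHub issue comments.

    Each comment dict is expected to look like the GitHub REST API's
    issue-comment payload: a "body" mentioning the issue number (e.g.
    "#2644") and a "reactions" summary with a "+1" count.
    """
    votes = {}
    for comment in comments:
        match = re.search(r"#(\d+)", comment.get("body", ""))
        if not match:
            continue  # comment does not reference an issue; not a vote
        issue_number = int(match.group(1))
        plus_ones = comment.get("reactions", {}).get("+1", 0)
        votes[issue_number] = votes.get(issue_number, 0) + plus_ones
    # highest-voted requests first
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
```

In practice the `comments` list would be fetched from the repository's `GET /repos/{owner}/{repo}/issues/{issue_number}/comments` endpoint; the function itself is pure, so it can be tested offline.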

Discussions

  • Efficiency improvements (#2791)
  • Accuracy improvements (#2790)

Efficiency related

  • [x] Faster LambdaRank (#2701)
  • [ ] Faster Split (data partition) (#2782)
  • [ ] NUMA-aware (#1441)
  • [ ] Continue to accelerate ConstructHistogram (#2786)
  • [ ] Accelerate the data loading from file (#2788)
  • [ ] Accelerate the data loading from Python/R object (#2789)

Effectiveness related

  • [ ] Better Regularization for Categorical features (#1934)

Distributed platform and GPU

  • [ ] YARN support (#790)
  • [ ] Multiple GPU support (#620)
  • [ ] GPU performance improvement (#768)
  • [ ] GPU binaries release (#2263)

Maintenance

  • [ ] Code refactoring (#2341)
  • [ ] Remove unused-command-line-argument warning with Apple Clang (#1805)
  • [ ] More tests (#261)
  • [ ] Publish lib_lightgbm.dll symbols to Microsoft Symbols Server (#1725)
  • [ ] Enhance parameter tuning guide with more params and scenarios (suggested ranges) for different tasks/datasets (#2617)
  • [x] CI via GitHub actions (#2353)
  • [x] Debug flag in CMake configuration (#1588)
  • [x] Fix cpp lint problems (#1990)

python package:

  • [ ] Check input for prediction (#812)
  • [ ] Refine pandas support (#960)
  • [ ] Refine categorical feature support (#1021)
  • [x] Migrate to parametrize_with_checks for scikit-learn integration tests (#2947)
  • [ ] Refactor sklearn wrapper after stabilizing upstream API, public API compatibility tests and official documentation (also after maturing HistGradientBoosting) (#2966, #2628)
  • [ ] Register custom objective / loss function (#3244)

R package:

  • [ ] Rewrite R demos (#1944)
  • [ ] Use commandArgs instead of hardcoded stuff in the installation script (#2441)
  • [ ] Factor out custom R interface to lib_lightgbm (#3016)
  • [ ] lgb.convert_with_rules() should validate rules (#2682)
  • [ ] Reduce duplication in Makevars.in, Makevars.win (#3249)
  • [x] lgb.convert functions should convert columns of type 'logical' (#2678)
  • [x] lgb.convert functions should warn on unconverted columns of unsupported types (#2681)
  • [x] lgb.prepare() and lgb.prepare2() should be simplified (#2683)
  • [x] lgb.prepare_rules() and lgb.prepare_rules2() should be simplified (#2684)
  • [x] Remove lgb.prepare() and lgb.prepare_rules() (#3075)
  • [x] CRAN-compliant installation configuration (#2960)
  • [x] Add tests on R 4.0 (#3024)
  • [x] Add pkgdown documentation support (#1143)
  • [x] Cover 100% of R-to-C++ calls in R unit tests (#2944)
  • [x] Bump version of pkgdown (#3036)
  • [x] Run R CI in Windows environment (#2335)
  • [x] Add unit tests for best metric iteration/value (#2525)
  • [x] Standardize R code on comma-first (#2373)
  • [x] Add additional linters to CI (#2477)
  • [x] Support roxygen 7.0.0+ (#2569)
  • [x] Run R CI in Linux and Mac environments (#2335)

New features

  • [ ] CoreML support (#1074)
  • [ ] More platforms support (#1129)
  • [ ] Object importance (#1460)
  • [ ] Include init_score in predict function (#1978)
  • [ ] Hyper-parameter per feature/column (#1938)
  • [ ] Extracting decision path (#2187)
  • [ ] Support for extremely large model (#2265)
  • [ ] Add C API function that returns all parameter names with their aliases (#2633)
  • [ ] Recalculate feature importance during the update process of a tree model (#2413)
  • [ ] Merge Dataset objects on condition that they hold same binmapper (#2579)
  • [ ] Spike and slab feature sampling priors (feature weighted sampling) (#2542)
  • [ ] Customizable early stopping tolerance (#2526)
  • [ ] Stop training branch of tree once a specific feature is used (#2518)
  • [ ] Subsampling rows with replacement (#1038)
  • [ ] Arbitrary base learner (#3180)
  • [ ] Decouple boosting types (#3128, #2991)
  • [x] Pre-defined bin_upper_bounds (#1829)
  • [x] Setup editorconfig (#2401)
  • [x] Colsample by node (#2315)
  • [x] Smarter Backoffs for MPI ring connection (#2348)
  • [x] UTF-8 support for model file (#2478)

new algorithms:

  • [ ] Regularized Greedy Forest (#315)
  • [ ] Accelerated Gradient Boosting (#1257)
  • [ ] Piece-wise linear tree (#1315)
  • [ ] Multi-Layered Gradient Boosting Decision Trees (#1423)
  • [ ] Adaptive neural tree (#1542)
  • [ ] Probabilistic Random Forest (#1946)
  • [ ] Sparrow (#2001)
  • [ ] Minimal Variance Sampling (MVS) in Stochastic Gradient Boosting (#2644)
  • [x] Extremely randomized trees (#2583)

objective and metric functions:

  • [ ] Multi-output regression (#524)
  • [ ] Earth Mover Distance (#1256)
  • [ ] Cox Proportional Hazard Regression (#1837)
  • [ ] Ranking metric for regression objective (#1911)
  • [ ] Density estimation (#2056)
  • [x] Precision recall AUC (#3026)
  • [x] AUC Mu (#2344)

python package:

  • [ ] Support complex data types in categorical columns of pandas DataFrame (#2134)
  • [ ] Support weight in refit (#3038)
  • [ ] Better support for tree plots with multiclass models (#3061)
  • [x] Keep cv predicted values (#283)
  • [x] Feature importance in CV (#1445)
  • [x] Log redirect in python (#1493)
  • [x] Make _CVBooster public for better stacking experience (#2105)

R package:

  • [ ] Release to CRAN (#629)
  • [ ] Export callback functions (#2479)
  • [ ] Plotting in R-package (#1222)
  • [ ] Support trees with linear models at leaves (#3319)
  • [ ] Add support for saving weight values of a node in the R-package (#2281)
  • [ ] Check parameters in cb.reset.parameters() (#2665)
  • [ ] Refit method for R-package (#2369)
  • [ ] Add the ability to predict on lgb.Dataset in Predictor$predict() (#2666)
  • [ ] Add support for non-ASCII feature names (#2983)
  • [ ] Allow use of MPI from the R package (#3364)
  • [ ] Allow data to live in memory mapped file (#2184)
  • [ ] Add GPU support for CRAN package (#3206)
  • [ ] Add CUDA support for CRAN package (#3465)
  • [x] Exclude training data from being checked for early stopping (#2472)
  • [x] first_metric_only parameter for R-package (#2368)
  • [x] Build a 32-bit version of LightGBM for the R package (#3187)
  • [x] Ability to control the printed messages (#1440)

new language wrappers:

  • [ ] MATLAB support (#743)
  • [ ] Java support (like xgboost4j) (#909)
  • [ ] Go support (predict part can be already found in https://github.com/dmitryikh/leaves package) (#2515)
  • [x] Ruby support (#2367)

input enhancements:

  • [ ] String as categorical input directly (#789)
  • [ ] AWS S3 support (#1039)
  • [ ] H2O datatable direct support (not via to_numpy() method as it currently is) (#2003)
  • [ ] Multiple file as input (#2031)
  • [ ] Parquet file support (#1286)
Labels: feature request, help wanted

Most helpful comment

Cox Proportional Hazard Regression #1837

All 18 comments

For everyone's information, the sparrow algorithm has been implemented in CatBoost

There's a reference to minimum variance sampling here:

https://catboost.ai/docs/concepts/algorithm-main-stages_bootstrap-options.html

Although I think it just speeds up training rather than providing out of core training.

I would like to tackle the following issues in the Python package. Could I discuss a plan to fix them, and where should that discussion happen? IMHO, both can be resolved by improving the lightgbm.cv() function.

2105: Make _CVBooster public for better stacking experience

283: Keep cv predicted values

I want to reopen the above issues, but I cannot; I may not have permission.

@momijiame Thank you for your interest! I've unlocked those issues for commenting. Please let's continue the discussion there.

We would like to call a vote here to prioritize these requests.
If a feature request is important to you, you can vote for it as follows:

  1. Get the issue (feature request) number.
  2. Search for that number in this issue to check whether a vote for it already exists.
  3. If the vote exists, add a 👍 reaction to it.
  4. If it doesn't, create a new vote by replying to this thread with the issue number.

We would like to call a vote here

Let me start.

2644

It was proposed by me, so I'm a little bit biased.

Decouple boosting types #3128

GPU binaries release #2263

Enhance parameter tuning guide with more params #2617

Subsampling rows with replacement #1038

Piece-wise linear tree #1315 (also see PR https://github.com/microsoft/LightGBM/pull/3299)

Multi-output regression #524

Cox Proportional Hazard Regression #1837

Based on https://github.com/microsoft/LightGBM/issues/2983#issuecomment-722630931, I've updated this issue's description:

Note to maintainers: All feature requests should be consolidated on this page. When a new feature request issue is opened, close it and add a new entry here with a link to the issue. The one exception is issues marked "good first issue"; these should be left open so they are discoverable by new contributors.

I think that we should keep good first issue issues open, so it's easy for new contributors to find them.

Read from multiple files #2031

Parquet file support #1286

Register custom objective / loss function #3244

Object importance #1460
