This issue collects all feature requests on one page.
Note to contributors: If you want to work on a requested feature, re-open the linked issue. Everyone is welcome to work on any of the issues below.
Note to maintainers: All feature requests should be consolidated on this page. When a new feature request issue is opened, close it and add an entry on this page with a link to the issue. The one exception is issues marked good first issue: these should be left open so they are discoverable by new contributors.
Call for Voting
We would like to call for votes here to prioritize these requests.
If a feature request is important to you, you can vote for it as follows:
- Get the issue (feature request) number.
- Search for the number in this issue to check whether a vote for it already exists.
- If the vote exists, add a 👍 to it.
- If the vote doesn't exist, create a new one by replying to this thread and including the issue number in your reply.
Discussions
- Efficiency improvements (#2791)
- Accuracy improvements (#2790)
Efficiency related
- [x] Faster LambdaRank (#2701)
- [ ] Faster Split (data partition) (#2782)
- [ ] NUMA-aware (#1441)
- [ ] Continue to accelerate ConstructHistogram (#2786)
- [ ] Accelerate the data loading from file (#2788)
- [ ] Accelerate the data loading from Python/R object (#2789)
Effectiveness related
- [ ] Better Regularization for Categorical features (#1934)
Distributed platform and GPU
- [ ] YARN support (#790)
- [ ] Multiple GPU support (#620)
- [ ] GPU performance improvement (#768)
- [ ] GPU binaries release (#2263)
Maintenance
- [ ] Code refactoring (#2341)
- [ ] Remove unused-command-line-argument warning with Apple Clang (#1805)
- [ ] More tests (#261)
- [ ] Publish lib_lightgbm.dll symbols to Microsoft Symbols Server (#1725)
- [ ] Enhance parameter tuning guide with more params and scenarios (suggested ranges) for different tasks/datasets (#2617)
- [x] CI via GitHub actions (#2353)
- [x] Debug flag in CMake configuration (#1588)
- [x] Fix cpp lint problems (#1990)
Python package:
- [ ] Check input for prediction (#812)
- [ ] Refine pandas support (#960)
- [ ] Refine categorical feature support (#1021)
- [x] Migrate to parametrize_with_checks for scikit-learn integration tests (#2947)
- [ ] Refactor sklearn wrapper after stabilizing upstream API, public API compatibility tests and official documentation (also after maturing HistGradientBoosting) (#2966, #2628)
- [ ] Register custom objective / loss function (#3244)
R package:
- [ ] Rewrite R demos (#1944)
- [ ] Use commandArgs instead of hardcoded stuff in the installation script (#2441)
- [ ] Factor out custom R interface to lib_lightgbm (#3016)
- [ ] lgb.convert_with_rules() should validate rules (#2682)
- [ ] Reduce duplication in Makevars.in, Makevars.win (#3249)
- [x] lgb.convert functions should convert columns of type 'logical' (#2678)
- [x] lgb.convert functions should warn on unconverted columns of unsupported types (#2681)
- [x] lgb.prepare() and lgb.prepare2() should be simplified (#2683)
- [x] lgb.prepare_rules() and lgb.prepare_rules2() should be simplified (#2684)
- [x] Remove lgb.prepare() and lgb.prepare_rules() (#3075)
- [x] CRAN-compliant installation configuration (#2960)
- [x] Add tests on R 4.0 (#3024)
- [x] Add pkgdown documentation support (#1143)
- [x] Cover 100% of R-to-C++ calls in R unit tests (#2944)
- [x] Bump version of pkgdown (#3036)
- [x] Run R CI in Windows environment (#2335)
- [x] Add unit tests for best metric iteration/value (#2525)
- [x] Standardize R code on comma-first (#2373)
- [x] Add additional linters to CI (#2477)
- [x] Support roxygen 7.0.0+ (#2569)
- [x] Run R CI in Linux and Mac environments (#2335)
New features
- [ ] CoreML support (#1074)
- [ ] More platforms support (#1129)
- [ ] Object importance (#1460)
- [ ] Include init_score in predict function (#1978)
- [ ] Hyper-parameter per feature/column (#1938)
- [ ] Extracting decision path (#2187)
- [ ] Support for extremely large models (#2265)
- [ ] Add C API function that returns all parameter names with their aliases (#2633)
- [ ] Recalculate feature importance during the update process of a tree model (#2413)
- [ ] Merge Dataset objects on condition that they hold same binmapper (#2579)
- [ ] Spike and slab feature sampling priors (feature weighted sampling) (#2542)
- [ ] Customizable early stopping tolerance (#2526)
- [ ] Stop training branch of tree once a specific feature is used (#2518)
- [ ] Subsampling rows with replacement (#1038)
- [ ] Arbitrary base learner (#3180)
- [ ] Decouple boosting types (#3128, #2991)
- [x] Pre-defined bin_upper_bounds (#1829)
- [x] Setup editorconfig (#2401)
- [x] Colsample by node (#2315)
- [x] Smarter Backoffs for MPI ring connection (#2348)
- [x] UTF-8 support for model file (#2478)
new algorithms:
- [ ] Regularized Greedy Forest (#315)
- [ ] Accelerated Gradient Boosting (#1257)
- [ ] Piece-wise linear tree (#1315)
- [ ] Multi-Layered Gradient Boosting Decision Trees (#1423)
- [ ] Adaptive neural tree (#1542)
- [ ] Probabilistic Random Forest (#1946)
- [ ] Sparrow (#2001)
- [ ] Minimal Variance Sampling (MVS) in Stochastic Gradient Boosting (#2644)
- [x] Extremely randomized trees (#2583)
objective and metric functions:
- [ ] Multi-output regression (#524)
- [ ] Earth Mover Distance (#1256)
- [ ] Cox Proportional Hazard Regression (#1837)
- [ ] Ranking metric for regression objective (#1911)
- [ ] Density estimation (#2056)
- [x] Precision recall AUC (#3026)
- [x] AUC Mu (#2344)
Python package:
- [ ] Support complex data types in categorical columns of pandas DataFrame (#2134)
- [ ] Support weight in refit (#3038)
- [ ] Better support for tree plots with multiclass models (#3061)
- [x] Keep cv predicted values (#283)
- [x] Feature importance in CV (#1445)
- [x] Log redirect in python (#1493)
- [x] Make _CVBooster public for better stacking experience (#2105)
R package:
- [ ] Release to CRAN (#629)
- [ ] Export callback functions (#2479)
- [ ] Plotting in R-package (#1222)
- [ ] Support trees with linear models at leaves (#3319)
- [ ] Add support for saving weight values of a node in the R-package (#2281)
- [ ] Check parameters in cb.reset.parameters() (#2665)
- [ ] Refit method for R-package (#2369)
- [ ] Add the ability to predict on lgb.Dataset in Predictor$predict() (#2666)
- [ ] Add support for non-ASCII feature names (#2983)
- [ ] Allow use of MPI from the R package (#3364)
- [ ] Allow data to live in memory mapped file (#2184)
- [ ] Add GPU support for CRAN package (#3206)
- [ ] Add CUDA support for CRAN package (#3465)
- [x] Exclude training data from being checked for early stopping (#2472)
- [x] first_metric_only parameter for R-package (#2368)
- [x] Build a 32-bit version of LightGBM for the R package (#3187)
- [x] Ability to control the printed messages (#1440)
new language wrappers:
- [ ] MATLAB support (#743)
- [ ] Java support (like xgboost4j) (#909)
- [ ] Go support (prediction is already available in the https://github.com/dmitryikh/leaves package) (#2515)
- [x] Ruby support (#2367)
input enhancements:
- [ ] String as categorical input directly (#789)
- [ ] AWS S3 support (#1039)
- [ ] H2O datatable direct support (not via the to_numpy() method, as is currently the case) (#2003)
- [ ] Multiple files as input (#2031)
- [ ] Parquet file support (#1286)