XGBoost: Roadmap: feature requests

Created on 5 Jul 2018 · 5 comments · Source: dmlc/xgboost

This issue is aimed at keeping track of requested features.

Note to developers and contributors: If you plan to actively work on implementing a requested feature, re-open the linked issue. Everyone is welcome to work on any of the issues below. For specifics, read the linked issues.

Note to maintainers: All feature requests should be consolidated into this document. Close new issues containing feature requests and create corresponding entries in the checklist below (don't forget to link the closed issue), so that the number of open issues stays manageable. Also, make sure to attach the feature-request label when closing feature requests.

  • [ ] Multiple output regression (#2087) (see the workaround sketch after this list)
  • [ ] Rust binding (#3351)
  • [ ] Lua binding (#1444)
  • [ ] Save data counts inside XGBoost model (#3419)
  • [x] Feature interaction constraint (#3135)
  • [ ] Clarify behavior for trivial splits (#2914)
  • [ ] Stratified distributed DMatrix (#2758)
  • [ ] Dynamic min_child_weight parameter (#2714)
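
Regarding the multiple output regression item above: until #2087 is implemented natively, a common workaround is to fit one booster per target through scikit-learn's MultiOutputRegressor. The snippet below is only a sketch of that workaround (it assumes the scikit-learn wrapper xgb.XGBRegressor and synthetic data), not the requested native feature.

```python
# Workaround sketch only: fit one independent XGBRegressor per target column
# via scikit-learn's MultiOutputRegressor, since native multi-output
# regression (#2087) is not implemented yet.
import numpy as np
import xgboost as xgb
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 10)
# Two synthetic targets stacked into a (200, 2) matrix.
Y = np.column_stack([X[:, 0] + 0.1 * rng.randn(200),
                     X[:, 1] - X[:, 2] + 0.1 * rng.randn(200)])

model = MultiOutputRegressor(xgb.XGBRegressor(n_estimators=50, max_depth=3))
model.fit(X, Y)
print(model.predict(X).shape)  # (200, 2)
```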

Distributed learning

  • [ ] Write a tutorial for distributed learning using Docker containerization (#3593)
  • [ ] Provide a tutorial for using XGBoost with MPI (#3596)

Python specific

  • [ ] Preserve feature_names when slicing DMatrix with slice() (#3124)
  • [x] Pre-built binary wheels for Windows
  • [x] Automatically set ntree_limit = best_ntree_limit for predictions in scikit-learn API (#3053) (see the sketch after this list)
  • [ ] Make prediction at each stage for a given data matrix (#2175)
  • [x] Enable prediction on CPU-only machines with models trained with GPUs (#3342)
  • [ ] Booster.load_model() should not require buffer writeability (#3013)
  • [ ] Interoperability with sklearn.tree.export_graphviz (#2981)
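
To illustrate the ntree_limit item above: #3053 asks for predictions to use the best iteration found by early stopping automatically, rather than requiring it to be passed explicitly. A minimal sketch of the manual step that this change automates, assuming an XGBoost version where best_ntree_limit and the ntree_limit argument are available:

```python
# Sketch of the manual step that #3053 automates: after early stopping,
# pass the best iteration (best_ntree_limit) to predict() instead of
# using all n_estimators trees.
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

clf = xgb.XGBClassifier(n_estimators=500, learning_rate=0.1)
clf.fit(X_train, y_train,
        eval_set=[(X_valid, y_valid)],
        early_stopping_rounds=10,
        verbose=False)

# Explicit step the feature request makes the default behavior of predict().
pred = clf.predict(X_valid, ntree_limit=clf.best_ntree_limit)
```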

JVM specific

  • [x] Update documentation
  • [x] Single-instance prediction interface (#3153)
  • [x] Attach group info for each row in data frame (#3097)
  • [x] Evaluate an arbitrary number of evaluation sets when training in XGBoost-Spark (#3231)
  • [ ] Pass additional parameters to XGBoostEstimator (#3202)
  • [x] Do not expose booster parameter to XGBoost-Spark (#3209)
  • [x] Resolve external cache file for Windows (#3038)
  • [ ] Float.NaN as missing value causes program crash (#3370)
  • [x] Add early stopping in XGBClassifier/Regressor (#2710)
  • [ ] Do train/test-split by column-values in XGBoost4J-Spark (#3607)

Documentation

  • [x] Modernize XGBoost docs (#3444)
  • [ ] Document CSC/CSR format (#3331) (see the sketch after this list)
  • [x] Document test system for developers (#3283)
  • [ ] Update the R tutorial "Discover your data" to reflect the deletion of RealCover (#3239)
  • [ ] Document usage of C/C++ API
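
As a rough illustration of what the CSC/CSR documentation (#3331) would cover, a DMatrix can be constructed directly from a scipy.sparse matrix. This is only a sketch assuming numpy and scipy are installed:

```python
# Sketch of building a DMatrix from a scipy.sparse CSR matrix; entries not
# stored in the sparse structure are treated as missing by XGBoost.
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

# 3 rows, 4 features in CSR form (data, indices, indptr).
data = np.array([1.0, 2.0, 3.0, 4.0])
indices = np.array([0, 2, 1, 3])
indptr = np.array([0, 2, 3, 4])
csr = sp.csr_matrix((data, indices, indptr), shape=(3, 4))

dtrain = xgb.DMatrix(csr, label=np.array([0.0, 1.0, 0.0]))
print(dtrain.num_row(), dtrain.num_col())  # 3 4
```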

All 5 comments

Single-instance prediction was added to the JVM package by #3464.

Hello @hcho3, the ability to define the probability of each feature being selected when col_sample is used has been requested a few times recently in the forum [1].

If we want to add this as a feature request, I could create an issue to track it and take it up, since it doesn't seem too complicated once we agree on a design.

[1] https://discuss.xgboost.ai/t/sampsize-by-strata-in-subsample/281
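
To make the request above concrete, here is a purely hypothetical sketch of what weighted column sampling could look like; the feature_weights value shown is an assumption about one possible design and is not an existing XGBoost parameter, so the call is left commented out:

```python
# Hypothetical sketch only: 'feature_weights' is not an existing XGBoost
# parameter here; it illustrates one possible design where colsample_*
# draws columns in proportion to per-feature weights.
import numpy as np
import xgboost as xgb

rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = rng.randint(0, 2, size=100)
dtrain = xgb.DMatrix(X, label=y)

# Proposed interface (commented out because it does not exist yet):
# dtrain.set_info(feature_weights=np.array([0.4, 0.4, 0.1, 0.05, 0.05]))

params = {"objective": "binary:logistic", "colsample_bytree": 0.6}
bst = xgb.train(params, dtrain, num_boost_round=10)
```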

@thvasilo That would be great. Thanks!

+1 for Multiple output regression
