Xgboost: [Roadmap] 1.1.0 Roadmap

Created on 21 Feb 2020 · 23 comments · Source: dmlc/xgboost

@dmlc/xgboost-committer please add your items here by editing this post. Let's ensure that:

  • Each item is associated with a ticket
  • Major design/refactoring work is associated with an RFC before the code is committed
  • Blocking issues are marked as blocking
  • Breaking changes are marked as breaking

For other contributors who have no permission to edit the post, please comment here about what you think should be in 1.1.0.

  • [x] Add Survival analysis objective: Accelerated Failure Time (#4763)
  • [x] Optimized ApplySplit and UpdatePredictCache functions on CPU (#5244). This concludes the 'hist' optimization work by @SmirnovEgorRu. Roadmap: #5104
  • [x] Refactored setup.py (#5271, #5280)
  • [x] Gradient based sampling for GPU Hist (#5093)
  • [x] Wide dataset quantile performance improvement (#5306)
  • [x] Thread safe in-place prediction (#5389).
  • [x] JSON serialization of R XGBoost object (#5123)
  • [ ] DMatrix refactoring (WIP) (#5327, #5220, #5302, #5312, #5315, #5321). RFC #4354, Roadmap #5143
  • [x] Pass feature count from JVM to native layer (#5199, #5202, #5303)
  • [x] Deterministic GPU Histogram building (#5023).
  • [x] Use configured header in CMake (#5514)
  • [x] Parameter validation for scikit-learn and R (#5477, #5569)

All 23 comments

Proposal: this time, let us not wait too long until the next release.

@JohnZed @datametrician

Please add the survival analysis objective: Accelerated Failure Time https://github.com/dmlc/xgboost/pull/4763

LGTM especially #5023. Thanks @hcho3!

@avinashbarnwal See the first item in the list.

@hcho3 it would be great if we could look into https://github.com/apache/incubator-tvm/issues/4953

@tqchen I will look.

@CodingCat can you please help us resolve the last JVM issue for the release?

Reverting the status for DMatrix refactoring.

@trivialfis Is DMatrix refactor blocking 1.1.0 release?

@hcho3 I just want https://github.com/dmlc/xgboost/pull/5504 to go in before 1.1. The PR fixes prediction on the device DMatrix, which is part of the DMatrix refactoring. Also, weighted sketching is not yet implemented for the device DMatrix, but that is not blocking.

Also, please let me take a deeper look into https://github.com/dmlc/xgboost/issues/5285. I will continue profiling in the coming days.

Got it. I am now reviewing #5123.

@tqchen I cannot reproduce the issue in apache/incubator-tvm#4953.

> I cannot reproduce the issue in apache/incubator-tvm#4953.

@tqchen Neither can I.

@hcho3 I'm happy to make the next release.

#5209 is marked as blocking.

@hcho3 All blocking bugs are closed. Can we branch out? How can I help with the release process?

@trivialfis If you can provide a summary of your contributions, that would be great. I will create a new branch.

@hcho3 Here is a list of PRs related to my work; I omitted some trivial changes.

Python

  • Rewrote setup.py to align with Python packaging conventions. (#5271, #5280)
  • [Breaking] The scikit-learn extra is now installed as
    xgboost[scikit-learn] instead of xgboost[sklearn]. (#5310)
  • Enable parameter validation for the scikit-learn interface. (#5477)
  • Fix booster checks in the scikit-learn interface. (#5505)
  • Documentation updates for caching and parameter validation. (#5517)
  • Fix the scikit-learn nan tag so estimators advertise that they accept
    inputs containing nan. (#5538)
  • Assert matching lengths of evaluation inputs in the scikit-learn
    interface. (#5540)

R

  • R can now call xgb.config to get a JSON representation of internal
    configuration from booster.
  • Full raw serialization with model and parameters, which is used by bst$raw. (#5123)
  • Fix r interaction constraints when number of features is greater than 1e5. (#5543)
  • Fix dropped booster attributes by restoring attributes in
    xgb.Booster.complete. (#5573)
  • Enable parameter validation. (#5569)

JVM

  • Add JVM_CHECK_CALL macro in JVM C++ wrapper, this avoids some segfaults when
    dmlc::Error is thrown in C++. (#5199)
  • Pass number of features from JVM to C++. Lower level code now doesn't have to guess the
    data shape. (#5202)

C++

  • Refactor prediction cache. (#5302, #5220, #5312)

    Now XGBoost caches all DMatrix, and release the cache once DMatrix is expired.
    This way users no longer have to delete the booster before deleting DMatrix.
    Also the caching logic is simplified.

  • Run GPU prediction on Ellpack page, which is part of the DMatrix refactoring. (#5327, #5504)

  • Various fixes for pruner. (#5335)
  • Deterministic GPU histogram. Regression and classification are now deterministic for GPU
    Hist tree method. (#5361)
  • [maintenance] Split up LearnerImpl. (#5350)
  • Check whether current updater can modify a tree, avoids using wrong updaters. (#5406)
  • Force GPU compressed buffer to be 4 bytes aligned to fix running
    cuda-memcheck. (#5441)
  • [maintenance] Refactor tests with data generator. (#5439)
  • Reduce span check overhead. (#5464)
  • [maintenance] Upgrade clang-tidy on CI, fixing all previously not detected errors in
    header files (#5469)
  • Fix GPU tree statistic. Requires setting leaf stat when expanding tree. (#5501)
  • Remove distcol updater. (#5507)
  • [maintenance] Unify max nodes. (#5497)
  • [Breaking] Remove makefiles. (#5513)
  • Fix loading binary model by adding a header. (#5532)
  • Fix slice and get info for classification and Accelerated failure time training. (#5552)
  • Fix non-openmp build. (#5566)
  • Group aware GPU sketching, now GPU sketching algorithm can handle per-group
    weight. (#5551)
  • Fix configuration status with loading binary model. (#5562)

Dask

  • Integrate DMLC_TASK_ID for rabit initialization for better logging messages. (#5415)
  • Fix prediction result when the order of partitions on different workers are not
    consistent. (#5416)
  • Honor nthreads from dask worker. (#5414)
  • Accept inputs other than DaskDMatrix for prediction, now dask package can return a
    series when input is a dataframe. (#5428)
  • Fix ignored missing value for dask scikit-learn wrapper. (#5435)

General

  • Thread safe, inplace prediction. (#5396, #5389, #5512)

    Now users can use inplace_predict on Python (including dask) and C for thread safe, lock free
    prediction on both CPU and GPU inputs.

  • [Breaking] silent parameter is completely removed. Setting it will no-longer have any
    effect.

  • [Breaking] Set output margin to True for custom objective. (#5564)

    Now both R and Python interface custom objectives get un-transformed prediction
    outputs.

CLI

  • Fix model dump. (#5485)
  • Fix CLI model IO. (#5535)
  • Remove hard coded seed. (#5563)

I'll create new release branch after #5577.

All blockers have been addressed. I will start a new release branch.

1.1.0 is now released.

Was this page helpful?
0 / 5 - 0 ratings