Xgboost: [Roadmap] 1.1.0 Roadmap

Created on 21 Feb 2020 · 23 comments · Source: dmlc/xgboost

@dmlc/xgboost-committer please add your items here by editing this post. Let's ensure that:

  • Each item is associated with a ticket
  • Major design/refactoring work is associated with an RFC before the code is committed
  • Blocking issues are marked as blocking
  • Breaking changes are marked as breaking

For other contributors who have no permission to edit the post, please comment here about what you think should be in 1.1.0.

  • [x] Add Survival analysis objective: Accelerated Failure Time (#4763)
  • [x] Optimized ApplySplit and UpdatePredictCache functions on CPU (#5244). This concludes the 'hist' optimization work by @SmirnovEgorRu. Roadmap: #5104
  • [x] Refactored setup.py (#5271, #5280)
  • [x] Gradient based sampling for GPU Hist (#5093)
  • [x] Wide dataset quantile performance improvement (#5306)
  • [x] Thread safe in-place prediction (#5389).
  • [x] JSON serialization of R XGBoost object (#5123)
  • [ ] DMatrix refactoring (WIP) (#5327, #5220, #5302, #5312, #5315, #5321). RFC #4354, Roadmap #5143
  • [x] Pass feature count from JVM to native layer (#5199, #5202, #5303)
  • [x] Deterministic GPU Histogram building (#5023).
  • [x] Use configured header in CMake (#5514)
  • [x] Parameter validation for scikit-learn and R (#5477, #5569)

All 23 comments

Proposal: this time, let us not wait too long until the next release.

@JohnZed @datametrician

Please add the survival analysis objective: Accelerated Failure Time https://github.com/dmlc/xgboost/pull/4763

LGTM especially #5023. Thanks @hcho3!

@avinashbarnwal See the first item in the list.

@hcho3 it would be great if we could look into https://github.com/apache/incubator-tvm/issues/4953

@tqchen I will look.

@CodingCat can you please help us resolve the last JVM issue for the release?

Reverting the status for DMatrix refactoring.

@trivialfis Is DMatrix refactor blocking 1.1.0 release?

@hcho3 I just want https://github.com/dmlc/xgboost/pull/5504 to go in before 1.1. The PR fixes prediction on the device DMatrix, which is part of the DMatrix refactoring. Also, weighted sketching is not yet implemented for the device DMatrix, but that is not blocking.

Also, please let me take a deeper look into https://github.com/dmlc/xgboost/issues/5285. I will continue profiling in the coming days.

Got it. I am now reviewing #5123.

@tqchen I cannot reproduce the issue in apache/incubator-tvm#4953.

> I cannot reproduce the issue in apache/incubator-tvm#4953.

@tqchen Neither can I.

@hcho3 I'm happy to make the next release.

#5209 is marked as blocking.

@hcho3 All blocking bugs are closed. Can we branch out? How can I help with the release process?

@trivialfis If you can provide a summary of your contributions, that would be great. I will create a new branch.

@hcho3 Here is a list of PRs related to my work; I omitted some trivial changes.

Python

  • Rewrote setup.py to align with Python packaging conventions. (#5271, #5280)
  • [Breaking] The scikit-learn extra is now installed as
    xgboost[scikit-learn] instead of xgboost[sklearn]. (#5310)
  • Enable parameter validation for the scikit-learn interface. (#5477)
  • Fix booster checks in the scikit-learn interface. (#5505)
  • Documentation updates for caching and parameter validation. (#5517)
  • Fix the scikit-learn nan tag so estimators advertise that they accept
    inputs containing nan. (#5538)
  • Assert matching lengths of evaluation inputs in the scikit-learn
    interface. (#5540)

R

  • R can now call xgb.config to get a JSON representation of internal
    configuration from booster.
  • Full raw serialization with model and parameters, which is used by bst$raw. (#5123)
  • Fix r interaction constraints when number of features is greater than 1e5. (#5543)
  • Fix dropped booster attributes by restoring attributes in
    xgb.Booster.complete. (#5573)
  • Enable parameter validation. (#5569)

JVM

  • Add JVM_CHECK_CALL macro in JVM C++ wrapper, this avoids some segfaults when
    dmlc::Error is thrown in C++. (#5199)
  • Pass number of features from JVM to C++. Lower level code now doesn't have to guess the
    data shape. (#5202)

C++

  • Refactor prediction cache. (#5302, #5220, #5312)

    Now XGBoost caches all DMatrix, and release the cache once DMatrix is expired.
    This way users no longer have to delete the booster before deleting DMatrix.
    Also the caching logic is simplified.

  • Run GPU prediction on Ellpack page, which is part of the DMatrix refactoring. (#5327, #5504)

  • Various fixes for pruner. (#5335)
  • Deterministic GPU histogram. Regression and classification are now deterministic for GPU
    Hist tree method. (#5361)
  • [maintenance] Split up LearnerImpl. (#5350)
  • Check whether current updater can modify a tree, avoids using wrong updaters. (#5406)
  • Force GPU compressed buffer to be 4 bytes aligned to fix running
    cuda-memcheck. (#5441)
  • [maintenance] Refactor tests with data generator. (#5439)
  • Reduce span check overhead. (#5464)
  • [maintenance] Upgrade clang-tidy on CI, fixing all previously not detected errors in
    header files (#5469)
  • Fix GPU tree statistic. Requires setting leaf stat when expanding tree. (#5501)
  • Remove distcol updater. (#5507)
  • [maintenance] Unify max nodes. (#5497)
  • [Breaking] Remove makefiles. (#5513)
  • Fix loading binary model by adding a header. (#5532)
  • Fix slice and get info for classification and Accelerated failure time training. (#5552)
  • Fix non-openmp build. (#5566)
  • Group aware GPU sketching, now GPU sketching algorithm can handle per-group
    weight. (#5551)
  • Fix configuration status with loading binary model. (#5562)

Dask

  • Integrate DMLC_TASK_ID for rabit initialization for better logging messages. (#5415)
  • Fix prediction result when the order of partitions on different workers are not
    consistent. (#5416)
  • Honor nthreads from dask worker. (#5414)
  • Accept inputs other than DaskDMatrix for prediction, now dask package can return a
    series when input is a dataframe. (#5428)
  • Fix ignored missing value for dask scikit-learn wrapper. (#5435)

General

  • Thread safe, inplace prediction. (#5396, #5389, #5512)

    Now users can use inplace_predict on Python (including dask) and C for thread safe, lock free
    prediction on both CPU and GPU inputs.

  • [Breaking] silent parameter is completely removed. Setting it will no-longer have any
    effect.

  • [Breaking] Set output margin to True for custom objective. (#5564)

    Now both R and Python interface custom objectives get un-transformed prediction
    outputs.

CLI

  • Fix model dump. (#5485)
  • Fix CLI model IO. (#5535)
  • Remove hard coded seed. (#5563)

I'll create new release branch after #5577.

All blockers have been addressed. I will start a new release branch.

1.1.0 is now released.

Was this page helpful?
0 / 5 - 0 ratings