XGBoost: Model reproducibility with the histogram tree method.

Created on 7 Nov 2019 · 5 comments · Source: dmlc/xgboost

A link to the original issue: https://github.com/dask/dask-xgboost/issues/37

TODOS:

  • [x] CPU/GPU deterministic scattered add for building histogram.
  • [x] Verify allreduce is deterministic.
  • [x] Verify cub BlockSum is deterministic.
  • [ ] Distributed environment.
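The first TODO item exists because floating-point addition is not associative, so a scattered add whose accumulation order varies between runs (e.g. GPU atomic adds racing across threads) can produce slightly different histogram bins each run. A minimal stand-alone illustration of the underlying problem (not XGBoost code):

```python
# Floating-point addition is not associative: the same three values summed
# in two different orders round to two different results. A parallel
# scattered add with nondeterministic ordering inherits this problem.
left = (0.1 + 0.2) + 0.3   # accumulate left-to-right
right = 0.1 + (0.2 + 0.3)  # accumulate right-to-left

print(left == right)  # False: the two orders disagree in the last bit
```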

Related:
https://github.com/dmlc/xgboost/issues/4204
https://github.com/dmlc/xgboost/issues/3921
https://github.com/dmlc/xgboost/issues/3707

bug

All 5 comments

It would be nice to test hist and gpu_hist too, as these two are the most widely used in production environments.

Thank you, @trivialfis. Very interested to find out what's going on here.

Single node GPU hist for regression and classification is now deterministic.

The remaining issues are the Dask partitioning functions and GPU ranking. Ranking is tracked in https://github.com/dmlc/xgboost/issues/5561 . Dask partitioning still needs more investigation.

The histogram method inside XGBoost is now bit-for-bit reproducible. The remaining question is Dask data partitioning.
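Bit-for-bit reproducibility requires a reduction whose result does not depend on summation order. The sketch below illustrates that goal using Python's exactly rounded `math.fsum`; this is only an illustration of the property being verified, not XGBoost's actual implementation (which achieves determinism differently for GPU scattered adds):

```python
# Demonstrate an order-independent floating-point reduction: math.fsum
# computes the correctly rounded sum, so permuting the inputs cannot
# change the result - the reproducibility property the issue asks for.
import math
import random

random.seed(42)
vals = [random.uniform(-1.0, 1.0) for _ in range(10000)]

a = math.fsum(vals)   # exact rounded sum in the original order
random.shuffle(vals)  # same values, different accumulation order
b = math.fsum(vals)

print(a == b)  # True: results are bit-for-bit identical
```

With plain `sum()` in place of `math.fsum`, the two passes would typically differ in the last bits, which is exactly the kind of run-to-run drift the histogram fix eliminates.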
