A link to the original issue: https://github.com/dask/dask-xgboost/issues/37
TODOS:
Related:
https://github.com/dmlc/xgboost/issues/4204
https://github.com/dmlc/xgboost/issues/3921
https://github.com/dmlc/xgboost/issues/3707
It would be nice to test hist and gpu_hist too, since these two are the most used in production environments.
Thank you @trivialfis. Very interested to find out what's going on here.
Single node GPU hist for regression and classification is now deterministic.
The remaining issues are the dask partitioning functions and GPU ranking. Ranking is tracked in https://github.com/dmlc/xgboost/issues/5561. Dask partitioning still needs more investigation.
The histogram method inside xgboost is now bit-for-bit reproducible. The remaining question is dask data partitioning.
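For anyone who wants to check this locally, here is a minimal sketch of a single-node reproducibility test. It is not the test used in the xgboost repository. The helper names (is_reproducible, make_xgb_train_fn), the synthetic data, and the choice to compare text dumps of the trees are all assumptions made for illustration.

```python
import hashlib
from typing import Callable


def is_reproducible(train_fn: Callable[[], bytes], n_runs: int = 3) -> bool:
    """Return True if every call to train_fn yields byte-identical output."""
    digests = {hashlib.sha256(train_fn()).hexdigest() for _ in range(n_runs)}
    return len(digests) == 1


def make_xgb_train_fn(tree_method: str = "hist") -> Callable[[], bytes]:
    """Build a train function for is_reproducible (needs numpy and xgboost).

    Hypothetical helper: the data shape, seed, and number of rounds are
    arbitrary choices, not values taken from the issue.
    """
    import numpy as np
    import xgboost as xgb

    rng = np.random.RandomState(0)  # fixed synthetic regression data
    X = rng.rand(1000, 10)
    y = rng.rand(1000)

    def train() -> bytes:
        dtrain = xgb.DMatrix(X, label=y)
        params = {"tree_method": tree_method, "seed": 0}
        booster = xgb.train(params, dtrain, num_boost_round=10)
        # Compare the text dump of all trees; any difference in splits or
        # leaf values between runs changes the resulting hash.
        return "\n".join(booster.get_dump()).encode()

    return train
```

Calling is_reproducible(make_xgb_train_fn("hist")) trains the same model several times and reports whether the dumps match exactly. This only covers the single-node case; it says nothing about how dask partitions the data.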