Unidentified changes introduced a performance regression in DMatrix creation.
Data set | DMatrix creation time (master), s | DMatrix creation time (#5008), s | Slowdown (master / #5008)
-- | -- | -- | --
airline-ohe | 15.12 | 10.6 | 1.43
higgs1m | 0.37 | 0.17 | 2.18
msrank30k | 5.23 | 2.48 | 2.11
mortgage1Q | 5.37 | 2.47 | 2.17
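The Slowdown column is the ratio of the two creation times, master divided by #5008 (for example 15.12 / 10.6 ≈ 1.43 for airline-ohe). A small sketch that recomputes the column from the values in the table; `timings` and `slowdown` are illustrative names only:

```python
# DMatrix creation times in seconds, copied from the table above.
# data set -> (master, #5008)
timings = {
    "airline-ohe": (15.12, 10.6),
    "higgs1m": (0.37, 0.17),
    "msrank30k": (5.23, 2.48),
    "mortgage1Q": (5.37, 2.47),
}


def slowdown(current: float, baseline: float) -> float:
    """Ratio of two timings; a value > 1 means `current` is slower than `baseline`."""
    return current / baseline


for name, (master, pr5008) in timings.items():
    print(f"{name}: {slowdown(master, pr5008):.2f}")
```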
The first two data sets are available in the public benchmarks: https://github.com/dmlc/xgboost-bench/tree/master/hist_method (an example of DMatrix creation can also be found there).
Sorry, I have not yet pinpointed the exact commit that introduced this regression.
Affected interval: from #5008 to the current master.
Example of DMatrix creation (the same as in the benchmarks):
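Narrowing an interval like this down to one commit is a binary search over the commit history; `git bisect run` automates it on a real repository. The sketch below only shows the search logic. It assumes the commits are in chronological order, the first is known to be fast, the last is known to be slow, and there is a single regression point; `is_slow` is a hypothetical callback (for instance: build XGBoost at that commit, time DMatrix creation, compare against a threshold). If the interval holds more than one regression, the search finds one of them and can be repeated on the range before it.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_slow: Callable[[str], bool]) -> str:
    """Return the first commit for which is_slow() is True.

    Assumes commits[0] is good (fast), commits[-1] is bad (slow) and that
    every commit after the first slow one is also slow.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]
```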
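```python
import numpy as np
import pandas as pd
import xgboost as xgb

df = pd.read_csv("airline-ohe-X-train.csv", header=None)
X_train = np.ascontiguousarray(df.values, dtype=np.float32)
...  # loading of y_train omitted
dtrain = xgb.DMatrix(X_train, label=y_train)
```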
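To compare creation times between builds it helps to time only the construction call and to take the best of a few runs, which reduces noise from caching and the OS. A minimal sketch; `best_time` is a hypothetical helper, and the benchmark script in xgboost-bench may measure this differently:

```python
import time
from typing import Any, Callable, Tuple


def best_time(build: Callable[[], Any], repeats: int = 3) -> Tuple[float, Any]:
    """Call build() `repeats` times and return the fastest wall-clock time
    in seconds together with the object built by the last call."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = build()
        best = min(best, time.perf_counter() - start)
    return best, result
```

With the names from the snippet above, a call would look like `t, dtrain = best_time(lambda: xgb.DMatrix(X_train, label=y_train))`. Each repeat builds a new DMatrix, so memory use during timing is that of one extra copy at a time.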
Is DMatrix construction running multi-threaded?
Yes, I run it with:
```shell
OMP_NUM_THREADS=48 OMP_PLACES={0}:48:1 python xgboost_hist_method_bench.py --hw cpu --dataset airline-ohe
```
However, I see no speed-up in DMatrix construction between OMP_NUM_THREADS=1 and OMP_NUM_THREADS=48.
I should also note that with the xgboost package from pip (1.0.2), DMatrix construction takes the same time as on master (second column of the table: https://github.com/dmlc/xgboost/issues/5530#issue-599662722).
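OpenMP reads OMP_NUM_THREADS when its runtime initialises, so a thread-scaling check is easiest to do in separate processes, one per thread count. A sketch, assuming the benchmark script prints the DMatrix creation time to stdout (its actual output format is not shown here); `run_with_threads` is a hypothetical helper, and other variables such as OMP_PLACES are simply inherited from the parent environment:

```python
import os
import subprocess
import sys
from typing import Dict, Sequence


def run_with_threads(python_args: Sequence[str], thread_counts: Sequence[int]) -> Dict[int, str]:
    """Run `python <python_args>` once per thread count, each time in a fresh
    process with OMP_NUM_THREADS set, and return the captured stdout per count."""
    outputs: Dict[int, str] = {}
    for n in thread_counts:
        # Copy the current environment and override only OMP_NUM_THREADS.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        proc = subprocess.run(
            [sys.executable, *python_args],
            env=env,
            capture_output=True,
            text=True,
            check=True,  # raise if the child process fails
        )
        outputs[n] = proc.stdout.strip()
    return outputs
```

For the command above this would be something like `run_with_threads(["xgboost_hist_method_bench.py", "--hw", "cpu", "--dataset", "airline-ohe"], [1, 48])`.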
The commits that affected DMatrix creation performance have been found:
Data set | DMatrix creation time (https://github.com/dmlc/xgboost/pull/5044), s | DMatrix creation time (https://github.com/dmlc/xgboost/pull/5050, the commit preceding #5044), s | Slowdown
-- | -- | -- | --
mortgage1Q | 3.72 | 2.5 | 1.49
Data set | DMatrix creation time (https://github.com/dmlc/xgboost/pull/5092), s | DMatrix creation time (https://github.com/dmlc/xgboost/pull/5101, the commit preceding #5092), s | Slowdown
-- | -- | -- | --
mortgage1Q | 5.52 | 3.72 | 1.48
Thanks for finding the root cause. Do you think it's possible to regain the performance without providing a specialized interface for each type of input?
Yep, it would be good if we could optimise the current implementation, as it is general and supports missing values, whereas the previous implementation did not.
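To illustrate what supporting missing values involves: a general dense-input path has to inspect every element and drop those equal to the `missing` value (NaN by default in the Python package) while building a CSR-style layout of row offsets, column indices and values, similar in spirit to what SparsePage holds. The NumPy sketch below is a simplified illustration, not XGBoost's C++ code; `dense_to_csr` is a hypothetical name:

```python
import numpy as np


def dense_to_csr(X: np.ndarray, missing: float = np.nan):
    """Convert a dense 2-D array to (row offsets, column indices, values),
    skipping entries equal to `missing` (NaN is matched with isnan)."""
    X = np.asarray(X)
    # NaN != NaN, so a NaN sentinel needs isnan instead of an equality test.
    valid = ~np.isnan(X) if np.isnan(missing) else X != missing
    counts = valid.sum(axis=1)                      # kept entries per row
    offsets = np.concatenate(([0], np.cumsum(counts)))
    rows, cols = np.nonzero(valid)                  # row-major order
    return offsets, cols, X[rows, cols]
```

Every element is touched once regardless of how sparse the result is, which is part of the cost a specialised path without missing-value handling could skip.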
I think it's possible and absolutely worth trying. At the very least we could try to improve `SparsePage::Push`, as it's not clear to me why changing OMP_NUM_THREADS gives no gain.
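A common way to parallelise this kind of push is a two-pass scheme: count the valid entries of each row in parallel, turn the counts into row offsets with a prefix sum, then let each worker fill its own non-overlapping slice of the output. The sketch below shows that structure in Python with a thread pool; it illustrates the technique under our own assumptions and is not a description of the current `SparsePage::Push` (in C++ the two loops would typically be OpenMP parallel loops). `parallel_dense_to_csr`, `n_threads` and `block_rows` are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parallel_dense_to_csr(X, missing=np.nan, n_threads=4, block_rows=1024):
    """Two-pass CSR construction over row blocks; NaN `missing` matched with isnan."""
    X = np.asarray(X)
    n_rows = X.shape[0]
    blocks = [(s, min(s + block_rows, n_rows)) for s in range(0, n_rows, block_rows)]

    def valid_mask(block):
        return ~np.isnan(block) if np.isnan(missing) else block != missing

    # Pass 1: each block counts the valid entries of its own rows.
    counts = np.zeros(n_rows, dtype=np.int64)

    def count(b):
        s, e = b
        counts[s:e] = valid_mask(X[s:e]).sum(axis=1)

    with ThreadPoolExecutor(n_threads) as pool:
        list(pool.map(count, blocks))  # list() re-raises worker exceptions

    # Prefix sum: row offsets, and the total output size is now known.
    offsets = np.zeros(n_rows + 1, dtype=np.int64)
    np.cumsum(counts, out=offsets[1:])
    indices = np.empty(offsets[-1], dtype=np.int64)
    values = np.empty(offsets[-1], dtype=X.dtype)

    # Pass 2: each block writes into a disjoint slice, so no locking is needed.
    def fill(b):
        s, e = b
        r, c = np.nonzero(valid_mask(X[s:e]))
        lo, hi = offsets[s], offsets[e]
        indices[lo:hi] = c
        values[lo:hi] = X[s:e][r, c]

    with ThreadPoolExecutor(n_threads) as pool:
        list(pool.map(fill, blocks))
    return offsets, indices, values
```

The design point is that the prefix sum decouples the workers: once offsets are known, no worker needs to wait on another, which is what should let construction time drop as the thread count grows.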
PRs #5044 and #5092 together introduced a roughly 2x performance degradation.
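Assuming the 3.72 s measurements in the two commit tables correspond to the same state of master (after #5044, before #5092), the two mortgage1Q slowdowns compound multiplicatively, which is where the roughly 2x figure comes from:

$$
\frac{5.52}{2.5} = \frac{3.72}{2.5} \times \frac{5.52}{3.72} \approx 1.49 \times 1.48 \approx 2.21
$$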