Hi, @RAMitchell and @trivialfis
I just came across the blog post by @Laurae2 (hope I didn't misattribute it):
https://medium.com/data-design/xgboost-gpu-performance-on-low-end-gpu-vs-high-end-cpu-a7bc5fcd425b
The blog post confirms that we have a super-fast GPU algorithm, but it also raises two issues:

- the GPU implementation is subject to crashes with high-dimensional feature vectors;
- there are reproducibility issues with the results of the GPU hist algorithm.
Would you please share some insights about (1) whether these are known issues (or whether they really exist), and (2) the reasons for these issues (if they do exist)?
Thanks
Nan
@CodingCat
For the first one, my guess is a memory limitation. For the second one, there is an open issue: #3921
That's a very detailed benchmark. Huge thanks, @Laurae2. I plan to address these problems one by one in the future.
How does the GPU updater obtain random numbers? Does it use the same mechanism as the CPU updater?
@trivialfis Thanks for the response.

For the first one, I see @Laurae2 pointed out that adding more features has a 5-15% higher weight than adding samples to the dataset. Is that also related to the parallelization mechanism in the GPU implementation?
> Does it use the same mechanism as the CPU updater?

For feature sampling, it uses `ColumnSampler` from `/common/random.h`, so it should be the same.
> adding more features has a 5-15% higher weight than adding samples to your dataset

The GPU implementation doesn't use the CSR format; ELLPACK is chosen instead. So it's not surprising.
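To illustrate why the storage format makes adding features more expensive (this is a generic sketch of CSR vs. ELLPACK, not xgboost's actual internals, which operate on quantized histogram bins): ELLPACK pads every row to the width of the longest row, so memory scales with the number of features per row even when most rows are sparse, while CSR stores only the entries that are actually present.

```python
# Toy sparse matrix: 3 rows with varying numbers of (column, value) entries.
rows = [
    [(0, 1.0), (2, 3.0)],           # row 0: 2 entries
    [(1, 2.0)],                     # row 1: 1 entry
    [(0, 5.0), (1, 6.0), (3, 7.0)], # row 2: 3 entries
]

# CSR stores exactly nnz values + nnz column indices + (n_rows + 1) row offsets.
nnz = sum(len(r) for r in rows)
csr_cells = 2 * nnz + len(rows) + 1

# ELLPACK pads every row to the widest row: n_rows * max_row_len values plus
# the same number of column indices, regardless of how sparse other rows are.
max_len = max(len(r) for r in rows)
ellpack_cells = 2 * len(rows) * max_len

print(csr_cells, ellpack_cells)  # 16 18
```

The trade-off is that ELLPACK's fixed row width gives coalesced, branch-free memory access on the GPU, at the cost of padding overhead that grows as rows get wider, i.e., as features are added.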
@Laurae2 Thanks for the useful feedback!
In summary, here are the things we need to improve:
@RAMitchell Do you also know why xgboost on GPU crashes when using too large a depth, even when GPU RAM is still available?
Not sure. If you have a reproducible example, that would be greatly appreciated.
@RAMitchell It seems not reproducible on newer commits of xgboost.

The following used to crash on a GPU with 4 GB of RAM, but no longer does:
```r
library(xgboost)
set.seed(1)
N <- 10000000
p <- 100
pp <- 25
X <- matrix(runif(N * p), ncol = p)
betas <- 2 * runif(pp) - 1
sel <- sort(sample(p, pp))
m <- X[, sel] %*% betas - 1 + rnorm(N)
y <- rbinom(N, 1, plogis(m))
format(object.size(X), units = "Mb")
dtrain <- xgboost::xgb.DMatrix(X, label = y)
gc(verbose = FALSE)
set.seed(11111)
model <- xgb.train(list(objective = "binary:logistic", nthread = 1, eta = 0.10,
                        max_depth = 13, max_bin = 255, tree_method = "hist"),
                   dtrain, nrounds = 3, verbose = 1, watchlist = list(train = dtrain))
rm(dtrain, model)
gc(verbose = FALSE)
```
However, the following still hangs on 2 GPUs when using `nthread = 1`:
```r
library(xgboost)
set.seed(1)
N <- 1000
p <- 50
pp <- 25
X <- matrix(runif(N * p), ncol = p)
betas <- 2 * runif(pp) - 1
sel <- sort(sample(p, pp))
m <- X[, sel] %*% betas - 1 + rnorm(N)
y <- rbinom(N, 1, plogis(m))
format(object.size(X), units = "Mb")
dtrain <- xgboost::xgb.DMatrix(X, label = y)
gc(verbose = FALSE)
set.seed(11111)
model <- xgb.train(list(objective = "binary:logistic", nthread = 1, eta = 0.10,
                        max_depth = 5, max_bin = 255, tree_method = "gpu_hist", n_gpus = 2),
                   dtrain, nrounds = 3, verbose = 1, watchlist = list(train = dtrain))
rm(dtrain, model)
gc(verbose = FALSE)
```
Closing this for now, as the main purpose of filing the issue (raising awareness of the blog post and gathering insights into the issues mentioned there) has been achieved, and there is ongoing work to fix them.