Xgboost: Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

Created on 19 Apr 2019 · 4 Comments · Source: dmlc/xgboost

Hi!

I'm trying to prepare data for training a model, with approximately 134K rows and 210 columns.
So I believe (134,000 * 210 * 8 bytes)/10^6 = ~225 GB of RAM is needed.

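library(Matrix)   # sparse.model.matrix
library(xgboost)  # xgb.DMatrix

data <- trainframe
set.seed(1234)
# Random 70/30 split into training and test rows
ind <- sample(2, nrow(data), replace = TRUE, prob = c(.7, .3))
train <- data[ind == 1, 1:210]
test <- data[ind == 2, 1:210]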

options(na.action='na.pass')
trainm <- sparse.model.matrix(TARGET~.-1,train)
train_label <- train[,"TARGET"] 
train_matrix <- xgb.DMatrix(data = as.matrix(trainm), label = train_label) 

testm <- sparse.model.matrix(TARGET~.-1,test)
test_label <- test[,"TARGET"]
test_matrix <- xgb.DMatrix(data = as.matrix(testm), label = test_label) 

I only have 32 GB of RAM, so obviously I get this error:

Error in asMethod(object) : 
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

I'm using version 0.82.1

  • What are my options around this?
  • Is there another way to do this?
  • Is cloud computing my best option?
  • If so, which type of server is recommended for this?
  • Are my calculations right?

I haven't seen many others post about this in relation to xgboost.

Any tips/hints are greatly appreciated!
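
On the calculation in the question: dividing a byte count by 10^6 gives megabytes, not gigabytes. A quick check using only the figures quoted above looks like this. It assumes every one of the 210 columns is stored as a plain 8-byte double, which will not hold if factor or character columns are expanded into many indicator columns by sparse.model.matrix.

```r
# Dense double-precision storage for the figures quoted in the question.
n_rows <- 134000
n_cols <- 210
bytes_per_double <- 8

dense_bytes <- n_rows * n_cols * bytes_per_double
dense_mb <- dense_bytes / 10^6  # megabytes
dense_gb <- dense_bytes / 10^9  # gigabytes

cat(sprintf("%.1f MB (%.3f GB)\n", dense_mb, dense_gb))
# prints: 225.1 MB (0.225 GB)
```

So 210 numeric columns would fit comfortably in 32 GB. That suggests the matrix produced by sparse.model.matrix has far more than 210 columns by the time as.matrix() is called on it.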

All 4 comments

Can you use AWS EC2? You can get an r5.24xlarge instance with 768 GB of memory. It will cost about 6 USD per hour.

Alternatively, you may consider using external memory: https://xgboost.readthedocs.io/en/latest/tutorials/external_memory.html

> Alternatively, you may consider using external memory: https://xgboost.readthedocs.io/en/latest/tutorials/external_memory.html

Thanks so much for the prompt reply!
Is this method only available in Python?
I'm currently using R.

train_matrix <- xgb.DMatrix('C:/Users/Haz/Desktop/data/agaricus.txt.train#train_matrix.cache')
^ I'm not sure if I'm using it right?

Alternatively, I tried looking into the FeatureHashing package, but had no luck with that either.

I will almost definitely look into working on an AWS EC2 instance. Do you think r5.24xlarge is overkill? Will r5.12xlarge suffice?

External memory should be available from R. As for EC2, I recommend having memory that's 2-3 times the training data.
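
To apply the "2-3 times the training data" rule, it helps to measure what the data would occupy once densified. The sketch below is a rough illustration rather than a definitive diagnosis. The toy data frame and its column names (x1, id) are hypothetical. The 32-bit cell-count check is an assumption about why Cholmod reports 'problem too large'. The sketch also relies on xgb.DMatrix accepting a sparse dgCMatrix directly, which would make the as.matrix() step unnecessary.

```r
library(Matrix)

# Hypothetical toy data: one numeric column plus a factor with many levels.
set.seed(1234)
n <- 1000
toy <- data.frame(
  TARGET = rbinom(n, 1, 0.5),
  x1     = rnorm(n),
  id     = factor(sample(sprintf("id%03d", 1:500), n, replace = TRUE))
)

# One-hot expansion: the factor contributes one column per observed level.
sparse_m <- sparse.model.matrix(TARGET ~ . - 1, toy)

# Size a dense copy would need: one 8-byte double per cell.
n_cells  <- prod(as.numeric(dim(sparse_m)))
dense_gb <- n_cells * 8 / 1e9
# Assumption: dense conversion is expected to fail once the cell count
# no longer fits in a 32-bit integer.
fits_int <- n_cells <= .Machine$integer.max

cat(sprintf("%d x %d, dense copy ~%.4f GB, within 32-bit limit: %s\n",
            nrow(sparse_m), ncol(sparse_m), dense_gb, fits_int))

# Pass the sparse matrix as-is instead of calling as.matrix() on it.
if (requireNamespace("xgboost", quietly = TRUE)) {
  dtrain <- xgboost::xgb.DMatrix(data = sparse_m, label = toy$TARGET)
}
```

Running the same dim() check on trainm from the question would show how many columns the real data expands to, and therefore how much memory a dense copy would need.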
