XGBoost: Training on Apache Hadoop YARN takes more time when workers >= 2 in the configuration

Created on 15 Jun 2016  ·  1 comment  ·  Source: dmlc/xgboost

Training on Apache Hadoop YARN takes more time when workers >= 2 in the configuration

It seems that training gets slower when the number of workers (nodes) in the configuration is increased from 1 to 2 or more. Can anyone tell me why this happens? Is there anything wrong with my configuration?
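One common explanation (a hypothetical cost model, not something stated in this issue): distributed XGBoost synchronizes gradient histograms across workers via Rabit AllReduce every boosting round, so per-round communication cost grows with the number of workers while per-worker compute shrinks. On a small dataset the added communication can outweigh the saved compute, making 2 workers slower than 1. A crude sketch, with made-up numbers:

```python
import math

def round_time(compute_total, comm_per_worker, workers):
    """Hypothetical per-round wall-clock time with `workers` workers.

    Compute is split evenly across workers; AllReduce overhead is modeled
    as growing roughly with log2(workers) (ring/tree reduction). All
    constants here are illustrative, not measured from this issue.
    """
    compute = compute_total / workers
    comm = comm_per_worker * math.log2(workers) if workers > 1 else 0.0
    return compute + comm

# With a small dataset, communication dominates and adding workers hurts:
t1 = round_time(compute_total=10.0, comm_per_worker=8.0, workers=1)  # 10.0
t2 = round_time(compute_total=10.0, comm_per_worker=8.0, workers=2)  # 13.0
```

Under this model the slowdown is expected behavior for small inputs; multiple workers only pay off once the dataset is large enough that the divided compute outweighs the synchronization cost.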

The configuration:

booster = gbtree
objective = multi:softmax
eta = 0.5
max_depth = 5
num_class = 10
num_round = 50
save_period = 0
eval_train = 1

The Shell Script:

../../dmlc-core/tracker/dmlc-submit --cluster=yarn --num-workers=4 --worker-cores=2 \
    ../../xgboost parameter.conf nthread=16 \
    data=hdfs://hadoop01:8020/xgb-demo/train \
    eval[test]=hdfs://hadoop01:8020/xgb-demo/test \
    model_dir=hdfs://hadoop01:8020/xgb-demo/model

Most helpful comment

Hello @wallyell, I'm facing the same problem here when trying to train a dataset in Apache Spark.
Have you found a solution for this?
For me it stops after reaching line 156 of XGBoost:
`val returnVal = tracker.waitFor()`
The tracker seems to take too long to return a value. My config params are:

   "silent" -> 1,
   "objective" -> "reg:linear",
   "booster" -> "gbtree",
   "eta" -> 0.0225,
   "max_depth" -> 26,
   "subsample" -> 0.63,
   "colsample_btree" -> 0.63,
   "min_child_weight" -> 9,
   "gamma" -> 0,
   "eval_metric" -> "rmse",
   "tree_method" -> "auto"

It seems that, when using more than 1 worker, the connection to the RabitTracker doesn't work as required; it freezes.
Tested with booster gblinear, but it still freezes.
To reproduce this issue you only need to set up a Java XGBoost job with 2 or more workers.
This issue can be reproduced using this test.
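When the tracker hangs with more than one worker, a frequent root cause is that some worker nodes cannot reach the tracker's host and port (firewall, DNS, or binding to the wrong interface). A minimal, hypothetical connectivity probe you could run from each worker node (the hostname and port below are placeholders, not values from this issue):

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder address): run from every worker against the
# tracker's advertised host/port and check that all of them return True.
# can_reach("hadoop01", 9091)
```

If any worker returns False, the AllReduce ring can never form and `tracker.waitFor()` will block indefinitely, which matches the freeze described above.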

