Xgboost: [jvm-packages] eval_set for xgboost4j-spark

Created on 9 Apr 2018 · 8 comments · Source: dmlc/xgboost

There is no way to set a custom evaluation set for ml.dmlc.xgboost4j.scala.spark.XGBoost#trainDistributed. The code internally uses the private ml.dmlc.xgboost4j.scala.spark.Watches class, which simply splits the training data with the predefined trainTestRatio and does not accept a custom eval set through params.
Is there a particular reason for this limitation, or is it just a stub that could be extended, for example with a DMatrix passed through params? Are there complications caused by the fact that this is distributed XGBoost? How should such a dataset be stored in params then: as a DMatrix, an RDD, or something else?
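
For comparison, the single-machine xgboost4j Scala API already accepts named evaluation sets through the watches argument of XGBoost.train. A minimal sketch (the file names and parameter values below are made up for illustration only):

        import ml.dmlc.xgboost4j.scala.{DMatrix, XGBoost}

        // Hypothetical LibSVM files, used only for this example.
        val trainMat = new DMatrix("train.libsvm")
        val devMat = new DMatrix("dev.libsvm")

        val params = Map(
            "eta" -> 0.1,
            "max_depth" -> 5,
            "objective" -> "binary:logistic")

        // The fourth argument maps a name to each evaluation DMatrix;
        // the eval metric for every entry is reported after each boosting round.
        val booster = XGBoost.train(trainMat, params, 100,
            Map("train" -> trainMat, "dev" -> devMat))

The question is essentially how to expose something equivalent through the Spark layer, where the data lives in DataFrames or RDDs rather than in local DMatrix objects.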

feature-request

Most helpful comment

Hi @CodingCat,

I need to define a separate validation set for cross-validation, using xgboost4j on Spark. I tried the approach described here, but it does not look like setting "eval_sets" -> Map("dev" -> dev_df) makes any difference. Should I expect the following setup to work the way cross-validation does (using TrainValidationSplit)?

        val params = scala.collection.mutable.Map(
            "eta" -> 0.1,
            "objective" -> "binary:logistic",
            "eval_sets" -> Map("dev" -> dev_df))
        val booster = new XGBoostClassifier(params.toMap)
        booster.setFeaturesCol("features")
        booster.setLabelCol("label")
        booster.setMaxDepth(5)
        booster.setNumRound(150)
        booster.setNumWorkers(4)
        val xgb_model = booster.fit(train_df)

All 8 comments

I think there was a comment about this when the code was brought in: https://github.com/dmlc/xgboost/pull/2710#discussion_r141479583

Would you like to give this requirement a shot?

All feature requests are now consolidated to #3439. This issue should be re-opened if someone decides to actively work on implementing this feature.

I will work on the eval set this week.

@CodingCat There is work in progress to implement a watchlist in the XGBoost4J Scala wrapper: #3544. Can we take advantage of this to implement a watchlist in XGBoost4J-Spark?

Spark's problem is that you have to find some way to pass in and join (or zip) multiple DataFrames, hand a slice of each of them to each Spark task, create a DMatrix from each slice, and use each of those DMatrix objects in each Spark task as a watch dataset.

That part is kind of complicated and requires refactoring the current Watches code; I think we can do it in the next version.
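
To make the comment above more concrete, here is a rough sketch of the per-task idea: line up partitions of the training and evaluation data, build one DMatrix per dataset inside each task, and pass the extra DMatrix as a watch. This is only a conceptual illustration under the assumption that both RDDs contain xgboost4j LabeledPoints and already have the same number of partitions; it ignores the Rabit tracker setup that actually synchronizes the workers, as well as missing-value handling and failure recovery, which the real implementation has to deal with.

        import ml.dmlc.xgboost4j.LabeledPoint
        import ml.dmlc.xgboost4j.scala.{Booster, DMatrix, XGBoost}
        import org.apache.spark.rdd.RDD

        // Conceptual sketch only: trainRdd and devRdd are assumed to be
        // RDD[LabeledPoint] repartitioned to the same number of partitions
        // (one per worker).
        def trainWithWatches(
            trainRdd: RDD[LabeledPoint],
            devRdd: RDD[LabeledPoint],
            params: Map[String, Any],
            numRounds: Int): Array[Booster] = {
          trainRdd.zipPartitions(devRdd) { (trainIter, devIter) =>
            // Each task materializes its slice of every dataset as a DMatrix...
            val trainMat = new DMatrix(trainIter)
            val devMat = new DMatrix(devIter)
            // ...and hands the extra DMatrix to the booster as a watch dataset.
            val booster = XGBoost.train(trainMat, params, numRounds,
                Map("train" -> trainMat, "dev" -> devMat))
            Iterator(booster)
          }.collect()
        }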

Consolidating to the feature request tracker #3439. Feel free to re-open this issue when someone starts working on it.

The feature is implemented in https://github.com/dmlc/xgboost/pull/3910
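
For anyone landing here later, usage after that PR looks roughly like the following. This is a sketch under the assumption that the evaluation DataFrames are attached with a setEvalSets(Map[String, DataFrame]) setter rather than through the params map (check the API of the release you are on); train_df and dev_df are the DataFrames from the comment quoted at the top of this page.

        import ml.dmlc.xgboost4j.scala.spark.XGBoostClassifier

        val classifier = new XGBoostClassifier(Map(
              "eta" -> 0.1,
              "objective" -> "binary:logistic"))
          .setFeaturesCol("features")
          .setLabelCol("label")
          .setMaxDepth(5)
          .setNumRound(150)
          .setNumWorkers(4)
          // Named evaluation DataFrames go through a dedicated setter,
          // not through the params map.
          .setEvalSets(Map("dev" -> dev_df))

        val xgb_model = classifier.fit(train_df)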
