Hi, my question is about the linear booster. I have posted it on Stack Overflow too but have not gotten an answer yet; maybe it is OK to post it here as well?
Looking on the web, I am still confused about what the linear booster gblinear precisely is, and I am not alone.
According to the documentation it only has 3 parameters: lambda, lambda_bias and alpha (maybe it should say "additional parameters").
If I understand this correctly, the linear booster does (rather standard) linear boosting with regularization. In this context I can only make sense of the 3 parameters above and eta (the learning rate).
That's also how it is described on GitHub.
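To make that concrete, here is a minimal single-threaded sketch in R of what I understand one gblinear boosting round to be: a coordinate-descent pass over the linear coefficients, with lambda acting as L2 shrinkage, alpha as L1 soft-thresholding and lambda_bias regularizing the intercept. I am assuming squared-error loss for simplicity, and gblinear_round is my own name for illustration, not an xgboost function:
# Sketch only: one gblinear-style round under squared-error loss
# (gradient g = pred - y, hessian h = 1). X: numeric matrix, y: numeric
# vector, w: coefficient vector, b: intercept.
gblinear_round <- function(X, y, w, b, eta, lambda, alpha, lambda_bias) {
  pred <- as.numeric(X %*% w) + b
  g <- pred - y
  h <- rep(1, length(y))
  for (j in seq_len(ncol(X))) {
    grad <- sum(g * X[, j]) + lambda * w[j]   # L2 penalty enters here
    hess <- sum(h * X[, j] ^ 2) + lambda
    # L1 penalty (alpha) via soft-thresholding of the proposed step
    delta <- -sign(grad) * max(abs(grad) - alpha, 0) / hess
    w[j] <- w[j] + eta * delta                # eta shrinks each update
    g <- g + h * X[, j] * (eta * delta)       # keep gradients current
  }
  b <- b - eta * sum(g) / (sum(h) + lambda_bias)  # regularized bias step
  list(w = w, b = b)
}
Nothing in such an update has any use for gamma, max_depth or min_child_weight.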
Nevertheless, I see that the tree parameters gamma, max_depth and min_child_weight also have an impact on the algorithm.
How can this be? Is there a completely clear description of the linear booster anywhere on the web?
See my examples:
library(xgboost)
data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
Then the setup
set.seed(100)
model <- xgboost(data = train$data, label = train$label, nrounds = 5,
objective = "binary:logistic",
params = list(booster = "gblinear", eta = 0.5, lambda = 1, lambda_bias = 1,gamma = 2,
early_stopping_rounds = 3))
gives
> [1] train-error:0.018271
> [2] train-error:0.003071
> [3] train-error:0.001075
> [4] train-error:0.001075
> [5] train-error:0.000614
while gamma = 1
set.seed(100)
model <- xgboost(data = train$data, label = train$label, nrounds = 5,
objective = "binary:logistic",
params = list(booster = "gblinear", eta = 0.5, lambda = 1, lambda_bias = 1,gamma = 1,
early_stopping_rounds = 3))
leads to
> [1] train-error:0.013051
> [2] train-error:0.001842
> [3] train-error:0.001075
> [4] train-error:0.001075
> [5] train-error:0.001075
which is another "path".
Similarly for max_depth:
set.seed(100)
model <- xgboost(data = train$data, label = train$label, nrounds = 5,
objective = "binary:logistic",
params = list(booster = "gblinear", eta = 0.5, lambda = 1, lambda_bias = 1, max_depth = 3,
early_stopping_rounds = 3))
> [1] train-error:0.016122
> [2] train-error:0.002764
> [3] train-error:0.001075
> [4] train-error:0.001075
> [5] train-error:0.000768
and
set.seed(100)
model <- xgboost(data = train$data, label = train$label, nrounds = 10,
objective = "binary:logistic",
params = list(booster = "gblinear", eta = 0.5, lambda = 1, lambda_bias = 1, max_depth = 4,
early_stopping_rounds = 3))
> [1] train-error:0.014740
> [2] train-error:0.004453
> [3] train-error:0.001228
> [4] train-error:0.000921
> [5] train-error:0.000614
See this topic.
Since you fixed the seed, you must also set the number of threads to 1 to get reproducibility; you will then notice that some of your parameters have no effect.
Multithreaded gblinear will never reproduce the same results, even when the seed is identical.
> library(xgboost)
>
> data(agaricus.train, package='xgboost')
> data(agaricus.test, package='xgboost')
> train <- agaricus.train
> test <- agaricus.test
> set.seed(100)
> model <- xgboost(data = train$data, label = train$label, nrounds = 5, nthread = 1,
+ objective = "binary:logistic",
+ params = list(booster = "gblinear", eta = 0.5, lambda = 1, lambda_bias = 1,gamma = 2,
+ early_stopping_rounds = 3))
[1] train-error:0.006142
[2] train-error:0.002917
[3] train-error:0.001842
[4] train-error:0.001228
[5] train-error:0.000768
> set.seed(100)
> model <- xgboost(data = train$data, label = train$label, nrounds = 5, nthread = 1,
+ objective = "binary:logistic",
+ params = list(booster = "gblinear", eta = 0.5, lambda = 1, lambda_bias = 1,gamma = 1,
+ early_stopping_rounds = 3))
[1] train-error:0.006142
[2] train-error:0.002917
[3] train-error:0.001842
[4] train-error:0.001228
[5] train-error:0.000768
>
> set.seed(100)
> model <- xgboost(data = train$data, label = train$label, nrounds = 5, nthread = 1,
+ objective = "binary:logistic",
+ params = list(booster = "gblinear", eta = 0.5, lambda = 1, lambda_bias = 1, max_depth = 3,
+ early_stopping_rounds = 3))
[1] train-error:0.006142
[2] train-error:0.002917
[3] train-error:0.001842
[4] train-error:0.001228
[5] train-error:0.000768
>
> set.seed(100)
> model <- xgboost(data = train$data, label = train$label, nrounds = 5, nthread = 1,
+ objective = "binary:logistic",
+ params = list(booster = "gblinear", eta = 0.5, lambda = 1, lambda_bias = 1, max_depth = 4,
+ early_stopping_rounds = 3))
[1] train-error:0.006142
[2] train-error:0.002917
[3] train-error:0.001842
[4] train-error:0.001228
[5] train-error:0.000768
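Conversely, you can see the thread effect directly by training the same model twice with an identical seed but nthread > 1 and comparing the predictions. A small sketch (fit_once is just a local helper; on a machine with a single available core the two runs may actually coincide):
library(xgboost)
data(agaricus.train, package = 'xgboost')
train <- agaricus.train

fit_once <- function() {
  set.seed(100)
  xgboost(data = train$data, label = train$label, nrounds = 5, nthread = 4,
          objective = "binary:logistic",
          params = list(booster = "gblinear", eta = 0.5, lambda = 1, lambda_bias = 1))
}
m1 <- fit_once()
m2 <- fit_once()

# With nthread > 1 the parallel coordinate updates are applied in a
# nondeterministic order, so the learned coefficients, and hence the
# predictions, will typically differ even though the seed is the same:
identical(predict(m1, train$data), predict(m2, train$data))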
Thank you!