LightGBM: Feature request: Allow lgb.cv users to specify actual folds instead of nfold

Created on 14 Mar 2018 · 2 Comments · Source: microsoft/LightGBM

For some binary regressions, the training labels are probabilistic (continuous in [0,1]) instead of binary {0,1}. However, unsurprisingly, lgb.cv in R throws the error

In foldVector[y == dimnames(numInClass)$y[i]] <- sample(seqVector) :
number of items to replace is not a multiple of replacement length

as in #896 if you try to run lgb.cv with stratified = TRUE. Indeed, there isn't an obvious default method for stratifying on a continuous variable.

Therefore it would be convenient to allow the user to pass a custom fold structure, say a list of vectors of row indices, via the nfold argument (or as a "folds" argument, such that !is.null(folds) causes nfold to be ignored).

feature request help wanted

zkurtz

All 2 comments

Is this feature already implemented? Have a look at the folds parameter in lgb.cv.

SixiangHu on 6 Apr 2018

Indeed, looks like it's there! Here's a goofy use case that appears to at least run without throwing an error:

library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
params <- list(objective = "regression", metric = "l2")

# Define folds: each list element holds the row indices held out in that fold
model <- lgb.cv(params, dtrain, nrounds = 100,
                folds = list(1:1000, 1001:2000, 2001:nrow(train$data)))
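
For the original use case (continuous labels in [0,1]), one possible way to get roughly balanced folds is to bin the label into quantiles and assign folds within each bin. The following is only a sketch in base R: make_binned_folds and its nbins and seed arguments are hypothetical names made up for illustration, not part of lightgbm, and quantile binning is just one of several reasonable choices.

```r
# Hypothetical helper (not part of lightgbm): build CV folds that are
# roughly stratified on a continuous label by binning it into quantiles.
make_binned_folds <- function(y, nfold = 3L, nbins = 5L, seed = 1L) {
  set.seed(seed)  # note: this resets the global RNG state
  # Quantile break points; unique() guards against ties collapsing bins.
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = nbins + 1L)))
  if (length(breaks) < 2L) {
    bin <- rep(1L, length(y))  # constant label: a single bin
  } else {
    bin <- as.integer(cut(y, breaks = breaks, include.lowest = TRUE))
  }
  fold_id <- integer(length(y))
  for (b in unique(bin)) {
    idx <- which(bin == b)
    # Shuffle fold labels within the bin so each fold gets a similar share.
    fold_id[idx] <- sample(rep_len(seq_len(nfold), length(idx)))
  }
  # One integer vector of held-out row indices per fold.
  unname(split(seq_along(y), factor(fold_id, levels = seq_len(nfold))))
}

y <- runif(1000)  # simulated probabilistic labels
folds <- make_binned_folds(y, nfold = 3L)
```

The resulting list has the same shape as the hand-written folds in the example above, so it could presumably be passed as folds = folds to lgb.cv in the same way.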

zkurtz on 7 Apr 2018

