For some binary regressions, the training labels are probabilistic (continuous in [0,1]) instead of binary {0,1}. Unsurprisingly, however, lgb.cv in R throws the error
```
In foldVector[y == dimnames(numInClass)$y[i]] <- sample(seqVector) :
  number of items to replace is not a multiple of replacement length
```
This happens, as in 896, when lgb.cv is run with stratified = TRUE. Indeed, there is no obvious default method for stratifying on a continuous variable.
It would therefore be convenient to let the user pass a custom fold structure, say a list of vectors of row indices, either via the nfold argument or via a separate "folds" argument (where a non-NULL folds causes nfold to be ignored).
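For illustration, here is a minimal sketch of how such a list could be built for a continuous label, assuming a simple quantile-binning scheme: bin the label, shuffle the rows within each bin, and deal them round-robin across the folds so every fold sees a similar spread of label values. `make_continuous_folds` and its `nbins` argument are hypothetical names made up for this example; they are not part of lightgbm.

```r
# Hypothetical helper (not part of lightgbm): stratified K-fold split for a
# continuous label. Rows are binned on quantiles of y, then each bin's rows
# are shuffled and dealt round-robin across the folds.
make_continuous_folds <- function(y, nfold = 5L, nbins = 10L) {
  # unique() drops tied breakpoints (e.g. many labels exactly 0 or 1)
  breaks <- unique(stats::quantile(y, probs = seq(0, 1, length.out = nbins + 1L)))
  if (length(breaks) < 2L) {
    # Constant label: one bin, i.e. a plain random split
    bins <- factor(rep(1L, length(y)))
  } else {
    bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  }
  fold_id <- integer(length(y))
  for (b in levels(bins)) {
    idx <- which(bins == b)
    if (length(idx) == 0L) next
    idx <- idx[sample.int(length(idx))]  # shuffle rows within the bin
    start <- sample.int(nfold, 1L)       # random starting fold for this bin
    fold_id[idx] <- (seq_along(idx) + start - 2L) %% nfold + 1L
  }
  # One sorted vector of validation row indices per fold
  unname(split(seq_along(y), factor(fold_id, levels = seq_len(nfold))))
}

set.seed(42)
y <- runif(1000)  # probabilistic labels in [0, 1]
folds <- make_continuous_folds(y, nfold = 5L)
lengths(folds)    # rows per fold
```

Binning on quantiles rather than on fixed-width intervals keeps the bins roughly equal in size even when the labels pile up near 0 or 1.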
Isn't this feature already implemented? Have a look at the folds parameter of lgb.cv.
Indeed, looks like it's there! Here's a goofy use case that appears to at least run without throwing an error:
```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
params <- list(objective = "regression", metric = "l2")

# Define folds: each element holds the row indices of one validation fold
folds <- list(1:1000, 1001:2000, 2001:nrow(train$data))
model <- lgb.cv(params, dtrain, 100, folds = folds)
```
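Applied to the original problem, a sketch might look like the following. It assumes a recent lightgbm R package, synthetic data made up for illustration, and LightGBM's `cross_entropy` objective (which accepts labels in [0, 1]) in place of the `regression` objective above. The random 4-fold split is likewise only an example; any list of validation row-index vectors, including a binned stratification of the label, could be passed instead.

```r
library(lightgbm)

set.seed(1)
n <- 2000L
X <- matrix(rnorm(n * 5L), ncol = 5L)
# Synthetic probabilistic labels in [0, 1], for illustration only
y <- plogis(X[, 1L] - 0.5 * X[, 2L] + rnorm(n, sd = 0.5))
dtrain <- lgb.Dataset(X, label = y)

# Hand-built 4-fold split: each element is a sorted vector of validation row
# indices for one fold; the remaining rows are used for training that fold
folds <- unname(lapply(split(sample.int(n), rep_len(1:4, n)), sort))

params <- list(
  objective = "cross_entropy",  # accepts labels in [0, 1]
  metric = "cross_entropy",
  verbose = -1L
)

# Because folds is supplied, nfold and stratified are not used to build the split
cv <- lgb.cv(params = params, data = dtrain, nrounds = 50L, folds = folds)
```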