Lightgbm: lgb.cv data.table error - R package

Created on 28 Jan 2020 · 13 Comments · Source: microsoft/LightGBM

Hi,

I have been running LightGBM 2.3.1 in R (version 3.5) for the past few months in a linux environment with no issues. My colleague has installed the package in the past few days, however when running the example code provided on the git page, they experience an error (whereas it works on my install):

Error message

Error in data.table::data.table(indices = test_indices, weight = getinfo(data, : column or argument 2 is NULL

Reproducible Code

library(lightgbm)
data(agaricus.train, package='lightgbm')
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label=train$label)
params <- list(objective="regression", metric="l2")
model <- lgb.cv(params, dtrain, 10, nfold=5, min_data=1, learning_rate=1, early_stopping_rounds=10)

Appreciate any help with this!

bug r-package

All 13 comments

Thanks for the report @abowma ! I will look into it tonight and get back to you, just want you to know we are on it.

Can you ask your colleague to try again from the current version on master? We just merged a change to lgb.cv() (#2573) recently, wondering if that fixed it....or caused it 😬

Hi @jameslamb, thanks for the update and for getting back to me!

We had another colleague try yesterday with the same issue before I reported this, and it looks like the merge done for 2573 was 14 days ago so this would all be post-merge.

Hi @abowma , I just built the latest version on master on my Mac:

export CXX=/usr/local/bin/g++-8
export  CC=/usr/local/bin/gcc-8
Rscript build_r.R

and then ran the code you provided (https://github.com/microsoft/LightGBM/issues/2715#issue-556262437). For me, the code ran successfully and did not throw any error.

So I think I need more information about the environment where you are seeing the issues.

Could you tell me some information about the environment where this issue is showing up?

  • how you are installing the lightgbm R package
  • the output of running git log -n 5
  • R version
  • operating system + version
  • LightGBM version
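Most of the details requested above can be gathered from inside an R session. The following is only a minimal sketch: `pkg_version_or_na` and `env_report` are hypothetical names used for illustration, not part of lightgbm, and the git log still has to be collected separately from the cloned repository.

```r
# Hypothetical helper: return the installed version of a package as a string,
# or NA_character_ if the package is not installed.
pkg_version_or_na <- function(pkg) {
  tryCatch(
    as.character(utils::packageVersion(pkg)),
    error = function(e) NA_character_
  )
}

# Collect the R version, operating system, and relevant package versions.
env_report <- list(
  r_version  = R.version.string,
  os         = paste(Sys.info()[["sysname"]], Sys.info()[["release"]]),
  lightgbm   = pkg_version_or_na("lightgbm"),
  data.table = pkg_version_or_na("data.table")
)

str(env_report)
```

Pasting the output of `str(env_report)` together with `sessionInfo()` into the issue gives maintainers most of what they need to match the environment.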

Hi @jameslamb

Here are the details of the environment:

  • Installed using
lgb.dl(commit = "master",
       compiler = "gcc",
       repo = "https://github.com/microsoft/LightGBM")
  • Due to the install method the cloned repo is removed and so am unable to run git log -n 5
  • Using RStudio Server, with R version 3.5.0 on Red Hat Enterprise Linux Server 7.5
  • The LightGBM version where the issue is occurring is 2.3.2

@jameslamb - was testing on another server we have and it seems like the issue may be due to the version of R. The issue didn't occur with R 3.6.0, but seems to on R 3.5.0. Though again unsure of the reason.


Ok this is good information, thank you. I'll test on R3.5 and see if I can reproduce the issue. We only use R 3.6.x in CI so it is possible there's a 3.5.x-specific issue that hasn't been caught.

We're also working through #2714 , another issue where the user is installing with lgb.dl(), so I'll investigate whether we've done something here that has broken the installation process using that function from that project.

@abowma sorry for the delay in response! I tested your sample code on R 3.5 tonight and couldn't reproduce the issue. (see #2787 for how to do this yourself).

Next, I'm going to try installing lightgbm with lgb.dl().

Another theory I have is that the difference is due to data.table versions...could you share the output of running sessionInfo()?

Thanks!

I just tried this on R 3.5.3, with data.table 1.12.2 and data.table 1.12.8 (the latest).

I also tried removing lightgbm installed from source with remove.packages('lightgbm') and then installing it the way you mentioned:

lgbdl::lgb.dl(commit = "master",
       compiler = "gcc",
       repo = "https://github.com/microsoft/LightGBM")

In all these different configurations, the code from https://github.com/microsoft/LightGBM/issues/2715#issue-556262437 ran successfully and I could not reproduce the issue.

The next thing I'm going to try is literally 3.5.0 instead of 3.5.3, since you mentioned that you had that exact version (I should have just done that from the beginning).

Hey guess what! I was able to reproduce the issue!

Full steps to reproduce:

1. build the R docker container with R 3.5.0

docker build \
    -t lightgbm-r-35 \
    -f dockerfile-r \
    --build-arg R_VERSION=3.5 \
    .

docker run -it lightgbm-r-35 /bin/bash

R

2. remove lightgbm installed in there and replace with the one created by lgb.dl()

remove.packages('lightgbm')

devtools::install_github("Laurae2/lgbdl")
lgbdl::lgb.dl(commit = "master"
    , compiler = "vs"
    , repo = "https://github.com/microsoft/LightGBM"
)

# exit the session so step 3  is in a clean session
q()

3. Run the example code

R
library(lightgbm)
data(agaricus.train, package='lightgbm')
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label=train$label)
params <- list(objective="regression", metric="l2")
model <- lgb.cv(params, dtrain, 10, nfold=5, min_data=1, learning_rate=1,
                early_stopping_rounds=10)

This yields the error you reported:

Error in data.table::data.table(indices = test_indices, weight = getinfo(data, :
column or argument 2 is NULL

This is with data.table 1.11.4 (the version that is installed in rocker/verse:3.5.0). I updated to 1.12.8 (the latest) by running install.packages('data.table', repos = 'http://cran.rstudio.com').

Once I did that, the code above worked!! Could you please ask your colleague to update their data.table version and confirm that that fixed it?

To add more context to this...I just tried with R 3.6.0, building from source with Rscript build_r.R, and data.table 1.11.4 and didn't have any issues. So it's not like lightgbm is incompatible with data.table 1.11.4.

But I get that same error you reported with the combination of R3.6.0, building with lgb.dl(), and data.table 1.11.4.

So while I don't understand the root cause yet, the take-away here is:

If you use lgb.dl() to build from source, you need to upgrade data.table to at least 1.12.x.
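A quick way to check for this before installing is to compare the installed data.table version against the minimum. This is a rough sketch, not something from the lightgbm package: `needs_upgrade` and `min_version` are hypothetical names, and "1.12.0" is an assumed reading of "at least 1.12.x".

```r
# Assumed minimum, based on the take-away above ("at least 1.12.x").
min_version <- "1.12.0"

# Hypothetical helper: TRUE if `installed` is older than `minimum`.
needs_upgrade <- function(installed, minimum) {
  package_version(installed) < package_version(minimum)
}

# Only check the version when data.table is actually installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("data.table"))
  if (needs_upgrade(installed, min_version)) {
    message("data.table ", installed, " is older than ", min_version,
            "; consider running install.packages('data.table')")
  }
}
```

For example, `needs_upgrade("1.11.4", min_version)` flags the version that shipped in the rocker/verse:3.5.0 image, while 1.12.8 passes.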

@jameslamb Thanks for the update! Tried updating the data.table version and installed in the same way and it seemed to solve the issue. On our end it seems we had the newer version of data.table installed on our server with R3.6.0 which is why the error did not occur there and caused us to think it had to do with the R version. I will keep this in mind for any future installations! Appreciate all of your help with this!

@jameslamb Can you please help to update R README with that warning about data.table version incompatibility? And then I think we may close the issue.

@abowma great! Glad it is working for you.

@StrikerRUS yep I'll do that right now.
