LightGBM: Process hangs when trying to use LightGBM with RestRserve

Created on 4 Jun 2019 · 9 Comments · Source: microsoft/LightGBM

I am trying to train a LightGBM model using the R library and then serve it using RestRserve. However, after I make a request to the microservice, the process hangs and I receive no response. I also don't receive any error or warning messages (usually when there is an R error, RestRserve will return the callback to the console).

RestRserve (and Rserve, as far as I know) forks the R process every time a call is made to the API. Running top shows that the forked process has been created and that some memory has been allocated. In the actual workflow where I encountered the issue, CPU usage by the forked process initially increases and then drops to 0. In the example below CPU usage is negligible, so I couldn't track it.

Here is a minimal reproducible example:

library(lightgbm)
library(RestRserve)

data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train
test <- agaricus.test
bst <- lightgbm(
  data = train$data,
  label = train$label,
  num_leaves = 4,
  learning_rate = 1,
  nrounds = 2,
  objective = "binary"
)


dummy_api_function <- function(request, response) {
  result <- predict(bst, test$data)[1]
  response$body <- jsonlite::toJSON(result)
  response$content_type <- "application/json"
  response$headers <- character(0)
  response$status_code <- 200L
  forward()
}

RestRserveApp <- RestRserve::RestRserveApplication$new()
RestRserveApp$add_post(path = "/api/dummy_api", FUN = dummy_api_function)
RestRserveApp$run(8001)

And an example of a curl request to test the api (submitting this request results in the process hanging):

curl --header "Content-Type: application/json" --request POST --data '{"foo":"bar"}' localhost:8001/api/dummy_api

For a working example, replace result <- predict(bst, test$data)[1] with result <- 1 and submit the same curl request.

Here is some info about my environment:

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=bg_BG.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=bg_BG.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=bg_BG.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=bg_BG.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.2-12       RestRserve_0.1.0.13 lightgbm_2.2.4      R6_2.4.0           

loaded via a namespace (and not attached):
[1] compiler_3.4.4    magrittr_1.5      tools_3.4.4       yaml_2.2.0        Rserve_1.7-3      grid_3.4.4        data.table_1.12.0 jsonlite_1.6     
[9] lattice_0.20-35
r-package

All 9 comments

I have stumbled upon the same issue, and it is a very big problem for R in general. If you are developing models for production, lgb is one of the most useful libraries out there, and if you cannot deploy it with Rserve, what else could one be using? Python!?
I believe RestRserve is a great contribution to the R ecosystem and the R community should pay more attention to productionizing models using R.

This is likely a problem in LightGBM in that it's probably not fork-safe, so try loading the package in the client code.

I guess Question 11 in our FAQ is about this issue.
Ping @Laurae2

I confirm this fixes my issue - both nthreads and num_threads work in R.
Thanks, @StrikerRUS!
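
A minimal sketch of the workaround confirmed above, reusing the reproducible example from the issue. It assumes that limiting LightGBM to a single thread at training time (via num_threads in the params list) is enough to keep OpenMP from starting a thread pool in the parent R process before Rserve forks; this has not been verified beyond what the thread reports.

```r
library(lightgbm)

data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")

# num_threads = 1 disables LightGBM's multithreading, so the parent
# process is not running OpenMP worker threads when it gets forked.
params <- list(
  objective = "binary",
  num_leaves = 4,
  learning_rate = 1,
  num_threads = 1
)

bst <- lightgbm(
  data = agaricus.train$data,
  label = agaricus.train$label,
  params = params,
  nrounds = 2
)

# Same prediction call as in dummy_api_function from the issue
result <- predict(bst, agaricus.test$data)[1]
```

The resulting bst can be used inside dummy_api_function unchanged. The trade-off is that training and prediction run single-threaded.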

Yes, seems that was it. I'm closing this issue. Thank you for looking into it!

@demirev @deann88 @s-u For exact details: this is caused by the gcc implementation of OpenMP. Once a process has used OpenMP and is then forked, any OpenMP code hangs indefinitely in the forked process (this is a known issue and no solution to it exists yet).

To avoid this issue when OpenMP code must run both before and after a fork, compile R with a compiler other than gcc. Example: icc (not "free") instead of gcc.

@Laurae2 thanks for the details, that's very useful. We have seen issues with gomp in the past, so it's not really a surprise. I have just confirmed that iomp doesn't seem to have that issue while gomp does, so using clang instead of gcc seems to fix the problem.

@s-u does that mean R should be built with clang? Or is it enough to just build Rserve and/or LightGBM with clang?

Building LightGBM with clang should be sufficient as a first shot, but make sure the package actually gets built with clang. The most important part is to make sure the linking is done against libomp. The general advice would be to build R with clang (which is what I do for CRAN releases), since that way everything is compatible and uses clang. In essence, all OpenMP code should be built with clang so it doesn't suffer from the issue.

One thing I didn't try: you could attempt to swap libgomp for libomp without rebuilding. It used to work since both were ABI-compatible, but I don't know if that is still the case.
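
One way to check which OpenMP runtime an installed package is linked against is to inspect its shared libraries with ldd. The helper below is a hypothetical, Linux-only sketch: openmp_runtimes is not part of any package, and it assumes the ldd command-line tool is available and that the package keeps its .so files under its libs directory.

```r
# Return the ldd output lines that mention an OpenMP runtime for every
# shared library shipped with an installed package.
# libgomp is the GNU runtime; libomp and libiomp5 are the LLVM and Intel ones.
openmp_runtimes <- function(pkg) {
  lib_dir <- system.file("libs", package = pkg)
  if (!nzchar(lib_dir)) {
    return(character(0))  # package not installed or has no compiled code
  }
  so_files <- list.files(lib_dir, pattern = "\\.so$",
                         recursive = TRUE, full.names = TRUE)
  linked <- unlist(lapply(so_files, function(f) {
    suppressWarnings(
      system2("ldd", shQuote(f), stdout = TRUE, stderr = FALSE)
    )
  }))
  hits <- grep("lib(gomp|omp|iomp5)\\.so", linked, value = TRUE)
  unique(trimws(hits))
}

# Example call (requires the package to be installed):
# openmp_runtimes("lightgbm")
```

If the output lists libgomp, the package is still using the GNU runtime, which is the one this thread reports as hanging after a fork.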
