I am opening this issue to see if anyone is interested in contributing.
As XGBoost has minimal dependencies, it is possible to use amalgamation and emscripten to build a JavaScript version of the library. See https://github.com/dmlc/mxnet.js for an example from our deep learning project.
This could be fun for some use cases, such as running XGBoost in the browser and providing some cool demos.
@tqchen I can try to participate in it
Sounds good, @Jabher. Let me know if you need any help on this; mxnet.js could be a good starting point.
Cool, thanks!
@tqchen I personally think that implementing `predict` is more important than implementing `train` in this case. I can only think of toy examples where a JavaScript `train` function would be helpful, such as web ML demos. However, there are a lot of cases where computing XGBoost predictions client-side from a pre-built XGBoost model would be very practical. I've had to do this for clients before using other ML models. A typical workflow might be:
`xgb.dump`.
The advantage of implementing just the `predict` functionality is that it can be done in vanilla JS. I could also develop the `predict` functionality quite quickly, whereas a complete JavaScript API for XGBoost would be a larger-scale undertaking. What do you think? Happy to contribute a PR for this.
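To make the predict-only idea concrete, here is a minimal sketch of walking dumped trees in plain TypeScript with no dependencies. It assumes the node layout of XGBoost's JSON text dump (`nodeid`, `split`, `split_condition`, `yes`, `no`, `missing`, `children`, `leaf`) and a binary logistic objective for the final sigmoid. The example trees, the feature names, and the `baseMargin` parameter are hypothetical and chosen only for illustration.

```typescript
// Node shape assumed to match XGBoost's JSON text dump; verify against your own dump.
interface DumpNode {
  nodeid: number;
  leaf?: number;             // present only on leaf nodes
  split?: string;            // feature name tested at this node
  split_condition?: number;  // threshold: go to `yes` when value < threshold
  yes?: number;
  no?: number;
  missing?: number;          // child to follow when the feature is absent
  children?: DumpNode[];
}

type Features = { [name: string]: number | undefined };

// Look up a child by its nodeid; throws if the dump is malformed.
function findChild(node: DumpNode, id: number): DumpNode {
  const children = node.children || [];
  for (let i = 0; i < children.length; i++) {
    if (children[i].nodeid === id) return children[i];
  }
  throw new Error("malformed tree: node " + node.nodeid + " has no child " + id);
}

// Walk one tree from the root to a leaf and return the leaf value.
function scoreTree(root: DumpNode, x: Features): number {
  let node = root;
  while (node.leaf === undefined) {
    const value = x[node.split as string];
    let next: number | undefined;
    if (value === undefined || isNaN(value)) next = node.missing;
    else if (value < (node.split_condition as number)) next = node.yes;
    else next = node.no;
    node = findChild(node, next as number);
  }
  return node.leaf as number;
}

// Sum the leaf values of all trees; baseMargin is an assumed starting offset.
function predictMargin(trees: DumpNode[], x: Features, baseMargin: number = 0): number {
  let sum = baseMargin;
  for (let i = 0; i < trees.length; i++) sum += scoreTree(trees[i], x);
  return sum;
}

// Assumes a binary logistic objective; other objectives need a different transform.
function predictProbability(trees: DumpNode[], x: Features): number {
  return 1 / (1 + Math.exp(-predictMargin(trees, x)));
}

// Hypothetical two-tree model for illustration.
const exampleTrees: DumpNode[] = [
  { nodeid: 0, split: "age", split_condition: 30, yes: 1, no: 2, missing: 1,
    children: [{ nodeid: 1, leaf: 0.5 }, { nodeid: 2, leaf: -0.25 }] },
  { nodeid: 0, split: "income", split_condition: 50000, yes: 1, no: 2, missing: 2,
    children: [{ nodeid: 1, leaf: 0.125 }, { nodeid: 2, leaf: 0.75 }] },
];

const exampleMargin = predictMargin(exampleTrees, { age: 25, income: 60000 }); // 0.5 + 0.75 = 1.25
```

Keeping the forest as data means the same walker works for any dumped model, at the cost of one lookup per node visited.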
@AndrewHannigan FYI, there is a project to build a compiler for boosting models (it transforms the model into if-else instructions in C++ and then compiles them). It will be released soon. What you are describing may be part of this project.
More info here: https://github.com/dmlc/xgboost/issues/2551
@pommedeterresautee sounds very intriguing, but why take C++ source code as input for the compiler? Wouldn't JSON be easier to parse, more portable, widely supported by other languages, etc.? Seems to me the forest should be represented as data, not code.
Performance! For industrial deployment.
Oh I see, so this is just going to produce a binary executable. Sounds cool! Can you share any other info on the project, besides #2551? There might be pieces of the pipeline before the if-else generation step that would overlap with this project.
@hcho3 may give you more info.
The model parsing and the internal representation may be useful for both projects, and treelite also handles models from LightGBM.
@AndrewHannigan treelite is currently in public beta. I am adding more documentation before an official release. The idea here is to produce a binary executable for optimized prediction. Right now, treelite internally produces a C program to be fed into a C compiler (e.g. gcc), but we can certainly produce a JS program instead.
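As an illustration of what "produce a JS program instead" could look like, the sketch below turns a single decision tree into nested if/else JavaScript source and compiles it with `new Function`. This is not treelite's API or output format; the `TreeNode` shape, `emitNode`, and `compileTree` are hypothetical names invented for this example, and missing values are not handled specially (a `NaN` feature fails the `<` test and falls through to the `no` branch).

```typescript
// Hypothetical in-memory tree: either a leaf or a split on a feature index.
type Leaf = { leaf: number };
type Split = { feature: number; threshold: number; yes: TreeNode; no: TreeNode };
type TreeNode = Leaf | Split;

// Recursively emit JavaScript source for one tree as nested if/else statements.
function emitNode(node: TreeNode, indent: string): string {
  if ("leaf" in node) {
    return indent + "return " + node.leaf + ";\n";
  }
  return (
    indent + "if (x[" + node.feature + "] < " + node.threshold + ") {\n" +
    emitNode(node.yes, indent + "  ") +
    indent + "} else {\n" +
    emitNode(node.no, indent + "  ") +
    indent + "}\n"
  );
}

// Compile the emitted source into a callable function taking a feature array.
function compileTree(root: TreeNode): (x: number[]) => number {
  const body = emitNode(root, "  ");
  return new Function("x", body) as (x: number[]) => number;
}

// Hypothetical tree for illustration.
const exampleTree: TreeNode = {
  feature: 0,
  threshold: 30,
  yes: { leaf: 0.5 },
  no: {
    feature: 1,
    threshold: 2.5,
    yes: { leaf: -0.25 },
    no: { leaf: 0.75 },
  },
};

const exampleSource = emitNode(exampleTree, "");
const examplePredict = compileTree(exampleTree);
```

Calling `examplePredict([25, 0])` follows the first `yes` branch and returns `0.5`. The trade-off against a data-driven walker is that the model is baked into code, so the JavaScript engine can optimize the branches directly, but a new model requires regenerating the source.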
Has there been any progress on this?
Hello, I have a version of the library that works on Node and in the browser (using WebAssembly):
https://github.com/mljs/xgboost
Currently you can train models, save them, and load them. I'm also trying to load models saved from other languages, but I'm getting unexpected behavior. I don't know if somebody is able to help me.
(the current implementation is on the load-external-files branch)
Hello @tqchen,
I have a JS version working (using WebAssembly), so it can run in the browser. I'm also able to load models saved from other programming languages (using the C API).
Tell me if you are interested in including it here.
Nice. If you are interested in porting the JS binding back, we can put it under xgboost/web.
We cannot host the Emscripten-generated JS file directly in the repo (as it can be big), but we can host the build flow and put the emcc-generated file in a separate repo.
cc @hcho3
For most of the C++ changes, we can use `__EMSCRIPTEN__` to detect whether the current project is compiled with emcc.
@tqchen I'd suppose we need to update both Makefile and CMakeLists.txt to accommodate the JS build?
@JeffersonH44 Thanks! This is great. It will be best to incorporate your work into the main repository. (Treelite ended up taking a different focus for the time being)