Lightgbm: [performance] Single row predictions speedup.

Created on 22 Mar 2020 · 4 Comments · Source: microsoft/LightGBM

Hello,

Context

I know you're looking for ways to improve performance throughout the codebase.

I'm working on a production scenario that uses LightGBM in a real-time prediction system where throughput and latency are both important.

Because of those and other constraints, and because I receive individual events that must be scored as quickly as possible, I'm using LGBM_BoosterPredictForMatSingleRow, whose documentation states that it partly reuses internal structures to speed up computation.

Change proposal

I looked at the code to see if there were any easy wins we could pull off and saw that for every single prediction we create a new Config object from scratch. This object has roughly 200 members, and the configuration string must also be parsed into those members each time.

I split that LGBM_BoosterPredictForMatSingleRow call into a "configuration/init" call that creates the config and a "scoring" call that uses that config. With some small tweaks I got almost 2x the throughput in a very basic case. This requires adding 2 functions to the C API (without touching existing code at all):

LGBM_BoosterPredictForMatSingleRowFastInit (creates the config before scoring lots of events)
LGBM_BoosterPredictForMatSingleRowFast (scores using the pre-built config, as we're not changing parameters)
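
To illustrate the idea, here is a minimal, self-contained C++ sketch of the init/score split. It is not the LightGBM code: ToyConfig, ToyModel, FastConfig and the three functions below are hypothetical stand-ins I made up for illustration, and the "model" is just a weighted sum. The only point is that the parameter string is parsed once in the init call instead of on every prediction.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a config object built from "key=value key=value".
// The real Config has roughly 200 members; two are enough to show the pattern.
struct ToyConfig {
  int num_threads = 1;
  bool raw_score = false;

  explicit ToyConfig(const std::string& params) {
    std::istringstream in(params);
    std::string token;
    while (in >> token) {
      const std::size_t eq = token.find('=');
      if (eq == std::string::npos) continue;  // skip malformed tokens
      const std::string key = token.substr(0, eq);
      const std::string value = token.substr(eq + 1);
      if (key == "num_threads") {
        num_threads = std::stoi(value);
      } else if (key == "raw_score") {
        raw_score = (value == "true");
      }
    }
  }
};

// Hypothetical model: a weighted sum of the features (not a tree ensemble).
struct ToyModel {
  std::vector<double> weights;

  double Score(const std::vector<double>& row, const ToyConfig& cfg) const {
    assert(row.size() == weights.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < row.size(); ++i) sum += weights[i] * row[i];
    // Clamp to [0, 1] unless the raw score was requested (toy post-processing).
    if (cfg.raw_score) return sum;
    return sum < 0.0 ? 0.0 : (sum > 1.0 ? 1.0 : sum);
  }
};

// Current pattern: the config is rebuilt (and the string re-parsed) per call.
double PredictSingleRow(const ToyModel& model, const std::vector<double>& row,
                        const std::string& params) {
  const ToyConfig cfg(params);
  return model.Score(row, cfg);
}

// Proposed pattern: an init call builds the config once...
struct FastConfig {
  const ToyModel* model;  // not owned; must outlive the FastConfig
  ToyConfig config;
};

FastConfig PredictSingleRowFastInit(const ToyModel& model,
                                    const std::string& params) {
  return FastConfig{&model, ToyConfig(params)};
}

// ...and the scoring call only reuses it.
double PredictSingleRowFast(const FastConfig& fast,
                            const std::vector<double>& row) {
  return fast.model->Score(row, fast.config);
}
```

In this sketch you would call PredictSingleRowFastInit once per model/parameter combination and then PredictSingleRowFast for each incoming event; both paths return the same score as long as the parameters don't change.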

Here is the call graph for the current (non-patched) code, from a binary classifier test with 1 thread, 7 features and 100 trees (an extremely simple model):

Roughly 1/3 of the time (27.40% + 6.40% = 33.80%) is spent recreating the same Config over and over.
Notice that only the left branch is doing "meaningful" work when you maintain the config properties, i.e. when you score lots of events with the same configuration.