LightGBM: [python] Parallel training support for Python.

Created on 11 Oct 2017  ·  36 Comments  ·  Source: microsoft/LightGBM

Does the Python package support parallel learning? If not, are there plans to offer support in the future?

Labels: feature request, help wanted

Most helpful comment

Thanks. Transferred all the way through to https://github.com/dask/dask-lightgbm

In the coming weeks I'll make some time to do a full code review and write some docs.


All 36 comments

Not yet.
However, the C API will expose the parallel-related parameters.

@wxchan do you have time to enable this?

FWIW, I'm interested in this as well. I'd like to write a small library enabling dask arrays to be used with LightGBM, similar to https://github.com/dask/dask-xgboost for xgboost.

Let me know if there's anything I could do to help with testing.

@TomAugspurger
Thanks for your interest. It would be awesome if LightGBM could be used with Dask as well.

The network API is ready: https://github.com/Microsoft/LightGBM/blob/master/include/LightGBM/c_api.h#L738-L755
@wxchan could you help wrap the network API in the Python package?
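
For reference, a minimal sketch of what such a wrapper might look like, assuming the `_LIB` ctypes handle and `_safe_call` helper that lightgbm/basic.py already uses; the Python-side function names here are illustrative, not the eventual API:

```python
import ctypes

# _LIB and _safe_call are the existing ctypes handle and error-checking
# helper in lightgbm/basic.py (an assumption about internal layout)
from lightgbm.basic import _LIB, _safe_call

def network_init(machines, local_listen_port, listen_time_out, num_machines):
    """Wrap LGBM_NetworkInit: set up the socket-based training network.

    `machines` is a comma-separated list such as "ip1:port1,ip2:port2".
    """
    _safe_call(_LIB.LGBM_NetworkInit(
        ctypes.c_char_p(machines.encode("utf-8")),
        ctypes.c_int(local_listen_port),
        ctypes.c_int(listen_time_out),
        ctypes.c_int(num_machines)))

def network_free():
    """Wrap LGBM_NetworkFree: tear the network down after training."""
    _safe_call(_LIB.LGBM_NetworkFree())
```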

BTW, the following is the pipeline of parallel training (a rough Python sketch of these steps appears after the list):

  1. Each worker gets its own partition of the data.
  2. Collect the IPs and ports of all workers.
  3. Initialize the network (based on the IPs/ports, the number of machines, ...).
  4. Use the local data to construct the Dataset (all workers should do this).
  5. Construct the Booster (the tree_learner parameter should be set).
  6. Train as normal.
  7. Finalize the network.

It is almost the same as single-machine training.
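
As a rough illustration, here is a minimal per-worker sketch of the steps above, assuming each worker already holds its own partition `X_part`/`y_part` and the collected address list (the function and variable names are hypothetical):

```python
import lightgbm as lgb

def train_on_worker(X_part, y_part, machines, listen_port, num_machines):
    params = {
        "objective": "binary",
        "tree_learner": "data",          # step 5: pick a distributed tree learner
        "machines": machines,            # steps 2-3: "ip1:port1,ip2:port2,..."
        "local_listen_port": listen_port,
        "num_machines": num_machines,
    }
    # step 4: every worker constructs a Dataset from its local partition
    dataset = lgb.Dataset(X_part, label=y_part)
    # steps 3, 6 and 7: network setup, training and teardown happen inside
    booster = lgb.train(params, dataset)
    return booster
```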

+1 for this feature. Dask + LightGBM would be the definitive solution for ML on large tabular data

I see that the PR adding the python bindings has been merged, thanks.

I'll give the dask wrapper a shot next week then, and I'll post here if I hit any issues.

I hope you don't mind if I use this thread as a support channel as I implement this. I'll repay with a PR updating the docs :)

Currently I have https://gist.github.com/a794b3a153548966baf5b62e0806b0b4. It seems like the new network APIs are being called correctly, and the output is as follows (I'm not sure whether those TCP_NODELAY warnings are anything to be concerned about):

scheduler tcp://127.0.0.1:60482
[LightGBM] [Info] Total Bins 720
[LightGBM] [Info] Number of data: 100, number of used features: 20
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Info] Trying to bind port 12400...
[LightGBM] [Info] Binding port 12400 succeeded
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Info] Listening...
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Info] Connected to rank 1
[LightGBM] [Info] Connected to rank 2
[LightGBM] [Info] Connected to rank 3
[LightGBM] [Info] Connected to rank 4
[LightGBM] [Info] Connected to rank 5
[LightGBM] [Info] Connected to rank 6
[LightGBM] [Info] Connected to rank 7
[LightGBM] [Info] Connected to rank 8
[LightGBM] [Info] Local rank: 0, total number of machines: 9

At this point it hangs and I CTRL-C to interrupt:

^C[LightGBM] [Fatal] Socket recv error, code: 4
Traceback (most recent call last):
  File "dask_ml/lightgmb/__init__.py", line 39, in <module>
    main()
  File "dask_ml/lightgmb/__init__.py", line 35, in main
    train(params, dset)
  File "dask_ml/lightgmb/__init__.py", line 15, in train
    bst = lgb.train(params, train)
  File "/Users/taugspurger/Envs/dask-dev/lib/python3.6/site-packages/lightgbm/engine.py", line 199, in train
    booster.update(fobj=fobj)
  File "/Users/taugspurger/Envs/dask-dev/lib/python3.6/site-packages/lightgbm/basic.py", line 1485, in update
    ctypes.byref(is_finished)))
KeyboardInterrupt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/forkserver.py", line 164, in main
    rfds = [key.fileobj for (key, events) in selector.select()]
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/selectors.py", line 577, in select
    kev_list = self._kqueue.control(None, max_ev, timeout)
KeyboardInterrupt
[LightGBM] [Info] Finished linking network in 0.000000 seconds

That's probably unsurprising since I haven't actually done anything on the workers to prepare them. I'll do that next with the Client.run method.
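
For anyone following along, here is a hedged sketch (not the gist above) of preparing workers with dask's `Client.run`, which executes a function on every worker and returns a dict keyed by worker address; this is one way to collect the host/port list that LightGBM's network setup needs:

```python
from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")  # assumed scheduler address

def report_address(dask_worker):
    # dask passes the worker instance to functions that declare a
    # `dask_worker` keyword parameter
    return dask_worker.address

addresses = client.run(report_address)
print(addresses)  # e.g. {'tcp://127.0.0.1:53210': 'tcp://127.0.0.1:53210'}
```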

@TomAugspurger sorry for the late response.
You can ignore these TCP_NODELAY warnings.

Any cool updates?

Unfortunately I haven't had time to work on this. It's still on my todo list though.


It would be super cool to see Dask + LightGBM together. Are there any updates here?

Still haven't had a chance. If you're interested in investigating this @little-eyes, I can help answer questions from the dask side of things :)

+1 on LightGBM in Dask. It would be really great.

@TomAugspurger maybe you can create a PR first and then work on it incrementally?

IIRC, I didn't make much progress when I initially attempted this. A fresh start is probably best.


Hi!

Any updates on this issue? Is it true that if we had dask + lightgbm, we would be able to do out-of-core learning with lightgbm on huge datasets on a single machine through the Python interface? If that's the case, I think I could try to implement this feature. Could someone give me advice on how to tackle this? Are there any obvious problems along the way?

Or, maybe, I've missed something and there is an easier way to go out-of-core on a single machine?

@TomAugspurger @guolinke

Wondering what’s the latest story on this as well.

No updates on my end.


Hi!
Has anyone been able to use distributed LightGBM with Python?
Having LightGBM distributed on Dask would be the ultimate solution for machine learning in an industrial context.
https://github.com/SfinxCZ/dask-lightgbm

cc @SfinxCZ, how are things progressing on https://github.com/SfinxCZ/dask-lightgbm? Anything I can help with on the dask side?

Hi,

@TomAugspurger it seems that there are no changes required on the dask side.

But I have a problem with the LightGBM Python API, which hangs when I try to run LightGBM twice. I've created issue #1789, which contains an example and steps to reproduce.

The code starts two subprocesses which set up the distributed environment, train a model and then shut down. The problem is that when I run this scenario twice (e.g. two unit tests, one for two-class classification and one for multiclass), the second run hangs.

This makes testing the dask-lightgbm package very difficult (and impossible in some CI tools).
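
For illustration, here is a minimal sketch of the scenario being described; the names and ports are hypothetical, and the actual reproduction lives in #1789:

```python
import multiprocessing as mp

import numpy as np
import lightgbm as lgb

NUM_MACHINES = 2
BASE_PORT = 12400  # hypothetical port range

def worker(rank):
    machines = ",".join(
        "127.0.0.1:%d" % (BASE_PORT + i) for i in range(NUM_MACHINES))
    params = {
        "objective": "binary",
        "tree_learner": "data",
        "machines": machines,
        "local_listen_port": BASE_PORT + rank,
        "num_machines": NUM_MACHINES,
    }
    # toy data standing in for this worker's partition
    X = np.random.rand(100, 10)
    y = np.random.randint(0, 2, size=100)
    lgb.train(params, lgb.Dataset(X, label=y))

def run_once():
    procs = [mp.Process(target=worker, args=(r,)) for r in range(NUM_MACHINES)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    run_once()  # first run finishes
    run_once()  # the second run is where the hang was reported
```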

So, I've finished the first pre-alpha version of dask-lightgbm. One thing that still blocks using this package is that it depends on LightGBM>=2.2.2, which is not released yet (it depends on #1741). @guolinke when do you plan to release a new version?

Please note that this is still a pre-alpha version and it is not battle-tested, so any comments, issues and especially pull requests are welcome.

@TomAugspurger What do you think about merging the dask-lightgbm package into dask-xgboost to build one single package that handles gradient boosting?

Sorry, I'll be a little slow responding this week.

W.r.t. merging with dask-xgboost, I'd guess that would cause more headaches than it's worth. I suspect (but I have no way of backing this up) that most people using LightGBM won't want to pull in the XGBoost dependencies, and vice versa. From a distribution point of view, I think it makes sense to keep them as two separate packages.

Development-wise, if there are any large chunks of code you need, we could maybe move them into Dask or Dask-ML.

Finally, we'd be more than happy to move dask-lightgbm into the Dask organization on GitHub, if you want.


@SfinxCZ Thanks for your great work! We can publish a new version soon. I will create a PR for the new release.

@SfinxCZ I just tried dask-lightgbm with the master branch of LightGBM (pre-v2.2.2) ... And it works! 👍

I have skimmed through the source code, and I believe what the library is doing is using the parallel learning support that LightGBM already has, via the socket version, right?

Although I thought you could only use parallel learning through the command line, and not inside a Python process...

If that is the case, code maintainability should not be that hard either, since the distributed learning logic is still managed by LightGBM itself. In fact, even the Booster API could be exposed without needing the sklearn API at all.

Great job!
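
For anyone wanting to try it, here is a hedged usage sketch; the module path `dask_lightgbm.core` and the estimator/parameter names are assumptions, so check the repository's README for the exact API:

```python
import dask.array as da
from dask.distributed import Client
import dask_lightgbm.core as dlgbm  # assumed module layout

client = Client()  # local cluster, just for illustration

# partitioned training data living on the dask workers
X = da.random.random((10000, 20), chunks=(1000, 20))
y = (da.random.random(10000, chunks=1000) > 0.5).astype(int)

model = dlgbm.LGBMClassifier(local_listen_port=12400)  # parameter name assumed
model.fit(X, y)
preds = model.predict(X)
```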

Sorry for the late response, I've been a bit busy.
I've changed the dependency in the testing docker file to a stable version of LightGBM, so it should be ready for testing/deployment.
@TomAugspurger Regarding the move under Dask, it would be great if you could put it under your project. Do you need anything from me to make the move?

To transfer, I think we just need to invite you to the Dask organization, and then you'll do the transfer. Does that sound OK @mrocklin?


Technically I suspect that that would work. Members do have the ability to create new repositories.

Officially, if we're going to be sticklers, I think the policy is that any two current owners can invite a new member into the org, provided that they work at different institutions (employees from a single institution can't create new members alone). Given that we both currently work for Anaconda Inc, you'd have to get someone else to sign off (presumably @ogrisel?).

Alternatively @SfinxCZ could probably give you rights to his repository and then you could move things over.

We'll also have to make sure that @SfinxCZ continues to have full rights over the repository afterwards. This may require explicit action on our part.

I'm generally curious @SfinxCZ, are you planning to actively maintain dask-lightgbm into the future?

@mrocklin Regarding the rights, I am OK with giving you access to the repo so you can copy/move it. My motivation was that if it were under the dask project, people would be more willing to help.

Right now the package seems to be feature-complete-ish. The only thing missing is a release to PyPI. Do you have a process for releasing your packages (e.g. a common account), or do you use your own personal accounts?

Regarding maintenance, the plan is that I will maintain it (up to some limit), but it would be great if someone could help me. Since most of the code is copied more or less directly from the dask-xgboost library, I believe the authors of dask-xgboost would be the ideal candidates.

I recommend giving @TomAugspurger permissions and he can manage the transfer.

We'll also go through the process of adding you as a member. I'll ask for authorization on the dask/dev gitter channel

I've initiated the permission transfer to @TomAugspurger.

Ahh sorry @SfinxCZ I just saw this now, and the transfer expired :/ Could you re-initiate it?

@TomAugspurger I've restarted the transfer.

Thanks. Transferred all the way through to https://github.com/dask/dask-lightgbm

In the coming weeks I'll make some time to do a full code review and write some docs.


@SfinxCZ Thanks a lot for dask-lightgbm!

Given that we both currently work for Anaconda Inc, you'd have to get someone else to sign off (presumably @ogrisel?)

I guess I am OK with inviting @SfinxCZ to the dask organization, but we don't know your real name nor your background from your GitHub profile description. Not sure if this is a requirement or not. The LICENSE file from dask-lightgbm mentions CISCO Systems Inc. Do you work for CISCO? Is your employer OK with open source contributions?

Hey @ogrisel
I've updated my profile. Yes, I work for CISCO, and I've passed all the internal processes to open-source this library, so my employer agrees with the contribution.

I am OK with handing over the code and not being part of the dask organisation (assuming pull requests are OK for you). The only thing is that someone from dask will have to take ownership of the package so it does not end up abandoned.

Thanks for updating your profile, and it's great news that your employer officially allows open source contributions. I am +1 for you to join the dask organization. The more diverse the members are, the better it is for the long-term sustainability of the open source projects in the dask ecosystem.
