LightGBM: [python] Parallel training support for Python.

Created on 11 Oct 2017  ·  36 Comments  ·  Source: microsoft/LightGBM

Does the Python package support parallel learning? If not, are there plans to offer support in the future?

Labels: feature request, help wanted

Most helpful comment

Thanks. Transferred all the way through to https://github.com/dask/dask-lightgbm

In the coming weeks I'll make some time to do a full code review and write some docs.


All 36 comments

Not yet.
However, the C API will expose the parallel-related parameters.

@wxchan do you have time to enable this?

FWIW, I'm interested in this as well. I'd like to write a small library enabling dask arrays to be used with LightGBM, similar to https://github.com/dask/dask-xgboost for xgboost.

Let me know if there's anything I could do to help with testing.

@TomAugspurger
Thanks for your interest. It would be awesome if LightGBM could be used with Dask as well.

The network API is ready: https://github.com/Microsoft/LightGBM/blob/master/include/LightGBM/c_api.h#L738-L755
@wxchan could you help wrap the network API in the Python package?
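
For reference, a minimal sketch of what such a wrapper might look like, assuming the `_LIB` ctypes handle and `_safe_call` helper that lightgbm/basic.py already uses; the Python-side function names here are illustrative, not the eventual API:

```python
import ctypes

# _LIB and _safe_call are the existing ctypes handle and error-checking
# helper in lightgbm/basic.py (an assumption about internal layout)
from lightgbm.basic import _LIB, _safe_call

def network_init(machines, local_listen_port, listen_time_out, num_machines):
    """Wrap LGBM_NetworkInit: set up the socket-based training network.

    `machines` is a comma-separated list such as "ip1:port1,ip2:port2".
    """
    _safe_call(_LIB.LGBM_NetworkInit(
        ctypes.c_char_p(machines.encode("utf-8")),
        ctypes.c_int(local_listen_port),
        ctypes.c_int(listen_time_out),
        ctypes.c_int(num_machines)))

def network_free():
    """Wrap LGBM_NetworkFree: tear the network down after training."""
    _safe_call(_LIB.LGBM_NetworkFree())
```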

BTW, the following is the pipeline of parallel training (a rough Python sketch of these steps appears after the list):

  1. Each worker gets its own partition of the data.
  2. Collect the IPs and ports of all workers.
  3. Initialize the network (based on the IPs/ports, the number of machines, ...).
  4. Use the local data to construct the Dataset (all workers should do this).
  5. Construct the Booster (the tree_learner parameter should be set).
  6. Train as normal.
  7. Finalize the network.

It is almost the same as single-machine training.
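
As a rough illustration, here is a minimal per-worker sketch of the steps above, assuming each worker already holds its own partition `X_part`/`y_part` and the collected address list (the function and variable names are hypothetical):

```python
import lightgbm as lgb

def train_on_worker(X_part, y_part, machines, listen_port, num_machines):
    params = {
        "objective": "binary",
        "tree_learner": "data",          # step 5: pick a distributed tree learner
        "machines": machines,            # steps 2-3: "ip1:port1,ip2:port2,..."
        "local_listen_port": listen_port,
        "num_machines": num_machines,
    }
    # step 4: every worker constructs a Dataset from its local partition
    dataset = lgb.Dataset(X_part, label=y_part)
    # steps 3, 6 and 7: network setup, training and teardown happen inside
    booster = lgb.train(params, dataset)
    return booster
```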

+1 for this feature. Dask + LightGBM would be the definitive solution for ML on large tabular data

I see that the PR adding the python bindings has been merged, thanks.

I'll give the dask wrapper a shot next week then, and I'll post here if I hit any issues.

I hope you don't mind if I use this thread as a support channel as I implement this. I'll repay with a PR updating the docs :)

Currently I have https://gist.github.com/a794b3a153548966baf5b62e0806b0b4. It seems like the new network APIs are being called correctly, and the output is as follows (I'm not sure whether those TCP_NODELAY warnings are anything to be concerned about):

scheduler tcp://127.0.0.1:60482
[LightGBM] [Info] Total Bins 720
[LightGBM] [Info] Number of data: 100, number of used features: 20
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Info] Trying to bind port 12400...
[LightGBM] [Info] Binding port 12400 succeeded
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Info] Listening...
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Warning] Set TCP_NODELAY failed.
[LightGBM] [Info] Connected to rank 1
[LightGBM] [Info] Connected to rank 2
[LightGBM] [Info] Connected to rank 3
[LightGBM] [Info] Connected to rank 4
[LightGBM] [Info] Connected to rank 5
[LightGBM] [Info] Connected to rank 6
[LightGBM] [Info] Connected to rank 7
[LightGBM] [Info] Connected to rank 8
[LightGBM] [Info] Local rank: 0, total number of machines: 9

At this point it hangs and I CTRL-C to interrupt:

^C[LightGBM] [Fatal] Socket recv error, code: 4
Traceback (most recent call last):
  File "dask_ml/lightgmb/__init__.py", line 39, in <module>
    main()
  File "dask_ml/lightgmb/__init__.py", line 35, in main
    train(params, dset)
  File "dask_ml/lightgmb/__init__.py", line 15, in train
    bst = lgb.train(params, train)
  File "/Users/taugspurger/Envs/dask-dev/lib/python3.6/site-packages/lightgbm/engine.py", line 199, in train
    booster.update(fobj=fobj)
  File "/Users/taugspurger/Envs/dask-dev/lib/python3.6/site-packages/lightgbm/basic.py", line 1485, in update
    ctypes.byref(is_finished)))
KeyboardInterrupt
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/forkserver.py", line 164, in main
    rfds = [key.fileobj for (key, events) in selector.select()]
  File "/usr/local/Cellar/python3/3.6.1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/selectors.py", line 577, in select
    kev_list = self._kqueue.control(None, max_ev, timeout)
KeyboardInterrupt
[LightGBM] [Info] Finished linking network in 0.000000 seconds

That's probably unsurprising since I haven't actually done anything on the workers to prepare them. I'll do that next with the Client.run method.
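
For anyone following along, here is a hedged sketch (not the gist above) of preparing workers with dask's `Client.run`, which executes a function on every worker and returns a dict keyed by worker address; this is one way to collect the host/port list that LightGBM's network setup needs:

```python
from dask.distributed import Client

client = Client("tcp://127.0.0.1:8786")  # assumed scheduler address

def report_address(dask_worker):
    # dask passes the worker instance to functions that declare a
    # `dask_worker` keyword parameter
    return dask_worker.address

addresses = client.run(report_address)
print(addresses)  # e.g. {'tcp://127.0.0.1:53210': 'tcp://127.0.0.1:53210'}
```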

@TomAugspurger sorry for the late response.
You can ignore these TCP_NODELAY warnings.

Any cool updates?

Unfortunately I haven't had time to work on this. It's still on my todo list though.


It would be super cool to see Dask + LightGBM together. Are there any updates here?

Still haven't had a chance. If you're interested in investigating this @little-eyes, I can help answer questions from the dask side of things :)

+1 on LightGBM in Dask. It would be really great.

@TomAugspurger maybe you can create a PR first and then work on it incrementally?

IIRC, I didn't make much progress when I initially attempted this. A fresh start is probably best.


Hi!

Any updates on this issue? Is it true that if we had dask + lightgbm, we would be able to do out-of-core learning with lightgbm on huge datasets on a single machine through the Python interface? If that's the case, I think I could try to implement this feature. Could someone give me advice on how to tackle this? Are there any obvious problems along the way?

Or, maybe, I've missed something and there is an easier way to go out-of-core on a single machine?

@TomAugspurger @guolinke

Wondering what’s the latest story on this as well.

No updates on my end.


Hi!
Has anyone been able to use distributed LightGBM with Python?
Having LightGBM distributed on Dask would be the ultimate solution for machine learning in an industrial context.
https://github.com/SfinxCZ/dask-lightgbm

cc @SfinxCZ, how are things progressing on https://github.com/SfinxCZ/dask-lightgbm? Anything I can help with on the dask side?

Hi,

@TomAugspurger it seems that there are no changes required on the dask side.

But I have a problem with the LightGBM Python API, which hangs when I try to run LightGBM twice. I've created issue #1789, which contains an example and steps to reproduce.

The code starts two subprocesses which set up the distributed environment, train a model and then shut down. The problem is that when I run this scenario twice (e.g. two unit tests, one for two-class classification and one for multiclass), the second run hangs.

This makes testing the dask-lightgbm package very difficult (and impossible in some CI tools).
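
For illustration, here is a minimal sketch of the scenario being described; the names and ports are hypothetical, and the actual reproduction lives in #1789:

```python
import multiprocessing as mp

import numpy as np
import lightgbm as lgb

NUM_MACHINES = 2
BASE_PORT = 12400  # hypothetical port range

def worker(rank):
    machines = ",".join(
        "127.0.0.1:%d" % (BASE_PORT + i) for i in range(NUM_MACHINES))
    params = {
        "objective": "binary",
        "tree_learner": "data",
        "machines": machines,
        "local_listen_port": BASE_PORT + rank,
        "num_machines": NUM_MACHINES,
    }
    # toy data standing in for this worker's partition
    X = np.random.rand(100, 10)
    y = np.random.randint(0, 2, size=100)
    lgb.train(params, lgb.Dataset(X, label=y))

def run_once():
    procs = [mp.Process(target=worker, args=(r,)) for r in range(NUM_MACHINES)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    run_once()  # first run finishes
    run_once()  # the second run is where the hang was reported
```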

So, I've finished the first pre-alpha version of dask-lightgbm. One thing that still blocks using this package is that it depends on LightGBM>=2.2.2, which is not released yet (it depends on #1741). @guolinke when do you plan to release a new version?

Please note that this is still a pre-alpha version and it is not battle-tested, so any comments, issues and especially pull requests are welcome.

@TomAugspurger What do you think about merging the dask-lightgbm package into dask-xgboost to build one single package that handles gradient boosting?

Sorry, I'll be a little slow responding this week.

W.r.t. merging with dask-xgboost, I'd guess that would cause more headaches than it's worth. I suspect (but I have no way of backing this up) that most people using LightGBM won't want to pull in the XGBoost dependencies, and vice versa. From a distribution point of view, I think it makes sense to keep them as two separate packages.

Development-wise, if there are any large chunks of code you need, we could maybe move them into Dask or Dask-ML.

Finally, we'd be more than happy to move dask-lightgbm into the Dask organization on GitHub, if you want.


@SfinxCZ Thanks for your great work! We can publish a new version soon. I will create a PR for the new release.

@SfinxCZ I just tried dask-lightgbm with the master branch of LightGBM (pre-v2.2.2) ... And it works! 👍

I have skimmed through the source code, and I believe what the library is doing is using the parallel learning support that LightGBM already has, via the socket version, right?

Although I thought you could only use parallel learning through the command line, and not inside a Python process...

If that is the case, code maintainability should not be that hard either, since the distributed learning logic is still managed by LightGBM itself. In fact, even the Booster API could be exposed without needing the sklearn API at all.

Great job!
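
For anyone wanting to try it, here is a hedged usage sketch; the module path `dask_lightgbm.core` and the estimator/parameter names are assumptions, so check the repository's README for the exact API:

```python
import dask.array as da
from dask.distributed import Client
import dask_lightgbm.core as dlgbm  # assumed module layout

client = Client()  # local cluster, just for illustration

# partitioned training data living on the dask workers
X = da.random.random((10000, 20), chunks=(1000, 20))
y = (da.random.random(10000, chunks=1000) > 0.5).astype(int)

model = dlgbm.LGBMClassifier(local_listen_port=12400)  # parameter name assumed
model.fit(X, y)
preds = model.predict(X)
```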

Sorry for the late response, I've been a bit busy.
I've changed the dependency in the testing docker file to a stable version of LightGBM, so it should be ready for testing/deployment.
@TomAugspurger Regarding the move under Dask, it would be great if you could put it under your project. Do you need anything from me to make the move?

To transfer, I think we just need to invite you to the Dask organization, and then you'll do the transfer. Does that sound OK @mrocklin?


Technically I suspect that that would work. Members do have the ability to create new repositories.

Officially, if we're going to be sticklers, I think the policy is that any two current owners can invite a new member into the org, provided that they work at different institutions (employees from a single institution can't create new members alone). Given that we both currently work for Anaconda Inc, you'd have to get someone else to sign off (presumably @ogrisel?).

Alternatively @SfinxCZ could probably give you rights to his repository and then you could move things over.

We'll also have to make sure that @SfinxCZ continues to have full rights over the repository afterwards. This may require explicit action on our part.

I'm generally curious @SfinxCZ, are you planning to actively maintain dask-lightgbm into the future?

@mrocklin Regarding the rights, I am OK with giving you access to the repo so you can copy/move it. My motivation was that if it were under the dask project, people would be more willing to help.

Right now the package seems to be feature-complete-ish. The only thing missing is a release to PyPI. Do you have a process for releasing your packages (e.g. a common account), or do you use your own personal accounts?

Regarding maintenance, the plan is that I will maintain it (up to some limit), but it would be great if someone could help me. Since most of the code is copied more or less directly from the dask-xgboost library, I believe the authors of dask-xgboost would be the ideal candidates.

I recommend giving @TomAugspurger permissions and he can manage the transfer.

We'll also go through the process of adding you as a member. I'll ask for authorization on the dask/dev gitter channel

I've initiated the permission transfer to @TomAugspurger.

Ahh sorry @SfinxCZ I just saw this now, and the transfer expired :/ Could you re-initiate it?

@TomAugspurger I've restarted the transfer.

Thanks. Transferred all the way through to https://github.com/dask/dask-lightgbm

In the coming weeks I'll make some time to do a full code review and write some docs.


@SfinxCZ Thanks a lot for dask-lightgbm!

Given that we both currently work for Anaconda Inc, you'd have to get someone else to sign off (presumably @ogrisel?)

I guess I am OK with inviting @SfinxCZ to the dask organization, but we don't know your real name nor your background from your GitHub profile description. Not sure if this is a requirement or not. The LICENSE file from dask-lightgbm mentions CISCO Systems Inc. Do you work for CISCO? Is your employer OK with open source contributions?

Hey @ogrisel
I've updated my profile. Yes, I work for CISCO, and I've passed all the internal processes to open-source this library, so my employer agrees with the contribution.

I am OK with handing over the code and not being part of the dask organisation (assuming pull requests are OK for you). The only thing is that someone from dask will have to take ownership of the package so it does not end up abandoned.

Thanks for updating your profile, and it's great news that your employer officially allows open source contributions. I am +1 for you to join the dask organization. The more diverse the members are, the better it is for the long-term sustainability of the open source projects in the dask ecosystem.
