There are a few different communities building tools to deploy Dask on various kinds of resource managers:
These libraries all depend on and extend the central distributed.deploy.Cluster object within this codebase. All of these communities have identified failings within the current implementations and are considering changing their design. This includes features like the following:
I thought it might be useful for this group to meet infrequently and maybe coordinate effort. Is this something that would interest the maintainers listed above? I'm thinking about something like an hour-long video meeting in which we identify a few desired features and share constraints on those features.
Thanks @mrocklin for bringing up this idea. I'm interested, even if I don't really know how to handle such a meeting. Here are some (limited) ideas I have right now that might fit in such a conversation (even if they are probably mostly linked to dask-jobqueue):
cores argument in scale method of adaptive, as described just a moment ago in https://github.com/dask/dask-jobqueue/issues/130.
I'd be up for a meeting. For dask-yarn the most pressing issues for me are:
Allow the scheduler to run on a different machine from the cluster object. This would mainly involve changes to the adaptive class (see https://github.com/dask/dask-yarn/issues/1)
Modularization and cleanup of the cli scripts. In dask-yarn we have copied and modified versions of both dask-scheduler and dask-worker. It'd be useful not to have to duplicate this logic.
Having the scheduler on a separate machine could make sense for dask-jobqueue too. It would be a good thing to decouple the scheduler process from the Jupyter/Client process. I see two goals for dask-jobqueue:
OK, I propose that we meet this Wednesday 2018-08-22 at 3pm UTC which is
It will be really complicated for me to be around and fully available after 4pm in France (2pm UTC) until September. Sorry for that.
I don't have strong feelings about this. 7a is a bit too early for me but I'd be okay moving the meeting so that @guillaumeeb can attend. I think he can represent the jobqueue dev team just fine.
OK, @guillaumeeb can you confirm that you can attend a meeting that starts at 2pm UTC? Or would you prefer to start earlier than that so that you can be done by 2pm UTC?
Also, for meeting location I recommend https://appear.in/dask-dev
I would prefer to start earlier, and tomorrow afternoon is not possible for me either (sorry for being such a burden, kids afternoon!).
OK, I recommend that those of us who can meet tomorrow at 3pm UTC do so. This is the original time. We can follow up with more discussion in the future if necessary.
Yep sure, do so, @jhamman can obviously represent jobqueue more than fine!
Thanks for trying to adapt to my schedule anyway.
As a reminder, we'll meet (those who can) in three hours. Here are my goals and a possible agenda:
I expect that most of the conversation will be people proposing things that they'd like to do/see done and then other people discussing those things.
Was this meeting going to be a regular occurrence? I'm sorry I couldn't make the last one but would likely join future meetings if/when they occur.
I'm also interested to know whether there was any outcome from this meeting.
I've got a custom distributed deployment, spinning up a dask cluster in Windows Containers on a Windows HPC grid so it's probably a bit different to the usual.
One issue I have come across is that I spin up dask-scheduler in a container and map the bokeh port 8787 to an arbitrary port on the host. dask-scheduler obviously doesn't know about the port mapping so the Client html repr reports the wrong url.
There's a PR at #2063 and @mrocklin also points out that it can be fixed by patching the global config but that solution seems sub-optimal as it doesn't work if you have more than one dask cluster in the same session.
Anyway, whilst my deployment is custom, it seems spinning up dask-scheduler in a container should be a fairly common thing (in this group) and as such maybe others here have run into the same problem?
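To make the port-mapping problem above concrete, here is a minimal sketch in plain Python. It does not use any Dask API: `DASHBOARD_TEMPLATE`, `dashboard_url`, the host names, and the host ports are all hypothetical names invented for illustration. The idea is only that the reported URL has to be built from the host-side port, and that keeping the mapping per cluster (rather than in one global setting) lets two clusters in the same session each report a correct link.

```python
# Hypothetical sketch: none of these names come from Dask itself.
# The template mirrors the general shape of a dashboard link.
DASHBOARD_TEMPLATE = "http://{host}:{port}/status"


def dashboard_url(host, container_port, port_map, template=DASHBOARD_TEMPLATE):
    """Return the dashboard URL as seen from outside the container.

    port_map maps container ports to the host ports they are published on.
    A port that is not in the mapping is assumed to be reachable unchanged.
    """
    host_port = port_map.get(container_port, container_port)
    return template.format(host=host, port=host_port)


# Two clusters in one session, each with its own mapping for port 8787.
cluster_a = dashboard_url("grid-node-1", 8787, {8787: 31001})
cluster_b = dashboard_url("grid-node-2", 8787, {8787: 31002})
print(cluster_a)
print(cluster_b)
```

A single global override can only hold one such mapping at a time, which is why it breaks down as soon as a session talks to more than one cluster.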
We couldn't get enough people and so no meeting occurred. I'd be happy to
try again. When are people free?
I would prefer to do it at 2pm UTC, but 3pm should be okay starting next week. It would be better for me to avoid Wednesday too; Tuesday or Thursday would be fine in general, though I'm only free on Thursday next week.
@dhirschfeld not sure about your problem, but if you want to discuss it, maybe open another issue and ask the dask-kubernetes contributors?
@guillaumeeb - I just thought I'd bring it up here as I think it has to be a common problem to everyone deploying dask in containers? I'll let people comment further in #2063 though...