Distributed: Deployment meeting

Created on 16 Aug 2018  ·  19 Comments  ·  Source: dask/distributed

There are a few different communities building tools to deploy Dask on various kinds of resource managers:

  • dask-jobqueue: @jhamman @guillaumeeb @lesteve
  • dask-drmaa: @jakirkham
  • dask-kubernetes: @jacobtomlinson @yuvipanda
  • dask-yarn: @jcrist

These libraries all depend on and extend the central distributed.deploy.Cluster object within this codebase. All of these communities have identified failings within the current implementations and are considering changing their design. This includes features like the following:

  1. More robust adaptive scaling
  2. Moving the scheduler to a different node
  3. JupyterLab extensions
  4. (probably other things I've forgotten)

I thought it might be useful for this group to meet infrequently and maybe coordinate effort. Is this something that would interest the maintainers listed above? I'm thinking about something like an hour-long video meeting in which we identify a few desired features and share constraints on those features.

All 19 comments

Thanks @mrocklin for bringing up this idea. I'm interested, even though I don't really know how to run such a meeting. Some (limited) ideas I have right now that might fit in such a conversation (though they are probably mostly specific to dask-jobqueue):

I'd be up for a meeting. For dask-yarn the most pressing issues for me are:

  • Allow the scheduler to run on a different machine from the cluster object. This would mainly involve changes to the adaptive class (see https://github.com/dask/dask-yarn/issues/1)

  • Modularization and cleanup of the CLI scripts. In dask-yarn we have copied and modified versions of both dask-scheduler and dask-worker. It'd be useful not to have to duplicate this logic.

Having the scheduler on a separate machine could make sense for dask-jobqueue too. It would be good to decouple the scheduler process from the Jupyter/Client process. I see two goals for dask-jobqueue:

  • Be able to run the Scheduler inside a scheduler job, and so select appropriate CPU and memory resources depending on the dask workflow to launch.
  • Be able to shut down the notebook/ipython process without stopping a running dask computation.

OK, I propose that we meet this Wednesday 2018-08-22 at 3pm UTC which is

  • 8am in California
  • 10am in Texas
  • 11am in New York
  • 4pm in London
  • 5pm in Paris
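
The conversions above can be checked with a short sketch. It uses fixed UTC offsets rather than a time zone database, and those offsets are an assumption: they are the daylight-saving offsets taken to be in effect in each place on 2018-08-22. The function name `local_times` is made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Proposed meeting time from the comment above.
meeting_utc = datetime(2018, 8, 22, 15, 0, tzinfo=timezone.utc)

# Assumption: summer-time offsets (hours from UTC) in effect on 2018-08-22.
SUMMER_OFFSETS = {
    "California": -7,
    "Texas": -5,
    "New York": -4,
    "London": 1,
    "Paris": 2,
}


def local_times(when_utc, offsets):
    """Return {place: 'HH:MM'} local clock times for a UTC datetime."""
    result = {}
    for place, hours in offsets.items():
        local = when_utc.astimezone(timezone(timedelta(hours=hours)))
        result[place] = local.strftime("%H:%M")
    return result


if __name__ == "__main__":
    for place, clock in local_times(meeting_utc, SUMMER_OFFSETS).items():
        print(f"{place}: {clock}")
```

Changing `meeting_utc` to 14:00 gives the 2pm UTC alternative discussed in the following comments.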

It will be really complicated for me to be around and fully available after 4pm in France (2pm UTC) until September. Sorry about that.

I don't have strong feelings about this. 7am is a bit too early for me but I'd be okay moving the meeting so that @guillaumeeb can attend. I think he can represent the jobqueue dev team just fine.

OK, @guillaumeeb can you confirm that you can attend a meeting that starts at 2pm UTC? Or would you prefer to start earlier than that so that you can be done by 2pm UTC?

Also, for meeting location I recommend https://appear.in/dask-dev

I would prefer to start earlier, and tomorrow afternoon is not possible for me either (sorry for being such a burden, kids' afternoon!).

OK, I recommend that those of us who can meet tomorrow at 3pm UTC do so. This is the original time. We can follow up with more discussion in the future if necessary.

Yep sure, do so, @jhamman can obviously represent jobqueue more than fine!
Thanks for trying to adapt to my schedule anyway.

As a reminder, we'll meet (those who can) in three hours. Here are my goals and a possible agenda:

Goals

  1. Ensure that people here get to know each other and increase cross-talk in the future
  2. Identify and prioritize issues for future development
  3. Figure out who can do what

Agenda

  1. Quick introductions (5 min)
  2. Features and bugs

    • Someone says something that they'd like changed, and why it's important for them

    • Other groups say how that feature might affect their system, and any constraints that they would have to impose

    • Hopefully this process devolves into architecture discussion

  3. ...

I expect that most of the conversation will be people proposing things that they'd like to do/see done and then other people discussing those things.

Location

https://appear.in/dask-dev

Was this meeting going to be a regular occurrence? I'm sorry I couldn't make the last one but would likely join future meetings if/when they occur.

I'm also interested to know if there was any outcome to this meeting?

I've got a custom distributed deployment, spinning up a dask cluster in Windows Containers on a Windows HPC grid so it's probably a bit different to the usual.

One issue I have come across is that I spin up dask-scheduler in a container and map the bokeh port 8787 to an arbitrary port on the host. dask-scheduler obviously doesn't know about the port mapping so the Client html repr reports the wrong url.

There's a PR at #2063 and @mrocklin also points out that it can be fixed by patching the global config but that solution seems sub-optimal as it doesn't work if you have more than one dask cluster in the same session.

Anyway, whilst my deployment is custom, it seems spinning up dask-scheduler in a container should be a fairly common thing (in this group) and as such maybe others here have run into the same problem?
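
The port-mapping problem described above can be illustrated with a minimal sketch. This is not distributed's actual code: `dashboard_url`, the link template, the host names, and the host ports are all hypothetical, chosen only to show why a per-cluster mapping behaves differently from a single global setting.

```python
# Hypothetical helper for illustration; not part of distributed's API.
def dashboard_url(template, host, port, port_map=None):
    """Format a dashboard link, translating a container port to its host port."""
    port_map = port_map or {}
    return template.format(host=host, port=port_map.get(port, port))


# Assumed link template; the real template used by distributed may differ.
TEMPLATE = "http://{host}:{port}/status"

# What the scheduler reports: it only knows its in-container port 8787.
reported = dashboard_url(TEMPLATE, "grid-node-1", 8787)

# Two clusters in one session, each with its own (made-up) host port.
cluster_a = dashboard_url(TEMPLATE, "grid-node-1", 8787, {8787: 31001})
cluster_b = dashboard_url(TEMPLATE, "grid-node-2", 8787, {8787: 31002})

if __name__ == "__main__":
    print(reported)
    print(cluster_a)
    print(cluster_b)
```

A single global setting can hold only one such mapping at a time, which is the limitation with multiple clusters in the same session that the comment points out.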

We couldn't get enough people and so no meeting occurred. I'd be happy to try again. When are people free?

I would prefer to do it at 2pm UTC, but 3pm should be okay starting next week. It would be better for me to avoid Wednesday too; Tuesday or Thursday would be fine in general, though I'm only free on Thursday next week.

@dhirschfeld not sure about your problem, but if you want to discuss it, maybe open another issue and ask the dask-kubernetes contributors?

@guillaumeeb - I just thought I'd bring it up here as I think it has to be a common problem to everyone deploying dask in containers? I'll let people comment further in #2063 though...
