Distributed: Clarification of the task graph: are futures allowed in tuples that aren't tasks?

Created on 23 Oct 2020  Â·  4Comments  Â·  Source: dask/distributed

As reported by https://github.com/dask/distributed/issues/4177, custreamz CI fails because the task graph contains tuples with future objects.

The question is if this is even allowed? Dask don't allow keys in tuples that aren't tasks. E.g. in the following ("x",) is interpreted as a literal with no dependencies:

import dask.core
dsk = {
    "x": 1,
    "a": ("x",),
    "b": ["x"]
}
assert dask.core.get_dependencies(dsk, "a") == set()
assert dask.core.get_dependencies(dsk, "b") == {"x"}

But what about futures? Conceptually, futures are _remote_ graph keys so I am inclined to think that they should have the same semantic as regular keys.

What are you thoughts, should the following be allowed?

from distributed import LocalCluster, Client

if __name__ == '__main__':
    client = Client(LocalCluster(n_workers=1))

    def getfirst(x):
        return x[0]

    x = client.submit(sum, [1,2])
    y = client.submit(getfirst, (x,))
    y.result()

Most helpful comment

From the (cu)streamz perspective, I daresay the submits can be changed to use lists instead of tuples where necessary.

All 4 comments

cc @mrocklin @martindurant @jrbourbeau (in case any of you have thoughts 🙂)

From the (cu)streamz perspective, I daresay the submits can be changed to use lists instead of tuples where necessary.

This is a great question, thanks for asking @madsbk

I'm somewhat torn on what behavior we should have here. Based on the Dask graph spec I would expect that we shouldn't search through non-task tuples to replace Futures with their results as we don't do this for other graph keys. On the other hand, since we currently _do_ search non-task tuples for Futures changing that behavior can break existing user code (as you mentioned in the custreamz case).

For now I'm inclined to keep our current behavior of searching non-task tuples for Futures for backwards compatibility (i.e. we merge https://github.com/dask/distributed/pull/4178). Though longer term, I'm open to changing behavior if we don't want to special case Futures like we currently do. It'd probably be worth a deprecation cycle in that case though.

cc @jcrist who may also be interested in this topic

I do not recall when we added futures-in-tuples. I'm fine leaving it
removed for now if the only people who it affects are comfortable moving to
lists.

In principle, any time we accept user input we should probably quote
things. If a user is providing a tuple of futures we should probably turn
that into (tuple, [f1, f2, f3]) internally.

On Fri, Oct 23, 2020 at 10:43 AM James Bourbeau notifications@github.com
wrote:

This is a great question, thanks for asking @madsbk
https://github.com/madsbk

I'm somewhat torn on what behavior we should have here. Based on the Dask
graph spec https://docs.dask.org/en/latest/spec.html I would expect
that we shouldn't search through non-task tuples to replace Futures with
their results as we don't do this for other graph keys. On the other hand,
since we currently do search non-task tuples for Futures changing that
behavior can break existing user code (as you mentioned in the custreamz
case).

For now I'm inclined to keep our current behavior of searching non-task
tuples for Futures for backwards compatibility (i.e. we merge #4178
https://github.com/dask/distributed/pull/4178). Though longer term, I'm
open to changing behavior if we don't want to special case Futures like
we currently do. It'd probably be worth a deprecation cycle in that case
though.

cc @jcrist https://github.com/jcrist who may also be interested in this
topic

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/dask/distributed/issues/4183#issuecomment-715481616,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/AACKZTEJIV74KOOEAMO6VKDSMG6D5ANCNFSM4S4PXRUQ
.

Was this page helpful?
0 / 5 - 0 ratings