Distributed: Visualize TaskGroups

Created on 23 Apr 2021  ·  6 Comments  ·  Source: dask/distributed

We should find other representations to replace the Task stream. See https://github.com/dask/distributed/issues/4260

One view of that data is aggregated within the TaskGroups. A TaskGroup collects many related tasks together. For example, one dd.read_csv call may generate 10,000 tasks, but will generate only one task group. These correspond to high level layers on the client side, or Spark layers.
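To make the many-tasks-to-one-group relationship concrete, here is a minimal sketch in plain Python. The key names and the `group_name` helper are hypothetical and only approximate how task keys are grouped; this is not the scheduler's actual grouping code.

```python
from collections import Counter


def group_name(key):
    """Simplified stand-in for grouping (an assumption, not dask's own rule).

    Tuple keys such as ("read-csv-abc123", 7) are grouped by their first
    element; plain string keys form a group of their own.
    """
    return key[0] if isinstance(key, tuple) else key


# Hypothetical keys: 10,000 partitions from one read_csv call, plus a
# small reduction on top of them.
keys = [("read-csv-abc123", i) for i in range(10_000)]
keys += [("sum-def456", i) for i in range(100)]
keys.append("sum-agg-789")

# Number of tasks per group.
group_sizes = Counter(group_name(k) for k in keys)

for name, count in group_sizes.items():
    print(f"{name}: {count} tasks")
```

The 10,101 tasks collapse into three groups, which is the scale at which a per-group visualization stays readable.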

TaskGroups contain information that is potentially useful to convey. Here is a subset:

  • start and stop time of every group
  • how long we've spent on the group, both in computation and in data transfer and other activities
  • amount of data processed / currently in storage
  • dependency relationships to other taskgroups
  • how far along we are in computing them, as well as if we've had any errors (this is the same information we have in the progress bars in the status page of the dashboard today)
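As a rough illustration of how the fields listed above could feed a visualization, the sketch below uses a hypothetical `GroupSummary` container. Its field names are assumptions made for this example and do not mirror the scheduler's TaskGroup class. Counting the "memory" and "released" states as finished is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class GroupSummary:
    # Hypothetical container for illustration; not the scheduler's TaskGroup.
    name: str
    start: float                     # time the first task started
    stop: float                      # time the most recent task stopped
    seconds_by_activity: dict = field(default_factory=dict)
    nbytes: int = 0                  # data currently held for this group
    dependencies: set = field(default_factory=set)
    task_states: dict = field(default_factory=dict)  # state -> task count


def fraction_done(group):
    """Share of tasks that have finished (assumed: memory + released)."""
    total = sum(group.task_states.values())
    done = group.task_states.get("memory", 0) + group.task_states.get("released", 0)
    return done / total if total else 0.0


def has_errors(group):
    """True if any task in the group is in the 'erred' state."""
    return group.task_states.get("erred", 0) > 0


def activity_shares(group):
    """Fraction of recorded time spent in each activity."""
    total = sum(group.seconds_by_activity.values())
    if not total:
        return {}
    return {name: secs / total for name, secs in group.seconds_by_activity.items()}


summary = GroupSummary(
    name="read-csv-abc123",
    start=0.0,
    stop=12.5,
    seconds_by_activity={"compute": 30.0, "transfer": 10.0},
    nbytes=2_000_000,
    task_states={"memory": 600, "processing": 300, "waiting": 100},
)

progress = fraction_done(summary)
shares = activity_shares(summary)
```

Values like `progress` and `shares` are the kind of per-group numbers that could drive color, size, or shading in a plot.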

How should we convey this information visually to the user? As mentioned above, we already convey the progress of tasks within a taskgroup today in the progress chart. Great, what else? We could consider doing something like these graphs from Spark

image

But perhaps updated in real time, and with color/size/shading differences that reflect the additional information that we have.

I walked down this path briefly in the attached notebook, using start and stop times to inform layout. I found that, due to overlap, this was hard/impossible. I'm now of the opinion that layout should be purely informed by dependency graph structure (similar to the Spark image above). However, I think that once we have that rough layout there is a lot that we can do with regard to color/size/shading that will be fun. Layout is still an interesting problem though, especially when trying to lay out the general case robustly.
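One simple way to derive a layout purely from dependency structure is a layered placement: each group's column is its longest-path depth from the roots, and its row is its position within that column. The sketch below is only an illustration of that idea, not the approach used in the notebook. The `layered_layout` function and the group names are hypothetical, and it relies on the standard-library `graphlib` module (Python 3.9+).

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def layered_layout(dependencies):
    """Assign (column, row) positions from a dependency mapping.

    `dependencies` maps each group name to the set of group names it
    depends on. Column = longest path from any root; row = order within
    the column (alphabetical, to keep the output deterministic).
    """
    depth = {}
    # static_order() yields every node after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        deps = dependencies.get(name, ())
        depth[name] = 1 + max((depth[d] for d in deps), default=-1)

    next_row = defaultdict(int)
    positions = {}
    for name in sorted(depth, key=lambda n: (depth[n], n)):
        column = depth[name]
        positions[name] = (column, next_row[column])
        next_row[column] += 1
    return positions


# Hypothetical group graph for something like a mean over a CSV column.
example = {
    "read-csv": set(),
    "getitem": {"read-csv"},
    "sum-chunk": {"getitem"},
    "count-chunk": {"getitem"},
    "mean-agg": {"sum-chunk", "count-chunk"},
}

positions = layered_layout(example)
for name, (column, row) in positions.items():
    print(f"{name}: column={column}, row={row}")
```

This ignores edge crossings and uneven layer sizes, which is where the robustness problem mentioned above starts to show up.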

I think that we have all the information that we need in TaskGroups. Right now the next thing to do is to think about visualization, which should be fun.

All 6 comments

I walked down this path briefly in the attached notebook, using start and stop times to inform layout

I think https://gist.github.com/mrocklin/2f37440e52420b0b2899f72c6f61b802 is the notebook you're referring to, but @mrocklin feel free to correct me if there's a different notebook

Thanks James, I'll start looking at this.

Sounds good, thanks @ncclementi! Also cc'ing @ian-r-rose

Visualizing some kind of timeseries of task groups would be really cool! I wanted to note that last year we added a visualization for aggregate timing information using TaskPrefixes and aggregate action information.

xref: https://github.com/dask/distributed/pull/3792

@ian-r-rose you may also find this interesting. I think that you
and @ncclementi might be a good pairing here. (James' idea, I just wanted
to make sure that this got out there)

@ian-r-rose you may also find this interesting. I think that you and @ncclementi might be a good pairing here. (James' idea, I just wanted to make sure that this got out there)

Yep, we are meeting on this today to plan out a line of action
