We should find other representations to replace the Task stream. See https://github.com/dask/distributed/issues/4260
One view of that data is aggregated within the TaskGroups. A TaskGroup collects many related tasks together. For example, one dd.read_csv call may generate 10,000 tasks, but will generate only one task group. These correspond to high level layers on the client side, or Spark layers.
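To make the aggregation concrete, here is a minimal sketch of collapsing many task keys into per-group state counts. It is not the scheduler's implementation: `group_name` and `summarize` are hypothetical helpers, and the sketch assumes keys are either plain strings or tuples whose first element names the group.

```python
from collections import defaultdict


def group_name(key):
    # Hypothetical rule (an assumption, not dask's own key splitting):
    # tuple keys such as ("read-csv-abc", 7) belong to the group named by
    # their first element; string keys form their own group.
    return key[0] if isinstance(key, tuple) else key


def summarize(task_states):
    """Collapse {task key: state} into {group name: {state: count}}."""
    groups = defaultdict(lambda: defaultdict(int))
    for key, state in task_states.items():
        groups[group_name(key)][state] += 1
    return {name: dict(counts) for name, counts in groups.items()}


# Five tasks from one read, plus a single downstream task.
tasks = {("read-csv-abc", i): ("memory" if i < 3 else "waiting") for i in range(5)}
tasks["sum-xyz"] = "waiting"
summary = summarize(tasks)
```

However many tasks the read produces, the summary holds a single entry for it, which is what makes a per-group view cheap to draw.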
Task Groups contain information that is potentially useful to convey. Here is a subset
How should we convey this information visually to the user? As mentioned above, we convey the progress of tasks within a task group today in the progress chart. Great, what else? We could consider doing something like these graphs from Spark

But perhaps updated in real time, and with color/size/shading differences that reflect the updated information that we have.
I walked down this path briefly in the attached notebook, using start and stop times to inform layout. I found that, due to overlap, this was hard/impossible. I'm now of the opinion that layout should be purely informed by dependency graph structure (similar to the Spark image above). However, I think that once we have that rough layout there is a lot that we can do with regards to color/size/shading that will be fun. Layout is still an interesting problem though, especially when trying to make the general case robustly laid out.
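As a rough illustration of layout driven purely by dependency structure, the sketch below places each task group in a column equal to its depth in the group-level dependency graph, and stacks groups of equal depth vertically. The input format (a dict mapping group name to the set of groups it depends on) and the function name `layer_layout` are assumptions made for this example, not an existing API.

```python
from collections import defaultdict


def layer_layout(dependencies):
    """Return {group: (x, y)} where x is dependency depth and y is the
    group's index within its layer.

    dependencies: {group name: set of group names it depends on}
    (an assumed input format for this sketch).
    """
    depth = {}

    def visit(name, stack=()):
        if name in depth:
            return depth[name]
        if name in stack:
            raise ValueError(f"cycle detected at {name!r}")
        deps = dependencies.get(name, ())
        # Groups with no dependencies sit at depth 0; otherwise one past
        # the deepest dependency.
        d = 1 + max((visit(dep, stack + (name,)) for dep in deps), default=-1)
        depth[name] = d
        return d

    for name in dependencies:
        visit(name)

    positions = {}
    used = defaultdict(int)  # next free row in each layer
    for name in sorted(depth, key=lambda n: (depth[n], n)):
        x = depth[name]
        positions[name] = (x, used[x])
        used[x] += 1
    return positions


graph = {
    "read-csv": set(),
    "add": {"read-csv"},
    "count": {"read-csv"},
    "sum": {"add"},
}
positions = layer_layout(graph)
```

Start and stop times play no part in the positions here, so overlapping groups cannot collide. Timing and progress would then be free to drive color, size, or shading on top of the fixed layout.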
I think that we have all the information that we need in TaskGroups. Right now the next thing to do is to think about visualization, which should be fun.
I walked down this path briefly in the attached notebook, using start and stop times to inform layout
I think https://gist.github.com/mrocklin/2f37440e52420b0b2899f72c6f61b802 is the notebook you're referring to, but @mrocklin feel free to correct me if there's a different notebook
Thanks James, I'll start looking at this.
Sounds good, thanks @ncclementi! Also cc'ing @ian-r-rose
Visualizing some kind of timeseries of task groups would be really cool! I wanted to note that last year we added a visualization for aggregate timing information using TaskPrefixes and aggregate action information.
@ian-r-rose you may also find this interesting. I think that you
and @ncclementi might be a good pairing here. (James' idea, I just wanted
to make sure that this got out there)
On Tue, May 4, 2021 at 2:28 PM Benjamin Zaitlen wrote:
Visualizing some kind of timeseries of task groups would be really cool! I
wanted to note that last year we added a visualization for aggregate timing
information using TaskPrefixes and aggregate action information. xref: #3792 https://github.com/dask/distributed/pull/3792
@ian-r-rose you may also find this interesting. I think that you and @ncclementi might be a good pairing here. (James' idea, I just wanted to make sure that this got out there)
Yep, we are meeting on this today to plan out a line of action.