What happened: HTML reports from get_task_stream and performance_report contain the grid and controls for the task stream plot, but not the task content (colored blocks), whereas the live dashboard (8787/status) does show the task stream contents.
What you expected to happen: contents in the HTML report similar to the live dashboard output.
Minimal Complete Verifiable Example:
setup:
conda create -n dask-test python=3.8 -c conda-forge
conda activate dask-test
conda install dask distributed ipython -c conda-forge
code:
from dask.distributed import LocalCluster
c = LocalCluster()
from dask.distributed import Client
client = Client(c)

def square(x):
    return x*x

import distributed

with distributed.get_task_stream(plot='save', filename="task-stream.html") as ts:
    f = client.map(square, [x for x in range(100)])
    distributed.wait(f)  # ensure completion per @jrbourbeau's point below

with distributed.performance_report(filename="perf.html") as ts:
    f2 = client.map(square, [x for x in range(200,300)])
    distributed.wait(f2)
Compare html outputs to dashboard task stream
Environment:
Thanks for raising an issue @adbreind! I think you want to make sure the computation actually runs while inside the scope of these context managers. For example, instead of:
with distributed.get_task_stream(plot='save', filename="task-stream.html") as ts:
    f = client.map(square, [x for x in range(100)])
try adding a wait call:
with distributed.get_task_stream(plot='save', filename="task-stream.html") as ts:
    f = client.map(square, [x for x in range(100)])
    distributed.wait(f)
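To illustrate why the wait call matters, here is a minimal sketch that uses only the standard library as an analogy; it does not use Dask. The `record_finished` context manager is a hypothetical stand-in for a recorder that captures whatever has completed by the time the block exits. Submitting work returns immediately, so without a wait the block can exit before most tasks have run.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from contextlib import contextmanager


def square(x):
    time.sleep(0.01)  # simulate work so tasks outlive the submit call
    return x * x


@contextmanager
def record_finished(log):
    """Hypothetical recorder: on exit, log only the futures that are done."""
    futures = []
    yield futures
    log.extend(f for f in futures if f.done())


without_wait, with_wait = [], []
with ThreadPoolExecutor(max_workers=2) as pool:
    # No wait: the block exits right after submission.
    with record_finished(without_wait) as tracked:
        tracked.extend(pool.submit(square, x) for x in range(20))

    # With wait: the block exits only after every task has finished.
    with record_finished(with_wait) as tracked:
        tracked.extend(pool.submit(square, x) for x in range(20))
        wait(tracked)

print(len(without_wait), len(with_wait))
```

The second count is always 20, while the first depends on timing and is typically much smaller.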
Good point. I'll check and see if we can still reproduce the issue or not.
Modified the repro example (and put @jrbourbeau's suggested change inline above, so that newcomers to the issue will get the right code to play with).
Issue still exists, but now we've tightened it up a little.
Thanks for the catch, @jrbourbeau!
Hrm, I'm not able to reproduce the issue when I run your updated example with the latest dev version of distributed. The task stream plots from both get_task_stream and performance_report are populated with task content (screenshots below).
get_task_stream:

performance_report:

To standardize testing of this, I tried it on a Coiled Cloud notebook, launching via the JupyterLab launch tile (the one that points to https://cloud.coiled.io/jobs/coiled/jupyterlab).
First, I ran it using LocalCluster with the test code above and reproduced the same problem with blank output. That run used dask 0.28, distributed 0.28, and Python 3.8.
Next, I tested it using the Coiled notebook for the client but an actual Coiled Cloud 2-worker cluster (coiled/default-py38 config), and got the same result. That run had dask/distributed 0.29.
[For contributors who are not familiar with Coiled, it's a cloud-based platform to simplify Dask operations and you can access it at https://coiled.io/]
Alright, thanks for the additional information. I'm able to reproduce the issue just using a LocalCluster on my local machine (i.e. not using Coiled) when using bokeh v2.2.0 but not with bokeh v2.1.1. I used the following to create two separate conda environments with different bokeh versions:
No content in task stream:
# Task stream is not populated with bokeh v2.2.0
conda create -n test-2.2.0 python=3.8 bokeh=2.2.0 dask=2.30.0
Has content in task stream:
# Task stream is populated with bokeh v2.1.1
conda create -n test-2.1.1 python=3.8 bokeh=2.1.1 dask=2.30.0
@adbreind there was probably an update in bokeh==2.2.0 that's causing our plotting code to not display the content of the task stream. Some next steps would be to go through the bokeh release notes for 2.2.0 to try and identify what the relevant change might have been
Also cc @jsignell for visibility
Thanks for pinging me - I'll take a look
Ok so far I have figured out that there is data in the html. But for some reason it is not rendering properly. I think the time axis might be off somehow.
Compare the saved html:

with the output in the dashboard:

I have confirmed that the same behavior exists in 2.2.1 and 2.2.2
I don't know offhand; there were no intentional 2.2 changes that I am aware of that would bear on this. Are there any JS console errors or messages reported? Those HTML files above just embed the app, so they are not going to be helpful on their own.
cc @mattpap
Edit: just noting that the advisory downstream Dask tests are all passing on the latest builds.
Thank you - yeah dask is clearly missing a test for this. Is it possible that integer times aren't being treated correctly in bokeh?
@jrbourbeau do you think we are intentionally using absolute time for these plots rather than time relative to the start of the tasks (like we do in the dashboard version)? As a quick check I tried plotting relative time instead and that works.
git diff for relative time
diff --git a/distributed/client.py b/distributed/client.py
index e4a31acb..dbc33c48 100644
--- a/distributed/client.py
+++ b/distributed/client.py
@@ -4087,7 +4087,10 @@ class Client:
from .dashboard.components.scheduler import task_stream_figure
source, figure = task_stream_figure(sizing_mode="stretch_both")
+ offset = min(rects["start"])
+ rects["start"] = [x - offset for x in rects["start"]]
source.data.update(rects)
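The two added lines in the diff shift every start time so that the earliest task begins at zero. A minimal standalone sketch of that transformation follows; the function name `to_relative_time` and the sample values are made up for illustration, and only the "start" key is taken from the diff.

```python
def to_relative_time(rects):
    """Return a copy of rects whose "start" values are relative to the earliest start."""
    starts = rects["start"]
    shifted = dict(rects)  # shallow copy so the caller's dict is left untouched
    if not starts:
        return shifted
    offset = min(starts)
    shifted["start"] = [x - offset for x in starts]
    return shifted


# Illustrative data only: absolute timestamps in milliseconds.
rects = {
    "start": [1602000000500.0, 1602000000750.0, 1602000001000.0],
    "duration": [100.0, 50.0, 25.0],
}
relative = to_relative_time(rects)
print(relative["start"])
```

Unlike the diff, which rewrites rects["start"] in place, this sketch returns a new dict so the absolute values stay available for comparison.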
There are no console errors or messages.
Actually @jsignell I think it is this bug (the fix has been merged for 2.3, but perhaps we can get it backported into a 2.2.3 as well): https://github.com/bokeh/bokeh/issues/10488
@jsignell actually, which is easier for you: excluding 2.2.x and waiting for 2.3, or having a 2.2.3 release (though it may not be for a few days or until Monday)?
Oh! Yep that seems like it. I am testing 2.3 now. I imagine it's easier to just exclude 2.2 and wait for the 2.3 release
Well I can't seem to build bokehjs locally today. I am running into an issue with timezone.js. That bug really feels like it'll solve this issue.
I don't think that's something that we can reasonably test from the dask side though, right?
@jsignell I will try to cut a conda or pip-installable 2.3 dev build tonight.
Thanks for all your debugging and releasing efforts @jsignell @bryevdv!
FYI we have decided to push a 2.2.3 release. It should be out (pip and bokeh channel) by Monday.
Bokeh 2.2.3 is released and should be available on pip and the bokeh conda channel to start.
I verified that Bokeh 2.2.3 solves this issue. Thanks @bryevdv!
@jrbourbeau should we explicitly exclude 2.2.0-2.2.2 as dependencies?
Historically we've just asked users to upgrade their bokeh version when there's an issue like this. If it becomes a big enough pattern we can exclude 2.2.0-2.2.2 but I'd rather try the "please upgrade your bokeh version" route first
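For users who want to check their own environment, here is a minimal sketch of a version guard. It is not part of dask or distributed; the helper names are hypothetical, and the affected set (2.2.0 through 2.2.2) is taken from the versions reported in this thread.

```python
import re
import warnings

# Versions reported in this thread as producing empty task stream plots.
AFFECTED_BOKEH = {(2, 2, 0), (2, 2, 1), (2, 2, 2)}


def release_tuple(version):
    """Turn a version string like '2.2.1' into (2, 2, 1), ignoring any suffix."""
    nums = [int(p) for p in re.findall(r"\d+", version)[:3]]
    nums += [0] * (3 - len(nums))  # pad short versions such as '2.2'
    return tuple(nums)


def is_affected(version):
    """Return True if the given bokeh version is in the affected set."""
    return release_tuple(version) in AFFECTED_BOKEH


def warn_if_affected(version):
    """Emit a warning asking the user to upgrade when the version is affected."""
    if is_affected(version):
        warnings.warn(
            f"bokeh {version} may save empty task stream plots; "
            "please upgrade to bokeh 2.2.3 or later."
        )
        return True
    return False


print(is_affected("2.2.0"), is_affected("2.2.3"))
```

With bokeh installed, the check could be run as `import bokeh; warn_if_affected(bokeh.__version__)`.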
Closing as this issue has been resolved in recent bokeh releases. Thanks @jsignell @bryevdv!