cc @podlipensky
The following steps reproduce the issue against today's tb-nightly:
1. Install tf-nightly-2.0-preview and the current tb-nightly.
2. wget https://people.sc.fsu.edu/~jburkardt/data/ply/teapot.ply
3. bazel-bin/tensorboard/plugins/mesh/mesh_demo_v2 --mesh_path=teapot.ply --tag_name=mesh1
4. tensorboard --logdir /tmp/mesh_demo
5. mesh1 appears as expected in TB.
6. bazel-bin/tensorboard/plugins/mesh/mesh_demo_v2 --mesh_path=teapot.ply --tag_name=mesh2
7. The mesh visualizations now fail to load: the tags request returns a 500 Internal Server Error because the handler crashes with a KeyError here:
Traceback (most recent call last):
  File "/usr/local/google/home/nickfelt/.tf-venvs/tf-nightly-2.0-preview-py2/lib/python2.7/site-packages/werkzeug/serving.py", line 270, in run_wsgi
    execute(self.server.app)
  File "/usr/local/google/home/nickfelt/.tf-venvs/tf-nightly-2.0-preview-py2/lib/python2.7/site-packages/werkzeug/serving.py", line 258, in execute
    application_iter = app(environ, start_response)
  File "/usr/local/google/home/nickfelt/.tf-venvs/tf-nightly-2.0-preview-py2/lib/python2.7/site-packages/tensorboard/backend/application.py", line 380, in __call__
    return self.exact_routes[clean_path](environ, start_response)
  File "/usr/local/google/home/nickfelt/.tf-venvs/tf-nightly-2.0-preview-py2/lib/python2.7/site-packages/werkzeug/wrappers.py", line 308, in application
    resp = f(*args[:-2] + (request,))
  File "/usr/local/google/home/nickfelt/.tf-venvs/tf-nightly-2.0-preview-py2/lib/python2.7/site-packages/tensorboard/plugins/mesh/mesh_plugin.py", line 107, in _serve_tags
    tag = self._instance_tag_to_tag[(run, instance_tag)]
KeyError: ('.', u'mesh2_FACE')
The problem is that the mesh plugin permanently caches any non-empty result from PluginRunToTagToContent() here:
https://github.com/tensorflow/tensorboard/blob/2b96c2a18da8cfe5f387cc07d03e8227706ca914/tensorboard/plugins/mesh/mesh_plugin.py#L56-L76
Later on, _serve_tags() calls PluginRunToTagToContent() again and then prepare_metadata(), but the latter is a no-op because non-empty metadata is already cached for the mesh1 tag. So when the handler indexes into the _instance_tag_to_tag dict with the new ('.', 'mesh2_FACE') key, it raises the KeyError.
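To make the failure mode concrete, here is a minimal, self-contained Python model of it. `FakeMultiplexer` and `BuggyMeshCache` are hypothetical stand-ins, not TensorBoard classes; only the `PluginRunToTagToContent(plugin_name)` method name mirrors the real multiplexer, and each instance tag's "content" is simplified to the user-facing mesh name (the real plugin stores richer metadata).

```python
class FakeMultiplexer:
    """Hypothetical stand-in for the multiplexer (one method only)."""

    def __init__(self):
        self.data = {}  # run -> {instance_tag: mesh name}

    def PluginRunToTagToContent(self, plugin_name):
        return {run: dict(tags) for run, tags in self.data.items()}


class BuggyMeshCache:
    """Simplified model of the caching logic described above."""

    def __init__(self, multiplexer):
        self._multiplexer = multiplexer
        self._instance_tag_to_tag = {}

    def prepare_metadata(self):
        # The bug: once anything is cached, the multiplexer is never rescanned.
        if self._instance_tag_to_tag:
            return
        mapping = self._multiplexer.PluginRunToTagToContent("mesh")
        for run, tags in mapping.items():
            for instance_tag, name in tags.items():
                self._instance_tag_to_tag[(run, instance_tag)] = name

    def serve_tags(self):
        mapping = self._multiplexer.PluginRunToTagToContent("mesh")
        self.prepare_metadata()  # no-op after the first successful call
        return {
            (run, instance_tag): self._instance_tag_to_tag[(run, instance_tag)]
            for run, tags in mapping.items()
            for instance_tag in tags
        }


mux = FakeMultiplexer()
mux.data["."] = {"mesh1_FACE": "mesh1"}
cache = BuggyMeshCache(mux)
print(cache.serve_tags())  # {('.', 'mesh1_FACE'): 'mesh1'}

mux.data["."]["mesh2_FACE"] = "mesh2"  # a second demo run writes mesh2
try:
    cache.serve_tags()
except KeyError as err:
    print("KeyError:", err)  # KeyError: ('.', 'mesh2_FACE')
```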
Right now, the workaround is restarting TensorBoard, since on a fresh load it will correctly cache both tags.
I think a sufficient fix would be to cache at the granularity of individual (run, tag) pairs, replacing prepare_metadata() with a lookup helper that checks the cache and populates it on a cache miss. That might also resolve the issue mentioned in
https://github.com/tensorflow/tensorboard/blob/2b96c2a18da8cfe5f387cc07d03e8227706ca914/tensorboard/plugins/mesh/mesh_plugin.py#L192-L195
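A minimal sketch of such a per-pair lookup helper, assuming the same simplified model as before: `FakeMultiplexer`, `MeshTagLookup`, and `tag_for` are hypothetical names, and the cached value is just the mesh name rather than the plugin's real metadata.

```python
class FakeMultiplexer:
    """Hypothetical stand-in for the multiplexer (one method only)."""

    def __init__(self):
        self.data = {}  # run -> {instance_tag: mesh name}

    def PluginRunToTagToContent(self, plugin_name):
        return {run: dict(tags) for run, tags in self.data.items()}


class MeshTagLookup:
    """Per-(run, instance_tag) cache that is populated on a miss."""

    def __init__(self, multiplexer):
        # No multiplexer calls here; the cache fills lazily.
        self._multiplexer = multiplexer
        self._instance_tag_to_tag = {}

    def tag_for(self, run, instance_tag):
        key = (run, instance_tag)
        if key not in self._instance_tag_to_tag:
            # Cache miss: look this pair up in the multiplexer.
            tags = self._multiplexer.PluginRunToTagToContent("mesh").get(run, {})
            if instance_tag not in tags:
                raise KeyError(key)  # genuinely unknown; nothing is cached
            self._instance_tag_to_tag[key] = tags[instance_tag]
        return self._instance_tag_to_tag[key]


mux = FakeMultiplexer()
mux.data["."] = {"mesh1_FACE": "mesh1"}
lookup = MeshTagLookup(mux)
print(lookup.tag_for(".", "mesh1_FACE"))  # mesh1

mux.data["."]["mesh2_FACE"] = "mesh2"  # new tag appears later
print(lookup.tag_for(".", "mesh2_FACE"))  # mesh2, no restart needed
```

Unknown pairs are deliberately not cached, so a tag that shows up after a failed lookup is still found on the next request; hits never touch the multiplexer.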
Note that it's generally best practice not to call into the multiplexer during plugin construction (as currently happens via the prepare_metadata() call at the end of __init__()), since a slow call there delays startup of TensorBoard as a whole.
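A minimal sketch of the deferred pattern, using a hypothetical call-counting multiplexer stand-in and a hypothetical `LazyMeshPlugin` class (neither is TensorBoard code): the constructor only stores a reference, and the first multiplexer call happens when a request is served.

```python
class CountingMultiplexer:
    """Hypothetical stand-in that records how often it is queried."""

    def __init__(self, data):
        self.data = data  # run -> {instance_tag: mesh name}
        self.calls = 0

    def PluginRunToTagToContent(self, plugin_name):
        self.calls += 1
        return {run: dict(tags) for run, tags in self.data.items()}


class LazyMeshPlugin:
    """Hypothetical plugin skeleton that defers all multiplexer work."""

    def __init__(self, multiplexer):
        # Only keep a reference: constructing the plugin stays cheap,
        # so it cannot delay TensorBoard startup.
        self._multiplexer = multiplexer

    def serve_tags(self):
        # The multiplexer is consulted only when a request actually arrives.
        mapping = self._multiplexer.PluginRunToTagToContent("mesh")
        return {run: sorted(tags) for run, tags in mapping.items()}


mux = CountingMultiplexer({".": {"mesh1_FACE": "mesh1"}})
plugin = LazyMeshPlugin(mux)
print(mux.calls)            # 0
print(plugin.serve_tags())  # {'.': ['mesh1_FACE']}
print(mux.calls)            # 1
```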
Hi
Is there any update regarding this issue?
Just to confirm, this is still happening in TB 2.0.0
Many thanks
Hello
Apologies for chasing you on this, but I wanted to report that the same bug is present in TensorBoard 2.1.0.
Would it be possible to get an update on this issue?
It makes it difficult to use TB to track mesh values during training, as every time something is updated the whole TB session may need to be restarted, which also means reloading all the data (which takes time).
I believe this issue deserves a fairly high severity rating, as it strongly affects the usability of TB for its primary intended purpose (i.e., monitoring training).
All the best
Hi @mbahri—we don’t have an update on this issue, no. I’ll see if I can
take a look at it today and at least get back to you.
Took a quick look; some observations:
Replacing the caches with memoized lookup functions works for two of the three caches, but not so easily for _tag_to_instance_tag, which is more like an inverted index, whereas the other two are forward indices.
We could use memoized lookup functions for the two “easy” caches, and for _tag_to_instance_tag use a memoized lookup function that calls PluginRunToTagToContent to determine the universe of active tags, expanding its cache if there are new, never-before-seen tags.
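That approach for the inverted index can be sketched as below. This is a hypothetical model, not the plugin's code: `InstanceTagIndex` maps (run, tag) to a list of instance tags, rescans the active tags on each lookup, and appends only pairs it has never seen. Instance-tag names such as `mesh1_VERTEX` are illustrative.

```python
import collections


class FakeMultiplexer:
    """Hypothetical stand-in for the multiplexer (one method only)."""

    def __init__(self):
        self.data = {}  # run -> {instance_tag: mesh name}

    def PluginRunToTagToContent(self, plugin_name):
        return {run: dict(tags) for run, tags in self.data.items()}


class InstanceTagIndex:
    """Inverted index (run, tag) -> [instance_tag, ...] that only grows."""

    def __init__(self, multiplexer):
        self._multiplexer = multiplexer
        self._index = collections.defaultdict(list)
        self._seen = set()  # (run, instance_tag) pairs already indexed

    def instance_tags_for(self, run, tag):
        # Determine the universe of active tags on every lookup ...
        mapping = self._multiplexer.PluginRunToTagToContent("mesh")
        for r, tags in mapping.items():
            for instance_tag, name in tags.items():
                # ... and extend the index only for never-before-seen ones.
                if (r, instance_tag) not in self._seen:
                    self._seen.add((r, instance_tag))
                    self._index[(r, name)].append(instance_tag)
        return list(self._index.get((run, tag), []))


mux = FakeMultiplexer()
mux.data["."] = {"mesh1_FACE": "mesh1", "mesh1_VERTEX": "mesh1"}
index = InstanceTagIndex(mux)
print(index.instance_tags_for(".", "mesh1"))  # ['mesh1_FACE', 'mesh1_VERTEX']

mux.data["."]["mesh2_FACE"] = "mesh2"
print(index.instance_tags_for(".", "mesh2"))  # ['mesh2_FACE']
```

Because entries are only ever added, a deleted run's instance tags stay in the index.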
Even with these fixes, the caches are still fundamentally broken because they won't handle deletions at all. We could work around this for the two “easy” caches by validating that the tag still exists at each access, and do something similar for the inverted index by revalidating all of its registered tags. This would still be broken, though: if a run is deleted and then re-created with different data without any intervening request to the mesh plugin, the cache dirtying will not be detected.
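The revalidation idea, and the blind spot it leaves, can be sketched as follows. `ValidatingLookup` and `FakeMultiplexer` are hypothetical; the sketch caches content per (run, instance_tag) pair but rechecks that the tag still exists before each use.

```python
class FakeMultiplexer:
    """Hypothetical stand-in for the multiplexer (one method only)."""

    def __init__(self):
        self.data = {}  # run -> {instance_tag: content}

    def PluginRunToTagToContent(self, plugin_name):
        return {run: dict(tags) for run, tags in self.data.items()}


class ValidatingLookup:
    """Caches content per (run, instance_tag), revalidating existence on access."""

    def __init__(self, multiplexer):
        self._multiplexer = multiplexer
        self._cache = {}

    def content_for(self, run, instance_tag):
        key = (run, instance_tag)
        live = self._multiplexer.PluginRunToTagToContent("mesh").get(run, {})
        if instance_tag not in live:
            self._cache.pop(key, None)  # tag was deleted: drop the stale entry
            raise KeyError(key)
        if key not in self._cache:
            self._cache[key] = live[instance_tag]
        # Existence was checked, but the cached content itself was not.
        return self._cache[key]


mux = FakeMultiplexer()
mux.data["."] = {"mesh1_FACE": "old content"}
lookup = ValidatingLookup(mux)
print(lookup.content_for(".", "mesh1_FACE"))  # old content

# Run deleted and re-created with different data, with no lookup in between.
del mux.data["."]
mux.data["."] = {"mesh1_FACE": "new content"}
print(lookup.content_for(".", "mesh1_FACE"))  # old content (stale!)
```

The validation step already fetches the live content, so in this sketch the cache saves nothing while still risking stale results.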
This raises the question of whether it’s worth keeping the caches around at all. We can substantially cheapen the full scans by performing them at the accumulator level: i.e., _tag_to_instance_tags[(run, tag)] need only look at mux.GetAccumulator(run).PluginTagToContent("mesh").
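A minimal sketch of that cache-free, per-run lookup. The fakes only mimic the two methods named above, `GetAccumulator(run)` and `PluginTagToContent(plugin_name)`; it is assumed here that both raise `KeyError` for an unknown run or plugin, and `instance_tags_for` is a hypothetical helper. As before, "content" is simplified to the mesh name.

```python
class FakeAccumulator:
    """Hypothetical stand-in for a per-run event accumulator."""

    def __init__(self, plugin_to_tag_to_content):
        self._plugin_to_tag_to_content = plugin_to_tag_to_content

    def PluginTagToContent(self, plugin_name):
        # Assumed behavior: an unknown plugin name raises KeyError.
        return dict(self._plugin_to_tag_to_content[plugin_name])


class FakeMultiplexer:
    """Hypothetical stand-in exposing only GetAccumulator(run)."""

    def __init__(self, accumulators):
        self._accumulators = accumulators  # run -> FakeAccumulator

    def GetAccumulator(self, run):
        return self._accumulators[run]  # unknown runs raise KeyError


def instance_tags_for(multiplexer, run, tag):
    """Uncached lookup that scans only one run's mesh tags."""
    try:
        tag_to_content = multiplexer.GetAccumulator(run).PluginTagToContent("mesh")
    except KeyError:
        return []  # run deleted, or it has no mesh data (yet)
    # In this sketch "content" is just the mesh name; the real plugin would
    # decode its metadata here.
    return sorted(t for t, name in tag_to_content.items() if name == tag)


mux = FakeMultiplexer({
    ".": FakeAccumulator({"mesh": {"mesh1_FACE": "mesh1", "mesh2_FACE": "mesh2"}}),
})
print(instance_tags_for(mux, ".", "mesh2"))        # ['mesh2_FACE']
print(instance_tags_for(mux, "missing", "mesh1"))  # []
```

With nothing cached, additions, deletions, and re-created runs are all reflected on the next request; the cost is one scan of a single run's mesh tags per lookup.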
Hi @wchargin
Thank you for the quick reply and for having a first look at that issue!
I know you guys have a lot on your plate and that this plugin is only a part of the whole system, so I appreciate you taking the time.
Hopefully, a fix can be found. The mesh plugin is really handy for us researchers working with 3D data :+1: