Tensorboard: TensorBoard not automatically refreshing after 45 / 200 epochs in both Firefox and Google Chrome

Created on 8 Jul 2019  ·  5 Comments  ·  Source: tensorflow/tensorboard

Environment information (required)

Please run diagnose_tensorboard.py (link below) in the same
environment from which you normally run TensorFlow/TensorBoard, and
paste the output here:


Diagnostics output

``````
--- check: autoidentify
INFO: diagnose_tensorboard.py version 393931f9685bd7e0f3898d7dcdf28819fef54c43

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='0e3acfc06151', release='4.16.3-041603-generic', version='#201804190730 SMP Thu Apr 19 07:32:02 UTC 2018', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tb-nightly==1.14.0a20190603
INFO: installed: tensorflow-gpu==2.0.0b0
INFO: installed: tf-estimator-nightly==1.14.0.dev2019060501

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '1.14.0a20190603'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.0.0-beta0'
INFO: tensorflow.__git_version__: 'v1.12.1-3259-gf59745a'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/opt/conda/bin/tensorboard\n'

--- check: readable_fqdn
INFO: socket.getfqdn(): '0e3acfc06151'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: .tensorboard-info directory does not exist

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/opt/conda/lib/python3.7/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.7.1
asn1crypto==0.24.0
astor==0.8.0
attrs==19.1.0
backcall==0.1.0
bleach==3.1.0
certifi==2019.6.16
cffi==1.12.2
chardet==3.0.4
conda==4.7.5
conda-package-handling==1.3.10
cryptography==2.6.1
cycler==0.10.0
decorator==4.4.0
defusedxml==0.5.0
eli5==0.9.0
entrypoints==0.3
gast==0.2.2
google-pasta==0.1.7
graphviz==0.11.1
grpcio==1.22.0
h5py==2.9.0
idna==2.8
imageio==2.5.0
ipykernel==5.1.1
ipython==7.6.1
ipython-genutils==0.2.0
jedi==0.14.0
Jinja2==2.10.1
joblib==0.13.2
json5==0.8.4
jsonschema==3.0.1
jupyter-client==5.2.4
jupyter-core==4.4.0
jupyterlab==1.0.1
jupyterlab-server==1.0.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
libarchive-c==2.8
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.1.1
mistune==0.8.4
nbconvert==5.5.0
nbformat==4.4.0
networkx==2.3
notebook==5.7.8
numpy==1.16.4
olefile==0.46
pandas==0.24.2
pandocfilters==1.4.2
parso==0.5.0
patsy==0.5.1
pexpect==4.7.0
pickleshare==0.7.5
Pillow==6.1.0
pip==19.0.3
prometheus-client==0.7.1
prompt-toolkit==2.0.9
protobuf==3.8.0
ptyprocess==0.6.0
PubChemPy==1.0.4
py2cytoscape==0.7.1
pycairo==1.18.0
pycosat==0.6.3
pycparser==2.19
pydot==1.4.1
pydotplus==2.0.2
Pygments==2.4.2
pyOpenSSL==19.0.0
pyparsing==2.4.0
pyrsistent==0.15.2
PySocks==1.6.8
python-dateutil==2.8.0
python-igraph==0.7.1.post7
pytz==2019.1
PyWavelets==1.0.3
pyzmq==18.0.2
requests==2.22.0
ruamel-yaml==0.15.46
scikit-image==0.15.0
scikit-learn==0.21.2
scipy==1.3.0
seaborn==0.9.0
Send2Trash==1.5.0
setuptools==41.0.0
singledispatch==3.4.0.3
six==1.12.0
src==0.1.0
statsmodels==0.10.0
tabulate==0.8.3
tb-nightly==1.14.0a20190603
tensorflow-gpu==2.0.0b0
termcolor==1.1.0
terminado==0.8.2
testpath==0.4.2
tf-estimator-nightly==1.14.0.dev2019060501
tornado==6.0.3
tqdm==4.32.2
traitlets==4.3.2
typing==3.6.4
urllib3==1.24.1
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.4
wheel==0.33.1
wrapt==1.11.2
``````

For browser-related issues, please additionally specify:

Chrome: Version 74.0.3729.131 (Official Build) (64-bit)
Firefox: 67.0.4 (64-bit)

Issue description

Posting for a colleague and providing as much detail as possible.
They have a tf.keras model with some log summaries. It logs successfully up to 11, 45, or 200 epochs, and then TensorBoard stops refreshing automatically. Monitoring the HTTP requests in the browser inspector shows that update calls are made every 30 seconds and return status code 200, but the response contains only the 11/45/200 epochs at which the display froze. If the tensorboard process is killed and restarted, all the data appears.

This suggests to me that whatever function reads the log files somehow stops picking up new records as they are added?

They have tried all combinations of TF 1.14, 1.15 and 2.0 with TB 1.4 and 1.15.

TF 2.0.0a0 and TB 1.14.0a20190301 seem to work...

When using TB 1.15 they see:

W0708 10:40:08.985069 140053402789632 plugin_event_accumulator.py:294] Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
E0708 10:40:15.037355 140053402789632 directory_watcher.py:242] File /mnt/project/data/logs/model/test_4/train/events.out.tfevents.1562574945.e5513778500b.1.134.v2 updated even though the current file is /mnt/project/data/logs/model/test_4/train/events.out.tfevents.1562574948.e5513778500b.profile-empty
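
One plausible reading of the error above is that the log reader walks the event files in sorted order and does not return to a file once it has moved on to a later one. The sketch below is a toy model of that idea only, not TensorBoard's actual directory_watcher.py; ForwardOnlyWatcher and poll are hypothetical names, and the two file names are copied from the log line with the directories dropped.

```python
# File names copied from the log line above (directories dropped).
EVENTS_FILE = "events.out.tfevents.1562574945.e5513778500b.1.134.v2"
PROFILE_FILE = "events.out.tfevents.1562574948.e5513778500b.profile-empty"


class ForwardOnlyWatcher:
    """Toy model: read files in sorted order and never go back (assumption)."""

    def __init__(self):
        self.current = None  # file the watcher is currently reading

    def poll(self, filenames):
        """Advance to the last file in sorted order; return files left behind."""
        ordered = sorted(filenames)
        if ordered and (self.current is None or ordered[-1] > self.current):
            self.current = ordered[-1]
        # Anything sorting before the current file is no longer being read.
        return [name for name in ordered if name < self.current]


watcher = ForwardOnlyWatcher()
watcher.poll([EVENTS_FILE])                          # only the training file exists
skipped = watcher.poll([EVENTS_FILE, PROFILE_FILE])  # profile file appears later
print("current:", watcher.current)
print("left behind:", skipped)
```

In this model the profile-empty file sorts after the training events file because its timestamp (1562574948) is larger than 1562574945, so the training file is the one left behind even though it is still being written.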

The relevant issue, https://github.com/tensorflow/tensorboard/pull/2342, is already resolved. I would appreciate any insights.

backend awaiting response bug

All 5 comments

Hi @SumNeuron, thanks for the report. You may be running into #2084.
Could you try adding profile_batch=0 to your Keras callback and seeing
if that fixes the problem?

@wchargin first, thanks for your time and insight. What exactly is profile_batch=0? I do not see it in the definitions of the Keras callbacks.

It's a keyword argument to the keras.callbacks.TensorBoard callback:

https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/TensorBoard?hl=en#arguments

profile_batch: Profile the batch to sample compute characteristics.
By default, it will profile the second batch. Set profile_batch=0 to
disable profiling.

(This was introduced in TensorFlow 1.14, so you won't see it in
documentation from earlier versions.)
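
For readers who want to try the suggestion, here is a minimal sketch of passing profile_batch=0 to the callback. It assumes TensorFlow 1.14 or later; the helper names tensorboard_callback_kwargs and make_tensorboard_callback are hypothetical, and the log directory is whatever your training script already uses.

```python
def tensorboard_callback_kwargs(log_dir):
    """Keyword arguments for the TensorBoard callback with profiling disabled."""
    # profile_batch=0 turns off the default profiling of the second batch.
    return {"log_dir": log_dir, "profile_batch": 0}


def make_tensorboard_callback(log_dir):
    """Build tf.keras.callbacks.TensorBoard with profiling disabled."""
    # Imported lazily so the helper above can be used without TensorFlow installed.
    import tensorflow as tf

    return tf.keras.callbacks.TensorBoard(**tensorboard_callback_kwargs(log_dir))
```

The callback would then be passed to training in the usual way, for example model.fit(x, y, epochs=200, callbacks=[make_tensorboard_callback(log_dir)]), where model, x, y, and log_dir come from your own code.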

Ah, well @wchargin, this seems to resolve the issue.

Glad to hear it, and thanks for posting back to let us know.
