Tensorboard: TensorBoard requires GPU memory to run

Created on 26 Apr 2018 · 4 Comments · Source: tensorflow/tensorboard

  • TensorBoard version: 1.8.0a20180416
  • TensorFlow version: nightly-1.8.0a20180416
  • OS: Ubuntu 17.10
  • Python version: 3.6

I get the error below when starting TensorBoard while a separate TensorFlow process is training a model and has already claimed all available GPU memory.

2018-04-26 13:45:55.803143: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-04-26 13:45:55.960363: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-04-26 13:45:55.960654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.835
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 7.94MiB
2018-04-26 13:45:55.960668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-04-26 13:45:55.995201: E tensorflow/core/common_runtime/direct_session.cc:154] Internal: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
Traceback (most recent call last):
  File "/home/kyle/Code/ML/env/bin/tensorboard", line 11, in <module>
    sys.exit(run_main())
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/main.py", line 36, in run_main
    tf.app.run(main)
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/main.py", line 45, in main
    default.get_assets_zip_provider())
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/program.py", line 171, in main
    tb = create_tb_app(plugins, assets_zip_provider)
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/program.py", line 207, in create_tb_app
    flags=FLAGS)
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/backend/application.py", line 131, in standard_tensorboard_wsgi
    plugin_instances = [constructor(context) for constructor in plugins]
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/backend/application.py", line 131, in <listcomp>
    plugin_instances = [constructor(context) for constructor in plugins]
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/plugins/beholder/beholder_plugin.py", line 47, in __init__
    self.most_recent_frame = im_util.get_image_relative_to_script('no-data.png')
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 254, in get_image_relative_to_script
    return read_image(filename)
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 242, in read_image
    return np.array(decode_png(image_file.read()))
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 159, in __call__
    self._lazily_initialize()
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorboard/plugins/beholder/im_util.py", line 137, in _lazily_initialize
    self._session = tf.Session(graph=graph, config=config)
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1569, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 639, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/home/kyle/Code/ML/env/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

The log shows that session creation failed because the GPU was out of memory (note freeMemory: 7.94MiB). This seems strange, since TensorBoard should not need a GPU to process log files.

All 4 comments

This will be fixed as soon as https://github.com/tensorflow/tensorboard/pull/1114 is submitted. Thank you for your patience.

Marking as a duplicate of #1107 in the interim.

Same problem. Is there a flag for specifying which GPU TensorBoard should use?

xxx@worker1:~$ tensorboard -helpfull

   USAGE: /usr/local/bin/tensorboard [flags]

flags:

tensorboard.plugins.debugger.debugger_plugin_loader:
--debugger_data_server_grpc_port: The port at which the non-interactive debugger data server should receive debugging data via gRPC from one or more debugger-enabled TensorFlow runtimes. No debugger plugin or debugger data server will be started if this flag is not provided. This flag differs from the --debugger_port flag
in that it starts a non-interactive mode. It is for use with the "health pills" feature of the Graph Dashboard. This flag is mutually exclusive with --debugger_port.
(default: '-1')
(an integer)
--debugger_port: The port at which the interactive debugger data server (to be started by the debugger plugin) should receive debugging data via gRPC from one or more debugger-enabled TensorFlow runtimes. No debugger plugin or debugger data server will be started if this flag is not provided. This flag differs from the
--debugger_data_server_grpc_port flag in that it starts an interactive mode that allows user to pause at selected nodes inside a TensorFlow Graph or between Session.runs. It is for use with the interactive Debugger Dashboard. This flag is mutually exclusive with --debugger_data_server_grpc_port.
(default: '-1')
(an integer)

tensorboard.program:
--db: [Experimental] Sets SQL database URI.

This mode causes TensorBoard to persist experiments to a SQL database. The
following databases are supported:

- sqlite: Use SQLite built in to Python. URI must specify the path of the
database file, which will be created if it doesn't exist. For example:
--db sqlite:~/.tensorboard.db

Warning: This feature is a work in progress and only has limited support.
(default: '')

--event_file: The particular event file to query for. Only used if --inspect is present and --logdir is not specified.
(default: '')
--host: What host to listen to. Defaults to serving on all interfaces, set to 127.0.0.1 (localhost) to disable remote access (also quiets security warnings).
(default: '')
--[no]inspect: Use this flag to print out a digest
of your event files to the command line, when no data is shown on TensorBoard or
the data shown looks weird.

Example usages:
tensorboard --inspect --event_file myevents.out
tensorboard --inspect --event_file myevents.out --tag loss
tensorboard --inspect --logdir mylogdir
tensorboard --inspect --logdir mylogdir --tag loss

See tensorflow/python/summary/event_file_inspector.py for more info and
detailed usage.
(default: 'false')

--logdir: logdir specifies the directory where
TensorBoard will look to find TensorFlow event files that it can display.
TensorBoard will recursively walk the directory structure rooted at logdir,
looking for .tfevents. files.

You may also pass a comma separated list of log directories, and TensorBoard
will watch each directory. You can also assign names to individual log
directories by putting a colon between the name and the path, as in

tensorboard --logdir name1:/path/to/logs/1,name2:/path/to/logs/2
(default: '')

--path_prefix: An optional, relative prefix to the path, e.g. "/path/to/tensorboard". resulting in the new base url being located at localhost:6006/path/to/tensorboard under default settings. A leading slash is required when specifying the path_prefix, however trailing slashes can be omitted. The path_prefix can be leveraged
for path based routing of an elb when the website base_url is not available e.g. "example.site.com/path/to/tensorboard/"
(default: '')
--port: What port to serve TensorBoard on.
(default: '6006')
(an integer)
--[no]purge_orphaned_data: Whether to purge data that may have been orphaned due to TensorBoard restarts. Disabling purge_orphaned_data can be used to debug data disappearance.
(default: 'true')
--reload_interval: How often the backend should load more data, in seconds. Set to 0 to load just once at startup and a negative number to never reload at all.
(default: '5')
(an integer)
--tag: The particular tag to query for. Only used if --inspect is present
(default: '')
--window_title: The title of the browser window.
(default: '')

tensorflow.python.platform.app:
-h,--[no]help: show this help
(default: 'false')
--[no]helpfull: show full help
(default: 'false')
--[no]helpshort: show this help
(default: 'false')

absl.flags:
--flagfile: Insert flag definitions from the given file into the command line.
(default: '')
--undefok: comma-separated list of flag names that it is okay to specify on the command line even if the program does not define a flag with that name. IMPORTANT: flags in this list that have arguments MUST use the --flag=value format.
(default: '')
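
The help output above lists no GPU-related flag. One possible workaround, which is an assumption on my part and not something confirmed in this thread, is to hide the GPU from the TensorBoard process with the CUDA_VISIBLE_DEVICES environment variable, so that the TensorFlow session it creates should not try to allocate GPU memory. The helper names below (cpu_only_env, tensorboard_command, launch_tensorboard) are hypothetical; only the --logdir and --port flags come from the help text.

```python
import os
import subprocess


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all CUDA devices hidden."""
    env = dict(os.environ if base_env is None else base_env)
    # An empty CUDA_VISIBLE_DEVICES tells the CUDA runtime to expose no GPUs
    # to the process that inherits this environment.
    env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def tensorboard_command(logdir, port=6006):
    """Build a TensorBoard command line from flags shown in the help text."""
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]


def launch_tensorboard(logdir, port=6006):
    """Start TensorBoard as a child process that cannot see the GPU."""
    return subprocess.Popen(tensorboard_command(logdir, port), env=cpu_only_env())
```

Calling launch_tensorboard("mylogdir") from a script is roughly equivalent to running CUDA_VISIBLE_DEVICES="" tensorboard --logdir mylogdir in a shell. The training process is unaffected because the variable is set only for the TensorBoard child process.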

@roadjiang We believe this is fixed in TensorBoard 1.9. Is that the version you're using? If not, could you try updating?
