Tensorboard: Graph visualization failed: GraphDefs failed to reconcile.

Created on 1 Mar 2019 · 11 Comments · Source: tensorflow/tensorboard

In TensorFlow v2, the code below can cause a GraphDef reconciliation error.

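import tensorflow as tf

writer = tf.summary.create_file_writer("logs")  # log directory name is arbitrary

@tf.function
def foo(x):
  return x ** 2

with writer.as_default():
  tf.summary.trace_on()
  foo(1)
  foo(2)  # second call traces a new graph while the same trace is still active
  tf.summary.trace_export("foo", step=0)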

Depending on the argument, tf.function (really, AutoGraph) creates ops whose names are unique within a GraphDef but not globally unique. In the example above, two GraphDefs (one from foo(1) and another from foo(2)) will be written out, and they can collide badly in names and content.

In such a case, instead of showing wrong graph content, TensorBoard throws an error.
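To see why two GraphDefs end up in one trace, it can help to count how often tf.function traces the Python body. The sketch below is only an illustration: `trace_log` is a hypothetical list added for counting, and it relies on the documented behavior that Python side effects run at trace time only.

```python
import tensorflow as tf

trace_log = []  # hypothetical helper list: grows only while the body is being traced


@tf.function
def foo(x):
    trace_log.append("traced")  # Python side effect: runs at trace time, not on every call
    return x ** 2


foo(1)  # traces a graph specialized for the Python value 1
foo(2)  # a different Python value triggers a second trace
foo(1)  # reuses the graph already traced for 1
python_traces = len(trace_log)

foo(tf.constant(1))  # traces once for a scalar int32 tensor
foo(tf.constant(2))  # same dtype and shape, so that graph is reused
tensor_traces = len(trace_log) - python_traces

print(python_traces, tensor_traces)
```

Passing tensors instead of Python scalars reduces the number of traced graphs, though whether that alone avoids the TensorBoard error was not verified in this thread.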

All 11 comments

I get the same error when I have multiple "@tf.function". I am working on a distributed learning project across multiple GPUs. I have one @tf.function for the train loop, and another for the test loop.

with strategy.scope():
    @tf.function
    def distributed_train_step(dataset_inputs):
          (...)
    @tf.function
    def distributed_test_step(dataset_inputs):
          (...)

    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    logdir = 'logs/func/%s' % stamp
    writer = tf.summary.create_file_writer(logdir)
    tf.summary.trace_on(graph=True)

.
.
.
.

with writer.as_default():
    tf.summary.trace_export(name="my_func_trace",step=0)

How are you invoking your tf.functions? Are they writing to the same writer? If so, this is working as intended. Two tf.functions have GraphDefs that may contain nodes with the same name but different types or metadata.

@stephanwlee If we use multiple GPUs with train_step and test_step tf.functions, how should we resolve this problem? I am facing the same problem, with the errors below.

Traceback (most recent call last):
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graph_util.py", line 118, in combine_graph_defs
    lambda n: n.name)
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graph_util.py", line 85, in _safe_copy_proto_list_values
    raise _SameKeyDiffContentError(key)
tensorboard.plugins.graph.graph_util._SameKeyDiffContentError: sparse_categorical_crossentropy/Shape

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graphs_plugin.py", line 225, in graph_route
    result = self.graph_impl(run, tag, is_conceptual, limit_attr_size, large_attrs_key)
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graphs_plugin.py", line 169, in graph_impl
    graph_util.combine_graph_defs(graph, func_graph.pre_optimization_graph)
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graph_util.py", line 124, in combine_graph_defs
    'but contents are different: %s') % exc)
ValueError: Cannot combine GraphDefs because nodes share a name but contents are different: sparse_categorical_crossentropy/Shape

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/werkzeug/serving.py", line 304, in run_wsgi
    execute(self.server.app)
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/werkzeug/serving.py", line 292, in execute
    application_iter = app(environ, start_response)
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/backend/application.py", line 164, in wrapper
    return wsgi_app(*args)
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/backend/application.py", line 419, in __call__
    return self.exact_routes[clean_path](environ, start_response)
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/werkzeug/wrappers/base_request.py", line 237, in application
    resp = f(*args[:-2] + (request,))
  File "/home/usr1/.virtualenvs/tensorflow-2.1/lib/python3.7/site-packages/tensorboard/plugins/graph/graphs_plugin.py", line 227, in graph_route
    return http_util.Respond(request, e.message, 'text/plain', code=400)
AttributeError: 'ValueError' object has no attribute 'message'
E1204 12:24:01.568600 140642150078208 directory_watcher.py:242] File model_dir/logs-new/20191204-122321/events.out.tfevents.1575429801.pc-01.36116.63.v2 updated even though the current file is model_dir/logs-new/20191204-122321/events.out.tfevents.1575429824.pc-01.profile-empty

Any update?

The error message AttributeError: 'ValueError' object has no attribute 'message' isn't very helpful. There's a bug in the error output:
https://github.com/tensorflow/tensorboard/blob/1780833b30d953509200bf9560be2ba42fabe9ff/tensorboard/plugins/graph/graphs_plugin.py#L323
should be:

return http_util.Respond(request, str(e), 'text/plain', code=400)
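For context, the AttributeError arises because Python 3 exceptions have no `.message` attribute. A minimal sketch in plain Python (no TensorBoard code involved; the message text is shortened for illustration):

```python
try:
    raise ValueError("Cannot combine GraphDefs because nodes share a name")
except ValueError as e:
    has_message_attr = hasattr(e, "message")  # Python 3 exceptions have no .message
    text = str(e)  # portable way to get the message text

print(has_message_attr)
print(text)
```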

However, that only gets us a step closer. Running the original code, the actual error message (which TensorBoard _should_ propagate to the UI, but doesn't) is: Cannot combine GraphDefs because nodes share a name but contents are different: Const

As @stephanwlee mentioned, this is a GraphDef naming collision.
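The collision can be pictured with a simplified model. This is a sketch only, not TensorBoard's actual implementation: each graph is assumed to be a plain dict from node name to node content, `combine_nodes` and `SameKeyDiffContentError` are hypothetical names, and the node values are made up for illustration.

```python
class SameKeyDiffContentError(ValueError):
    """Raised when two graphs define the same node name with different content."""


def combine_nodes(dst, src):
    """Merge src nodes into a copy of dst, keyed by node name."""
    merged = dict(dst)
    for name, content in src.items():
        if name in merged and merged[name] != content:
            # Same name, different content: refuse to guess which one is right.
            raise SameKeyDiffContentError(name)
        merged[name] = content
    return merged


# Illustrative stand-ins for the graphs traced from foo(1) and foo(2).
graph_foo_1 = {"Const": {"op": "Const", "value": 1}}
graph_foo_2 = {"Const": {"op": "Const", "value": 4}}

try:
    combine_nodes(graph_foo_1, graph_foo_2)
    collision = None
except SameKeyDiffContentError as exc:
    collision = str(exc)

print(collision)
```

Merging a graph with an identical copy of itself succeeds in this model, which is why only traces with differing content under the same name are a problem.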

I think the simplest workaround is to call trace_on/trace_export separately around each graph call, like this:

import tensorflow as tf

writer = tf.summary.create_file_writer('ex_logs')

@tf.function
def foo(x):
    return x ** 2

with writer.as_default():
    tf.summary.trace_on()
    foo(1)
    tf.summary.trace_export("foo1", step=0)

with writer.as_default():
    tf.summary.trace_on()
    foo(2)
    tf.summary.trace_export("foo2", step=0)

Note that trace_export will also stop tracing (https://www.tensorflow.org/api_docs/python/tf/summary/trace_on?version=stable)

This ensures that each trace is tagged separately. Tracing is a debugging tool for visualizing the network graph, so it makes sense to profile just a single call of the graph. I'd imagine you wouldn't want to leave tracing on while training anyway, as profiling is expensive.

This official tutorial in Colab returns an error when I choose the keras or batch_2 tag:
[screenshot]
The Download PNG button doesn't work either:
[screenshot]

same problem

I would suggest exporting them as different traces with different names. That seems to work for me.

Instead of this:

with writer.as_default():
  tf.summary.trace_on()
  foo(1)
  foo(2)
  tf.summary.trace_export("foo", step=0)

Do this:

with writer.as_default():
  tf.summary.trace_on()
  foo(1)
  tf.summary.trace_export("foo1", step=0)  # exports and stops the first trace
  tf.summary.trace_on()
  foo(2)
  tf.summary.trace_export("foo2", step=0)  # second trace gets its own name

I can hardly recognize the location of the error in my code.

any update about this issue? It has been more than one year since the issue was put forward 😢

I had the same issue. TensorBoard needs unique names to be given to the graph variables (I don't know why, and I hope this issue will be fixed). In your case, this piece of code should fix it:

import tensorflow as tf 

@tf.function
def foo(x):
  return x ** 2

writer = tf.summary.create_file_writer('logs\\')
with writer.as_default():
  tf.summary.trace_on()
  foo(tf.Variable(1, name='foo1')) # define a unique name for the variable
  foo(tf.Variable(2, name='foo2'))
  tf.summary.trace_export("foo", step=0)

This issue also occurs when subclassing tf.Module. In that case, self.name_scope (or tf.name_scope) can be used when defining the module variables (whether or not it also wraps the other operations). Here is an example of a custom Dense layer:

import numpy as np
import tensorflow as tf


class Dense(tf.Module):
    """Fully-connected layer."""

    def __init__(self, out_fmaps, name=None):
        super().__init__(name=name)
        self.is_built = False
        self.out_fmaps = out_fmaps

    def __call__(self, x):
        if not self.is_built:
            # Create the variable under this module's name scope on the first call.
            with self.name_scope:
                he_init = np.sqrt(2 / x.shape[-1])
                init_val = tf.random.normal([x.shape[-1], self.out_fmaps]) * he_init
                self.w = tf.Variable(init_val, name='dense')
            self.is_built = True
        return tf.matmul(x, self.w)
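To check that the name scope actually ends up in the variable names, a reduced example may help. This is a sketch with a hypothetical `Scale` module (not the Dense layer above) and assumed module names `first` and `second`:

```python
import tensorflow as tf


class Scale(tf.Module):
    """Hypothetical minimal module holding a single variable."""

    def __init__(self, name=None):
        super().__init__(name=name)
        with self.name_scope:  # the variable is created under "<module name>/"
            self.w = tf.Variable(1.0, name="w")

    def __call__(self, x):
        return x * self.w


first = Scale(name="first")
second = Scale(name="second")

# Both variables are called "w", but the module scope keeps their full names apart.
print(first.w.name)
print(second.w.name)
```

Giving each module instance its own name is what keeps the full variable names distinct here.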