TensorBoard: How to use the debugger plugin to set up communication with the TF debugger over gRPC?

Created on 29 Aug 2017  ·  6 Comments  ·  Source: tensorflow/tensorboard

Is there currently a way to use the debugger plugin for (two-way?) communication with the TF debugger over gRPC?
If so, could you post some example code or a guide here? Thanks.

debugger

All 6 comments

BTW, I ran the test script session_debug_test.py in tensorboard/plugins/debugger, but something looks wrong in the output (maybe I ran it the wrong way), as shown below:

$ python session_debug_test.py
..session_debug_test.py:200: RuntimeWarning: divide by zero encountered in true_divide
  "u:0:DebugNumericSummary": [x_init_val / y_init_val],
session_debug_test.py:202: RuntimeWarning: divide by zero encountered in true_divide
  np.matmul(z_init_val, x_init_val / y_init_val)
session_debug_test.py:112: RuntimeWarning: invalid value encountered in less
  np.sum(np.logical_and(x < 0.0, x != -np.inf)),
session_debug_test.py:114: RuntimeWarning: invalid value encountered in greater
  np.sum(np.logical_and(x > 0.0, x != np.inf)),
/Users/sunkai/work/workspace/project/tf-env/lib/python2.7/site-packages/tensorflow/python/framework/test_util.py:776: RuntimeWarning: invalid value encountered in subtract
  np.abs(a - b) > atol + rtol * np.abs(b), np.isnan(a) != np.isnan(b))
/Users/sunkai/work/workspace/project/tf-env/lib/python2.7/site-packages/tensorflow/python/framework/test_util.py:776: RuntimeWarning: invalid value encountered in greater
  np.abs(a - b) > atol + rtol * np.abs(b), np.isnan(a) != np.isnan(b))
not close where =  (array([], dtype=int64),)
not close lhs =  []
not close rhs =  []
not close dif =  []
not close tol =  []
dtype = float64, shape = (16,)
..
----------------------------------------------------------------------
Ran 4 tests in 4.060s

OK

@caisq

( @luchensk: I edited your second post to put code formatting around the code that you pasted. For future posts, you can do this by placing three backticks—```—on their own line before and after the code block.)

@luchensk As you can see at the bottom, the test passed, so everything is actually fine. The things that look like errors are intentional. In this test we deliberately create infinities through division by zero in order to exercise the infinity- and NaN-alerting features of the still-nascent TensorBoard debugger plugin, which uses grpc_debug_server.py (in tensorflow) for two-way communication.
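The division-by-zero behaviour described above can be imitated without TensorFlow or NumPy. The following is only a rough pure-Python sketch, not the plugin's code: `ieee_divide` and `count_categories` are hypothetical helper names, and the categories are inferred from the checks visible in the test output above (for example `x < 0.0, x != -np.inf`).

```python
import math


def ieee_divide(x, y):
    """Divide like NumPy does for floats: no exception on a zero divisor."""
    if y != 0.0:
        return x / y
    if x == 0.0 or math.isnan(x):
        return math.nan  # 0/0 is undefined
    # Nonzero over zero: infinity whose sign combines both operands' signs.
    return math.copysign(math.inf, x) * math.copysign(1.0, y)


def count_categories(values):
    """Count NaN, -inf, finite negative, zero, finite positive and +inf."""
    counts = dict.fromkeys(
        ["nan", "neg_inf", "negative", "zero", "positive", "pos_inf"], 0)
    for v in values:
        if math.isnan(v):
            key = "nan"
        elif math.isinf(v):
            key = "pos_inf" if v > 0 else "neg_inf"
        elif v < 0:
            key = "negative"
        elif v > 0:
            key = "positive"
        else:
            key = "zero"
        counts[key] += 1
    return counts


# Illustrative inputs (not the values used in session_debug_test.py).
x = [2.0, -3.0, 0.0, 4.0]
y = [0.0, 0.0, 0.0, 2.0]
u = [ieee_divide(a, b) for a, b in zip(x, y)]
counts = count_categories(u)
print(counts)
```

With these inputs the quotient holds one +inf, one -inf, one NaN and one finite positive value, which is the kind of mix the warnings in the test log point to.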

We haven't publicized the new plugin yet, because we are still adding features to it. For now, the two-way communication is basically working, with the caveat that the tensorboard-to-tensorflow (i.e., grpc-server-to-grpc-client) direction of data flow doesn't do much. We'll check in more documentation on the communication protocol, but for now the best examples are the ones in the session_debug_test.py you are looking at, along with these two tests in tensorflow:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/debug/lib/session_debug_grpc_test.py
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/debug/lib/dist_session_debug_grpc_test.py

Note that all of this is subject to change!

@caisq Thanks for the help, and I will try it later.

@caisq Based on session_debug_grpc_test.py, I used grpc_wrapper.GrpcDebugWrapperSession to try to send debugger data to TensorBoard. However, apart from the health pills on the 'GRAPHS' tab, no debugger information is shown, such as the tensor data that LocalCLIDebugWrapperSession displays in local CLI mode.

I have posted the key part of my gRPC debugger script below. Running the script with the wrapper also slows down training.

:(some other code)

with tf.name_scope("loss"):
    loss = tf.reduce_mean(
        tf.reduce_sum(tf.square(ys - lpred), reduction_indices=[1]),
        name='loss')
    base_summary.f_summary_add_to_collection(loss)

with tf.name_scope("train"):
    model_train = tf.train.AdamOptimizer(0.0001).minimize(loss)

init = tf.global_variables_initializer()
sess = session.Session(config=no_rewrite_session_config())
sess.run(init)

:(some other code)

debug_url = "127.0.0.1:8008"
sess = grpc_wrapper.GrpcDebugWrapperSession(sess, debug_url)

for i in range(arg_maxtrain):
    vout = sess.run(model_train, feed_dict={xs: x_data, ys: y_data})

    if i % arg_nsummary == 0:
        if arg_ifsummary:
            summary, voutloss = sess.run([summary_merged, loss], feed_dict={xs: x_data, ys: y_data})
            summary_writer.add_summary(summary, i)
        print "step:", i, "loss:", voutloss
        saver.save(sess, path_save + "save_%d" % (i))

:(some other code)

For now, can we see more debugger information over gRPC in TensorBoard, as in local CLI mode? Please also correct me if something is wrong in my script. Thanks.

After I changed the line to sess = grpc_wrapper.GrpcDebugWrapperSession(sess, debug_url, watch_fn=watch_fn), adding a watch_fn with debug_ops=["DebugNumericSummary"], more information appeared in the events.debugger file generated by TensorBoard. However, TensorBoard seems to display only the counts of inf/NaN values and so on, without the detailed tensor data, such as the inputs and outputs of a node on the GRAPHS tab (you can see the detailed information by calling the health_pills API).
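For readers who want to reproduce the watch_fn change described above, here is a minimal sketch for TensorFlow 1.x. It is not the poster's actual code: the module path `tensorflow.python.debug.wrappers.framework` and the `WatchOptions` class are assumed from the tfdbg sources of that period and may differ between releases, and `wrap_for_grpc_debugging` is a hypothetical helper name. The imports sit inside the functions only so that the snippet can be loaded without TensorFlow installed.

```python
# Sketch for TensorFlow 1.x only; module paths are assumptions taken from
# the tfdbg sources of that era and may differ in other releases.
DEBUG_OPS = ["DebugNumericSummary"]  # the debug op named in the comment above


def watch_fn(fetches, feeds):
    """Called by the wrapper before each run(); says what to watch."""
    # Imported lazily so this module loads even without TensorFlow.
    from tensorflow.python.debug.wrappers import framework
    del fetches, feeds  # this sketch uses the same options on every run
    return framework.WatchOptions(debug_ops=DEBUG_OPS)


def wrap_for_grpc_debugging(sess, debug_url="127.0.0.1:8008"):
    """Wrap an existing session so watched tensors stream to debug_url."""
    from tensorflow.python.debug.wrappers import grpc_wrapper
    return grpc_wrapper.GrpcDebugWrapperSession(
        sess, debug_url, watch_fn=watch_fn)
```

In the script above, the call would replace the plain `GrpcDebugWrapperSession(sess, debug_url)` line, for example `sess = wrap_for_grpc_debugging(sess, debug_url)`.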

@caisq Is there a plan to display the detailed tensor data from the health_pills API in TensorBoard? Thanks.
