It seems that Keras versions newer than 2.1.6 (e.g. 2.2.0) cannot produce profile data; 2.1.6 works fine.
Both use the same code:
```python
with tf.contrib.tfprof.ProfileContext('/home/abc/train_dir',
                                      trace_steps=range(10, 20),
                                      dump_steps=[22]) as pctx:
```
The model I use is Mask R-CNN, which contains Keras code.
I then found that commit 24e7ac0756aeba06ce1cf7f7e72219a11b14b101 introduces the issue.
I have created a temporary patch by reverting part of that commit, which is attached:
keras_profile.txt
Profiling output is then available again.
Any official fix?
Thanks!
Pan, this won't be any help with your problem, but I was wondering: how are you using the ProfileContext with Keras? If I have a large block of code and know that Keras training with the TF backend occurs somewhere in it, can I simply put the whole block inside a ProfileContext and expect it to emit the profile information? I am seeing a similar thing with no output, but I am likely doing something wrong.
Yes, your approach is right. Put things like training or `fit_generator` inside the `with ... as pctx:` block:

```python
with tf.contrib.tfprof.ProfileContext('/home/abc/train_dir',
                                      trace_steps=range(10, 20),
                                      dump_steps=[22]) as pctx:
    self.keras_model.fit_generator(
        train_generator,
        ......
```
Could you try with Keras 2.1.6? It should produce output.
@carrondt Any updates?
@pandaoxin I have just run your code on my computer with Keras 2.1.6 and got output as you said. Thank you for filing this ticket and providing feedback.
I wasn't seeing output when I didn't have 'dump_steps' and 'trace_steps' set, but now I do.
@carrondt @pandaoxin Some files are now generated. One more observation:

```python
with tf.contrib.tfprof.ProfileContext(profile_dir, trace_steps, dump_steps, enabled=True) as pctx:
    # Run online profiling with 'op' view and 'opts' options at certain steps.
    pctx.add_auto_profiling('op', opts, dump_steps)
    ...
```

profile_dir must already exist before invoking the lines above.
How do I visualize these binary files? Loading them in _chrome://tracing_ doesn't work.
@carrondt @Nithanaroy
In the TensorFlow source dir, build the profiler tool:

```shell
bazel build --config opt tensorflow/core/profiler:profiler
```

Then run it and inspect the result:

```shell
bazel-bin/tensorflow/core/profiler/profiler --profile_path=xxx
```

Here `--profile_path` is the generated profile file.
Thank you @pandaoxin. I generated the profiles using

```python
with tf.contrib.tfprof.ProfileContext(profile_dir, trace_steps, dump_steps, enabled=True) as pctx:
    # Run online profiling with 'op' view and 'opts' options at certain steps.
    pctx.add_auto_profiling('op', opts, dump_steps)
```

which generated two binary files. My end goal is to identify how much of the total execution time is spent on CPU / GPU versus IO. I can't seem to find any way to get these numbers in the tf profiler READMEs.
@Nithanaroy Do you mean time duration per iteration?
No; there is no total time across all ops in the output, but you can estimate it by dividing the first op's time by its ratio.
It's more precise to modify the model script to print it.
The total time per iteration I am already calculating in Python; I want to know its breakdown.
Say the total time for 10 iterations = 5 min.
I want to know how much of those 5 min is spent on the CPU and how much on IO gathering data from disk.
For the above example, say 2 min on CPU and 4 min on IO; 4 + 2 > 5 is possible because multiple processors work in parallel.
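The 4 + 2 > 5 arithmetic is consistent whenever CPU work and IO overlap in time; a small sketch of the bookkeeping, using the hypothetical numbers from this example:

```python
# Hypothetical measurements for 10 iterations (minutes); not from a real profile.
cpu_min = 2   # time the CPU was busy
io_min = 4    # time spent reading data from disk
wall_min = 5  # total elapsed wall-clock time

# Because IO and CPU work can run concurrently (e.g. input prefetching),
# their sum may exceed the wall time; the excess is the overlapped time.
overlap_min = cpu_min + io_min - wall_min
print(overlap_min)  # -> 1
```

So in this example, CPU work and IO overlapped for 1 of the 5 minutes.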
I think running

```shell
~/tensorflow/bazel-bin/tensorflow/core/profiler/profiler --profile_path=profile_generated
```

and then, at the interactive prompt,

```
tfprof> op -select micros,bytes,occurrence -order_by micros -max_depth 20
```

will give you some information.
@carrondt Any updates?
Thank you @pandaoxin. This is exactly what I'm looking for.
However, it prints the results to stdout, and when I try to output a timeline to visualize these numbers in chrome://tracing, it says the following. Is there a way to visualize these numbers? To explain better, I'm looking for a step-time graph as shown here.

```
tfprof> op -select micros,bytes,occurrence -order_by micros -max_depth 20 -output timeline:outfile=memory_profiles_chart.json
op view doesn't support timeline yet. Consider graph/scope/code view.
```
Also, the output has node_name entries which I did not create directly but which were created by tf.keras. How do I get numbers at the layer level?
```
node name | requested bytes | total execution time | accelerator execution time | cpu execution time | op occurrence (run|defined)
ReadVariableOp 0B (0.00%, 0.00%), 42us (100.00%, 63.64%), 0us (0.00%, 0.00%), 42us (100.00%, 63.64%), 6|1174
VarHandleOp 0B (0.00%, 0.00%), 14us (36.36%, 21.21%), 0us (0.00%, 0.00%), 14us (36.36%, 21.21%), 6|268
_retval_batch_normalization_1_2/moving_variance/Read/ReadVariableOp_0_3 0B (0.00%, 0.00%), 2us (15.15%, 3.03%), 0us (0.00%, 0.00%), 2us (15.15%, 3.03%), 1|1
_retval_batch_normalization_1_2/beta/Read/ReadVariableOp_0_0 0B (0.00%, 0.00%), 2us (12.12%, 3.03%), 0us (0.00%, 0.00%), 2us (12.12%, 3.03%), 1|1
```
And does (total execution time − CPU execution time − accelerator execution time) give the time spent fetching data into memory?
Could this be related to #11106 ?
@Nithanaroy You can refer to
https://www.tensorflow.org/api_docs/python/tf/profiler/ProfileOptionBuilder
to add a timeline (with_timeline_output).
Accelerator execution time is the GPU's time.
Thanks @pandaoxin
I still do not see the timeline output with this setup :(
```python
opts = (tf.profiler.ProfileOptionBuilder(tf.profiler.ProfileOptionBuilder.time_and_memory())
        .with_timeline_output("./data/time-memory-profile-timeline.json")
        .build())

with tf.contrib.tfprof.ProfileContext(profile_dir, trace_steps=trace_steps, dump_steps=dump_steps, enabled=True) as pctx:
    # Run online profiling with 'op' view and 'opts' options at certain steps.
    pctx.add_auto_profiling('op', opts, dump_steps)
```

Even after setting with_timeline_output in the options-building phase, it only generates the binary files in the profile_dir folder.
Just to reset the context: I'm trying to find how much time was spent on IO, how much on the CPU, and how much on the GPU during the training process.
Hi @Nithanaroy, I'm not too familiar with this timeline setup, so I may try your workaround when I have free time~
Hi @Nithanaroy, I have tried your workload. Please replace 'op' with 'graph' in add_auto_profiling.
@carrondt @Nithanaroy I found that if `run_metadata=tf.RunMetadata()` is added when compiling the model, both Keras 2.2.0+ and 2.1.6 can generate profiles. Related PR
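A minimal sketch of that workaround (TF 1.x style; the Sequential model below is a placeholder, and the forwarding of `options`/`run_metadata` compile kwargs to the underlying `Session.run` is assumed here, not confirmed from this thread):

```python
import tensorflow as tf

# RunMetadata object that the backend fills with trace data on each run.
run_metadata = tf.RunMetadata()

# Placeholder model; any compiled Keras model would do.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Passing run_metadata at compile time is the workaround described above;
# with it, the profiler can collect per-step trace data again.
model.compile(optimizer='sgd',
              loss='mse',
              options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
              run_metadata=run_metadata)
```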
@pandaoxin I changed to 'graph' and I can now see some operations on each thread. However, the results do not quite make sense: the training ran for 15 minutes, but the profile information reported is on the order of microseconds. There are a total of 118 steps per epoch of training.
I'm using these settings to generate the binary profile files,
```
trace_steps[:100] : [5, 11, 12, 37, 41, 43, 45, 54, 73, 86, 95]
dump_steps[:100] : [5, 12, 41, 45, 73, 95]
```
And here is the code:

```python
opts4 = (tf.profiler.ProfileOptionBuilder(tf.profiler.ProfileOptionBuilder.time_and_memory())
         .with_timeline_output("./data/time-memory-profile-timeline.json")
         .build())

with tf.contrib.tfprof.ProfileContext(profile_dir, trace_steps=trace_steps, dump_steps=dump_steps, enabled=True) as pctx:
    # Run online profiling with 'graph' view and 'opts4' options at certain steps.
    pctx.add_auto_profiling('graph', opts4, dump_steps)
    h = model.fit(
        training_set.make_one_shot_iterator(),
        steps_per_epoch=TOTAL_TRAIN_RECORDS // _BATCH_SIZE,
        epochs=2)
```
Here are some questions:
1. The opts4 output, i.e. the time-memory-profile-timeline.json file, is empty when loaded into chrome://tracing. Why?
2. In the tfprof command line, why do we see only a few operators (batch norm and Adam only) in the trace?
3. dump_steps was [5, 12, 41, 45, 73, 95], but I only see steps 5 and 12. Why?

Sorry for asking too many questions; I'm just too confused by tfprof.
Hi @pandaoxin,
Can the pctx context-block trick be applied to predict() as well? I want to profile Keras inference calls.
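For concreteness, a minimal sketch of what I have in mind (the directory, step numbers, and the `keras_model` / `test_inputs` names are placeholders, not from this thread):

```python
# Wrap inference in the same ProfileContext pattern used for training above,
# hoping that predict() runs are traced and dumped like fit() steps.
with tf.contrib.tfprof.ProfileContext('/tmp/predict_profile',
                                      trace_steps=range(1, 5),
                                      dump_steps=[5]) as pctx:
    predictions = keras_model.predict(test_inputs)
```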