It seems that Keras versions newer than 2.1.6 (e.g. 2.2.0) cannot produce profile data; 2.1.6 works fine.
Both use the same code:
```python
with tf.contrib.tfprof.ProfileContext('/home/abc/train_dir',
                                      trace_steps=range(10, 20),
                                      dump_steps=[22]) as pctx:
```
The model I use is Mask R-CNN, which contains Keras code.
I then found that commit 24e7ac0756aeba06ce1cf7f7e72219a11b14b101 introduces the issue.
I have created a temporary patch by reverting part of that commit, which is attached:
keras_profile.txt
Profiling output is then available again.
Any official fix?
Thanks!
Pan, this won't be any help with your problem, but I was wondering: how are you using the ProfileContext with Keras? If I have a large block of code and know that Keras training with the TF backend occurs somewhere in it, can I simply put the whole block inside a ProfileContext and expect it to emit the profile information? I am seeing a similar thing with no output, but I am likely doing something wrong.
Yes, your approach is right. Put things like training or `fit_generator` inside the `with ... as pctx:` block:

```python
with tf.contrib.tfprof.ProfileContext('/home/abc/train_dir',
                                      trace_steps=range(10, 20),
                                      dump_steps=[22]) as pctx:
    self.keras_model.fit_generator(
        train_generator,
        ......
```
Could you try with Keras 2.1.6? It should produce output.
@carrondt Any updates?
@pandaoxin I have just run your code on my computer with Keras 2.1.6 and got output as you said. Thank you for filing this ticket and providing feedback.
I wasn't seeing output when I didn't have 'dump_steps' and 'trace_steps' set, but now I do.
@carrondt @pandaoxin Some files are now generated. One more observation:

```python
with tf.contrib.tfprof.ProfileContext(profile_dir, trace_steps, dump_steps, enabled=True) as pctx:
    # Run online profiling with 'op' view and 'opts' options at certain steps.
    pctx.add_auto_profiling('op', opts, dump_steps)
    ...
```

profile_dir must already exist before invoking the lines above.
How do I visualize these binary files? Loading them in _chrome://tracing_ doesn't work.
@carrondt @Nithanaroy
In the TensorFlow source dir, build the profiler tool:

```shell
bazel build --config opt tensorflow/core/profiler:profiler
```

Then run it and inspect the result:

```shell
bazel-bin/tensorflow/core/profiler/profiler --profile_path=xxx
```

Here `--profile_path` is the generated profile file.
Thank you @pandaoxin. I generated the profiles using

```python
with tf.contrib.tfprof.ProfileContext(profile_dir, trace_steps, dump_steps, enabled=True) as pctx:
    # Run online profiling with 'op' view and 'opts' options at certain steps.
    pctx.add_auto_profiling('op', opts, dump_steps)
```

which generated two binary files. My end goal is to identify how much of the total execution time is spent on CPU / GPU versus IO. I can't seem to find any way to get these numbers in the tf profiler READMEs.
@Nithanaroy Do you mean time duration per iteration?
No; there is no total time across all ops in the output, but you can estimate it by dividing the first op's time by its ratio.
It's more precise to modify the model script to print it.
The total time per iteration I am already calculating in Python; I want to know its breakdown.
Say the total time for 10 iterations = 5 min.
I want to know how much of those 5 min is spent on the CPU and how much on IO gathering data from disk.
For the above example, say 2 min on CPU and 4 min on IO; 4 + 2 > 5 is possible because multiple processors work in parallel.
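The 4 + 2 > 5 arithmetic is consistent whenever CPU work and IO overlap in time; a small sketch of the bookkeeping, using the hypothetical numbers from this example:

```python
# Hypothetical measurements for 10 iterations (minutes); not from a real profile.
cpu_min = 2   # time the CPU was busy
io_min = 4    # time spent reading data from disk
wall_min = 5  # total elapsed wall-clock time

# Because IO and CPU work can run concurrently (e.g. input prefetching),
# their sum may exceed the wall time; the excess is the overlapped time.
overlap_min = cpu_min + io_min - wall_min
print(overlap_min)  # -> 1
```

So in this example, CPU work and IO overlapped for 1 of the 5 minutes.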
I think running

```shell
~/tensorflow/bazel-bin/tensorflow/core/profiler/profiler --profile_path=profile_generated
```

and then, at the interactive prompt,

```
tfprof> op -select micros,bytes,occurrence -order_by micros -max_depth 20
```

will give you some information.
@carrondt Any updates?
Thank you @pandaoxin. This is exactly what I'm looking for.
However, it prints the results to stdout, and when I try to output a timeline to visualize these numbers in chrome://tracing, it says the following. Is there a way to visualize these numbers? To explain better, I'm looking for a step-time graph as shown here.

```
tfprof> op -select micros,bytes,occurrence -order_by micros -max_depth 20 -output timeline:outfile=memory_profiles_chart.json
op view doesn't support timeline yet. Consider graph/scope/code view.
```
Also, the output has node_name entries which I did not create directly but which were created by tf.keras. How do I get numbers at the layer level?
```
node name | requested bytes | total execution time | accelerator execution time | cpu execution time | op occurrence (run|defined)
ReadVariableOp 0B (0.00%, 0.00%), 42us (100.00%, 63.64%), 0us (0.00%, 0.00%), 42us (100.00%, 63.64%), 6|1174
VarHandleOp 0B (0.00%, 0.00%), 14us (36.36%, 21.21%), 0us (0.00%, 0.00%), 14us (36.36%, 21.21%), 6|268
_retval_batch_normalization_1_2/moving_variance/Read/ReadVariableOp_0_3 0B (0.00%, 0.00%), 2us (15.15%, 3.03%), 0us (0.00%, 0.00%), 2us (15.15%, 3.03%), 1|1
_retval_batch_normalization_1_2/beta/Read/ReadVariableOp_0_0 0B (0.00%, 0.00%), 2us (12.12%, 3.03%), 0us (0.00%, 0.00%), 2us (12.12%, 3.03%), 1|1
```
And does (total execution time − CPU execution time − accelerator execution time) give the time spent fetching data into memory?
Could this be related to #11106 ?
@Nithanaroy You can refer to
https://www.tensorflow.org/api_docs/python/tf/profiler/ProfileOptionBuilder
to add a timeline (with_timeline_output).
Accelerator execution time is the GPU's time.
Thanks @pandaoxin
I still do not see the timeline output with this setup :(
```python
opts = (tf.profiler.ProfileOptionBuilder(tf.profiler.ProfileOptionBuilder.time_and_memory())
        .with_timeline_output("./data/time-memory-profile-timeline.json")
        .build())

with tf.contrib.tfprof.ProfileContext(profile_dir, trace_steps=trace_steps, dump_steps=dump_steps, enabled=True) as pctx:
    # Run online profiling with 'op' view and 'opts' options at certain steps.
    pctx.add_auto_profiling('op', opts, dump_steps)
```

Even after setting with_timeline_output in the options-building phase, it only generates the binary files in the profile_dir folder.
Just to reset the context: I'm trying to find how much time was spent on IO, how much on the CPU, and how much on the GPU during the training process.
Hi @Nithanaroy, I'm not too familiar with this timeline setup, so I may try your workaround when I have free time~
Hi @Nithanaroy, I have tried your workload. Please replace 'op' with 'graph' in add_auto_profiling.
@carrondt @Nithanaroy I found that if `run_metadata=tf.RunMetadata()` is added when compiling the model, both Keras 2.2.0+ and 2.1.6 can generate profiles. Related PR
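A minimal sketch of that workaround (TF 1.x style; the Sequential model below is a placeholder, and the forwarding of `options`/`run_metadata` compile kwargs to the underlying `Session.run` is assumed here, not confirmed from this thread):

```python
import tensorflow as tf

# RunMetadata object that the backend fills with trace data on each run.
run_metadata = tf.RunMetadata()

# Placeholder model; any compiled Keras model would do.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Passing run_metadata at compile time is the workaround described above;
# with it, the profiler can collect per-step trace data again.
model.compile(optimizer='sgd',
              loss='mse',
              options=tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE),
              run_metadata=run_metadata)
```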
@pandaoxin I changed to 'graph' and I can now see some operations on each thread. However, the results do not quite make sense: the training ran for 15 minutes, but the profile information reported is on the order of microseconds. There are a total of 118 steps per epoch of training.
I'm using these settings to generate the binary profile files,
```
trace_steps[:100] : [5, 11, 12, 37, 41, 43, 45, 54, 73, 86, 95]
dump_steps[:100] : [5, 12, 41, 45, 73, 95]
```
And here is the code:

```python
opts4 = (tf.profiler.ProfileOptionBuilder(tf.profiler.ProfileOptionBuilder.time_and_memory())
         .with_timeline_output("./data/time-memory-profile-timeline.json")
         .build())

with tf.contrib.tfprof.ProfileContext(profile_dir, trace_steps=trace_steps, dump_steps=dump_steps, enabled=True) as pctx:
    # Run online profiling with 'graph' view and 'opts4' options at certain steps.
    pctx.add_auto_profiling('graph', opts4, dump_steps)
    h = model.fit(
        training_set.make_one_shot_iterator(),
        steps_per_epoch=TOTAL_TRAIN_RECORDS // _BATCH_SIZE,
        epochs=2)
```
Here are some questions:
1. The opts4 output, i.e. the time-memory-profile-timeline.json file, is empty when loaded into chrome://tracing. Why?
2. In the tfprof command line, why do we see only a few operators (batch norm and Adam only) in the trace?
3. dump_steps was [5, 12, 41, 45, 73, 95], but I only see steps 5 and 12. Why?

Sorry for asking too many questions; I'm just too confused by tfprof.
Hi @pandaoxin,
Can the pctx context-block trick be applied to predict() as well? I want to profile Keras inference calls.
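For concreteness, a minimal sketch of what I have in mind (the directory, step numbers, and the `keras_model` / `test_inputs` names are placeholders, not from this thread):

```python
# Wrap inference in the same ProfileContext pattern used for training above,
# hoping that predict() runs are traced and dumped like fit() steps.
with tf.contrib.tfprof.ProfileContext('/tmp/predict_profile',
                                      trace_steps=range(1, 5),
                                      dump_steps=[5]) as pctx:
    predictions = keras_model.predict(test_inputs)
```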