Ignite: Improve or New time profiling utility

Created on 3 Oct 2020 · 10Comments · Source: pytorch/ignite

🚀 Feature

Currently, ignite provides only a basic time profiling tool : https://pytorch.org/ignite/master/contrib/handlers.html#ignite.contrib.handlers.time_profilers.BasicTimeProfiler

Which can print something like

----------------------------------------------------
| Time profiling stats (in seconds):                 |
 ----------------------------------------------------
total  |  min/index  |  max/index  |  mean  |  std

Processing function:
157.46292 | 0.01452/1501 | 0.26905/0 | 0.07730 | 0.01258

Dataflow:
6.11384 | 0.00008/1935 | 0.28461/1551 | 0.00300 | 0.02693

Event handlers:
2.82721

- Events.STARTED: []
0.00000

- Events.EPOCH_STARTED: []
0.00006 | 0.00000/0 | 0.00000/17 | 0.00000 | 0.00000

- Events.ITERATION_STARTED: ['PiecewiseLinear']
0.03482 | 0.00001/188 | 0.00018/679 | 0.00002 | 0.00001

- Events.ITERATION_COMPLETED: ['TerminateOnNan']
0.20037 | 0.00006/866 | 0.00089/1943 | 0.00010 | 0.00003

- Events.EPOCH_COMPLETED: ['empty_cuda_cache', 'training.<locals>.log_elapsed_time', ]
2.57860 | 0.11529/0 | 0.14977/13 | 0.12893 | 0.00790

- Events.COMPLETED: []
not yet triggered

What would be nice to have:

more compact representation (maybe inspire from https://pytorch.org/docs/stable/autograd.html#torch.autograd.profiler.profile)
profile handlers and not events. As we are looking for bottlenecks it would be helpful to know which functions take time and not on which event.
better representation of data prep time vs training time

enhancement help wanted

Source

vfdev-5

👍1

All 10 comments

Hi @vfdev-5, I would love to work on this. Could you please assign it to me?

harsh8398 on 5 Oct 2020

@harsh8398 sure ! Thanks !

vfdev-5 on 5 Oct 2020

👍1

Hello @vfdev-5, Sorry for lagging on this. So, I have some questions:

profile handlers and not events. As we are looking for bottlenecks it would be helpful to know which functions take time and not on which event.

We want to add profiling for event handlers attached to engine/events instead of events? Essentially this would be break-down of time taken by handlers attached to events if I understand correctly.

better representation of data prep time vs training time

Is this something we can achieve based on events? For example, aggregate diff between GET_BATCH_STARTED and GET_BATCH_COMPLETED?

harsh8398 on 17 Oct 2020

Hi @harsh8398

We want to add profiling for event handlers attached to engine/events instead of events? Essentially this would be break-down of time taken by handlers attached to events if I understand correctly.

Yes, something like that. From the user's point of view, showing

- Events.EPOCH_COMPLETED: ['empty_cuda_cache', 'training.<locals>.log_elapsed_time', ]
2.57860 | 0.11529/0 | 0.14977/13 | 0.12893 | 0.00790

that handlers like empty_cuda_cache and training.<locals>.log_elapsed_time have taken together 0.12893 seconds on average does not tell who takes that time. As we can have one slow and another fast.
Naively, i'd list all handlers for all events and measure time for the handlers and display a table with handlers and time. Maybe something like that (but properly aligned):

-------
| name | total (s) | mean (s) | std(s) |
empty_cuda_cache (EPOCH_COMPLETED)                                |   2.57860 | 0.12893 | 0.00790
training.<locals>.log_elapsed_time (EPOCH_COMPLETED)      |  2.57860 |  0.12893 | 0.00790
-------

Is this something we can achieve based on events? For example, aggregate diff between GET_BATCH_STARTED and GET_BATCH_COMPLETED?

Either like that or keep the previous way to measure. Pick the one you think the simplest.

vfdev-5 on 17 Oct 2020

👍1

I'm done with profiler implementation. The output looks as follows:

You can review the changes and suggest if any modifications required but there's still some TODOs left before raising the PR. Which are as follows:

separate rows for Total, Dataflow and Processing stats
type hints
update write_results() as per new profiler stats format
update tests

I have one question, should we include handler times for GET_BATCH_STARTED and GET_BATCH_COMPLETED for calculating dataflow time? Previous implementation had this but I'm not sure if it was intentional.

harsh8398 on 20 Oct 2020

👍1

@harsh8398 table looks very cool 👍 Thanks a lot for working on that !
If you'd like to send a draft, please feel free to do that.

I'm just thinking about BC compatibility and maybe it would be better to keep 2 classes and name your version as TimeProfiler or HandlersTimeProfiler or another better name.

I have one question, should we include handler times for GET_BATCH_STARTED and GET_BATCH_COMPLETED for calculating dataflow time? Previous implementation had this but I'm not sure if it was intentional.

Probably, it would be cool ! And even (in another PR) if we could also handle customly added events and measure the time in attached handlers.

vfdev-5 on 20 Oct 2020

@vfdev-5 Okay then for BC it seems the best option is to add a new class as HandlersTimeProfiler. I'll do that.

For handling custom events, its a bit tricky because like epoch and iteration event handlers we don't know how many times they will get triggered. We can use something like this (https://discuss.pytorch.org/t/dynamically-extend-the-tensor/39553/2) for dynamically extending tensor whenever the handler is called. But I think it would get called so many times and it may consume time.

harsh8398 on 21 Oct 2020

@vfdev-5 Okay then for BC it seems the best option is to add a new class as HandlersTimeProfiler. I'll do that.

@harsh8398 sounds good !

For handling custom events, its a bit tricky because like epoch and iteration event handlers we don't know how many times they will get triggered. We can use something like this (https://discuss.pytorch.org/t/dynamically-extend-the-tensor/39553/2) for dynamically extending tensor whenever the handler is called. But I think it would get called so many times and it may consume time.

Let's see this a bit later. I think handling custom events should not be too different from original events. Engine is aware of all events it could trigger and which handlers...
If you could open a follow-up issue on that, it would be perfect :)

vfdev-5 on 21 Oct 2020

👍1

@vfdev-5 raised the draft PR (https://github.com/pytorch/ignite/pull/1398). Let me know if any suggestions or changes needed.

harsh8398 on 23 Oct 2020

@harsh8398 thanks, I'll try to take a look today.

vfdev-5 on 23 Oct 2020

Was this page helpful?

0 / 5 - 0 ratings