Pytorch-lightning: Profiling

Created on 26 Sep 2019 · 6 Comments · Source: PyTorchLightning/pytorch-lightning

Is your feature request related to a problem? Please describe.
I realized my testing loop is not efficient at all. I need to understand where the bottleneck is and how I can make it faster.

Describe the solution you'd like
An option similar to fast_dev_run where the training, validation, or testing loop is profiled, in order to see where the bottleneck is.

Describe alternatives you've considered
No alternative so far.

enhancement help wanted

astariul-colanim

All 6 comments

@Colanim great suggestion. I know the PyTorch team has been thinking about something like this; maybe @ezyang or @soumith have some suggestions? I'm hesitant to add something Lightning-specific for this. It might be more appropriate inside PyTorch.

williamFalcon on 26 Sep 2019

How about torch.utils.bottleneck? https://pytorch.org/docs/stable/bottleneck.html?highlight=bottleneck

soumith on 26 Sep 2019

Awesome. We'll point to this in our docs as well so people know it's available.

williamFalcon on 26 Sep 2019

For future reference.

I couldn't use torch.utils.bottleneck; it gave me an OOM error.

I ended up using this SO answer:

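import cProfile
import functools

def profileit(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        datafn = func.__name__ + ".profile"  # name the data file after the function
        prof = cProfile.Profile()
        retval = prof.runcall(func, *args, **kwargs)
        prof.dump_stats(datafn)  # rewritten on every call, so only the last call is kept
        return retval

    return wrapper

@profileit
def function_you_want_to_profile(*args, **kwargs):
    ...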

So I could just do:

@profileit
def test_step(self, batch, batch_nb, dataloader_nb=None):
    ...
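One caveat with the decorator as written: it creates a fresh profiler and rewrites the .profile file on every call, so for a test_step that runs once per batch, only the last batch ends up in the file. Below is a minimal sketch of a variant that accumulates statistics across calls, assuming you write the file yourself once the loop is done. The names profile_accumulated, profiler and dump_stats are made up for this example; they are not part of Lightning or of the SO answer.

```python
import cProfile
import functools


def profile_accumulated(func):
    """Profile every call to func with one shared profiler (hypothetical helper)."""
    prof = cProfile.Profile()
    datafn = func.__name__ + ".profile"

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # runcall enables the profiler, calls func, then disables it again,
        # so statistics from successive calls add up in the same object.
        return prof.runcall(func, *args, **kwargs)

    # Attributes added for this sketch only: the profiler itself, and a
    # helper that writes everything collected so far to "<name>.profile".
    wrapper.profiler = prof
    wrapper.dump_stats = functools.partial(prof.dump_stats, datafn)
    return wrapper


@profile_accumulated
def square(x):
    return x * x


for i in range(5):
    square(i)

square.dump_stats()  # writes square.profile covering all five calls
```

With a decorated test_step, the equivalent would be to call the dump_stats helper once after the test loop has finished.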

And then I visualized it using snakeviz:

snakeviz test_step.profile

which gives a neat visualization.
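If you only need a quick text summary (for example on a remote machine without a browser), the standard library's pstats module can read the same kind of file. A small sketch, where busy_work and the file name busy_work.profile are stand-ins invented for the example:

```python
import cProfile
import pstats


def busy_work(n):
    # Stand-in workload for the example (hypothetical, not from the thread).
    return sum(i * i for i in range(n))


# Produce a profile file the same way the profileit decorator does.
prof = cProfile.Profile()
result = prof.runcall(busy_work, 10_000)
prof.dump_stats("busy_work.profile")

# Load the file and print the ten entries with the largest cumulative time.
stats = pstats.Stats("busy_work.profile")
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
```

Sorting by "cumulative" puts the functions whose calls (including callees) took longest at the top, which is usually the first thing to look at when hunting a bottleneck.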

astariul-colanim on 27 Sep 2019

@ian-13 @jeffling, sounds like you were thinking about contributing something like this? Have you tried torch.utils.bottleneck? https://pytorch.org/docs/stable/bottleneck.html?highlight=bottleneck

williamFalcon on 27 Sep 2019

@williamFalcon We've tried torch.utils.bottleneck for deep profiling. But, like @Colanim, we have a similar method that does the same sort of thing, and we found it a bit more stable to use. With it we can also force CUDA syncs to get GPU-bound timings.
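To illustrate why the sync matters: CUDA kernels are launched asynchronously, so a wall-clock timer around a call can stop before the GPU has finished the work. The sketch below is not the tool mentioned in this comment, just one way to do it, assuming torch.cuda.synchronize() is called before and after the timed call. The helper names timed_call and _cuda_sync are hypothetical, and the torch import is optional so the snippet also runs without PyTorch or a GPU.

```python
import time

try:
    import torch  # optional, so the sketch also runs on machines without PyTorch
except ImportError:
    torch = None


def _cuda_sync():
    # Wait for all queued CUDA work to finish; a no-op without PyTorch or a GPU.
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()


def timed_call(func, *args, **kwargs):
    """Return (result, elapsed_seconds), syncing the device before and after."""
    _cuda_sync()  # make sure earlier GPU work is not billed to this call
    start = time.perf_counter()
    result = func(*args, **kwargs)
    _cuda_sync()  # make sure this call's GPU work is actually done
    return result, time.perf_counter() - start


total, seconds = timed_call(sum, [1, 2, 3])
```

Forcing a sync does serialize host and device, so timings taken this way are best used for diagnosis rather than left on in production runs.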

We also have something more lightweight (not using cProfile) so we can always have it on for all our runs. We use it in our testing framework to catch any speed regressions. It's a simple tool that we could contribute if that's within the scope of this framework :)
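For readers wondering what a lightweight, always-on timer might look like: the tool described here is not shown in the thread, so the following is only a guess at the general idea, built on time.perf_counter and a context manager. The class SectionTimer and its methods are hypothetical names.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SectionTimer:
    """Accumulate wall-clock time per named section (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)  # section name -> total seconds
        self.counts = defaultdict(int)    # section name -> number of runs

    @contextmanager
    def section(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the body raised an exception.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean(self, name):
        return self.totals[name] / self.counts[name]

    def summary(self):
        return {name: (self.counts[name], self.totals[name]) for name in self.totals}


timer = SectionTimer()
for _ in range(3):
    with timer.section("forward"):
        sum(range(1000))  # stand-in for the real work
```

A speed regression check could then be an ordinary test that asserts timer.mean("forward") stays under a budget chosen for the project.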