I am running the ImageNet example and noticed there is no "data time" metric as in the original PyTorch example.
How to monitor the data loading time?
Also, it's weird that one epoch becomes 5201; it should be 5005.
I think the profiler can do that, if you turn it on in the Trainer.
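To make it concrete, this is roughly the mechanism behind the profiler: each named action is timed and its durations are collected per action name. A simplified, self-contained sketch (not Lightning's actual implementation):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Durations recorded per action name, like the profiler's internal store.
recorded_durations = defaultdict(list)

@contextmanager
def profile(action_name):
    start = time.monotonic()
    try:
        yield
    finally:
        recorded_durations[action_name].append(time.monotonic() - start)

# Timing a stand-in for fetching a batch from the dataloader:
with profile("get_train_batch"):
    time.sleep(0.01)  # pretend this is the dataloader producing a batch

print(len(recorded_durations["get_train_batch"]))  # prints 1
```

The real profiler wraps actions like `get_train_batch`, `model_forward`, and `model_backward` this way, then summarizes the collected durations in the report.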
Are you referring to the progress bar? I think it is because it includes the validation set.
My understanding is that the profiler only outputs the results after training.
Yes. The progress bar, thanks.
In the profiler you get something like this:
Profiler Report
Action | Mean duration (s) | Total time (s)
-----------------------------------------------------------------
on_train_start | 0.0 | 0.0
on_epoch_start | 0.0 | 0.0
get_train_batch | 0.5312 | 2.656
on_batch_start | 0.0 | 0.0
model_forward | 0.2186 | 1.093
model_backward | 0.2128 | 1.064
on_after_backward | 0.0 | 0.0
optimizer_step | 0.2124 | 1.062
on_batch_end | 0.0032 | 0.016
on_epoch_end | 0.0 | 0.0
on_train_end | 0.0 | 0.0
"get_train_batch" is the loading time of the training batch. If you don't want to wait for these stats for the end of training, just run one epoch with max_epochs=1 or a few steps with max_steps=10 by setting these flags in the Trainer.
Btw, here is what I am doing right now:
def training_step(self, data, batch_idx):
    ...
    # most recent recorded duration for the "get_train_batch" action
    data_time = self.trainer.profiler.recorded_durations["get_train_batch"][-1]
    data_time = torch.tensor(data_time)
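If you'd rather not reach into the profiler, you can also time the gap between the end of one step and the start of the next yourself, similar in spirit to the AverageMeter in the original ImageNet example. A plain-Python sketch (the class name and API here are made up for illustration):

```python
import time

class DataTimeMeter:
    """Records the time between training steps, which is dominated by
    data loading when the dataloader is the bottleneck."""

    def __init__(self):
        self._last_step_end = None
        self.data_times = []

    def step_start(self):
        # Call at the top of training_step: the gap since the previous
        # step ended approximates the batch-loading time.
        now = time.monotonic()
        if self._last_step_end is not None:
            self.data_times.append(now - self._last_step_end)

    def step_end(self):
        # Call at the bottom of training_step.
        self._last_step_end = time.monotonic()

# Simulated loop: loading takes ~10 ms per batch, compute ~1 ms.
meter = DataTimeMeter()
for _ in range(3):
    time.sleep(0.01)   # stand-in for the dataloader fetching a batch
    meter.step_start()
    time.sleep(0.001)  # stand-in for forward/backward
    meter.step_end()

print(len(meter.data_times))  # prints 2 (gaps between 3 steps)
```

The first step has no preceding step to measure from, so for N steps you get N-1 data times.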