Pytorch-lightning: monitor the data loading time

Created on 26 Jun 2020 · 6 comments · Source: PyTorchLightning/pytorch-lightning

I am running the ImageNet example and found that there is no "data time" metric as in the original PyTorch example.
How can I monitor the data loading time?

question


All 6 comments

Hi! Thanks for your contribution, great first issue!

Also, it's weird that one epoch becomes 5201; it should be 5005.

I think the profiler can do that, if you turn it on in the Trainer.
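For reference, turning it on is a one-liner. A minimal sketch, assuming the 0.8-era API where profiler=True selects the built-in SimpleProfiler (newer releases spell it profiler="simple"), and with MyLightningModule as a placeholder for your own module:

```python
import pytorch_lightning as pl

# MyLightningModule is a placeholder for your own LightningModule.
model = MyLightningModule()

# profiler=True enables the built-in SimpleProfiler (0.8-era API);
# newer versions accept profiler="simple" instead.
trainer = pl.Trainer(profiler=True)
trainer.fit(model)
```

At the end of training the profiler prints a report like the one shown below.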

> Also, it's weird that one epoch becomes 5201; it should be 5005.

Are you referring to the progress bar? I think that's because it includes the validation batches.

My understanding is that the profiler only outputs the results after training.

Yes. The progress bar, thanks.

In the profiler you get something like this:

Profiler Report

Action                  |  Mean duration (s)    |  Total time (s) 
-----------------------------------------------------------------
on_train_start          |  0.0              |  0.0            
on_epoch_start          |  0.0              |  0.0            
get_train_batch         |  0.5312           |  2.656          
on_batch_start          |  0.0              |  0.0            
model_forward           |  0.2186           |  1.093          
model_backward          |  0.2128           |  1.064          
on_after_backward       |  0.0              |  0.0            
optimizer_step          |  0.2124           |  1.062          
on_batch_end            |  0.0032           |  0.016          
on_epoch_end            |  0.0              |  0.0            
on_train_end            |  0.0              |  0.0            

"get_train_batch" is the time spent loading the training batch. If you don't want to wait until the end of training for these stats, just run one epoch with max_epochs=1 or a few steps with max_steps=10 by setting these flags in the Trainer.
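The per-action numbers above come from timing each batch fetch. The same measurement can be sketched in plain Python without Lightning (profile_loading and its names are illustrative here, not a Lightning API):

```python
import time
from collections import defaultdict

def profile_loading(loader, num_batches=10):
    # Time each fetch from the loader, analogous to the profiler's
    # "get_train_batch" action.
    durations = defaultdict(list)
    it = iter(loader)
    for _ in range(num_batches):
        start = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break
        durations["get_train_batch"].append(time.perf_counter() - start)
    return durations

# Any iterable works in place of a real DataLoader.
stats = profile_loading(range(100), num_batches=5)
mean_duration = sum(stats["get_train_batch"]) / len(stats["get_train_batch"])
```

With a real torch DataLoader, a large mean fetch time relative to model_forward suggests raising num_workers.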

Btw, here is what I am doing right now:


    def training_step(self, data, batch_idx):
        ...
        # Grab the most recent "get_train_batch" duration recorded by the profiler
        data_time = self.trainer.profiler.recorded_durations["get_train_batch"][-1]
        data_time = torch.tensor(data_time)
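To report a running average rather than only the latest value, a small meter like the one in the original PyTorch ImageNet example can accumulate these durations (a plain-Python sketch; AverageMeter is written out here, not imported from anywhere):

```python
class AverageMeter:
    # Tracks the latest value and the running mean, as in the
    # original PyTorch ImageNet example's data-time meter.
    def __init__(self):
        self.val = 0.0
        self.sum = 0.0
        self.count = 0

    def update(self, val):
        self.val = val
        self.sum += val
        self.count += 1

    @property
    def avg(self):
        return self.sum / self.count if self.count else 0.0

data_time_meter = AverageMeter()
for d in [0.5, 0.3, 0.4]:  # e.g. successive get_train_batch durations
    data_time_meter.update(d)
# data_time_meter.avg is now ~0.4
```

Calling data_time_meter.update(data_time) each training_step then gives both the latest and the average loading time for display.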
