I am running the ImageNet example and noticed there is no "data time" metric as in the original PyTorch example.
How to monitor the data loading time?
Also, it's weird that one epoch becomes 5201; it should be 5005.
I think the profiler can do that, if you turn it on in the Trainer.
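To make it concrete, this is roughly the mechanism behind the profiler: each named action is timed and its durations are collected per action name. A simplified, self-contained sketch (not Lightning's actual implementation):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Durations recorded per action name, like the profiler's internal store.
recorded_durations = defaultdict(list)

@contextmanager
def profile(action_name):
    start = time.monotonic()
    try:
        yield
    finally:
        recorded_durations[action_name].append(time.monotonic() - start)

# Timing a stand-in for fetching a batch from the dataloader:
with profile("get_train_batch"):
    time.sleep(0.01)  # pretend this is the dataloader producing a batch

print(len(recorded_durations["get_train_batch"]))  # prints 1
```

The real profiler wraps actions like `get_train_batch`, `model_forward`, and `model_backward` this way, then summarizes the collected durations in the report.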
Are you referring to the progress bar? I think it is because it includes the validation set.
My understanding is that the profiler only outputs the results after training.
Yes. The progress bar, thanks.
In the profiler you get something like this:
Profiler Report
Action | Mean duration (s) | Total time (s)
-----------------------------------------------------------------
on_train_start | 0.0 | 0.0
on_epoch_start | 0.0 | 0.0
get_train_batch | 0.5312 | 2.656
on_batch_start | 0.0 | 0.0
model_forward | 0.2186 | 1.093
model_backward | 0.2128 | 1.064
on_after_backward | 0.0 | 0.0
optimizer_step | 0.2124 | 1.062
on_batch_end | 0.0032 | 0.016
on_epoch_end | 0.0 | 0.0
on_train_end | 0.0 | 0.0
"get_train_batch" is the loading time of the training batch. If you don't want to wait for these stats for the end of training, just run one epoch with max_epochs=1 or a few steps with max_steps=10 by setting these flags in the Trainer.
Btw, here is what I am doing right now:
def training_step(self, data, batch_idx):
    ...
    # most recent recorded duration for the "get_train_batch" action
    data_time = self.trainer.profiler.recorded_durations["get_train_batch"][-1]
    data_time = torch.tensor(data_time)
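If you'd rather not reach into the profiler, you can also time the gap between the end of one step and the start of the next yourself, similar in spirit to the AverageMeter in the original ImageNet example. A plain-Python sketch (the class name and API here are made up for illustration):

```python
import time

class DataTimeMeter:
    """Records the time between training steps, which is dominated by
    data loading when the dataloader is the bottleneck."""

    def __init__(self):
        self._last_step_end = None
        self.data_times = []

    def step_start(self):
        # Call at the top of training_step: the gap since the previous
        # step ended approximates the batch-loading time.
        now = time.monotonic()
        if self._last_step_end is not None:
            self.data_times.append(now - self._last_step_end)

    def step_end(self):
        # Call at the bottom of training_step.
        self._last_step_end = time.monotonic()

# Simulated loop: loading takes ~10 ms per batch, compute ~1 ms.
meter = DataTimeMeter()
for _ in range(3):
    time.sleep(0.01)   # stand-in for the dataloader fetching a batch
    meter.step_start()
    time.sleep(0.001)  # stand-in for forward/backward
    meter.step_end()

print(len(meter.data_times))  # prints 2 (gaps between 3 steps)
```

The first step has no preceding step to measure from, so for N steps you get N-1 data times.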