I see that the LightningTemplateModel in 0.7.5 (this is no longer the case on master) manually averages the metrics in validation_epoch_end for DP and DDP2:
https://github.com/PyTorchLightning/pytorch-lightning/blob/694f1d789dfa56b365b68dd4f3c6f5f7a4c8970a/pl_examples/models/lightning_template.py#L167-L168
But what about DDP? I get that each device can have its own loss for backward, but we want a single metric across devices. How is that achieved? (And is averaging the best way to aggregate most metrics anyway?)
My guess was that only the train dataloader uses a DistributedSampler, not val/test. In other words, each process evaluates the entire val/test sets and only rank 0 reports (e.g. logs) the metrics. Apparently this used to be the case, but #1192 changed the val/test sets to use DistributedSampler too. So I think some aggregation must be done?
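For concreteness, this is roughly what I imagine manual aggregation would have to look like (just my own sketch, not something from the docs): all_reduce the per-process mean in validation_epoch_end and divide by the world size.

    import torch
    import torch.distributed as dist

    def validation_epoch_end(self, outputs):
        # after #1192 each DDP process only sees its own shard of the val set,
        # so this mean is a per-process value
        val_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        # sum the per-process means and divide by the world size
        # (assumes every process sees roughly the same number of batches)
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(val_loss, op=dist.ReduceOp.SUM)
            val_loss = val_loss / dist.get_world_size()
        return {'val_loss': val_loss, 'log': {'val_loss': val_loss}}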
@alexeykarnachev mind having a look, pls? ^^
I found this:
https://github.com/PyTorchLightning/pytorch-lightning/blob/bd49b07fbba09b1e7d8851ee5a1ffce3d5925e9e/pytorch_lightning/metrics/metric.py#L46-L54
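If I read that right, the idea is to subclass it, implement forward, and let the base class handle the DDP sync. Something like this, I think (the constructor arguments are just my reading of the linked code, so take it as a sketch):

    import torch
    from pytorch_lightning.metrics.metric import TensorMetric

    class RMSE(TensorMetric):
        def __init__(self):
            # name/reduce_op arguments as I understand them from the linked base class
            super().__init__(name='rmse')

        def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
            # the returned tensor should get reduced across DDP processes by the base class
            return torch.sqrt(torch.mean((pred - target) ** 2))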
But if I don't want the overhead of creating a class for simple one-liner metrics, and/or have metrics that can't be easily reduced, is there a way to let the dev/test dataloaders load the entire datasets as pre-#1192? The only way I can think of is to set replace_sampler_ddp=False and manually add the DistributedSampler to the training dataloader, with something like
    def load_dataset(self, mode, batch_size):
        ...  # build the dataset/dataloader here
        if mode == 'train':
            # temporarily re-enable sampler replacement so that only the
            # training dataloader gets wrapped in a DistributedSampler
            self.trainer.replace_sampler_ddp = True
            dataloader = self.trainer.auto_add_sampler(dataloader, True)
            self.trainer.replace_sampler_ddp = False
        return dataloader
This feels kind of hacky though. If there were an option like replace_evaluation_sampler_ddp, it would be much more straightforward.
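For what it's worth, a slightly more direct version of the same workaround would be to keep replace_sampler_ddp=False and attach the DistributedSampler myself instead of going through auto_add_sampler (a sketch; self.train_dataset, self.val_dataset and self.batch_size are placeholder attributes):

    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    def train_dataloader(self):
        # shard only the training set across DDP processes
        sampler = DistributedSampler(self.train_dataset)
        return DataLoader(self.train_dataset, batch_size=self.batch_size, sampler=sampler)

    def val_dataloader(self):
        # every process evaluates the full validation set, as pre-#1192
        return DataLoader(self.val_dataset, batch_size=self.batch_size)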
@ZhaofengWu could you please provide a minimal runnable script that reproduces the problem?
Sorry, but it's not a problem/bug in the code. It's just a question: what's the proper way to aggregate metrics under DDP if we don't want the overhead of subclassing the TensorMetric mentioned above? If "letting the dev/test dataloaders read the entire datasets" is the answer, what's the best way to do that?
I have the same problem
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
My current workaround is to use pl.metrics.converters._sync_ddp_if_available.
You can also use the pl.metrics.converters.sync_ddp decorator, but this means your metric will sync at each forward pass.
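Roughly like this (a sketch; I believe the helper all_reduces with a sum by default and is a no-op outside DDP, but double-check against your version):

    import torch
    import torch.distributed as dist
    from pytorch_lightning.metrics.converters import _sync_ddp_if_available

    def validation_epoch_end(self, outputs):
        # per-process mean over this process's shard of the val set
        val_acc = torch.stack([x['val_acc'] for x in outputs]).mean()
        # sum the per-process values across DDP processes (no-op without DDP)
        val_acc = _sync_ddp_if_available(val_acc)
        if dist.is_available() and dist.is_initialized():
            # turn the sum back into an average over processes
            val_acc = val_acc / dist.get_world_size()
        return {'log': {'val_acc': val_acc}}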
Actually, the lightning_template (https://github.com/PyTorchLightning/pytorch-lightning/blob/7cca3859a7b97a9ab4a6c6fb5f36ff94bff7f218/pl_examples/models/lightning_template.py) doesn't subclass Metric, which, if I understand correctly, means that it only logs the metrics computed on rank 0, and the same goes for the loss.
@aaronma2020 mind providing a minimal running example?
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!