I don't know whether this is a bug...
As shown in the code below, I think the behavior of dp mode is unexpected? (The attribute is reset every batch)
When using ddp mode, everything is fine. (The property will be initialized only once per GPU)
import os
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms
import pytorch_lightning as pl
from pytorch_lightning import Trainer
from argparse import Namespace
class LitModel(pl.LightningModule):
def __init__(self):
super().__init__()
self.l1 = torch.nn.Linear(28 * 28, 10)
self._dummy_property = None
@property
def dummy_propery(self):
if self._dummy_property is None:
self._dummy_property = '*' * 30
print('print only once per gpu')
return self._dummy_property
def forward(self, x):
return torch.relu(self.l1(x.view(x.size(0), -1)))
def training_step(self, batch, batch_idx):
print(self._dummy_property)
# Access every batch
self.dummy_propery
print(self._dummy_property)
x, y = batch
y_hat = self(x)
loss = F.cross_entropy(y_hat, y)
return pl.TrainResult(loss)
def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=0.02)
train_loader = DataLoader(
MNIST(
os.getcwd(),
download=True,
transform=transforms.ToTensor()
),
batch_size=128
)
trainer = pl.Trainer(gpus=2,
distributed_backend='dp',
max_epochs=2)
model = LitModel()
trainer.fit(model, train_loader)
None
print only once per gpu
******************************
None
print only once per gpu
******************************
None
print only once per gpu
******************************
None
print only once per gpu
******************************
...
@yukw777 mind have a look? 🐰
@Borda sure thing! A bit busy right now, but I’ll try to get to it later this week or weekend!
oh, this is actually a known problem and comes from DataParallel in PyTorch itself.
See #565 and #1649 for reference.
@ananyahjha93 has been working on a workaround but it seems to be super non trivial #1895
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!
Most helpful comment
@Borda sure thing! A bit busy right now, but I’ll try to get to it later this week or weekend!