Pytorch-lightning: Wrong AUROC when running training_step in DDP mode

Created on 20 Nov 2020  ·  3 Comments  ·  Source: PyTorchLightning/pytorch-lightning

πŸ› Bug


I run a binary classification model and compute AUC as a performance indicator. The first time I ran the code on a single GPU it worked well, but when I then tried 4 GPUs with the DDP backend, the AUC became very strange: it seemed to just sum the AUCs of the 4 GPUs. I use pl.metrics.AUROC() to compute the AUC, and my PL version is 0.9.0.

Please reproduce using the BoringModel and post here

Here is an example of my code
https://colab.research.google.com/drive/1d-3JTypoQdbPWQFFW_vqkBxDprqIVnFD?usp=sharing

I define a random dataset

import numpy as np
from torch.utils.data import Dataset


class RandomDataset(Dataset):
    def __init__(self):
        self.len = 8
        # 8 one-dimensional samples with binary labels
        self.data = np.array([1, 5, 2, 6, 3, 7, 4, 8], dtype=np.float32).reshape([-1, 1])
        self.label = np.array([1, 1, 0, 0, 0, 1, 0, 0], dtype=np.float32)

    def __getitem__(self, index):
        return self.data[index], self.label[index]

    def __len__(self):
        return self.len

and use seed_everything(42).
In the example I first run on a single GPU with batch size 8 for 1 epoch and get an AUC of 0.5.
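For context, here is a minimal sketch of the LightningModule (an assumption based on the model summary and progress-bar keys in the logs below, not the exact Colab code): a Linear model, pl.metrics.AUROC(), and BCEWithLogitsLoss, logged via the 0.9.0-style dict return. Validation is analogous and omitted here.

import torch
import pytorch_lightning as pl
from torch import nn
from torch.utils.data import DataLoader


class BoringClassifier(pl.LightningModule):
    def __init__(self, batch_size=8):
        super().__init__()
        self.batch_size = batch_size      # 8 in the single-GPU run, 4 with 2 GPUs
        self.model = nn.Linear(1, 1)      # exact layer shape is an assumption
        self.auc = pl.metrics.AUROC()     # the metric this issue is about
        self.loss = nn.BCEWithLogitsLoss()

    def training_step(self, batch, batch_idx):
        x, y = batch
        # roughly produces the "Rank cuda:X batch: ..." lines in the logs below
        print('Rank', x.device, 'batch:', x.flatten(), 'batch_idx:', batch_idx)
        logits = self.model(x).squeeze(-1)
        loss = self.loss(logits, y)
        # in 0.9.x the metric result appears to be summed across DDP processes,
        # which is what produces the auc=1.33 seen in the 2-GPU run below
        auc = self.auc(torch.sigmoid(logits), y)
        print('loss', loss)
        print('auc', auc)
        return {
            'loss': loss,
            'progress_bar': {'train_loss': loss, 'auc': auc},
            'log': {'train_loss': loss, 'auc': auc},
        }

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def train_dataloader(self):
        return DataLoader(RandomDataset(), batch_size=self.batch_size)

The single-GPU output: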

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [0]
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: Could not log computational graph since the `model.example_input_array` attribute is not set or `input_array` was not given
  warnings.warn(*args, **kwargs)

  | Name  | Type              | Params
--------------------------------------------
0 | model | Linear            | 4
1 | auc   | AUROC             | 0
2 | loss  | BCEWithLogitsLoss | 0
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
Validation sanity check: 0it [00:00, ?it/s]
Val part rank cuda:0 batch: tensor([1., 5., 2., 6., 3., 7., 4., 8.], device='cuda:0') batch_idx: 0

loss tensor(24.4544, device='cuda:0')
auc tensor(0.5000, device='cuda:0')
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
Epoch 0:   0%|                                                                                                                                                           | 0/2 [00:00<?, ?it/s]
Rank cuda:0 batch: tensor([1., 5., 2., 6., 3., 7., 4., 8.], device='cuda:0') batch_idx: 0

loss tensor(24.4544, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
auc tensor(0.5000, device='cuda:0')
Epoch 0:  50%|████████████████████                    | 1/2 [00:00<00:00, 130.40it/s, loss=24.454, v_num=20, train_loss=24.5, auc=0.5]
Val part rank cuda:0 batch: tensor([1., 5., 2., 6., 3., 7., 4., 8.], device='cuda:0') batch_idx: 0

loss tensor(24.4479, device='cuda:0')
auc tensor(0.5000, device='cuda:0')
Epoch 0: 100%|████████████████████████████████████████| 2/2 [00:00<00:00, 82.69it/s, loss=24.454, v_num=20, train_loss=24.5, auc=0.5, vla_loss=24.4, val_auc=0.5]
Saving latest checkpoint..
end

Then I use 2 GPUs with batch size 4 for 1 epoch and get an AUC of 1.33.

GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [1,2]
2 GPU with DDP
initializing ddp: GLOBAL_RANK: 1, MEMBER: 2/2
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/2
----------------------------------------------------------------------------------------------------
distributed_backend=ddp
All DDP processes registered. Starting ddp with 2 processes
----------------------------------------------------------------------------------------------------
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: Could not log computational graph since the `model.example_input_array` attribute is not set or `input_array` was not given
  warnings.warn(*args, **kwargs)

  | Name  | Type              | Params
--------------------------------------------
0 | model | Linear            | 4
1 | auc   | AUROC             | 0
2 | loss  | BCEWithLogitsLoss | 0
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  warnings.warn(*args, **kwargs)
Epoch 0:   0%|                                                                                                                                                           | 0/2 [00:00<?, ?it/s]
Rank cuda:1 batch: tensor([1., 6., 7., 4.], device='cuda:1') batch_idx: 0

Rank cuda:1 batch: tensor([3., 8., 2., 5.], device='cuda:1') batch_idx: 0

loss tensor(28.9794, device='cuda:1',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
auc tensor(1.3333, device='cuda:1')

loss tensor(19.9294, device='cuda:1',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
auc tensor(1.3333, device='cuda:1')
Epoch 0:  50%|████████████████████                    | 1/2 [00:00<00:00, 81.04it/s, loss=28.979, v_num=21, train_loss=29, auc=1.33]
Val part rank cuda:1 batch: tensor([5., 6., 7., 8.], device='cuda:1') batch_idx: 0

Val part rank cuda:1 batch: tensor([1., 2., 3., 4.], device='cuda:1') batch_idx: 0

loss tensor(12.5368, device='cuda:1')
auc tensor(0.6667, device='cuda:1')

loss tensor(36.3589, device='cuda:1')
auc tensor(0.6667, device='cuda:1')
Saving latest checkpoint..
end
Epoch 0: 100%|████████████████████████████████████████| 2/2 [00:00<00:00, 51.66it/s, loss=28.979, v_num=21, train_loss=29, auc=1.33, vla_loss=12.5, val_auc=0.667]
end
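A quick sanity check on the numbers (my own illustration, not part of the original output): an AUC above 1 is impossible, and 1.3333 is exactly 2 × 0.6667, which fits the summing of per-process AUCs described in the comments below.

# illustration only: sum-reduction over 2 DDP processes turns a valid
# per-rank AUC of 2/3 into the impossible 1.3333 shown in the progress bar
per_rank_auc = 2.0 / 3.0
num_ddp_processes = 2
print(num_ddp_processes * per_rank_auc)  # 1.333...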

Expected behavior

Compute the correct AUC.

Environment

  • PyTorch Version: 1.6.0
  • OS: Ubuntu 18.04
  • How you installed PyTorch: pip
  • Python version: 3.6
  • CUDA/cuDNN version: 10.2
  • GPU models and configuration: 2080Ti
Labels: Metrics, Working as intended, enhancement, help wanted

All 3 comments

Prior to v1.0, metrics did not do custom accumulation; instead they relied on taking the sum or mean over the processes.
From v1.0, all metrics are implemented with custom accumulation so that we get the correct result when running in DDP mode. However, as of now AUROC has not been updated to the v1.0 interface; it is definitely on the roadmap for the foreseeable future.
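For illustration, the v1.0-style custom accumulation looks roughly like this (a sketch built on the pl.metrics.Metric base class, using sklearn for the final computation; the eventual built-in AUROC may be implemented differently):

import torch
from pytorch_lightning.metrics import Metric
from sklearn.metrics import roc_auc_score


class DDPSafeAUROC(Metric):
    """Sketch of a DDP-safe AUROC via the v1.0 Metric API (not the official implementation)."""

    def __init__(self, dist_sync_on_step=False):
        super().__init__(dist_sync_on_step=dist_sync_on_step)
        # list states with dist_reduce_fx="cat" are concatenated across processes
        self.add_state("preds", default=[], dist_reduce_fx="cat")
        self.add_state("target", default=[], dist_reduce_fx="cat")

    def update(self, preds: torch.Tensor, target: torch.Tensor):
        # each DDP process appends only its own shard
        self.preds.append(preds)
        self.target.append(target)

    def compute(self):
        # after syncing, preds/target hold the data from all processes,
        # so the AUC is computed once over the full dataset
        preds = torch.cat(self.preds) if isinstance(self.preds, list) else self.preds
        target = torch.cat(self.target) if isinstance(self.target, list) else self.target
        return torch.tensor(roc_auc_score(target.cpu().numpy(), preds.cpu().numpy()))

The key difference from the 0.9 behaviour is that the raw predictions and targets are synced across processes rather than the per-process AUC values, so nothing is summed that should not be.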

As stated, you seem to be using an old version of PL. We have reimplemented the whole metrics package, but at the moment AUROC is missing; we would be happy if you want to send a PR 🐰

@SkafteNicki @Borda Thank you, PL is a great training framework for new PyTorch users like me. Looking forward to the new updates!
