I run a binary classification model and compute the AUC as a performance indicator. The first time I ran the code on a single GPU it worked well, but when I tried using 4 GPUs with the DDP backend, the AUC became very weird: it seemed to just sum the AUCs of the 4 GPUs. I use pl.metrics.AUROC() to compute the AUC, and my PL version is 0.9.0.
Here is an example of my code:
https://colab.research.google.com/drive/1d-3JTypoQdbPWQFFW_vqkBxDprqIVnFD?usp=sharing
I define a random dataset:

import numpy as np
from torch.utils.data import Dataset

class RandomDataset(Dataset):
    def __init__(self):
        # 8 one-dimensional samples with binary labels
        self.len = 8
        self.data = np.array([1, 5, 2, 6, 3, 7, 4, 8], dtype=np.float32).reshape([-1, 1])
        self.label = np.array([1, 1, 0, 0, 0, 1, 0, 0], dtype=np.float32)

    def __getitem__(self, index):
        return self.data[index], self.label[index]

    def __len__(self):
        return self.len
and I use seed_everything(42).
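The LightningModule is set up roughly like the sketch below (a simplified, illustrative version rather than the exact notebook code; the class name, optimizer, learning rate, and the sigmoid on the logits are my own placeholders, while the Linear / AUROC / BCEWithLogitsLoss parts match the module summary printed in the logs):

import torch
from torch import nn
import pytorch_lightning as pl

class ToyClassifier(pl.LightningModule):
    # Simplified sketch: one Linear layer, BCEWithLogitsLoss and pl.metrics.AUROC(),
    # matching the layer summary printed by the Trainer below.
    def __init__(self):
        super().__init__()
        self.model = nn.Linear(1, 1)
        self.auc = pl.metrics.AUROC()
        self.loss = nn.BCEWithLogitsLoss()

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self.model(x).squeeze(-1)
        loss = self.loss(logits, y)
        auc = self.auc(torch.sigmoid(logits), y)
        return {'loss': loss, 'log': {'train_loss': loss, 'auc': auc}}

    def validation_step(self, batch, batch_idx):
        x, y = batch
        logits = self.model(x).squeeze(-1)
        return {'val_loss': self.loss(logits, y),
                'val_auc': self.auc(torch.sigmoid(logits), y)}

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

Both runs use the same module; only the batch size and the Trainer arguments change, e.g. Trainer(gpus=1, max_epochs=1) for the first run and Trainer(gpus=2, distributed_backend='ddp', max_epochs=1) for the second.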
In the example, I first use a single GPU with batch size 8 and 1 epoch, and get AUC 0.5:
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [0]
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: Could not log computational graph since the `model.example_input_array` attribute is not set or `input_array` was not given
warnings.warn(*args, **kwargs)
| Name | Type | Params
--------------------------------------------
0 | model | Linear | 4
1 | auc | AUROC | 0
2 | loss | BCEWithLogitsLoss | 0
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
Validation sanity check: 0it [00:00, ?it/s]
Val part rank cuda:0 batch: tensor([1., 5., 2., 6., 3., 7., 4., 8.], device='cuda:0') batch_idx: 0
loss tensor(24.4544, device='cuda:0')
auc tensor(0.5000, device='cuda:0')
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
Epoch 0: 0%| | 0/2 [00:00<?, ?it/s]
Rank cuda:0 batch: tensor([1., 5., 2., 6., 3., 7., 4., 8.], device='cuda:0') batch_idx: 0
loss tensor(24.4544, device='cuda:0',
grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
auc tensor(0.5000, device='cuda:0')
Epoch 0: 50%|█████████████████████████                         | 1/2 [00:00<00:00, 130.40it/s, loss=24.454, v_num=20, train_loss=24.5, auc=0.5]
Val part rank cuda:0 batch: tensor([1., 5., 2., 6., 3., 7., 4., 8.], device='cuda:0') batch_idx: 0
loss tensor(24.4479, device='cuda:0')
auc tensor(0.5000, device='cuda:0')
Epoch 0: 100%|████████████████Saving latest checkpoint..████████████████████████████| 2/2 [00:00<00:00, 85.70it/s, loss=24.454, v_num=20, train_loss=24.5, auc=0.5, vla_loss=24.4, val_auc=0.5]
Epoch 0: 100%|██████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 82.69it/s, loss=24.454, v_num=20, train_loss=24.5, auc=0.5, vla_loss=24.4, val_auc=0.5]
end
Then I use 2 GPUs with batch size 4 and 1 epoch, and get AUC 1.33:
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
CUDA_VISIBLE_DEVICES: [1,2]
2 GPU with DDP
initializing ddp: GLOBAL_RANK: 1, MEMBER: 2/2
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/2
----------------------------------------------------------------------------------------------------
distributed_backend=ddp
All DDP processes registered. Starting ddp with 2 processes
----------------------------------------------------------------------------------------------------
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: Could not log computational graph since the `model.example_input_array` attribute is not set or `input_array` was not given
warnings.warn(*args, **kwargs)
| Name | Type | Params
--------------------------------------------
0 | model | Linear | 4
1 | auc | AUROC | 0
2 | loss | BCEWithLogitsLoss | 0
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
/opt/conda/lib/python3.6/site-packages/pytorch_lightning/utilities/distributed.py:37: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
warnings.warn(*args, **kwargs)
Epoch 0: 0%| | 0/2 [00:00<?, ?it/s]
Rank cuda:1 batch: tensor([1., 6., 7., 4.], device='cuda:1') batch_idx: 0
Rank cuda:1 batch: tensor([3., 8., 2., 5.], device='cuda:1') batch_idx: 0
loss tensor(28.9794, device='cuda:1',
grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
auc tensor(1.3333, device='cuda:1')
loss tensor(19.9294, device='cuda:1',
grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
auc tensor(1.3333, device='cuda:1')
Epoch 0: 50%|██████████████████████████████████████████████████ | 1/2 [00:00<00:00, 81.04it/s, loss=28.979, v_num=21, train_loss=29, auc=1.33]
Val part rank cuda:1 batch: tensor([5., 6., 7., 8.], device='cuda:1') batch_idx: 0
Val part rank cuda:1 batch: tensor([1., 2., 3., 4.], device='cuda:1') batch_idx: 0
loss tensor(12.5368, device='cuda:1')
auc tensor(0.6667, device='cuda:1')
loss tensor(36.3589, device='cuda:1')
auc tensor(0.6667, device='cuda:1')
Saving latest checkpoint..
end
Epoch 0: 100%|████████████████Saving latest checkpoint..███████████████████████████| 2/2 [00:00<00:00, 72.50it/s, loss=28.979, v_num=21, train_loss=29, auc=1.33, vla_loss=12.5, val_auc=0.667]
Epoch 0: 100%|█████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 51.66it/s, loss=28.979, v_num=21, train_loss=29, auc=1.33, vla_loss=12.5, val_auc=0.667]
end
What I expect is to get the real AUC, not the sum over the GPUs.
Prior to v1.0, metrics did not do custom accumulation; instead they relied on taking the sum or mean over the processes, which is why the DDP run here reports an AUC of 1.33: the per-process AUCs are summed rather than computed over the full data.
From v1.0, all metrics are implemented with custom accumulation so that we get the correct result when running in DDP mode. However, AUROC has not yet been ported to the v1.0 interface; it is definitely on the roadmap for the foreseeable future.
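Until AUROC is ported, one workaround is to gather the predictions and targets from all processes yourself and compute the metric on the full set. A minimal sketch, assuming the DDP backend has already initialized torch.distributed and that every process contributes tensors of the same shape:

import torch
import torch.distributed as dist

def gather_all(tensor):
    # All-gather a tensor from every DDP process and concatenate along dim 0.
    # Falls back to the input when not running distributed (e.g. single GPU).
    if not (dist.is_available() and dist.is_initialized()):
        return tensor
    gathered = [torch.zeros_like(tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, tensor)
    return torch.cat(gathered, dim=0)

# Hypothetical usage: gather everything first, then compute the AUC with a
# function that performs no DDP reduction of its own, e.g.
#   from sklearn.metrics import roc_auc_score
#   auc = roc_auc_score(gather_all(targets).cpu().numpy(),
#                       gather_all(preds).cpu().numpy())
# so every process sees the full validation set and reports the same value
# as the single-GPU run instead of a sum across GPUs.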
As stated, you seem to be using an old version of PL. We have reimplemented the whole metrics package, but at the moment AUROC is missing; we would be happy if you want to send a PR!
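For anyone who wants to pick this up, the new interface roughly follows the pattern below: a Metric subclass declares its states with add_state, accumulates them in update, and Lightning synchronizes the states across processes before compute. This is only an illustrative sketch; the class name, the "cat" reduction, and the sklearn call are stand-ins, not the final implementation:

import torch
from sklearn.metrics import roc_auc_score     # stand-in for the actual AUC computation
from pytorch_lightning.metrics import Metric  # v1.0+ metric base class

class GatheredAUROC(Metric):
    # Sketch of the v1.0-style accumulation pattern: keep raw predictions and
    # targets as metric states so the result is computed on the full data.
    def __init__(self, dist_sync_on_step=False):
        super().__init__(dist_sync_on_step=dist_sync_on_step)
        self.add_state("preds", default=[], dist_reduce_fx="cat")
        self.add_state("target", default=[], dist_reduce_fx="cat")

    def update(self, preds, target):
        self.preds.append(preds)
        self.target.append(target)

    def compute(self):
        # After DDP synchronization the list states may already be concatenated
        # into a single tensor, so handle both cases.
        preds = self.preds if isinstance(self.preds, torch.Tensor) else torch.cat(self.preds)
        target = self.target if isinstance(self.target, torch.Tensor) else torch.cat(self.target)
        return torch.tensor(roc_auc_score(target.cpu().numpy(), preds.cpu().numpy()))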
@SkafteNicki @Borda Thank you, PL is a great training framework for a new PyTorch user like me. Looking forward to the new updates!