Hello everyone, I want to calculate a custom metric at the end of each training step.
def training_step(self, batch, batch_idx):
    image, mask = batch
    mask = mask.unsqueeze(1)
    output = self.forward(image)
    loss = self.loss(output, mask)
    train_preds = output.detach().cpu()
    train_targets = mask.detach().cpu()
    dice = metric(train_preds, train_targets, 1024, 0.5, 3500)
where metric is the following function, together with its post_process helper:
import cv2
import numpy as np
import torch

def post_process(probability, threshold, min_size):
    mask = cv2.threshold(probability, threshold, 1, cv2.THRESH_BINARY)[1]
    num_component, component = cv2.connectedComponents(mask.astype(np.uint8))
    predictions = np.zeros((1024, 1024), np.float32)
    num = 0
    for c in range(1, num_component):
        p = (component == c)
        if p.sum() > min_size:
            predictions[p] = 1
            num += 1
    return predictions, num
def metric(probability, truth, imgsize, prob_threshold, min_object_size):
    '''Calculates dice of positive and negative images separately.
    probability and truth must be torch tensors.'''
    batch_size = len(truth)
    with torch.no_grad():
        # probability = probability.view(batch_size, -1)
        truth = truth.view(batch_size, -1)  # torch.Size([4, 1048576])
        # assert(probability.shape == truth.shape)
        t = (truth > 0.5).float()
        # print(probability.shape, truth.shape)
        if min_object_size:
            probability = probability.numpy()[:, 0, :, :]  # torch.Size([4, 1, 1024, 1024]) --> [4, 1024, 1024]
            for i, prob in enumerate(probability):
                predict, num_predict = post_process(prob, prob_threshold, min_object_size)
                if num_predict == 0:
                    probability[i, :, :] = 0
                else:
                    probability[i, :, :] = predict
            p = torch.from_numpy(probability)
            p = p.view(batch_size, -1).float()  # torch.Size([4, 1048576])
        else:
            probability = probability.view(batch_size, -1)
            p = (probability > prob_threshold).float()
        EPS = 1e-6
        intersection = torch.sum(p * t, dim=1)
        union = torch.sum(p, dim=1) + torch.sum(t, dim=1) + EPS
        dice = (2 * (intersection + EPS) / union).mean()
        if dice > 1:
            dice = 1
    return dice
When I don't calculate the metric, training runs fine.
What could be the problem?
Hi
I have not looked at all the code in detail yet, but one thing that is immediately clear is that you do cpu() copies in your training step. This blocks the GPU and is almost certainly the reason why it is slow. If you watch your GPU usage, you will most likely see it spiking to 100% and dropping back down repeatedly.
There are 2 solutions I see here:
1) Move the metric computation to the validation step if you don't need it in training. Then at least you won't slow down training (see the validation_step sketch below this list).
2) Rewrite your metric to use tensor operations that are device agnostic and stay on the same device (GPU). Do not copy from GPU to CPU and back (a sketch of this is right below). We also have metrics in Lightning, there is also a dice score https://pytorch-lightning.readthedocs.io/en/latest/metrics.html#dice-score-f but I am not sure if it's equivalent to yours here.
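For 2), here is a minimal sketch of what a dice computed purely with tensor operations could look like, assuming you can drop the connectedComponents post-processing (cv2 forces a round trip to the CPU anyway). dice_on_device is a made-up name, and the thresholds just mirror the values you already pass in:

import torch

def dice_on_device(probability, truth, prob_threshold=0.5, eps=1e-6):
    # Runs on whatever device the inputs live on (CPU or GPU), no .cpu() copies.
    batch_size = truth.shape[0]
    p = (probability.reshape(batch_size, -1) > prob_threshold).float()
    t = (truth.reshape(batch_size, -1) > 0.5).float()
    intersection = (p * t).sum(dim=1)
    union = p.sum(dim=1) + t.sum(dim=1) + eps
    return (2 * (intersection + eps) / union).mean()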
Also generally, try to avoid for loops and instead convert them to tensor operations.
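To show how the two points fit together, here is a hypothetical validation_step that calls the helper above and stays on the GPU the whole time (self.log assumes a Lightning version that has it; otherwise return the values and aggregate them in validation_epoch_end):

def validation_step(self, batch, batch_idx):
    image, mask = batch
    mask = mask.unsqueeze(1)
    output = self.forward(image)
    loss = self.loss(output, mask)
    # No .cpu() / .numpy() here: the dice is computed on the same device as the model.
    dice = dice_on_device(output, mask, prob_threshold=0.5)
    self.log("val_loss", loss)
    self.log("val_dice", dice, prog_bar=True)
    return loss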
@awaelchli Thank you!