Pytorch3d: PointsRenderer returns NaN values during backward pass

Created on 18 May 2020 · 6 Comments · Source: facebookresearch/pytorch3d

Hi,

Thank you for taking the time to answer questions. My model is a simple fully connected point cloud generator followed by a PointsRenderer. The generated point clouds are rendered with fixed parameters, and the loss against a reference image is then computed with MSE or L1 loss. After some iterations, PointsRenderer returns NaN values during the backward pass even though the loss is not NaN.

Iteration  20
Loss : 0.06443136185407639
Mean grad of the last layer :  2.3488732949772384e-06
Iteration  21
Loss : 0.06735570728778839
Mean grad of the last layer :  1.2177332564533572e-06
Iteration  22
Loss : 0.06658419221639633
Mean grad of the last layer :  nan
Iteration  23
Loss : 0.16023781895637512
Mean grad of the last layer :  nan

When I use anomaly detection, it produces the following output:
RuntimeError: Function '_CompositeAlphaPointsBackward' returned nan values in its 1th output
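
For readers who want to see this pattern in isolation, here is a minimal sketch of how a finite loss can still give NaN gradients, and how anomaly detection reports the responsible backward function. It is not the code from this issue: the `sqrt` toy example and the helper `grads_are_finite` are assumptions chosen only because they reproduce the same "0 divided by 0 in the backward pass" situation without PyTorch3D.

```python
import torch


def grads_are_finite(params):
    """Return True if every populated .grad holds only finite values."""
    return all(
        p.grad is None or torch.isfinite(p.grad).all().item() for p in params
    )


# Toy stand-in for the renderer: the forward value is exactly 0.0, but the
# backward pass of sqrt at x == 0 evaluates 0 / 0, which is NaN.
x = torch.tensor([0.0, 1.0], requires_grad=True)
loss = (torch.sqrt(x) * 0.0).sum()
loss.backward()

loss_is_finite = torch.isfinite(loss).item()  # the loss itself looks fine
finite_before = grads_are_finite([x])         # the gradient does not

# Re-run under anomaly detection to get the name of the failing backward node.
x.grad = None
caught = None
try:
    with torch.autograd.detect_anomaly():
        loss = (torch.sqrt(x) * 0.0).sum()
        loss.backward()
except RuntimeError as err:
    caught = str(err)

print("loss finite:", loss_is_finite)
print("grads finite:", finite_before)
print("anomaly report:", caught)
```

A check like `grads_are_finite(model.parameters())` before `optimizer.step()` can be used to skip a bad update, although that only hides the symptom and does not fix the underlying division.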

NormWeightedCompositor returns 0 instead of NaN. What could cause PointsRenderer to return NaN values during the backward pass?

question

cihanongun

All 6 comments

Can you check if any of the output points from the pointcloud generator are NaN? What value are you using for the radius in the raster_settings for the PointsRasterizer?

Can you provide a minimal script showing the setup and settings you are using?

nikhilaravi on 18 May 2020

I checked, and there are no NaN values anywhere before the backward pass. This is the relevant code:

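import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch3d.renderer import (
    AlphaCompositor,
    OpenGLPerspectiveCameras,
    PointsRasterizationSettings,
    PointsRasterizer,
    PointsRenderer,
    look_at_view_transform,
)
from pytorch3d.structures import Pointclouds

class PCGenerator(nn.Module):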
    def __init__(self, latent_size, device):
        super(PCGenerator, self).__init__()
        self.latent_size = latent_size
        self.device = device

        self.dec1 = nn.Linear(self.latent_size,256)
        self.dec2 = nn.Linear(256,256)
        self.dec3 = nn.Linear(256,1024*3)

        self.raster_settings = PointsRasterizationSettings( image_size=64, radius=0.06, points_per_pixel=8)
        self.cameras = OpenGLPerspectiveCameras(device=self.device)
        self.compositor = AlphaCompositor()
        self.raster = PointsRasterizer(self.cameras, self.raster_settings)
        self.renderer = PointsRenderer(self.raster, self.compositor)

        # distance = 1.75 elevation = 30.0 azimuth = 45.0 
        self.at = torch.from_numpy(np.array([0.5,0.45,0.5], dtype=np.float32)).unsqueeze(0).to(self.device)
        self.R, self.T = look_at_view_transform( 1.75, 30.0, 45.0, at = self.at, device=self.device)

    def render(self,PC):
        self.PCs = Pointclouds(points = PC, features = torch.ones_like(PC, requires_grad=True, device=self.device))
        self.rendered = self.renderer(self.PCs, R=self.R, T=self.T, device=self.device)
        return self.rendered.squeeze()

    def generatePC(self, x):
        x = F.relu(self.dec1(x))
        x = F.relu(self.dec2(x))
        x = self.dec3(x)
        x = torch.sigmoid(x)
        return x.view(-1,1024,3)

    def forward(self,x):
        return self.render(self.generatePC(x))

def train_epoch():
    # `model` is assumed to be an instance created elsewhere, e.g.
    # model = PCGenerator(latent_size, device).to(device)
    for i, (ims, pcs) in enumerate(train_loader):
        PCfeatures = pcs.to(device) # Point cloud features to generate point clouds
        ref_ims = ims.to(device) # Reference images

        optimizer.zero_grad()

        # Call the model instance, not the PCGenerator class itself
        renderedPC = model(PCfeatures)
        loss = loss_function(ref_ims, renderedPC) # both shapes [N,64,64,3]

        print("Iteration ", i)
        print("Loss : ", loss.item())
        loss.backward()
        print("Mean grad of the last layer : ", torch.mean(model.dec3.weight.grad).item())

        optimizer.step()

cihanongun on 18 May 2020

We have found the culprit:

https://github.com/facebookresearch/pytorch3d/blob/b4fd9d1d34b96eebbf8954abc1d5a4e3b8a91f11/pytorch3d/csrc/compositing/alpha_composite.cu#L129

You are seeing NaNs when alpha_tvalue is 1.0, which causes a division by 0.0. A diff will be submitted to fix this issue!
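
The failure mode described above can be sketched in plain Python for a single pixel and a single channel. This is an illustration under assumptions, not the CUDA kernel: the function names, the loop layout, and the `eps` guard are hypothetical, and whether the linked fix uses the same guard should be checked against the commit itself. Python raises `ZeroDivisionError` for float division by zero, whereas CUDA float arithmetic returns NaN for 0/0, so the helper `ieee_div` maps the exception to the IEEE 754 result.

```python
import math


def ieee_div(num, den):
    """Float division following IEEE 754 rules instead of raising."""
    try:
        return num / den
    except ZeroDivisionError:
        if num == 0.0 or math.isnan(num):
            return math.nan  # 0 / 0 is NaN
        return math.copysign(math.inf, num) * math.copysign(1.0, den)


def alpha_composite(alphas, feats):
    """Front-to-back alpha compositing of one channel at one pixel."""
    out, cum = 0.0, 1.0
    for a, f in zip(alphas, feats):
        out += f * a * cum
        cum *= 1.0 - a  # fraction of light passing the points seen so far
    return out


def grad_alphas(alphas, feats, eps=0.0):
    """d(out)/d(alpha_t), obtained by dividing cum by (1 - alpha_t + eps)."""
    grads = [0.0] * len(alphas)
    cum = 1.0  # product of (1 - alpha_s) over s < k
    for k, (a, f) in enumerate(zip(alphas, feats)):
        grads[k] += f * cum
        for t in range(k):
            # Nearer points t < k attenuate point k; removing their factor
            # from cum by division is where 0 / 0 appears when alpha_t == 1.
            grads[t] -= ieee_div(f * a * cum, 1.0 - alphas[t] + eps)
        cum *= 1.0 - a
    return grads


alphas, feats = [1.0, 0.5], [1.0, 1.0]
naive = grad_alphas(alphas, feats)              # unguarded division
guarded = grad_alphas(alphas, feats, eps=1e-9)  # hypothetical epsilon guard
print("forward:", alpha_composite(alphas, feats))
print("naive grads:", naive)
print("guarded grads:", guarded)
```

With the guard, the problematic term evaluates to 0.0 / eps = 0.0, so the result is finite; the guard simply treats the 0/0 contribution as zero.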

gkioxari on 19 May 2020

I see. Thanks for the response. I was looking for a similar reason for NormWeightedCompositor to return 0 instead of NaN. Is it because of the following line?
https://github.com/facebookresearch/pytorch3d/blob/b4fd9d1d34b96eebbf8954abc1d5a4e3b8a91f11/pytorch3d/csrc/compositing/norm_weighted_sum.cu#L143

cihanongun on 20 May 2020

The implementation in NormWeightedCompositor is different because it performs a different kind of compositing. An alpha value of 1 there will not lead to NaNs. The issue with alpha compositing was that an alpha value of 1 leads to a division by 0, even though the numerator is already 0. In the norm-weighted sum, small alpha values are handled in a different way, as you point out.
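
To spell out why the numerator is already 0 in the alpha compositing case, here is a short derivation. It assumes the standard front-to-back compositing formula; the symbols $o$, $f_k$, $\alpha_k$ and $C_k$ are introduced here for illustration and are not names from the kernel source.

```latex
% Composited value at one pixel, points sorted front to back:
o = \sum_{k} f_k \, \alpha_k \, C_k ,
\qquad
C_k = \prod_{s<k} (1 - \alpha_s)

% Gradient with respect to the alpha of a nearer point t:
\frac{\partial o}{\partial \alpha_t}
  = f_t \, C_t \;-\; \sum_{k>t} f_k \, \alpha_k \prod_{s<k,\; s \neq t} (1 - \alpha_s)

% Writing the product without the factor t as a quotient:
\prod_{s<k,\; s \neq t} (1 - \alpha_s) = \frac{C_k}{1 - \alpha_t}

% If \alpha_t = 1, then C_k = 0 for every k > t, so each term becomes
\frac{f_k \, \alpha_k \, C_k}{1 - \alpha_t} = \frac{0}{0}
```

In floating point, 0/0 evaluates to NaN, which matches the behaviour described above even though the forward value stays finite.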

gkioxari on 20 May 2020

Here is the fix: https://github.com/facebookresearch/pytorch3d/commit/d689baac5ede7be237645518d1b0575f93ac1ceb
I am closing this issue, but feel free to re-open it if you run into further problems.

gkioxari on 20 May 2020
