Pytorch3d: pointclouds normals and features returned as None

Created on 29 Apr 2020  路  12Comments  路  Source: facebookresearch/pytorch3d

If you do not know the root cause of the problem / bug, and wish someone to help you, please
post according to this template:

馃悰 Bugs / Unexpected behaviors

When the _points_{FORM} is set, the methods normals_{FORM}() and features_{FORM}() return None, even though the normals and features are present in other forms.
However, a more reasonable behaviour should be returning the converted normals and features.

This seems to be caused by the early exist https://github.com/facebookresearch/pytorch3d/blob/master/pytorch3d/structures/pointclouds.py#L584

Instructions To Reproduce the Issue:

V = torch.rand(1,5,3)
pointclouds = Pointclouds(V).to(device=device)
points_packed = pointclouds.points_packed()
pointclouds.estimate_normals(assign_to_self=True)
normals_packed = pointclouds.normals_packed()

normals_packed is None

bug

Most helpful comment

In normal pytorch usage, whenever you make a training step, you build up the graph in the forward pass (you've never "lost the reference to the original tensors"), use it in a backward pass, and then update whatever the parameters are outside the graph and destroy the graph.

Lets say your parameters include the list form of point clouds - so it is a list of tensors which you created, which have requires_grad True, which you keep around. Then when you go forward you create a Pointclouds object and do a load of manipulations. This builds up a single graph which is all connected. You do a backward pass and update your original list. This repeats many times. If this is what your scenario is, then there is no problem.

If you have manipulations which are cheap copying (like converting packed to padded) then going backwards through them is equally cheap.

All 12 comments

Good catch! The issue is in estimate_normals and especially in L890 where the self._normals_packed representation is not assigned and thus it is None along with the fact that self.compute_packed() is skipped since the packed points are already computed. If you instead call normals_list = pointclouds.normals_list() then the output is there. Same for normals_padded.

I don't think this affects the features_X though.

In general, for the heterogeneous batching structure, I find it really hard to keep track of what has been computed, what has been updated, what doesn't need recomputing, etc.

While this feature has been promoted as a main attraction for pytorch3d, it's a little lacking compared to the expectations.

What's you advice if the pointclouds are the parameters that need optimizing. Currently, there can be lot of internal overwrite when the forms are changing from one to the other. How should one guarantee that the leaf node doesn't get "lost" during the transforms?

Thank you for the feedback. Could you be a little more concrete with what you mean? This concern is something I don't quite understand to be honest. What do you find confusing? What do you believe is overwritten? The more concrete you can be the more effective your issue is and the better chance there is for us to improve it.

We actually try to be extremely careful with how we store our tensors for all possible representations (list, packed, padded) so that the user can switch between the different modes efficiently. This same feature was used in Mesh R-CNN very successfully.

Hi @gkioxari thank you for looking into this.

In Mesh R-CNN, a point cloud or a mesh is passed through the network once (please correct me if i'm wrong), they are the data that don't accumulate gradients.

In my case, I'd like to use the point cloud as the parameter, as in, throughout the optimization/training, I have one copy of the tensors, which i keep accumulating gradients and update values via the optimizer.

Consider the following scenario:

initialize with list -> create packed -> changed packed -> transform to list -> update the point cloud
list representation with the new values.

Now the list representation is no longer the initial tensors we've created, actually we probably have lost the reference to the initial tensors.

The following i'm not entirely sure:
If we repeat the above pattern in every training iteration, we are basically be creating a super long graph, starting from the initial tensor to the altered tensor in the N-th iteration. So optimizer needs to go through the entire graph to update the initial tensor.

In this case, different representations should really be just a representation pointing to the same tensor.

In normal pytorch usage, whenever you make a training step, you build up the graph in the forward pass (you've never "lost the reference to the original tensors"), use it in a backward pass, and then update whatever the parameters are outside the graph and destroy the graph.

Lets say your parameters include the list form of point clouds - so it is a list of tensors which you created, which have requires_grad True, which you keep around. Then when you go forward you create a Pointclouds object and do a load of manipulations. This builds up a single graph which is all connected. You do a backward pass and update your original list. This repeats many times. If this is what your scenario is, then there is no problem.

If you have manipulations which are cheap copying (like converting packed to padded) then going backwards through them is equally cheap.

Hi @yifita
I agree with @bottler. I wonder if there is something we don't quite understand in your issue.
For us to be effective in resolving your issue if you could provide a little code snippet of an optimization loop where you feel that things are lost or not working as you expected, that would be extra useful.

Hi @gkioxari and @bottler thanks for the answers! I'm writing the code as I go, so I have nothing to share, only questions how to proceed :)

I think what @bottler suggested pointed me to the right direction.
I summarize it in my own way:
In stead of using Pointclouds obj as a leaf node, construct it in every pass of the network. View it just as a method to transform the tensors.

In contrast, I was considering using them as "a container for optimizable parameters", just like the modules, which is responsible for keeping the original leaf node in the graph.

Thanks! The clarification helped

Oh I see! Great! Yes, I agree with the direction you suggest and it was also the use case we had in mind when constructing the data structures. Let us know if you run into any issues!

@yifita Yep, I think that's the right way to proceed -- if you want to directly optimize the positions of points, it's safer to re-construct a new PointClouds object at each forward pass from your optimizable parameters.

In generally it's really hard to support "a container for optimizable parameters" -- optimization algorithms generally want to mutate lists of tensors in-place, and have no way of knowing whether those tensors were really part of a PointClouds or Meshes object. So these containers would need to somehow track when any of their packed/padded/list representations were mutated in any way, and then (lazily) recompute the others. We couldn't find a good way to reliably detect any kind of in-place tensor mutation, so we chose not to support this use case.

Even if we don't support this use case, maybe we can make it more clear in the documentation somehow.

@yifita Yep, I think that's the right way to proceed -- if you want to directly optimize the positions of points, it's safer to re-construct a new PointClouds object at each forward pass from your optimizable parameters.

In generally it's really hard to support "a container for optimizable parameters" -- optimization algorithms generally want to mutate lists of tensors in-place, and have no way of knowing whether those tensors were really part of a PointClouds or Meshes object. So these containers would need to somehow track when any of their packed/padded/list representations were mutated in any way, and then (lazily) recompute the others. We couldn't find a good way to reliably detect any kind of in-place tensor mutation, so we chose not to support this use case.

Even if we don't support this use case, maybe we can make it more clear in the documentation somehow.

Agree, agree and agree.

Hi @yifita

The commit https://github.com/facebookresearch/pytorch3d/commit/e64e0d17ef366c0d654180a0bf03a5ad21e2bf26 is fixing the issues you had with the estimated normals to update the packed computation. If your questions regarding this thread are addressed, feel free to close the issue.

Thanks!

Was this page helpful?
0 / 5 - 0 ratings

Related issues

elcronos picture elcronos  路  3Comments

cihanongun picture cihanongun  路  3Comments

shersoni610 picture shersoni610  路  3Comments

OOF-dura picture OOF-dura  路  3Comments

farhanrw picture farhanrw  路  3Comments