```python
face_to_edge = inverse_idxs.reshape(3, N).t()
```
If the input tensor is contiguous, then view and reshape do exactly the same thing. In this case, the output of torch.arange is a new tensor that is guaranteed to be contiguous, so view and reshape will give the same result; I prefer view since it's shorter.
More generally, view is guaranteed to return a view of the input tensor (it raises an error if that is impossible); in contrast, reshape returns a view if possible and otherwise returns a copy. I often prefer view since it makes it more explicit when copies of tensors are being made; this can make it easier to see where memory is being allocated, in case you later need to optimize memory usage.
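A small sketch of the difference, using only standard tensor methods (the 2x3 shape is arbitrary). On a contiguous tensor both calls return views sharing the same storage; on a transposed, non-contiguous tensor, view raises a RuntimeError while reshape silently allocates a copy.

```python
import torch

# torch.arange returns a fresh, contiguous tensor
x = torch.arange(6).view(2, 3)
assert x.is_contiguous()

# On a contiguous tensor, view and reshape agree and both share x's storage
v = x.view(3, 2)
r1 = x.reshape(3, 2)
assert torch.equal(v, r1)
assert r1.data_ptr() == x.data_ptr()  # no copy was made

# Transposing gives a non-contiguous view of the same storage
y = x.t()
assert not y.is_contiguous()

# view cannot flatten this stride pattern, so it raises instead of copying
try:
    y.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape falls back to allocating a new, contiguous copy
r2 = y.reshape(6)
print(view_failed, r2.tolist(), r2.data_ptr() == y.data_ptr())
# prints: True [0, 3, 1, 4, 2, 5] False
```

This is why view makes copies explicit: if a copy is really needed, you have to ask for it (e.g. with .contiguous()) instead of getting one implicitly.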
Sorry, I think I misunderstood your question -- this isn't really a question about reshape vs view, it's more about how to construct face_to_edge. I think you're right -- we can construct face_to_edge more directly by reshaping inverse_idxs, rather than indexing into it with an auxiliary index tensor.
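A minimal sketch showing why the two constructions agree, on a hypothetical two-triangle mesh. It assumes the 3 * N per-face edges are laid out block by block (all first edges, then all second edges, then all third edges), which is the layout that reshape(3, N) relies on; the edge definitions and the use of torch.unique here are illustrative, not taken from the library code. Indexing inverse_idxs with arange(3 * N).view(3, N).t() just gathers the elements in exactly the order that reshape(3, N).t() already exposes as a view.

```python
import torch

# Two hypothetical triangles sharing the edge (1, 2)
faces = torch.tensor([[0, 1, 2], [1, 3, 2]])
N = faces.shape[0]
v0, v1, v2 = faces[:, 0], faces[:, 1], faces[:, 2]

# Assumed layout: N first edges, then N second edges, then N third edges
edges = torch.cat(
    [torch.stack([v1, v2], dim=1),
     torch.stack([v2, v0], dim=1),
     torch.stack([v0, v1], dim=1)],
    dim=0,
)
# Sort the endpoints of each edge so (a, b) and (b, a) are treated as one edge
edges, _ = edges.sort(dim=1)

# inverse_idxs[k] is the index of edge k within unique_edges
unique_edges, inverse_idxs = torch.unique(edges, dim=0, return_inverse=True)

# Gather through an auxiliary index tensor
idx = torch.arange(3 * N).view(3, N).t()
face_to_edge_indexing = inverse_idxs[idx]

# Reshape directly: no index tensor and no gather, just a view
face_to_edge_reshape = inverse_idxs.reshape(3, N).t()

assert torch.equal(face_to_edge_indexing, face_to_edge_reshape)
print(face_to_edge_reshape.tolist())
# prints: [[2, 1, 0], [4, 2, 3]]
```

The shared edge (1, 2) gets index 2 in both rows, as expected.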
In that case the answer is probably because we didn't think of it ;)
Your suggestion might be a bit more efficient -- but I'm not sure whether it would make much difference to end-to-end performance, since this is probably not a bottleneck for most applications.
I think this could be correct! @CharlesNord if you find that this change provides a speedup or memory improvement, could you post that info here? Then we will incorporate the change.
Using the following code, I tested the speed of the two approaches:
```python
import torch

F = 10000
inverse_idxs = torch.randint(0, 2 * F, (3 * F,))

def indexing():
    face_to_edge = torch.arange(3 * F).view(3, F).t()
    face_to_edge = inverse_idxs[face_to_edge]
    return face_to_edge

def reshape():
    return inverse_idxs.reshape(3, F).t()
```

```
%timeit reshape()
4.69 µs ± 95.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit indexing()
245 µs ± 5.79 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
I tested it on my laptop, which is not very fast, but I think the relative advantage of the reshape version is obvious (roughly 50x: 4.69 µs vs. 245 µs per call).
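%timeit is an IPython magic; a rough equivalent as a plain script, using the standard-library timeit module, might look like the sketch below (the repeat count of 200 is an arbitrary choice). It also checks that the two functions return the same tensor, and that reshape() returns a view over inverse_idxs' existing storage, which is why it is so cheap: it only creates tensor metadata, whereas indexing allocates both an index tensor and an output tensor and performs a gather.

```python
import timeit
import torch

F = 10000
inverse_idxs = torch.randint(0, 2 * F, (3 * F,))

def indexing():
    # Builds a (F, 3) index tensor, then gathers into a newly allocated result
    face_to_edge = torch.arange(3 * F).view(3, F).t()
    return inverse_idxs[face_to_edge]

def reshape():
    # Returns a (F, 3) view of inverse_idxs; no data is copied
    return inverse_idxs.reshape(3, F).t()

# Both approaches produce the same values
assert torch.equal(indexing(), reshape())

# The reshape result shares memory with inverse_idxs
assert reshape().data_ptr() == inverse_idxs.data_ptr()

n = 200
for fn in (reshape, indexing):
    seconds = timeit.timeit(fn, number=n) / n
    print(f"{fn.__name__}: {seconds * 1e6:.2f} µs per call")
```

Note that the reshaped result is non-contiguous, so if downstream code calls .contiguous() on it, a copy is made at that point instead.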
@CharlesNord would you like to submit a pull request for this change?