Thanks for releasing such a powerful library for 3D vision.
I am interested in estimating depth by differentiable rendering.
If a dense correspondence map (F_12) and the relative camera pose (T_12) between two frames (I_1, I_2) are given, can I inversely compute the reference depth (the zbuf in pytorch3d) in a differentiable way?
Could you recommend some scripts to achieve this?
Hi @SeokjuLee
Your question is related to multi-view geometry. For some background, I highly recommend the _Multi-View Geometry in Computer Vision_ book authored by _Richard Hartley_ and _Andrew Zisserman_.
I will try to lay the problem out for you and then give you some references on how to go about solving it.
If you have a dense correspondence map, then you have N pairs (x1, x2) of points in the first and second frame, respectively, where both points of a pair are projections of the same point X in the world.
Let P1 = K1 [I | 0] be the projection matrix for the first frame and P2 = K2 [R2 | t2] the one for the second (the convention is P = K [R | t], where K is the camera intrinsics matrix, R is the rotation and t the translation of the camera; we usually place the first camera at the origin). In the general problem P1 and P2 are unknown. Then, in homogeneous coordinates (so equality holds up to scale),
x1 = P1 X
x2 = P2 X , for all N pairs (x1, x2) from your dense correspondence map.
In these sets of equations you want to find X for each of the N pairs of points, which you can achieve by first solving for P1 & P2. With 8 correspondences you can already get a solution: the fundamental matrix relating the two views has 9 entries defined up to scale, so 8 unknowns, and each pair gives one linear constraint. With N > 8 you can find a more robust solution (with some tricks like RANSAC). Once you have P1 & P2, you can derive X for each (x1, x2) by triangulation, where X is the xyz coordinate of the point in the 3D world, which is your desired output. If the relative pose and the intrinsics are already known, as in your case, you can skip the estimation step and triangulate directly.
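To make the last step concrete, here is a minimal sketch of linear (DLT) triangulation for one correspondence, assuming both projection matrices are already known. It is written in NumPy for brevity and is not a pytorch3d API; the helper names `triangulate_dlt` and `project`, and all camera values, are made up for illustration.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate one 3D point from two pixel observations (linear DLT).

    P1, P2: (3, 4) projection matrices. x1, x2: (2,) pixel coordinates.
    Returns the (3,) point X in the world frame.
    """
    # Each view gives two linear constraints on the homogeneous point X,
    # obtained by eliminating the unknown scale from x = P X.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector that
    # belongs to the smallest singular value of A.
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]

def project(P, X):
    """Project a (3,) world point to (2,) pixel coordinates."""
    x_h = P @ np.append(X, 1.0)
    return x_h[:2] / x_h[2]

# Hypothetical setup: shared intrinsics, second camera shifted along x.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

X_true = np.array([0.3, -0.1, 4.0])
x1, x2 = project(P1, X_true), project(P2, X_true)

X_est = triangulate_dlt(P1, P2, x1, x2)
# The first camera sits at the origin with identity rotation,
# so the depth in frame 1 is simply the z coordinate of X.
depth_in_frame_1 = X_est[2]
```

Every step here is plain linear algebra, so the same computation written with `torch.linalg.svd` should be differentiable with respect to the correspondences and the pose. Whether the resulting depth matches the zbuf values depends on the camera conventions you use in pytorch3d, so treat that mapping as something to verify.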
This is a well-known problem in computer vision and multi-view geometry. The eight-point algorithm provides a solution to it. The Multi-View Geometry book gives more solutions and references for this. I highly recommend reading it.
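For reference, a rough sketch of the normalized eight-point algorithm could look like the following. It estimates the fundamental matrix F satisfying x2^T F x1 = 0 from at least 8 correspondences, without RANSAC. The function names and the synthetic scene are assumptions made for illustration, and NumPy is used instead of any pytorch3d routine.

```python
import numpy as np

def normalize_points(x):
    """Hartley normalization: zero mean, average distance sqrt(2)."""
    mean = x.mean(axis=0)
    scale = np.sqrt(2) / np.linalg.norm(x - mean, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * mean[0]],
                  [0.0, scale, -scale * mean[1]],
                  [0.0, 0.0, 1.0]])
    x_h = np.hstack([x, np.ones((len(x), 1))])
    return (T @ x_h.T).T, T

def eight_point(x1, x2):
    """Estimate F with x2_h^T F x1_h = 0 from matches of shape (N, 2), N >= 8."""
    p1, T1 = normalize_points(x1)
    p2, T2 = normalize_points(x2)
    # One row per correspondence: the outer product of p2 and p1,
    # flattened row-major, dotted with the flattened F must be zero.
    A = np.einsum('ni,nj->nij', p2, p1).reshape(len(p1), 9)
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization and fix the arbitrary scale.
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)

def project_all(P, X):
    """Project (N, 3) world points to (N, 2) pixel coordinates."""
    x_h = (P @ np.hstack([X, np.ones((len(X), 1))]).T).T
    return x_h[:, :2] / x_h[:, 2:]

# Hypothetical synthetic scene: 20 random points in front of both cameras.
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 2.0], [1.0, 1.0, 6.0], size=(20, 3))
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
c, s = np.cos(0.1), np.sin(0.1)
R2 = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t2 = np.array([[-0.2], [0.05], [0.02]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R2, t2])
x1, x2 = project_all(P1, X), project_all(P2, X)

F = eight_point(x1, x2)

# Epipolar residuals x2_h^T F x1_h, one per correspondence.
x1_h = np.hstack([x1, np.ones((len(x1), 1))])
x2_h = np.hstack([x2, np.ones((len(x2), 1))])
residuals = np.abs(np.einsum('ni,ij,nj->n', x2_h, F, x1_h))
```

With noisy or partly wrong correspondences you would typically wrap `eight_point` in a RANSAC loop and keep the estimate with the most inliers. With known intrinsics, F can be converted to an essential matrix and decomposed into a rotation and a translation that is only defined up to scale.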
Hi @SeokjuLee
I marked your question with a _how to_ label because it is not about an issue with, or the implementation of, anything in the codebase; it is a general 3D geometry question. I provided some links and references to get you started. I will be closing this issue. Good luck!