Orb_slam2: coordinate system transform

Created on 30 Dec 2016  Â·  14Comments  Â·  Source: raulmur/ORB_SLAM2

Hi, all
Sorry to ask a very basic question. I really need to know what a transform matrix means in orb_slam2.
For example, the matrix Tcw, called camera pose, does it mean transform a point C in current frame's coordinate to world coordinate point W, or the other way.
In other words,
W = Tcw*C, or
C = Tcw*W.
When I read the code, I am confused.

Most helpful comment

Tcw means to convert a world point to camera, so that is:
C = TcwW, and
W = Twc
C
the transform is made up of: Tcw = [ Rcw | tcw ] (skipping the last row of [ 0 0 0 1 ]. The translation twc is a rotated negative of tcw:
twc = - Rcw^T * tcw
so Twc = [ Rcw^T | twc ] = [ Rcw^T | -Rcw^T * tcw ] (with last row still [ 0 0 0 1 ].
The function GetCameraCenter returns twc.

All 14 comments

Tcw means to convert a world point to camera, so that is:
C = TcwW, and
W = Twc
C
the transform is made up of: Tcw = [ Rcw | tcw ] (skipping the last row of [ 0 0 0 1 ]. The translation twc is a rotated negative of tcw:
twc = - Rcw^T * tcw
so Twc = [ Rcw^T | twc ] = [ Rcw^T | -Rcw^T * tcw ] (with last row still [ 0 0 0 1 ].
The function GetCameraCenter returns twc.

@hashten , thanks a lot! I am much clear now.

@hashten , do you know what Tcw stands for?
We know c stands for camera, w for world. But why Tcw and Twc in that order.
I thought Tcw was "Camera pose in the World reference", but turns out it is exactly the opposite.
Do you know any book that uses this notation? Couldn't find it in Hartley & Zisserman"Multiple view geometry".
Thank you

@AlejandroSilvestri Tcw in that order because we usually read the transform from right to left, which will be easier to understand for a long position. For example:

p4 = T43*T32*T21*p1

transform the point from frame 1 to frame 4. Write that way will also help to prevent mistakes.

The data saved in SaveTrajectoryKITTI is transformation Twc, which is the transformation that takes a point in the camera coordinate system to the world coordinate system.

In Orb_SLAM2, the frame poses are stored as the relative transformations to keyframes (reference frames), so we need to concatinate the poses of the keyframe with this relative transform to get the frame poses. It will be more convinent if we calculate first the inverse of the frame pose:

Tfo = Tfr * Trw * Two

where
Two: the transformation that converts the points in the origin coordinate system (first keyframe) to the world coordinate system
Trw: the transformation that converts the points in the world coordinate system to the reference keyframe coordinate system
Tfr: the transformation that converts the points in the reference coordinate system to the camera frame coordinate system

Then we inverse Tfo to get Tof ( camera pose in the coordinate system of the first keyframe, or transformation that converts point in camera frame to the coordinate system of the first keyframe).

Here are the code that does this:

cv::Mat Two = vpKFs[0]->GetPoseInverse();
Trw = Trw*pKF->GetPose()*Two; // get the pose of the reference keyframe
cv::Mat Tcw = (*lit)*Trw; // compose the transform to get the pose of the frame
cv::Mat Rwc = Tcw.rowRange(0,3).colRange(0,3).t();
cv::Mat twc = -Rwc*Tcw.rowRange(0,3).col(3);

The last two lines are calculating the inverse of the Tfo.

@xslittlegrass , great explanation! Works for me.

@AlejandroSilvestri Tcw in that order because we usually read the transform from right to left, which will be easier to understand for a long position. For example:

p4 = T43*T32*T21*p1

transform the point from frame 1 to frame 4. Write that way will also help to prevent mistakes.

The data saved in SaveTrajectoryKITTI is transformation Twc, which is the transformation that takes a point in the camera coordinate system to the world coordinate system.

In Orb_SLAM2, the frame poses are stored as the relative transformations to keyframes (reference frames), so we need to concatinate the poses of the keyframe with this relative transform to get the frame poses. It will be more convinent if we calculate first the inverse of the frame pose:

Tfo = Tfr * Trw * Two

where
Two: the transformation that converts the points in the origin coordinate system (first keyframe) to the world coordinate system
Trw: the transformation that converts the points in the world coordinate system to the reference keyframe coordinate system
Tfr: the transformation that converts the points in the reference coordinate system to the camera frame coordinate system

Then we inverse Tfo to get Tof ( camera pose in the coordinate system of the first keyframe, or transformation that converts point in camera frame to the coordinate system of the first keyframe).

Here are the code that does this:

cv::Mat Two = vpKFs[0]->GetPoseInverse();
Trw = Trw*pKF->GetPose()*Two; // get the pose of the reference keyframe
cv::Mat Tcw = (*lit)*Trw; // compose the transform to get the pose of the frame
cv::Mat Rwc = Tcw.rowRange(0,3).colRange(0,3).t();
cv::Mat twc = -Rwc*Tcw.rowRange(0,3).col(3);

The last two lines are calculating the inverse of the Tfo.

@AlejandroSilvestri
Eecuse me, I want to ask is :
whatever, according to the notation of book "State Estimation for Robotics",in chapter6.3, the author give the definition that camera pose is the transform from camera to world.
that's to say, Twc is the right camera pose, Tcw should be the inverse of camera pose.
So Phd. raulmur 's comment in his project (src/KeyFrame.h, line 176 cv::Mat Tcw //SE3 pose and camera center) should be wrong, Right?

@Tomas-Lee , I appreciate a lot these explanation. Helps to make perfect sense.

Here Raúl mention pose and camera center and declares this three:

  • Tcw (world pose in camera reference, 4x4)
  • Twc (camera pose in world reference, 4x4)
  • Ow (camera position in world reference, 3x1)

I can't say Raul's comment has errors, it's just general.

Twc is a main output of the slam process. In the other hand, Tcw is very important during the process, when looking for mappoints in the view, because mappoints are expressed in the world reference but they need to be projected on the image, in the camera reference.

@AlejandroSilvestri

How would a get the world pos from a pixel pos? As p = C[R|T]P, where p is the pixel value, C is the camera matrix (mK), [R|P] is Twc, P is world pos point. Shouldn't it work with P = Inv(mK)Tcwp?

@thohemp ,

Well, no, it doesn't work, C[R|T] isn't invertible, it's a 3x4 matrix, not a square one. p is a 2-dimensional point and P is a 3-dimensional one: no enough data in p to get P. (We are using homogeneous coordinates, so p is a 3-dimensional vector and P is a 4-dimensional one. In both cases the last element is a 1.)

Geometrically, you can backproject p as a ray, a line, so you won't know what distance P is from camera. To get P you need 2 views and intersect these rays.

@AlejandroSilvestri Thanks for your answer.

I used p as (x,y,1). I multiply it with the mk.inv(), which gives me the camera coordinates (z remains 1) than I substract it with twc, the world position of the camera center. I receive the world coordinates. the distance (z) remains 1. But I can change it by using the depth value from the depthmap for p(x,y) as I use RGB-D.

In this case I negleted the rotation, because I didn't need it. I may receive it by multplying it with the rotation matrix^T.

The first answer gave me the hint: https://stackoverflow.com/questions/22389896/finding-the-real-world-coordinates-of-an-image-point

It seems to work.

@thohemp , that's much better.

pxl: pixel 2d in homogeneous coordinates (x,y,1), in pixel units
pc = K^-1 p : point 3d in camera reference on projection plane, in world units
P'c = pc depth : point 3d in camera reference
Pc = [P'c | 1] : point 3d in camera reference, in homogeneous coordinates
Pw = Twc Pc : point 3d in world reference, in homogeneous coordinates

So

Pw = Twc [ (depth K^-1 pxl) | 1 ]

@AlejandroSilvestri

Excellent. Thank you!

@AlejandroSilvestri

In my case twc is a 3vs1 vector from the function GetCameraCenter, not an 4x4 as you use. In this case I would Pw = P'c-Twc. Right?

@thohemp

Twc is a pose transformation matrix. twc is a translation vector. Not the same thing.

Tcw = keyframe.GetPose()
Twc = Tcw^-1

You also have

Twc = keyframe.GetPoseInverse()
Was this page helpful?
0 / 5 - 0 ratings