Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extrinsic Matrices for 3+ cameras #660

Open
Esilocke opened this issue Dec 28, 2023 · 8 comments
Open

Extrinsic Matrices for 3+ cameras #660

Esilocke opened this issue Dec 28, 2023 · 8 comments
Labels
question Theory or implementation question

Comments

@Esilocke
Copy link

Esilocke commented Dec 28, 2023

Hi, I am trying to calibrate a set of 3 cameras using Kalibr but I don't understand how the t_cn_cnm1 works.

Here is my output from the yaml:

cam0:
  cam_overlaps: [2]
  camera_model: pinhole
  distortion_coeffs: [0.08281109680087015, -0.18398758284732086, 0.005051133958382585, 0.008640260762534649]
  distortion_model: radtan
  intrinsics: [600.9179442618838, 599.2578387754106, 341.60463277099274, 240.44821507600693]
  resolution: [640, 480]
  rostopic: /camera1/camera/color/image_raw
cam1:
  T_cn_cnm1:
  - [-0.7414133359661033, 0.05853003394345349, -0.6684912118930222, 0.8883474427844454]
  - [-0.2840167333857112, 0.8751932181958348, 0.3916265133784187, -0.8209689603305266]
  - [0.6079808881934803, 0.48021981003557634, -0.6322564144719063, 2.9847293349813726]
  - [0.0, 0.0, 0.0, 1.0]
  cam_overlaps: [2]
  camera_model: pinhole
  distortion_coeffs: [0.07602073028981758, -0.1603628223680752, 0.007191248340315871, -0.0014278084685302008]
  distortion_model: radtan
  intrinsics: [588.505837757169, 586.457269642278, 333.6385186873257, 247.9353654769998]
  resolution: [640, 480]
  rostopic: /camera2/camera/color/image_raw
cam2:
  T_cn_cnm1:
  - [-0.3274874647344716, -0.35079256500893524, 0.8773235074795588, -1.5046245875845434]
  - [0.3490605026933797, 0.8179228902762989, 0.45733894544583203, -0.7879559451298809]
  - [-0.8780140806964479, 0.45601175631392726, -0.14541166460175797, 2.1602821465549202]
  - [0.0, 0.0, 0.0, 1.0]
  cam_overlaps: [0, 1]
  camera_model: pinhole
  distortion_coeffs: [0.09318340757433892, -0.23029164273790026, 0.0001545848848756635, -0.0008750487004188804]
  distortion_model: radtan
  intrinsics: [607.4367039915398, 607.1537122370588, 318.5112711269077, 242.77602440822233]
  resolution: [640, 480]
  rostopic: /camera3/camera/color/image_raw

From the wiki, T_cn_cnm1 is supposed to represent the transform with respect to the last camera in the chain. For a 2-camera setup this is intuitive to understand but I am a little confused in this particular scenario.

Cam 1 does not overlap cam 0 thus what is this T_cn_cnm1 for cam 1? If it is to the last camera in the cam_overlaps, then for cam 2, the cam_overlaps is [0, 1], thus does the T_cn_cnm1 for cam 2 represent the transformation matrix of cam 2 to cam 1 coordinates? In this case what is the transform of my cam 0, since the cam 1 <-> cam 2 matrices are circular with each other.

@goldbattle
Copy link
Collaborator

I believe it doesn't relate to the overlaps, but instead is directly the cam0 -> cam1, and then cam1 -> cam2 sequential order.
I believe the .txt file should allow you to check this. You can multiple them together to check they get to identity for the cam0.

Camchain example:

cam0:
  cam_overlaps: [1, 2]
  camera_model: pinhole
  distortion_coeffs: [-0.3044477343700008, 0.06831190064580549, 0.0016385158708163779,
    -0.0008662676729043089]
  distortion_model: radtan
  intrinsics: [542.4693569385074, 542.5943654005414, 524.0797458796575, 374.9032424633794]
  resolution: [1024, 768]
  rostopic: /bb2/left/image_mono
cam1:
  T_cn_cnm1:
  - [0.9996636322595407, -0.0023396338089671878, 0.025829217009016997, -0.11851975874888755]
  - [0.0023291690482534857, 0.9999971927692536, 0.0004352300012703644, -9.924984662223608e-08]
  - [-0.02583016277927054, -0.00037492299114023263, 0.9996662753757097, -0.0002488380896967549]
  - [0.0, 0.0, 0.0, 1.0]
  cam_overlaps: [0, 2]
  camera_model: pinhole
  distortion_coeffs: [-0.3172271683629877, 0.0796622417650886, 0.0016817111445300336,
    0.0003884697800337926]
  distortion_model: radtan
  intrinsics: [542.1195972116055, 541.3994254954196, 512.4177276233356, 373.65549791432295]
  resolution: [1024, 768]
  rostopic: /bb2/right/image_mono
cam2:
  T_cn_cnm1:
  - [0.9992669676356307, 0.020129473755802423, -0.0325627345072751, 0.01773871315621785]
  - [-0.01905646952339181, 0.9992758637360629, 0.032933252554748896, 0.05903083063685694]
  - [0.0332020836933597, -0.03228858065702636, 0.9989269588901679, -0.024070661488148848]
  - [0.0, 0.0, 0.0, 1.0]
  cam_overlaps: [0, 1]
  camera_model: pinhole
  distortion_coeffs: [0.033130172812662916, -0.3296327708784002, 1.6123381603976683,
    -2.924050525810986]
  distortion_model: equidistant
  intrinsics: [787.0594937497326, 785.6673160989883, 426.17865094797287, 351.9821746436108]
  resolution: [808, 608]
  rostopic: /bkfy/image_mono

And the result.txt

Camera-system parameters:
	cam0 (/bb2/left/image_mono):
	 type: <class 'aslam_cv.libaslam_cv_python.DistortedPinholeCameraGeometry'>
	 distortion: [-0.30444773  0.0683119   0.00163852 -0.00086627] +- [ 0.00034358  0.00023751  0.0000591   0.00005848]
	 projection: [ 542.46935694  542.5943654   524.07974588  374.90324246] +- [ 0.14351632  0.13509232  0.26855242  0.26294191]
	 reprojection error: [-0.000090, -0.000011] +- [2.218856, 1.355786]

	cam1 (/bb2/right/image_mono):
	 type: <class 'aslam_cv.libaslam_cv_python.DistortedPinholeCameraGeometry'>
	 distortion: [-0.31722717  0.07966224  0.00168171  0.00038847] +- [ 0.00038583  0.00029257  0.00005692  0.00005716]
	 projection: [ 542.11959721  541.3994255   512.41772762  373.65549791] +- [ 0.1469293   0.14012416  0.26526817  0.26407153]
	 reprojection error: [0.000088, -0.000024] +- [2.199116, 1.318062]

	cam2 (/bkfy/image_mono):
	 type: <class 'aslam_cv.libaslam_cv_python.EquidistantDistortedPinholeCameraGeometry'>
	 distortion: [ 0.03313017 -0.32963277  1.61233816 -2.92405053] +- [ 0.00476357  0.0295189   0.06283652  0.11189452]
	 projection: [ 787.05949375  785.6673161   426.17865095  351.98217464] +- [ 0.20393965  0.20041723  0.59358168  0.56565581]
	 reprojection error: [-0.000884, 0.000017] +- [3.896984, 2.296730]

	baseline T_1_0:
	 q: [ 0.00020256 -0.01291593 -0.0011673   0.99991588] +- [ 0.00058683  0.00062264  0.00007151]
	 t: [-0.11851976 -0.0000001  -0.00024884] +- [ 0.0000535   0.0000535   0.00016074]

	baseline T_2_1:
	 q: [ 0.01631062  0.01644641  0.00979959  0.99968367] +- [ 0.00079731  0.00089289  0.00008208]
	 t: [ 0.01773871  0.05903083 -0.02407066] +- [ 0.00006255  0.00005788  0.00019487]

@goldbattle goldbattle added the question Theory or implementation question label Dec 31, 2023
@Esilocke
Copy link
Author

Esilocke commented Jan 4, 2024

Hi, thank you for the response. I checked the multiplication and found that it was not very satisfactory:

array([[ 0.85017879,  0.00311549, -0.52648486,  0.51364746],
       [ 0.05465371,  0.99405782,  0.09413832, -0.23722033],
       [ 0.52364968, -0.10880876,  0.8449566 ,  0.32570204],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

Meanwhile yours gives a very good result:

array([[ 0.99983302,  0.01695077, -0.00682733, -0.10154685],
       [-0.0167145 ,  0.99930589,  0.03329208,  0.05906141],
       [ 0.00738692, -0.0331724 ,  0.99942235, -0.02479179],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

I wanted to combine the pointclouds together but the results seemed really off, hence I thought that the camchain matrices were being parsed incorrectly, however it seems that the result is still quite wrong, should I take this as a bad calibration (a bad convergence)?

image

The two pointcloud groups are supposed to be overlapping together but there is a blank void in the middle, and the multiplication matrix is quite far from the identity matrix.

@goldbattle
Copy link
Collaborator

Please pose your result txt and pdf report. How many connections do you have between each camera, and how do the reprojection errors look? How are you recovering this pointcloud? Stereo matching? Why would it be the middle here isn't clear to me.

@Esilocke
Copy link
Author

Esilocke commented Jan 5, 2024

Hi, please find the files attached. I do not have the txt file any more but the contents should be in the PDF's first page.
300923_morning-report-cam.pdf

This is a 3-camera Realsense d435 setup where the cameras are pointed at each other in a roughly 60-degree angle, hence the actual capture scene is quite small. Based on the reprojection errors of around +- 2px I had assumed that this was small enough hence the calibration was good.

I am projecting the pointcloud from the RGBD image using the intrinsic parameters from the yaml files, but using the pointcloud directly is also a viable option since we had saved the depth/registered_points directly. We just want to align these pointclouds so that they make sense.

@goldbattle
Copy link
Collaborator

It looks like the camera1 to camera2 link has ~4k observations, while the camera2 to camera0 has 41k. Maybe this could be a problem, I am not sure. The 3d picture doesn't make much sense to me as the z-axis of all sensors should be pointing in a common direction correct?

You can see that the cam0 has a weird spike with the majority of at very high azimuthal angle also, not sure what to think about this:
image

Additionally, the reprojection errors don't look that gaussian so it might be worth trying to recollect and varying the board position in the image frame a bit more (use the -bag-freq to downsample your bag it looks like many observations are close to each other)
image

@Esilocke
Copy link
Author

Thanks for your suggestions! The observation differences is mainly because of the camera placement, because the placement itself (+ the resolution presumably) caused the algorithm to fail to detect the chessboard between two of the cameras (likely 1->2) unless I was standing in one very specific angle, or that the intrinsic step itself couldn't be completed with a lot of NaNs without sticking the board right in front of the camera.

The 3D system does look really weird now that you mention it, though I could possibly think it would be ok if the Y and Z axis were swapped on that outlier camera. Are these results because of the kalibr algorithm itself, because I am quite sure we were using the same camera coordinate systems throughout all 3 cameras.

@daewoode
Copy link

daewoode commented Jan 22, 2024

why is the transformation called T_cn_cnm1 when it's the transformation from cam0 to cam1?
T_cn_cnm1 = Transformation_cam-n_to_cam-n-minus-1? so isn't it the transformation from cam1 to cam0 in the right handed coordinate system?

@goldbattle
Copy link
Collaborator

why is the transformation called T_cn_cnm1 when it's the transformation from cam0 to cam1? T_cn_cnm1 = Transformation_cam-n_to_cam-n-minus-1? so isn't it the transformation from cam1 to cam0 in the right handed coordinate system?

This is the naming convention used. T_1_0 is the T_0_to_1 rotation.
https://github.com/ethz-asl/minkindr/wiki/Common-Transformation-Conventions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Theory or implementation question
Projects
None yet
Development

No branches or pull requests

3 participants