
[Help] Getting the depth of the image plane #47

Open
cs-mshah opened this issue Apr 3, 2024 · 4 comments
cs-mshah commented Apr 3, 2024

Firstly, thanks a ton for making this library. It is extremely helpful for performing common operations; I wasn't able to find anything else this simple.
I have the following problem: I want to back-project the image plane to world coordinates. Basically, the depth map should contain the depth of the image plane itself. How can I compute this? Will the values all be 1? Can you show this with an example?

yxlao (Owner) commented Apr 7, 2024

I think what you mean by the "depth of the image plane" is the distance from the camera center to the image plane. This distance is referred to as the focal length, and there are two types of focal length representations: focal length in pixels and physical focal length in metric space.

TLDR: Typically, the focal length is expressed in pixels in computer vision, as specified in the intrinsic camera matrix $K$. If you want to compute the physical focal length, you'll need additional information, including the sensor size (in metric units) and the resolution of the camera.

Let's break it down. Assume you have the camera intrinsic matrix $K$:

$$ K=\left[\begin{array}{ccc} f_{x} & 0 & c_{x} \\ 0 & f_{y} & c_{y} \\ 0 & 0 & 1 \end{array}\right] $$

  • Focal Length in Pixels: $f_x$ and $f_y$ in the intrinsic camera matrix $K$ are the focal lengths in pixels. These are unitless values that do not have any physical scale. You can imagine that scaling the focal length and the sensor size by the same factor will not change the image projection relationship at all. This is the most common representation in computer vision, as we usually don't care about the physical size of the sensor or the physical focal length.
  • Physical Focal Length: The physical focal length is the focal length of the lens in metric units (e.g., millimeters). To convert between the focal length in pixels and the physical focal length, you'll need to know the sensor size in metric units and the resolution of the camera. To compute the metric focal length from the pixel focal length, use the following formulas:

$$ f_{metric_x} = \frac{f_x}{resolution_x} \times sensor_x $$

$$ f_{metric_y} = \frac{f_y}{resolution_y} \times sensor_y $$

Where:

  • $f_x$ and $f_y$ are the given focal lengths in pixels along the x and y axes, respectively.
  • $resolution_x$ and $resolution_y$ are the resolution of the camera sensor in pixels along the width (x-axis) and height (y-axis).
  • $sensor_x$ and $sensor_y$ are the physical sizes of the sensor along the width and height in metric units (typically millimeters).
  • $f_{metric_x}$ and $f_{metric_y}$ are the calculated physical focal lengths in metric units along the x and y dimensions, respectively.

Also, for typical cameras (uniform square pixels, symmetric lens), you may assume $f_x = f_y$.
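As a minimal numeric sketch of the formulas above: the pixel focal length and resolution below are typical for a 640x480 RGB-D camera, but the sensor size in mm is a made-up assumption — you'd have to look it up in your camera's datasheet.

```python
# Hypothetical values for illustration only.
fx, fy = 525.0, 525.0              # focal lengths in pixels (from K)
resolution_x, resolution_y = 640, 480
sensor_x, sensor_y = 4.8, 3.6      # physical sensor size in mm (assumed)

# Metric focal length = pixel focal length / resolution * sensor size.
f_metric_x = fx / resolution_x * sensor_x
f_metric_y = fy / resolution_y * sensor_y
print(f_metric_x, f_metric_y)  # 3.9375 3.9375 (mm)
```

Note that $f_{metric_x}$ and $f_{metric_y}$ come out equal here because the pixels are square (sensor size and resolution have the same aspect ratio).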

yxlao (Owner) commented Apr 7, 2024

If you want to project depth images to 3D as point clouds, you may use the functions in ct.project. Typically you'll need the intrinsic and extrinsic camera parameters to project a depth image to 3D. Also, pay attention to the depth image format, as it could be in different units or different scales.

import open3d as o3d
import camtools as ct
import json
import numpy as np

from pathlib import Path


def main():
    # Get paths.
    redwood = o3d.data.SampleRedwoodRGBDImages()
    im_color_path = Path(redwood.color_paths[0])
    im_depth_path = Path(redwood.depth_paths[0])
    camera_intrinsic_path = Path(redwood.camera_intrinsic_path)

    # Load K (intrinsic).
    with open(camera_intrinsic_path, "r") as f:
        camera_intrinsic = json.load(f)
    K = np.array(camera_intrinsic["intrinsic_matrix"]).reshape(3, 3).T

    # Load T (extrinsic), assume identity.
    T = np.eye(4)

    # Load images and depths.
    im_color = ct.io.imread(im_color_path)
    im_depth = ct.io.imread_depth(im_depth_path, depth_scale=1000.0)

    # Create point cloud.
    points, colors = ct.project.im_depth_im_color_to_points_colors(
        im_depth=im_depth, im_color=im_color, K=K, T=T
    )

    # Visualize.
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.colors = o3d.utility.Vector3dVector(colors)
    o3d.visualization.draw_geometries([pcd])


if __name__ == "__main__":
    main()

This should give you:
[Screenshot from 2024-04-07: the reconstructed colored point cloud]

yxlao closed this as completed Apr 7, 2024
cs-mshah (Author) commented

Thanks. The explanation was really helpful. But I wanted to know the actual focal length in mm since I want to back-project my points to the depth of the image plane itself. Is there a way to know the size of the pixel in mm or the $sensor_x$, $sensor_y$ for a SIMPLE_PINHOLE or PINHOLE camera used by colmap? Or should I just assume the standard: 1px = 0.264mm

yxlao reopened this Apr 23, 2024
yxlao (Owner) commented Apr 23, 2024

> Is there a way to know the size of the pixel in mm or the $sensor_x$, $sensor_y$ for a SIMPLE_PINHOLE or PINHOLE camera used by colmap?

As far as I know, COLMAP's reconstruction of points and cameras is not physically scaled. That is, the scale is relative (or arbitrary) as we don't know the physical scale of COLMAP's reconstruction.

  • Extrinsic properties: You have to manually obtain a physical scale, or provide physical-scale camera poses to COLMAP, for it to reconstruct physical-scale points.
  • Intrinsic properties: The same applies to your question about the pixel scale in mm. You either have to know the physical specifications of your camera in advance, or use a camera calibration technique that captures a known pattern in physical space.
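For the extrinsic side, one common way to recover a metric scale is to physically measure a single known distance, e.g. the baseline between two camera positions, and rescale the whole reconstruction by it. A hypothetical sketch (all values below are made up for illustration; in practice the camera centers would come from COLMAP's reconstruction):

```python
import numpy as np

# Two camera centers in COLMAP's arbitrary (relative) units.
camera_centers = np.array([[0.0, 0.0, 0.0],
                           [2.0, 0.0, 0.0]])
# Some reconstructed 3D points in the same arbitrary units.
points = np.array([[1.0, 1.0, 4.0],
                   [1.5, -0.5, 3.0]])

# Physically measured distance between the two cameras, in meters.
measured_baseline_m = 0.5

# Meters per COLMAP unit.
colmap_baseline = np.linalg.norm(camera_centers[1] - camera_centers[0])
scale = measured_baseline_m / colmap_baseline

# Uniformly rescale the reconstruction to metric units.
points_metric = points * scale
camera_centers_metric = camera_centers * scale
print(scale)  # 0.25
```

Since the reconstruction is only determined up to a global similarity transform, a single uniform scale factor applied to all points and camera translations is enough to make it metric.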
