
Apply to RGB monocular movie #56

Open
RickMaruyama opened this issue Apr 3, 2024 · 18 comments

Comments
@RickMaruyama

RickMaruyama commented Apr 3, 2024

@muskie82
Thank you very much for your great contribution to visual SLAM!

I have a question.
How can your code be applied to RGB monocular movie data?
I took an RGB monocular movie and want to render it in PLY format.

[Details]
I have reproduced the code on AWS EC2 (Ubuntu 20.04 / A10 GPU) and can successfully open the GUI with the demo dataset using the following command:
python slam.py --config configs/mono/tum/fr3_office.yaml
I then tried the same thing with my own movie data, which is monocular and RGB only. I provided the RGB images and the image list as rgb.txt and ran the same command,
but the terminal then asks me to supply depth.txt.

python slam.py --config configs/mono/tum/my_data.yaml
~~~~
File "/home/ubuntu/MonoGS/utils/dataset.py", line 55, in parse_list
data = np.loadtxt(filepath, delimiter=" ", dtype=np.unicode_, skiprows=skiprows)
File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1067, in loadtxt
fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/numpy/lib/_datasource.py", line 193, in open
return ds.open(path, mode, encoding=encoding, newline=newline)
File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/numpy/lib/_datasource.py", line 533, in open
raise IOError("%s not found." % path)
OSError: datasets/tum/rgb_dataset_chassy_crash_test/depth.txt not found.

Even after confirming that depth.txt was placed in the right folder and running slam.py again, it then requires "pose_list", which is probably related to accelerometer.txt and groundtruth.txt.
How can I obtain these with only an RGB camera?

python slam.py --config configs/mono/tum/my_data.yaml
MonoGS: saving results in 
results/tum_my_data/2024-04-03-06-50-39
Traceback (most recent call last):
  File "slam.py", line 251, in <module>
    slam = SLAM(config, save_dir=save_dir)
  File "slam.py", line 56, in __init__
    model_params, model_params.source_path, config=config
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 490, in load_dataset
    return TUMDataset(args, path, config)
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 398, in __init__
    parser = TUMParser(dataset_path)
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 51, in __init__
    self.load_poses(self.input_folder, frame_rate=32)
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 88, in load_poses
    pose_data = self.parse_list(pose_list, skiprows=1)
UnboundLocalError: local variable 'pose_list' referenced before assignment

So, again, my question is: what is the procedure for testing on real data captured with a monocular RGB camera?
Thank you in advance for your advice.

@rmurai0610
Collaborator

To run our code with your custom data, you'll need to implement a dataset class
https://github.com/muskie82/MonoGS/blob/main/utils/dataset.py#L394-L402
which sets the following attributes (pseudocode):

self.has_depth = False
self.num_imgs = 100
self.color_paths = ["path/1.png", "path/2.png", ...]
self.depth_paths = None
self.poses = [np.eye(4) for _ in range(self.num_imgs)]

If you have ground-truth poses, you can set the poses to them instead.
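
For reference, a minimal sketch of such a class is below, modelled on the existing TUMDataset subclass of MonocularDataset in utils/dataset.py. The class name MyRGBDataset and the rgb/*.png folder layout are illustrative assumptions, not part of the repo:

import glob

import numpy as np

from utils.dataset import MonocularDataset


class MyRGBDataset(MonocularDataset):
    """RGB-only input: no depth maps and no ground-truth poses."""

    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        # Intrinsics (fx, fy, cx, cy, distortion) are read from the config by MonocularDataset.
        self.has_depth = False
        self.depth_paths = None
        self.color_paths = sorted(glob.glob(f"{dataset_path}/rgb/*.png"))
        self.num_imgs = len(self.color_paths)
        # No ground truth available: initialise every pose with the identity matrix.
        self.poses = [np.eye(4) for _ in range(self.num_imgs)]

You would also need to register the class in load_dataset at the bottom of utils/dataset.py (an extra elif on config["Dataset"]["type"]) so your config can select it, or add the class directly inside utils/dataset.py.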

@RickMaruyama
Author

Thank you very much for your prompt reply, and sorry for my late response.
I can now see the rendered scene in the GUI. However, I do not see a PLY file among the Gaussian splatting results. I checked and got the information below; I may have unset the depth value in the wrong way.
Could you advise me on how to obtain the PLY file?
Thank you.


MonoGS/results/tum_rgb_dataset_sample_data/2024-04-04-03-19-38/plot/trj_final.json

{
    "trj_id": [
        80
    ],
    "trj_est": [
        [
            [
                1.0,
                0.0,
                0.0,
                0.0
            ],
            [
                0.0,
                1.0,
                0.0,
                0.0
            ],
            [
                0.0,
                0.0,
                1.0,
                0.0
            ],
            [
                0.0,
                0.0,
                0.0,
                1.0
            ]
        ]
    ],
    "trj_gt": [
        [
            [
                1.0,
                0.0,
                0.0,
                0.0
            ],
            [
                0.0,
                1.0,
                0.0,
                0.0
            ],
            [
                0.0,
                0.0,
                1.0,
                0.0
            ],
            [
                0.0,
                0.0,
                0.0,
                1.0
            ]
        ]
    ]
}

@muskie82
Owner

Hi,
The save_gaussians function exports the PLY file.
To call this function, can you try one of the following options?

  • Run with the --eval flag.
  • Set eval_rendering to True in the config file (like this).
  • Copy and paste the save_gaussians function call to wherever you want to invoke it.

Best,

@RickMaruyama
Author

RickMaruyama commented Apr 16, 2024

Thank you for your reply!

I set eval_rendering = True and ran with --eval.
To rule out any risk coming from the dataset, I used the RGB data from freiburg_desk together with rgb.txt as input (without groundtruth.txt and accelerometer.txt).
After running slam.py, I was asked to register with wandb and did so. I was then directed to view the result in wandb, but no GUI (Gaussian splatting) opened on my Ubuntu server. (I guess this is due to the behavior of --eval.)

I got an error that umeyama_alignment does not work, and therefore no .ply is produced in the save folder. The cause seems to be the ground-truth setting. Could you tell me how to handle the ground-truth values (as well as accelerometer.txt) when I do not have them from a monocular camera (an iPhone movie, for example)?

I hope the following terminal output helps you understand my problem.

**(MonoGS) ubuntu@ip-XX:~/MonoGS$ python slam.py --config configs/mono/tum/dummy_data.yaml --eval**
MonoGS: Running MonoGS in Evaluation Mode
MonoGS: Following config will be overriden
MonoGS:         save_results=True
MonoGS:         use_gui=False
MonoGS:         eval_rendering=True
MonoGS:         use_wandb=True
MonoGS: saving results in results/tum_rgbd_dataset_freiburg1_desk copy/2024-04-16-03-31-58
wandb: Currently logged in as: rmaruyama7180 (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.16.6 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.2
wandb: Syncing run configs/mono/tum/dummy_data_2024-04-16-03-31-58
wandb: ⭐️ View project at https://wandb.ai/rmaruyama7180/MonoGS
wandb:  View run at https://wandb.ai/rmaruyama7180/MonoGS/runs/eukpnf09
wandb: Run data is saved locally in /home/ubuntu/MonoGS/wandb/run-20240416_033159-eukpnf09
wandb: Run `wandb offline` to turn off syncing.

MonoGS: Resetting the system
MonoGS: Initialized map
MonoGS: Resetting the opacity of non-visible Gaussians
MonoGS: Performing initial BA for initialization
MonoGS: Initialized SLAM
MonoGS: Evaluating ATE at frame:  59

Traceback (most recent call last):
  File "slam.py", line 252, in <module>
    slam = SLAM(config, save_dir=save_dir)
  File "slam.py", line 110, in __init__
    self.frontend.run()
  File "/home/ubuntu/MonoGS/utils/slam_frontend.py", line 473, in run
    monocular=self.monocular,
  File "/home/ubuntu/MonoGS/utils/eval_utils.py", line 120, in eval_ate
    monocular=monocular,
  File "/home/ubuntu/MonoGS/utils/eval_utils.py", line 32, in evaluate_evo
    traj_est, traj_ref, correct_scale=monocular
  File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/evo/core/trajectory.py", line 397, in align_trajectory
    with_scale)
  File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/evo/core/geometry.py", line 72, in umeyama_alignment
    raise GeometryException("Degenerate covariance rank, "
evo.core.geometry.GeometryException: Degenerate covariance rank, Umeyama alignment is not possible

wandb: Waiting for W&B process to finish, PID 13615
wandb: Program failed with code 1.  Press ctrl-c to abort syncing.                                                                    
wandb: Find user logs for this run at: /home/ubuntu/MonoGS/wandb/run-20240416_033159-eukpnf09/logs/debug.log
wandb: Find internal logs for this run at: /home/ubuntu/MonoGS/wandb/run-20240416_033159-eukpnf09/logs/debug-internal.log
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Synced configs/mono/tum/dummy_data_2024-04-16-03-31-58: https://wandb.ai/****/MonoGS/runs/eukpnf09

FYI: output from print statements for the error-related variables, to check the cause. It looks like pose_gt is the source of the problem, driving y and mean_y to 0 and causing the umeyama_alignment error.

pose_gt: [[1. 0. 0. 0.] [0. 1. 0. 0.] [0. 0. 1. 0.] [0. 0. 0. 1.]]
trj_gt_np: [array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]]),
array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]]),
 ...
array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]])]
poses_gt: [array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]]), 
 ....
array([[1., 0., 0., 0.],       [0., 1., 0., 0.],       [0., 0., 1., 0.],       [0., 0., 0., 1.]])]
 traj_ref = PosePath3D: 10 poses, 0.000m path length
traj_ref: 10 poses, 0.000m path length
y: [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
mean_y: [0. 0. 0.]
mean_x: [-0.00533411 -0.04845559 -0.25690131]
n: 10
outer_sum [[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]]
cov_xy: [[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]]
m-1 rank: 2
SVDv - d: [0. 0. 0.]
threshold - np.finfo(d.dtype).eps: 2.220446049250313e-16
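
For what it's worth, a minimal NumPy sketch (not from the repo, only approximating what evo's umeyama_alignment computes) reproduces the degenerate covariance when every reference pose is the identity:

import numpy as np

# All ground-truth poses are identity, so every reference translation is the origin.
y = np.zeros((3, 10))             # reference translations (traj_ref)
x = np.random.rand(3, 10)         # arbitrary estimated translations (traj_est)

mean_x = x.mean(axis=1, keepdims=True)
mean_y = y.mean(axis=1, keepdims=True)   # [0, 0, 0]
cov_xy = (y - mean_y) @ (x - mean_x).T / x.shape[1]

print(np.linalg.matrix_rank(cov_xy))     # 0, below the required rank (m - 1 = 2)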

@muskie82
Owner

The easiest fix should be to simply comment out the eval_ate function call.

eval_ate(
    self.cameras,
    self.kf_indices,
    self.save_dir,
    0,
    final=True,
    monocular=self.monocular,
)
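
For reference, the commented-out call in utils/slam_frontend.py would then look like this (a sketch; the exact location may differ in your checkout):

# eval_ate(
#     self.cameras,
#     self.kf_indices,
#     self.save_dir,
#     0,
#     final=True,
#     monocular=self.monocular,
# )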

@RickMaruyama
Author

Thanks for the reply.
After commenting out eval_ate in slam_frontend.py, the same umeyama_alignment error still persists.

@muskie82
Owner

Of course, comment it out here as well.
The idea is simply to skip all of the trajectory evaluation, which requires ground-truth poses.

eval_ate(
    self.cameras,
    self.kf_indices,
    self.save_dir,
    cur_frame_idx,
    monocular=self.monocular,
)

@RickMaruyama
Author

RickMaruyama commented Apr 19, 2024

@muskie82
Yes, I have commented that one out too.
Sorry for not mentioning it above.

@muskie82
Owner

OK, then where is the Umeyama alignment being called from?
It is used only for trajectory evaluation, so just comment out every eval_ate call in the code.
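
To locate every call site from the repository root, something like the following should work:
grep -rn "eval_ate" slam.py utils/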

@RickMaruyama
Author

It's here:
/MonoGS/lib/python3.7/site-packages/evo/core/geometry.py
line 64, in umeyama_alignment

@muskie82
Owner

I mean, where in the MonoGS code is it called?

@langsungn

@RickMaruyama Hi, can I ask how you succeeded in viewing the rendering GUI and implementing dataset.py? Thanks :)

@RickMaruyama
Author

@langsungn
Hello! I'm sorry for the late feedback. I was able to get a PLY file, but the point cloud looks broken.
There may be an issue in my modifications.

@langsungn

@RickMaruyama Thank you for replying. I'm okay with a few issues, but would you share your modified 'dataset.py' file with me? Thanks a lot!

@leichaoyang

@RickMaruyama Can you share your modified 'dataset.py' file with me? Thanks!

@RickMaruyama
Author

RickMaruyama commented May 22, 2024

@langsungn @leichaoyang
I am sorry for the delay; I have been very busy over the last few days.
I have attached my modified dataset.py. Thank you for your support!


import csv
import glob
import os

import cv2
import numpy as np
import torch
import trimesh
from PIL import Image

from gaussian_splatting.utils.graphics_utils import focal2fov

try:
    import pyrealsense2 as rs
except Exception:
    pass


class ReplicaParser:
    def __init__(self, input_folder):
        self.input_folder = input_folder
        self.color_paths = sorted(glob.glob(f"{self.input_folder}/results/frame*.jpg"))
        # self.depth_paths = sorted(glob.glob(f"{self.input_folder}/results/depth*.png"))  # commented out: using an RGB-only camera
        self.depth_paths = None  # added: using an RGB-only camera
        self.n_img = len(self.color_paths)
        self.load_poses(f"{self.input_folder}/traj.txt")

    def load_poses(self, path):
        # self.poses = []  # commented out: using an RGB-only camera
        self.poses = [np.eye(4) for _ in range(self.n_img)]  # added: identity poses for an RGB-only camera

        # The original code read ground-truth poses from traj.txt and depth paths here;
        # with identity poses and no depth maps there is nothing left to load.
        frames = []
        for i in range(self.n_img):
            frame = {
                "file_path": self.color_paths[i],
                # "depth_path": self.depth_paths[i],  # commented out: no depth maps
                "transform_matrix": self.poses[i].tolist(),
            }
            frames.append(frame)
        self.frames = frames


class TUMParser:
    def __init__(self, input_folder):
        self.input_folder = input_folder
        self.load_poses(self.input_folder, frame_rate=32)
        self.n_img = len(self.color_paths)

    def parse_list(self, filepath, skiprows=0):
        data = np.loadtxt(filepath, delimiter=" ", dtype=np.unicode_, skiprows=skiprows)
        return data

    # def associate_frames(self, tstamp_image, tstamp_depth, tstamp_pose, max_dt=0.08):
    #     associations = []
    #     for i, t in enumerate(tstamp_image):
    #         if tstamp_pose is None:
    #             j = np.argmin(np.abs(tstamp_depth - t))
    #             if np.abs(tstamp_depth[j] - t) < max_dt:
    #                 associations.append((i, j))
    #         else:
    #             j = np.argmin(np.abs(tstamp_depth - t))
    #             k = np.argmin(np.abs(tstamp_pose - t))
    #             if (np.abs(tstamp_depth[j] - t) < max_dt) and (
    #                 np.abs(tstamp_pose[k] - t) < max_dt
    #             ):
    #                 associations.append((i, j, k))
    #     return associations

    def associate_frames(self, tstamp_image, max_dt=0.08):
        # Build (i, i, i) tuples that only index the images, since there are
        # no depth or pose timestamps left to associate with.
        associations = [(i, i, i) for i in range(len(tstamp_image))]
        # print("associations:", associations)
        return associations

    def load_poses(self, datapath, frame_rate=-1):
        if os.path.isfile(os.path.join(datapath, "groundtruth.txt")):
            pose_list = os.path.join(datapath, "groundtruth.txt")  # no longer used below
        # elif os.path.isfile(os.path.join(datapath, "pose.txt")):  # commented out
        #     pose_list = os.path.join(datapath, "pose.txt")  # commented out

        image_list = os.path.join(datapath, "rgb.txt")
        # depth_list = os.path.join(datapath, "depth.txt")  # commented out

        image_data = self.parse_list(image_list)
        # depth_data = self.parse_list(depth_list)  # commented out
        # pose_data = self.parse_list(pose_list, skiprows=1)  # commented out
        # pose_vecs = pose_data[:, 0:].astype(np.float64)  # commented out

        tstamp_image = image_data[:, 0].astype(np.float64)
        # tstamp_depth = depth_data[:, 0].astype(np.float64)  # commented out
        # tstamp_pose = pose_data[:, 0].astype(np.float64)  # commented out
        associations = self.associate_frames(tstamp_image)  # modified: image timestamps only

        indicies = [0]
        for i in range(1, len(associations)):
            t0 = tstamp_image[associations[indicies[-1]][0]]
            t1 = tstamp_image[associations[i][0]]
            if t1 - t0 > 1.0 / frame_rate:
                indicies += [i]

        self.color_paths, self.poses, self.depth_paths, self.frames = [], [], [], []

        for ix in indicies:
            (i, j, k) = associations[ix]
            self.color_paths += [os.path.join(datapath, image_data[i, 1])]
            # self.depth_paths += [os.path.join(datapath, depth_data[j, 1])]  # commented out

            # quat = pose_vecs[k][4:]  # commented out
            # trans = pose_vecs[k][1:4]  # commented out
            # T = trimesh.transformations.quaternion_matrix(np.roll(quat, 1))  # commented out
            # T[:3, 3] = trans  # commented out
            # self.poses += [np.linalg.inv(T)]  # commented out

            self.poses += [np.eye(4)]  # added: identity pose instead

            frame = {
                "file_path": str(os.path.join(datapath, image_data[i, 1])),
                # "depth_path": str(os.path.join(datapath, depth_data[j, 1])),  # commented out
                # "transform_matrix": (np.linalg.inv(T)).tolist(),  # commented out
            }

            self.frames.append(frame)


class EuRoCParser:
    def __init__(self, input_folder, start_idx=0):
        self.input_folder = input_folder
        self.start_idx = start_idx
        self.color_paths = sorted(
            glob.glob(f"{self.input_folder}/mav0/cam0/data/*.png")
        )
        self.color_paths_r = sorted(
            glob.glob(f"{self.input_folder}/mav0/cam1/data/*.png")
        )
        assert len(self.color_paths) == len(self.color_paths_r)
        self.color_paths = self.color_paths[start_idx:]
        self.color_paths_r = self.color_paths_r[start_idx:]
        self.n_img = len(self.color_paths)
        self.load_poses(
            f"{self.input_folder}/mav0/state_groundtruth_estimate0/data.csv"
        )

    def associate(self, ts_pose):
        pose_indices = []
        for i in range(self.n_img):
            color_ts = float((self.color_paths[i].split("/")[-1]).split(".")[0])
            k = np.argmin(np.abs(ts_pose - color_ts))
            pose_indices.append(k)

        return pose_indices

    def load_poses(self, path):
        self.poses = []
        with open(path) as f:
            reader = csv.reader(f)
            header = next(reader)
            data = [list(map(float, row)) for row in reader]
        data = np.array(data)
        T_i_c0 = np.array(
            [
                [0.0148655429818, -0.999880929698, 0.00414029679422, -0.0216401454975],
                [0.999557249008, 0.0149672133247, 0.025715529948, -0.064676986768],
                [-0.0257744366974, 0.00375618835797, 0.999660727178, 0.00981073058949],
                [0.0, 0.0, 0.0, 1.0],
            ]
        )

        pose_ts = data[:, 0]
        pose_indices = self.associate(pose_ts)

        frames = []
        for i in range(self.n_img):
            trans = data[pose_indices[i], 1:4]
            quat = data[pose_indices[i], 4:8]

            T_w_i = trimesh.transformations.quaternion_matrix(np.roll(quat, 1))
            T_w_i[:3, 3] = trans
            T_w_c = np.dot(T_w_i, T_i_c0)

            self.poses += [np.linalg.inv(T_w_c)]

            frame = {
                "file_path": self.color_paths[i],
                "transform_matrix": (np.linalg.inv(T_w_c)).tolist(),
            }

            frames.append(frame)
        self.frames = frames


class BaseDataset(torch.utils.data.Dataset):
    def __init__(self, args, path, config):
        self.args = args
        self.path = path
        self.config = config
        self.device = "cuda:0"
        self.dtype = torch.float32
        self.num_imgs = 999999

    def __len__(self):
        return self.num_imgs

    def __getitem__(self, idx):
        pass


class MonocularDataset(BaseDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        calibration = config["Dataset"]["Calibration"]
        # Camera parameters
        self.fx = calibration["fx"]
        self.fy = calibration["fy"]
        self.cx = calibration["cx"]
        self.cy = calibration["cy"]
        self.width = calibration["width"]
        self.height = calibration["height"]
        self.fovx = focal2fov(self.fx, self.width)
        self.fovy = focal2fov(self.fy, self.height)
        self.K = np.array(
            [[self.fx, 0.0, self.cx], [0.0, self.fy, self.cy], [0.0, 0.0, 1.0]]
        )
        # distortion parameters
        self.disorted = calibration["distorted"]
        self.dist_coeffs = np.array(
            [
                calibration["k1"],
                calibration["k2"],
                calibration["p1"],
                calibration["p2"],
                calibration["k3"],
            ]
        )
        self.map1x, self.map1y = cv2.initUndistortRectifyMap(
            self.K,
            self.dist_coeffs,
            np.eye(3),
            self.K,
            (self.width, self.height),
            cv2.CV_32FC1,
        )
        # depth parameters
        # self.has_depth = True if "depth_scale" in calibration.keys() else False  # commented out: using an RGB-only camera
        self.has_depth = False  # added: using an RGB-only camera
        self.depth_scale = calibration["depth_scale"] if self.has_depth else None

        # Default scene scale
        nerf_normalization_radius = 5
        self.scene_info = {
            "nerf_normalization": {
                "radius": nerf_normalization_radius,
                "translation": np.zeros(3),
            },
        }

    def __getitem__(self, idx):
        color_path = self.color_paths[idx]
        pose = self.poses[idx]

        image = np.array(Image.open(color_path))
        depth = None

        if self.disorted:
            image = cv2.remap(image, self.map1x, self.map1y, cv2.INTER_LINEAR)

        if self.has_depth:
            depth_path = self.depth_paths[idx]
            depth = np.array(Image.open(depth_path)) / self.depth_scale

        image = (
            torch.from_numpy(image / 255.0)
            .clamp(0.0, 1.0)
            .permute(2, 0, 1)
            .to(device=self.device, dtype=self.dtype)
        )
        pose = torch.from_numpy(pose).to(device=self.device)
        return image, depth, pose


class StereoDataset(BaseDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        calibration = config["Dataset"]["Calibration"]
        self.width = calibration["width"]
        self.height = calibration["height"]

        cam0raw = calibration["cam0"]["raw"]
        cam0opt = calibration["cam0"]["opt"]
        cam1raw = calibration["cam1"]["raw"]
        cam1opt = calibration["cam1"]["opt"]
        # Camera parameters
        self.fx_raw = cam0raw["fx"]
        self.fy_raw = cam0raw["fy"]
        self.cx_raw = cam0raw["cx"]
        self.cy_raw = cam0raw["cy"]
        self.fx = cam0opt["fx"]
        self.fy = cam0opt["fy"]
        self.cx = cam0opt["cx"]
        self.cy = cam0opt["cy"]

        self.fx_raw_r = cam1raw["fx"]
        self.fy_raw_r = cam1raw["fy"]
        self.cx_raw_r = cam1raw["cx"]
        self.cy_raw_r = cam1raw["cy"]
        self.fx_r = cam1opt["fx"]
        self.fy_r = cam1opt["fy"]
        self.cx_r = cam1opt["cx"]
        self.cy_r = cam1opt["cy"]

        self.fovx = focal2fov(self.fx, self.width)
        self.fovy = focal2fov(self.fy, self.height)
        self.K_raw = np.array(
            [
                [self.fx_raw, 0.0, self.cx_raw],
                [0.0, self.fy_raw, self.cy_raw],
                [0.0, 0.0, 1.0],
            ]
        )

        self.K = np.array(
            [[self.fx, 0.0, self.cx], [0.0, self.fy, self.cy], [0.0, 0.0, 1.0]]
        )

        self.Rmat = np.array(calibration["cam0"]["R"]["data"]).reshape(3, 3)
        self.K_raw_r = np.array(
            [
                [self.fx_raw_r, 0.0, self.cx_raw_r],
                [0.0, self.fy_raw_r, self.cy_raw_r],
                [0.0, 0.0, 1.0],
            ]
        )

        self.K_r = np.array(
            [[self.fx_r, 0.0, self.cx_r], [0.0, self.fy_r, self.cy_r], [0.0, 0.0, 1.0]]
        )
        self.Rmat_r = np.array(calibration["cam1"]["R"]["data"]).reshape(3, 3)

        # distortion parameters
        self.disorted = calibration["distorted"]
        self.dist_coeffs = np.array(
            [cam0raw["k1"], cam0raw["k2"], cam0raw["p1"], cam0raw["p2"], cam0raw["k3"]]
        )
        self.map1x, self.map1y = cv2.initUndistortRectifyMap(
            self.K_raw,
            self.dist_coeffs,
            self.Rmat,
            self.K,
            (self.width, self.height),
            cv2.CV_32FC1,
        )

        self.dist_coeffs_r = np.array(
            [cam1raw["k1"], cam1raw["k2"], cam1raw["p1"], cam1raw["p2"], cam1raw["k3"]]
        )
        self.map1x_r, self.map1y_r = cv2.initUndistortRectifyMap(
            self.K_raw_r,
            self.dist_coeffs_r,
            self.Rmat_r,
            self.K_r,
            (self.width, self.height),
            cv2.CV_32FC1,
        )

    def __getitem__(self, idx):
        color_path = self.color_paths[idx]
        color_path_r = self.color_paths_r[idx]

        pose = self.poses[idx]
        image = cv2.imread(color_path, 0)
        image_r = cv2.imread(color_path_r, 0)
        depth = None
        if self.disorted:
            image = cv2.remap(image, self.map1x, self.map1y, cv2.INTER_LINEAR)
            image_r = cv2.remap(image_r, self.map1x_r, self.map1y_r, cv2.INTER_LINEAR)
        stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=20)
        stereo.setUniquenessRatio(40)
        disparity = stereo.compute(image, image_r) / 16.0
        disparity[disparity == 0] = 1e10
        depth = 47.90639384423901 / (
            disparity
        )  ## Following ORB-SLAM2 config, baseline*fx
        depth[depth < 0] = 0
        image = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
        image = (
            torch.from_numpy(image / 255.0)
            .clamp(0.0, 1.0)
            .permute(2, 0, 1)
            .to(device=self.device, dtype=self.dtype)
        )
        pose = torch.from_numpy(pose).to(device=self.device)

        return image, depth, pose


class TUMDataset(MonocularDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        parser = TUMParser(dataset_path)
        self.num_imgs = parser.n_img
        self.color_paths = parser.color_paths
        self.depth_paths = parser.depth_paths
        self.poses = parser.poses


class ReplicaDataset(MonocularDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        parser = ReplicaParser(dataset_path)
        self.num_imgs = parser.n_img
        self.color_paths = parser.color_paths
        self.depth_paths = parser.depth_paths
        self.poses = parser.poses


class EurocDataset(StereoDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        parser = EuRoCParser(dataset_path, start_idx=config["Dataset"]["start_idx"])
        self.num_imgs = parser.n_img
        self.color_paths = parser.color_paths
        self.color_paths_r = parser.color_paths_r
        self.poses = parser.poses


class RealsenseDataset(BaseDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        self.pipeline = rs.pipeline()
        self.h, self.w = 720, 1280
        self.config = rs.config()
        self.config.enable_stream(rs.stream.color, self.w, self.h, rs.format.bgr8, 30)
        self.profile = self.pipeline.start(self.config)

        self.rgb_sensor = self.profile.get_device().query_sensors()[1]
        self.rgb_sensor.set_option(rs.option.enable_auto_exposure, False)
        # rgb_sensor.set_option(rs.option.enable_auto_white_balance, True)
        self.rgb_sensor.set_option(rs.option.enable_auto_white_balance, False)
        self.rgb_sensor.set_option(rs.option.exposure, 200)
        self.rgb_profile = rs.video_stream_profile(
            self.profile.get_stream(rs.stream.color)
        )

        self.rgb_intrinsics = self.rgb_profile.get_intrinsics()

        self.fx = self.rgb_intrinsics.fx
        self.fy = self.rgb_intrinsics.fy
        self.cx = self.rgb_intrinsics.ppx
        self.cy = self.rgb_intrinsics.ppy
        self.width = self.rgb_intrinsics.width
        self.height = self.rgb_intrinsics.height
        self.fovx = focal2fov(self.fx, self.width)
        self.fovy = focal2fov(self.fy, self.height)
        self.K = np.array(
            [[self.fx, 0.0, self.cx], [0.0, self.fy, self.cy], [0.0, 0.0, 1.0]]
        )

        self.disorted = True
        self.dist_coeffs = np.asarray(self.rgb_intrinsics.coeffs)
        self.map1x, self.map1y = cv2.initUndistortRectifyMap(
            self.K, self.dist_coeffs, np.eye(3), self.K, (self.w, self.h), cv2.CV_32FC1
        )

        # depth parameters
        self.has_depth = False
        self.depth_scale = None

    def __getitem__(self, idx):
        pose = torch.eye(4, device=self.device, dtype=self.dtype)

        frameset = self.pipeline.wait_for_frames()
        rgb_frame = frameset.get_color_frame()
        image = np.asanyarray(rgb_frame.get_data())
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        if self.disorted:
            image = cv2.remap(image, self.map1x, self.map1y, cv2.INTER_LINEAR)

        image = (
            torch.from_numpy(image / 255.0)
            .clamp(0.0, 1.0)
            .permute(2, 0, 1)
            .to(device=self.device, dtype=self.dtype)
        )
        return image, None, pose


def load_dataset(args, path, config):
    if config["Dataset"]["type"] == "tum":
        return TUMDataset(args, path, config)
    elif config["Dataset"]["type"] == "replica":
        return ReplicaDataset(args, path, config)
    elif config["Dataset"]["type"] == "euroc":
        return EurocDataset(args, path, config)
    elif config["Dataset"]["type"] == "realsense":
        return RealsenseDataset(args, path, config)
    else:
        raise ValueError("Unknown dataset type")

@Harshitavardhani

> @muskie82 Yes, I have commented that one out too. Sorry for not mentioning it above.

Hi @muskie82

Did you overcome the error regarding "Umeyama alignment is not possible"? I am facing the same issue while working with a custom dataset.

If so, could you please tell me how you resolved it? Thank you!

@RickMaruyama
Author

@langsungn
Hello! I'd like to solve this issue.
Do you have a dataset.py for the monocular-camera case that successfully produced a PLY file?
If so, I'd like to see it for reference.
