
Apply to RGB monocular movie #56

Open
RickMaruyama opened this issue Apr 3, 2024 · 18 comments

Comments
@RickMaruyama

RickMaruyama commented Apr 3, 2024

@muskie82
Thank you very much for your great contribution to visual SLAM!

I have a question.
How can your code be applied to RGB monocular movie data?
I took an RGB monocular movie and want to render it in PLY format.

[Details]
I have reproduced the code on AWS EC2 (Ubuntu 20.04 / A10 GPU) and can successfully open the GUI with the demo dataset using the following command:
python slam.py --config configs/mono/tum/fr3_office.yaml
I then tried the same thing with my own movie data, which is monocular and RGB only. I provided the RGB images and the image list as rgb.txt and ran the same command,
but the terminal then asks me to supply depth.txt.

python slam.py --config configs/mono/tum/my_data.yaml
~~~~
File "/home/ubuntu/MonoGS/utils/dataset.py", line 55, in parse_list
data = np.loadtxt(filepath, delimiter=" ", dtype=np.unicode_, skiprows=skiprows)
File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1067, in loadtxt
fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/numpy/lib/_datasource.py", line 193, in open
return ds.open(path, mode, encoding=encoding, newline=newline)
File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/numpy/lib/_datasource.py", line 533, in open
raise IOError("%s not found." % path)
OSError: datasets/tum/rgb_dataset_chassy_crash_test/depth.txt not found.

Even after confirming that depth.txt was placed in the right folder and running slam.py again, it then requires "pose_list", which is probably related to accelerometer.txt and groundtruth.txt.
How can I obtain these with only an RGB camera?

python slam.py --config configs/mono/tum/my_data.yaml
MonoGS: saving results in 
results/tum_my_data/2024-04-03-06-50-39
Traceback (most recent call last):
  File "slam.py", line 251, in <module>
    slam = SLAM(config, save_dir=save_dir)
  File "slam.py", line 56, in __init__
    model_params, model_params.source_path, config=config
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 490, in load_dataset
    return TUMDataset(args, path, config)
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 398, in __init__
    parser = TUMParser(dataset_path)
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 51, in __init__
    self.load_poses(self.input_folder, frame_rate=32)
  File "/home/ubuntu/MonoGS/utils/dataset.py", line 88, in load_poses
    pose_data = self.parse_list(pose_list, skiprows=1)
UnboundLocalError: local variable 'pose_list' referenced before assignment

So, again, my question is: what is the procedure for testing on real data captured with a monocular RGB camera?
Thank you in advance for your advice.

@rmurai0610
Collaborator

To run our code with your custom data, you'll need to implement a dataset class
https://github.com/muskie82/MonoGS/blob/main/utils/dataset.py#L394-L402
which sets the following attributes (pseudocode):

self.has_depth = False
self.num_imgs = 100
self.color_paths = ["path/1.png", "path/2.png", ...]
self.depth_paths = None
self.poses = [np.eye(4) for _ in range(self.num_imgs)]

If you have ground-truth poses, you can set the poses to them instead.
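
For reference, a minimal sketch of such a class is below, modelled on the existing TUMDataset subclass of MonocularDataset in utils/dataset.py. The class name MyRGBDataset and the rgb/*.png folder layout are illustrative assumptions, not part of the repo:

import glob

import numpy as np

from utils.dataset import MonocularDataset


class MyRGBDataset(MonocularDataset):
    """RGB-only input: no depth maps and no ground-truth poses."""

    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        # Intrinsics (fx, fy, cx, cy, distortion) are read from the config by MonocularDataset.
        self.has_depth = False
        self.depth_paths = None
        self.color_paths = sorted(glob.glob(f"{dataset_path}/rgb/*.png"))
        self.num_imgs = len(self.color_paths)
        # No ground truth available: initialise every pose with the identity matrix.
        self.poses = [np.eye(4) for _ in range(self.num_imgs)]

You would also need to register the class in load_dataset at the bottom of utils/dataset.py (an extra elif on config["Dataset"]["type"]) so your config can select it, or add the class directly inside utils/dataset.py.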

@RickMaruyama
Author

Thank you very much for your prompt reply, and sorry for my late response.
I can now see the rendered scene in the GUI. However, I do not see a PLY file among the Gaussian splatting results. I checked and got the information below; I may have unset the depth value in the wrong way.
Could you advise me on how to obtain the PLY file?
Thank you.


MonoGS/results/tum_rgb_dataset_sample_data/2024-04-04-03-19-38/plot/trj_final.json

{
    "trj_id": [
        80
    ],
    "trj_est": [
        [
            [
                1.0,
                0.0,
                0.0,
                0.0
            ],
            [
                0.0,
                1.0,
                0.0,
                0.0
            ],
            [
                0.0,
                0.0,
                1.0,
                0.0
            ],
            [
                0.0,
                0.0,
                0.0,
                1.0
            ]
        ]
    ],
    "trj_gt": [
        [
            [
                1.0,
                0.0,
                0.0,
                0.0
            ],
            [
                0.0,
                1.0,
                0.0,
                0.0
            ],
            [
                0.0,
                0.0,
                1.0,
                0.0
            ],
            [
                0.0,
                0.0,
                0.0,
                1.0
            ]
        ]
    ]
}

@muskie82
Owner

Hi,
The save_gaussians function exports the PLY file.
To call this function, can you try one of the following options?

  • Run with the --eval flag.
  • Set eval_rendering to True in the config file (like this).
  • Copy and paste the save_gaussians function call to wherever you want to invoke it.

Best,

@RickMaruyama
Author

RickMaruyama commented Apr 16, 2024

Thank you for your reply!

I set eval_rendering = True and ran with --eval.
To rule out any risk coming from the dataset, I used the RGB data from freiburg_desk together with rgb.txt as input (without groundtruth.txt and accelerometer.txt).
After running slam.py, I was asked to register with wandb and did so. I was then directed to view the result in wandb, but no GUI (Gaussian splatting) opened on my Ubuntu server. (I guess this is due to the behavior of --eval.)

I got an error that umeyama_alignment does not work, and therefore no .ply is produced in the save folder. The cause seems to be the ground-truth setting. Could you tell me how to handle the ground-truth values (as well as accelerometer.txt) when I do not have them from a monocular camera (an iPhone movie, for example)?

I hope the following terminal output helps you understand my problem.

**(MonoGS) ubuntu@ip-XX:~/MonoGS$ python slam.py --config configs/mono/tum/dummy_data.yaml --eval**
MonoGS: Running MonoGS in Evaluation Mode
MonoGS: Following config will be overriden
MonoGS:         save_results=True
MonoGS:         use_gui=False
MonoGS:         eval_rendering=True
MonoGS:         use_wandb=True
MonoGS: saving results in results/tum_rgbd_dataset_freiburg1_desk copy/2024-04-16-03-31-58
wandb: Currently logged in as: rmaruyama7180 (use `wandb login --relogin` to force relogin)
wandb: wandb version 0.16.6 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.2
wandb: Syncing run configs/mono/tum/dummy_data_2024-04-16-03-31-58
wandb: ⭐️ View project at https://wandb.ai/rmaruyama7180/MonoGS
wandb:  View run at https://wandb.ai/rmaruyama7180/MonoGS/runs/eukpnf09
wandb: Run data is saved locally in /home/ubuntu/MonoGS/wandb/run-20240416_033159-eukpnf09
wandb: Run `wandb offline` to turn off syncing.

MonoGS: Resetting the system
MonoGS: Initialized map
MonoGS: Resetting the opacity of non-visible Gaussians
MonoGS: Performing initial BA for initialization
MonoGS: Initialized SLAM
MonoGS: Evaluating ATE at frame:  59

Traceback (most recent call last):
  File "slam.py", line 252, in <module>
    slam = SLAM(config, save_dir=save_dir)
  File "slam.py", line 110, in __init__
    self.frontend.run()
  File "/home/ubuntu/MonoGS/utils/slam_frontend.py", line 473, in run
    monocular=self.monocular,
  File "/home/ubuntu/MonoGS/utils/eval_utils.py", line 120, in eval_ate
    monocular=monocular,
  File "/home/ubuntu/MonoGS/utils/eval_utils.py", line 32, in evaluate_evo
    traj_est, traj_ref, correct_scale=monocular
  File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/evo/core/trajectory.py", line 397, in align_trajectory
    with_scale)
  File "/home/ubuntu/miniconda3/envs/MonoGS/lib/python3.7/site-packages/evo/core/geometry.py", line 72, in umeyama_alignment
    raise GeometryException("Degenerate covariance rank, "
evo.core.geometry.GeometryException: Degenerate covariance rank, Umeyama alignment is not possible

wandb: Waiting for W&B process to finish, PID 13615
wandb: Program failed with code 1.  Press ctrl-c to abort syncing.                                                                    
wandb: Find user logs for this run at: /home/ubuntu/MonoGS/wandb/run-20240416_033159-eukpnf09/logs/debug.log
wandb: Find internal logs for this run at: /home/ubuntu/MonoGS/wandb/run-20240416_033159-eukpnf09/logs/debug-internal.log
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Synced configs/mono/tum/dummy_data_2024-04-16-03-31-58: https://wandb.ai/****/MonoGS/runs/eukpnf09

FYI: output from print statements for the error-related variables, to check the cause. It looks like pose_gt is the source of the problem, driving y and mean_y to 0 and causing the umeyama_alignment error.

pose_gt: [[1. 0. 0. 0.] [0. 1. 0. 0.] [0. 0. 1. 0.] [0. 0. 0. 1.]]
trj_gt_np: [array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]]),
array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]]),
 ...
array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]])]
poses_gt: [array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]]), 
 ....
array([[1., 0., 0., 0.],       [0., 1., 0., 0.],       [0., 0., 1., 0.],       [0., 0., 0., 1.]])]
 traj_ref = PosePath3D: 10 poses, 0.000m path length
traj_ref: 10 poses, 0.000m path length
y: [[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
mean_y: [0. 0. 0.]
mean_x: [-0.00533411 -0.04845559 -0.25690131]
n: 10
outer_sum [[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]]
cov_xy: [[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]]
m-1 rank: 2
SVDv - d: [0. 0. 0.]
threshold - np.finfo(d.dtype).eps: 2.220446049250313e-16
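
For what it's worth, a minimal NumPy sketch (not from the repo, only approximating what evo's umeyama_alignment computes) reproduces the degenerate covariance when every reference pose is the identity:

import numpy as np

# All ground-truth poses are identity, so every reference translation is the origin.
y = np.zeros((3, 10))             # reference translations (traj_ref)
x = np.random.rand(3, 10)         # arbitrary estimated translations (traj_est)

mean_x = x.mean(axis=1, keepdims=True)
mean_y = y.mean(axis=1, keepdims=True)   # [0, 0, 0]
cov_xy = (y - mean_y) @ (x - mean_x).T / x.shape[1]

print(np.linalg.matrix_rank(cov_xy))     # 0, below the required rank (m - 1 = 2)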

@muskie82
Owner

The easiest fix should be to simply comment out the eval_ate function call.

eval_ate(
    self.cameras,
    self.kf_indices,
    self.save_dir,
    0,
    final=True,
    monocular=self.monocular,
)
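
For reference, the commented-out call in utils/slam_frontend.py would then look like this (a sketch; the exact location may differ in your checkout):

# eval_ate(
#     self.cameras,
#     self.kf_indices,
#     self.save_dir,
#     0,
#     final=True,
#     monocular=self.monocular,
# )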

@RickMaruyama
Author

Thanks for the reply.
After commenting out eval_ate in slam_frontend.py, the same umeyama_alignment error still persists.

@muskie82
Owner

Of course, comment it out here as well.
The idea is simply to skip all of the trajectory evaluation, which requires ground-truth poses.

eval_ate(
    self.cameras,
    self.kf_indices,
    self.save_dir,
    cur_frame_idx,
    monocular=self.monocular,
)

@RickMaruyama
Author

RickMaruyama commented Apr 19, 2024

@muskie82
Yes, I have commented that one out too.
Sorry for not mentioning it above.

@muskie82
Owner

OK, then where is the Umeyama alignment being called from?
It is used only for trajectory evaluation, so just comment out every eval_ate call in the code.
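
To locate every call site from the repository root, something like the following should work:
grep -rn "eval_ate" slam.py utils/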

@RickMaruyama
Author

It's here:
/MonoGS/lib/python3.7/site-packages/evo/core/geometry.py
line 64, in umeyama_alignment

@muskie82
Owner

I mean, where in the MonoGS code is it called?

@langsungn

@RickMaruyama Hi, can I ask how you succeeded in viewing the rendering GUI and implementing dataset.py? Thanks :)

@RickMaruyama
Author

@langsungn
Hello! I'm sorry for the late feedback. I was able to get a PLY file, but the point cloud looks broken.
There may be an issue in my modifications.

@langsungn

@RickMaruyama Thank you for replying. I'm okay with a few issues, but would you share your modified 'dataset.py' file with me? Thanks a lot!

@leichaoyang

@RickMaruyama Can you share your modified 'dataset.py' file with me? Thanks!

@RickMaruyama
Author

RickMaruyama commented May 22, 2024

@langsungn @leichaoyang
I am sorry for the delay; I have been very busy over the last few days.
I have attached my modified dataset.py. Thank you for your support!


import csv
import glob
import os

import cv2
import numpy as np
import torch
import trimesh
from PIL import Image

from gaussian_splatting.utils.graphics_utils import focal2fov

try:
    import pyrealsense2 as rs
except Exception:
    pass


class ReplicaParser:
    def __init__(self, input_folder):
        self.input_folder = input_folder
        self.color_paths = sorted(glob.glob(f"{self.input_folder}/results/frame*.jpg"))
        # self.depth_paths = sorted(glob.glob(f"{self.input_folder}/results/depth*.png"))  # commented out: using an RGB-only camera
        self.depth_paths = None  # added: using an RGB-only camera
        self.n_img = len(self.color_paths)
        self.load_poses(f"{self.input_folder}/traj.txt")

    def load_poses(self, path):
        # self.poses = []  # commented out: using an RGB-only camera
        self.poses = [np.eye(4) for _ in range(self.n_img)]  # added: identity poses for an RGB-only camera

        # The original code read ground-truth poses from traj.txt and depth paths here;
        # with identity poses and no depth maps there is nothing left to load.
        frames = []
        for i in range(self.n_img):
            frame = {
                "file_path": self.color_paths[i],
                # "depth_path": self.depth_paths[i],  # commented out: no depth maps
                "transform_matrix": self.poses[i].tolist(),
            }
            frames.append(frame)
        self.frames = frames


class TUMParser:
    def __init__(self, input_folder):
        self.input_folder = input_folder
        self.load_poses(self.input_folder, frame_rate=32)
        self.n_img = len(self.color_paths)

    def parse_list(self, filepath, skiprows=0):
        data = np.loadtxt(filepath, delimiter=" ", dtype=np.unicode_, skiprows=skiprows)
        return data

    # def associate_frames(self, tstamp_image, tstamp_depth, tstamp_pose, max_dt=0.08):
    #     associations = []
    #     for i, t in enumerate(tstamp_image):
    #         if tstamp_pose is None:
    #             j = np.argmin(np.abs(tstamp_depth - t))
    #             if np.abs(tstamp_depth[j] - t) < max_dt:
    #                 associations.append((i, j))
    #         else:
    #             j = np.argmin(np.abs(tstamp_depth - t))
    #             k = np.argmin(np.abs(tstamp_pose - t))
    #             if (np.abs(tstamp_depth[j] - t) < max_dt) and (
    #                 np.abs(tstamp_pose[k] - t) < max_dt
    #             ):
    #                 associations.append((i, j, k))
    #     return associations

    def associate_frames(self, tstamp_image, max_dt=0.08):
        # Build (i, i, i) tuples that only index the images, since there are
        # no depth or pose timestamps left to associate with.
        associations = [(i, i, i) for i in range(len(tstamp_image))]
        # print("associations:", associations)
        return associations

    def load_poses(self, datapath, frame_rate=-1):
        if os.path.isfile(os.path.join(datapath, "groundtruth.txt")):
            pose_list = os.path.join(datapath, "groundtruth.txt")  # no longer used below
        # elif os.path.isfile(os.path.join(datapath, "pose.txt")):  # commented out
        #     pose_list = os.path.join(datapath, "pose.txt")  # commented out

        image_list = os.path.join(datapath, "rgb.txt")
        # depth_list = os.path.join(datapath, "depth.txt")  # commented out

        image_data = self.parse_list(image_list)
        # depth_data = self.parse_list(depth_list)  # commented out
        # pose_data = self.parse_list(pose_list, skiprows=1)  # commented out
        # pose_vecs = pose_data[:, 0:].astype(np.float64)  # commented out

        tstamp_image = image_data[:, 0].astype(np.float64)
        # tstamp_depth = depth_data[:, 0].astype(np.float64)  # commented out
        # tstamp_pose = pose_data[:, 0].astype(np.float64)  # commented out
        associations = self.associate_frames(tstamp_image)  # modified: image timestamps only

        indicies = [0]
        for i in range(1, len(associations)):
            t0 = tstamp_image[associations[indicies[-1]][0]]
            t1 = tstamp_image[associations[i][0]]
            if t1 - t0 > 1.0 / frame_rate:
                indicies += [i]

        self.color_paths, self.poses, self.depth_paths, self.frames = [], [], [], []

        for ix in indicies:
            (i, j, k) = associations[ix]
            self.color_paths += [os.path.join(datapath, image_data[i, 1])]
            # self.depth_paths += [os.path.join(datapath, depth_data[j, 1])]  # commented out

            # quat = pose_vecs[k][4:]  # commented out
            # trans = pose_vecs[k][1:4]  # commented out
            # T = trimesh.transformations.quaternion_matrix(np.roll(quat, 1))  # commented out
            # T[:3, 3] = trans  # commented out
            # self.poses += [np.linalg.inv(T)]  # commented out

            self.poses += [np.eye(4)]  # added: identity pose instead

            frame = {
                "file_path": str(os.path.join(datapath, image_data[i, 1])),
                # "depth_path": str(os.path.join(datapath, depth_data[j, 1])),  # commented out
                # "transform_matrix": (np.linalg.inv(T)).tolist(),  # commented out
            }

            self.frames.append(frame)


class EuRoCParser:
    def __init__(self, input_folder, start_idx=0):
        self.input_folder = input_folder
        self.start_idx = start_idx
        self.color_paths = sorted(
            glob.glob(f"{self.input_folder}/mav0/cam0/data/*.png")
        )
        self.color_paths_r = sorted(
            glob.glob(f"{self.input_folder}/mav0/cam1/data/*.png")
        )
        assert len(self.color_paths) == len(self.color_paths_r)
        self.color_paths = self.color_paths[start_idx:]
        self.color_paths_r = self.color_paths_r[start_idx:]
        self.n_img = len(self.color_paths)
        self.load_poses(
            f"{self.input_folder}/mav0/state_groundtruth_estimate0/data.csv"
        )

    def associate(self, ts_pose):
        pose_indices = []
        for i in range(self.n_img):
            color_ts = float((self.color_paths[i].split("/")[-1]).split(".")[0])
            k = np.argmin(np.abs(ts_pose - color_ts))
            pose_indices.append(k)

        return pose_indices

    def load_poses(self, path):
        self.poses = []
        with open(path) as f:
            reader = csv.reader(f)
            header = next(reader)
            data = [list(map(float, row)) for row in reader]
        data = np.array(data)
        T_i_c0 = np.array(
            [
                [0.0148655429818, -0.999880929698, 0.00414029679422, -0.0216401454975],
                [0.999557249008, 0.0149672133247, 0.025715529948, -0.064676986768],
                [-0.0257744366974, 0.00375618835797, 0.999660727178, 0.00981073058949],
                [0.0, 0.0, 0.0, 1.0],
            ]
        )

        pose_ts = data[:, 0]
        pose_indices = self.associate(pose_ts)

        frames = []
        for i in range(self.n_img):
            trans = data[pose_indices[i], 1:4]
            quat = data[pose_indices[i], 4:8]

            T_w_i = trimesh.transformations.quaternion_matrix(np.roll(quat, 1))
            T_w_i[:3, 3] = trans
            T_w_c = np.dot(T_w_i, T_i_c0)

            self.poses += [np.linalg.inv(T_w_c)]

            frame = {
                "file_path": self.color_paths[i],
                "transform_matrix": (np.linalg.inv(T_w_c)).tolist(),
            }

            frames.append(frame)
        self.frames = frames


class BaseDataset(torch.utils.data.Dataset):
    def __init__(self, args, path, config):
        self.args = args
        self.path = path
        self.config = config
        self.device = "cuda:0"
        self.dtype = torch.float32
        self.num_imgs = 999999

    def __len__(self):
        return self.num_imgs

    def __getitem__(self, idx):
        pass


class MonocularDataset(BaseDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        calibration = config["Dataset"]["Calibration"]
        # Camera parameters
        self.fx = calibration["fx"]
        self.fy = calibration["fy"]
        self.cx = calibration["cx"]
        self.cy = calibration["cy"]
        self.width = calibration["width"]
        self.height = calibration["height"]
        self.fovx = focal2fov(self.fx, self.width)
        self.fovy = focal2fov(self.fy, self.height)
        self.K = np.array(
            [[self.fx, 0.0, self.cx], [0.0, self.fy, self.cy], [0.0, 0.0, 1.0]]
        )
        # distortion parameters
        self.disorted = calibration["distorted"]
        self.dist_coeffs = np.array(
            [
                calibration["k1"],
                calibration["k2"],
                calibration["p1"],
                calibration["p2"],
                calibration["k3"],
            ]
        )
        self.map1x, self.map1y = cv2.initUndistortRectifyMap(
            self.K,
            self.dist_coeffs,
            np.eye(3),
            self.K,
            (self.width, self.height),
            cv2.CV_32FC1,
        )
        # depth parameters
        # self.has_depth = True if "depth_scale" in calibration.keys() else False  # commented out: using an RGB-only camera
        self.has_depth = False  # added: using an RGB-only camera
        self.depth_scale = calibration["depth_scale"] if self.has_depth else None

        # Default scene scale
        nerf_normalization_radius = 5
        self.scene_info = {
            "nerf_normalization": {
                "radius": nerf_normalization_radius,
                "translation": np.zeros(3),
            },
        }

    def __getitem__(self, idx):
        color_path = self.color_paths[idx]
        pose = self.poses[idx]

        image = np.array(Image.open(color_path))
        depth = None

        if self.disorted:
            image = cv2.remap(image, self.map1x, self.map1y, cv2.INTER_LINEAR)

        if self.has_depth:
            depth_path = self.depth_paths[idx]
            depth = np.array(Image.open(depth_path)) / self.depth_scale

        image = (
            torch.from_numpy(image / 255.0)
            .clamp(0.0, 1.0)
            .permute(2, 0, 1)
            .to(device=self.device, dtype=self.dtype)
        )
        pose = torch.from_numpy(pose).to(device=self.device)
        return image, depth, pose


class StereoDataset(BaseDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        calibration = config["Dataset"]["Calibration"]
        self.width = calibration["width"]
        self.height = calibration["height"]

        cam0raw = calibration["cam0"]["raw"]
        cam0opt = calibration["cam0"]["opt"]
        cam1raw = calibration["cam1"]["raw"]
        cam1opt = calibration["cam1"]["opt"]
        # Camera parameters
        self.fx_raw = cam0raw["fx"]
        self.fy_raw = cam0raw["fy"]
        self.cx_raw = cam0raw["cx"]
        self.cy_raw = cam0raw["cy"]
        self.fx = cam0opt["fx"]
        self.fy = cam0opt["fy"]
        self.cx = cam0opt["cx"]
        self.cy = cam0opt["cy"]

        self.fx_raw_r = cam1raw["fx"]
        self.fy_raw_r = cam1raw["fy"]
        self.cx_raw_r = cam1raw["cx"]
        self.cy_raw_r = cam1raw["cy"]
        self.fx_r = cam1opt["fx"]
        self.fy_r = cam1opt["fy"]
        self.cx_r = cam1opt["cx"]
        self.cy_r = cam1opt["cy"]

        self.fovx = focal2fov(self.fx, self.width)
        self.fovy = focal2fov(self.fy, self.height)
        self.K_raw = np.array(
            [
                [self.fx_raw, 0.0, self.cx_raw],
                [0.0, self.fy_raw, self.cy_raw],
                [0.0, 0.0, 1.0],
            ]
        )

        self.K = np.array(
            [[self.fx, 0.0, self.cx], [0.0, self.fy, self.cy], [0.0, 0.0, 1.0]]
        )

        self.Rmat = np.array(calibration["cam0"]["R"]["data"]).reshape(3, 3)
        self.K_raw_r = np.array(
            [
                [self.fx_raw_r, 0.0, self.cx_raw_r],
                [0.0, self.fy_raw_r, self.cy_raw_r],
                [0.0, 0.0, 1.0],
            ]
        )

        self.K_r = np.array(
            [[self.fx_r, 0.0, self.cx_r], [0.0, self.fy_r, self.cy_r], [0.0, 0.0, 1.0]]
        )
        self.Rmat_r = np.array(calibration["cam1"]["R"]["data"]).reshape(3, 3)

        # distortion parameters
        self.disorted = calibration["distorted"]
        self.dist_coeffs = np.array(
            [cam0raw["k1"], cam0raw["k2"], cam0raw["p1"], cam0raw["p2"], cam0raw["k3"]]
        )
        self.map1x, self.map1y = cv2.initUndistortRectifyMap(
            self.K_raw,
            self.dist_coeffs,
            self.Rmat,
            self.K,
            (self.width, self.height),
            cv2.CV_32FC1,
        )

        self.dist_coeffs_r = np.array(
            [cam1raw["k1"], cam1raw["k2"], cam1raw["p1"], cam1raw["p2"], cam1raw["k3"]]
        )
        self.map1x_r, self.map1y_r = cv2.initUndistortRectifyMap(
            self.K_raw_r,
            self.dist_coeffs_r,
            self.Rmat_r,
            self.K_r,
            (self.width, self.height),
            cv2.CV_32FC1,
        )

    def __getitem__(self, idx):
        color_path = self.color_paths[idx]
        color_path_r = self.color_paths_r[idx]

        pose = self.poses[idx]
        image = cv2.imread(color_path, 0)
        image_r = cv2.imread(color_path_r, 0)
        depth = None
        if self.disorted:
            image = cv2.remap(image, self.map1x, self.map1y, cv2.INTER_LINEAR)
            image_r = cv2.remap(image_r, self.map1x_r, self.map1y_r, cv2.INTER_LINEAR)
        stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=20)
        stereo.setUniquenessRatio(40)
        disparity = stereo.compute(image, image_r) / 16.0
        disparity[disparity == 0] = 1e10
        depth = 47.90639384423901 / (
            disparity
        )  ## Following ORB-SLAM2 config, baseline*fx
        depth[depth < 0] = 0
        image = cv2.cvtColor(image, cv2.COLOR_GRAY2BGR)
        image = (
            torch.from_numpy(image / 255.0)
            .clamp(0.0, 1.0)
            .permute(2, 0, 1)
            .to(device=self.device, dtype=self.dtype)
        )
        pose = torch.from_numpy(pose).to(device=self.device)

        return image, depth, pose


class TUMDataset(MonocularDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        parser = TUMParser(dataset_path)
        self.num_imgs = parser.n_img
        self.color_paths = parser.color_paths
        self.depth_paths = parser.depth_paths
        self.poses = parser.poses


class ReplicaDataset(MonocularDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        parser = ReplicaParser(dataset_path)
        self.num_imgs = parser.n_img
        self.color_paths = parser.color_paths
        self.depth_paths = parser.depth_paths
        self.poses = parser.poses


class EurocDataset(StereoDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        dataset_path = config["Dataset"]["dataset_path"]
        parser = EuRoCParser(dataset_path, start_idx=config["Dataset"]["start_idx"])
        self.num_imgs = parser.n_img
        self.color_paths = parser.color_paths
        self.color_paths_r = parser.color_paths_r
        self.poses = parser.poses


class RealsenseDataset(BaseDataset):
    def __init__(self, args, path, config):
        super().__init__(args, path, config)
        self.pipeline = rs.pipeline()
        self.h, self.w = 720, 1280
        self.config = rs.config()
        self.config.enable_stream(rs.stream.color, self.w, self.h, rs.format.bgr8, 30)
        self.profile = self.pipeline.start(self.config)

        self.rgb_sensor = self.profile.get_device().query_sensors()[1]
        self.rgb_sensor.set_option(rs.option.enable_auto_exposure, False)
        # rgb_sensor.set_option(rs.option.enable_auto_white_balance, True)
        self.rgb_sensor.set_option(rs.option.enable_auto_white_balance, False)
        self.rgb_sensor.set_option(rs.option.exposure, 200)
        self.rgb_profile = rs.video_stream_profile(
            self.profile.get_stream(rs.stream.color)
        )

        self.rgb_intrinsics = self.rgb_profile.get_intrinsics()

        self.fx = self.rgb_intrinsics.fx
        self.fy = self.rgb_intrinsics.fy
        self.cx = self.rgb_intrinsics.ppx
        self.cy = self.rgb_intrinsics.ppy
        self.width = self.rgb_intrinsics.width
        self.height = self.rgb_intrinsics.height
        self.fovx = focal2fov(self.fx, self.width)
        self.fovy = focal2fov(self.fy, self.height)
        self.K = np.array(
            [[self.fx, 0.0, self.cx], [0.0, self.fy, self.cy], [0.0, 0.0, 1.0]]
        )

        self.disorted = True
        self.dist_coeffs = np.asarray(self.rgb_intrinsics.coeffs)
        self.map1x, self.map1y = cv2.initUndistortRectifyMap(
            self.K, self.dist_coeffs, np.eye(3), self.K, (self.w, self.h), cv2.CV_32FC1
        )

        # depth parameters
        self.has_depth = False
        self.depth_scale = None

    def __getitem__(self, idx):
        pose = torch.eye(4, device=self.device, dtype=self.dtype)

        frameset = self.pipeline.wait_for_frames()
        rgb_frame = frameset.get_color_frame()
        image = np.asanyarray(rgb_frame.get_data())
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        if self.disorted:
            image = cv2.remap(image, self.map1x, self.map1y, cv2.INTER_LINEAR)

        image = (
            torch.from_numpy(image / 255.0)
            .clamp(0.0, 1.0)
            .permute(2, 0, 1)
            .to(device=self.device, dtype=self.dtype)
        )
        return image, None, pose


def load_dataset(args, path, config):
    if config["Dataset"]["type"] == "tum":
        return TUMDataset(args, path, config)
    elif config["Dataset"]["type"] == "replica":
        return ReplicaDataset(args, path, config)
    elif config["Dataset"]["type"] == "euroc":
        return EurocDataset(args, path, config)
    elif config["Dataset"]["type"] == "realsense":
        return RealsenseDataset(args, path, config)
    else:
        raise ValueError("Unknown dataset type")

@Harshitavardhani

> @muskie82 Yes, I have commented that one out too. Sorry for not mentioning it above.

Hi @muskie82

Did you overcome the error regarding "Umeyama alignment is not possible"? I am facing the same issue while working with a custom dataset.

If so, could you please tell me how you resolved it? Thank you!

@RickMaruyama
Author

@langsungn
Hello! I'd like to solve this issue.
Do you have a dataset.py for the monocular-camera case that successfully produced a PLY file?
If so, I'd like to see it for reference.
