
RuntimeError: tabulate: failed to synchronize: cudaErrorInvalidConfiguration: invalid configuration argument #51

Open
Joy881007 opened this issue Apr 2, 2024 · 3 comments


Joy881007 commented Apr 2, 2024

Hi, I am testing on an RTX 4090 + i5-13500 setup.
python3 slam.py --config configs/mono/tum/fr3_office.yaml
I'm replacing the pose used in tracking with the pose estimated by LoFTR. Specifically, the logic of the code is as follows:

def tracking(self, cur_frame_idx, viewpoint):
        prev = self.cameras[cur_frame_idx - self.use_every_n_frames]
        viewpoint.update_RT(prev.R, prev.T)

        frame_curr = viewpoint.original_image.clone()
        frame_prev = load_image(Path(self.dataset.color_paths[cur_frame_idx - self.use_every_n_frames])).to(device=self.device)

        # prealign image using LoFTR
        prealign_matrix = self.prealign_images(frame_prev, frame_curr, cur_frame_idx, extractor=self.extractor, matcher=self.matcher, vis=False)
        prealign_matrix = torch.tensor(prealign_matrix, dtype=torch.float32, device=self.device)
        
        opt_params = []
        ...
        pose_optimizer = torch.optim.Adam(opt_params)

        for tracking_itr in range(self.tracking_itr_num):
            render_pkg = render(
                viewpoint, self.gaussians, self.pipeline_params, self.background
            )
            image, depth, opacity = ...

            pose_optimizer.zero_grad()
            loss_tracking = get_loss_tracking(... )
            loss_tracking.backward()

            with torch.no_grad():
                if tracking_itr == 0:
                    viewpoint.R = prealign_matrix[0:3, 0:3]
                    viewpoint.T = prealign_matrix[0:3, 3]
                    converged = False
                else:
                    pose_optimizer.step()
                    converged = update_pose(viewpoint)

            ...
        return render_pkg
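As a side note on the assignment of `prealign_matrix[0:3, 0:3]` to `viewpoint.R` above: a rotation extracted from an estimated transform can be sanity-checked before it is used. A minimal, dependency-free sketch (the helper names are illustrative, not part of MonoGS or LoFTR):

```python
# Hypothetical sanity check for a 3x3 rotation block taken from an
# estimated 4x4 transform (e.g. prealign_matrix[0:3, 0:3]).
# A proper rotation has orthonormal rows (R @ R.T == I) and det(R) == +1.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_valid_rotation(R, tol=1e-4):
    """True if R is (numerically) a proper rotation matrix."""
    # (R @ R^T)[i][j] = sum_k R[i][k] * R[j][k]; it should be the identity.
    rrt = [[sum(R[i][k] * R[j][k] for k in range(3)) for j in range(3)]
           for i in range(3)]
    ortho_err = max(abs(rrt[i][j] - (1.0 if i == j else 0.0))
                    for i in range(3) for j in range(3))
    return ortho_err < tol and abs(det3(R) - 1.0) < tol
```

A failed check would point at a degenerate LoFTR estimate rather than at the map initialisation.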

During the run, the following error occurred:

MonoGS: Running MonoGS without GUI
MonoGS: Following config will be overriden
MonoGS:         save_results=True
MonoGS:         use_gui=False
MonoGS:         eval_rendering=True
MonoGS: saving results in results/datasets_tum/2024-04-02-15-30-28
MonoGS: Resetting the system
MonoGS: Initialized map
cur_frame_idx    1
cur_frame_idx    2
cur_frame_idx    3
cur_frame_idx    4
cur_frame_idx    5
cur_frame_idx    6
cur_frame_idx    7
cur_frame_idx    8
cur_frame_idx    9
cur_frame_idx    10
cur_frame_idx    11
cur_frame_idx    12
cur_frame_idx    13
cur_frame_idx    14
cur_frame_idx    15
MonoGS: Keyframes lacks sufficient overlap to initialize the map, resetting.
MonoGS: Resetting the system
MonoGS: Initialized map
cur_frame_idx    16
cur_frame_idx    17
cur_frame_idx    18
cur_frame_idx    19
cur_frame_idx    20
cur_frame_idx    21
cur_frame_idx    22
cur_frame_idx    23
cur_frame_idx    24
cur_frame_idx    25
Process Process-3:
Traceback (most recent call last):
  File "slam.py", line 278, in <module>
Traceback (most recent call last):
    slam = SLAM(config, save_dir=save_dir)
  File "slam.py", line 110, in __init__
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/utils/slam_backend.py", line 417, in run
    self.add_next_kf(cur_frame_idx, viewpoint, depth_map=depth_map)
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/utils/slam_backend.py", line 68, in add_next_kf
    self.gaussians.extend_from_pcd_seq(
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/gaussian_splatting/scene/gaussian_model.py", line 239, in extend_from_pcd_seq
    self.create_pcd_from_image(cam_info, init, scale=scale, depthmap=depthmap)
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/gaussian_splatting/scene/gaussian_model.py", line 131, in create_pcd_from_image
    return self.create_pcd_from_image_and_depth(cam, rgb, depth, init)
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/gaussian_splatting/scene/gaussian_model.py", line 185, in create_pcd_from_image_and_depth
    distCUDA2(torch.from_numpy(np.asarray(pcd.points)).float().cuda()),
RuntimeError: tabulate: failed to synchronize: cudaErrorInvalidConfiguration: invalid configuration argument
    self.frontend.run()
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/utils/slam_frontend.py", line 725, in run
    data = self.frontend_queue.get()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/venv/monogs_20240329/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 297, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 630, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

What could be the reason for this?
Thank you

muskie82 (Owner) commented Apr 2, 2024

Hi,
I have run into the distCUDA2 error when the initialised depth map is too sparse and the module fails to compute nearest neighbours. That might be relevant in your case.

Joy881007 (Author) commented

Can you provide more details on the possible reasons? Is this related to a large error in the estimated pose?

muskie82 (Owner) commented Apr 3, 2024

In my case it was not related to pose error; it had to do with the number of Gaussians used for keyframe initialisation. If there are too few, simple_knn fails to run the nearest-neighbour search.
I am not 100% sure how this relates to your case, but I wanted to quickly share my experience with this error message.

Although I am not familiar with LoFTR, is the pose error really that big?
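To illustrate the failure mode described above: distCUDA2 averages the distances from each point to its three nearest neighbours, so a keyframe that initialises fewer than four points cannot complete the search, which is consistent with cudaErrorInvalidConfiguration (an invalid kernel launch configuration). A brute-force CPU sketch of the same computation, with the guard made explicit (illustrative only, not MonoGS's actual code):

```python
import math

K = 3  # simple_knn's distCUDA2 uses the 3 nearest neighbours

def mean_knn_dist(points, k=K):
    """Brute-force CPU stand-in for distCUDA2: mean distance from each
    point to its k nearest neighbours.

    With fewer than k + 1 points the neighbour search is impossible,
    mirroring the sparse-initialisation failure in the traceback.
    """
    n = len(points)
    if n < k + 1:
        raise ValueError(f"need at least {k + 1} points, got {n}")
    means = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        means.append(sum(dists[:k]) / k)
    return means
```

Checking the point count (or depth-map density) before the distCUDA2 call would turn the CUDA error into a recoverable condition.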
