
RuntimeError: tabulate: failed to synchronize: cudaErrorInvalidConfiguration: invalid configuration argument #51

Open
Joy881007 opened this issue Apr 2, 2024 · 3 comments


Joy881007 commented Apr 2, 2024

Hi, I am testing on an RTX 4090 + i5-13500 setup.
python3 slam.py --config configs/mono/tum/fr3_office.yaml
I'm replacing the pose used in tracking with the pose estimated by LoFTR. Specifically, the logic of the code is as follows:

def tracking(self, cur_frame_idx, viewpoint):
        prev = self.cameras[cur_frame_idx - self.use_every_n_frames]
        viewpoint.update_RT(prev.R, prev.T)

        frame_curr = viewpoint.original_image.clone()
        frame_prev = load_image(Path(self.dataset.color_paths[cur_frame_idx - self.use_every_n_frames])).to(device=self.device)

        # prealign image using LoFTR
        prealign_matrix = self.prealign_images(frame_prev, frame_curr, cur_frame_idx, extractor=self.extractor, matcher=self.matcher, vis=False)
        prealign_matrix = torch.tensor(prealign_matrix, dtype=torch.float32, device=self.device)
        
        opt_params = []
        ...
        pose_optimizer = torch.optim.Adam(opt_params)

        for tracking_itr in range(self.tracking_itr_num):
            render_pkg = render(
                viewpoint, self.gaussians, self.pipeline_params, self.background
            )
            image, depth, opacity = ...

            pose_optimizer.zero_grad()
            loss_tracking = get_loss_tracking(... )
            loss_tracking.backward()

            with torch.no_grad():
                if tracking_itr == 0:
                    viewpoint.R = prealign_matrix[0:3, 0:3]
                    viewpoint.T = prealign_matrix[0:3, 3]
                    converged = False
                else:
                    pose_optimizer.step()
                    converged = update_pose(viewpoint)

            ...
        return render_pkg
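As a side note on the assignment of `prealign_matrix[0:3, 0:3]` to `viewpoint.R` above: a rotation extracted from an estimated transform can be sanity-checked before it is used. A minimal, dependency-free sketch (the helper names are illustrative, not part of MonoGS or LoFTR):

```python
# Hypothetical sanity check for a 3x3 rotation block taken from an
# estimated 4x4 transform (e.g. prealign_matrix[0:3, 0:3]).
# A proper rotation has orthonormal rows (R @ R.T == I) and det(R) == +1.

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_valid_rotation(R, tol=1e-4):
    """True if R is (numerically) a proper rotation matrix."""
    # (R @ R^T)[i][j] = sum_k R[i][k] * R[j][k]; it should be the identity.
    rrt = [[sum(R[i][k] * R[j][k] for k in range(3)) for j in range(3)]
           for i in range(3)]
    ortho_err = max(abs(rrt[i][j] - (1.0 if i == j else 0.0))
                    for i in range(3) for j in range(3))
    return ortho_err < tol and abs(det3(R) - 1.0) < tol
```

A failed check would point at a degenerate LoFTR estimate rather than at the map initialisation.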

During the run, the following error occurred:

MonoGS: Running MonoGS without GUI
MonoGS: Following config will be overriden
MonoGS:         save_results=True
MonoGS:         use_gui=False
MonoGS:         eval_rendering=True
MonoGS: saving results in results/datasets_tum/2024-04-02-15-30-28
MonoGS: Resetting the system
MonoGS: Initialized map
cur_frame_idx    1
cur_frame_idx    2
cur_frame_idx    3
cur_frame_idx    4
cur_frame_idx    5
cur_frame_idx    6
cur_frame_idx    7
cur_frame_idx    8
cur_frame_idx    9
cur_frame_idx    10
cur_frame_idx    11
cur_frame_idx    12
cur_frame_idx    13
cur_frame_idx    14
cur_frame_idx    15
MonoGS: Keyframes lacks sufficient overlap to initialize the map, resetting.
MonoGS: Resetting the system
MonoGS: Initialized map
cur_frame_idx    16
cur_frame_idx    17
cur_frame_idx    18
cur_frame_idx    19
cur_frame_idx    20
cur_frame_idx    21
cur_frame_idx    22
cur_frame_idx    23
cur_frame_idx    24
cur_frame_idx    25
Process Process-3:
Traceback (most recent call last):
  File "slam.py", line 278, in <module>
Traceback (most recent call last):
    slam = SLAM(config, save_dir=save_dir)
  File "slam.py", line 110, in __init__
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/utils/slam_backend.py", line 417, in run
    self.add_next_kf(cur_frame_idx, viewpoint, depth_map=depth_map)
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/utils/slam_backend.py", line 68, in add_next_kf
    self.gaussians.extend_from_pcd_seq(
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/gaussian_splatting/scene/gaussian_model.py", line 239, in extend_from_pcd_seq
    self.create_pcd_from_image(cam_info, init, scale=scale, depthmap=depthmap)
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/gaussian_splatting/scene/gaussian_model.py", line 131, in create_pcd_from_image
    return self.create_pcd_from_image_and_depth(cam, rgb, depth, init)
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/gaussian_splatting/scene/gaussian_model.py", line 185, in create_pcd_from_image_and_depth
    distCUDA2(torch.from_numpy(np.asarray(pcd.points)).float().cuda()),
RuntimeError: tabulate: failed to synchronize: cudaErrorInvalidConfiguration: invalid configuration argument
    self.frontend.run()
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/MonoGS/utils/slam_frontend.py", line 725, in run
    data = self.frontend_queue.get()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/media/user/NERF_4T_02/ws_3dgs/3dgs_slam/venv/monogs_20240329/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 297, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 502, in Client
    c = SocketClient(address)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 630, in SocketClient
    s.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

What could be the reason for this?
Thank you

muskie82 (Owner) commented Apr 2, 2024

Hi,
I have run into the distCUDA2 error when the initialised depth map is too sparse and the module fails to compute nearest neighbours. That might be relevant in your case.

Joy881007 (Author) commented

Can you provide more details on the possible reasons? Is this related to a large error in the estimated pose?

muskie82 (Owner) commented Apr 3, 2024

In my case it was not related to pose error; it had to do with the number of Gaussians used for keyframe initialisation. If there are too few, simple_knn fails to run the nearest-neighbour search.
I am not 100% sure how this relates to your case, but I wanted to quickly share my experience with this error message.

Although I am not familiar with LoFTR, is the pose error really that big?
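To illustrate the failure mode described above: distCUDA2 averages the distances from each point to its three nearest neighbours, so a keyframe that initialises fewer than four points cannot complete the search, which is consistent with cudaErrorInvalidConfiguration (an invalid kernel launch configuration). A brute-force CPU sketch of the same computation, with the guard made explicit (illustrative only, not MonoGS's actual code):

```python
import math

K = 3  # simple_knn's distCUDA2 uses the 3 nearest neighbours

def mean_knn_dist(points, k=K):
    """Brute-force CPU stand-in for distCUDA2: mean distance from each
    point to its k nearest neighbours.

    With fewer than k + 1 points the neighbour search is impossible,
    mirroring the sparse-initialisation failure in the traceback.
    """
    n = len(points)
    if n < k + 1:
        raise ValueError(f"need at least {k + 1} points, got {n}")
    means = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        means.append(sum(dists[:k]) / k)
    return means
```

Checking the point count (or depth-map density) before the distCUDA2 call would turn the CUDA error into a recoverable condition.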
